hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has quit [Quit: Bye!]
weilewei31 has quit [Ping timeout: 260 seconds]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 268 seconds]
K-ballo1 is now known as K-ballo
hkaiser has joined #ste||ar
Yorlik_ has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
hkaiser has quit [Quit: Bye!]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 252 seconds]
K-ballo1 is now known as K-ballo
Yorlik_ has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 260 seconds]
K-ballo1 is now known as K-ballo
<Yorlik>
hkaiser: Still getting this: "ERROR: The specified BOOST_ROOT differs from what has been used when configuring and building HPX. Please use the same Boost versions. HPX boost is and users is"
<Yorlik>
Funnily enough the Boost versions reported are empty.
<hkaiser>
uhhh
hkaiser has quit [Quit: Bye!]
diehlpk_work has joined #ste||ar
beojan has joined #ste||ar
<beojan>
Is there some way to register a certain function to run when a new OS thread is started / stopped, or to just run a given function on every thread in the pool?
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 255 seconds]
K-ballo1 is now known as K-ballo
hkaiser has joined #ste||ar
<beojan>
hkaiser: Is there some way to register a certain function to run when a new OS thread is started / stopped, or to just run a given function on every thread in the pool?
<beojan>
OK, strangely that hasn't fixed the crash I'm experiencing. It looks like I can get through `hpx::stop()` before all tasks have finished executing.
<beojan>
Is that right?
<hkaiser>
beojan: stop should block until all tasks have finished
<hkaiser>
does it not? that would be a bug
<beojan>
Does that include tasks that are in the queue but haven't started running on a thread yet?
<hkaiser>
beojan: yes
<beojan>
Oh, it might be that `hpx::disconnect` ends up running before other tasks that do work.
<hkaiser>
once stop exits all operations should have run to completion
<hkaiser>
beojan: that could be, yes - didn't you say you don't need elasticity?
<hkaiser>
disconnect is for reducing the number of localities and should be called only on localities that have connected late
<beojan>
We don't, but finalize just gets called on one locality, right?
<hkaiser>
beojan: you can call it on any locality - it will just signal everybody to start tearing the system down
<hkaiser>
after finalize things will keep running until all work has finished
<hkaiser>
note to self - add a check that disconnect is called on localities that have connected late only
<beojan>
So will finalize return before everything has finished running?
<beojan>
It appears the answer is yes, it will.
<beojan>
I call hpx::async( []() { hpx::finalize(); } ).wait() and other tasks remain running after this has returned.
tufei_ has joined #ste||ar
<beojan>
Another possibility is that the futures simply aren't working properly, and wait is returning before the function has actually run.
<hkaiser>
beojan: yes, finalize doesn't block
diehlpk_work has quit [Remote host closed the connection]
<hkaiser>
it just signals to the runtime to exit at some point
<beojan>
Is there some way to actually block until all running tasks are done?
<beojan>
Without having to hold on to futures for everything
<hkaiser>
stop/init exit only once everything is said and done
<beojan>
Ideally I'd like to wait until all current tasks are done, then submit a new task to every locality to collect some results, and then shutdown everything
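The workflow described here (wait for all outstanding tasks without keeping their futures, then collect results, then shut down) can be sketched for a single process as follows. This is a minimal illustration using only the C++ standard library, not HPX; `TaskTracker`, `spawn`, `wait_idle`, and `run_demo` are hypothetical names invented for the example, and it says nothing about how HPX tracks its own tasks or how a wait across localities would work.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: counts tasks from the moment they are submitted,
// so the caller never has to hold on to a future per task.
class TaskTracker {
public:
    ~TaskTracker() {
        wait_idle();
        for (auto& t : threads_) t.join();
    }

    // Run f on its own thread and count it as outstanding until it returns.
    // Tasks may call spawn() themselves; the nested task is counted before
    // its parent finishes, so the counter cannot reach zero too early.
    void spawn(std::function<void()> f) {
        std::lock_guard<std::mutex> lock(mutex_);
        threads_.emplace_back([this, f = std::move(f)] {
            f();
            std::lock_guard<std::mutex> done(mutex_);
            if (--outstanding_ == 0) idle_.notify_all();
        });
        ++outstanding_;  // still under the lock, so the task cannot decrement first
    }

    // Block until every task submitted so far (and anything it spawned) is done.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return outstanding_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable idle_;
    int outstanding_ = 0;
    std::vector<std::thread> threads_;
};

// Example run: 8 tasks, each spawning one nested task.
int run_demo() {
    std::atomic<int> done{0};
    TaskTracker tracker;
    for (int i = 0; i < 8; ++i) {
        tracker.spawn([&] {
            ++done;
            tracker.spawn([&] { ++done; });  // nested task, also tracked
        });
    }
    tracker.wait_idle();  // every tracked task has finished at this point
    return done.load();   // "collect results" only after the wait
}
```

In this sketch `wait_idle` has to be called from outside the tracked tasks: a tracked task that waited on it would be waiting for its own completion.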
<hkaiser>
beojan: ok - that can be done - do you need to wait globally (across localities) or just locally?
tufei_ has quit [Remote host closed the connection]
tufei__ has joined #ste||ar
<beojan>
Globally, but I can always use an action to turn a local wait into a global one, right?
<hkaiser>
beojan: well, it's more complicated than that
<hkaiser>
what I can do (and was planning to do at some point anyway) is to expose the global termination detection that runs after finalize was called such that it can be used without tearing the runtime down
<hkaiser>
in order to be sure that no work is in flight anywhere anymore (including network traffic) you need to run a full-blown Dijkstra termination detection algorithm across all localities
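For readers unfamiliar with the idea, the classic ring-token termination detection scheme associated with Dijkstra can be sketched as a single-process simulation. This is not HPX's implementation: `Machine`, `ProbeResult`, and `detect_termination` are hypothetical names, each "machine" stands in for a locality, and messages are assumed to arrive instantaneously, so network traffic still in flight (which the message above says must be covered) is not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one locality.
struct Machine {
    int pending = 0;     // local work units left; 0 means the machine is passive
    int to_send = 0;     // messages it will still send while it is active
    bool black = false;  // set once the machine has sent a message
};

struct ProbeResult {
    int probes;  // number of probes machine 0 had to start
    int steps;   // simulation steps until termination was detected
};

// A token travels the ring N-1 -> N-2 -> ... -> 0. Termination is reported
// when a white token returns to a passive, white machine 0.
ProbeResult detect_termination(std::vector<Machine>& m) {
    const std::size_t n = m.size();
    std::size_t token_at = 0;  // machine 0 holds the token initially
    bool token_black = true;   // forces machine 0 to start a first probe
    int probes = 0;
    for (int steps = 0;; ++steps) {
        // Work phase: each active machine does one unit and may send one
        // message to its ring neighbour, which (re)activates that neighbour.
        for (std::size_t i = 0; i < n; ++i) {
            if (m[i].pending == 0) continue;
            --m[i].pending;
            if (m[i].to_send > 0) {
                --m[i].to_send;
                ++m[(i + 1) % n].pending;  // assumed instantaneous delivery
                m[i].black = true;         // a sender turns black
            }
        }
        // Token phase: only a passive machine may act on the token.
        if (m[token_at].pending != 0) continue;
        if (token_at == 0) {
            if (!token_black && !m[0].black) return {probes, steps};
            // Probe failed (or none ran yet): whiten and start a new probe.
            m[0].black = false;
            token_black = false;
            token_at = n - 1;
            ++probes;
        } else {
            if (m[token_at].black) token_black = true;  // taint the token
            m[token_at].black = false;  // machine whitens after forwarding
            --token_at;
        }
    }
}
```

The colouring is what makes a single pass insufficient in general: a machine that sent a message after the token went by taints the next pass, so machine 0 only trusts a probe that comes back entirely white.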
<hkaiser>
beojan: how urgent would that be? we're currently preparing a release and I'd like to defer implementing that until after that
<beojan>
None of this is urgent since we're still prototyping
<beojan>
We would like to have something working for CHEP though (so by April)
<hkaiser>
beojan: this works only if called from hpx_main, if called from another hpx thread you will need to account for that one as well
<hkaiser>
possibly not '+ 1' but '+ 2' - but that depends on your application logic
<beojan>
I don't have a hpx_main. I have something called a ThreadPoolSvc which is just a class instantiated in the main thread. That class has an initialization function where I call hpx::start and a finalize function where I call hpx::stop