#ste||ar on 2023-02-03 — irc logs at irclog.cct.lsu.edu

2021-08-06 22:55 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu

00:03 hkaiser has quit [Quit: Bye!]

00:51 weilewei31 has quit [Ping timeout: 260 seconds]

01:23 K-ballo1 has joined #ste||ar

01:24 K-ballo has quit [Ping timeout: 268 seconds]

01:24 K-ballo1 is now known as K-ballo

02:07 hkaiser has joined #ste||ar

02:16 Yorlik_ has joined #ste||ar

02:19 Yorlik has quit [Ping timeout: 265 seconds]

04:01 hkaiser has quit [Quit: Bye!]

06:50 K-ballo1 has joined #ste||ar

06:50 K-ballo has quit [Ping timeout: 252 seconds]

06:50 K-ballo1 is now known as K-ballo

07:08 Yorlik_ has quit [Read error: Connection reset by peer]

12:26 Yorlik has joined #ste||ar

13:11 hkaiser has joined #ste||ar

14:26 K-ballo1 has joined #ste||ar

14:26 K-ballo has quit [Ping timeout: 260 seconds]

14:26 K-ballo1 is now known as K-ballo

14:35 <Yorlik> hkaiser: Still getting this: "ERROR: The specified BOOST_ROOT differs from what has been used when

14:35 <Yorlik> 1> configuring and building HPX. Please use the same Boost versions. HPX

14:35 <Yorlik> 1 boost is and users is"

14:36 <Yorlik> Funnily enough the Boost versions reported are empty.

14:52 <hkaiser> uhhh

16:17 hkaiser has quit [Quit: Bye!]

17:01 diehlpk_work has joined #ste||ar

20:28 beojan has joined #ste||ar

20:31 <beojan> Is there some way to register a certain function to run when a new OS thread is started / stopped, or to just run a given function on every thread in the pool?

20:52 K-ballo1 has joined #ste||ar

20:52 K-ballo has quit [Ping timeout: 255 seconds]

20:52 K-ballo1 is now known as K-ballo

20:58 hkaiser has joined #ste||ar

21:15 <beojan> hkaiser: Is there some way to register a certain function to run when a new OS thread is started / stopped, or to just run a given function on every thread in the pool?

21:15 <hkaiser> yes

21:19 <hkaiser> beojan: https://gist.github.com/hkaiser/a473303c38979ebca93e1656667eacae

21:24 <beojan> Thanks

21:25 <hkaiser> beojan: the same can be done for stopping threads, just replace start with stop, see https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/runtime_local/include/hpx/runtime_local/thread_hooks.hpp
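The start/stop hook idea discussed above can be sketched without HPX. This is only an illustration of the concept: `hooked_pool`, `hook`, and `run_hook_demo` are hypothetical names invented for this sketch and are not part of the HPX API, whose actual hook functions are declared in the thread_hooks.hpp header linked above. Each worker calls the start hook first and the stop hook last, on the OS thread it runs on.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime's worker pool; not HPX code.
class hooked_pool {
public:
    using hook = std::function<void(std::size_t worker_index)>;

    hooked_pool(std::size_t n, hook on_start, hook on_stop) {
        workers_.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([i, on_start, on_stop] {
                if (on_start) on_start(i);  // runs first on the new OS thread
                // ... a real pool would run its scheduling loop here ...
                if (on_stop) on_stop(i);    // runs last on that same thread
            });
        }
    }

    // Joining guarantees every stop hook has finished.
    ~hooked_pool() {
        for (auto& w : workers_) w.join();
    }

private:
    std::vector<std::thread> workers_;
};

// Returns {number of start hook calls, number of stop hook calls}.
inline std::pair<int, int> run_hook_demo(std::size_t n) {
    std::atomic<int> started{0};
    std::atomic<int> stopped{0};
    {
        hooked_pool pool(
            n,
            [&](std::size_t) { ++started; },
            [&](std::size_t) { ++stopped; });
    }  // pool destructor joins all workers here
    return {started.load(), stopped.load()};
}
```

Calling `run_hook_demo(4)` runs each hook once per worker, which is the "run a given function on every thread in the pool" behaviour beojan asked about.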

21:57 <beojan> OK, strangely that hasn't fixed the crash I'm experiencing. It looks like I can get through `hpx::stop()` before all tasks have finished executing.

21:57 <beojan> Is that right?

22:02 <hkaiser> beojan: stop should block until all tasks have finished

22:02 <hkaiser> does it not? that would be a bug

22:03 <beojan> Does that include tasks that are in the queue but haven't started running on a thread yet?

22:04 <hkaiser> beojan: yes

22:04 <beojan> Oh, it might be that `hpx::disconnect` ends up running before other tasks that do work.

22:04 <hkaiser> once stop exits all operations should have run to completion

22:05 <hkaiser> beojan: that could be, yes - didn't you say you don't need elasticity?

22:06 <hkaiser> disconnect is for reducing the number of localities and should be called only on localities that have connected late

22:06 <beojan> We don't, but finalize just gets called on one locality, right?

22:07 <hkaiser> beojan: you can call it on any locality - it will just signal everybody to start tearing the system down

22:07 <hkaiser> after finalize things will keep running until all work has finished

22:08 <hkaiser> note to self - add a check that disconnect is called on localities that have connected late only

22:14 <beojan> So will finalize return before everything has finished running?

22:24 <beojan> It appears the answer is yes, it will.

22:25 <beojan> I call hpx::async( []() { hpx::finalize(); } ).wait() and other tasks remain running after this has returned.

22:35 tufei_ has joined #ste||ar

22:53 <beojan> Another possibility is that the futures simply aren't working properly, and wait is returning before the function has actually run.

22:55 <hkaiser> beojan: yes, finalize doesn't block

22:55 diehlpk_work has quit [Remote host closed the connection]

22:55 <hkaiser> it just signals to the runtime to exit at some point

22:56 <beojan> Is there some way to actually block until all running tasks are done?

22:56 <beojan> Without having to hold on to futures for everything

22:56 <hkaiser> stop/init exit only once everything is said and done
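The difference between a call that only signals shutdown and a call that blocks until all work is done can be sketched with the standard library alone. This is a toy model under stated assumptions, not HPX's implementation: `mini_runtime` and `run_shutdown_demo` are hypothetical names, and the member functions merely borrow the names `finalize` and `stop` to mirror the behaviour described in the conversation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-worker runtime; a toy model, not HPX.
class mini_runtime {
public:
    void start() { worker_ = std::thread([this] { run(); }); }

    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            queue_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Non-blocking: only records that shutdown was requested.
    void finalize() {
        {
            std::lock_guard<std::mutex> lk(m_);
            exit_requested_ = true;
        }
        cv_.notify_one();
    }

    // Blocking: returns once the worker has drained the queue and exited.
    void stop() {
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return exit_requested_ || !queue_.empty(); });
            if (queue_.empty()) return;  // exit requested and nothing left to do
            auto task = std::move(queue_.front());
            queue_.pop_front();
            lk.unlock();
            task();  // run the task without holding the lock
            lk.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool exit_requested_ = false;
    std::thread worker_;
};

// Returns the number of tasks that had run by the time stop() returned.
inline int run_shutdown_demo() {
    std::atomic<int> done{0};
    mini_runtime rt;
    rt.start();
    for (int i = 0; i < 100; ++i) rt.post([&] { ++done; });
    rt.finalize();  // returns immediately; queued tasks may still be pending
    rt.stop();      // returns only after every queued task has run
    return done.load();
}
```

In this model, reading `done` right after `finalize()` could yield any value up to 100, while reading it after `stop()` always yields 100.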

22:59 <beojan> Ideally I'd like to wait until all current tasks are done, then submit a new task to every locality to collect some results, and then shut down everything

23:00 <hkaiser> beojan: ok - that can be done - do you need to wait globally (across localities) or just locally?

23:01 tufei_ has quit [Remote host closed the connection]

23:01 tufei__ has joined #ste||ar

23:01 <beojan> Globally, but I can always use an action to turn a local wait into a global one, right?

23:01 <hkaiser> beojan: well, it's more complicated than that

23:02 <hkaiser> what I can do (and was planning to do at some point anyway) is to expose the global termination detection that runs after finalize was called such that it can be used without tearing the runtime down

23:04 <hkaiser> in order to be sure that no work is in flight anywhere anymore (including network traffic) you need to run a full-blown Dijkstra termination detection algorithm across all localities

23:06 <hkaiser> beojan: how urgent would that be? we're currently preparing a release and I'd like to defer implementing that until after that

23:07 <beojan> None of this is urgent since we're still prototyping

23:07 <beojan> We would like to have something working for CHEP though (so by April)

23:08 <hkaiser> beojan: https://github.com/STEllAR-GROUP/hpx/issues/6163

23:15 <beojan> In the meantime, is there a way to wait until all local scheduled tasks are done?

23:16 <beojan> The cross node calls in our application are pretty much entirely sequential, so network traffic isn't such a concern.

23:18 <hkaiser> beojan: something like: https://github.com/STEllAR-GROUP/flecsi2/blob/hpx_backend/flecsi/run/hpx/context.cc#L115-L126

23:20 <beojan> Thanks. I'll see if that helps

23:21 <hkaiser> beojan: this works only if called from hpx_main, if called from another hpx thread you will need to account for that one as well

23:22 <hkaiser> possibly not '+ 1' but '+ 2' - but that depends on your application logic
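The reason for the '+ 1' (or '+ 2') offset can be illustrated with a plain counter of live tasks: the waiter is itself a live task, so it must wait until the count drops to its own baseline rather than to zero. The sketch below is not the linked flecsi code and uses no HPX calls; `live_tasks`, `wait_until_at_most`, and `run_local_wait_demo` are hypothetical names, and the counter merely stands in for whatever thread count a runtime reports.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical live-task counter; stands in for a runtime's thread count.
class live_tasks {
public:
    void enter() { live_.fetch_add(1); }
    void leave() { live_.fetch_sub(1); }

    // Yield until at most `baseline` tasks remain. `baseline` is the number
    // of tasks expected to stay alive during the wait, the waiter included.
    void wait_until_at_most(std::size_t baseline) const {
        while (live_.load() > baseline) std::this_thread::yield();
    }

private:
    std::atomic<std::size_t> live_{0};
};

// Returns how many worker tasks had finished when the wait returned.
inline int run_local_wait_demo(int n) {
    live_tasks tasks;
    std::atomic<int> done{0};

    tasks.enter();  // the waiter itself counts as one live task
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        tasks.enter();  // count the task before handing it off
        workers.emplace_back([&] {
            ++done;
            tasks.leave();
        });
    }

    tasks.wait_until_at_most(1);   // 1 == just the waiter
    int const seen = done.load();  // every worker has left, so all n are counted
    for (auto& w : workers) w.join();
    tasks.leave();
    return seen;
}
```

If the wait were issued from a task that was itself spawned by another live task, the baseline would be 2 instead of 1, which matches the "depends on your application logic" remark.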

23:22 <beojan> I don't have a hpx_main. I have something called a ThreadPoolSvc which is just a class instantiated in the main thread. That class has an initialization function where I call hpx::start and a finalize function where I call hpx::stop

23:54 <beojan> Preliminarily it looks like that helped

23:54 <beojan> Thanks

23:55 beojan has quit [Quit: Konversation terminated!]