hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
bita has joined #ste||ar
diehlpk__ has joined #ste||ar
<diehlpk__> hkaiser, I see the same behavior as Dominic on Cori for > 64 nodes
weilewei has joined #ste||ar
<weilewei> hkaiser Can I ask to work on my master's project at home tomorrow? I will be reachable online via IRC or Telegram though
diehlpk__ has quit [Ping timeout: 240 seconds]
<hkaiser> weilewei: sure
<weilewei> hkaiser thanks
<hkaiser> :D
<hkaiser> you don't have to thank me
<weilewei> yea :)
hkaiser has quit [Quit: bye]
weilewei has quit [Remote host closed the connection]
<simbergm> I've just updated the cdash URLs that pycicle generates
<simbergm> they should point to the full build history now, but let me know if you notice any problems with it
Abhishek09 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
<mdiers_> simbergm: I have manually patched master with https://github.com/STEllAR-GROUP/hpx/pull/4306.patch. I still have the old problem. Is there anything else I have overlooked?
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
hkaiser has joined #ste||ar
<simbergm> mdiers_: as far as I remember that should be it...
<simbergm> does that mean master with that patch is exactly the same as master without the patch (i.e. all three executors behave wrongly)?
<hkaiser> simbergm: I should have some time today to go over the PRs, sorry for the delay
<simbergm> hkaiser: no worries and thanks! sorry if I seem pushy, I just want things to move along wherever they can :)
<hkaiser> sure, I fully support that and I'm glad you push things
<simbergm> really we need more people reviewing cough jbjnr (who is not online...)
diehlpk__ has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has joined #ste||ar
diehlpk__ has quit [Remote host closed the connection]
<Yorlik> Is there a way to get all worker threads and run a task on each one of them for initialization or reload?
diehlpk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
<Yorlik> I want to be able to soft-restart my server to speed up script development, that's why.
<hkaiser> Yorlik: not sure what you're after
<hkaiser> if you need one task per core, start as many tasks as you have cores - workstealing will do the rest
<Yorlik> Each worker thread has a pool of Lua Engines. I need to be able to tell each worker thread to destroy its Lua Engines
<hkaiser> nod, understand
<Yorlik> So the freshly created Lua states reflect the changes made to scripts
<hkaiser> the only way to guarantee running things on each core is to reschedule tasks until that criterion is satisfied
<Yorlik> Every worker has its own pool
<hkaiser> hello_world_X is doing that, for instance
<hkaiser> another option would be to start N tasks and use a barrier for them to wait for each other
<hkaiser> but even then there is no guarantee
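A minimal sketch of the approach described above (reschedule a task until every worker thread has been hit, which is what the hello_world examples do). init_worker() is a hypothetical stand-in for the per-worker reset, and the convenience headers may differ between HPX versions; a local barrier holding N tasks is the alternative mentioned above, with the same caveat that neither gives a hard guarantee.

    // Run init_worker() once on every HPX worker thread by rescheduling the
    // task until it lands on the intended worker (hello_world-style).
    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/runtime.hpp>

    #include <cstddef>
    #include <vector>

    // hypothetical per-worker reset, e.g. dropping this worker's Lua engines
    void init_worker(std::size_t worker) {}

    void run_on_worker(std::size_t target)
    {
        if (hpx::get_worker_thread_num() != target)
        {
            // landed on the wrong worker thread: reschedule and try again
            hpx::async(&run_on_worker, target).get();
            return;
        }
        init_worker(target);
    }

    int main()
    {
        std::vector<hpx::future<void>> tasks;
        for (std::size_t t = 0; t != hpx::get_os_thread_count(); ++t)
            tasks.push_back(hpx::async(&run_on_worker, t));
        hpx::wait_all(tasks);
        return 0;
    }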
nikunj97 has joined #ste||ar
<mdiers_> simbergm: the block_executor and static_priority_queue_attached_executor behave wrongly, the default_executor behaves correctly
<hkaiser> Yorlik: I think working with thread_locals makes it difficult in your case
<Yorlik> I could use a static global pool and have the local pools borrow engines from there
<hkaiser> using a vector that holds the data for all cores would simplify this without a performance hit, as long as you place each of the elements in the vector in its own cache line
<hkaiser> then you could use one core to clean up things, if needed
diehlpk__ has quit [Ping timeout: 245 seconds]
<Yorlik> OK. I'll come up with something. Thanks for the info!
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj97 has quit [Ping timeout: 245 seconds]
<simbergm> mdiers_: hrm :/
<simbergm> do you feel adventurous? would you mind trying 4301 instead? I wouldn't want to spend too much time on fixing things that will be deprecated
<mdiers_> yeap, confused
nikunj97 has joined #ste||ar
<Yorlik> Is there a way to join() and recreate all worker threads or would I have to reboot the entire hpx runtime?
nikunj has quit [Ping timeout: 256 seconds]
<mdiers_> simbergm: yes, I'm always adventurous, but I think it makes more sense to wait until they are in master and then take another look at it. When will that be? ;-)
<hkaiser> well, simply don't schedule any work ;-)
<hkaiser> why do you want to stop the worker threads?
<hkaiser> but yah, if you need that you need to re-init hpx
<simbergm> mdiers_: yeah, sure, might make more sense
<hkaiser> I'd advise against that
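For reference, "re-init hpx" amounts to driving the runtime explicitly via hpx::start()/hpx::stop() instead of relying on hpx_main. A rough sketch, with the caveats that this is discouraged (as said above) and that the exact hpx::start overloads, in particular the nullptr one assumed here, should be checked against the HPX version in use.

    #include <hpx/hpx_start.hpp>
    #include <hpx/hpx_finalize.hpp>
    #include <hpx/include/apply.hpp>

    int main(int argc, char* argv[])
    {
        // start the runtime without running hpx_main (assumed nullptr overload)
        hpx::start(nullptr, argc, argv);

        // ... schedule work from here, e.g. via hpx::apply or hpx::async ...

        // request shutdown from an HPX thread, then wait for the runtime to stop
        hpx::apply([]() { hpx::finalize(); });
        return hpx::stop();
    }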
<simbergm> no promises on when it'll be in but a few weeks at most I hope
nikunj97 has quit [Read error: Connection reset by peer]
<Yorlik> hkaiser: Honestly - it feels a bit like I'm not in control of my system. I want to be able to initialize worker threads and assign them their own resources (here: Lua States) and I want to be able to reload/reinitialize these resources.
<Yorlik> It's a help for anyone scripting who needs to reload the Lua states
nikunj has joined #ste||ar
<Yorlik> Restarting the server and reloading all data is just overkill for that
<Yorlik> Since Lua states are created as needed, I can just kill them all, but that must be guaranteed
<Yorlik> And doing a check every single time I call into a Lua state sucks.
<hkaiser> Yorlik: as said, don't use thread_local foo but vector<foo> instead
<hkaiser> one element per core
<hkaiser> even vector<util::cache_line_data<foo>>
nikunj has quit [Client Quit]
<hkaiser> that makes sure the data is aligned to a cache line
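A minimal sketch of the layout suggested here: one slot per worker thread instead of thread_local, each slot padded to its own cache line. hpx::util::cache_line_data<> (mentioned above) provides the same padding; a plain alignas wrapper is used to keep the sketch self-contained, lua_engine_pool is a hypothetical stand-in, and 64 is an assumed cache-line size.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/runtime.hpp>

    #include <cstddef>
    #include <vector>

    struct lua_engine_pool                    // hypothetical stand-in type
    {
        void clear() { /* destroy this worker's Lua states */ }
    };

    // pad each element to its own cache line to avoid false sharing
    // (std::vector of an over-aligned type requires C++17)
    struct alignas(64) padded_pool
    {
        lua_engine_pool pool;
    };

    // one element per worker thread, sized once at startup
    std::vector<padded_pool> pools;

    lua_engine_pool& this_workers_pool()
    {
        return pools[hpx::get_worker_thread_num()].pool;
    }

    int main()
    {
        pools.resize(hpx::get_os_thread_count());

        // ... run the simulation; each task only touches this_workers_pool() ...

        // once the simulation is stopped, any one thread can reset all pools
        for (padded_pool& p : pools)
            p.pool.clear();

        return 0;
    }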
<Yorlik> I'll probably use a static object with a function that takes the thread id as a parameter
<Yorlik> So it's automatically thread safe
<Yorlik> And just give each worker its own vector
<hkaiser> but then you can't directly kill things
<Yorlik> When the simulation is stopped it is safe to just empty them all.
<hkaiser> ok
<hkaiser> need to go, sorry
<Yorlik> OFC I cannot kill lua states while they are being called
<Yorlik> NP - bye !
hkaiser has quit [Quit: bye]
nikunj has joined #ste||ar
<mdiers_> simbergm: Well, I would have liked it a little sooner, but I'll be able to live with it... ;-)
<simbergm> mdiers_: I would also like that
<simbergm> I think it is pretty much ready though so I'll try to push it along so that you can try it out a bit sooner
<mdiers_> simbergm: many thanks for this!
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
nikunj97 has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 272 seconds]
hkaiser has joined #ste||ar
rtohid has joined #ste||ar
weilewei has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 240 seconds]
hkaiser_ has quit [Quit: bye]
hkaiser has joined #ste||ar
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
<simbergm> testing, testing
<simbergm> greetings from the matrix hpx channel
<hkaiser> greeting back
<zao> beep boop
nikunj97 has quit [Ping timeout: 256 seconds]