hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
bita has joined #ste||ar
diehlpk__ has joined #ste||ar
<diehlpk__> hkaiser, I see the same behavior as Dominic on Cori for > 64 nodes
weilewei has joined #ste||ar
<weilewei> hkaiser Can I ask to work on my master's project at home tomorrow? I will be reachable online via IRC or Telegram though
diehlpk__ has quit [Ping timeout: 240 seconds]
<hkaiser> weilewei: sure
<weilewei> hkaiser thanks
<hkaiser> :D
<hkaiser> you don't have to thank me
<weilewei> yea :)
hkaiser has quit [Quit: bye]
weilewei has quit [Remote host closed the connection]
<simbergm> I've just updated the cdash URLs that pycicle generates
<simbergm> they should point to the full build history now, but let me know if you notice any problems with it
Abhishek09 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
<mdiers_> simbergm: I have manually patched master with https://github.com/STEllAR-GROUP/hpx/pull/4306.patch. I still have the old problem. Is there anything else I have overlooked?
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
hkaiser has joined #ste||ar
<simbergm> mdiers_: as far as I remember that should be it...
<simbergm> does that mean master with that patch is exactly the same as master without the patch (i.e. all three executors behave wrongly)?
<hkaiser> simbergm: I should have some time today to go over the PRs, sorry for the delay
<simbergm> hkaiser: no worries and thanks! sorry if I seem pushy, I just want things to move along wherever they can :)
<hkaiser> sure, I fully support that and I'm glad you push things
<simbergm> really we need more people reviewing cough jbjnr (who is not online...)
diehlpk__ has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has joined #ste||ar
diehlpk__ has quit [Remote host closed the connection]
<Yorlik> Is there a way to get all worker threads and run a task on each one of them for initialization or reload?
diehlpk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
<Yorlik> I want to be able to soft-restart my server to speed up script development, that's why.
<hkaiser> Yorlik: not sure what you're after
<hkaiser> if you need one task per core, start as many tasks as you have cores - workstealing will do the rest
<Yorlik> Each worker thread has a pool of Lua Engines. I need to be able to tell each worker thread to destroy its Lua Engines
<hkaiser> nod, understand
<Yorlik> So the freshly created Lua states reflect the changes made to scripts
<hkaiser> the only way to guarantee running things on each core is to reschedule tasks until that criterion is satisfied
<Yorlik> Every worker has its own pool
<hkaiser> hello_world_X is doing that, for instance
<hkaiser> another option would be to start N tasks and use a barrier for them to wait for each other
<hkaiser> but even then there is no guarantee
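A minimal sketch of the approach described above (reschedule a task until every worker thread has been hit, which is what the hello_world examples do). init_worker() is a hypothetical stand-in for the per-worker reset, and the convenience headers may differ between HPX versions; a local barrier holding N tasks is the alternative mentioned above, with the same caveat that neither gives a hard guarantee.

    // Run init_worker() once on every HPX worker thread by rescheduling the
    // task until it lands on the intended worker (hello_world-style).
    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/runtime.hpp>

    #include <cstddef>
    #include <vector>

    // hypothetical per-worker reset, e.g. dropping this worker's Lua engines
    void init_worker(std::size_t worker) {}

    void run_on_worker(std::size_t target)
    {
        if (hpx::get_worker_thread_num() != target)
        {
            // landed on the wrong worker thread: reschedule and try again
            hpx::async(&run_on_worker, target).get();
            return;
        }
        init_worker(target);
    }

    int main()
    {
        std::vector<hpx::future<void>> tasks;
        for (std::size_t t = 0; t != hpx::get_os_thread_count(); ++t)
            tasks.push_back(hpx::async(&run_on_worker, t));
        hpx::wait_all(tasks);
        return 0;
    }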
nikunj97 has joined #ste||ar
<mdiers_> simbergm: the block_executor and static_priority_queue_attached_executor behave wrongly, the default_executor behaves correctly
<hkaiser> Yorlik: I think working with thread_locals makes it difficult in your case
<Yorlik> I could use a static global pool and have the local pools borrow engines from there
<hkaiser> using a vector that holds the data for all cores would simplify this without a performance hit, as long as you place each of the elements in the vector in its own cache line
<hkaiser> then you could use one core to clean up things, if needed
diehlpk__ has quit [Ping timeout: 245 seconds]
<Yorlik> OK. I'll come up with something. Thanks for the info!
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj97 has quit [Ping timeout: 245 seconds]
<simbergm> mdiers_: hrm :/
<simbergm> do you feel adventurous? would you mind trying 4301 instead? I wouldn't want to spend too much time on fixing things that will be deprecated
<mdiers_> yeap, confused
nikunj97 has joined #ste||ar
<Yorlik> Is there a way to join() and recreate all worker threads or would I have to reboot the entire hpx runtime?
nikunj has quit [Ping timeout: 256 seconds]
<mdiers_> simbergm: yes, I'm always adventurous, but I think it makes more sense to wait until they are in master and then take another look at it. When will that be? ;-)
<hkaiser> well, simply don't schedule any work ;-)
<hkaiser> why do you want to stop the worker threads?
<hkaiser> but yah, if you need that you need to re-init hpx
<simbergm> mdiers_: yeah, sure, might make more sense
<hkaiser> I'd advise against that
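For reference, "re-init hpx" amounts to driving the runtime explicitly via hpx::start()/hpx::stop() instead of relying on hpx_main. A rough sketch, with the caveats that this is discouraged (as said above) and that the exact hpx::start overloads, in particular the nullptr one assumed here, should be checked against the HPX version in use.

    #include <hpx/hpx_start.hpp>
    #include <hpx/hpx_finalize.hpp>
    #include <hpx/include/apply.hpp>

    int main(int argc, char* argv[])
    {
        // start the runtime without running hpx_main (assumed nullptr overload)
        hpx::start(nullptr, argc, argv);

        // ... schedule work from here, e.g. via hpx::apply or hpx::async ...

        // request shutdown from an HPX thread, then wait for the runtime to stop
        hpx::apply([]() { hpx::finalize(); });
        return hpx::stop();
    }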
<simbergm> no promises on when it'll be in but a few weeks at most I hope
nikunj97 has quit [Read error: Connection reset by peer]
<Yorlik> hkaiser: Honestly - it feels a bit like I'm not in control of my system. I want to be able to initialize worker threads and assign them their own resources (here: Lua States) and I want to be able to reload/reinitialize these resources.
<Yorlik> It's a help for anyone scripting who needs to reload the Lua states
nikunj has joined #ste||ar
<Yorlik> Restarting the server and reloading all data is just overkill for that
<Yorlik> Since Lua states are created as needed, I can just kill them all, but that must be guaranteed
<Yorlik> And doing a check every single time I call into a Lua state sucks.
<hkaiser> Yorlik: as said, don't use thread_local foo but vector<foo> instead
<hkaiser> one element per core
<hkaiser> even vector<util::cache_line_data<foo>>
nikunj has quit [Client Quit]
<hkaiser> that makes sure the data is aligned to a cache line
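A minimal sketch of the layout suggested here: one slot per worker thread instead of thread_local, each slot padded to its own cache line. hpx::util::cache_line_data<> (mentioned above) provides the same padding; a plain alignas wrapper is used to keep the sketch self-contained, lua_engine_pool is a hypothetical stand-in, and 64 is an assumed cache-line size.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/runtime.hpp>

    #include <cstddef>
    #include <vector>

    struct lua_engine_pool                    // hypothetical stand-in type
    {
        void clear() { /* destroy this worker's Lua states */ }
    };

    // pad each element to its own cache line to avoid false sharing
    // (std::vector of an over-aligned type requires C++17)
    struct alignas(64) padded_pool
    {
        lua_engine_pool pool;
    };

    // one element per worker thread, sized once at startup
    std::vector<padded_pool> pools;

    lua_engine_pool& this_workers_pool()
    {
        return pools[hpx::get_worker_thread_num()].pool;
    }

    int main()
    {
        pools.resize(hpx::get_os_thread_count());

        // ... run the simulation; each task only touches this_workers_pool() ...

        // once the simulation is stopped, any one thread can reset all pools
        for (padded_pool& p : pools)
            p.pool.clear();

        return 0;
    }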
<Yorlik> I'll probably use a static object with a function that takes the thread id as a parameter
<Yorlik> So it's automatically thread safe
<Yorlik> And just give each worker its own vector
<hkaiser> but then you can't directly kill things
<Yorlik> When the simulation is stopped it is safe to just empty them all.
<hkaiser> ok
<hkaiser> need to go, sorry
<Yorlik> OFC I cannot kill lua states while they are being called
<Yorlik> NP - bye !
hkaiser has quit [Quit: bye]
nikunj has joined #ste||ar
<mdiers_> simbergm: Well, I would have liked it a little sooner, but I'll be able to live with it... ;-)
<simbergm> mdiers_: I would also like that
<simbergm> I think it is pretty much ready though so I'll try to push it along so that you can try it out a bit sooner
<mdiers_> simbergm: many thanks for this!
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
nikunj97 has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 272 seconds]
hkaiser has joined #ste||ar
rtohid has joined #ste||ar
weilewei has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 240 seconds]
hkaiser_ has quit [Quit: bye]
hkaiser has joined #ste||ar
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
<simbergm> testing, testing
<simbergm> greetings from the matrix hpx channel
<hkaiser> greeting back
<zao> beep boop
nikunj97 has quit [Ping timeout: 256 seconds]