hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
nikunj has quit [Ping timeout: 255 seconds]
nikunj has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has quit [Ping timeout: 256 seconds]
gonidelis has joined #ste||ar
gonidelis has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
nikunj97 has joined #ste||ar
mdiers_ has quit [Quit: mdiers_]
mdiers_ has joined #ste||ar
mdiers_ has quit [Client Quit]
mdiers_ has joined #ste||ar
<mdiers_> simbergm: yt?
<simbergm> mdiers_: online, but can I ping you a bit later?
<mdiers_> yes of course
<simbergm> mdiers_: still there?
<mdiers_> here
<mdiers_> I have a problem with the schedule hint from the executors. I saw that you are already working on it: #4306 #4301
<mdiers_> A little test tool for my problem is here: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3
<mdiers_> Did I take the right executors there? Since the attached_executors are deprecated, which one should I use now?
<simbergm> mdiers_: so... it seems like you only want to schedule threads on a particular worker thread (not a range), is that correct?
<simbergm> in that case using scheduling hints is most likely easiest
<simbergm> but as you've already seen things are a bit in flux
<simbergm> so currently on master you can use either default_executor which has a constructor that takes a scheduling hint
<simbergm> or do what you've done now with the attached_executor
<simbergm> they're more or less equivalent in behaviour except that the attached executor allows you to specify a range of threads to schedule threads on
<simbergm> after things have been a bit cleaned up the default executor will just be an alias to the parallel executor (which currently doesn't take a scheduling hint, but will)
<simbergm> so usage of the default executor should remain more or less unchanged (it's just the implementation that will change)
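A minimal sketch of the usage simbergm describes, assuming default_executor (on master at the time) takes an hpx::threads::thread_schedule_hint in its constructor; exact names, headers and signatures may differ between HPX versions:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/parallel_executors.hpp>

    int main()
    {
        // Hint the scheduler to place new tasks on worker thread 2.
        hpx::threads::thread_schedule_hint hint(2);
        hpx::parallel::execution::default_executor exec(hint);

        auto f = hpx::async(exec, [] {
            // Expected (but not guaranteed) to run on worker thread 2;
            // tasks may still be stolen by other worker threads.
            return hpx::get_worker_thread_num();
        });
        f.get();
        return 0;
    }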
<mdiers_> I would like to work NUMA-safe only: one work job per NUMA node.
<simbergm> the attached executor will also get a replacement but it'll be renamed (because the static_priority/local/etc. part of attached executor is a lie)
<simbergm> one work job per numa? does that mean only one task per numa node?
<mdiers_> one work job per numa, which can be processed in parallel on the numa node
<simbergm> I see
<mdiers_> see numa_hint_server:work_load
<simbergm> for that case there is also a numa scheduling hint, but only the shared_priority_queue_scheduler actually takes that into account
<simbergm> so I'd recommend you use block_executor
<simbergm> it uses the attached executors as an implementation detail
<simbergm> there's an example somewhere, let me dig it out
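A sketch of what such an example typically looks like (based on the HPX compute/NUMA examples, not necessarily the one simbergm means): build a block_executor from one target per NUMA domain and run a parallel algorithm on it.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/compute.hpp>
    #include <hpx/include/parallel_for_each.hpp>

    #include <vector>

    int main()
    {
        // One target per NUMA domain, each covering that domain's worker threads.
        std::vector<hpx::compute::host::target> numa_domains =
            hpx::compute::host::numa_domains();

        // block_executor schedules the work block-wise across the given targets.
        hpx::compute::host::block_executor<> exec(numa_domains);

        std::vector<int> data(1000000, 0);
        hpx::parallel::for_each(hpx::parallel::execution::par.on(exec),
            data.begin(), data.end(), [](int& x) { ++x; });
        return 0;
    }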
<mdiers_> thanks, I will then adapt the example and test it. Currently I get a random distribution which is not bound to the NUMA nodes.
<simbergm> yeah, note that currently even if you specify that tasks should be scheduled on a specific thread or numa node, they may still get stolen to another worker thread
<mdiers_> yeah, but all?
<simbergm> if you really must have tasks stay on the worker thread the shared priority queue scheduler supports some experimental options to make sure they stay, but that's very much wip
<simbergm> hard to say, they should not all be randomly distributed, but we may of course have a bug...
<simbergm> if you have a small example that demonstrates it that would be helpful
<simbergm> note that the constructor takes the index of the first worker thread and then the number of worker threads
<simbergm> I have trouble telling if that's what you're passing to the executor
<mdiers_> yes, I know. the distribution is determined by numa_hint_server:num_pus
<mdiers_> is there a function to create hpx::threads::mask_type from the index of the first worker thread and then the number of worker threads?
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<simbergm> not sure if it's public but the block executor does roughly that internally
<simbergm> it gets a mask from a target
<simbergm> mdiers_: ^
Abhishek09 has joined #ste||ar
<mdiers_> Yeah, I just meant the other direction. ;-)
nikunj has quit [Ping timeout: 240 seconds]
Abhishek09 has quit [Remote host closed the connection]
nikunj has joined #ste||ar
<simbergm> target from a mask? not sure...
<mdiers_> no, mask from an index and count
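There does not seem to be an obvious public helper for that direction, but a mask can be assembled by hand from the cpu_mask primitives; a rough sketch (make_mask is a hypothetical helper, and the header path may differ between HPX versions):

    #include <hpx/topology/cpu_mask.hpp>   // header location varies across versions

    #include <cstddef>

    // Hypothetical helper: build a mask covering worker threads
    // [first_thread, first_thread + num_threads).
    hpx::threads::mask_type make_mask(std::size_t first_thread, std::size_t num_threads)
    {
        hpx::threads::mask_type mask{};
        hpx::threads::resize(mask, hpx::threads::hardware_concurrency());
        for (std::size_t i = first_thread; i != first_thread + num_threads; ++i)
            hpx::threads::set(mask, i);
        return mask;
    }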
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<mdiers_> simbergm: same behaviour with static_priority_queue_attached_executor, default_executor and block_executor
Abhishek09 has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
Abhishek09 has quit [Remote host closed the connection]
<simbergm> mdiers_: could you show how you used default_executor and block_executor, and how you determine that tasks are distributed randomly?
<mdiers_> simbergm: one moment please
<mdiers_> simbergm: updated the gist. the check is in update_hint
<simbergm> what is your system topology (multiple numa nodes, etc.)? what do you expect num_hint to be and what is it in your case? do you run this once on every numa node at the same time?
<simbergm> you can try if `--hpx:numa-sensitive=2` does anything
<simbergm> mdiers_: ^
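For reference, that is just a command line option of the HPX application; a hypothetical invocation of the gist's test program (the binary name is assumed):

    ./testnumacompute --hpx:threads=all --hpx:numa-sensitive=2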
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
<mdiers_> simbergm: same behaviour with `--hpx:numa-sensitive=2`
<simbergm> mdiers_: can you remove the sleep_for?
<mdiers_> simbergm: there is a scheduling-loop to utilize all numa nodes: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#file-testnumacompute-cpp-L212-L220
<simbergm> ah, you get the worker num before sleeping...
<simbergm> mdiers_: thanks, I have something to look into now
<simbergm> I suppose your gist compiles standalone?
<mdiers_> same behaviour without sleep_for, plus deadlocks in the hpx scheduler
<simbergm> "deadlocks in the hpx scheduler"? can you be more specific? ;)
<mdiers_> runs without any result and there is no end. I had it in the debugger before, and all threads got stuck somewhere in the scheduler.
<mdiers_> #4410 could also be based on my problem, I have not tested with the 1.3
hkaiser has joined #ste||ar
<hkaiser> hey simbergm, good morning
<hkaiser> simbergm: would you be able to slap the piz-daint builders to do something about #4294? they are sitting there for several days already...
<simbergm> hkaiser: hey
<simbergm> yeah, I'm working on it
<hkaiser> ahh, great!
<simbergm> (updating the gcc-oldest builder as well)
<hkaiser> perfect!
<simbergm> I think there's been some maintenance so the builders were down
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 258 seconds]
K-ballo1 is now known as K-ballo
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 240 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk_work has joined #ste||ar
<diehlpk_work> simbergm, Should we write a blog post about GSoC?
<diehlpk_work> If so, do you have time to write one?
<diehlpk_work> My feeling is that not many students have shown up so far.
<diehlpk_work> Students apply: March 16 - 31, 2020
diehlpk__ has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
gonidelis has joined #ste||ar
mdiers_ has quit [Quit: mdiers_]
mdiers_ has joined #ste||ar
hkaiser has joined #ste||ar
<Yorlik> hkaiser: yt?
<Yorlik> So - it seems I finally managed to get some race conditions:
<Yorlik> I have Lua States which are thread_local static on the HPX worker threads and which
<Yorlik> are used in parallel loops to update entities.
<Yorlik> In the update function of the entities, on the Lua side, Lua can call back into exposed C++
<Yorlik> functions, some of these using hpx::async to retrieve a value, sometimes with
<Yorlik> future.get() built-in to the exposed C++ function, to return the resolved value from the future
<Yorlik> (For development only).
<Yorlik> All these exposed functions which call hpx::async crash the Lua engine sooner or later.
<Yorlik> If I create many many entities such that the chunks on the parallel loop which runs
<Yorlik> the updates are really big, it takes longer until Lua crashes with all sorts of errors,
<Yorlik> usually something being a nullptr deep inside the Lua call stack or an unauthorized access to something.
<Yorlik> So the call sequence is:
<Yorlik> Parallel Loop -> Update() inside Lua -> Some Exposed Function In C++ -> hpx::async... crash.
<Yorlik> Other exposed C++ functions work as intended and the engine doesn't crash.
<Yorlik> It does not matter whether I .get() the future returned by async inside
<Yorlik> the C++ function or not - it crashes either way.
<Yorlik> Since all parameters to the actions are copied and the reads are done on structures
<Yorlik> outside of Lua I don't understand what's going wrong here.
<Yorlik> It's really hard to debug, since the crash happens deep inside the Lua engine, at different places.
<Yorlik> Ideas where and how to look?
hkaiser_ has joined #ste||ar
<hkaiser_> Yorlik: are you sure that the HPX threads are not moved to another core?
<Yorlik> You mean my thread_local stuff could become invalid ?
<hkaiser_> right
<Yorlik> I thought HPX workers are pinned?
<Yorlik> Could they be not-pinned?
<hkaiser_> the worker threads are pinned, the HPX threads (tasks) can move around
hkaiser has quit [Ping timeout: 258 seconds]
<Yorlik> I looked for a way to yield, like an hpx sleep - nothing
<Yorlik> Does an async allow a yield?
<Yorlik> I thought it just gives you a future and done?
<hkaiser_> sure hpx::this_thread::yield() (equivalent to std::this_thread::yield())
<hkaiser_> but this is unrelated, isn't it?
<Yorlik> I have no idea where it might yield.
<Yorlik> I was thinking of that actually
<hkaiser_> HPX threads (tasks) can move to any other core at any suspension point (future::get() or similar)
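A small illustration of the hazard hkaiser_ describes: anything thread_local observed before a suspension point may belong to a different worker thread afterwards (a sketch, not code from the discussion):

    #include <hpx/hpx.hpp>
    #include <hpx/hpx_main.hpp>

    #include <cstddef>
    #include <iostream>

    // One instance per kernel (worker) thread - this is what the Lua states were.
    thread_local int per_worker_state = 0;

    int main()
    {
        auto f = hpx::async([] {
            ++per_worker_state;
            std::size_t before = hpx::get_worker_thread_num();

            hpx::this_thread::yield();    // any suspension point: .get(), locks, ...

            std::size_t after = hpx::get_worker_thread_num();
            if (before != after)
            {
                // The task was stolen: per_worker_state now names a different
                // object than it did before the suspension point.
                std::cout << "moved from worker " << before
                          << " to worker " << after << "\n";
            }
        });
        f.get();
        return 0;
    }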
<Yorlik> Ohh !
<Yorlik> Actually that might be the problem
<hkaiser_> it's task stealing after all
<Yorlik> Yes - makes sense
<Yorlik> I totally had not thought of get()
<Yorlik> But I remember crashes without get()
<Yorlik> Lemme run a test really quick
bita_ has joined #ste||ar
diehlpk_work has quit [Ping timeout: 240 seconds]
bita has quit [Ping timeout: 256 seconds]
diehlpk_work has joined #ste||ar
<Yorlik> I have a function which does not get() the future and still crashes. Others using or not using get() don't crash.
<Yorlik> hkaiser ^^
<Yorlik> E.G. this one likes to cause crashes:
<Yorlik> void send_object_message( hpx::id_type id, M msg ) {
<Yorlik> hpx::async<agns::gameobject::gameobject::send_message_action<M>>( id, msg );
<Yorlik> }
<hkaiser_> Yorlik: it might suspend inside HPX whenever it needs to acquire a lock
<Yorlik> It's the templated one - but it runs for a while before Lua goes down.
<hkaiser_> I wouldn't rely on just .get() suspending
<Yorlik> So - the yield may happen, even before the future is returned?
<Yorlik> This causes a major problem - I need this one task pinned
<Yorlik> Because of the Lua Engine being thread local - otherwise it gets really really messy and ugly.
<Yorlik> BTW - does thread_local also mean "core local"?
<Yorlik> I might have to rethink the storage type of my lua engines then.
<hkaiser_> thread_local means local to the current kernel thread
<hkaiser_> each kernel thread has its own instance
<Yorlik> So that should work for a worker.
<Yorlik> But you are saying the worker might change inside the task, right?
<hkaiser_> if it's not running on a HPX thread, sure
<Yorlik> I am using the hpx parallel loop
<hkaiser_> a task can start on one kernel thread (core), suspend, and resume on another core
<Yorlik> Which means the Lua engine is gone
<hkaiser_> sure, parallel loops create tasks and suspend things
<hkaiser_> if needed
<Yorlik> If that happens inside a running Lua script I have a problem
<hkaiser_> indeed
<Yorlik> So if the task yields and switches workers while I am calling a C++ function ...
<Yorlik> That's pretty horrible.
<Yorlik> I need to think.
<hkaiser_> Yorlik: we've had the same problem when implementing HPX Lua bindings
<hkaiser_> I don't really remember how we solved things, need to ask around
<hkaiser_> Yorlik: but all the code is here: https://github.com/STEllAR-GROUP/hpx_script feel free to find out
<Yorlik> I'm sure I could code my way around somehow, but I'm afraid efficiency might suffer a lot.
<Yorlik> I'll look into it
<hkaiser_> Yorlik: there is not much we can do at this point
<Yorlik> I think just getting a future should not yield.
<Yorlik> It's pretty harsh if I can almost never rely on anything thread_local
<Yorlik> Even a simple assignment of a future could explode
<simbergm> diehlpk_work: yes and preferably not but I can put together something simple
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
<Yorlik> Like with a = someAsync() there could be no a left to assign the result to.
MatrixBridge has joined #ste||ar
MatrixBridge has left #ste||ar ["User left"]
<Yorlik> The entire LuaStack gone ...
<diehlpk_work> simbergm, Ok, sounds good
<diehlpk_work> We are currently super busy with the SC paper
<simbergm> hkaiser_: would you have time to comment on https://github.com/STEllAR-GROUP/hpx/pull/4270#discussion_r380167118?
<simbergm> diehlpk_work: yep, ok, I'll do it this week
<Yorlik> hkaiser: I think I'll just use a luastate pool and not make the states thread local. I'll just assign a state from the pool to a task until it's finished.
gonidelis has quit [Quit: Ping timeout (120 seconds)]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 255 seconds]
K-ballo1 is now known as K-ballo
<hkaiser_> simbergm: will look
<simbergm> hkaiser_: also thanks!
<hkaiser_> simbergm: looks good, thanks!
<Yorlik> Just for fun: Some Lua vs. HPX code complexity basics: https://imgur.com/a/eF49yhA
<zao> hkaiser_: did you remember to say something to the SoC person from last night?
<hkaiser_> zao: I was waiting for him to show up again
<zao> Ack.
gonidelis has joined #ste||ar
hkaiser_ has quit [Ping timeout: 256 seconds]
nan has joined #ste||ar
nan is now known as Guest90784
gonidelis has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
AndroUser has joined #ste||ar
AndroUser has quit [Client Quit]
<Yorlik> hkaiser: I replaced the thread local Lua states with states pulled from a thread-safe thread_local pool (to reduce contention) and can now call all these functions which used to crash the Lua Engine. I still need to test it more and check performance but at first look the issue seems to be solved.
<hkaiser> Yorlik: nice!
<hkaiser> Yorlik: you need to make sure that the hpx threads (tasks) always return to the same lua state they were launched from
<hkaiser> they should 'carry' the state around
<Yorlik> The state is kept around until the function returns
<hkaiser> that's not what I meant
<Yorlik> It's in a unique_ptr with a custom deleter which gives it back to the pool after it goes out of scope
<hkaiser> the task needs to always return to its originating state
<Yorlik> So the state is guaranteed to be the same - it's no longer thread_local.
<hkaiser> ok, it's passed to the task, good
<Yorlik> I just use get_luaengine inside my updater inside the parloop
<Yorlik> Then the real updater in Lua gets called.
<Yorlik> Even if the task gets moved to another core - the Lua state is given back after the Lua function returns.
<Yorlik> That was different with the thread_local engines I used.
<Yorlik> They just poofed away on a task migration
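A rough sketch of the pool scheme Yorlik describes (hypothetical names, error handling omitted, not the actual code from the discussion): states sit in a mutex-protected pool and are handed out as unique_ptrs whose custom deleter returns them, so a task keeps the same lua_State across any suspension points until it finishes.

    #include <lua.hpp>

    #include <functional>
    #include <memory>
    #include <mutex>
    #include <vector>

    class lua_state_pool
    {
    public:
        // unique_ptr whose custom deleter hands the state back to the pool
        using handle = std::unique_ptr<lua_State, std::function<void(lua_State*)>>;

        handle acquire()
        {
            lua_State* L = nullptr;
            {
                std::lock_guard<std::mutex> lk(mtx_);
                if (!states_.empty())
                {
                    L = states_.back();
                    states_.pop_back();
                }
            }
            if (L == nullptr)
            {
                L = luaL_newstate();    // grow the pool on demand
                luaL_openlibs(L);
            }
            return handle(L, [this](lua_State* s) { release(s); });
        }

        ~lua_state_pool()
        {
            for (lua_State* s : states_)
                lua_close(s);
        }

    private:
        void release(lua_State* s)
        {
            std::lock_guard<std::mutex> lk(mtx_);
            states_.push_back(s);
        }

        std::mutex mtx_;
        std::vector<lua_State*> states_;
    };

A task in the parallel loop would acquire a handle once per entity update and hold it until the Lua call returns; Yorlik's variant additionally keeps one such pool per worker thread (thread_local) to reduce lock contention.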
hkaiser has quit [Quit: bye]
<diehlpk_work> simbergm, Looks good
Guest90784 has quit [Remote host closed the connection]
nanmiao has joined #ste||ar
hkaiser has joined #ste||ar
diehlpk__ has quit [Ping timeout: 268 seconds]
hkaiser has quit [Ping timeout: 256 seconds]
nanmiao has quit [Remote host closed the connection]
hkaiser has joined #ste||ar