#ste||ar on 2020-03-09 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

00:24 nikunj has quit [Ping timeout: 255 seconds]

00:25 nikunj has joined #ste||ar

01:22 diehlpk has joined #ste||ar

01:22 diehlpk has quit [Changing host]

01:28 diehlpk has quit [Ping timeout: 256 seconds]

01:33 gonidelis has joined #ste||ar

01:38 gonidelis has quit [Ping timeout: 240 seconds]

02:11 hkaiser has quit [Quit: bye]

06:06 nikunj97 has joined #ste||ar

06:55 mdiers_ has quit [Quit: mdiers_]

06:55 mdiers_ has joined #ste||ar

06:56 mdiers_ has quit [Client Quit]

06:57 mdiers_ has joined #ste||ar

07:18 <mdiers_> simbergm: yt?

07:32 <simbergm> mdiers_: online, but can I ping you a bit later?

07:35 <mdiers_> yes of course

08:11 <simbergm> mdiers_: still there?

08:22 <mdiers_> here

08:22 <mdiers_> I have a problem with the schedule hint from the executors. I saw that you are already working on it: #4306 #4301

08:23 <mdiers_> A little test tool for my problem is here: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3

08:27 <mdiers_> Did i take the right executors there? Since the attached_executors are deprecated, which one should I use now?

08:39 <simbergm> mdiers_: so... it seems like you only want to schedule threads on a particular worker thread (not a range), is that correct?

08:39 <simbergm> in that case using scheduling hints is most likely easiest

08:39 <simbergm> but as you've already seen things are a bit in flux

08:41 <simbergm> so currently on master you can use either default_executor which has a constructor that takes a scheduling hint

08:41 <simbergm> or do what you've done now with the attached_executor

08:41 <simbergm> they're more or less equivalent in behaviour except that the attached executor allows you to specify a range of threads to schedule threads on

08:42 <simbergm> after things have been a bit cleaned up the default executor will just be an alias to the parallel executor (which currently doesn't take a scheduling hint, but will)

08:43 <simbergm> so usage of the default executor should remain more or less unchanged (it's just the implementation that will change)

08:43 <mdiers_> i would like to work only numa safe, one work job per numa.

08:44 <simbergm> the attached executor will also get a replacement but it'll be renamed (because the static_priority/local/etc. part of attached executor is a lie)

08:44 <simbergm> one work job per numa? does that mean only one task per numa node?

08:46 <mdiers_> one work job per numa, which can be processed in parallel on the numa node

08:48 <simbergm> I see

08:48 <mdiers_> see numa_hint_server:work_load

08:49 <simbergm> for that case there is also a numa scheduling hint, but only the shared_priority_queue_scheduler actually takes that into account

08:49 <simbergm> so I'd recommend you use block_executor

08:49 <simbergm> it uses the attached executors as an implementation detail

08:49 <simbergm> there's an example somewhere, let me dig it out

08:51 <mdiers_> thanks, i will then adapt the example and test it. currently i get a random distribution which is not bound to the numa nodes.

08:51 <simbergm> mdiers_: see e.g. here https://github.com/STEllAR-GROUP/tutorials/blob/master/examples/05_saxpy/parallel_numa.cpp

08:52 <simbergm> yeah, note that currently even if you specify that tasks should be scheduled on a specific thread or numa node, they may still get stolen to another worker thread

08:53 <mdiers_> yeah, but all?

08:53 <simbergm> if you really must have tasks stay on the worker thread the shared priority queue scheduler supports some experimental options to make sure they stay, but that's very much wip

08:54 <simbergm> hard to say, they should not all be randomly distributed, but we may of course have a bug...

08:54 <simbergm> if you have a small example that demonstrates it that would be helpful

08:56 <simbergm> https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#file-testnumacompute-cpp-L57-L58

08:56 <simbergm> https://github.com/STEllAR-GROUP/hpx/blob/0d54cea69d5055ed89e22eabc6388f28e18f7846/hpx/runtime/threads/executors/thread_pool_attached_executors.hpp#L105-L109

08:57 <simbergm> note that the constructor takes the index of the first worker thread and then the number of worker threads

09:00 <simbergm> I have trouble telling if that's what you're passing to the executor

09:00 <mdiers_> yes, I know. the distribution is determined by numa_hint_server:num_pus

09:11 <mdiers_> is there a function to create hpx::threads::mask_type from the index of the first worker thread and then the number of worker threads?

09:11 nikunj has quit [Read error: Connection reset by peer]

09:12 nikunj has joined #ste||ar

09:12 <simbergm> not sure if it's public but the block executor does roughly that internally

09:12 <simbergm> it gets a mask from a target

09:12 <simbergm> mdiers_: ^

09:13 Abhishek09 has joined #ste||ar

09:15 <mdiers_> Yeah, I just meant the other direction. ;-)

09:16 nikunj has quit [Ping timeout: 240 seconds]

09:17 Abhishek09 has quit [Remote host closed the connection]

09:17 nikunj has joined #ste||ar

09:20 <simbergm> target from a mask? not sure...

09:22 <mdiers_> no, mask from a index and count

09:26 nikunj has quit [Read error: Connection reset by peer]

09:26 nikunj has joined #ste||ar

09:28 nikunj has quit [Read error: Connection reset by peer]

09:29 nikunj has joined #ste||ar

09:34 nikunj has quit [Read error: Connection reset by peer]

09:35 nikunj has joined #ste||ar

09:41 <mdiers_> simbergm: same behaviour with static_priority_queue_attached_executor, default_executor executor und block_executor

09:43 Abhishek09 has joined #ste||ar

09:49 nikunj has quit [Read error: Connection reset by peer]

09:53 Abhishek09 has quit [Remote host closed the connection]

10:17 <simbergm> mdiers_: could you show how you used default_executor and block_executor, and how you determine that tasks are distributed randomly?

10:30 <mdiers_> simbergm: one moment please

10:42 <mdiers_> simbergm: updated the gist. the check is in update_hint

10:51 <mdiers_> simbergm: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#file-testnumacompute-cpp-L144-L145

11:06 <simbergm> what is your system topology (multiple numa nodes, etc.)? what do you expect num_hint to be and what is it in your case? do you run this once on every numa node at the same time?

11:07 <simbergm> you can try if `--hpx:numa-sensitive=2` does anything

11:07 <simbergm> mdiers_: ^

11:13 MatrixBridge has joined #ste||ar

11:13 MatrixBridge has left #ste||ar ["User left"]

11:13 <mdiers_> simbergm: the system topology: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#file-epyc-png

11:14 MatrixBridge has joined #ste||ar

11:14 MatrixBridge has left #ste||ar ["User left"]

11:14 MatrixBridge has joined #ste||ar

11:14 MatrixBridge has left #ste||ar ["User left"]

11:21 <mdiers_> simbergm: the hpx-config: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#gistcomment-3205324

11:25 <mdiers_> simbergm: some behaviour with `--hpx:numa-sensitive=2`

11:27 <simbergm> mdiers_: can you remove the sleep_for?

11:27 <mdiers_> simbergm: there is a scheduling-loop to utilize all numa nodes: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#file-testnumacompute-cpp-L212-L220

11:27 <simbergm> ah, you get the worker num before sleeping...

11:29 <simbergm> mdiers_: thanks, I have something to look into now

11:29 <simbergm> I suppose your gist compiles standalone?

11:29 <mdiers_> some behaviour without sleep_for, because deadlocks in the hpx scheduler

11:29 <simbergm> "because deadlocks in the hpx scheduler"? can you be more specific? ;)

11:33 <mdiers_> runs without any result and there is no end. i had it in the debugger before, and all threads got stuck somewhere in the scheduler.

11:36 <mdiers_> #4410 could also be based on my problem, I have not tested with the 1.3

11:47 hkaiser has joined #ste||ar

11:54 <hkaiser> hey simbergm, godd morning

11:55 <hkaiser> simbergm: would you be able to slap the piz-daint builders to do something about #4294? they are sitting there for several days already...

11:55 <hkaiser> *good morning*

11:55 <simbergm> hkaiser: hey

11:55 <simbergm> yeah, I'm working on it

11:56 <hkaiser> ahh, great!

11:56 <simbergm> (updating the gcc-oldest builder as well)

11:56 <hkaiser> perfect!

11:56 <simbergm> I think there's been some maintenance so the builders were down

12:45 <mdiers_> simbergm: a backtrace of the run with no end: https://gist.github.com/m-diers/853c980bbd45e027a12e906e015470e3#gistcomment-3205387

13:01 K-ballo1 has joined #ste||ar

13:01 K-ballo has quit [Ping timeout: 258 seconds]

13:01 K-ballo1 is now known as K-ballo

13:46 hkaiser has quit [Quit: bye]

13:46 hkaiser has joined #ste||ar

13:57 hkaiser has quit [Ping timeout: 240 seconds]

14:13 diehlpk has joined #ste||ar

14:13 diehlpk has quit [Changing host]

14:13 diehlpk has joined #ste||ar

14:36 diehlpk_work has joined #ste||ar

14:36 <diehlpk_work> simbergm, Shoud we write a blog post about GSoC?

14:36 <diehlpk_work> If so, dou you have time to write one?

14:43 <diehlpk_work> From my feeling not so many students showed up so far.

14:43 <diehlpk_work> Students apply: March 16 - 31, 2020

14:44 diehlpk__ has joined #ste||ar

14:47 diehlpk has quit [Ping timeout: 240 seconds]

14:51 gonidelis has joined #ste||ar

15:04 mdiers_ has quit [Quit: mdiers_]

15:05 mdiers_ has joined #ste||ar

15:12 hkaiser has joined #ste||ar

15:23 <Yorlik> hkaiser: yt?

15:28 <Yorlik> So - it seems I finally manage to get some race conditions:

15:28 <Yorlik> are used in parallel loops to update entities.

15:28 <Yorlik> functions, some of these using hpx::async to retrieve a value, sometimes with

15:28 <Yorlik> I have Lua States which are thread_local static on the HPX worker threads and which

15:28 <Yorlik> In the update function of the entities, on the lua side, Lua can call back into exposed C++

15:28 <Yorlik> future.get() built-in to the exposed C++ function, to return the resolved value from the future

15:28 <Yorlik> (For development only).

15:28 <Yorlik> All these exposed functions which call hpx::async crash the Lua engine sooner or later.

15:28 <Yorlik> If I create many many entities such, that the chunks on the parallel loop which runs

15:28 <Yorlik> the updates are really big it takes a longer time until lua crashes with all sorts of errors,

15:28 <Yorlik> usually something being a nullptr deep inside the Lua Call Stack or an unuathorized access to something.

15:28 <Yorlik> So the call sequence is:

15:28 <Yorlik> Parallel Loop -> Update() inside Lua -> Some Exposed Function In C++ -> hpx::async... crash.

15:28 <Yorlik> Other exposed C++ functions work as intended and the engine doesn't crash.

15:28 <Yorlik> It does not play any role if I .get() the future returned by async inside

15:28 <Yorlik> the c++ function or not it crashes either way..

15:28 <Yorlik> Since all parameters to the actions are copied and the reads are done on structures

15:28 <Yorlik> outside of Lua I don't understand what's going wrong here.

15:28 <Yorlik> It's really hard to debug, since the crash happens deep inside the Lua engine, at different places.

15:28 <Yorlik> Ideas where and how to look?

15:33 hkaiser_ has joined #ste||ar

15:33 <hkaiser_> Yorlik: are you sure that the HPX threads are not moved to another core?

15:34 <Yorlik> You mean my thread_local stuff could become invalid ?

15:34 <hkaiser_> right

15:34 <Yorlik> I thought HPX workers are pinned?

15:34 <Yorlik> Could they be not-pinned?

15:35 <hkaiser_> the worker threads are pinned, the HPX threads (tasks) can move around

15:35 hkaiser has quit [Ping timeout: 258 seconds]

15:35 <Yorlik> I looked if there was a way to yield, like a hpx sleep - nothing

15:35 <Yorlik> Does an async allow a yield?

15:36 <Yorlik> I though it just gives you a future and done?

15:36 <hkaiser_> sure hpx::this_thread::yield() (equivalent to std::this_thread::yield())

15:36 <hkaiser_> but this is unrelated, isn't it?

15:36 <Yorlik> I have no idea where it might yield+

15:36 <Yorlik> I was thinking of that actually

15:36 <hkaiser_> HPX threads (tasks) can move to any other core at any suspension point (future::get() or similar)

15:36 <Yorlik> Ohh !

15:37 <Yorlik> Actually that might be the problem

15:37 <hkaiser_> it's task stealing after all

15:37 <Yorlik> Yes - makes sense

15:37 <Yorlik> I totally had not thought of get()

15:37 <Yorlik> But I remember crashes without get()

15:37 <Yorlik> Lemme run a test really quick

15:38 bita_ has joined #ste||ar

15:40 diehlpk_work has quit [Ping timeout: 240 seconds]

15:41 bita has quit [Ping timeout: 256 seconds]

15:42 diehlpk_work has joined #ste||ar

15:42 <Yorlik> I have a function which does not get() the future and still crashes. Others using or not using get() don't crash.

15:43 <Yorlik> hkaiser ^^

15:44 <Yorlik> E.G. this one likes to cause crashes:

15:44 <Yorlik> void send_object_message( hpx::id_type id, M msg ) {

15:44 <Yorlik> hpx::async<agns::gameobject::gameobject::send_message_action<M>>( id, msg );

15:44 <Yorlik> }

15:44 <hkaiser_> Yorlik: it might suspend inside HPX whenever it needs to acquire a lock

15:44 <Yorlik> It's the tem,plated one - but it runs for a while before Lua goes down.

15:45 <hkaiser_> I wouldn't rely on just .get() suspending

15:45 <Yorlik> So - the yield may happen, even before the future is returned?

15:46 <Yorlik> This causes a major problem - I need this one task pinned

15:46 <Yorlik> Because of the Lua Engine being thread local - otherwise it gets really really messy and ugly.

15:46 <Yorlik> BTW - does thread_local also mean 2core local" ??

15:47 <Yorlik> I might have to rethink the storage type of my lua engines then.

15:48 <hkaiser_> thread_local means local to the current kernel thread

15:48 <hkaiser_> each kernel thread has its own instance

15:48 <Yorlik> So that shouzld work for a worker.

15:48 <Yorlik> But you are saying the worker might change inside the task, right?

15:48 <hkaiser_> if it's not running on a HPX thread, sure

15:49 <Yorlik> I am using the hpx parrallel loop

15:49 <hkaiser_> a task can start on one kernel thread (core), suspend, and resume on another core

15:49 <Yorlik> Which means the lua enbgine is gone

15:49 <hkaiser_> sure, parallel loops create tasks and suspend things

15:49 <hkaiser_> if needed

15:49 <Yorlik> If that happen inside a running Lua script I have a proiblem

15:49 <hkaiser_> indeed

15:50 <Yorlik> So if the task yields and switches workers while I am callingf a C++ function ...

15:50 <Yorlik> That's pretty horrible.

15:51 <Yorlik> I need to think.

15:51 <hkaiser_> Yorlik: we've had the same problem when implementing HPX Lua bindings

15:51 <hkaiser_> I don't really remember how we solved things, need to ask around

15:52 <hkaiser_> Yorlik: but all the code is here: https://github.com/STEllAR-GROUP/hpx_script feel free to find out

15:52 <Yorlik> I'm sure I could code my way arouznd somehow, but I'm afraid efficiency might suffer a lot.

15:52 <Yorlik> I'll look into it

15:53 <hkaiser_> Yorlik: there is not much we can do at this point

15:53 <Yorlik> I think just getting a future shouzld not yield.

15:53 <Yorlik> It's pretty harsh if i cannot rely on anything thread_local almost ever

15:54 <Yorlik> Even a simßple assignment of a future could explode

15:54 <simbergm> diehlpk_work: yes and preferably not but I can put together something simple

15:54 MatrixBridge has joined #ste||ar

15:54 MatrixBridge has left #ste||ar ["User left"]

15:54 <Yorlik> Like a = someAsync Could have no more an a to assign the result to.

15:54 MatrixBridge has joined #ste||ar

15:54 MatrixBridge has left #ste||ar ["User left"]

15:54 <Yorlik> The entire LuaStack gone ...

15:55 <diehlpk_work> simbergm, Ok, sounds good

15:55 <diehlpk_work> We are currently super busy with the SC paper

15:56 <simbergm> hkaiser_: would you have time to comment on https://github.com/STEllAR-GROUP/hpx/pull/4270#discussion_r380167118?

15:57 <simbergm> diehlpk_work: yep, ok, I'll do it this week

15:59 <Yorlik> hkaiser: I think I'll just use a luastate pool and not make the states thread local. I'll just assign a state from the pool to a task until it's finished.

16:00 gonidelis has quit [Quit: Ping timeout (120 seconds)]

16:25 K-ballo1 has joined #ste||ar

16:26 K-ballo has quit [Ping timeout: 255 seconds]

16:26 K-ballo1 is now known as K-ballo

16:27 <hkaiser_> simbergm: will look

16:40 <simbergm> diehlpk_work: hkaiser_: http://hpx.stellar-group.org/2020/03/09/hpx-accepted-for-gsoc-2020/ ok?

16:46 <simbergm> hkaiser_: also thanks!

17:02 <hkaiser_> simbergm: looks good, thanks!

17:06 <Yorlik> Just for fun: Some Lua vs. HPX code complexity basics: https://imgur.com/a/eF49yhA

17:21 <zao> hkaiser_: did you remember to say something to the SoC person from last night?

17:42 <hkaiser_> zao: I was waiting for him to show up again

17:43 <zao> Ack.

18:15 gonidelis has joined #ste||ar

18:39 hkaiser_ has quit [Ping timeout: 256 seconds]

19:21 nan has joined #ste||ar

19:21 nan is now known as Guest90784

19:23 gonidelis has quit [Ping timeout: 240 seconds]

19:27 hkaiser has joined #ste||ar

19:58 AndroUser has joined #ste||ar

20:00 AndroUser has quit [Client Quit]

20:27 <Yorlik> hkaiser: I replaced the thread local Lua states with states pulled from a thread-safe thread_local pool (to reduce contention) and can now call all these functions which used to crash the Lua Engine. I still need to test it more and check performance but at first look the issue seems to be solved.

20:31 <hkaiser> Yorlik: nice!

20:32 <hkaiser> Yorlik: you need to make sure that the hpx threads (tasks) always return to the same lua state they were launched from

20:32 <hkaiser> they should 'carry' the state around

20:32 <Yorlik> The state is kept around until the function returns

20:32 <hkaiser> that'snot what I meant

20:32 <Yorlik> Its in a unique_ptr with a custom deleter which gives it back to the pool after it goes out of scope

20:33 <hkaiser> the task needs to always return to its originating state

20:33 <Yorlik> So the state is guaranteed to be the same - it's no longer thread_local.

20:33 <hkaiser> ok, it's passed to the task, good

20:34 <Yorlik> I just use get_luaengine inside my updater inside the parloop

20:34 <Yorlik> Then the real updated in Lua gets called.

20:34 <Yorlik> updater

20:35 <Yorlik> Even if the task gets moved to another core - the luastate is given back after the lua function returns.

20:35 <Yorlik> That was different wioth the thread local engines i used.

20:35 <Yorlik> They just poofed aay on a task migration

20:35 hkaiser has quit [Quit: bye]

20:44 <diehlpk_work> simbergm, Looks good

21:16 Guest90784 has quit [Remote host closed the connection]

21:17 nanmiao has joined #ste||ar

21:33 hkaiser has joined #ste||ar

23:01 diehlpk__ has quit [Ping timeout: 268 seconds]

23:07 hkaiser has quit [Ping timeout: 256 seconds]

23:12 nanmiao has quit [Remote host closed the connection]

23:20 hkaiser has joined #ste||ar