hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
diehlpk has joined #ste||ar
quaz0r has joined #ste||ar
<diehlpk> hkaiser, I'm trying to compile the latest version of HPX on Daint, just in case we need to run any benchmarks
<hkaiser> ok
<hkaiser> problems?
<diehlpk> However, I get a very strange error message
<diehlpk> Yes, if there were no problems I wouldn't be asking you
<hkaiser> ok, what's the problem?
<diehlpk> and see pm
<hkaiser> diehlpk: well, cmake doesn't find hpx
<hkaiser> do you have those files somewhere?
<hkaiser> if yes, use -DHPX_DIR=<the_dir_where_those_files_are_located>
<diehlpk> No, I saw this error while I compiled HPX
<hkaiser> huh?
<diehlpk> And therefore I didn't understand what was going wrong
<diehlpk> But now I got it
<hkaiser> what was it?
<diehlpk> Too embarrassing to say: Ava grabbed the laptop and undid some changes, so I was still doing git clone of octotiger and not hpx
<diehlpk> So I checked out octotiger but ran the cmake to compile HPX
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 276 seconds]
<diehlpk> hkaiser, HPX compiled, octo is next
<diehlpk> After that I will run the test problem
<hkaiser> diehlpk: bad Ava!
<diehlpk> At least we can run the closest problem to pvfmm
<diehlpk> So gregor can run the pvfmm code tomorrow and we can send the committee some meaningless plot
<hkaiser> ok
<hkaiser> good luck
<diehlpk> No luck, octotiger does not compile
<diehlpk> hkaiser, We have compilation errors in the cuda scheduler
<hkaiser> yah sure, this is invalid code
<hkaiser> diehlpk: write it as (void const*)(&cuda_multipole_interactions_kernel_rho) for now
<diehlpk> hkaiser, Thanks, octo compiled
<diehlpk> Now I am running the benchmarks
hkaiser has quit [Quit: bye]
<Yorlik> Does hpx::this_thread::sleep_for(...) also yield?
diehlpk has quit [Ping timeout: 252 seconds]
<heller> Yorlik: sure
<Yorlik> So a sleep can make a task lose control ...
<Yorlik> That's a tricky side effect
<Yorlik> Does the sleep need task time to go away or is it strictly using the real time clock? Like hpx::this_thread::sleep_for(10s) - would it require the task to burn 10 seconds while scheduled?
<heller> No
<heller> It gets suspended, and woken up after 10 seconds
<heller> So yes, it's using a clock in the background
<Yorlik> Good :)
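A minimal sketch of the behaviour heller describes, assuming an application that starts the runtime via hpx/hpx_main.hpp; the header paths are from HPX releases of that time and may differ in newer versions. hpx::this_thread::sleep_for suspends only the calling HPX task, so the worker thread stays free for other tasks:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/threads.hpp>   // hpx::this_thread::sleep_for

    #include <chrono>
    #include <iostream>

    int main()
    {
        using namespace std::chrono_literals;
        auto t0 = std::chrono::steady_clock::now();

        hpx::future<void> sleeper = hpx::async([] {
            // Suspends only this HPX task; the underlying worker thread is
            // handed back to the scheduler while the 10 second timer runs.
            hpx::this_thread::sleep_for(10s);
        });

        hpx::future<int> busy = hpx::async([] {
            // Can be scheduled on the same worker while the other task sleeps.
            return 42;
        });

        int r = busy.get();   // ready long before the sleeper wakes up
        sleeper.get();        // becomes ready roughly 10 seconds after it started

        auto dt = std::chrono::duration_cast<std::chrono::seconds>(
            std::chrono::steady_clock::now() - t0);
        std::cout << "result " << r << ", elapsed ~" << dt.count() << "s\n";
    }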
Yorlik has quit [Read error: No route to host]
Yorlik has joined #ste||ar
nikunj has quit [Remote host closed the connection]
rori has joined #ste||ar
hkaiser has joined #ste||ar
daissgr_work has quit [Read error: Connection reset by peer]
daissgr_work has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
daissgr has joined #ste||ar
eschnett has joined #ste||ar
quaz0r has quit [Quit: bbl]
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
aserio has quit [Ping timeout: 272 seconds]
rori has quit [Quit: byr]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
<diehlpk_work> simbergm, hkaiser parsa GSoD is submitted and the student should start with the community bonding phase next month
<simbergm> diehlpk_work: thanks! let's hope we get what we asked for
<parsa> ?
<diehlpk_work> I think we should meet with her in person and help the student set up IRC
<diehlpk_work> and parsa should meet with the student once a week at the beginning
eschnett has joined #ste||ar
<Yorlik> Is there a way to have a "sleep at least", like a timed yield with no guaranteed/enforced rescheduling ?
<hkaiser> Yorlik: what should that do?
<Yorlik> wait and then give the scheduler leeway to decide when to schedule
<hkaiser> sleep_for already just guarantees that the task doesn't come back earlier than specified
<Yorlik> Or is that happening anyways?
<Yorlik> OK - then it's already implemented like that
<Yorlik> I realize one particular difficulty with this app lockup I had (seems fixed now, still testing):
<Yorlik> Since the output is totally async I never know where I am in the app
<Yorlik> Is there a built-in way to have sync output?
<hkaiser> use std::flush
<hkaiser> that should do the trick
<Yorlik> Does that also work when using printf?
<Yorlik> like after a printf
<hkaiser> no
<Yorlik> I prefer printf for the built in formatting
<hkaiser> there you use fflush
<Yorlik> ok - seems I'll have to use fmt
<Yorlik> Ah ..
<Yorlik> like fflush(); ?
<hkaiser> you shouldn't rely on kernel io
<hkaiser> like std::cout or printf
<Yorlik> What would you recommend to use?
<Yorlik> Write my own logger task?
<hkaiser> hpx::cout and hpx::util::format or fmt{}
<Yorlik> oh - didn't know hpx has fmt
<Yorlik> I'll look that up
<hkaiser> or move the output code into a lambda that you run through hpx::threads::run_as_os_thread(f)
<hkaiser> the important part is that you don't suspend any hpx threads/tasks with a kernel mutex
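A minimal sketch of the two approaches hkaiser suggests here, assuming the HPX iostreams component is available and linked; the header paths and the indexed format placeholders ({1}, {2}) are from HPX releases of that time and may differ in newer versions. The report() function and its arguments are made up for illustration:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/iostreams.hpp>   // hpx::cout, hpx::flush
    #include <hpx/include/run_as.hpp>      // hpx::threads::run_as_os_thread
    #include <hpx/util/format.hpp>         // hpx::util::format

    #include <cstddef>
    #include <cstdio>

    void report(std::size_t items, double seconds)
    {
        // Option 1: hpx::cout plus hpx::util::format; hpx::flush pushes the
        // text out without suspending the HPX task on kernel I/O.
        hpx::cout << hpx::util::format("processed {1} items in {2} s\n", items, seconds)
                  << hpx::flush;

        // Option 2: run the blocking printf/fflush on an OS thread so no HPX
        // task gets suspended holding a kernel mutex; this returns a future.
        hpx::threads::run_as_os_thread([&] {
            std::printf("processed %zu items in %.2f s\n", items, seconds);
            std::fflush(stdout);
        }).get();
    }

    int main()
    {
        report(1048576, 129.22);
        return 0;
    }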
<Yorlik> BTW: Indeed it was a race
<Yorlik> Thomas had the glorious idea to test with thread=1 and it all worked
<Yorlik> Seems I have to learn how to trace races :)
<Yorlik> the async output was a major hurdle in debugging this
<hkaiser> it didn't even work with hpx:threads=1 for me
<Yorlik> I probably already had another version
<hkaiser> Yorlik: but yah, it's easy to always suspect HPX is broken... you would never have claimed that if your version had hung using std::thread
<Yorlik> The race was that under certain conditions a worker could get a limit of -1 (FFFFFFFF...) and create an insanely huge batch and never return from it
<Yorlik> Yes
<Yorlik> std::thread worked perfectly
aserio has quit [Ping timeout: 250 seconds]
<Yorlik> Now I need to find a good way to manage yields without adding a ton of overhead, like it does right now
<Yorlik> Controlling granularity efficiently is a thing I have to learn now.
<hkaiser> Yorlik: I'd suggest creating smaller tasks instead of yielding
<Yorlik> They are already pretty short
<hkaiser> how short?
<Yorlik> With OS threads sub microseconds
<Yorlik> Only in this specific test case ofc
<hkaiser> that's way too short for std::thread, even for hpx::threads
<Yorlik> that's from the non-HPX version
<hkaiser> so your task is running for 4.2 seconds?
<Yorlik> the program
<Yorlik> The entire task yes
<hkaiser> that's what I meant, create smaller tasks
<hkaiser> ~200us is a good measure
<Yorlik> But there could be MANY more tasks (consumers)
<Yorlik> That's where HPX will shine
<hkaiser> you don't gain anything from hpx if your task runs for 4 s each
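A rough sketch of the "smaller tasks" advice, assuming the work is one pass over n_items; process_item is a made-up stand-in for the real per-item work, and the chunk size of 1024 is an arbitrary illustration meant to keep each task in the ~100-200us range hkaiser mentions:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>    // hpx::wait_all

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    void process_item(std::size_t) {}  // stand-in for the real per-item work

    void process_all(std::size_t n_items)
    {
        std::size_t const chunk = 1024;   // assumed: sized so one chunk runs ~200us
        std::vector<hpx::future<void>> tasks;
        tasks.reserve(n_items / chunk + 1);

        for (std::size_t begin = 0; begin < n_items; begin += chunk)
        {
            std::size_t const end = (std::min)(begin + chunk, n_items);
            // One short-lived task per chunk instead of one multi-second task;
            // the scheduler can interleave chunks from many producers/consumers.
            tasks.push_back(hpx::async([begin, end] {
                for (std::size_t i = begin; i != end; ++i)
                    process_item(i);
            }));
        }
        hpx::wait_all(tasks);
    }

    int main()
    {
        process_all(1u << 20);   // e.g. roughly one million items
        return 0;
    }

With chunks like this there is no need for explicit yields; every task boundary is a natural switch point for the scheduler.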
<Yorlik> My plan for the next test is to launch like 50 consumers and do it both ways - OS and HPX
<Yorlik> I expect the OS swap penalty to be brutal
<Yorlik> e.g. look at this:
<Yorlik> Swarm [48] has returned ...
<Yorlik> Swarm [49] has returned ...
<Yorlik> Data sz: 1024
<Yorlik> RBuf sz: 1073741888, RBuf slots: 1048576 ( 1073741824 Bytes Buffer )
<Yorlik> Producer produced 1048576 items in 129.22 seconds = 8114.41 OPs/sec. = 8.31 MB/sec, Batches: 1485381 = 0.71 runs/batch average
<Yorlik> Results:
<Yorlik> Checker checked 1048576 items in 129.22 seconds = 8114.41 OPs/sec., error_count = 0, Batches: 1342626 = 0.78 runs/batch average
<Yorlik> Changer changed 1048576 items in 129.22 seconds = 8114.41 OPs/sec., MB changed = 1073.74, Batches: 509 = 2060.07 runs/batch average
<Yorlik> Checker2 checked 1048576 items in 129.22 seconds = 8114.41 OPs/sec., error_count = 0, Batches: 10 = 104857.60 runs/batch average
<Yorlik> Average Latency: 123.2375698.4 microseconds
<Yorlik> Measured runtime = 129.22395710.6 seconds
<Yorlik> Done !
<Yorlik> That's a run with 54 consumers
<Yorlik> Still pretty low throughput
<Yorlik> But quite a bunch of threads
<hkaiser> Yorlik: let me repeat what I said
<Yorlik> my 4 default consumers + 50 swarm
<hkaiser> you don't gain anything from hpx if your task runs for 4 s each
<Yorlik> A single step is a run around the buffer - that's super short
<hkaiser> ok
<Yorlik> the setup is like - I give you a set time and see how far the system goes
<Yorlik> then they start running in circles
aserio has joined #ste||ar
<Yorlik> I expect that once we have message handlers using Lua for consuming, the single items will be much slower
<Yorlik> which means the batches too
<hkaiser> whatever, you apparently don't want to listen to what I said
<Yorlik> I cannot let 50 threads run for 4 seconds without yielding
<hkaiser> Yorlik: I didn't say you should
<Yorlik> So the question is when to yield?
<hkaiser> I said you should _not_ have tasks that run for more than a couple 100us
<Yorlik> But they run inside the HPX workers
<Yorlik> Yes
<hkaiser> Yorlik: whenever you want to yield, stop this thread and create a new one
<Yorlik> At the moment the yields happen way too often - after each batch of items
<Yorlik> What would that be good for?
<Yorlik> Once a consumer is biting the tail of its predecessor I yield
<Yorlik> What I think I should do is to check the current batch size and not yield if it was too short
<hkaiser> Yorlik: instead of spin-looping on an atomic
<Yorlik> err - the other way around
<Yorlik> There is no CAS loop in my code btw.
daissgr has quit [Ping timeout: 272 seconds]
<Yorlik> just acquire - release
<Yorlik> there are few loops for safety checks, but not in the hot path
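A rough sketch of the "finish the task and spawn a new one" pattern hkaiser describes a few lines up, as an alternative to yielding inside a long-running consumer loop; consumer_state and process_one_batch are made-up stand-ins, and the future-unwrapping constructor is used so the caller can wait for the whole chain:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>    // hpx::future, hpx::make_ready_future

    #include <functional>

    struct consumer_state { int batches_left = 100; };   // stand-in for the real state

    bool process_one_batch(consumer_state& s)             // stand-in for one short batch
    {
        return --s.batches_left > 0;                       // false once all work is done
    }

    hpx::future<void> run_consumer(consumer_state& s)
    {
        if (!process_one_batch(s))
            return hpx::make_ready_future();               // chain ends here

        // Instead of hpx::this_thread::yield(): let this task finish and spawn
        // a fresh one for the next batch, so the scheduler gets a natural point
        // to run other tasks in between. The unwrapping constructor collapses
        // the resulting future<future<void>> into a future<void>.
        return hpx::future<void>(hpx::async(&run_consumer, std::ref(s)));
    }

    int main()
    {
        consumer_state s;
        run_consumer(s).get();   // becomes ready when the last batch finishes
        return 0;
    }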
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 has joined #ste||ar
aserio1 is now known as aserio
aserio has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 268 seconds]
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 268 seconds]
K-ballo1 is now known as K-ballo
aserio has quit [Quit: aserio]
nikunj has joined #ste||ar
jaafar_ is now known as jaafar