<jaafar>
Is there any place I could log scheduling decisions within HPX? I'm seeing some idle periods that would be very interesting to understand
<jaafar>
Mysterious gaps, for one - every so often a benchmark run has what appears to be idle periods in the middle
<jaafar>
but there are also just "decisions" I don't understand.
<jaafar>
For example, the scan algorithms begin by launching async tasks for each chunk as the first stage
<jaafar>
Then all the dataflow items are entered, which depend on the async tasks and also each other
<jaafar>
What I see is that in almost all cases the initial async tasks are executed before the dataflow continuations, even if the dataflow is ready to go
<jaafar>
I'm going to attach some pictures to my bug report
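For context, a minimal sketch of the pattern described above, assuming hpx::async and hpx::dataflow; this is not HPX's actual scan implementation, and the chunk count, lambdas, and stage split are illustrative assumptions only:

    // Stage 1: one async task per chunk; stage 2: dataflow continuations
    // that depend on the stage-1 tasks and on each other (running offsets).
    #include <hpx/hpx_main.hpp>
    #include <hpx/hpx.hpp>

    #include <iostream>
    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<int> data(1 << 20, 1);
        std::size_t const num_chunks = 8;
        std::size_t const chunk = data.size() / num_chunks;

        // Stage 1: per-chunk async tasks computing local sums.
        std::vector<hpx::shared_future<int>> stage1;
        for (std::size_t i = 0; i != num_chunks; ++i)
        {
            stage1.push_back(hpx::async([&data, i, chunk] {
                return std::accumulate(data.begin() + i * chunk,
                    data.begin() + (i + 1) * chunk, 0);
            }));
        }

        // Stage 2: dataflow continuations forming a running offset; each one
        // depends on the previous offset and on the matching stage-1 result.
        std::vector<hpx::shared_future<int>> offsets;
        offsets.push_back(hpx::make_ready_future(0));
        for (std::size_t i = 0; i != num_chunks; ++i)
        {
            offsets.push_back(hpx::dataflow(
                [](hpx::shared_future<int> prev, hpx::shared_future<int> sum) {
                    return prev.get() + sum.get();
                },
                offsets.back(), stage1[i]));
        }

        // Stage 3 (applying each offset back to its chunk) is elided here.
        std::cout << "total: " << offsets.back().get() << "\n";
        return 0;
    }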
<hkaiser>
jaafar: interesting insight
<hkaiser>
we have no way of logging this, sorry
<jaafar>
hkaiser: if you can point me to somewhere in the code I might be able to :)
<jaafar>
also see my issue update for a nice picture
<hkaiser>
point where
<hkaiser>
?
<jaafar>
I mean, assuming there is a point where the "next task" is chosen from a set of available work
<jaafar>
a point in the code
<jaafar>
I am using linux tracepoints
<hkaiser>
interesting graph
<hkaiser>
jaafar: I can point you to the scheduling loop
<hkaiser>
jaafar: good luck - this is the real center of the action, but usually difficult to follow, especially with more than one thread
<hkaiser>
(core)
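In case it helps others following along: one way to hook such a spot up to Linux tracing is a static userspace (SDT/USDT) probe via <sys/sdt.h>, which perf or bpftrace can attach to. The function and argument names below are placeholders only, not HPX's actual scheduling-loop code:

    // Hypothetical placement: call this where the scheduler has just picked
    // the next HPX thread to run.  Requires the systemtap SDT header.
    #include <sys/sdt.h>
    #include <cstdint>

    inline void trace_thread_selected(std::uint64_t thread_id, int priority)
    {
        // Provider "hpx", probe name "thread_selected", two arguments;
        // appears as sdt_hpx:thread_selected once the binary is registered
        // with perf (perf buildid-cache --add <binary>).
        DTRACE_PROBE2(hpx, thread_selected, thread_id, priority);
    }

After registering the binary, perf list shows the probe and perf record (or a bpftrace usdt: probe) can record every scheduling decision with its arguments.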
<hkaiser>
jaafar: so from your picture, stage 3 starts only after stage 1 is done
<hkaiser>
it should start right away, shouldn't it?
<hkaiser>
do we have that over-constrained somehow? is the dependency logic too strict?
<hkaiser>
essentially our algorithm is a glorified sequential one :/
<hkaiser>
doh!
<hkaiser>
jaafar: I'm convinced that your cache related conclusions are a red herring (sorry)
<hkaiser>
I think the underlying algorithm is just plain wrong
<jaafar>
hkaiser: as I (barely) understand it, the dataflow items and the async tasks that are initially launched are equally valid things to run
<jaafar>
because the dataflow items actually have their inputs available well before they are run
<hkaiser>
well, we can raise the priority of certain tasks, if that helps
<jaafar>
but the async stage 1 things happen instead - don't know why
<hkaiser>
but I'm not sure we have too strict dependencies defined
<jaafar>
I understand your skepticism about my caching theories :)
<jaafar>
One thing I could easily do is benchmark the "warm cache" vs "cold cache" situation and measure the performance difference
<hkaiser>
you can raise task priorities by using launch::async(threads::thread_priority_high)
<hkaiser>
jaafar: we're talking about milliseconds here, caches will not make a dent
<hkaiser>
if you tried raising the priorities of stage 2 and 3 it might change the picture
<jaafar>
hkaiser: the perf data suggests L3 cache misses are dominating the performance costs
<hkaiser>
nah
<jaafar>
:)
<jaafar>
OK!
<hkaiser>
it's a logic error here or some wrong assumption
<hkaiser>
this is too glaring
<hkaiser>
things are usually executed in the order they are scheduled
<hkaiser>
so if you schedule a lot of stage 1 tasks before stage 3, the latter will be executed too late
<jaafar>
I figured
<jaafar>
well, the sad truth is I did try launching stage 2 and 3 with async and high priority
<jaafar>
result: worse performance :)
<hkaiser>
so doing dataflow(launch::async(thread_priority_high), f, ...) for stage 3 might change the picture as that will make those tasks execute right away
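For reference, a small self-contained sketch of that suggestion; the exact enumerator spelling (threads::thread_priority_high vs. threads::thread_priority::high) varies between HPX versions, and the stage lambdas are placeholders:

    #include <hpx/hpx_main.hpp>
    #include <hpx/hpx.hpp>
    #include <iostream>

    int main()
    {
        // Stage 1: ordinary async task (default priority), standing in for
        // one per-chunk task.
        hpx::shared_future<int> s1 = hpx::async([] { return 21; });

        // Stage 3: scheduled as its own high-priority task so it can run as
        // soon as its input is ready instead of waiting behind queued
        // stage-1 work.
        hpx::future<int> s3 = hpx::dataflow(
            hpx::launch::async(hpx::threads::thread_priority_high),
            [](hpx::shared_future<int> in) { return 2 * in.get(); }, s1);

        std::cout << s3.get() << "\n";    // prints 42
        return 0;
    }

Stage 2 would stay synchronous (a plain dataflow without a separate task), as discussed below, since the per-item work is minimal.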
<jaafar>
you would think so, wouldn't you :)
<hkaiser>
ok
<hkaiser>
you're way ahead of me here
<hkaiser>
can you produce such an image using high priority?
<jaafar>
my intuition is clearly lacking in some important ways
<jaafar>
yes! I will do that
<hkaiser>
stage 2 should still be sync, I think
<hkaiser>
no point in creating a separate task for those as the work is minimal
<jaafar>
yeah
<hkaiser>
jaafar: anyways - many thanks for your insights, very interesting!
<jaafar>
you're welcome! I hope it helps
<hkaiser>
it will!
<hkaiser>
jaafar: one last question - how many cores did you use for creating that image ?
<jaafar>
4 cores
<hkaiser>
k
<hkaiser>
thanks
<jaafar>
IIRC that was the sweet spot on my system
<jaafar>
I think that's also the number of "true" cores, i.e. not counting hyperthreads
<hkaiser>
yah, you can see that, there are mostly 4 tasks running concurrently in stage 1
<jaafar>
yep
<hkaiser>
fun!
<jaafar>
OK gotta get out of my window system to make the system closer to idle for benchmarking brb
jaafar has quit [Quit: Konversation terminated!]
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
<jaafar>
oops
<jaafar>
well, the picture looks very similar
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
<jaafar>
correction: actually that did change things
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
<jbjnr>
jaafar: I can help you with tracing activity in the scheduler