hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
Guest70891 has quit [Ping timeout: 276 seconds]
Guest70891 has joined #ste||ar
jaafar has quit [Quit: Konversation terminated!]
Guest70891 has quit [Ping timeout: 265 seconds]
K-ballo has quit [Quit: K-ballo]
Guest70891 has joined #ste||ar
jaafar has joined #ste||ar
<jaafar> Is there any place I could log scheduling decisions within HPX? I'm seeing some idle periods that would be very interesting to understand
<jaafar> Mysterious gaps, for one - every so often a benchmark run has what appears to be idle periods in the middle
<jaafar> but there are also just "decisions" I don't understand.
<jaafar> For example, the scan algorithms begin by launching async tasks for each chunk as the first stage
<jaafar> Then all the dataflow items are entered, which depend on the async tasks and also each other
<jaafar> What I see is that in almost all cases the initial async tasks are executed before the dataflow continuations, even if the dataflow is ready to go
<jaafar> I'm going to attach some pictures to my bug report
<hkaiser> jaafar: interesting insight
<hkaiser> we have no way of logging this, sorry
<jaafar> hkaiser: if you can point me to somewhere in the code I might be able to :)
<jaafar> also see my issue update for a nice picture
<hkaiser> point where
<hkaiser> ?
<jaafar> I mean, assuming there is a point where the "next task" is chosen from a set of available work
<jaafar> a point in the code
<jaafar> I am using linux tracepoints
<hkaiser> interesting graph
<hkaiser> jaafar: I can point you to the scheduling loop
<jaafar> thanks!
<hkaiser> where the next thread is fetched
<jaafar> thanks!
<hkaiser> where it's executed
<hkaiser> jaafar: good luck - this is the real center of the action, but usually difficult to follow, especially with more than one thread
<hkaiser> (core)
<hkaiser> jaafar: so from your picture, stage 3 starts only after stage 1 is done
<hkaiser> it should start right away, shouldn't it?
<hkaiser> do we have that over-constrained somehow? is the dependency logic too strict?
<hkaiser> essentially our algorithm is a glorified sequential one :/
<hkaiser> doh!
<hkaiser> jaafar: I'm convinced that your cache related conclusions are a red herring (sorry)
<hkaiser> I think the underlying algorithm is just plain wrong
<jaafar> hkaiser: as I (barely) understand it the dataflow items and the async tasks that are initially launched are equally valid things to run
<jaafar> because the dataflow items actually have their inputs available well before they are run
<hkaiser> well, we can raise the priority of certain tasks, if that helps
<jaafar> but the async stage 1 things happen instead - don't know why
<hkaiser> but I'm not sure we have too strict dependencies defined
<jaafar> I understand your skepticism about my caching theories :)
<jaafar> One thing I could easily do is benchmark the "warm cache" vs "cold cache" situation and measure the performance difference
<hkaiser> you can raise task priorities by using launch::async(threads::thread_priority_high)
<hkaiser> jaafar: we're talking about milliseconds here, caches will not make a dent
<hkaiser> if you tried raising the priorities of stage 2 and 3 it might change the picture
<jaafar> hkaiser: the perf data suggests L3 cache misses are dominating the performance costs
<hkaiser> nah
<jaafar> :)
<jaafar> OK!
<hkaiser> it's a logic error here or some wrong assumption
<hkaiser> this is too glaring
<hkaiser> things are usually executed in the order they are scheduled
<hkaiser> so if you schedule a lot of stage 1 first before stage 3, the latter will be executed too late
<jaafar> I figured
<jaafar> well, the sad truth is I did try launching stage 2 and 3 with async and high priority
<jaafar> result: worse performance :)
<hkaiser> so doing dataflow(launch::async(thread_priority_high), f, ...) for stage 3 might change the picture as that will make those tasks execute right away
<jaafar> you would think so, wouldn't you :)
<hkaiser> ok
<hkaiser> you're way ahead of me here
<hkaiser> can you produce such an image for using high priority?
<jaafar> my intuition is clearly lacking in some important ways
<jaafar> yes! I will do that
<hkaiser> stage 2 should still be sync, I think
<hkaiser> no point in creating a separate task for those as the work is minimal
<jaafar> yeah
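[editor's sketch] The scheduling-order point above ("things are usually executed in the order they are scheduled") can be illustrated with a toy model. This is not HPX's scheduler and uses no HPX API; the Task struct, run_fifo, run_priority, and sample_workload are hypothetical names invented for the illustration. It only shows where a late-submitted stage 3 task lands under a FIFO queue versus a priority queue, and says nothing about the resulting performance (jaafar reports above that high priority made things worse in practice).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Toy model only: this is not HPX's scheduler, just two queue disciplines.
struct Task {
    std::string name;
    int priority;     // larger value runs first in the priority model
    std::size_t seq;  // submission order, used as a FIFO tie-breaker
};

// FIFO model: tasks run strictly in submission order.
std::vector<std::string> run_fifo(const std::vector<Task>& submitted) {
    std::vector<std::string> order;
    for (const Task& t : submitted) order.push_back(t.name);
    return order;
}

// Priority model: higher priority first, submission order within a level.
std::vector<std::string> run_priority(const std::vector<Task>& submitted) {
    // Returns true when a should run after b.
    auto later = [](const Task& a, const Task& b) {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;
    };
    std::priority_queue<Task, std::vector<Task>, decltype(later)> q(later);
    for (const Task& t : submitted) q.push(t);
    std::vector<std::string> order;
    while (!q.empty()) {
        order.push_back(q.top().name);
        q.pop();
    }
    return order;
}

// Hypothetical workload: four stage 1 chunks submitted before one stage 3 task.
std::vector<Task> sample_workload(int stage3_priority) {
    return {{"s1-0", 0, 0}, {"s1-1", 0, 1}, {"s1-2", 0, 2}, {"s1-3", 0, 3},
            {"s3-0", stage3_priority, 4}};
}
```

With run_fifo(sample_workload(1)) the stage 3 task comes out last even though it may be ready; with run_priority(sample_workload(1)) it comes out first, which is the effect hkaiser expects from raising the priority.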
<hkaiser> jaafar: anyways - many thanks for your insights, very interesting!
<jaafar> you're welcome! I hope it helps
<hkaiser> it will!
<hkaiser> jaafar: one last question - how many cores did you use for creating that image ?
<jaafar> 4 cores
<hkaiser> k
<hkaiser> thanks
<jaafar> IIRC that was the sweet spot on my system
<jaafar> I think also the number of "true" cores i.e. not hyperthreading
<hkaiser> yah, you can see that, there are mostly 4 tasks running concurrently in stage 1
<jaafar> yep
<hkaiser> fun!
<jaafar> OK gotta get out of my window system to make the system closer to idle for benchmarking brb
jaafar has quit [Quit: Konversation terminated!]
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
<jaafar> oops
<jaafar> well, the picture looks very similar
jaafar has quit [Quit: Konversation terminated!]
Guest70891 has quit [Ping timeout: 240 seconds]
Guest70891 has joined #ste||ar
jaafar has joined #ste||ar
<jaafar> correction: actually that did change things
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
Guest70891 has quit [Ping timeout: 268 seconds]
Guest70891 has joined #ste||ar
mdiers_1 has joined #ste||ar
mdiers_ has quit [Ping timeout: 265 seconds]
mdiers_1 is now known as mdiers_
weilewei has quit [Remote host closed the connection]
Guest70891 has quit [Ping timeout: 264 seconds]
Guest70891 has joined #ste||ar
Guest70891 has quit [Ping timeout: 264 seconds]
Guest70891 has joined #ste||ar
Guest70891 has quit [Ping timeout: 265 seconds]
Guest70891 has joined #ste||ar
<jbjnr> jaafar: I can help you with tracing activity in the scheduler
<jbjnr> where are these pictures?
Guest70891 has quit [Ping timeout: 268 seconds]
Guest70891 has joined #ste||ar
Guest70891 has quit [Quit: WeeChat 2.2]
Amy has joined #ste||ar
Amy is now known as Guest65867
<jbjnr> thanks. I found them after I posted earlier.
Guest65867 has quit [Ping timeout: 240 seconds]
Guest65867 has joined #ste||ar
nikunj has quit [Remote host closed the connection]
Guest65867 has quit [Ping timeout: 240 seconds]
Guest65867 has joined #ste||ar
coldblackice has quit [Ping timeout: 240 seconds]
coldblackice has joined #ste||ar
Coldblackice_ has joined #ste||ar
coldblackice has quit [Ping timeout: 252 seconds]
K-ballo has joined #ste||ar
Coldblackice_ has quit [Ping timeout: 252 seconds]
<zao> Hey, you people have webex experience... are there usable clients on Linux, or do people just phone in somehow?
hkaiser has joined #ste||ar
coldblackice has joined #ste||ar
coldblackice has quit [Ping timeout: 268 seconds]
coldblackice has joined #ste||ar
coldblackice has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
coldblackice has joined #ste||ar
Coldblackice_ has joined #ste||ar
coldblackice has quit [Ping timeout: 240 seconds]
Coldblackice_ has quit [Ping timeout: 240 seconds]
<heller> zao: web client, mostly
<heller> hkaiser: the basic stuff seems to work ;)
weilewei has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 245 seconds]
aserio1 is now known as aserio
<hkaiser> heller: nice
aserio has quit [Ping timeout: 264 seconds]
<heller> hkaiser: i also got it down that the future itself is a sender
<heller> So that's pretty cool
<heller> With this, we can finally get non allocating futures ;)
<hkaiser> heller: you up for a challenge ?
<hkaiser> could you look at the failing test on the execution_context branch - I'm running out of ideas what's going on there
aserio has joined #ste||ar
<heller> hkaiser: ugh, I didn't run into this when testing...
<heller> I can't seem to rerun it right now
<hkaiser> heller: it started to happen just recently after the last rebase
<hkaiser> it works on master, though...
<hkaiser> most likely I screwed up something, but I have no idea where
aserio has quit [Ping timeout: 240 seconds]
<heller> Ok, will have a look
aserio has joined #ste||ar
aserio has quit [Remote host closed the connection]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
weilewei has quit [Remote host closed the connection]
aserio has quit [Ping timeout: 246 seconds]
jaafar has quit [Quit: Konversation terminated!]
coldblackice has joined #ste||ar
jaafar has joined #ste||ar
zbyerly has joined #ste||ar
weilewei has joined #ste||ar
coldblackice has quit [Ping timeout: 268 seconds]
coldblackice has joined #ste||ar
Coldblackice_ has joined #ste||ar
coldblackice has quit [Ping timeout: 265 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 245 seconds]
Coldblackice_ has quit [Ping timeout: 264 seconds]
coldblackice has joined #ste||ar
aserio has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
coldblackice has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
coldblackice has joined #ste||ar
coldblackice has quit [Ping timeout: 240 seconds]
<weilewei> Sorry if this error is not related to hpx, I am getting this error after I updated hpx
<K-ballo> are you using some C library that #defines B0 ?
<weilewei> That's something I could not control
<K-ballo> termios?
<K-ballo> yeah, termios
<K-ballo> bad library
<weilewei> I searched that, it does not have it?
<weilewei> Before that, I can compile the same code
<K-ballo> it does not have what?
<weilewei> It does not have B0
<K-ballo> #define B0 0000000
<K-ballo> the error message there is telling you that "B0" is being replaced by a numeric constant
<K-ballo> you need to either fix termios, or work around it in these other libraries that use "B0" as an identifier
<weilewei> I see
<K-ballo> maybe look at your includes, see if you can keep termios walled off
<zao> Worst case, you could #undef it?
<K-ballo> possibly
<weilewei> zao also a good suggestion, let me try
coldblackice has joined #ste||ar
aserio has quit [Quit: aserio]
<weilewei> boom, #undef does the trick zao K-ballo everything compiles
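[editor's sketch] A minimal reproduction of the B0 collision and the #undef workaround discussed above. To stay self-contained it defines B0 itself as a stand-in for the C header, using the definition K-ballo quotes; the namespace other_lib and the function first_coefficient are hypothetical and only stand for "some library that uses B0 as an identifier".

```cpp
#include <cassert>

// Stand-in for the C header: the log quotes termios as doing
// "#define B0 0000000", so B0 is replaced by a numeric constant everywhere.
#define B0 0000000

// Without this line, the declaration below would expand to
// "int 0000000 = 42;" and fail to compile.
#undef B0

namespace other_lib {  // hypothetical library that uses B0 as a name
    inline int first_coefficient() {
        int B0 = 42;   // fine now: B0 is an ordinary identifier again
        return B0;
    }
}
```

The trade-off is that any code after the #undef that wants the termios constant B0 no longer sees it, which is why K-ballo's other suggestion (keeping termios walled off from the affected includes) may be the cleaner fix when it is feasible.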
hkaiser has joined #ste||ar
<hkaiser> jaafar: yt?
<jaafar> hkaiser: here now!
<hkaiser> hey jaafar
<hkaiser> thanks for looking into what's going on!
<jaafar> I hope it's helpful
<hkaiser> well, it demonstrates that something is wrong ;-)
<hkaiser> how much work would it be for you to do a parameter sweep over the chunk size and plot the execution time over the chunk size?