hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
KordeJong[m] has quit [Ping timeout: 268 seconds]
pedro_barbosa[m] has quit [Ping timeout: 268 seconds]
heller[m] has quit [Ping timeout: 268 seconds]
KordeJong[m] has joined #ste||ar
heller[m] has joined #ste||ar
pedro_barbosa[m] has joined #ste||ar
jehelset has joined #ste||ar
<bhumit[m]> Hello, I was going through https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2021 and was wondering if projects marked as `HPX User` are given a lower priority than `HPX Core` during slot matching?
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
ms[m] has joined #ste||ar
<ms[m]> hkaiser: yt?
<hkaiser> ms[m]: hey
srinivasyadav227 has joined #ste||ar
<hkaiser> g'morning
<ms[m]> hey, I just wanted to let you know that we saw a perf regression in dla-future which turned out to be from the small_vector pr (even with that one fix that you added later)
<hkaiser> interesting
<ms[m]> I haven't investigated what the problem is yet, but I just wanted to let you know in case you happen to have any ideas
<ms[m]> if I find something I'll let you know
<hkaiser> ok, thanks a lot - no ideas right away, however
<hkaiser> I'll try to investigate as well
<ms[m]> yep, no worries and me neither... I hope it's something simple
<hkaiser> ms[m]: do you have a benchmark? or is it just apparent in DLA future?
<hkaiser> any idea what operations are causing this? insert? delete? move/copy?
<ms[m]> no, no ideas yet
<ms[m]> we have a miniapp in dlaf which reproduces it, unfortunately nothing standalone
<ms[m]> but the first thing I wanted to try is to create something standalone with dataflow/future::then
<ms[m]> it's obviously the continuations that end up using the small vector so I'm hoping that would show the same regression
<hkaiser> ms[m]: ok, please share that if you have it. I'd like to help investigating
<ms[m]> yeah, will do
<ms[m]> at the moment we've just reverted back to using boost's small vector, but I'm going to try to do something about it in the next weeks
<hkaiser> +1
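A hypothetical sketch of the kind of standalone reproducer described above: long chains of trivial future::then continuations, so that the continuation storage (where the small vector is used) dominates the runtime. The chain count and length are made up for illustration, this is not the DLA-Future miniapp, and the header names should be checked against the HPX version in use.
```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    // illustrative parameters, not taken from the DLA-Future runs
    constexpr std::size_t num_chains = 10000;
    constexpr std::size_t chain_length = 100;

    auto const start = std::chrono::steady_clock::now();

    std::vector<hpx::future<std::int64_t>> chains;
    chains.reserve(num_chains);

    for (std::size_t i = 0; i != num_chains; ++i)
    {
        hpx::future<std::int64_t> f =
            hpx::async([] { return std::int64_t(0); });

        for (std::size_t j = 0; j != chain_length; ++j)
        {
            // every .then() attaches a continuation to the shared state,
            // which is where the continuation small vector comes into play
            f = f.then([](hpx::future<std::int64_t> prev) {
                return prev.get() + 1;
            });
        }
        chains.push_back(std::move(f));
    }

    std::int64_t total = 0;
    for (auto& f : chains)
        total += f.get();

    auto const elapsed =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - start)
            .count();

    std::cout << total << " continuations executed in " << elapsed << " s\n";
    return 0;
}
```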
<hkaiser> how serious is the regression?
<ms[m]> depends how you see things... let me share a plot with you
<ms[m]> hkaiser: do those two plots show up as links for you on the irc side?
<hkaiser> yes
<hkaiser> could you explain, please?
<ms[m]> yes :P I just wanted to know if you could see them before I explain
<hkaiser> I do see them
<ms[m]> so those are strong scaling runs of cholesky with block sizes 128x128 and 512x512
<ms[m]> the orange and blue lines are commits from hpx master and pika, the green one is hpx 1.7.1
<hkaiser> ok, so it hits small task sizes (obviously)
<ms[m]> with the bigger block size it's all good and that's what we would use in practice anyway because performance is generally better
<hkaiser> nod, makes sense
<hkaiser> ms[m]: I'm still not sure I got the pmr stuff correct - so that might be causing the regression because of unneeded allocations
<ms[m]> yeah, that's the main suspect, there might be something else that's still missing in the logic
<ms[m]> the small blocksizes aren't that important for us, but there's still a clear overhead
<hkaiser> indeed
<hkaiser> ms[m]: do you know how large (small) the tasks are for this?
<hkaiser> ms[m]: what I have seen wrt the pmr memory management is that we pre-allocate exactly as many bytes as we think are needed for N elements
<hkaiser> but the internal pmr logic adds additional storage needs for linking the blocks
<ms[m]> hkaiser: nope, unfortunately not, that would add a lot of context
<hkaiser> that could be causing allocations even in the case when none should happen
<ms[m]> yep
<ms[m]> I suspect stepping through it with a debugger will tell quite a lot, I just haven't had time to do that yet
<hkaiser> I tried, it's quite convoluted
<ms[m]> :p
<hkaiser> ms[m]: could you try increasing N by one, just to give it more memory to breathe?
<ms[m]> hkaiser: yes, I can definitely do that
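A minimal standalone illustration of the suspicion discussed above, using plain std::pmr rather than HPX's actual small_vector internals: a buffer sized for exactly N elements can still fall through to the upstream allocator once growth steps, alignment padding, and the resource's block bookkeeping come into play.
```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <memory_resource>
#include <vector>

// Upstream resource that reports every allocation falling through the
// pre-sized buffer.
class reporting_resource : public std::pmr::memory_resource
{
    void* do_allocate(std::size_t bytes, std::size_t align) override
    {
        std::printf("upstream allocation: %zu bytes\n", bytes);
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override
    {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }

    bool do_is_equal(
        std::pmr::memory_resource const& other) const noexcept override
    {
        return this == &other;
    }
};

int main()
{
    constexpr std::size_t N = 4;

    // buffer sized for exactly N ints - no slack for growth steps,
    // alignment padding, or the resource's own block headers
    std::array<std::byte, N * sizeof(int)> buffer;

    reporting_resource upstream;
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), &upstream);

    // growing element by element allocates 1 + 2 + 4 + ... elements in
    // total (a monotonic resource never reuses freed blocks), so the
    // exact-sized buffer is exhausted and the upstream gets hit anyway
    std::pmr::vector<int> v(&pool);
    for (int i = 0; i != static_cast<int>(N); ++i)
        v.push_back(i);
}
```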
hkaiser has quit [Quit: Bye!]
diehlpk_work has joined #ste||ar
hkaiser has joined #ste||ar
jehelset has quit [Ping timeout: 250 seconds]
<gonidelis[m]> hkaiser: pm please
jehelset has joined #ste||ar
<dkaratza[m]> hkaiser: since all `for_loop`s become `experimental`, does this also apply to the ranges versions of for_loop?
<hkaiser> dkaratza[m]: there is no ranges version of for_loop
<hkaiser> for_each is not experimental, though
<dkaratza[m]> hkaiser: but the docs have it
<dkaratza[m]> `hpx::ranges::for_loop`
<dkaratza[m]> `hpx::ranges::for_loop_strided`
<hkaiser> uhh, hold on
<dkaratza[m]> if you look here at the end of the list
<hkaiser> you're right!
<hkaiser> I completely forgot about those
<dkaratza[m]> haha
<hkaiser> those belong in hpx::ranges::experimental
<hkaiser> the for_loop ones
<dkaratza[m]> great thanxxx
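A minimal usage sketch of the algorithm under discussion, assuming the hpx::experimental spelling; the ranges overloads listed above (hpx::ranges::for_loop, hpx::ranges::for_loop_strided) are the ones that would move to hpx::ranges::experimental per the exchange, and their exact post-move spelling is not verified here.
```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>

#include <cstddef>
#include <vector>

int main()
{
    std::vector<int> v(100, 0);

    // index-based for_loop in the experimental namespace; the callable
    // receives the induction variable
    hpx::experimental::for_loop(hpx::execution::par, std::size_t(0), v.size(),
        [&](std::size_t i) { v[i] = static_cast<int>(i); });

    return 0;
}
```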
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar