hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
akheir has quit [Quit: Leaving]
Jedi18 has joined #ste||ar
Jedi18 has quit [Client Quit]
bita has quit [Ping timeout: 244 seconds]
jaafar has joined #ste||ar
hkaiser has joined #ste||ar
zatumil has quit [Quit: leaving]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
<parsa> hkaiser: sorry, apparently i was editing the same spot you are. i'll hold
<hkaiser> parsa: no, please go ahead
<hkaiser> I'm sorry - didn't mean to be disruptive
<hkaiser> parsa: I'm done with fig 3, please let me know if you want me to change things
bita has joined #ste||ar
<hkaiser> parsa: fig 2 looks great!
<parsa> hkaiser: thanks a lot!
<hkaiser> parsa: I hope fig 3 is what you wanted
<parsa> i wanted to specifically explain the three configurations of the experiment. what you made is more generic and better. no need to change it
<parsa> it's more interesting in this form
<hkaiser> parsa: ok
<parsa> hkaiser: but does the description of the experiment in 4.1 make sense to you?
<hkaiser> parsa: I know what you mean, I'd like to go over the description and make it a little less terse, however
<hkaiser> would you mind me doing that?
<parsa> not at all
<parsa> the reason i haven't touched that bit is that i ran out of ideas on how to explain them accurately without overloading the reader
<hkaiser> ok, will do that later tonight
<hkaiser> parsa: I also owe you a paragraph explaining fig 3
<parsa> thanks a lot. i'm still doing the runs
<hkaiser> ok
<hkaiser> do the results look ok?
<parsa> don't know yet. i'm still checking whether or not things are actually running on one node or if mpirun is working as it used to
<parsa> yeah, it is working
<parsa> i think jenkins has died on one of the medusa nodes. it's been occupied for several hours now
<parsa> squeue has been showing its state as CG this whole time
<hkaiser> parsa: ask alireza to reboot the node
<parsa> waiting on him
<hkaiser> parsa: does that plot show weak scaling results?
<parsa> it's strong scaling
<hkaiser> why isn't it getting faster?
<hkaiser> the baseline at least
<parsa> it's stencil_8 with --np=1000 --nx=100
<parsa> i don't remember ever seeing it do well with strong scaling
<hkaiser> parsa: yah, the problem is too small to actually scale
<hkaiser> parsa: then you might want to plot relative performance compared to the baseline instead
<parsa> i have been attempting to go higher this whole past year but don't want to deal with the mpi crashes
<hkaiser> sure
<parsa> higher -> make the problem size go above 512 MB
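For scale, a back-of-the-envelope check of the working set implied by --np=1000 --nx=100. This is a sketch, assuming one double per grid point and two grid generations held alive at once; both are assumptions, not figures from the runs:

```cpp
#include <cstddef>
#include <iostream>

int main()
{
    // --np partitions, --nx grid points per partition (values from the run above)
    std::size_t const np = 1000, nx = 100;

    // assumption: one double per grid point, two grid generations held at once
    double const mb = static_cast<double>(np * nx * 2 * sizeof(double)) /
        (1024.0 * 1024.0);

    std::cout << mb << " MB\n";    // roughly 1.5 MB, far below the 512 MB mark
}
```

At roughly 1.5 MB in total there is very little work per step, which is consistent with the problem being too small to show strong scaling.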
<hkaiser> just plot slowdown rel to baseline
<parsa> okay
<hkaiser> this is not a paper about demonstrating scaling
<parsa> you're right... that's why i didn't even pay attention to the horrific strong scaling behavior. relative performance it is
<parsa> --np=1000 --nx=1000 seems to work and shows a marginal increase in speedup (e.g. 7% between 10 nodes and 14 nodes). would that work instead of relative times?
<parsa> hkaiser: ^ and maybe some reviewer would not like relative slowdown since it may look like we're trying to hide the actual exec times
<hkaiser> we're not making a point about scaling
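For reference, a minimal sketch of the quantity plotted as relative slowdown: the measured time of a configuration divided by the baseline time at the same node count. The vectors below are empty placeholders, not the actual stencil_8 measurements:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// relative slowdown per node count: time of the configuration divided by the
// baseline time measured on the same number of nodes (1.0 == no overhead)
std::vector<double> relative_slowdown(
    std::vector<double> const& baseline, std::vector<double> const& config)
{
    std::vector<double> result(baseline.size());
    for (std::size_t i = 0; i != baseline.size(); ++i)
        result[i] = config[i] / baseline[i];
    return result;
}

int main()
{
    // placeholders only: fill with the measured stencil_8 times per node count
    std::vector<double> const baseline = { /* ... */ };
    std::vector<double> const shifted  = { /* ... */ };

    std::vector<double> const slowdown = relative_slowdown(baseline, shifted);
    for (std::size_t i = 0; i != slowdown.size(); ++i)
        std::cout << (i + 1) << " nodes: " << slowdown[i] << '\n';
}
```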
<parsa> hkaiser: updated plot 1 to relative slowdown, added the missing data
<hkaiser> ok
<hkaiser> interesting, the outlier is weird
<parsa> it may look strange, but it really is what happens
<parsa> it's not an anomaly
<hkaiser> are those using blocking?
<parsa> no these are the overlapped ones
<hkaiser> on 7 nodes? or is it 8?
<parsa> ugh. i don't know why the axis is off… it's 8
<hkaiser> nod
<parsa> it's off by one everywhere
<hkaiser> still, why that outlier? any idea?
<hkaiser> this plot shows times for migrating each of the 1000 partitions once, correct?
<parsa> don't know the why but i've run this experiment enough times to know it's consistent
<parsa> the impaired case is migrating 1000/(n-1) partitions
<parsa> per locality
<parsa> the shifted case migrates all, yes
<parsa> i mean in the impaired case the migration from 0->0 won't do anything
<hkaiser> right
<hkaiser> still it is slower
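For concreteness, a minimal sketch of the "shifted" configuration described above, under assumptions: the partition_server component here is a hypothetical stand-in, not the component used in the actual runs. It distributes the partitions round-robin and then migrates each one to the next locality using HPX's hpx::components::migrate:

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/serialization.hpp>

#include <cstddef>
#include <vector>

// hypothetical migratable component holding one partition of the 1d grid
struct partition_server
  : hpx::components::migration_support<
        hpx::components::component_base<partition_server>>
{
    partition_server() = default;
    explicit partition_server(std::size_t nx) : data_(nx, 0.0) {}

    // migration serializes the component state and recreates it remotely
    template <typename Archive>
    void serialize(Archive& ar, unsigned)
    {
        ar & data_;
    }

    std::vector<double> data_;
};

using partition_component = hpx::components::component<partition_server>;
HPX_REGISTER_COMPONENT(partition_component, partition_server)

// the "shifted" case: move every partition one locality to the right
std::vector<hpx::id_type> shift_partitions(
    std::vector<hpx::id_type> const& partitions)
{
    std::vector<hpx::id_type> const localities = hpx::find_all_localities();

    std::vector<hpx::future<hpx::id_type>> migrating;
    migrating.reserve(partitions.size());
    for (std::size_t i = 0; i != partitions.size(); ++i)
    {
        hpx::id_type const& target = localities[(i + 1) % localities.size()];
        migrating.push_back(
            hpx::components::migrate<partition_server>(partitions[i], target));
    }

    // wait for all migrations to finish and collect the resulting ids
    hpx::wait_all(migrating);

    std::vector<hpx::id_type> result;
    result.reserve(migrating.size());
    for (auto& f : migrating)
        result.push_back(f.get());
    return result;
}

int main()
{
    std::size_t const np = 1000, nx = 100;    // matches --np=1000 --nx=100

    // create the partitions round-robin across all localities
    std::vector<hpx::id_type> const localities = hpx::find_all_localities();
    std::vector<hpx::id_type> partitions;
    partitions.reserve(np);
    for (std::size_t i = 0; i != np; ++i)
    {
        partitions.push_back(
            hpx::new_<partition_server>(localities[i % localities.size()], nx)
                .get());
    }

    partitions = shift_partitions(partitions);
    return 0;
}
```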
<hkaiser> well, I'd say collect all the data, and if we have time collect some perf counters, it would be interesting to understand the outlier
<hkaiser> we might want to try a larger problem size after all, as the little work we have doesn't allow us to hide things
<parsa> how large a problem would make sense if things work?
<hkaiser> let's collect the data for this plot first
<hkaiser> then you can try increasing the problem
<parsa> aside from the anomaly, my take from this plot is that most of the slowdown comes from having network communication at all; the amount of communication doesn't have a massive impact at our problem size yet
bita has quit [Ping timeout: 258 seconds]