hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
hkaiser has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
nikunj has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<simbergm> tarzeau: yt? the -dev package also needs the libraries as dependencies :)
<simbergm> and you can remove the gfortran dependency
<tarzeau> yt?
<tarzeau> i have these: Depends: libatomic1, libhpx1 (= 1.3.0-1)
<tarzeau> which libs are missing ?
nikunj has joined #ste||ar
quaz0r has quit [Ping timeout: 258 seconds]
quaz0r has joined #ste||ar
quaz0r has quit [Ping timeout: 246 seconds]
<nikunj> hkaiser: want to listen to some good news?
<nikunj> I don't see any difference in overheads on my laptop
<nikunj> I'll run them on marvin now and see if it's the same case there as well
<hkaiser> nikunj: what do you mean?
<nikunj> I mean that running replay, compared to the normal version, has no overhead
<hkaiser> heh
<nikunj> they run about the same time
<hkaiser> on stencil1d_4?
quaz0r has joined #ste||ar
<nikunj> that's without errors though
<nikunj> I mean the implementation overheads only
<hkaiser> ok
<hkaiser> cool
<nikunj> yes, on stencil1d_4
<hkaiser> nice
<nikunj> I'll add the checksum function in the evening
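[A per-tile checksum for detecting silent data corruption before replaying a task could look like the following minimal sketch. This is not the actual HPX implementation; `tile` is assumed to be a plain list of floats standing in for the stencil's subdomain data.]

```python
import hashlib
import struct

def tile_checksum(tile):
    """Hash the raw bytes of a tile's values so a corrupted
    computation can be detected before committing the result.
    (`tile` is a hypothetical list of floats, not HPX's data type.)"""
    h = hashlib.sha256()
    for value in tile:
        h.update(struct.pack("<d", value))  # each double, little-endian
    return h.hexdigest()

def validate(tile, expected):
    """Return True when the tile's checksum matches the stored one;
    a mismatch would trigger a replay of the task."""
    return tile_checksum(tile) == expected
```

[A recomputed tile that validates against the stored checksum is accepted; any bit-level difference changes the digest and would flag the task for replay.]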
<hkaiser> without errors there shouldn't be too much overhead to begin with
<nikunj> till then I'll write a script to compare the standard with replay
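[A comparison script of the kind described could be sketched as below. The binary names and flags in the usage note are assumptions, not the project's actual build targets; the sketch just times a command repeatedly and reports the replay build's overhead over the baseline.]

```python
import statistics
import subprocess
import time

def time_command(cmd, runs=5):
    """Run `cmd` several times; return (mean, stdev) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

def overhead(base_mean, replay_mean):
    """Absolute (seconds) and relative overhead of replay over baseline."""
    return replay_mean - base_mean, (replay_mean - base_mean) / base_mean
```

[Usage would be along the lines of `overhead(time_command(["./1d_stencil"])[0], time_command(["./1d_stencil_replay"])[0])`, where both binary names are hypothetical. With the 3.8 s and 5.5 s figures reported later in this log, that comes to a 1.7 s absolute overhead, roughly 45% relative.]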
<hkaiser> ok
<nikunj> GaTech have 3s of overhead without failures xD
<nikunj> but they have more workers and more iterations
<nikunj> we don't have multiple time steps in a single iteration so we can't compare directly
<nikunj> but overall, I really like where we're going
<hkaiser> ok, good
<nikunj> just ran some on marvin
<nikunj> 1600 points per tile is not good enough work to hide the overheads
<nikunj> I see some 1.6s difference
<nikunj> but 32000 points per tile reduces this to 0.5-0.7s
<hkaiser> ok
<hkaiser> how long does one thread/tile take?
<nikunj> didn't get what you mean
<hkaiser> how much work (time-wise) is '1600 points/tile'?
<nikunj> 16000 points within one tile with 128 tiles in total over 8192 iterations take some 3.8s
<nikunj> with replay added it increases to 5.5s
<nikunj> it's over 4 os threads
<hkaiser> nikunj: I meant per tile/timestep? how much work is that?
<nikunj> wait no, over 16 os threads
<nikunj> subdomain width right?
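[hkaiser's question about per-tile work can be answered back-of-the-envelope from the numbers above. The sketch below assumes the 3.8 s figure is total wall-clock time with work spread evenly over the OS threads; the real task granularity depends on scheduling details not visible in this log.]

```python
def per_task_seconds(total_seconds, tiles, timesteps, os_threads=1):
    """Rough average compute time of one tile-update task, assuming
    `total_seconds` is wall clock and threads are evenly loaded."""
    return total_seconds * os_threads / (tiles * timesteps)
```

[With 128 tiles, 8192 iterations, and 16 OS threads, `per_task_seconds(3.8, 128, 8192, 16)` gives on the order of tens of microseconds per task, which is consistent with nikunj's observation that small tiles cannot hide the replay overhead.]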
<hkaiser> let's talk tomorrow ;-)
<nikunj> I think I'm misunderstanding. alright, let's do it tomorrow :)
<nikunj> I'll run some tests in the meantime
<nikunj> so we can show them the results on tuesday
<nikunj> btw Jackson's code benchmarks are here, I'll generate graphs for them as well
<hkaiser> good