hkaiser has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
nikunj has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<simbergm> tarzeau: yt? the -dev package also needs the libraries as dependencies :)
<simbergm> and you can remove the gfortran dependency
<tarzeau> i have these: Depends: libatomic1, libhpx1 (= 1.3.0-1)
<tarzeau> which libs are missing ?
nikunj has joined #ste||ar
quaz0r has quit [Ping timeout: 258 seconds]
quaz0r has joined #ste||ar
quaz0r has quit [Ping timeout: 246 seconds]
<nikunj> hkaiser: want to listen to some good news?
<nikunj> I don't see any difference in overheads on my laptop
<nikunj> I'll run them on marvin now and see if it's the same case there as well
<hkaiser> nikunj: what do you mean?
<nikunj> I mean that running replay has no overhead over the normal one
<nikunj> they run about the same time
<hkaiser> on stencil1d_4?
quaz0r has joined #ste||ar
<nikunj> that's without errors though
<nikunj> I mean the implementation overheads only
<nikunj> yes, on stencil1d_4
<nikunj> I'll add the checksum function in the evening
<hkaiser> without errors there shouldn't be too much overhead to begin with
<nikunj> till then I'll write a script to compare the standard with replay
<nikunj> GaTech have 3s of overhead without failures xD
<nikunj> but they have more workers and more iterations
<nikunj> we don't have multiple time steps in a single iteration so we can't compare directly
<nikunj> but overall, I really like where we're going
<nikunj> just ran some on marvin
<nikunj> 1600 points per tile is not enough work to hide the overheads
<nikunj> I see some 1.6s difference
<nikunj> but 32000 points per tile reduces this to 0.5-0.7s
<hkaiser> how long does one thread/tile take?
<nikunj> didn't get what you mean
<hkaiser> how much work (time-wise) is '1600 points/tile'?
<nikunj> 16000 points within one tile with 128 tiles in total over 8192 iterations take some 3.8s
<nikunj> with replay added it increases to 5.5s
<nikunj> it's over 4 os threads
<hkaiser> nikunj: I meant per tile/timestep? how much work is that?
<nikunj> wait no, over 16 os threads
<nikunj> subdomain width right?
<hkaiser> let's talk tomorrow ;-)
<nikunj> I think I'm misunderstanding. alright, let's do it tomorrow :)
<nikunj> I'll run some tests in the meantime
<nikunj> so we can show them the results on tuesday
<nikunj> btw Jackson's code benchmarks are here, I'll generate graphs for them as well