aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Smasher has quit [Quit: Connection reset by beer]
EverYoung has joined #ste||ar
Smasher has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
EverYoun_ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
<github>
hpx/release 7077bc0 Mikael Simberg: Small formatting fix in release_procedure.rst
<heller_>
marco: one other thing, std::vector<std::vector<float>> is not exactly good for performance ;)
<heller_>
especially for stencil computations like this
<heller_>
you should linearize that
<heller_>
this should give you better overall performance
<marco>
I mean single-value access: only accessing the central element of the stencil operator, not the neighbouring elements.
<heller_>
ah ok
<heller_>
yeah, that might hint at false sharing
<heller_>
try the suggestion with the linearized 2d vector
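A minimal sketch of the linearization suggested above, assuming a row-major grid; the type and names (grid2d, width, height) are illustrative, not taken from marco's code. One flat std::vector<float> indexed as y * width + x keeps each row contiguous, avoids the per-row allocations and pointer chasing of std::vector<std::vector<float>>, and makes false sharing between threads less likely.

    #include <cstddef>
    #include <vector>

    // Illustrative flat 2D grid replacing std::vector<std::vector<float>>.
    struct grid2d
    {
        std::size_t width, height;
        std::vector<float> data;   // one contiguous allocation instead of `height` separate rows

        grid2d(std::size_t w, std::size_t h)
          : width(w), height(h), data(w * h, 0.0f) {}

        float& operator()(std::size_t x, std::size_t y)
        {
            return data[y * width + x];   // row-major: x-neighbours are adjacent in memory
        }
    };

    // usage (hypothetical): grid2d u(nx, ny); then read u(x, y) and its neighbours u(x-1, y), u(x+1, y), ...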
<marco>
yes, I know that VTune works well with -O3. -O3 is the standard for our release builds, but our compile environment sets -O2 for profiling. I need to change that, ...
<zao>
Always fun.
<marco>
ok, I'll test it with 2/4 cores and linearize it.
<marco>
thanks for the initial tips, I'll get back to you later
<heller_>
sure
<heller_>
no problem, always here to help
zbyerly_ has joined #ste||ar
zbyerly__ has quit [Ping timeout: 276 seconds]
Smasher has quit [Quit: Connection reset by beer]
Smasher has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser>
simbergm: so what's the merge procedure for the release now? do we merge to master or to release?
<simbergm>
hkaiser: I don't have a strong opinion, as I don't know yet which one works better in practice, but my feeling is that we should keep merging to master and I'll pick from there to the release branch
<hkaiser>
nod, let's remove it, but leave a note in the what's new section
<hkaiser>
simbergm: ok - heller_ will disagree ;-)
<simbergm>
this way we can keep merging things to master that don't need to go into the release
<simbergm>
if heller_ has good arguments for making PRs to release I'm more than okay with that as well :)
<heller_>
I don't like cherry-picking from master
<heller_>
since it makes it hard to reintegrate release back to master
<simbergm>
okay, that's fair
<heller_>
in a perfect world, we would just branch off of master, and call it a release
<simbergm>
agreed
<hkaiser>
simbergm: last release we decided to merge to release and do one merge back to master after the release - worked quite well, actually
<heller_>
*nod*
<heller_>
the only problem we had was that testing was a bit of a mess, IIRC
<hkaiser>
yes
<simbergm>
meaning cherry-picking from master before the release, and then merging to master after the release?
<hkaiser>
no
<heller_>
that is, things that landed on master during the release weren't properly tested etc.
<hkaiser>
during the release period, merge all PRs to release, leaving master alone
<simbergm>
hkaiser: sorry misread
<simbergm>
I see now
<simbergm>
okay, so let's do that now as well
<simbergm>
already open PRs against master can be picked to release
<hkaiser>
simbergm: requires changes to testing infrastructure, tests should run off of release
<simbergm>
hkaiser: yes
<hkaiser>
existing PRs can be merged manually to release after being merged to master
<simbergm>
I didn't want to do it yet since there's no rc
<simbergm>
yeah
<hkaiser>
let's avoid cherry picking
<simbergm>
okay, I missed the distinction between merging and cherry-picking
<simbergm>
I see
<hkaiser>
cherry picking creates a new independent commit, merging does not
<simbergm>
so at the moment I think master is in a pretty good state (i.e. not too many failing tests, and all failures are occasional except for the stacksize test)
<simbergm>
in your opinion is this a good state for an rc?
<simbergm>
what has buildbot looked like usually during an rc?
<heller_>
same or even worse :P
<hkaiser>
an rc means that there will be no new features or refactorings, only bug fixes before the release
<heller_>
I'd like to get the stacksize problem fixed though
<hkaiser>
sure, that's a bug
<heller_>
let's also try to keep the timeframe for doing the release as short as possible
<heller_>
last time, we had lots of trouble keeping everything in sync...
<simbergm>
but then I'd still stick to master for now and try to keep fixing as much as possible
<hkaiser>
heller_: it wasn't too bad
<simbergm>
I guess the only feature still going in is my suspension PR
<simbergm>
do you have anything planned?
<simbergm>
anything else, I mean
<hkaiser>
simbergm: and the thread scheduler changes heller_ has in the pipeline
<heller_>
should they go into the release?
<simbergm>
they change APIs?
<heller_>
a little, yes
<hkaiser>
heller_: I thought we delayed the release for those
<simbergm>
ok
<heller_>
I was aware that we delayed the release for them
<heller_>
and jbjnr reported no real speedup in his application
<hkaiser>
heller_: so why do we do those, then?
<heller_>
so I guess they need more work
<heller_>
first of all, he didn't test the full set
<heller_>
second of all, the full set needs more work
<heller_>
probably another week or so
<hkaiser>
heller_: so you're contradicting yourself here ;)
<heller_>
for my micro benchmarks, they showed better performance
<heller_>
that's all I said
<hkaiser>
sure
<simbergm>
in my opinion we're not really behind schedule, as I set the rc date quite conservatively (and optimistically), so let's keep working on master until e.g. Wednesday next week and see again where we are
<heller_>
sounds good
<heller_>
night shifts ahead!
<heller_>
;)
<heller_>
we are not doing the tutorial in March anyway
<hkaiser>
nice
<simbergm>
yeah, that's good
<jbjnr>
heller_: my best results were about 985 GFlops before, but this week I got 1010 at peak, so there's been a 2% or so improvement as a result of some general cleanup from your continuations and the profiling fixes etc. gtg.
<heller_>
ok
<heller_>
I wasn't aware that the 2% was due to my changes
<hkaiser>
that's a nice result as well
<heller_>
and since the grain size is relatively large for your application, I am not sure they'll matter
<heller_>
(except for the allocations etc.)
<heller_>
I am working on trying to reduce binary size right now...
<heller_>
which I am hoping helps with partitioned_vector tests
<hkaiser>
heller_: let's finish the other stuff first
<hkaiser>
we've been living with the partitioned_vector things for a while, no need to work on them now
<heller_>
well ... it's a low-risk change; I was in "let's improve the code as it is without changing functionality" mode
<hkaiser>
heller_: we have enough half-way done things
<heller_>
we do
<hkaiser>
#3031 has inspect problems, #3036 is still open
<hkaiser>
I meant #3130
<heller_>
#3036 can't be closed until we fix the partitioned_vector compile problems
<hkaiser>
heller_: ok - was not aware of that
<heller_>
well, we can merge it
<heller_>
but then we'll have to live with failing tests until the compile/link problems are fixed
<simbergm>
jbjnr: for dca++ you're going to need cuda support, no? have you compiled hpx (with cuda) successfully with anything other than what's on rostam?
<simbergm>
heller_: #3131 seems to have broken reduce_by_key, any guesses why? I'll try to look tomorrow
<hkaiser>
simbergm: heh
<mbremer>
@hkaiser: Also, do you have a BibTeX entry for the GB paper?
<mbremer>
Alternatively, I suppose the scaling results are also mentioned in the OpenSuCo paper
<jbjnr>
simbergm: yes. I am running dca++ with cuda on my laptop and on daint, using (hpx + cuda) + (dca++ + cuda)
<jbjnr>
no problems now.
<jbjnr>
heller_: I ran the cholesky several times and discovered that the map change was not really making any difference - it's within the noise, and the variance between runs is quite high - but everything is "just a bit" faster than it used to be, so I'm assuming the string cleanup and your continuation fixes are the main thing
aserio has quit [Ping timeout: 252 seconds]
vamatya has joined #ste||ar
jaafar_ has joined #ste||ar
david_pfander has quit [Ping timeout: 265 seconds]
daissgr has quit [Ping timeout: 252 seconds]
<heller_>
jbjnr: great!
<heller_>
simbergm: I'll have a look as well
<heller_>
simbergm: the reduce_by_key failure might be related to the scan_partitioner changes after all, meaning it's not fixed yet
aserio has joined #ste||ar
<heller_>
simbergm: same problem as before, I guess
daissgr has joined #ste||ar
aserio1 has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 265 seconds]
aserio1 is now known as aserio
twwright_ has joined #ste||ar
twwright has quit [Read error: Connection reset by peer]
twwright_ is now known as twwright
daissgr has quit [Ping timeout: 252 seconds]
<heller_>
simbergm: since it's just reduce_by_key, my guess would be some kind of race in the algorithm itself, or the partitioner, or wrong usage of it
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
daissgr has joined #ste||ar
<heller_>
simbergm: nope. It's a bug in the scan partitioner/executors ignoring the sync policy
<heller_>
jbjnr: did I read the code correctly that all algorithms should execute sequentially for reduce_by_key?
<jbjnr>
the scan part is not sequential, but there was one sequential bit in there. I can't remember without looking and I'm in a meeting right now.
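For readers following along, a sequential reference sketch of reduce_by_key's semantics (this is not HPX's implementation, and the helper name reduce_by_key_seq is hypothetical): each run of equal adjacent keys collapses into one output key with its values combined, which is the result any parallel version, racy or not, has to reproduce.

    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <utility>
    #include <vector>

    // Sequential reference for reduce_by_key semantics (illustration only, not HPX code):
    // each run of equal adjacent keys is reduced to a single (key, combined value) pair.
    template <typename Key, typename Value, typename Op = std::plus<Value>>
    std::pair<std::vector<Key>, std::vector<Value>>
    reduce_by_key_seq(std::vector<Key> const& keys, std::vector<Value> const& values, Op op = Op())
    {
        assert(keys.size() == values.size());
        std::pair<std::vector<Key>, std::vector<Value>> out;
        for (std::size_t i = 0; i != keys.size(); ++i)
        {
            if (!out.first.empty() && out.first.back() == keys[i])
                out.second.back() = op(out.second.back(), values[i]);   // extend the current run
            else
            {
                out.first.push_back(keys[i]);                           // start a new run
                out.second.push_back(values[i]);
            }
        }
        return out;
    }

    // e.g. keys {1,1,2,2,2}, values {1,2,3,4,5} -> keys {1,2}, values {3,12}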
Smasher has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
Smasher is now known as Smashor
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
Smashor has quit [Remote host closed the connection]
aserio has quit [Quit: aserio]
hkaiser has joined #ste||ar
<github>
[hpx] hkaiser deleted coroutine_cleanup at 9e1648c: https://git.io/vNbl9
EverYoung has quit [Read error: Connection reset by peer]