aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett has quit [Quit: eschnett]
https_GK1wmSU has joined #ste||ar
bikineev has quit [Remote host closed the connection]
https_GK1wmSU has left #ste||ar [#ste||ar]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
<github>
[hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7Bat
<heller>
jfbastien: as a matter of fact, all performance-tuned, tightly coupled parallel applications avoid lock/atomic contention as much as possible
<heller>
this is mostly achievable by choosing the right granularity (a trade-off against the possible amount of concurrency; too coarse a granularity essentially inhibits scaling without increasing the problem size). What happens if there is too much work is that the lock-free queues in the thread management get heavily contended due to work stealing. Once you have that under control, the most contentious point would be the synchronization of the shared state, which has never shown up in our
<heller>
profiles so far, leading to the assumption that the "no concurrent access" case is the most common one, which nevertheless has to be synchronized using atomics.
<heller>
there are lots of places where we require locks/atomics for correctness even though the majority of accesses don't seem to be concurrent
<heller>
which is mostly an implication of choosing the right granularity when writing application code
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
Matombo has joined #ste||ar
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree_ has quit [Read error: Connection reset by peer]
<jfbastien>
heller I understand this. I'm measuring the performance of a new atomic / lock implementation for a yet-unmeasured virtual ISA. So I want uncontended as well as contended use cases, ideally real-world stuff which otherwise performs useful work.
<jfbastien>
heller and I like bugging wash ;)
<heller>
;)
<heller>
the easiest way to go then would be to choose any application we know scales well; that would be the uncontended case
<heller>
decrease the granularity of work to observe contention
<jfbastien>
heller cool! Any preferred one? I'm going in and out of playing with this, context switching myself.
<heller>
the fib one is probably nice since you can arbitrarily set the granularity and increase the number of tasks generated
<jfbastien>
heller yeah, that's what hkaiser / wash recommended. Seemed to work when I tried yesterday. I haven't measured contention yet.
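For reference, a sketch of the kind of fib benchmark being discussed: the recursion spawns an HPX task per call above a threshold and runs serially below it, so the threshold directly controls the task granularity and the number of tasks generated. This is not the exact example shipped with HPX; header names vary across HPX versions, and the THRESHOLD constant is made up.

```cpp
// Sketch of a granularity-controlled Fibonacci benchmark. Raising THRESHOLD
// makes tasks coarser and fewer; lowering it floods the scheduler with tiny
// tasks, which is where queue contention becomes visible.
#include <hpx/hpx_main.hpp>       // header names vary across HPX versions
#include <hpx/include/async.hpp>
#include <cstdint>
#include <iostream>

constexpr std::uint64_t THRESHOLD = 20;   // made-up default cut-off

std::uint64_t fib_serial(std::uint64_t n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

std::uint64_t fib(std::uint64_t n)
{
    if (n < THRESHOLD)
        return fib_serial(n);             // below the cut-off: no new tasks

    hpx::future<std::uint64_t> lhs = hpx::async(fib, n - 1);
    std::uint64_t rhs = fib(n - 2);       // run one branch inline
    return lhs.get() + rhs;
}

int main()
{
    std::cout << "fib(30) = " << fib(30) << '\n';
    return 0;
}
```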
<heller>
however, I am not sure what meaningful measure you'll get out of it, since I don't even know whether it is memory bound, compute bound, or something else
<heller>
there is another interesting benchmark, the stream benchmark
<heller>
and yes, I hear you saying: but it is embarrassingly parallel
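A rough sketch of a STREAM-triad-style kernel expressed with an HPX parallel algorithm, the kind of embarrassingly parallel loop the stream benchmark runs. The array size is a made-up placeholder (real STREAM uses much larger arrays), and the exact header/namespace spellings depend on the HPX version (newer releases expose hpx::transform and hpx::execution::par instead).

```cpp
// Sketch of a STREAM-triad-like kernel with an HPX parallel algorithm.
// Each call to the algorithm is a fork/join region: worker threads are
// fanned out over the range and implicitly joined before the call returns.
#include <hpx/hpx_main.hpp>                    // names vary by HPX version
#include <hpx/include/parallel_transform.hpp>
#include <vector>
#include <iostream>

int main()
{
    std::size_t const n = 1'000'000;           // made-up problem size
    double const scalar = 3.0;
    std::vector<double> a(n), b(n, 1.0), c(n, 2.0);

    // a[i] = b[i] + scalar * c[i]
    hpx::parallel::transform(hpx::parallel::execution::par,
        b.begin(), b.end(), c.begin(), a.begin(),
        [scalar](double bi, double ci) { return bi + scalar * ci; });

    std::cout << a[0] << '\n';
    return 0;
}
```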
<jfbastien>
heller well, I like the idea of measuring contended versus not because it gives a baseline for what the cost of that contention is, and I can compare different architecture's costs.
<heller>
there is a catch to it though
<heller>
the catch is in the fork/join (implicit barriers) of the executed parallel algorithms
<heller>
our tests show that it severely hurts performance when scaling out
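To make the fork/join catch concrete: each plain parallel-algorithm invocation joins all workers before returning, which is where the implicit barrier cost shows up at scale. One common way to soften it is the task execution policy, which returns a future instead of blocking, so the caller decides where to synchronize. The spellings below match the older hpx::parallel namespaces and may differ in newer HPX versions; this is a sketch, not code from the STREAM example itself.

```cpp
// Sketch: the same kernel expressed twice. The first form has an implicit
// barrier at the end of the call; the second returns a future so the caller
// chooses when (or whether) to synchronize.
#include <hpx/hpx_main.hpp>                     // names vary by HPX version
#include <hpx/include/parallel_for_each.hpp>
#include <vector>

int main()
{
    std::vector<double> a(1'000'000, 1.0);
    auto scale = [](double& x) { x *= 2.0; };

    // 1) fork/join: blocks until every element has been processed
    hpx::parallel::for_each(
        hpx::parallel::execution::par, a.begin(), a.end(), scale);

    // 2) task policy: returns immediately with a future
    auto f = hpx::parallel::for_each(
        hpx::parallel::execution::par(hpx::parallel::execution::task),
        a.begin(), a.end(), scale);

    f.get();   // the barrier now happens only where we explicitly ask for it
    return 0;
}
```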
<jfbastien>
it's unintuitive, but I don't necessarily care if the code is even good! I'm purposefully looking at some silly code as well, because it should perform as well as silly will allow it to. Basically I can't pessimise it.
<heller>
easily observable on a KNL system where we do the stream from the HBM
<jfbastien>
ah interesting. How many cores does this manifest at?
<heller>
let me pull it out real quick
<heller>
it manifests at around 60 cores
<jfbastien>
heller OK interesting!
<heller>
it gets worse once you add the logical cores
<jfbastien>
I'm trying out fewer cores for now, since that's easier, but it's good to have on my list as something that'll scale poorly later
<heller>
the stencil examples mentioned are nice as well
<jfbastien>
yeah stencil 8 seemed neat
<heller>
they are mostly impaired by the overheads of memory allocation or lock contention (with high granularity)
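For context, a bare-bones version of the kind of stencil kernel in those examples (far simpler than the shipped 1d_stencil series): the grid is split into chunks with one HPX task per chunk per sweep, so the chunk size is the granularity knob mentioned above. All sizes and names here are made up.

```cpp
// Minimal 1D 3-point stencil sweep, one HPX task per chunk. Tiny chunks mean
// many tasks and scheduler/allocation overhead; huge chunks mean too little
// parallelism.
#include <hpx/hpx_main.hpp>            // header names vary across HPX versions
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>        // hpx::wait_all
#include <algorithm>
#include <cstddef>
#include <vector>

int main()
{
    std::size_t const n = 1 << 20;     // made-up grid size
    std::size_t const chunk = 1 << 14; // made-up chunk size (granularity)
    std::vector<double> cur(n, 1.0), next(n, 0.0);

    std::vector<hpx::future<void>> sweeps;
    for (std::size_t begin = 1; begin < n - 1; begin += chunk)
    {
        std::size_t end = std::min(begin + chunk, n - 1);
        sweeps.push_back(hpx::async([&, begin, end] {
            for (std::size_t i = begin; i != end; ++i)
                next[i] = 0.5 * cur[i] + 0.25 * (cur[i - 1] + cur[i + 1]);
        }));
    }
    hpx::wait_all(sweeps);             // join before the next time step
    return 0;
}
```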
<hkaiser>
heller, jfbastien: I doubt the application matters if you want to look at the locks in the scheduler
<hkaiser>
as long as sufficient work is generated, that is
<heller>
well, it is nice if we have some model of how well the application *should* perform
<heller>
that is having an upper bound
<hkaiser>
heller: I don't think jfbastien cares how well the application itself is performing
<heller>
yeah
<heller>
but then, you could measure all kinds of other different effects
<heller>
what's evident, though, across all profiling runs over all kinds of applications: once the granularity gets too high, we see significant contention in the scheduler
<jfbastien>
right, I care about how my ISA is performing compared to others, specifically on atomic / lock :)
<jfbastien>
ISA / microkernel
<heller>
that should be sufficient then
<heller>
is it just the ISA that's special or also some novel architectural improvements?
<heller>
like TMS or automatic lock elision etc.
<jfbastien>
heller yes
<heller>
will I get answers if I ask further?
<jfbastien>
:)
pree_ has joined #ste||ar
<heller>
are task blocks part of the Parallelism TS v2?
<hkaiser>
not sure, don't think so
<heller>
they are
<zao>
I like the architecture they invented for this year's defcon CTF contest. 9-bit bytes, middle-endian 3-byte registers, instructions taking register ranges and other madness.
<hkaiser>
jfbastien: whatever you design there, please add one-cycle context switches and hardware support for global memory to it ;)
<jfbastien>
hkaiser done
<hkaiser>
thanks
<heller>
;)
<jfbastien>
well that was easy
<hkaiser>
we'll make hpx fly on that platform, then
<heller>
err
<heller>
hkaiser: btw, text is complete now
zbyerly_ has joined #ste||ar
mars0000 has quit [Quit: mars0000]
<heller>
hkaiser: going from cover to cover now and fixing all those FIXMEs. Turns out it requires a shitload of time
david_pfander has quit [Ping timeout: 258 seconds]
david_pfander1 is now known as david_pfander
ajaivgeorge has joined #ste||ar
pree_ has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 276 seconds]
bikineev has joined #ste||ar
mars0000 has joined #ste||ar
hkaiser has quit [Quit: bye]
pree_ has joined #ste||ar
pree_ has quit [Ping timeout: 240 seconds]
bikineev_ has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
bikineev_ has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<github>
[hpx] ajaivgeorge opened pull request #2792: Implemented segmented find and its variations for partitioned vector (master...segmented_find2) https://git.io/v7EvV
patg[[w]] has joined #ste||ar
eschnett has quit [Quit: eschnett]
<github>
[hpx] ajaivgeorge opened pull request #2793: Implemented segmented find_end and find_first_of for partitioned vector (master...segmented_find_end) https://git.io/v7EJe
<diehlpk_work>
We should check our issues. Some of them never got a response, or their fix was merged but the issue was never closed
patg[[w]] has quit [Quit: Leaving]
<hkaiser>
diehlpk_work: which ones didn't get closed?
<diehlpk_work>
hkaiser, Wrote it as a comment in the issue
aserio has quit [Quit: aserio]
<hkaiser>
diehlpk_work: ok, thanks
<hkaiser>
diehlpk_work: next time you create a PR which fixes an issue, just add 'Fixes #NNNN' to the description; that will auto-close the issue once the PR is merged
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
diehlpk_work: ^^
<diehlpk_work>
hkaiser, Will do that. I think there are more issues of this kind.
<diehlpk_work>
I will comment on them too
<hkaiser>
thanks
<diehlpk_work>
\away
<diehlpk_work>
I was just looking at the issues to see what easy things I can contribute
mars0000 has quit [Quit: mars0000]
<github>
[hpx] hkaiser force-pushed pv_serializer from e1cc39c to a5b25d0: https://git.io/v7ECd
<github>
hpx/pv_serializer a5b25d0 Hartmut Kaiser: Fixing parallel::fill to make partitioned_vector serialization work...
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Ping timeout: 240 seconds]
bikineev has quit [Remote host closed the connection]