aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett has quit [Quit: eschnett]
https_GK1wmSU has joined #ste||ar
bikineev has quit [Remote host closed the connection]
https_GK1wmSU has left #ste||ar [#ste||ar]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7Bat
<github> hpx/resource_partitioner 07aa6b3 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...
vamatya has quit [Ping timeout: 260 seconds]
eschnett has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
bikineev has joined #ste||ar
hkaiser has quit [Quit: bye]
bikineev has quit [Ping timeout: 246 seconds]
vamatya has joined #ste||ar
https_GK1wmSU has joined #ste||ar
https_GK1wmSU has left #ste||ar [#ste||ar]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 276 seconds]
vamatya has quit [Ping timeout: 260 seconds]
<heller> jfbastien: as a matter of fact, all the performance-tuned, tightly coupled parallel applications avoid lock/atomic contention as much as possible
<heller> this is mostly achievable by choosing the right granularity (a trade-off against the possible amount of concurrency, which essentially inhibits scaling without increasing the problem size). What happens if there is too much work is that the lock-free queues in the thread management get heavily contended due to work stealing. Once you have that under control, the most contentious points would be in the synchronization of the shared state, which never occurred in our
<heller> profiles so far, leading to the assumption that the "no concurrent access" case is the most common one, which nevertheless has to be synchronized using atomics.
<heller> there are lots of places where we require locks/atomics for correctness even though the majority of accesses don't seem to be concurrent
<heller> which is mostly an implication of choosing the right granularity when writing application code
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
Matombo has joined #ste||ar
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v7Bxp
<github> hpx/gh-pages 7d7a884 StellarBot: Updating docs
mcopik has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
eschnett has quit [Quit: eschnett]
zbyerly_ has joined #ste||ar
eschnett has joined #ste||ar
<github> [hpx] hkaiser pushed 2 new commits to master: https://git.io/v7RBj
<github> hpx/master 6103869 JF Bastien: Fix OSX build...
<github> hpx/master 9c758ec Hartmut Kaiser: Merge pull request #2790 from jfbastien/build-fix...
david_pfander has quit [Ping timeout: 240 seconds]
taeguk[m] has quit [Ping timeout: 246 seconds]
thundergroudon[m has quit [Ping timeout: 258 seconds]
david_pfander has joined #ste||ar
bikineev has quit [Remote host closed the connection]
eschnett has quit [Quit: eschnett]
david_pfander has quit [Quit: david_pfander]
eschnett has joined #ste||ar
david_pfander has joined #ste||ar
eschnett has quit [Client Quit]
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
david_pfander1 has joined #ste||ar
david_pfander has quit [Read error: Connection reset by peer]
david_pfander has joined #ste||ar
david_pfander1 has quit [Ping timeout: 276 seconds]
david_pfander has quit [Remote host closed the connection]
david_pfander1 has joined #ste||ar
david_pfander1 is now known as david_pfander
thundergroudon[m has joined #ste||ar
taeguk[m] has joined #ste||ar
pree_ has joined #ste||ar
pree_ has quit [Read error: Connection reset by peer]
pree_ has joined #ste||ar
hkaiser has joined #ste||ar
aserio has joined #ste||ar
<heller> it is empty indeed
<Reazul> @heller: Thanks :)
pree_ has quit [Read error: Connection reset by peer]
zbyerly_ has quit [Ping timeout: 246 seconds]
aserio1 has joined #ste||ar
pree_ has joined #ste||ar
pree_ has quit [Read error: Connection reset by peer]
aserio has quit [Ping timeout: 246 seconds]
aserio1 is now known as aserio
pree_ has joined #ste||ar
mars0000 has joined #ste||ar
pree_ has quit [Read error: Connection reset by peer]
<github> [hpx] hkaiser created pv_serializer (+20 new commits): https://git.io/v70sT
<github> hpx/pv_serializer b5d5f0b ct-clmsn: initial import
<github> hpx/pv_serializer 2399fe4 ct-clmsn: fix implementation to support friendship in partitioned_vector implementation file
<github> hpx/pv_serializer 00924d6 ct-clmsn: added friend type to the partitioned_vector_segmented_serializer
bibek_desktop has quit [Quit: Leaving]
<github> [hpx] hkaiser opened pull request #2791: Circumvent scary warning about placement new (master...fixing_any_warning) https://git.io/v70sX
bibek_desktop has joined #ste||ar
vamatya has joined #ste||ar
pree_ has joined #ste||ar
<github> [hpx] hkaiser force-pushed resource_partitioner from 07aa6b3 to 2c246d2: https://git.io/v7lfK
<github> hpx/resource_partitioner 2c246d2 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...
<github> [hpx] hkaiser force-pushed resource_partitioner from 2c246d2 to c75fb59: https://git.io/v7lfK
<github> hpx/resource_partitioner c75fb59 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...
pree_ has quit [Read error: Connection reset by peer]
<jfbastien> heller I understand this. I'm measuring the performance of a new atomic / lock implementation for an as-yet unmeasured virtual ISA. So I want uncontended as well as contended use cases, ideally real-world stuff which otherwise performs useful work.
<jfbastien> heller and I like bugging wash ;)
<heller> ;)
<heller> the easiest way to go then would be to choose any application we know scales well, that would be the uncontended case
<heller> decrease the granularity of work to observe contention
<jfbastien> heller cool! Any preferred one? I'm going in and out of playing with this, context switching myself.
<heller> the fib one is probably nice since you can arbitrarily set the granularity and increase the number of tasks generated
<jfbastien> heller yeah that's what hkaiser / wash recommended. Seems to work when I tried yesterday. I haven't measured contention yet.
<heller> however, I am not sure what meaningful measure you'll get out of it, since I don't even know if it is memory bound, compute bound or something else
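
The granularity knob heller describes for the fib benchmark can be sketched as follows. This is a minimal illustration in standard C++ only: `std::async` stands in for HPX's task spawning, and `fib_serial`, `fib_tasks` and the `threshold` parameter are hypothetical names, not taken from the HPX example itself.

```cpp
#include <cstdint>
#include <future>

// Plain serial Fibonacci: one coarse unit of work, no synchronization.
std::uint64_t fib_serial(unsigned n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Task-parallel Fibonacci. `threshold` (hypothetical knob) is the cutoff at
// or below which no further tasks are spawned and the subtree runs serially.
// Lowering it produces more, smaller tasks, i.e. more pressure on whatever
// queues and atomics the task runtime uses.
std::uint64_t fib_tasks(unsigned n, unsigned threshold) {
    if (n < 2 || n <= threshold) {
        return fib_serial(n);
    }
    // Spawn one child as a separate task, compute the other one inline.
    auto left = std::async(std::launch::async, fib_tasks, n - 1, threshold);
    std::uint64_t right = fib_tasks(n - 2, threshold);
    return left.get() + right;
}
```

Calling `fib_tasks(20, 14)` spawns only a handful of tasks, while a much lower threshold spawns very many. With `std::launch::async` each task may get its own OS thread, so very low thresholds are only sensible with a lightweight task runtime.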
<heller> there is another interesting benchmark, the stream benchmark
<heller> and yes, i hear you saying: but it is embarrassingly parallel
<jfbastien> heller well, I like the idea of measuring contended versus not because it gives a baseline for what the cost of that contention is, and I can compare different architectures' costs.
<heller> there is a catch to it though
<heller> the catch is in the fork/join (implicit barriers) of the executed parallel algorithms
<heller> our tests show that it severely hurts performance when scaling out
<jfbastien> it's un-intuitive, but I don't necessarily care if the code is even good! I'm purposefully looking at some silly code as well because it should perform as well as silly will allow it to. Basically I can't pessimise it.
<heller> easily observable on a KNL system where we do the stream from the HBM
<jfbastien> ah interesting. How many cores does this manifest at?
<heller> let me pull it out real quick
<heller> it manifests at around 60 cores
<jfbastien> heller OK interesting!
<heller> it gets worse, once you add the logical cores
<jfbastien> I'm trying out fewer cores for now, since that's easier, but it's good to have on my list as something that'll scale poorly later
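
The fork/join catch heller mentions for the stream benchmark can be sketched like this. It is a rough illustration under stated assumptions, not the HPX stream benchmark: raw `std::thread` stands in for a parallel algorithm, `fork_join_for` and `triad` are hypothetical helpers, and `workers` must be at least 1.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Run body(begin, end) over [0, n) split into up to `workers` chunks.
// The joins at the end are the implicit barrier of a fork/join loop:
// nothing after this call starts until the slowest chunk has finished.
template <typename F>
void fork_join_for(std::size_t n, unsigned workers, F body) {
    std::vector<std::thread> pool;
    pool.reserve(workers);
    std::size_t chunk = (n + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end = begin + chunk < n ? begin + chunk : n;
        if (begin >= end) break;
        pool.emplace_back(body, begin, end);
    }
    for (auto& t : pool) t.join();  // implicit barrier
}

// STREAM-style triad a = b + s * c, repeated `reps` times.
// Every repetition pays for one fork and one join.
std::vector<double> triad(std::size_t n, unsigned workers, unsigned reps) {
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;
    for (unsigned r = 0; r < reps; ++r) {
        fork_join_for(n, workers, [&](std::size_t lo, std::size_t hi) {
            for (std::size_t i = lo; i < hi; ++i) a[i] = b[i] + s * c[i];
        });
    }
    return a;
}
```

The per-element work is embarrassingly parallel, so any scaling loss in a loop like this comes from the fork and the join around each repetition rather than from the kernel.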
<heller> the stencil examples mentioned are nice as well
<jfbastien> yeah stencil 8 seemed neat
<heller> they are mostly impaired by the overheads of memory allocation or lock contention (with high granularity)
<hkaiser> heller, jfbastien: I doubt the application matters if you want to look at the locks in the scheduler
<hkaiser> as long as sufficient work is generated, that is
<heller> well, it is nice, if we have some model of how well the application *should* perform
<heller> that is having an upper bound
<hkaiser> heller: I don't think jfbastien cares how well the application itself is performing
<heller> yeah
<heller> but then, you could measure all kinds of other different effects
<heller> what's evident though, across all profiling runs over all kinds of applications: once granularity gets too high, we see significant contention in the scheduler
<jfbastien> right, I care about how my ISA is performing compared to others, specifically on atomic / lock :)
<jfbastien> ISA / microkernel
<heller> that should be sufficient then
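
A baseline for the contended versus uncontended comparison jfbastien is after could look like the sketch below. It is a synthetic microbenchmark in standard C++, not something from the HPX tree; `run_counters`, `Result`, `PaddedCounter` and the 64-byte cache-line size are assumptions made for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

struct Result {
    std::uint64_t total;  // sum of all counters after the run
    double seconds;       // wall-clock time, including thread start-up
};

// One counter per cache line so the uncontended case does not suffer from
// false sharing. 64 bytes is an assumed cache-line size.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// shared == true : all threads increment the same atomic (contended).
// shared == false: every thread increments its own atomic (uncontended).
Result run_counters(unsigned threads, std::uint64_t iters, bool shared) {
    std::vector<PaddedCounter> counters(shared ? 1 : threads);
    std::vector<std::thread> pool;
    pool.reserve(threads);
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        std::atomic<std::uint64_t>& c = counters[shared ? 0 : t].value;
        pool.emplace_back([&c, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                c.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    auto stop = std::chrono::steady_clock::now();
    std::uint64_t total = 0;
    for (auto& c : counters) total += c.value.load();
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

Running it once with `shared` set to `false` and once with `true`, using the same thread and iteration counts, gives two timings whose ratio is a crude per-architecture cost of contention. The numbers depend on the machine and include thread start-up time.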
<heller> is it just the ISA that's special or also some novel architectural improvements?
<heller> like TMS or automatic lock elision etc.
<jfbastien> heller yes
<heller> will i get answers if asking further?
<jfbastien> :)
pree_ has joined #ste||ar
<heller> are task blocks part of parallelism TS V2?
<hkaiser> not sure, don't think so
<heller> they are
<zao> I like the architecture they invented for this year's defcon CTF contest. 9-bit bytes, middle-endian 3-byte registers, instructions taking register ranges and other madness.
<hkaiser> jfbastien: whatever you design there, please add one-cycle context switches and hardware support for global memory to it ;)
<jfbastien> hkaiser done
<hkaiser> thanks
<heller> ;)
<jfbastien> well that was easy
<hkaiser> we'll make hpx fly on that platform, then
<heller> err
<heller> hkaiser: btw, text is complete now
zbyerly_ has joined #ste||ar
mars0000 has quit [Quit: mars0000]
<heller> hkaiser: going from cover to cover now and fixing all those fixmes. turns out it requires a shitload of time
pree_ has quit [Ping timeout: 276 seconds]
<hkaiser> heller: I hear you
pree_ has joined #ste||ar
zbyerly_ has quit [Ping timeout: 240 seconds]
pree_ has quit [Ping timeout: 258 seconds]
mcopik has quit [Ping timeout: 246 seconds]
aserio has quit [Ping timeout: 246 seconds]
pree_ has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v70wE
<github> hpx/master 248c03c Hartmut Kaiser: Remove HPX_CONSTEXPR on function returning void for gcc 4.9
pree_ has quit [Ping timeout: 255 seconds]
aserio has joined #ste||ar
jfbastien has quit [Quit: Textual IRC Client: www.textualapp.com]
david_pfander1 has joined #ste||ar
pree_ has joined #ste||ar
david_pfander has quit [Ping timeout: 258 seconds]
david_pfander1 is now known as david_pfander
ajaivgeorge has joined #ste||ar
pree_ has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 276 seconds]
bikineev has joined #ste||ar
mars0000 has joined #ste||ar
hkaiser has quit [Quit: bye]
pree_ has joined #ste||ar
pree_ has quit [Ping timeout: 240 seconds]
bikineev_ has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
bikineev_ has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<github> [hpx] ajaivgeorge opened pull request #2792: Implemented segmented find and its variations for partitioned vector (master...segmented_find2) https://git.io/v7EvV
patg[[w]] has joined #ste||ar
eschnett has quit [Quit: eschnett]
<github> [hpx] ajaivgeorge opened pull request #2793: Implemented segmented find_end and find_first_of for partitioned vector (master...segmented_find_end) https://git.io/v7EJe
<diehlpk_work> We should check our issues. Some of them never got a response or were merged, but not closed
patg[[w]] has quit [Quit: Leaving]
<hkaiser> diehlpk_work: which ones didn't get closed?
<diehlpk_work> hkaiser, Wrote it as a comment in the issue
aserio has quit [Quit: aserio]
<hkaiser> diehlpk_work: ok, thanks
<hkaiser> next time you create a PR which fixes an issue, just add 'Fixes #NNNN' to the description, that will auto-close the issue once the PR is merged
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> diehlpk_work: ^^
<diehlpk_work> hkaiser, Will do that. I think there are more issues of this kind.
<diehlpk_work> I will comment on them too
<hkaiser> thanks
<diehlpk_work> \away
<diehlpk_work> I was just looking at the issues to see what easy things I can contribute
mars0000 has quit [Quit: mars0000]
<github> [hpx] hkaiser force-pushed pv_serializer from e1cc39c to a5b25d0: https://git.io/v7ECd
<github> hpx/pv_serializer a5b25d0 Hartmut Kaiser: Fixing parallel::fill to make partitioned_vector serialization work...
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Ping timeout: 240 seconds]
bikineev has quit [Remote host closed the connection]