#ste||ar on 2017-08-01 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:19 eschnett has quit [Quit: eschnett]

00:27 https_GK1wmSU has joined #ste||ar

00:27 bikineev has quit [Remote host closed the connection]

00:28 https_GK1wmSU has left #ste||ar [#ste||ar]

00:32 bikineev has joined #ste||ar

00:36 bikineev has quit [Ping timeout: 240 seconds]

01:15 <github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7Bat

01:15 <github> hpx/resource_partitioner 07aa6b3 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...

01:30 vamatya has quit [Ping timeout: 260 seconds]

01:34 eschnett has joined #ste||ar

02:12 K-ballo has quit [Quit: K-ballo]

02:33 bikineev has joined #ste||ar

02:36 hkaiser has quit [Quit: bye]

02:38 bikineev has quit [Ping timeout: 246 seconds]

02:44 vamatya has joined #ste||ar

02:52 https_GK1wmSU has joined #ste||ar

02:53 https_GK1wmSU has left #ste||ar [#ste||ar]

03:35 bikineev has joined #ste||ar

03:40 bikineev has quit [Ping timeout: 276 seconds]

04:36 vamatya has quit [Ping timeout: 260 seconds]

04:38 <heller> jfbastien: as a matter of fact, all the performance-tuned, tightly coupled parallel applications avoid lock/atomic contention as much as possible

04:42 <heller> this is mostly achievable by choosing the right granularity (a trade-off with the possible amount of concurrency; it essentially inhibits scaling without increasing the problem size). What happens if there is too much work is that the lock-free queues in the thread management get heavily contended due to work stealing. Once you have that under control, the most contentious points would be in the synchronization of the shared state, which never occurred in our

04:42 <heller> profiles so far, leading to the assumption that the "no concurrent access" case is the most common one, which nevertheless has to be synchronized using atomics.

04:43 <heller> there are lots of places where we require locks/atomics for correctness even though the majority of accesses don't seem to be concurrent

04:44 <heller> which is mostly an implication of choosing the right granularity when writing application code
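
The contrast heller describes, heavy contention at fine granularity versus atomics that are needed for correctness but are almost never accessed concurrently, can be sketched with a shared counter. This is a minimal illustration using only the C++ standard library, not HPX code; the function names `count_contended` and `count_coarse` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Fine granularity: every single increment hits the same atomic,
// so all threads fight over one cache line.
std::uint64_t count_contended(unsigned threads, std::uint64_t per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                total.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}

// Coarse granularity: each thread counts privately and publishes once.
// The atomic is still required for correctness, but it is touched only
// once per thread, so accesses rarely overlap.
std::uint64_t count_coarse(unsigned threads, std::uint64_t per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, per_thread] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i)
                ++local;
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

Both functions return `threads * per_thread`; timing them against each other on a given machine gives a rough feel for what contention on a single atomic costs there.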

06:46 Matombo has joined #ste||ar

07:08 Matombo has quit [Remote host closed the connection]

07:43 david_pfander has joined #ste||ar

07:59 Matombo has joined #ste||ar

08:02 bikineev has joined #ste||ar

08:06 Matombo has quit [Remote host closed the connection]

08:25 bikineev has quit [Remote host closed the connection]

08:35 bikineev has joined #ste||ar

08:55 bikineev has quit [Remote host closed the connection]

08:59 bikineev has joined #ste||ar

09:08 <github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v7Bxp

09:08 <github> hpx/gh-pages 7d7a884 StellarBot: Updating docs

10:03 mcopik has joined #ste||ar

10:51 bikineev has quit [Remote host closed the connection]

11:00 bikineev has joined #ste||ar

11:22 hkaiser has joined #ste||ar

12:06 K-ballo has joined #ste||ar

12:06 eschnett has quit [Quit: eschnett]

12:08 zbyerly_ has joined #ste||ar

12:37 eschnett has joined #ste||ar

12:46 <github> [hpx] hkaiser pushed 2 new commits to master: https://git.io/v7RBj

12:46 <github> hpx/master 6103869 JF Bastien: Fix OSX build...

12:46 <github> hpx/master 9c758ec Hartmut Kaiser: Merge pull request #2790 from jfbastien/build-fix...

13:02 david_pfander has quit [Ping timeout: 240 seconds]

13:02 taeguk[m] has quit [Ping timeout: 246 seconds]

13:02 thundergroudon[m has quit [Ping timeout: 258 seconds]

13:07 david_pfander has joined #ste||ar

13:10 bikineev has quit [Remote host closed the connection]

13:12 eschnett has quit [Quit: eschnett]

13:12 david_pfander has quit [Quit: david_pfander]

13:12 eschnett has joined #ste||ar

13:12 david_pfander has joined #ste||ar

13:13 eschnett has quit [Client Quit]

13:13 eschnett has joined #ste||ar

13:20 hkaiser has quit [Quit: bye]

13:23 david_pfander1 has joined #ste||ar

13:24 david_pfander has quit [Read error: Connection reset by peer]

13:26 david_pfander has joined #ste||ar

13:28 david_pfander1 has quit [Ping timeout: 276 seconds]

13:35 david_pfander has quit [Remote host closed the connection]

13:35 david_pfander1 has joined #ste||ar

13:37 david_pfander1 is now known as david_pfander

13:41 thundergroudon[m has joined #ste||ar

13:47 taeguk[m] has joined #ste||ar

14:06 pree_ has joined #ste||ar

14:07 pree_ has quit [Read error: Connection reset by peer]

14:10 pree_ has joined #ste||ar

14:22 hkaiser has joined #ste||ar

14:27 aserio has joined #ste||ar

14:30 <Reazul> @hkaiser: https://stellar-group.github.io/hpx/docs/html/hpx/manual/components/use_components.html <---- This seems empty

14:33 <heller> it is empty indeed

14:33 <heller> Reazul: https://stellar-group.github.io/tutorials/hlrs2017/session5/#21

14:34 <Reazul> @heller: Thanks :)

14:45 pree_ has quit [Read error: Connection reset by peer]

14:53 zbyerly_ has quit [Ping timeout: 246 seconds]

14:59 aserio1 has joined #ste||ar

15:00 pree_ has joined #ste||ar

15:01 pree_ has quit [Read error: Connection reset by peer]

15:02 aserio has quit [Ping timeout: 246 seconds]

15:02 aserio1 is now known as aserio

15:17 pree_ has joined #ste||ar

15:40 mars0000 has joined #ste||ar

15:42 pree_ has quit [Read error: Connection reset by peer]

15:42 <github> [hpx] hkaiser created pv_serializer (+20 new commits): https://git.io/v70sT

15:42 <github> hpx/pv_serializer b5d5f0b ct-clmsn: initial import

15:42 <github> hpx/pv_serializer 2399fe4 ct-clmsn: fix implementation to support friendship in partitioned_vector implementation file

15:42 <github> hpx/pv_serializer 00924d6 ct-clmsn: added friend type to the partitioned_vector_segmented_serializer

15:46 bibek_desktop has quit [Quit: Leaving]

15:46 <github> [hpx] hkaiser opened pull request #2791: Circumvent scary warning about placement new (master...fixing_any_warning) https://git.io/v70sX

15:53 bibek_desktop has joined #ste||ar

15:55 vamatya has joined #ste||ar

15:55 pree_ has joined #ste||ar

16:01 <github> [hpx] hkaiser force-pushed resource_partitioner from 07aa6b3 to 2c246d2: https://git.io/v7lfK

16:01 <github> hpx/resource_partitioner 2c246d2 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...

16:04 <github> [hpx] hkaiser force-pushed resource_partitioner from 2c246d2 to c75fb59: https://git.io/v7lfK

16:04 <github> hpx/resource_partitioner c75fb59 Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...

16:16 pree_ has quit [Read error: Connection reset by peer]

16:18 <jfbastien> heller I understand this. I'm measuring the performance of a new atomic / lock implementation for an as-yet unmeasured virtual ISA. So I want uncontended as well as contended use cases, ideally real-world stuff which otherwise performs useful work.

16:18 <jfbastien> heller and I like bugging wash ;)

16:18 <heller> ;)

16:19 <heller> the easiest way to go then would be to choose any application we know scales well, that would be the uncontended case

16:19 <heller> decrease the granularity of work to observe contention

16:20 <jfbastien> heller cool! Any preferred one? I'm going in and out of playing with this, context switching myself.

16:21 <heller> the fib one is probably nice since you can arbitrarily set the granularity and increase the number of tasks generated
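
A rough idea of how a Fibonacci benchmark exposes a granularity knob, sketched with `std::async` from the C++ standard library. This is not the HPX fib example itself; `fib_parallel`, `cutoff`, and `tasks_spawned` are hypothetical names chosen for illustration. Lowering `cutoff` makes each task smaller and increases the number of tasks generated.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <future>

// Counts how many asynchronous tasks were spawned (for illustration only).
std::atomic<std::uint64_t> tasks_spawned{0};

// Plain serial Fibonacci, used at or below the cutoff.
std::uint64_t fib_serial(unsigned n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Task-parallel Fibonacci. `cutoff` is the granularity knob: subproblems
// with n <= cutoff run serially, larger ones spawn one asynchronous task
// for fib(n - 1) and compute fib(n - 2) on the calling thread.
std::uint64_t fib_parallel(unsigned n, unsigned cutoff) {
    assert(n <= 93 && "fib(n) overflows 64 bits for n > 93");
    if (n < 2 || n <= cutoff)
        return fib_serial(n);
    tasks_spawned.fetch_add(1, std::memory_order_relaxed);
    std::future<std::uint64_t> left =
        std::async(std::launch::async, fib_parallel, n - 1, cutoff);
    std::uint64_t right = fib_parallel(n - 2, cutoff);
    return left.get() + right;
}
```

Note that `std::launch::async` typically starts one OS thread per task, so this sketch only tolerates small task counts; a lightweight task scheduler is what makes much lower cutoffs practical.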

16:22 <jfbastien> heller yeah that's what hkaiser / wash recommended. Seems to work when I tried yesterday. I haven't measured contention yet.

16:22 <heller> however, I am not sure what meaningful measure you'll get out of it, since I don't even know if it is mem bound, computational bound or something else

16:22 <heller> there is another interesting benchmark, the stream benchmark

16:23 <heller> and yes, i hear you saying: but it is embarrassingly parallel

16:23 <jfbastien> heller well, I like the idea of measuring contended versus not because it gives a baseline for what the cost of that contention is, and I can compare different architectures' costs.

16:23 <heller> there is a catch to it though

16:23 <heller> the catch is in the fork/join (implicit barriers) of the executed parallel algorithms

16:24 <heller> our tests show that it severely hurts performance when scaling out

16:24 <jfbastien> it's un-intuitive, but I don't necessarily care if the code is even good! I'm purposefully looking at some silly code as well because it should perform as well as silly will allow it to. Basically I can't pessimise it.

16:24 <heller> easily observable on a KNL system where we do the stream from the HBM

16:24 <jfbastien> ah interesting. How many cores does this manifest at?

16:25 <heller> let me pull it out real quick

16:27 <heller> it manifests at around 60 cores

16:27 <jfbastien> heller OK interesting!

16:27 <heller> it gets worse, once you add the logical cores

16:28 <jfbastien> I'm trying out fewer cores for now, since that's easier, but it's good to have on my list as something that'll scale poorly later
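
The fork/join catch heller mentions can be pictured as follows: each STREAM-style kernel is a parallel loop, and every loop ends with all workers being joined before the next one starts. A minimal sketch with `std::thread`, assuming the usual four kernels (copy, scale, add, triad); it is not the HPX stream benchmark, and `fork_join_for` and `stream_iteration` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs body(begin, end) on `workers` threads over [0, n), then joins them.
// The joins are the implicit barrier at the end of each parallel loop.
void fork_join_for(std::size_t n, unsigned workers,
                   const std::function<void(std::size_t, std::size_t)>& body) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(std::cref(body), begin, end);
    }
    for (auto& t : pool) t.join();  // barrier: nobody proceeds until all finish
}

// One pass over the four kernels; a, b and c must have the same size.
// Each kernel is its own fork/join region, so one pass has four barriers.
void stream_iteration(std::vector<double>& a, std::vector<double>& b,
                      std::vector<double>& c, double scalar, unsigned workers) {
    const std::size_t n = a.size();
    fork_join_for(n, workers, [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) c[i] = a[i];                  // copy
    });
    fork_join_for(n, workers, [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) b[i] = scalar * c[i];         // scale
    });
    fork_join_for(n, workers, [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) c[i] = a[i] + b[i];           // add
    });
    fork_join_for(n, workers, [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) a[i] = b[i] + scalar * c[i];  // triad
    });
}
```

With more workers, each chunk shrinks while the cost of every barrier grows, which is one plausible reading of why the loops stop scaling even though the work itself is embarrassingly parallel.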

16:28 <heller> the stencil examples mentioned are nice as well

16:29 <jfbastien> yeah stencil 8 seemed neat

16:29 <heller> they are mostly impaired by the overheads of memory allocation or lock contention (with fine granularity)

16:29 <hkaiser> heller, jfbastien: I doubt the application matters if you want to look at the locks in the scheduler

16:30 <hkaiser> as long as sufficient work is generated, that is

16:30 <heller> well, it is nice, if we have some model of how well the application *should* perform

16:30 <heller> that is having an upper bound

16:30 <hkaiser> heller: I don't think jfbastien cares how well the application itself is performing

16:30 <heller> yeah

16:31 <heller> but then, you could measure all kinds of other different effects

16:31 <heller> what's evident though, across all profiling runs over all kinds of applications: once granularity gets too fine, we see significant contention in the scheduler

16:32 <jfbastien> right, I care about how my ISA is performing compared to others, specifically on atomic / lock :)

16:32 <jfbastien> ISA / microkernel

16:32 <heller> that should be sufficient then

16:33 <heller> is it just the ISA that's special or also some novel architectural improvements?

16:34 <heller> like TMS or automatic lock elision etc.

16:34 <jfbastien> heller yes

16:35 <heller> will i get answers if asking further?

16:36 <jfbastien> :)

16:38 pree_ has joined #ste||ar

16:43 <heller> are task blocks part of parallelism TS V2?

16:45 <hkaiser> not sure, don't think so

16:45 <heller> they are

16:48 <zao> I like the architecture they invented for this year's defcon CTF contest. 9-bit bytes, middle-endian 3-byte registers, instructions taking register ranges and other madness.

16:50 <hkaiser> jfbastien: whatever you design there, please add one-cycle context switches and hardware support for global memory to it ;)

16:52 <jfbastien> hkaiser done

16:52 <hkaiser> thanks

16:52 <heller> ;)

16:52 <jfbastien> well that was easy

16:52 <hkaiser> we'll make hpx fly on that platform, then

16:52 <heller> err

16:55 <heller> hkaiser: btw, text is complete now

16:55 zbyerly_ has joined #ste||ar

16:55 mars0000 has quit [Quit: mars0000]

16:55 <heller> hkaiser: going from cover to cover now and fixing all those fixmes. turns out, it requires a shitload of time

16:59 pree_ has quit [Ping timeout: 276 seconds]

17:02 <hkaiser> heller: I hear you

17:10 pree_ has joined #ste||ar

17:17 zbyerly_ has quit [Ping timeout: 240 seconds]

17:21 pree_ has quit [Ping timeout: 258 seconds]

17:29 mcopik has quit [Ping timeout: 246 seconds]

17:30 aserio has quit [Ping timeout: 246 seconds]

17:33 pree_ has joined #ste||ar

17:47 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v70wE

17:47 <github> hpx/master 248c03c Hartmut Kaiser: Remove HPX_CONSTEXPR on function returning void for gcc 4.9

17:48 pree_ has quit [Ping timeout: 255 seconds]

17:50 aserio has joined #ste||ar

17:57 jfbastien has quit [Quit: Textual IRC Client: www.textualapp.com]

18:01 david_pfander1 has joined #ste||ar

18:01 pree_ has joined #ste||ar

18:03 david_pfander has quit [Ping timeout: 258 seconds]

18:03 david_pfander1 is now known as david_pfander

18:03 ajaivgeorge has joined #ste||ar

18:06 pree_ has quit [Ping timeout: 240 seconds]

18:08 david_pfander has quit [Ping timeout: 276 seconds]

18:08 bikineev has joined #ste||ar

18:16 mars0000 has joined #ste||ar

18:19 hkaiser has quit [Quit: bye]

18:19 pree_ has joined #ste||ar

18:30 pree_ has quit [Ping timeout: 240 seconds]

19:01 bikineev_ has joined #ste||ar

19:04 bikineev has quit [Ping timeout: 240 seconds]

19:22 mcopik has joined #ste||ar

19:26 hkaiser has joined #ste||ar

20:00 bikineev_ has quit [Remote host closed the connection]

20:00 bikineev has joined #ste||ar

20:27 <github> [hpx] ajaivgeorge opened pull request #2792: Implemented segmented find and its variations for partitioned vector (master...segmented_find2) https://git.io/v7EvV

20:29 patg[[w]] has joined #ste||ar

20:31 eschnett has quit [Quit: eschnett]

20:36 <github> [hpx] ajaivgeorge opened pull request #2793: Implemented segmented find_end and find_first_of for partitioned vector (master...segmented_find_end) https://git.io/v7EJe

20:43 <diehlpk_work> We should check our issues. Some of them never got a response or were merged, but not closed

20:43 patg[[w]] has quit [Quit: Leaving]

20:46 <hkaiser> diehlpk_work: which ones didn't get closed?

20:47 <diehlpk_work> hkaiser, Wrote it as a comment in the issue

20:52 aserio has quit [Quit: aserio]

20:53 <hkaiser> diehlpk_work: ok, thanks

20:53 <hkaiser> next time you create a PR which fixes an issue, just add 'Fixes #NNNN' to the description, that will auto-close the issue once the PR is merged

20:54 hkaiser has quit [Read error: Connection reset by peer]

20:54 hkaiser has joined #ste||ar

20:54 <hkaiser> diehlpk_work: ^^

20:55 <diehlpk_work> hkaiser, Will do that. I think there are more issues of this kind.

20:55 <diehlpk_work> I will comment on them too

20:55 <hkaiser> thanks

20:55 <diehlpk_work> \away

20:56 <diehlpk_work> I was just looking at the issues to see what easy things I can contribute

21:45 mars0000 has quit [Quit: mars0000]

22:10 <github> [hpx] hkaiser force-pushed pv_serializer from e1cc39c to a5b25d0: https://git.io/v7ECd

22:10 <github> hpx/pv_serializer a5b25d0 Hartmut Kaiser: Fixing parallel::fill to make partitioned_vector serialization work...

22:20 zbyerly_ has joined #ste||ar

23:18 zbyerly_ has quit [Ping timeout: 240 seconds]

23:47 bikineev has quit [Remote host closed the connection]