#ste||ar on 2017-06-06 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:19 <K-ballo> hkaiser: how does one use the tagged_tuple ?

00:19 <hkaiser> make_tagged_tuple(...)

00:19 <hkaiser> well, make_tagged_tuple<tag1, tag2>(val1, val2)

00:20 <hkaiser> then you can access the elements of the tagged_tuple using the tag names: tt = make_tagged_tuple<t1, t2>(v1, v2); tt.t1() -> first element, tt.t2() second element

00:20 <K-ballo> there's no tag/type relation?

00:21 <hkaiser> no, types are deduced from the arguments

00:21 <hkaiser> let me have a look again, hold on

00:21 <K-ballo> can I use util::get<tag>(tt) as well?

00:22 <hkaiser> ahh, it's make_tagged_tuple<tag1(T1), tag2(T2), ...>(...)

00:22 <hkaiser> K-ballo: yes, tagged_tuple is 'derived' from tuple

00:23 <hkaiser> K-ballo: here: https://github.com/STEllAR-GROUP/hpx/blob/master/tests/unit/util/tagged.cpp

00:24 <K-ballo> I see, I'm not sure it's what I need

00:24 <K-ballo> is it just about adding named members to tuple?

00:24 <hkaiser> yes

00:56 EverYoung has quit [Ping timeout: 246 seconds]

01:02 shoshijak has joined #ste||ar

01:31 <K-ballo> https://twitter.com/MalwareMinigun/status/871900369118535680

01:37 <zao> Eew.

01:47 diehlpk has quit [Ping timeout: 255 seconds]

01:49 <hkaiser> heh

01:55 <github> [hpx] hkaiser created rolling_statistics_counters (+1 new commit): https://git.io/vHwXg

01:55 <github> hpx/rolling_statistics_counters 340aca3 Hartmut Kaiser: Adding new statistics performance counters:...

01:57 <github> [hpx] hkaiser force-pushed fixing_future_serialization from 19972e8 to 33ba6b1: https://git.io/vHzFI

01:57 <github> hpx/fixing_future_serialization 33ba6b1 Hartmut Kaiser: Turning assertions into exceptions...

01:57 hkaiser has quit [Quit: bye]

03:04 K-ballo has quit [Quit: K-ballo]

04:06 eschnett has quit [Quit: eschnett]

04:06 eschnett has joined #ste||ar

05:16 jbjnr_ has joined #ste||ar

05:18 jbjnr has quit [Ping timeout: 246 seconds]

05:18 jbjnr_ is now known as jbjnr

05:35 shoshijak has quit [Ping timeout: 255 seconds]

05:40 pree has joined #ste||ar

06:06 jaafar has quit [Ping timeout: 240 seconds]

06:14 <github> [hpx] biddisco created fixing_2679 (+1 new commit): https://git.io/vHwFv

06:14 <github> hpx/fixing_2679 b94b8cf John Biddiscombe: Fix bad size and optimization flags during archive creation...

06:15 <github> [hpx] biddisco opened pull request #2680: Fix bad size and optimization flags during archive creation (master...fixing_2679) https://git.io/vHwFJ

06:17 <github> [hpx] biddisco force-pushed fixing_2679 from b94b8cf to d65d18b: https://git.io/vHwFY

06:17 <github> hpx/fixing_2679 d65d18b John Biddiscombe: Fix bad size during archive creation...

06:48 VeXocide has quit [Read error: Connection reset by peer]

06:48 VeXocide has joined #ste||ar

06:50 shoshijak has joined #ste||ar

07:29 <taeguk> Excuse me, Is there anyone who has experience to benchmark memory bandwidth of parallel algorithms?

07:31 <taeguk> I want to experiment about whether memory bandwidth is a bottleneck of parallel algorithm performance.

07:32 <taeguk> I have no experience about that. So, I want to ask anyone about tools, materials, or some tips.

07:33 <jbjnr> taeguk one thing you can do, is compute the number of memory accesses during the algorithm (should be some factor of N - the array size) and then you can do a simple memory BW calculation by saying I did M reads and N writes in X seconds. Then look at the specs of your processor to see what the mem BW is

07:34 <jbjnr> on some modern processors, the mem BW is a few hundred GB/s, on older ones, much less.

07:34 <jbjnr> Look at the stream benchmark in hpx

07:34 <jbjnr> to give yourself an idea. - and also an estimate of the mem BW of your machine

07:42 david_pfander has joined #ste||ar

07:43 <taeguk> jbjnr: Thank you for your advices. But I have worries about differences between theory and realworld.

07:43 <taeguk> If you consider 'cache', the way you presented might not be accurate. And maybe there is a difference between the number of memory accesses I can guess in C++ codes and real things in compiled assembly codes.

07:43 <jbjnr> yes. The difference between theory and practice is huge

07:43 <jbjnr> PS. https://github.com/STEllAR-GROUP/hpx/blob/67870fcef17302de3cd375627bbbd44e31cf105f/tests/performance/local/stream.cpp

07:43 <jbjnr> ^^ strwam benchmark

07:44 <jbjnr> ^stream benchmark

07:44 <jbjnr> First you start with the theoreticl peak performance of the algorithm - assuming that the memory BW is fully used

07:44 <jbjnr> then you look at the cache hit/miss etc and see how this affects your algorithm

07:45 <jbjnr> measuring cache missed requires special profiling tools like papi

07:48 <jbjnr> an algorithm like is_heap will be memory bound - how do we know? - because all it does is iterate of a list and make a simple comparison. it accesses memory continuously and does almost no calculation on the data.

07:49 <taeguk> jbjnr: Yes. I think so, too. Can I find the solution of that problem?

07:49 <jbjnr> not really. it's a common problem

07:50 <jbjnr> most of the algorithms we develop are memory bound, few are compute bound it seems - at least the computer science ones (as opposed to the physics solvers)

07:50 <jbjnr> do the sums anyway and see how close to peak memory you are getting ....

07:51 <taeguk> Right.

07:52 <taeguk> But, in case of is_heap, it is bad aspect of cache.

07:52 <taeguk> https://github.com/STEllAR-GROUP/hpx/blob/67870fcef17302de3cd375627bbbd44e31cf105f/hpx/parallel/algorithms/is_heap.hpp#L75

07:53 <jbjnr> yes

07:53 <taeguk> because it must access parent elements.

07:53 <taeguk> I have no idea to solve that cache problem.

07:54 <jbjnr> it's a problem for sure.

07:55 <jbjnr> if you think of a way to solve that cache miss problem, by developing a better algorithm, then you are a star! (I haven't researched the is_heap, so I'm really not sure how many ways there are of doing it, but I guess, not many)

08:07 <heller_> jbjnr: https://github.com/STEllAR-GROUP/hpx/pull/2619 <--- this should be ready now

08:08 <heller_> jbjnr: https://github.com/STEllAR-GROUP/hpx/pull/2656 <--- should I add the runtime option?

08:09 <jbjnr> heller_: I'll have a play with the LF PR, but the MAX_TERMINATED_THREADS one I suspect hartmut want's to make a user option instead of just reducing it

08:10 <heller_> jbjnr: ok, just asking if I can reduce your load today

08:10 <heller_> rehearsal is boring

08:10 <jbjnr> rehaearsal?

08:11 <heller_> rehearsal for the review tomorrow

08:19 <jbjnr> I don't think we ever did rehearsals for the smaller projects. Always got top marks and it was easy :)

08:19 <jbjnr> do good work, and all wil be fine

08:19 <jbjnr> my rma stuff is now officially awesome

08:30 <heller_> great!

08:33 <heller_> jbjnr: osu latency in the native MPI ballpark now?

08:34 <jbjnr> no of course not. just the design is lovely

08:34 <jbjnr> just about to run first full osu test after more tweaking

08:34 <heller_> woohoo! :D

08:35 <jbjnr> will have results soon. won't be much better than current implementation, but migh show improvement on smaler thread counts, and uses less memory

08:35 <heller_> k

08:35 <jbjnr> on higher thread couts the registration costs of memory are hidden by the overlapping sends

08:35 <jbjnr> so I hope to see better perf on single thread compared to non rma version

08:37 <heller_> that'll be great already

08:37 <heller_> that's more or less exactly what we need

08:37 <jbjnr> fingers crossed

08:45 bikineev has joined #ste||ar

09:19 pree has quit [Ping timeout: 255 seconds]

09:20 bikineev has quit [Remote host closed the connection]

09:26 Matombo has joined #ste||ar

09:37 Remko has joined #ste||ar

10:05 pree has joined #ste||ar

11:33 Remko has quit [Remote host closed the connection]

11:36 K-ballo has joined #ste||ar

11:38 bikineev has joined #ste||ar

11:45 <shoshijak> is there a reason why this doesn't throw an exception if the runtime has not yet been initialized? https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime.cpp#L677

11:52 josef__k has joined #ste||ar

11:53 <heller_> shoshijak: for some reason, it is seen as if the key hasn't been found

11:53 ajaivgeorge has joined #ste||ar

11:53 <heller_> no real rationale, I guess

11:54 pree has quit [Ping timeout: 260 seconds]

11:55 pree has joined #ste||ar

11:55 ajaivgeorge has quit [Remote host closed the connection]

11:55 ajaivgeorge has joined #ste||ar

11:57 <shoshijak> can I add an exception? It would make my life easier

11:57 <shoshijak> and seems like the correct behavior ...

11:59 <heller_> it sounds like a good idea, yeah

12:02 ajaivgeorge has quit [Quit: ajaivgeorge]

12:02 <josef__k> Where can I find a robust comparison of the available malloc implementations for HPX? I.e., jemalloc vs tcmalloc vs tbb?

12:03 <josef__k> Or even just a crude heuristic to help me choose? :)

12:10 hkaiser has joined #ste||ar

12:15 <heller_> josef__k: there is no comparison ;)

12:15 <github> [hpx] sithhell pushed 1 new commit to master: https://git.io/vHrlr

12:15 <github> hpx/master 396e999 Thomas Heller: Fixing resource manager test execution

12:16 <heller_> josef__k: I mostly use jemalloc nowadays because it tends to work on more platforms ;)

12:16 <heller_> they all should be in the same ballpark

12:17 <K-ballo> new nuwen release https://nuwen.net/mingw.html

12:18 <jbjnr> I use jemalloc on all my builds, here's a comparison - https://suniphrase.wordpress.com/2015/10/27/jemalloc-vs-tcmalloc-vs-dlmalloc/

12:19 <heller_> and in the meantime, there have been a thousand new releases ;)

12:19 <jbjnr> sorry

12:20 <heller_> not your fault

12:20 <heller_> it's always difficult to keep with those things...

12:21 <heller_> that's what I wanted to convey...

12:21 <hkaiser> heller_: thanks for the resource_manager fix

12:21 <heller_> np

12:21 <heller_> it was bothering me for weeks now ;)

12:21 Matombo has quit [Ping timeout: 260 seconds]

12:21 <hkaiser> nod

12:22 <jbjnr> stop creating resource managers anyway. There can be only one!

12:45 Matombo has joined #ste||ar

12:47 bikineev has quit [Ping timeout: 246 seconds]

12:57 <github> [hpx] K-ballo force-pushed compat-exception from 60ac848 to 46373fe: https://git.io/vH8FM

12:57 <github> hpx/compat-exception 46373fe Agustin K-ballo Berge: Remove compatibility layer for std::exception_ptr, mark support as required

12:59 <jbjnr> when gdb dies "Reading symbols from /scratch/snx3000/biddisco/build/hvtkm/bin/osu_latency...Segmentation fault" what can I do next?

13:04 <heller_> jbjnr: update gdb

13:04 <heller_> jbjnr: or reduce the length of the names

13:04 <heller_> jbjnr: the mangled name is larger than 256 characters

13:10 <jbjnr> heller_: really? does that cause problems

13:10 <heller_> yes

13:10 <heller_> on older gdb versions

13:10 <heller_> newer ones didn't seem to have that problem

13:10 <jbjnr> GNU gdb (GDB) 7.6.2-7.0.2

13:20 eschnett has quit [Quit: eschnett]

13:26 zbyerly_ has joined #ste||ar

13:34 <heller_> hmm, should be fine then

13:34 <heller_> gdb gdb ;)

13:41 aserio has joined #ste||ar

13:45 hkaiser has quit [Quit: bye]

13:49 aserio has quit [Quit: aserio]

14:00 mcopik has quit [Ping timeout: 240 seconds]

14:05 aserio has joined #ste||ar

14:09 eschnett has joined #ste||ar

14:16 pree has quit [Ping timeout: 260 seconds]

14:17 hkaiser has joined #ste||ar

14:28 pree has joined #ste||ar

14:29 <zao> Oh boy, I haven't crashed a debugger with overlong symbols since using Boost on AIX.

14:29 Matombo has quit [Ping timeout: 255 seconds]

14:29 <zao> Last time I saw symbol snafus was when VC++ truncated Spirit symbols to like 4096.

14:31 <K-ballo> slightly over an hour, that's how long it took

14:31 <K-ballo> (for someone to mention spirit in relation to overly long symbols :))

14:31 <zao> :D

14:38 <josef__k> How do I access the measurements made by -DHPX_WITH_THREAD_IDLE_RATES=On ?

14:39 aserio has quit [Ping timeout: 246 seconds]

14:39 <heller_> josef__k: performance counter

14:40 <heller_> josef__k: --hpx:print-counter='...'

14:41 <jbjnr> josef__k: http://stellar-group.github.io/hpx/docs/html/hpx/manual/performance_counters.html

14:41 <josef__k> Thanks.

14:43 aserio has joined #ste||ar

14:47 josef__k has quit [Ping timeout: 246 seconds]

14:51 aserio has quit [Quit: aserio]

14:52 aserio has joined #ste||ar

15:02 EverYoung has joined #ste||ar

15:02 EverYoung has quit [Remote host closed the connection]

15:03 EverYoung has joined #ste||ar

15:09 hkaiser has quit [Quit: bye]

15:11 <github> [hpx] hkaiser opened pull request #2681: Attempt to fix problem in managed_component_base (master...fixing_2663) https://git.io/vHrya

15:12 hkaiser has joined #ste||ar

15:14 aserio has quit [Quit: aserio]

15:14 aserio has joined #ste||ar

15:20 hkaiser has quit [Quit: bye]

15:24 hkaiser has joined #ste||ar

15:28 denis_blank has joined #ste||ar

15:29 pree_ has joined #ste||ar

15:29 pree has quit [Read error: Connection reset by peer]

15:31 aserio has quit [Ping timeout: 246 seconds]

15:42 <shoshijak> affinity_data::init(...) is supposed to return the number of cores needed. If so, why does it return max(num_unique_cores,max_cores) and not the min? https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/threads/policies/affinity_data.cpp#L135

15:56 jaafar has joined #ste||ar

15:57 mcopik has joined #ste||ar

16:01 <denis_blank> Is there a function inside the codebase already, that merges two hpx::tuples, or inserts one element at the back of the tuple?

16:01 shoshijak has quit [Ping timeout: 255 seconds]

16:02 <K-ballo> tuple_cat, same as std::

16:02 <K-ballo> not usually recommended though

16:04 aserio has joined #ste||ar

16:05 <denis_blank> K-ballo: thanks

16:15 jaafar has quit [Quit: Konversation terminated!]

16:16 jaafar has joined #ste||ar

16:20 <github> [hpx] K-ballo opened pull request #2683: Replace boost::exception_ptr with std::exception_ptr (master...compat-exception) https://git.io/vHrAZ

16:21 bikineev has joined #ste||ar

16:31 EverYoun_ has joined #ste||ar

16:32 EverYoung has quit [Ping timeout: 258 seconds]

16:32 david_pfander has quit [Ping timeout: 240 seconds]

16:34 bikineev has quit [Ping timeout: 240 seconds]

16:39 zahra123 has joined #ste||ar

16:46 bikineev has joined #ste||ar

16:46 shoshijak has joined #ste||ar

16:51 bikineev has quit [Ping timeout: 255 seconds]

17:00 pree_ has quit [Read error: Connection reset by peer]

17:01 Matombo has joined #ste||ar

17:13 <github> [hpx] hkaiser force-pushed rolling_statistics_counters from b9c9966 to 5a0bbdb: https://git.io/vHr99

17:13 <github> hpx/rolling_statistics_counters 5a0bbdb Hartmut Kaiser: Adding new statistics performance counters:...

17:16 <github> [hpx] hkaiser closed pull request #2673: Inhibit direct conversion from future<future<T>> --> future<void> (master...fixing_2667) https://git.io/vHuec

17:16 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vHoTf

17:16 <github> hpx/master 5e88a63 Hartmut Kaiser: Merge pull request #2674 from STEllAR-GROUP/fixing_future_serialization...

17:18 <github> [hpx] hkaiser opened pull request #2684: Adding new statistics performance counters: (master...rolling_statistics_counters) https://git.io/vHoTl

17:24 <diehlpk_work> heller_, hkaiser Skype meeting today?

17:26 <hkaiser> diehlpk_work: what time?

17:26 aserio has quit [Ping timeout: 246 seconds]

17:26 <diehlpk_work> hkaiser, It depends on heller_ , because he did not reply

17:27 <hkaiser> yah

17:28 hkaiser has quit [Read error: Connection reset by peer]

17:28 hkaiser has joined #ste||ar

17:28 <hkaiser> diehlpk_work: for me it would work only after 2pm which is most likely too late for heller_ :/

17:31 <github> [hpx] hkaiser force-pushed rolling_statistics_counters from 5a0bbdb to fca4aaa: https://git.io/vHr99

17:31 <github> hpx/rolling_statistics_counters fca4aaa Hartmut Kaiser: Adding new statistics performance counters:...

17:40 <diehlpk_work> Ok, so maybe Thursday or Friday

17:40 <hkaiser> diehlpk_work: sure, depends on heller_

17:42 <zbyerly> hkaiser, you might need to use -DWITH_SCOTCH=false

17:42 <zbyerly> building libgeodecomp

17:43 <hkaiser> zbyerly: ok

17:51 aserio has joined #ste||ar

18:03 <heller_> hkaiser: diehlpk_work: I can't do it today. I'm fully booked this week. Sorry

18:05 bikineev has joined #ste||ar

18:21 <jbjnr> HPX_REGISTER_BASE_LCO_WITH_VALUE_DECLARATION - do we need this?

18:21 <jbjnr> what does it do? I've asked before,but forgotten

18:21 <hkaiser> jbjnr: only for heterogeneous settings

18:21 <jbjnr> aha. thanks

18:22 <jbjnr> should put that int the name really

18:22 <hkaiser> also, it avoids instantiating a base_lco_with_value<T> for each T only once across the whole application

18:22 <hkaiser> more than once* even

18:22 <jbjnr> ok

18:54 aserio has quit [Ping timeout: 260 seconds]

19:08 aserio has joined #ste||ar

19:08 hkaiser has quit [Quit: bye]

19:23 aserio has quit [Ping timeout: 246 seconds]

19:24 bikineev has quit [Ping timeout: 240 seconds]

19:27 hkaiser has joined #ste||ar

19:34 aserio has joined #ste||ar

19:37 <K-ballo> I've implemented a variation of peter's exception_info proposal

19:37 <K-ballo> I'll try to integrate into HPX, see if it can replace boost.exception

19:39 <hkaiser> K-ballo: ohh cool!

19:39 akheir has joined #ste||ar

20:14 <github> [hpx] hkaiser force-pushed rolling_statistics_counters from fca4aaa to 65c8ec1: https://git.io/vHr99

20:14 <github> hpx/rolling_statistics_counters 65c8ec1 Hartmut Kaiser: Adding new statistics performance counters:...

20:23 hkaiser has quit [Quit: bye]

20:28 david_pf_ has joined #ste||ar

20:44 eschnett has quit [Quit: eschnett]

20:45 hkaiser has joined #ste||ar

21:02 <github> [hpx] Naios opened pull request #2685: Add support of std::array to hpx::util::tuple_size and tuple_element (master...size_std_array) https://git.io/vHoKz

21:05 <K-ballo> ha! ^ interesting... I was considering doing that, and dropping boost::array support in the process

21:07 <hkaiser> we should move away from boost::integral_constant et.al. as well

21:07 <hkaiser> that's another painful thing (lot'sa work)

21:07 <K-ballo> yep, yep yep yep

21:07 <K-ballo> and util::decay

21:07 <hkaiser> right

21:07 <K-ballo> but a massive overall PR would be disruptive

21:07 <hkaiser> nod

21:08 <K-ballo> besides painful :P

21:08 <hkaiser> one thing at a time...

21:08 <heller_> hkaiser: are there still any problems with #2619?

21:08 <hkaiser> have not looked yet

21:08 <heller_> ok

21:09 bikineev has joined #ste||ar

21:24 shoshijak has quit [Ping timeout: 255 seconds]

21:49 denis_blank has quit [Quit: denis_blank]

22:07 mcopik_ has joined #ste||ar

22:07 mcopik_ has quit [Client Quit]

22:08 <jbjnr> heller_: hkaiser https://gist.github.com/biddisco/0e28393c7f718c6c1917ba230c9150fd

22:09 <jbjnr> what you see there is the rma vector (rma) versus the serialize_buffer(ser)

22:09 <jbjnr> when the number of threads is smallish (2 in this case) and the number of messages in flight (window size) is amllish, then the cost of rma memory registration can be seen

22:11 <jbjnr> the rma version has preregistered buffers on both ends and outperforms the serialize buffer quite nicely on the large message sizes. for <4096 they are both using copy mode, but above that they switch to rendezvous protocol and the rma kicks in.

22:11 <jbjnr> for 524288 the differnce is 14GB/s to 10GB/s lovely.

22:12 <jbjnr> I will plot some graphs and submit a paper on friday.

22:13 <jbjnr> there's an odd bump at 4096 I need to look into, but apaer from that all is well.

22:13 <jbjnr> see ya tomorrow

22:18 david_pf_ has quit [Quit: david_pf_]

22:19 <github> [hpx] K-ballo created throw_with_info (+1 new commit): https://git.io/vHo9A

22:19 <github> hpx/throw_with_info 3790289 Agustin K-ballo Berge: (draft) exception_info implementation (P0640)

22:25 aserio has quit [Quit: aserio]

22:36 <K-ballo> hkaiser: VeXocide suggests tags based on string literals instead of typeid, thoughts?

22:48 Matombo has quit [Quit: Leaving]

22:58 diehlpk has joined #ste||ar

23:00 Matombo has joined #ste||ar

23:00 <hkaiser> K-ballo: tags for what?

23:00 <hkaiser> jbjnr: lovely!

23:02 <K-ballo> exception information tags

23:03 <K-ballo> like typedef boost::error_info<detail::tag_throw_thread_id, std::size_t> throw_thread_id;

23:14 <hkaiser> K-ballo: ahh

23:14 Matombo has quit [Remote host closed the connection]

23:14 <hkaiser> sure, I don't care about that as it's completely hidden anyways

23:53 bikineev has quit [Ping timeout: 260 seconds]