aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<K-ballo> hkaiser: how does one use the tagged_tuple ?
<hkaiser> make_tagged_tuple(...)
<hkaiser> well, make_tagged_tuple<tag1, tag2>(val1, val2)
<hkaiser> then you can access the elements of the tagged_tuple using the tag names: tt = make_tagged_tuple<t1, t2>(v1, v2); tt.t1() -> first element, tt.t2() second element
<K-ballo> there's no tag/type relation?
<hkaiser> no, types are deduced from the arguments
<hkaiser> let me have a look again, hold on
<K-ballo> can I use util::get<tag>(tt) as well?
<hkaiser> ahh, it's make_tagged_tuple<tag1(T1), tag2(T2), ...>(...)
<hkaiser> K-ballo: yes, tagged_tuple is 'derived' from tuple
<K-ballo> I see, I'm not sure it's what I need
<K-ballo> is it just about adding named members to tuple?
<hkaiser> yes
EverYoung has quit [Ping timeout: 246 seconds]
shoshijak has joined #ste||ar
<zao> Eew.
diehlpk has quit [Ping timeout: 255 seconds]
<hkaiser> heh
<github> [hpx] hkaiser created rolling_statistics_counters (+1 new commit): https://git.io/vHwXg
<github> hpx/rolling_statistics_counters 340aca3 Hartmut Kaiser: Adding new statistics performance counters:...
<github> [hpx] hkaiser force-pushed fixing_future_serialization from 19972e8 to 33ba6b1: https://git.io/vHzFI
<github> hpx/fixing_future_serialization 33ba6b1 Hartmut Kaiser: Turning assertions into exceptions...
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
jbjnr_ has joined #ste||ar
jbjnr has quit [Ping timeout: 246 seconds]
jbjnr_ is now known as jbjnr
shoshijak has quit [Ping timeout: 255 seconds]
pree has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
<github> [hpx] biddisco created fixing_2679 (+1 new commit): https://git.io/vHwFv
<github> hpx/fixing_2679 b94b8cf John Biddiscombe: Fix bad size and optimization flags during archive creation...
<github> [hpx] biddisco opened pull request #2680: Fix bad size and optimization flags during archive creation (master...fixing_2679) https://git.io/vHwFJ
<github> [hpx] biddisco force-pushed fixing_2679 from b94b8cf to d65d18b: https://git.io/vHwFY
<github> hpx/fixing_2679 d65d18b John Biddiscombe: Fix bad size during archive creation...
VeXocide has quit [Read error: Connection reset by peer]
VeXocide has joined #ste||ar
shoshijak has joined #ste||ar
<taeguk> Excuse me, is there anyone here with experience benchmarking the memory bandwidth of parallel algorithms?
<taeguk> I want to investigate whether memory bandwidth is a bottleneck for parallel algorithm performance.
<taeguk> I have no experience with that, so I'd like to ask for tools, materials, or tips.
<jbjnr> taeguk: one thing you can do is count the number of memory accesses during the algorithm (it should be some factor of N, the array size). Then you can do a simple memory BW calculation by saying "I did M reads and N writes in X seconds". Then look at the specs of your processor to see what its memory BW is
<jbjnr> on some modern processors the memory BW is a few hundred GB/s; on older ones, much less.
<jbjnr> Look at the stream benchmark in hpx
<jbjnr> to give yourself an idea. - and also an estimate of the mem BW of your machine
david_pfander has joined #ste||ar
<taeguk> jbjnr: Thank you for the advice. But I'm worried about the differences between theory and the real world.
<taeguk> If you consider the cache, the approach you described might not be accurate. And there may be a difference between the number of memory accesses I can count in the C++ code and the actual accesses in the compiled assembly.
<jbjnr> yes. The difference between theory and practice is huge
<jbjnr> ^^ stream benchmark
<jbjnr> First you start with the theoretical peak performance of the algorithm - assuming that the memory BW is fully used
<jbjnr> then you look at the cache hit/miss etc and see how this affects your algorithm
<jbjnr> measuring cache misses requires special profiling tools like PAPI
<jbjnr> an algorithm like is_heap will be memory bound - how do we know? - because all it does is iterate over a list and make a simple comparison. It accesses memory continuously and does almost no computation on the data.
<taeguk> jbjnr: Yes. I think so, too. Is there a solution to that problem?
<jbjnr> not really. it's a common problem
<jbjnr> most of the algorithms we develop are memory bound, few are compute bound it seems - at least the computer science ones (as opposed to the physics solvers)
<jbjnr> do the sums anyway and see how close to peak memory you are getting ....
<taeguk> Right.
<taeguk> But in the case of is_heap, the cache behavior is bad.
<jbjnr> yes
<taeguk> because it must access parent elements.
<taeguk> I have no idea to solve that cache problem.
<jbjnr> it's a problem for sure.
<jbjnr> if you think of a way to solve that cache miss problem, by developing a better algorithm, then you are a star! (I haven't researched the is_heap, so I'm really not sure how many ways there are of doing it, but I guess, not many)
<heller_> jbjnr: https://github.com/STEllAR-GROUP/hpx/pull/2619 <--- this should be ready now
<heller_> jbjnr: https://github.com/STEllAR-GROUP/hpx/pull/2656 <--- should I add the runtime option?
<jbjnr> heller_: I'll have a play with the LF PR, but for the MAX_TERMINATED_THREADS one I suspect Hartmut wants to make it a user option instead of just reducing it
<heller_> jbjnr: ok, just asking if I can reduce your load today
<heller_> rehearsal is boring
<jbjnr> rehearsal?
<heller_> rehearsal for the review tomorrow
<jbjnr> I don't think we ever did rehearsals for the smaller projects. Always got top marks and it was easy :)
<jbjnr> do good work, and all will be fine
<jbjnr> my rma stuff is now officially awesome
<heller_> great!
<heller_> jbjnr: osu latency in the native MPI ballpark now?
<jbjnr> no of course not. just the design is lovely
<jbjnr> just about to run first full osu test after more tweaking
<heller_> woohoo! :D
<jbjnr> will have results soon. Won't be much better than the current implementation, but might show improvement at smaller thread counts, and uses less memory
<heller_> k
<jbjnr> at higher thread counts the memory registration costs are hidden by the overlapping sends
<jbjnr> so I hope to see better perf on single thread compared to non rma version
<heller_> that'll be great already
<heller_> that's more or less exactly what we need
<jbjnr> fingers crossed
bikineev has joined #ste||ar
pree has quit [Ping timeout: 255 seconds]
bikineev has quit [Remote host closed the connection]
Matombo has joined #ste||ar
Remko has joined #ste||ar
pree has joined #ste||ar
Remko has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
bikineev has joined #ste||ar
<shoshijak> is there a reason why this doesn't throw an exception if the runtime has not yet been initialized? https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime.cpp#L677
josef__k has joined #ste||ar
<heller_> shoshijak: for some reason, it is seen as if the key hasn't been found
ajaivgeorge has joined #ste||ar
<heller_> no real rationale, I guess
pree has quit [Ping timeout: 260 seconds]
pree has joined #ste||ar
ajaivgeorge has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
<shoshijak> can I add an exception? It would make my life easier
<shoshijak> and seems like the correct behavior ...
<heller_> it sounds like a good idea, yeah
ajaivgeorge has quit [Quit: ajaivgeorge]
<josef__k> Where can I find a robust comparison of the available malloc implementations for HPX? I.e., jemalloc vs tcmalloc vs tbb?
<josef__k> Or even just a crude heuristic to help me choose? :)
hkaiser has joined #ste||ar
<heller_> josef__k: there is no comparison ;)
<github> [hpx] sithhell pushed 1 new commit to master: https://git.io/vHrlr
<github> hpx/master 396e999 Thomas Heller: Fixing resource manager test execution
<heller_> josef__k: I mostly use jemalloc nowadays because it tends to work on more platforms ;)
<heller_> they all should be in the same ballpark
<K-ballo> new nuwen release https://nuwen.net/mingw.html
<jbjnr> I use jemalloc on all my builds, here's a comparison - https://suniphrase.wordpress.com/2015/10/27/jemalloc-vs-tcmalloc-vs-dlmalloc/
<heller_> and in the meantime, there have been a thousand new releases ;)
<jbjnr> sorry
<heller_> not your fault
<heller_> it's always difficult to keep up with those things...
<heller_> that's what I wanted to convey...
<hkaiser> heller_: thanks for the resource_manager fix
<heller_> np
<heller_> it was bothering me for weeks now ;)
Matombo has quit [Ping timeout: 260 seconds]
<hkaiser> nod
<jbjnr> stop creating resource managers anyway. There can be only one!
Matombo has joined #ste||ar
bikineev has quit [Ping timeout: 246 seconds]
<github> [hpx] K-ballo force-pushed compat-exception from 60ac848 to 46373fe: https://git.io/vH8FM
<github> hpx/compat-exception 46373fe Agustin K-ballo Berge: Remove compatibility layer for std::exception_ptr, mark support as required
<jbjnr> when gdb dies "Reading symbols from /scratch/snx3000/biddisco/build/hvtkm/bin/osu_latency...Segmentation fault" what can I do next?
<heller_> jbjnr: update gdb
<heller_> jbjnr: or reduce the length of the names
<heller_> jbjnr: the mangled name is larger than 256 characters
<jbjnr> heller_: really? does that cause problems
<heller_> yes
<heller_> on older gdb versions
<heller_> newer ones didn't seem to have that problem
<jbjnr> GNU gdb (GDB) 7.6.2-7.0.2
eschnett has quit [Quit: eschnett]
zbyerly_ has joined #ste||ar
<heller_> hmm, should be fine then
<heller_> gdb gdb ;)
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Quit: aserio]
mcopik has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
eschnett has joined #ste||ar
pree has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
pree has joined #ste||ar
<zao> Oh boy, I haven't crashed a debugger with overlong symbols since using Boost on AIX.
Matombo has quit [Ping timeout: 255 seconds]
<zao> Last time I saw symbol snafus was when VC++ truncated Spirit symbols to like 4096.
<K-ballo> slightly over an hour, that's how long it took
<K-ballo> (for someone to mention spirit in relation to overly long symbols :))
<zao> :D
<josef__k> How do I access the measurements made by -DHPX_WITH_THREAD_IDLE_RATES=On ?
aserio has quit [Ping timeout: 246 seconds]
<heller_> josef__k: performance counter
<heller_> josef__k: --hpx:print-counter='...'
<josef__k> Thanks.
aserio has joined #ste||ar
josef__k has quit [Ping timeout: 246 seconds]
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
hkaiser has quit [Quit: bye]
<github> [hpx] hkaiser opened pull request #2681: Attempt to fix problem in managed_component_base (master...fixing_2663) https://git.io/vHrya
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
denis_blank has joined #ste||ar
pree_ has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
aserio has quit [Ping timeout: 246 seconds]
<shoshijak> affinity_data::init(...) is supposed to return the number of cores needed. If so, why does it return max(num_unique_cores,max_cores) and not the min? https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/threads/policies/affinity_data.cpp#L135
jaafar has joined #ste||ar
mcopik has joined #ste||ar
<denis_blank> Is there a function in the codebase already that merges two hpx::tuples, or appends one element at the end of a tuple?
shoshijak has quit [Ping timeout: 255 seconds]
<K-ballo> tuple_cat, same as std::
<K-ballo> not usually recommended though
aserio has joined #ste||ar
<denis_blank> K-ballo: thanks
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
<github> [hpx] K-ballo opened pull request #2683: Replace boost::exception_ptr with std::exception_ptr (master...compat-exception) https://git.io/vHrAZ
bikineev has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
bikineev has quit [Ping timeout: 240 seconds]
zahra123 has joined #ste||ar
bikineev has joined #ste||ar
shoshijak has joined #ste||ar
bikineev has quit [Ping timeout: 255 seconds]
pree_ has quit [Read error: Connection reset by peer]
Matombo has joined #ste||ar
<github> [hpx] hkaiser force-pushed rolling_statistics_counters from b9c9966 to 5a0bbdb: https://git.io/vHr99
<github> hpx/rolling_statistics_counters 5a0bbdb Hartmut Kaiser: Adding new statistics performance counters:...
<github> [hpx] hkaiser closed pull request #2673: Inhibit direct conversion from future<future<T>> --> future<void> (master...fixing_2667) https://git.io/vHuec
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vHoTf
<github> hpx/master 5e88a63 Hartmut Kaiser: Merge pull request #2674 from STEllAR-GROUP/fixing_future_serialization...
<github> [hpx] hkaiser opened pull request #2684: Adding new statistics performance counters: (master...rolling_statistics_counters) https://git.io/vHoTl
<diehlpk_work> heller_, hkaiser Skype meeting today?
<hkaiser> diehlpk_work: what time?
aserio has quit [Ping timeout: 246 seconds]
<diehlpk_work> hkaiser, It depends on heller_ , because he did not reply
<hkaiser> yah
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> diehlpk_work: for me it would work only after 2pm which is most likely too late for heller_ :/
<github> [hpx] hkaiser force-pushed rolling_statistics_counters from 5a0bbdb to fca4aaa: https://git.io/vHr99
<github> hpx/rolling_statistics_counters fca4aaa Hartmut Kaiser: Adding new statistics performance counters:...
<diehlpk_work> Ok, so maybe Thursday or Friday
<hkaiser> diehlpk_work: sure, depends on heller_
<zbyerly> hkaiser, you might need to use -DWITH_SCOTCH=false
<zbyerly> building libgeodecomp
<hkaiser> zbyerly: ok
aserio has joined #ste||ar
<heller_> hkaiser: diehlpk_work: I can't do it today. I'm fully booked this week. Sorry
bikineev has joined #ste||ar
<jbjnr> HPX_REGISTER_BASE_LCO_WITH_VALUE_DECLARATION - do we need this?
<jbjnr> what does it do? I've asked before,but forgotten
<hkaiser> jbjnr: only for heterogeneous settings
<jbjnr> aha. thanks
<jbjnr> should put that in the name really
<hkaiser> also, it avoids instantiating a base_lco_with_value<T> for each T more than once across the whole application
<jbjnr> ok
aserio has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 246 seconds]
bikineev has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
aserio has joined #ste||ar
<K-ballo> I've implemented a variation of Peter's exception_info proposal
<K-ballo> I'll try to integrate it into HPX, see if it can replace boost.exception
<hkaiser> K-ballo: ohh cool!
akheir has joined #ste||ar
<github> [hpx] hkaiser force-pushed rolling_statistics_counters from fca4aaa to 65c8ec1: https://git.io/vHr99
<github> hpx/rolling_statistics_counters 65c8ec1 Hartmut Kaiser: Adding new statistics performance counters:...
hkaiser has quit [Quit: bye]
david_pf_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
hkaiser has joined #ste||ar
<github> [hpx] Naios opened pull request #2685: Add support of std::array to hpx::util::tuple_size and tuple_element (master...size_std_array) https://git.io/vHoKz
<K-ballo> ha! ^ interesting... I was considering doing that, and dropping boost::array support in the process
<hkaiser> we should move away from boost::integral_constant et.al. as well
<hkaiser> that's another painful thing (lot'sa work)
<K-ballo> yep, yep yep yep
<K-ballo> and util::decay
<hkaiser> right
<K-ballo> but a massive overall PR would be disruptive
<hkaiser> nod
<K-ballo> besides painful :P
<hkaiser> one thing at a time...
<heller_> hkaiser: are there still any problems with #2619?
<hkaiser> have not looked yet
<heller_> ok
bikineev has joined #ste||ar
shoshijak has quit [Ping timeout: 255 seconds]
denis_blank has quit [Quit: denis_blank]
mcopik_ has joined #ste||ar
mcopik_ has quit [Client Quit]
<jbjnr> what you see there is the rma vector (rma) versus the serialize_buffer(ser)
<jbjnr> when the number of threads is smallish (2 in this case) and the number of messages in flight (window size) is smallish, then the cost of rma memory registration can be seen
<jbjnr> the rma version has preregistered buffers on both ends and outperforms the serialize buffer quite nicely on the large message sizes. for <4096 they are both using copy mode, but above that they switch to rendezvous protocol and the rma kicks in.
<jbjnr> for 524288 the difference is 14 GB/s to 10 GB/s. Lovely.
<jbjnr> I will plot some graphs and submit a paper on friday.
<jbjnr> there's an odd bump at 4096 I need to look into, but apart from that all is well.
<jbjnr> see ya tomorrow
david_pf_ has quit [Quit: david_pf_]
<github> [hpx] K-ballo created throw_with_info (+1 new commit): https://git.io/vHo9A
<github> hpx/throw_with_info 3790289 Agustin K-ballo Berge: (draft) exception_info implementation (P0640)
aserio has quit [Quit: aserio]
<K-ballo> hkaiser: VeXocide suggests tags based on string literals instead of typeid, thoughts?
Matombo has quit [Quit: Leaving]
diehlpk has joined #ste||ar
Matombo has joined #ste||ar
<hkaiser> K-ballo: tags for what?
<hkaiser> jbjnr: lovely!
<K-ballo> exception information tags
<K-ballo> like typedef boost::error_info<detail::tag_throw_thread_id, std::size_t> throw_thread_id;
<hkaiser> K-ballo: ahh
Matombo has quit [Remote host closed the connection]
<hkaiser> sure, I don't care about that as it's completely hidden anyways
bikineev has quit [Ping timeout: 260 seconds]