hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
<hkaiser> diehlpk: see pm, pls
<diehlpk> hkaiser, please send again, I do not have access to my log at work
diehlpk has quit [Ping timeout: 244 seconds]
diehlpk has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 264 seconds]
nanashi55 has joined #ste||ar
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
jakub_golinowski has joined #ste||ar
<jakub_golinowski> M-ms, hey I have the html result of the max_idle_loop_count x max_idle_backoff_sweep
<jakub_golinowski> it seems like for the dnn tests they should actually be bigger
<M-ms> jakub_golinowski: that's a good one
<M-ms> I guess with 4 threads it works against us because then the threads have to be woken up again to do work
<M-ms> in that case you can do the short run from yesterday with one of the good sets of parameters from this run and 4 threads
<jakub_golinowski> I copied that to libre office
<jakub_golinowski> and I am going to choose the one with the shortest overall runtime
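For context, a minimal sketch of how the two idle-loop parameters being swept could be set for a single run; the ini keys hpx.max_idle_loop_count and hpx.max_idle_backoff_time, the hpx::init overload, and the concrete values are assumptions to be checked against the HPX version in use:

    #include <hpx/hpx_init.hpp>

    #include <string>
    #include <vector>

    int hpx_main(int argc, char** argv)
    {
        // ... run the OpenCV/dnn benchmark here ...
        return hpx::finalize();
    }

    int main(int argc, char** argv)
    {
        // Idle-loop parameters under test; the values are placeholders.
        std::vector<std::string> const cfg = {
            "hpx.max_idle_loop_count=200000",
            "hpx.max_idle_backoff_time=1000"
        };
        return hpx::init(argc, argv, cfg);
    }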
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
jaafar has quit [Ping timeout: 256 seconds]
<heller_> jakub_golinowski: hey
<heller_> jakub_golinowski: one comment regarding your plots: Speedup and everything is nice, what about absolute performance though?
<jakub_golinowski> heller_, hey, what do you mean by absolute performance?
<heller_> jakub_golinowski: something like runtime or another performance metric
<heller_> in this image
<jakub_golinowski> oh runtime is always there, the speedup is computed from it
<heller_> sure
<heller_> great
<heller_> thanks
<heller_> so it not only scales worse, but also performs worse
<heller_> do we know why we get a performance regression for larger images? I would assume that we get way more coarse-grained tasks. From past experience, this would lead to better performance altogether
<heller_> jakub_golinowski: FWIW, a good metric for this comparison would be "pixels/second"
<jakub_golinowski> heller_, thanks for the interest and advice :D
<jakub_golinowski> So you are saying it would help for us to see more about the task distribution?
<jakub_golinowski> I mean we can get something like pixels/second from the plot I have sent by just dividing the number of pixels by the runtime
<heller_> yes
<heller_> it's just how you present the results
<jakub_golinowski> heller_, also, now that I think about it, it is a bit counter-intuitive to me that there is a performance slow-down for larger images
<heller_> pixels/s gives, IMHO, a better idea of the overall performance
<heller_> right, it doesn't make sense
<heller_> if anything, it should be the other way around
<jakub_golinowski> heller_, is there an easy way to check how many tasks were spawned by hpx?
<heller_> jakub_golinowski: so, with pixels/s, it should be easier to see where the different backends' performance is lacking
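As an illustration of the metric suggested here, a trivial helper; width, height and runtime_seconds are assumed to come from the existing benchmark logs:

    // Pixels processed per second, computed from quantities already in the logs.
    double pixels_per_second(int width, int height, double runtime_seconds)
    {
        return static_cast<double>(width) * height / runtime_seconds;
    }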
<heller_> do you have the raw data somewhere?
<jakub_golinowski> the logs should be in the directory raw_logs next to the images/ directory you were looking at
<heller_> ah
<heller_> but no script ;)
<heller_> anyways
<jakub_golinowski> ah yeah, I was thinking that including the exact script that was used to produce the results is a good idea
<jakub_golinowski> but only today, when I was thinking about putting the dnn tests into the repo
<heller_> regarding how many tasks are spawned: not sure I understand the question
<heller_> so, you are using parallel::for_each, right?
<jakub_golinowski> heller_, I use parallel_for
<jakub_golinowski> exactly
<heller_> can you point me to the code?
<jakub_golinowski> right away
<heller_> regarding the comments in the OpenCV PR. I think you and nikunj should work together such that the opencv code can run on a HPX backend without any other changes. Nikunj is working on the "C main replacement" stuff
<jakub_golinowski> heller_, ok this is an interesting idea and might solve the issue
<heller_> right
diehlpk_work has quit [Remote host closed the connection]
<heller_> so what needs to be solved is to have the main wrapper functions available even when hpx_main was not included. In some way or another...
diehlpk_work has joined #ste||ar
<heller_> jakub_golinowski: how did you implement cv::parallel_for_ in terms of HPX?
<heller_> that one I guess?
<jakub_golinowski> heller_, but there should be elasticity left as well, in case somebody wants to have more control over the suspending/resuming backend etc.?
<heller_> sure
<jakub_golinowski> heller_, yes
<heller_> that could be controlled over a singleton or so
<heller_> anyways
<heller_> this stripRange, what is it?
<heller_> stripeRange
<heller_> also, you don't need to capture everything by reference
<jakub_golinowski> it is a way of allowing the caller of cv::parallel_for_() to suggest/induce a partitioning
<heller_> capturing "this" should be enough
<heller_> ok, what does it return?
<jakub_golinowski> heller_, noted - will try this
<heller_> how many elements do you have between stripeRange.start and stripeRange.end?
<jakub_golinowski> so imagine you have a range 1-1000
<jakub_golinowski> then a user passes nstripes=10 to the cv::parallel_for
<jakub_golinowski> it causes the stripeRange to be created that is actually smaller
<jakub_golinowski> than the original range
<jakub_golinowski> and each call to ParallelLoopBodyWrapper::operator()(cv::Range(i, i + 1))
<jakub_golinowski> is translated to a respective subrange of the original range
<heller_> did you experiment with that parameter?
<heller_> so it looks like OpenCV is trying to perform its own chunking
<jakub_golinowski> yes, there were two additional versions of the backend treating it differently
<jakub_golinowski> and some experiments on that
<heller_> did you look into setting the chunksize for hpx::parallel::for_each to 1?
<jakub_golinowski> heller_, yes there was a version like this
<heller_> so, by default, you get a stripeRange [0, -1)?
<jakub_golinowski> heller_, just checking that
<jakub_golinowski> sorry but I was working on that a month ago and some details got rusty
<heller_> sure
<jakub_golinowski> the default value of nstripes (3rd arg of cv::parallel_for) is -1
<jakub_golinowski> for this particular value the stripeRange is simply equal to the wholeRange
<jakub_golinowski> so there is no tinkering with chunking by opencv
<jakub_golinowski> or more specifically the caller of cv::parallel_for
<heller_> I see
<heller_> that's interesting, I don't see any special handling of the -1 case
<jakub_golinowski> so after the call the nstripes is first rounded like this
<jakub_golinowski> nstripes = cvRound(_nstripes <= 0 ? len : MIN(MAX(_nstripes, 1.), len));
nikunj97 has joined #ste||ar
<jakub_golinowski> so for anything less than or equal to zero it is len (which is the length of the wholeRange)
<heller_> ahh, gotcha
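A minimal sketch of the mapping discussed above (one HPX task per stripe by forcing a chunk size of 1); this is not the actual OpenCV backend code, and the HPX headers/namespaces reflect the 2018-era API and should be verified against the version in use:

    #include <hpx/include/parallel_for_loop.hpp>
    #include <opencv2/core/utility.hpp>

    void hpx_parallel_for_sketch(cv::Range const& stripeRange,
                                 cv::ParallelLoopBody const& body)
    {
        namespace execution = hpx::parallel::execution;

        // Force a chunk size of 1 so every stripe index becomes its own HPX task;
        // OpenCV's ParallelLoopBodyWrapper maps each cv::Range(i, i + 1) back to a
        // subrange of the whole range.
        hpx::parallel::for_loop(
            execution::par.with(execution::static_chunk_size(1)),
            stripeRange.start, stripeRange.end,
            [&](int i) { body(cv::Range(i, i + 1)); });
    }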
<heller_> could you do me the favor and plot pixels/s instead of runtime please?
<jakub_golinowski> heller_, ok will do that but I have to leave for the visit to the doctor in 15mins and need to prepare
<heller_> sure
<jakub_golinowski> I will do it when I get home
<heller_> great
<jakub_golinowski> M-ms, here is the .ods with the "baseline" run and 3 repetitions of the run with 1kk milc and milbt: https://drive.google.com/file/d/1YlhiFEcblGl8YRT0YL41xHp0CcZDSx68/view?usp=sharing
jakub_golinowski has quit [Ping timeout: 264 seconds]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/fN3iA
<github> hpx/gh-pages 978c74c StellarBot: Updating docs
jakub_golinowski has joined #ste||ar
<jakub_golinowski> heller_, yt?
<jakub_golinowski> I modified the plot script on the go: https://ibb.co/hqQuOd
<heller_> jakub_golinowski: thanks
<heller_> very interesting
<heller_> the performance seems to be mostly constant for HPX
<heller_> same for the other backends, except for the small resolutions
<heller_> which makes sense
jakub_golinowski has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
jakub_golinowski has joined #ste||ar
<jbjnr> K-ballo: does gcc with -std=c++17 work with hpx?
<jbjnr> home/biddisco/apps/boost/1.67.0/include/boost/spirit/home/support/detail/sign.hpp:60:36: error: no type named ‘bits’ in ‘traits_type {aka struct boost::math::detail::fp_traits_non_native<long double, boost::math::detail::extended_double_precision>}’
<hkaiser> jbjnr: looks like a Boost problem to me :/
hkaiser has quit [Quit: bye]
<K-ballo> I don't think I've tried gcc with -std=c++17 yet
aserio has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
galabc has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 268 seconds]
aserio has quit [Ping timeout: 240 seconds]
david_pfander1 has joined #ste||ar
jakub_golinowski has joined #ste||ar
aserio has joined #ste||ar
david_pfander1 has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 265 seconds]
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
<jakub_golinowski> M-ms, yt?
jaafar has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
galabc has quit [Quit: Leaving]
<M-ms> hey jakub_golinowski
<M-ms> I saw the results you posted earlier
<jakub_golinowski> M-ms, can you take a look at the Comm with opencv gdoc?
<M-ms> yeah, will do
<jakub_golinowski> I propose a certain summary.html there
<jakub_golinowski> and actually the error messages when hpx_main is not included are not uniform
<jakub_golinowski> it depends on the situation
<M-ms> what options are there?
<jakub_golinowski> so now I keep getting this
<M-ms> like heller mentioned before this other gsoc student's work could make it a bit easier
<jakub_golinowski> I pasted the two options in the gdoc
<heller_> hkaiser: ready whenever you are
<heller_> hkaiser: appear.in or skype or another service?
<heller_> brb
<heller_> ok
<hkaiser> heller_: sec
<hkaiser> sorry...
<hkaiser> heller_: appear.in
nikunj has joined #ste||ar
<nikunj> hkaiser, yt?
hkaiser has quit [Quit: bye]
<github> [hpx] NK-Nikunj opened pull request #3385: Adds Debug option for hpx initializing from main (master...Debug_hpx_main) https://git.io/fNs6H
aserio has joined #ste||ar
diehlpk has joined #ste||ar
eschnett has joined #ste||ar
diehlpk has quit [Ping timeout: 244 seconds]
<jakub_golinowski> M-ms, thank you very much for the help with composing the message - I posted it
diehlpk has joined #ste||ar
hkaiser has joined #ste||ar
diehlpk has quit [Ping timeout: 268 seconds]
<nikunj> hkaiser, yt?
aserio has quit [Quit: aserio]
<nikunj> hkaiser, I was able to add the runtime error using weak linkage. I didn't know that libhpx.so symbols were hidden by default. That was the primary reason for the unexpected behavior at runtime, which made me believe it was not possible. Adding default visibility along with weak symbol linking makes it work.
<nikunj> I have opened a PR for this (https://github.com/STEllAR-GROUP/hpx/pull/3385)
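For reference, a generic sketch of the technique described here (a weak symbol exported with default visibility that a strong definition can override); it is not the actual code from PR #3385, and the function name is hypothetical:

    // In the shared library: weak default, exported so the dynamic linker sees it.
    extern "C" __attribute__((weak, visibility("default")))
    int hpx_main_is_present()
    {
        return 0;  // hypothetical name; weak fallback meaning "hpx_main not included"
    }

    // A translation unit that includes the hpx_main header would provide a strong
    // definition, overriding the weak one at link/load time:
    // extern "C" int hpx_main_is_present() { return 1; }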
nikunj has quit [Quit: goodnight]
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio1 is now known as aserio
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
aserio has quit [Quit: aserio]
eschnett has quit [Quit: eschnett]
jbjnr_ has joined #ste||ar
<jbjnr_> can anyone remember how to get the (libfabric) parcelport from the parcelhandler
<jbjnr_> hpx::parcelset::parcelhandler &ph = hpx::get_runtime().get_parcel_handler();
<jbjnr_> gives me the parcelhandler, but I can't remember how to get the parcelport itself
<jbjnr_> auto pp = ph.get_default_parcelport();
<jbjnr_> balls. that's not in master
jbjnr_ has quit [Ping timeout: 265 seconds]