hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
<hkaiser> diehlpk: see pm, pls
<diehlpk> hkaiser, please send again, I do not have access to my log at work
diehlpk has quit [Ping timeout: 244 seconds]
diehlpk has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 264 seconds]
nanashi55 has joined #ste||ar
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
jakub_golinowski has joined #ste||ar
<jakub_golinowski> M-ms, hey I have the html result of the max_idle_loop_count x max_idle_backoff_sweep
<jakub_golinowski> it seems like for the dnn tests they should actually be bigger
<M-ms> jakub_golinowski: that's a good one
<M-ms> I guess with 4 threads it works against us because then the threads have to be woken up again to do work
<M-ms> in that case you can do the short run from yesterday with one of the good sets of parameters from this run and 4 threads
<jakub_golinowski> I copied that to libre office
<jakub_golinowski> and I am going to choose the one with the shortest overall runtime
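For context, a minimal sketch of how the two idle-loop parameters being swept could be set for a single run; the ini keys hpx.max_idle_loop_count and hpx.max_idle_backoff_time, the hpx::init overload, and the concrete values are assumptions to be checked against the HPX version in use:

    #include <hpx/hpx_init.hpp>

    #include <string>
    #include <vector>

    int hpx_main(int argc, char** argv)
    {
        // ... run the OpenCV/dnn benchmark here ...
        return hpx::finalize();
    }

    int main(int argc, char** argv)
    {
        // Idle-loop parameters under test; the values are placeholders.
        std::vector<std::string> const cfg = {
            "hpx.max_idle_loop_count=200000",
            "hpx.max_idle_backoff_time=1000"
        };
        return hpx::init(argc, argv, cfg);
    }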
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
jaafar has quit [Ping timeout: 256 seconds]
<heller_> jakub_golinowski: hey
<heller_> jakub_golinowski: one comment regarding your plots: Speedup and everything is nice, what about absolute performance though?
<jakub_golinowski> heller_, hey, what do you mean by absolute performance?
<heller_> jakub_golinowski: something like runtime or another performance metric
<heller_> in this image
<jakub_golinowski> oh runtime is always there, the speedup is computed from it
<heller_> sure
<heller_> great
<heller_> thanks
<heller_> so it not only scales worse, but also performs worse
<heller_> do we know why we get a performance regression for larger images? I would assume that we get way more coarse-grained tasks. From past experience, this would lead to better performance altogether
<heller_> jakub_golinowski: FWIW, a good metric for this comparison would be "pixels/second"
<jakub_golinowski> heller_, thanks for the interest and advice :D
<jakub_golinowski> So you are saying it would help for us to see more about the task distribution?
<jakub_golinowski> I mean we can get something like pixels/second from the plot I have sent by just dividing the number of pixels by the runtime
<heller_> yes
<heller_> it's just how you present the results
<jakub_golinowski> heller_, also, now that I think about it, it is a bit counter-intuitive to me that there is a performance slow-down for larger images
<heller_> pixels/s gives, IMHO, a better idea of the overall performance
<heller_> right, it doesn't make sense
<heller_> if anything, it should be the other way around
<jakub_golinowski> heller_, is there an easy way to check how many tasks were spawned by hpx?
<heller_> jakub_golinowski: so, with pixels/s, it should be easier to see where the different backends' performance is lacking
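As an illustration of the metric suggested here, a trivial helper; width, height and runtime_seconds are assumed to come from the existing benchmark logs:

    // Pixels processed per second, computed from quantities already in the logs.
    double pixels_per_second(int width, int height, double runtime_seconds)
    {
        return static_cast<double>(width) * height / runtime_seconds;
    }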
<heller_> do you have the raw data somewhere?
<jakub_golinowski> the logs should be in the directory raw_logs next to the images/ directory you were looking at
<heller_> ah
<heller_> but no script ;)
<heller_> anyways
<jakub_golinowski> ah yeah, I was thinking that including the exact script that was used to produce the results is a good idea
<jakub_golinowski> but only today, when I was thinking about putting the dnn tests into the repo
<heller_> regarding how many tasks are spawned: not sure I understand the question
<heller_> so, you are using parallel::for_each, right?
<jakub_golinowski> heller_, I use parallel_for
<jakub_golinowski> exactly
<heller_> can you point me to the code?
<jakub_golinowski> right away
<heller_> regarding the comments in the OpenCV PR. I think you and nikunj should work together such that the opencv code can run on a HPX backend without any other changes. Nikunj is working on the "C main replacement" stuff
<jakub_golinowski> heller_, ok this is an interesting idea and might solve the issue
<heller_> right
diehlpk_work has quit [Remote host closed the connection]
<heller_> so what needs to be solved is to have the main wrapper functions available even when hpx_main was not included. In some way or another...
diehlpk_work has joined #ste||ar
<heller_> jakub_golinowski: how did you implement cv::parallel_for_ in terms of HPX?
<heller_> that one I guess?
<jakub_golinowski> heller_, but there should be elasticity left as well, in case somebody wants to have more control over the suspending/resuming backend etc.?
<heller_> sure
<jakub_golinowski> heller_, yes
<heller_> that could be controlled over a singleton or so
<heller_> anyways
<heller_> this stripRange, what is it?
<heller_> stripeRange
<heller_> also, you don't need to capture everything by reference
<jakub_golinowski> it is a way of allowing the caller of cv::parallel_for_() to suggest/induce a partitioning
<heller_> capturing "this" should be enough
<heller_> ok, what does it return?
<jakub_golinowski> heller_, noted - will try this
<heller_> how many elements do you have between stripeRange.start and stripeRange.end?
<jakub_golinowski> so imagine you have a range 1-1000
<jakub_golinowski> then a user passes nstripes=10 to the cv::parallel_for
<jakub_golinowski> it causes the stripeRange to be created that is actually smaller
<jakub_golinowski> than the original range
<jakub_golinowski> and each call to ParallelLoopBodyWrapper::operator()(cv::Range(i, i + 1))
<jakub_golinowski> is translated to a respective subrange of the original range
<heller_> did you experiment with that parameter?
<heller_> so it looks like OpenCV is trying to perform its own chunking
<jakub_golinowski> yes, there were two additional versions of the backend treating it differently
<jakub_golinowski> and some experiments on that
<heller_> did you look into setting the chunksize for hpx::parallel::for_each to 1?
<jakub_golinowski> heller_, yes there was a version like this
<heller_> so, by default, you get a stripeRange [0, -1)?
<jakub_golinowski> heller_, just checking that
<jakub_golinowski> sorry but I was working on that a month ago and some details got rusty
<heller_> sure
<jakub_golinowski> the default value of nstripes (3rd arg of cv::parallel_for) is -1
<jakub_golinowski> for this particular value the stripeRange is simply equal to the wholeRange
<jakub_golinowski> so there is no tinkering with chunking by opencv
<jakub_golinowski> or more specifically the caller of cv::parallel_for
<heller_> I see
<heller_> that's interesting, I don't see any special handling of the -1 case
<jakub_golinowski> so after the call the nstripes is first rounded like this
<jakub_golinowski> nstripes = cvRound(_nstripes <= 0 ? len : MIN(MAX(_nstripes, 1.), len));
nikunj97 has joined #ste||ar
<jakub_golinowski> so for anything less than or equal to zero it is len (which is the length of the wholeRange)
<heller_> ahh, gotcha
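A minimal sketch of the mapping discussed above (one HPX task per stripe by forcing a chunk size of 1); this is not the actual OpenCV backend code, and the HPX headers/namespaces reflect the 2018-era API and should be verified against the version in use:

    #include <hpx/include/parallel_for_loop.hpp>
    #include <opencv2/core/utility.hpp>

    void hpx_parallel_for_sketch(cv::Range const& stripeRange,
                                 cv::ParallelLoopBody const& body)
    {
        namespace execution = hpx::parallel::execution;

        // Force a chunk size of 1 so every stripe index becomes its own HPX task;
        // OpenCV's ParallelLoopBodyWrapper maps each cv::Range(i, i + 1) back to a
        // subrange of the whole range.
        hpx::parallel::for_loop(
            execution::par.with(execution::static_chunk_size(1)),
            stripeRange.start, stripeRange.end,
            [&](int i) { body(cv::Range(i, i + 1)); });
    }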
<heller_> could you do me the favor and plot pixels/s instead of runtime please?
<jakub_golinowski> heller_, ok will do that but I have to leave for the visit to the doctor in 15mins and need to prepare
<heller_> sure
<jakub_golinowski> I will do it when I get home
<heller_> great
<jakub_golinowski> M-ms, here is the .ods with the "baseline" run and 3 repetitions of the run with 1kk milc and milbt: https://drive.google.com/file/d/1YlhiFEcblGl8YRT0YL41xHp0CcZDSx68/view?usp=sharing
jakub_golinowski has quit [Ping timeout: 264 seconds]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/fN3iA
<github> hpx/gh-pages 978c74c StellarBot: Updating docs
jakub_golinowski has joined #ste||ar
<jakub_golinowski> heller_, yt?
<jakub_golinowski> I modified the plot script on the go: https://ibb.co/hqQuOd
<heller_> jakub_golinowski: thanks
<heller_> very interesting
<heller_> the performance seems to be mostly constant for HPX
<heller_> same for the other backends, except for the small resolutions
<heller_> which makes sense
jakub_golinowski has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
jakub_golinowski has joined #ste||ar
<jbjnr> K-ballo: does gcc with -std=c++17 work with hpx?
<jbjnr> home/biddisco/apps/boost/1.67.0/include/boost/spirit/home/support/detail/sign.hpp:60:36: error: no type named ‘bits’ in ‘traits_type {aka struct boost::math::detail::fp_traits_non_native<long double, boost::math::detail::extended_double_precision>}’
<hkaiser> jbjnr: looks like a Boost problem to me :/
hkaiser has quit [Quit: bye]
<K-ballo> I don't think I've tried gcc with -std=c++17 yet
aserio has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
galabc has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 268 seconds]
aserio has quit [Ping timeout: 240 seconds]
david_pfander1 has joined #ste||ar
jakub_golinowski has joined #ste||ar
aserio has joined #ste||ar
david_pfander1 has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 265 seconds]
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
<jakub_golinowski> M-ms, yt?
jaafar has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
galabc has quit [Quit: Leaving]
<M-ms> hey jakub_golinowski
<M-ms> I saw the results you posted earlier
<jakub_golinowski> M-ms, can you take a look at the Comm with opencv gdoc?
<M-ms> yeah, will do
<jakub_golinowski> I propose a certain summary.html there
<jakub_golinowski> and actually the error messages when hpx_main is not included are not uniform
<jakub_golinowski> it depends on the situation
<M-ms> what options are there?
<jakub_golinowski> so now I keep getting this
<M-ms> like heller mentioned before this other gsoc student's work could make it a bit easier
<jakub_golinowski> I pasted the two options in the gdoc
<heller_> hkaiser: ready whenever you are
<heller_> hkaiser: appear.in or skype or another service?
<heller_> brb
<heller_> ok
<hkaiser> heller_: sec
<hkaiser> sorry...
<hkaiser> heller_: appear.in
nikunj has joined #ste||ar
<nikunj> hkaiser, yt?
hkaiser has quit [Quit: bye]
<github> [hpx] NK-Nikunj opened pull request #3385: Adds Debug option for hpx initializing from main (master...Debug_hpx_main) https://git.io/fNs6H
aserio has joined #ste||ar
diehlpk has joined #ste||ar
eschnett has joined #ste||ar
diehlpk has quit [Ping timeout: 244 seconds]
<jakub_golinowski> M-ms, thank you very much for the help with composing the message - I posted it
diehlpk has joined #ste||ar
hkaiser has joined #ste||ar
diehlpk has quit [Ping timeout: 268 seconds]
<nikunj> hkaiser, yt?
aserio has quit [Quit: aserio]
<nikunj> hkaiser, I was able to add the runtime error using weak linkage. I didn't know that libhpx.so symbols were hidden by default. That was the primary reason for the unexpected behavior at runtime, which made me believe it was not possible. Adding default visibility along with weak symbol linking makes it work.
<nikunj> I have opened a PR for this (https://github.com/STEllAR-GROUP/hpx/pull/3385)
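For reference, a generic sketch of the technique described here (a weak symbol exported with default visibility that a strong definition can override); it is not the actual code from PR #3385, and the function name is hypothetical:

    // In the shared library: weak default, exported so the dynamic linker sees it.
    extern "C" __attribute__((weak, visibility("default")))
    int hpx_main_is_present()
    {
        return 0;  // hypothetical name; weak fallback meaning "hpx_main not included"
    }

    // A translation unit that includes the hpx_main header would provide a strong
    // definition, overriding the weak one at link/load time:
    // extern "C" int hpx_main_is_present() { return 1; }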
nikunj has quit [Quit: goodnight]
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio1 is now known as aserio
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
aserio has quit [Quit: aserio]
eschnett has quit [Quit: eschnett]
jbjnr_ has joined #ste||ar
<jbjnr_> can anyone remember how to get the (libfabric) parcelport from the parcelhandler
<jbjnr_> hpx::parcelset::parcelhandler &ph = hpx::get_runtime().get_parcel_handler();
<jbjnr_> gives me the parcelhandler, but I can't remember how to get the parcelport itself
<jbjnr_> auto pp = ph.get_default_parcelport();
<jbjnr_> balls. that's not in master
jbjnr_ has quit [Ping timeout: 265 seconds]