#ste||ar on 2018-07-13 — irc logs at irclog.cct.lsu.edu

2018-04-23 16:40 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1

00:32 diehlpk has joined #ste||ar

01:01 diehlpk has quit [Ping timeout: 260 seconds]

01:07 diehlpk has joined #ste||ar

01:40 K-ballo has quit [Quit: K-ballo]

02:15 diehlpk has quit [Ping timeout: 264 seconds]

02:42 hkaiser has quit [Quit: bye]

03:23 nanashi55 has quit [Ping timeout: 260 seconds]

03:26 nanashi55 has joined #ste||ar

03:38 nikunj has joined #ste||ar

03:55 wash has quit [Ping timeout: 256 seconds]

04:06 wash has joined #ste||ar

05:08 wash has quit [Ping timeout: 240 seconds]

05:09 wash has joined #ste||ar

05:27 wash has quit [Ping timeout: 240 seconds]

05:33 wash has joined #ste||ar

05:34 jbjnr has quit [Read error: Connection reset by peer]

05:38 wash has quit [Ping timeout: 264 seconds]

05:39 wash has joined #ste||ar

05:43 wash has quit [Ping timeout: 240 seconds]

05:44 wash has joined #ste||ar

07:54 jakub_golinowski has joined #ste||ar

09:36 hkaiser has joined #ste||ar

09:49 <hkaiser> heller___: yt?

10:02 <heller___> hkaiser: hey

10:02 <hkaiser> hey

10:02 <hkaiser> heller___: would you have time to look into #3378?

10:03 <heller___> hkaiser: probably a missing include

10:03 <hkaiser> yah

10:03 <hkaiser> shouldn't be much

10:06 <jakub_golinowski> M-ms, yt?

10:12 <nikunj> hkaiser, yt?

10:16 <heller___> hkaiser: not at all

10:17 <heller___> took the oportunity to update the code in total

10:17 <hkaiser> heller___: cool

10:18 <hkaiser> heller___: thanks a lot!

10:19 <heller___> no problem...

10:36 <heller___> uh oh

10:36 <heller___> https://gist.github.com/sithhell/892491076911f65a6893f96c8ed97963

10:36 <heller___> nikunj: ^^

10:36 <nikunj> heller___, could you please share the code

10:38 <nikunj> heller___, also which environment are you on?

10:46 <M-ms> jakub_golinowski: here

10:47 <jakub_golinowski> M-ms, could we discuss the paste I posted the day before yesterday?

10:48 <M-ms> yes, definitely

10:48 <M-ms> first, what did you mean by the opencv tests being randomized? are the inputs random?

10:50 <heller___> nikunj: it's clang with ubsan

10:54 <jakub_golinowski> M-ms, so as for the randomization I mean that even though I run exactly the same test, namely "TestStereoCorresp_SGBM.SGBM/2"

10:54 <jakub_golinowski> hpx got 100 samples and pthreads 10

10:55 <M-ms> but if you run the same test with pthreads, do the timings differ significantly?

10:55 <M-ms> it might be that it doesn't rerun the test so many times if the variance is low

10:57 <nikunj> heller___, looks odd

10:57 <M-ms> so from the log you posted we can't tell so much except that with the hpx backend at least 10% of the time is spent doing non-work

10:58 <jakub_golinowski> M-ms, good idea with the dependency on the stddev

10:58 <nikunj> heller___, could you try the same with this pr

10:58 <heller___> which PR?

10:58 <M-ms> so could I ask you to run the hpx backend either with only 4 threads or 8 threads and set the idling parameters so that it idles very quickly?

10:58 <nikunj> https://github.com/stellAR-GROUP/hpx/pull/3375

10:58 <nikunj> heller___, ^^

10:59 <M-ms> at least from some quick testing that I did yesterday the variance with the hpx backend went down considerably and the timings were very comparable to the pthreads backend

10:59 <heller___> nikunj: i will

10:59 <jakub_golinowski> M-ms, ok will do that

10:59 <nikunj> heller___, thanks.

11:00 <M-ms> --hpx:ini=hpx.max_idle_loop_count=1 --hpx:ini=hpx.max_idle_backoff_time=10000 or something like that

11:00 <M-ms> ok, thanks

11:01 <jakub_golinowski> M-ms, why do you say that from the log we know that the 10% hpx was doing non-work?

11:01 <M-ms> jakub_golinowski: btw, check how many threads the pthreads backend uses by default, it might be 4 while hpx would use 8

11:04 <M-ms> jakub_golinowski: 9.02% opencv_perf_cal libhpx.so.1.2.0 [.] hpx::threads::detail::scheduling_loop<hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fif

11:05 <M-ms> it's actually more, but at least 10

11:06 <M-ms> also with the pthreads backend at least 40 + 30 + 12 = 82 % seems to be work, and the same sum is much smaller for hpx

11:06 <jakub_golinowski> M-ms, because I was working the other way - since the 3 cv::calc... functions add up to ~56% then the 44% is non-work

11:07 <M-ms> either way is good, that's why I said at least :)

11:08 <jakub_golinowski> M-ms, so this is not very good - do you know why is that?

11:09 <jakub_golinowski> M-ms, scheduling_loop of 10% is sth expected?

11:09 <jakub_golinowski> or the program is too short?

11:09 <M-ms> it's not necessarily bad, because during the serial portions the scheduler doesn't have any work, so this is not 100% representative

11:10 <M-ms> that's why it's better to try changing the idling parameters

11:10 <M-ms> at least before we draw any final conclusions

11:11 <jakub_golinowski> ok let me remind myself about sepcifics of these params in the hpx docs and run some experimetns

11:11 <M-ms> and like hkaiser said something like apex would give us traces per thread which can be much easier to interpret

11:15 <heller___> nikunj: perfect.

11:15 <nikunj> heller___, does that fix things?

11:16 <heller___> yes

11:16 <nikunj> heller___, that's good to hear

11:22 <jakub_golinowski> M-ms, wow that looks way better

11:36 <jakub_golinowski> M-ms, pthreads makes use of 8 threads

11:37 <M-ms> jakub_golinowski: ok, nice

11:37 <jakub_golinowski> and my observation is that HPX with 8 threads performs slower than with 4 threads - therefore it seems that for this particular tests using hyperthreading introduces a penalty

11:37 <jakub_golinowski> is it known why is that generally?

11:37 <M-ms> so if things look reasonaby I think you should do what the opencv guy said and post some numbers from the dnn tests

11:37 <M-ms> ok, that's quite possible

11:37 <M-ms> is it for both hpx and pthreads?

11:40 <jakub_golinowski> M-ms, I need to figure out how to set number of threads for pthreads from the CL

11:41 <M-ms> have you tried --perf_threads=N?

11:42 <jakub_golinowski> M-ms, nope

11:42 <jakub_golinowski> M-ms, how did you know?

11:43 <M-ms> --help :)

11:43 <jakub_golinowski> ah ...

11:44 <jakub_golinowski> I started reading the source code

11:44 <M-ms> (I tried it yesterday, and I'm only assuming that it actually sets the number of threads for the parallel regions and not something else)

11:48 <jakub_golinowski> M-ms, ok thank you very much for all the hints

11:48 <M-ms> jakub_golinowski: no problem

11:49 <M-ms> things are looking better than I though a week ago

11:49 <M-ms> thought

11:49 <jakub_golinowski> M-ms, so for this test the pthreads backend seems to respect the --perf_threads but hpx backend not

11:49 <jakub_golinowski> I need to use --hpx:threads=4

11:49 <M-ms> yeah, that's expected when you're using hpx_main

11:49 <jakub_golinowski> Ah I mean it will be probably for any test

11:50 <jakub_golinowski> but what I also want to say is that for pthreads the performance increases slightly when hyperthreading is introduce

11:50 <M-ms> but I think you can pass --perf_threads=4 --hpx:threads=4 and one will just be ignored depending on which backend you're using

11:51 <M-ms> ok, that's interesting, in that case you can run the tests with both 4 and 8 threads

11:52 <jakub_golinowski> hmm, strange -> pthreads has also bigger variation for 4 threads

11:52 <jakub_golinowski> and the utilization over time looks like it is not optimal

11:53 <jakub_golinowski> M-ms, So before I satrt running the tests I think we should agree on some parameter scope

11:53 <jakub_golinowski> M-ms, 4threads vs 8 threads is for me obligatory

11:54 <jakub_golinowski> M-ms, and the Q is: what backoff params should I use -> just the ones that seem good now?

11:57 <M-ms> jakub_golinowski: yep, 4 and 8 threads

11:57 <M-ms> use max_idle_loop_count-1

11:57 <M-ms> =1

11:58 <M-ms> max_idle_backoff_time doesn't matter too much

11:58 <M-ms> and if you want you can run with the default hpx settings as well

11:58 <jakub_golinowski> Because I seem to not be able to change --hpx:ini=hpx.max_idle_backoff_time=10000 from the CL - which I believe is due to cmake setting

11:58 <jakub_golinowski> and I would have to rebuild hpx and then opencv

11:58 <M-ms> it complains that it' doesn't exist?

11:59 <M-ms> I might have gotten it wrong

11:59 <jakub_golinowski> no it just seems to be not affected in the config dump

11:59 <jakub_golinowski> but the --hpx:ini=hpx.max_idle_backoff_time=10000 is not listed as unknown option as well

12:00 <M-ms> hmm, ok, could be a bug in the dump as well (hope not), but I remember checking that it makes a difference... so leave that one out

12:00 <jakub_golinowski> M-ms, hmm but HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is set to ON, when I view it in ccmake of the hpx build

12:01 <jakub_golinowski> M-ms, it makes a difference or not?

12:01 <M-ms> yeah, it would complain about max_idle_backoff_time otherwise

12:01 <M-ms> it shouldn't make a big difference, the default is 100 ms which is more than enough to have practically 0% cpu usage when idling

12:03 <jakub_golinowski> ...

12:03 <jakub_golinowski> sorry but it seems to work now

12:04 <M-ms> ok, good

12:04 <jakub_golinowski> so to sum up: (1) 4threads and hpx.max_idle_loop_count=1 (2) 8threads and hpx.max_idle_loop_count=(default)

12:08 <jakub_golinowski> (3) 4threads hpx.max_idle_loop_count=(default) and (4) 8threads and hpx.max_idle_loop_count=1

12:10 <M-ms> jakub_golinowski: yep, looks ok

12:18 K-ballo has joined #ste||ar

12:50 <jakub_golinowski> M-ms, here are the results: https://docs.google.com/spreadsheets/d/1Iu0jFMiLvj9NrOL4pP8gjR6vNUAT5UiJbmCKYhHe4cY/edit?usp=sharing

12:50 <jakub_golinowski> this is just for the dnn test

12:52 <M-ms> jakub_golinowski: cool, so both get worse with 8 threads and hpx is basically as good as pthreads with 4 threads

12:53 <M-ms> that's the total time of the full test I suppose? are there any concerning variations within the tests?

12:53 <jakub_golinowski> M-ms, yes, this is how it looks like from the table

12:58 <jakub_golinowski> M-ms, from the eyballing in meld it seems that hpx has more often more samples but most of the time they have equal number of samples

12:59 <jakub_golinowski> M-ms, also it seems like the hpx is winning on the longer tests

13:18 <M-ms> jakub_golinowski: ok, that's not bad though

13:18 <nikunj> hkaiser, yt?

13:18 <M-ms> maybe you could rerun your mandelbrot benchmarks overnight now with the new idling settings?

13:18 <hkaiser> here

13:19 <nikunj> hkaiser, is there a way I could use my variables defined in hpx_wrap or hpx_main in hpx_init.cpp (where we're trying to debug for hpx::init running more than once)?

13:19 <nikunj> I was getting undefined symbol when I tried to extern the symbol and use it

13:20 <hkaiser> nikunj: hpx_init is linked whenever your library is linked as well, so it should work - could be a question of library sequencing on the linker command line, though

13:22 <nikunj> hkaiser, yes it seems like a library sequencing issue to me as well

13:24 <nikunj> hkaiser, I got the undefined symbol while linking libhpx.so after building it

13:24 <hkaiser> libhpx.so does not link against hpx_init or hpx_wrap for that matter

13:25 <nikunj> sorry it was libhpx_init

13:25 <hkaiser> ahh yes, hpx_init.cpp is part of libhpx, not libhpx_init

13:25 <hkaiser> that should work, then

13:28 <nikunj> hkaiser, here: https://gist.github.com/NK-Nikunj/2d99cb62574bc5d2f663de7a7ee31af8

13:29 <hkaiser> it is libhpx.so after all

13:29 <hkaiser> gtg, sorry

13:29 hkaiser has quit [Quit: bye]

14:05 hkaiser has joined #ste||ar

14:08 diehlpk_work has quit [Read error: Connection reset by peer]

14:08 diehlpk has joined #ste||ar

14:25 <nikunj> hkaiser, I think the reason it can't find the symbol while linking libhpx is because libhpx_wrap has not been build/linked

14:29 <hkaiser> no

14:29 <hkaiser> libhpx does not link against libhpx_wrap

14:29 <hkaiser> it should not

14:30 <hkaiser> at least I believe it shouldn't

14:30 <nikunj> yes libhpx_wrap should not be linked with any other shared/static library

14:30 <nikunj> but it won't generate the shared object file when I'm trying to use a symbol from libhpx_wrap

14:35 <nikunj> hkaiser, do you have an idea to get a check for the value of include_libhpx_wrap inside of hpx_init.cpp?

14:39 <hkaiser> make nclude_libhpx_wrap a weak symbol in libhpx.so?

14:39 <nikunj> hkaiser, I tried it, but it somehow does not override it

14:40 <nikunj> so even when hpx_main is included, it won't change it's value

14:40 <nikunj> that was my idea yesterday when I told you that I might have got something wrong. But I somehow could not make it to work

14:43 <hkaiser> k

14:47 <nikunj> hkaiser, btw I think the pr(#3375) solves most of the problems which were otherwise breaking master.

15:37 _bibek_ has joined #ste||ar

15:38 diehlpk_work has joined #ste||ar

15:41 bibek has quit [Ping timeout: 260 seconds]

15:44 quaz0r has quit [Ping timeout: 244 seconds]

15:44 quaz0rus has joined #ste||ar

15:45 quaz0rus is now known as quaz0r

16:06 <nikunj> hkaiser, did I fail 2nd evaluation coz I see evaluation as complete and not passed?

16:13 anushi has quit [Remote host closed the connection]

16:14 anushi has joined #ste||ar

16:16 galabc has joined #ste||ar

16:23 anushi has quit [Ping timeout: 265 seconds]

16:26 jakub_golinowski has quit [Quit: Ex-Chat]

16:26 jakub_golinowski has joined #ste||ar

16:28 <nikunj> it says passed now :)

16:31 K-ballo has quit [Quit: K-ballo]

16:31 jakub_golinowski has quit [Ping timeout: 256 seconds]

16:31 jakub_golinowski has joined #ste||ar

16:43 jakub_golinowski has quit [Ping timeout: 256 seconds]

16:45 jakub_golinowski has joined #ste||ar

16:59 K-ballo has joined #ste||ar

17:09 anushi has joined #ste||ar

17:16 eschnett has joined #ste||ar

17:22 <jakub_golinowski> M-ms, just to keep you up-to-date

17:22 <jakub_golinowski> I realized that to have the dnn perf tests requested in the pr I need to download some data

17:22 anushi has quit [Ping timeout: 265 seconds]

17:23 <jakub_golinowski> I started it a long time ago, a few hours already but because of the internet breaking/slowing down a lot it is still not finished.

17:25 <jakub_golinowski> In the mean time I tried running the python scripts from opencv to see if they will make things easier but I discovered some errors in the script and am currently in the discussion at opencv irc

17:36 mcopik has joined #ste||ar

17:40 <M-ms> jakub_golinowski: thanks for the heads up, following your conversation

17:41 <jakub_golinowski> M-ms, btw do you have it all - because there seems to be no opencv irc log and I was constantly dropped and logged in :|

17:42 <M-ms> so do you know what the dnn perf test runs if you don't have that data? there's already some in the opencv_extra repo but you mean there's more?

17:42 <M-ms> you mean a log of the opencv channel?

17:42 <M-ms> yeah, I can paste it somewhere if you want

17:43 <jakub_golinowski> M-ms, without data first dnn tests are skipped

17:43 <jakub_golinowski> (can be seen in the log run)

17:43 <jakub_golinowski> there is a python script in the opecnv_extra/testdata/dnn/ for downloading the dnn test data

17:43 <jakub_golinowski> I mean some heavy extra data

17:44 <M-ms> ah, ok, wasn't looking that carefully...

18:04 mcopik has quit [Ping timeout: 244 seconds]

18:20 mcopik has joined #ste||ar

18:27 <github> [hpx] hkaiser pushed 2 new commits to master: https://git.io/fNqQD

18:27 <github> hpx/master 7282bcd Nikunj Gupta: Replaces wrapper for __libc_start_main with main

18:27 <github> hpx/master 7dacb53 Hartmut Kaiser: Merge pull request #3375 from NK-Nikunj/Linux_better_impl...

18:30 <nikunj> hkaiser, yt?

18:34 diehlpk has quit [Ping timeout: 240 seconds]

18:48 nikunj has quit [Quit: goodnight]

18:49 <hkaiser> nihere

18:50 mcopik has quit [Ping timeout: 265 seconds]

19:04 diehlpk has joined #ste||ar

19:18 galabc has quit [Quit: Leaving]

19:36 hkaiser has quit [Quit: bye]

20:01 jakub_golinowski has quit [Quit: Ex-Chat]

20:38 diehlpk has quit [Ping timeout: 260 seconds]

20:59 hkaiser has joined #ste||ar

21:03 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/fNmJZ

21:03 <github> hpx/master 5192142 Hartmut Kaiser: Merge pull request #3377 from STEllAR-GROUP/integrate_hpxmp...

22:39 eschnett has quit [Quit: eschnett]

22:51 rod_ has joined #ste||ar

22:53 rod_ has left #ste||ar [#ste||ar]

23:02 heller_ has joined #ste||ar

23:02 heller___ has quit [Read error: Connection reset by peer]