hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 264 seconds]
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi55 has joined #ste||ar
nikunj has joined #ste||ar
wash has quit [Ping timeout: 256 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
jbjnr has quit [Read error: Connection reset by peer]
wash has quit [Ping timeout: 264 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
jakub_golinowski has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> heller___: yt?
<heller___> hkaiser: hey
<hkaiser> hey
<hkaiser> heller___: would you have time to look into #3378?
<heller___> hkaiser: probably a missing include
<hkaiser> yah
<hkaiser> shouldn't be much
<jakub_golinowski> M-ms, yt?
<nikunj> hkaiser, yt?
<heller___> hkaiser: not at all
<heller___> took the opportunity to update the code in total
<hkaiser> heller___: cool
<hkaiser> heller___: thanks a lot!
<heller___> no problem...
<heller___> uh oh
<heller___> nikunj: ^^
<nikunj> heller___, could you please share the code
<nikunj> heller___, also which environment are you on?
<M-ms> jakub_golinowski: here
<jakub_golinowski> M-ms, could we discuss the paste I posted the day before yesterday?
<M-ms> yes, definitely
<M-ms> first, what did you mean by the opencv tests being randomized? are the inputs random?
<heller___> nikunj: it's clang with ubsan
<jakub_golinowski> M-ms, so as for the randomization I mean that even though I run exactly the same test, namely "TestStereoCorresp_SGBM.SGBM/2"
<jakub_golinowski> hpx got 100 samples and pthreads 10
<M-ms> but if you run the same test with pthreads, do the timings differ significantly?
<M-ms> it might be that it doesn't rerun the test so many times if the variance is low
<nikunj> heller___, looks odd
<M-ms> so from the log you posted we can't tell so much except that with the hpx backend at least 10% of the time is spent doing non-work
<jakub_golinowski> M-ms, good idea with the dependency on the stddev
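Editor's sketch of the idea M-ms raises above: a harness that keeps sampling until the relative standard deviation drops below a threshold would explain why one backend got 10 samples and the other 100. This is not OpenCV's actual perf-test logic; the function name `samples_needed`, the 3% threshold, and the synthetic timings are all assumptions made for illustration. Only the 10 and 100 sample counts come from the conversation.

```python
import statistics


def samples_needed(timings, min_samples=10, max_samples=100, rel_stddev_limit=0.03):
    """Return how many samples a hypothetical harness takes before stopping.

    The harness always takes at least min_samples, then stops as soon as
    stdev / mean falls to rel_stddev_limit or below, or at max_samples.
    """
    taken = []
    for t in timings[:max_samples]:
        taken.append(t)
        if len(taken) >= min_samples:
            mean = statistics.mean(taken)
            if statistics.stdev(taken) / mean <= rel_stddev_limit:
                break
    return len(taken)


# Synthetic timings (assumed): one steady series, one with large run-to-run swings.
steady = [10.0 + 0.01 * (i % 2) for i in range(100)]
noisy = [10.0 + 5.0 * (i % 2) for i in range(100)]

print(samples_needed(steady))  # stops at the minimum sample count
print(samples_needed(noisy))   # runs to the maximum sample count
```

Under these assumptions, a larger sample count is a symptom of higher variance rather than a separate effect, which matches the suggestion to first reduce the variance via the idling parameters.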
<nikunj> heller___, could you try the same with this pr
<heller___> which PR?
<M-ms> so could I ask you to run the hpx backend either with only 4 threads or 8 threads and set the idling parameters so that it idles very quickly?
<nikunj> heller___, ^^
<M-ms> at least from some quick testing that I did yesterday the variance with the hpx backend went down considerably and the timings were very comparable to the pthreads backend
<heller___> nikunj: i will
<jakub_golinowski> M-ms, ok will do that
<nikunj> heller___, thanks.
<M-ms> --hpx:ini=hpx.max_idle_loop_count=1 --hpx:ini=hpx.max_idle_backoff_time=10000 or something like that
<M-ms> ok, thanks
<jakub_golinowski> M-ms, why do you say that from the log we know that the 10% hpx was doing non-work?
<M-ms> jakub_golinowski: btw, check how many threads the pthreads backend uses by default, it might be 4 while hpx would use 8
<M-ms> jakub_golinowski: 9.02% opencv_perf_cal libhpx.so.1.2.0 [.] hpx::threads::detail::scheduling_loop<hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fif
<M-ms> it's actually more, but at least 10
<M-ms> also with the pthreads backend at least 40 + 30 + 12 = 82 % seems to be work, and the same sum is much smaller for hpx
<jakub_golinowski> M-ms, because I was working the other way - since the 3 cv::calc... functions add up to ~56% then the 44% is non-work
<M-ms> either way is good, that's why I said at least :)
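Editor's sketch of the two estimates discussed above, using only the rounded percentages quoted in the conversation (no new measurements). The variable names are hypothetical. M-ms reads a lower bound on non-work time from the scheduling_loop entry, while jakub_golinowski subtracts the work share from 100% to get an upper estimate.

```python
# All figures are the rounded percentages quoted in the chat, not new data.
pthreads_work_shares = [40, 30, 12]  # the three work functions under pthreads
hpx_work_total = 56                  # approximate sum of the same functions under HPX
hpx_scheduling_loop = 9.02           # scheduling_loop entry from the perf output

# Share of time the pthreads backend spends in the work functions.
pthreads_work_total = sum(pthreads_work_shares)

# Lower bound: time seen directly in the scheduling loop.
hpx_non_work_lower = hpx_scheduling_loop

# Upper estimate: everything that is not one of the work functions.
hpx_non_work_upper = 100 - hpx_work_total

print(f"pthreads work: {pthreads_work_total}%")
print(f"hpx non-work: between {hpx_non_work_lower}% and {hpx_non_work_upper}%")
```

The two numbers bracket the real overhead, which is why both readings of the profile are compatible.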
<jakub_golinowski> M-ms, so this is not very good - do you know why that is?
<jakub_golinowski> M-ms, scheduling_loop of 10% is sth expected?
<jakub_golinowski> or the program is too short?
<M-ms> it's not necessarily bad, because during the serial portions the scheduler doesn't have any work, so this is not 100% representative
<M-ms> that's why it's better to try changing the idling parameters
<M-ms> at least before we draw any final conclusions
<jakub_golinowski> ok let me remind myself about the specifics of these params in the hpx docs and run some experiments
<M-ms> and like hkaiser said something like apex would give us traces per thread which can be much easier to interpret
<heller___> nikunj: perfect.
<nikunj> heller___, does that fix things?
<heller___> yes
<nikunj> heller___, that's good to hear
<jakub_golinowski> M-ms, wow that looks way better
<jakub_golinowski> M-ms, pthreads makes use of 8 threads
<M-ms> jakub_golinowski: ok, nice
<jakub_golinowski> and my observation is that HPX with 8 threads performs slower than with 4 threads - therefore it seems that for this particular test using hyperthreading introduces a penalty
<jakub_golinowski> is it generally known why that is?
<M-ms> so if things look reasonable I think you should do what the opencv guy said and post some numbers from the dnn tests
<M-ms> ok, that's quite possible
<M-ms> is it for both hpx and pthreads?
<jakub_golinowski> M-ms, I need to figure out how to set number of threads for pthreads from the CL
<M-ms> have you tried --perf_threads=N?
<jakub_golinowski> M-ms, nope
<jakub_golinowski> M-ms, how did you know?
<M-ms> --help :)
<jakub_golinowski> ah ...
<jakub_golinowski> I started reading the source code
<M-ms> (I tried it yesterday, and I'm only assuming that it actually sets the number of threads for the parallel regions and not something else)
<jakub_golinowski> M-ms, ok thank you very much for all the hints
<M-ms> jakub_golinowski: no problem
<M-ms> things are looking better than I though a week ago
<M-ms> thought
<jakub_golinowski> M-ms, so for this test the pthreads backend seems to respect the --perf_threads but hpx backend not
<jakub_golinowski> I need to use --hpx:threads=4
<M-ms> yeah, that's expected when you're using hpx_main
<jakub_golinowski> Ah I mean it will be probably for any test
<jakub_golinowski> but what I also want to say is that for pthreads the performance increases slightly when hyperthreading is introduced
<M-ms> but I think you can pass --perf_threads=4 --hpx:threads=4 and one will just be ignored depending on which backend you're using
<M-ms> ok, that's interesting, in that case you can run the tests with both 4 and 8 threads
<jakub_golinowski> hmm, strange -> pthreads has also bigger variation for 4 threads
<jakub_golinowski> and the utilization over time looks like it is not optimal
<jakub_golinowski> M-ms, So before I start running the tests I think we should agree on some parameter scope
<jakub_golinowski> M-ms, 4threads vs 8 threads is for me obligatory
<jakub_golinowski> M-ms, and the Q is: what backoff params should I use -> just the ones that seem good now?
<M-ms> jakub_golinowski: yep, 4 and 8 threads
<M-ms> use max_idle_loop_count-1
<M-ms> =1
<M-ms> max_idle_backoff_time doesn't matter too much
<M-ms> and if you want you can run with the default hpx settings as well
<jakub_golinowski> Because I don't seem to be able to change --hpx:ini=hpx.max_idle_backoff_time=10000 from the CL - which I believe is due to a cmake setting
<jakub_golinowski> and I would have to rebuild hpx and then opencv
<M-ms> it complains that it doesn't exist?
<M-ms> I might have gotten it wrong
<jakub_golinowski> no it just seems to be not affected in the config dump
<jakub_golinowski> but the --hpx:ini=hpx.max_idle_backoff_time=10000 is not listed as unknown option as well
<M-ms> hmm, ok, could be a bug in the dump as well (hope not), but I remember checking that it makes a difference... so leave that one out
<jakub_golinowski> M-ms, hmm but HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is set to ON, when I view it in ccmake of the hpx build
<jakub_golinowski> M-ms, it makes a difference or not?
<M-ms> yeah, it would complain about max_idle_backoff_time otherwise
<M-ms> it shouldn't make a big difference, the default is 100 ms which is more than enough to have practically 0% cpu usage when idling
<jakub_golinowski> ...
<jakub_golinowski> sorry but it seems to work now
<M-ms> ok, good
<jakub_golinowski> so to sum up: (1) 4threads and hpx.max_idle_loop_count=1 (2) 8threads and hpx.max_idle_loop_count=(default)
<jakub_golinowski> (3) 4threads hpx.max_idle_loop_count=(default) and (4) 8threads and hpx.max_idle_loop_count=1
<M-ms> jakub_golinowski: yep, looks ok
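Editor's sketch of the four-run matrix summarized above, generated as command lines. The flags `--perf_threads`, `--hpx:threads`, and `--hpx:ini=hpx.max_idle_loop_count` are the ones quoted in the conversation; the binary name `./opencv_perf_dnn` and the helper `build_commands` are assumptions. Passing both thread flags follows M-ms's suggestion that the unused one is probably ignored by the other backend, which has not been verified here.

```python
from itertools import product


def build_commands(binary="./opencv_perf_dnn"):  # binary name is an assumption
    """Build one argument list per (threads, idle-loop setting) combination."""
    commands = []
    # None means "leave hpx.max_idle_loop_count at its default".
    for threads, idle_loop_count in product((4, 8), (1, None)):
        args = [
            binary,
            f"--perf_threads={threads}",  # thread count for the pthreads backend
            f"--hpx:threads={threads}",   # thread count for the HPX backend
        ]
        if idle_loop_count is not None:
            args.append(f"--hpx:ini=hpx.max_idle_loop_count={idle_loop_count}")
        commands.append(args)
    return commands


for cmd in build_commands():
    print(" ".join(cmd))
```

Keeping the matrix in one place makes it easier to rerun the same four configurations when the test binary or the idling settings change.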
K-ballo has joined #ste||ar
<jakub_golinowski> this is just for the dnn test
<M-ms> jakub_golinowski: cool, so both get worse with 8 threads and hpx is basically as good as pthreads with 4 threads
<M-ms> that's the total time of the full test I suppose? are there any concerning variations within the tests?
<jakub_golinowski> M-ms, yes, this is how it looks from the table
<jakub_golinowski> M-ms, from eyeballing it in meld it seems that hpx more often has more samples, but most of the time they have an equal number of samples
<jakub_golinowski> M-ms, also it seems like the hpx is winning on the longer tests
<M-ms> jakub_golinowski: ok, that's not bad though
<nikunj> hkaiser, yt?
<M-ms> maybe you could rerun your mandelbrot benchmarks overnight now with the new idling settings?
<hkaiser> here
<nikunj> hkaiser, is there a way I could use my variables defined in hpx_wrap or hpx_main in hpx_init.cpp (where we're trying to debug for hpx::init running more than once)?
<nikunj> I was getting undefined symbol when I tried to extern the symbol and use it
<hkaiser> nikunj: hpx_init is linked whenever your library is linked as well, so it should work - could be a question of library sequencing on the linker command line, though
<nikunj> hkaiser, yes it seems like a library sequencing issue to me as well
<nikunj> hkaiser, I got the undefined symbol while linking libhpx.so after building it
<hkaiser> libhpx.so does not link against hpx_init or hpx_wrap for that matter
<nikunj> sorry it was libhpx_init
<hkaiser> ahh yes, hpx_init.cpp is part of libhpx, not libhpx_init
<hkaiser> that should work, then
<hkaiser> it is libhpx.so after all
<hkaiser> gtg, sorry
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
diehlpk_work has quit [Read error: Connection reset by peer]
diehlpk has joined #ste||ar
<nikunj> hkaiser, I think the reason it can't find the symbol while linking libhpx is because libhpx_wrap has not been built/linked
<hkaiser> no
<hkaiser> libhpx does not link against libhpx_wrap
<hkaiser> it should not
<hkaiser> at least I believe it shouldn't
<nikunj> yes libhpx_wrap should not be linked with any other shared/static library
<nikunj> but it won't generate the shared object file when I'm trying to use a symbol from libhpx_wrap
<nikunj> hkaiser, do you have an idea to get a check for the value of include_libhpx_wrap inside of hpx_init.cpp?
<hkaiser> make include_libhpx_wrap a weak symbol in libhpx.so?
<nikunj> hkaiser, I tried it, but it somehow does not override it
<nikunj> so even when hpx_main is included, it won't change its value
<nikunj> that was my idea yesterday when I told you that I might have got something wrong. But I somehow could not make it work
<hkaiser> k
<nikunj> hkaiser, btw I think the pr(#3375) solves most of the problems which were otherwise breaking master.
_bibek_ has joined #ste||ar
diehlpk_work has joined #ste||ar
bibek has quit [Ping timeout: 260 seconds]
quaz0r has quit [Ping timeout: 244 seconds]
quaz0rus has joined #ste||ar
quaz0rus is now known as quaz0r
<nikunj> hkaiser, did I fail 2nd evaluation coz I see evaluation as complete and not passed?
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
galabc has joined #ste||ar
anushi has quit [Ping timeout: 265 seconds]
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
<nikunj> it says passed now :)
K-ballo has quit [Quit: K-ballo]
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
K-ballo has joined #ste||ar
anushi has joined #ste||ar
eschnett has joined #ste||ar
<jakub_golinowski> M-ms, just to keep you up-to-date
<jakub_golinowski> I realized that to have the dnn perf tests requested in the pr I need to download some data
anushi has quit [Ping timeout: 265 seconds]
<jakub_golinowski> I started it a long time ago, a few hours already but because of the internet breaking/slowing down a lot it is still not finished.
<jakub_golinowski> In the meantime I tried running the python scripts from opencv to see if they would make things easier, but I discovered some errors in the script and am currently discussing them on the opencv irc
mcopik has joined #ste||ar
<M-ms> jakub_golinowski: thanks for the heads up, following your conversation
<jakub_golinowski> M-ms, btw do you have it all - because there seems to be no opencv irc log and I was constantly dropped and logged in :|
<M-ms> so do you know what the dnn perf test runs if you don't have that data? there's already some in the opencv_extra repo but you mean there's more?
<M-ms> you mean a log of the opencv channel?
<M-ms> yeah, I can paste it somewhere if you want
<jakub_golinowski> M-ms, without data first dnn tests are skipped
<jakub_golinowski> (can be seen in the log run)
<jakub_golinowski> there is a python script in opencv_extra/testdata/dnn/ for downloading the dnn test data
<jakub_golinowski> I mean some heavy extra data
<M-ms> ah, ok, wasn't looking that carefully...
mcopik has quit [Ping timeout: 244 seconds]
mcopik has joined #ste||ar
<github> [hpx] hkaiser pushed 2 new commits to master: https://git.io/fNqQD
<github> hpx/master 7282bcd Nikunj Gupta: Replaces wrapper for __libc_start_main with main
<github> hpx/master 7dacb53 Hartmut Kaiser: Merge pull request #3375 from NK-Nikunj/Linux_better_impl...
<nikunj> hkaiser, yt?
diehlpk has quit [Ping timeout: 240 seconds]
nikunj has quit [Quit: goodnight]
<hkaiser> nihere
mcopik has quit [Ping timeout: 265 seconds]
diehlpk has joined #ste||ar
galabc has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
jakub_golinowski has quit [Quit: Ex-Chat]
diehlpk has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/fNmJZ
<github> hpx/master 5192142 Hartmut Kaiser: Merge pull request #3377 from STEllAR-GROUP/integrate_hpxmp...
eschnett has quit [Quit: eschnett]
rod_ has joined #ste||ar
rod_ has left #ste||ar [#ste||ar]
heller_ has joined #ste||ar
heller___ has quit [Read error: Connection reset by peer]