hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 264 seconds]
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi55 has joined #ste||ar
nikunj has joined #ste||ar
wash has quit [Ping timeout: 256 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
jbjnr has quit [Read error: Connection reset by peer]
wash has quit [Ping timeout: 264 seconds]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar
jakub_golinowski has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser>
heller___: yt?
<heller___>
hkaiser: hey
<hkaiser>
hey
<hkaiser>
heller___: would you have time to look into #3378?
<heller___>
hkaiser: probably a missing include
<hkaiser>
yah
<hkaiser>
shouldn't be much
<jakub_golinowski>
M-ms, yt?
<nikunj>
hkaiser, yt?
<heller___>
hkaiser: not at all
<heller___>
took the opportunity to update the code in total
<M-ms>
at least from some quick testing that I did yesterday the variance with the hpx backend went down considerably and the timings were very comparable to the pthreads backend
<heller___>
nikunj: i will
<jakub_golinowski>
M-ms, ok will do that
<nikunj>
heller___, thanks.
<M-ms>
--hpx:ini=hpx.max_idle_loop_count=1 --hpx:ini=hpx.max_idle_backoff_time=10000 or something like that
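[For reference, these are runtime settings passed to HPX via --hpx:ini on the command line of any HPX-enabled binary. A minimal sketch against an OpenCV perf test binary (the binary name here is an assumption for illustration):

    ./opencv_perf_core --hpx:ini=hpx.max_idle_loop_count=1 \
                       --hpx:ini=hpx.max_idle_backoff_time=10000

hpx.max_idle_loop_count bounds how long a scheduler thread spins looking for work before backing off; hpx.max_idle_backoff_time caps that backoff (later in this log its default is said to be 100 ms).]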
<M-ms>
ok, thanks
<jakub_golinowski>
M-ms, why do you say that from the log we know that hpx was doing non-work for 10% of the time?
<M-ms>
jakub_golinowski: btw, check how many threads the pthreads backend uses by default, it might be 4 while hpx would use 8
<M-ms>
also with the pthreads backend at least 40 + 30 + 12 = 82 % seems to be work, and the same sum is much smaller for hpx
<jakub_golinowski>
M-ms, because I was working the other way - since the 3 cv::calc... functions add up to ~56%, the remaining 44% is non-work
<M-ms>
either way is good, that's why I said at least :)
<jakub_golinowski>
M-ms, so this is not very good - do you know why that is?
<jakub_golinowski>
M-ms, is a scheduling_loop share of 10% something expected?
<jakub_golinowski>
or is the program too short?
<M-ms>
it's not necessarily bad, because during the serial portions the scheduler doesn't have any work, so this is not 100% representative
<M-ms>
that's why it's better to try changing the idling parameters
<M-ms>
at least before we draw any final conclusions
<jakub_golinowski>
ok let me remind myself about the specifics of these params in the hpx docs and run some experiments
<M-ms>
and like hkaiser said something like apex would give us traces per thread which can be much easier to interpret
<heller___>
nikunj: perfect.
<nikunj>
heller___, does that fix things?
<heller___>
yes
<nikunj>
heller___, that's good to hear
<jakub_golinowski>
M-ms, wow that looks way better
<jakub_golinowski>
M-ms, pthreads makes use of 8 threads
<M-ms>
jakub_golinowski: ok, nice
<jakub_golinowski>
and my observation is that HPX with 8 threads performs slower than with 4 threads - therefore it seems that for this particular test using hyperthreading introduces a penalty
<jakub_golinowski>
is it generally known why that happens?
<M-ms>
so if things look reasonable I think you should do what the opencv guy said and post some numbers from the dnn tests
<M-ms>
ok, that's quite possible
<M-ms>
is it for both hpx and pthreads?
<jakub_golinowski>
M-ms, I need to figure out how to set number of threads for pthreads from the CL
<M-ms>
have you tried --perf_threads=N?
<jakub_golinowski>
M-ms, nope
<jakub_golinowski>
M-ms, how did you know?
<M-ms>
--help :)
<jakub_golinowski>
ah ...
<jakub_golinowski>
I started reading the source code
<M-ms>
(I tried it yesterday, and I'm only assuming that it actually sets the number of threads for the parallel regions and not something else)
<jakub_golinowski>
M-ms, ok thank you very much for all the hints
<M-ms>
jakub_golinowski: no problem
<M-ms>
things are looking better than I thought a week ago
<jakub_golinowski>
M-ms, so for this test the pthreads backend seems to respect --perf_threads but the hpx backend does not
<jakub_golinowski>
I need to use --hpx:threads=4
<M-ms>
yeah, that's expected when you're using hpx_main
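[For context, a minimal sketch of the hpx_main mechanism being referred to: including the header replaces main() so that the HPX runtime starts first and consumes the --hpx:* options, which is why --hpx:threads is honored while --perf_threads is left to the application:

    #include <hpx/hpx_main.hpp>  // replaces main() so the HPX runtime
                                 // starts first and strips --hpx:* flags

    int main(int argc, char* argv[])
    {
        // this body already runs as an HPX thread; application-level
        // flags such as --perf_threads remain in argv for the program
        return 0;
    }
]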
<jakub_golinowski>
Ah, I mean it will probably be like that for any test
<jakub_golinowski>
but what I also want to say is that for pthreads the performance increases slightly when hyperthreading is introduced
<M-ms>
but I think you can pass --perf_threads=4 --hpx:threads=4 and one will just be ignored depending on which backend you're using
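[A sketch of that combined invocation, so the same command line works for either backend (the binary name is assumed):

    # the pthreads backend reads --perf_threads, the hpx backend reads
    # --hpx:threads; each backend ignores the other flag
    ./opencv_perf_core --perf_threads=4 --hpx:threads=4
]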
<M-ms>
ok, that's interesting, in that case you can run the tests with both 4 and 8 threads
<jakub_golinowski>
hmm, strange -> pthreads also has bigger variation for 4 threads
<jakub_golinowski>
and the utilization over time looks like it is not optimal
<jakub_golinowski>
M-ms, So before I start running the tests I think we should agree on some parameter scope
<jakub_golinowski>
M-ms, 4 threads vs 8 threads is for me obligatory
<jakub_golinowski>
M-ms, and the Q is: what backoff params should I use -> just the ones that seem good now?
<M-ms>
jakub_golinowski: yep, 4 and 8 threads
<M-ms>
use max_idle_loop_count=1
<M-ms>
max_idle_backoff_time doesn't matter too much
<M-ms>
and if you want you can run with the default hpx settings as well
<jakub_golinowski>
Because I seem to not be able to change --hpx:ini=hpx.max_idle_backoff_time=10000 from the CL - which I believe is due to a cmake setting
<jakub_golinowski>
and I would have to rebuild hpx and then opencv
<M-ms>
it complains that it doesn't exist?
<M-ms>
I might have gotten it wrong
<jakub_golinowski>
no, it just seems to be unaffected in the config dump
<jakub_golinowski>
but --hpx:ini=hpx.max_idle_backoff_time=10000 is not listed as an unknown option either
<M-ms>
hmm, ok, could be a bug in the dump as well (hope not), but I remember checking that it makes a difference... so leave that one out
<jakub_golinowski>
M-ms, hmm but HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is set to ON, when I view it in ccmake of the hpx build
<jakub_golinowski>
M-ms, does it make a difference or not?
<M-ms>
yeah, it would complain about max_idle_backoff_time otherwise
<M-ms>
it shouldn't make a big difference, the default is 100 ms which is more than enough to have practically 0% cpu usage when idling
<jakub_golinowski>
...
<jakub_golinowski>
sorry but it seems to work now
<M-ms>
ok, good
<jakub_golinowski>
so to sum up: (1) 4 threads and hpx.max_idle_loop_count=1, (2) 8 threads and hpx.max_idle_loop_count=(default),
<jakub_golinowski>
(3) 4 threads and hpx.max_idle_loop_count=(default), and (4) 8 threads and hpx.max_idle_loop_count=1
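[A minimal shell sketch of that 2x2 sweep; the test binary name and log file naming are assumptions:

    for t in 4 8; do
        # (default) idling settings
        ./opencv_perf_core --hpx:threads=$t | tee "hpx_${t}_default.log"
        # aggressive idle backoff, as agreed above
        ./opencv_perf_core --hpx:threads=$t \
            --hpx:ini=hpx.max_idle_loop_count=1 | tee "hpx_${t}_loop1.log"
    done
]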
<M-ms>
jakub_golinowski: cool, so both get worse with 8 threads and hpx is basically as good as pthreads with 4 threads
<M-ms>
that's the total time of the full test I suppose? are there any concerning variations within the tests?
<jakub_golinowski>
M-ms, yes, this is how it looks from the table
<jakub_golinowski>
M-ms, from eyeballing in meld it seems that hpx more often has more samples, but most of the time they have an equal number of samples
<jakub_golinowski>
M-ms, also it seems like hpx is winning on the longer tests
<M-ms>
jakub_golinowski: ok, that's not bad though
<nikunj>
hkaiser, yt?
<M-ms>
maybe you could rerun your mandelbrot benchmarks overnight now with the new idling settings?
<hkaiser>
here
<nikunj>
hkaiser, is there a way I could use my variables defined in hpx_wrap or hpx_main in hpx_init.cpp (where we're trying to debug hpx::init running more than once)?
<nikunj>
I was getting an undefined symbol when I tried to declare the symbol extern and use it
<hkaiser>
nikunj: hpx_init is linked whenever your library is linked as well, so it should work - could be a question of library sequencing on the linker command line, though
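[On the sequencing point: with static archives, GNU ld resolves symbols left to right, pulling objects from an archive only for references that are already undefined when the archive is reached. A hedged sketch using the library names from this discussion (the actual HPX link lines may differ):

    # can fail if code in libhpx_init references a symbol defined in
    # libhpx_wrap, because libhpx_wrap was already scanned:
    g++ app.o -lhpx_wrap -lhpx_init -lhpx
    # works: the provider comes after the consumer on the command line
    g++ app.o -lhpx_init -lhpx -lhpx_wrap
]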
<nikunj>
hkaiser, yes it seems like a library sequencing issue to me as well
<nikunj>
hkaiser, I got the undefined symbol while linking libhpx.so after building it
<hkaiser>
libhpx.so does not link against hpx_init or hpx_wrap for that matter
<nikunj>
sorry it was libhpx_init
<hkaiser>
ahh yes, hpx_init.cpp is part of libhpx, not libhpx_init
diehlpk_work has quit [Read error: Connection reset by peer]
diehlpk has joined #ste||ar
<nikunj>
hkaiser, I think the reason it can't find the symbol while linking libhpx is that libhpx_wrap has not been built/linked
<hkaiser>
no
<hkaiser>
libhpx does not link against libhpx_wrap
<hkaiser>
it should not
<hkaiser>
at least I believe it shouldn't
<nikunj>
yes libhpx_wrap should not be linked with any other shared/static library
<nikunj>
but it won't generate the shared object file when I'm trying to use a symbol from libhpx_wrap
<nikunj>
hkaiser, do you have an idea how to check the value of include_libhpx_wrap inside hpx_init.cpp?
<hkaiser>
make include_libhpx_wrap a weak symbol in libhpx.so?
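[A minimal sketch of the weak-symbol pattern being suggested, in GCC/Clang syntax; the variable name comes from the discussion, while placement and type are assumptions:

    // in libhpx (e.g. hpx_init.cpp): weak default definition, used
    // whenever no strong definition of the same name is linked in
    extern "C" __attribute__((weak)) bool include_libhpx_wrap = false;

    // in libhpx_wrap: a strong definition of the same name overrides
    // the weak one at static link time
    extern "C" bool include_libhpx_wrap = true;

One caveat that may explain the "does not override" result reported below: when the two definitions live in different shared objects, the ELF dynamic linker takes the first definition in library search order, regardless of weak vs. strong binding.]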
<nikunj>
hkaiser, I tried it, but it somehow does not override it
<nikunj>
so even when hpx_main is included, it won't change its value
<nikunj>
that was my idea yesterday when I told you that I might have got something wrong. But I somehow could not make it work
<hkaiser>
k
<nikunj>
hkaiser, btw I think the PR (#3375) solves most of the problems that were otherwise breaking master.
_bibek_ has joined #ste||ar
diehlpk_work has joined #ste||ar
bibek has quit [Ping timeout: 260 seconds]
quaz0r has quit [Ping timeout: 244 seconds]
quaz0rus has joined #ste||ar
quaz0rus is now known as quaz0r
<nikunj>
hkaiser, did I fail the 2nd evaluation? because I see the evaluation as complete and not passed
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
galabc has joined #ste||ar
anushi has quit [Ping timeout: 265 seconds]
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
<nikunj>
it says passed now :)
K-ballo has quit [Quit: K-ballo]
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
K-ballo has joined #ste||ar
anushi has joined #ste||ar
eschnett has joined #ste||ar
<jakub_golinowski>
M-ms, just to keep you up-to-date
<jakub_golinowski>
I realized that to have the dnn perf tests requested in the pr I need to download some data
anushi has quit [Ping timeout: 265 seconds]
<jakub_golinowski>
I started it a few hours ago, but because the internet keeps breaking/slowing down a lot it is still not finished.
<jakub_golinowski>
In the meantime I tried running the python scripts from opencv to see if they would make things easier, but I discovered some errors in the script and am currently discussing it on the opencv irc
mcopik has joined #ste||ar
<M-ms>
jakub_golinowski: thanks for the heads up, following your conversation
<jakub_golinowski>
M-ms, btw do you have it all? because there seems to be no opencv irc log and I was constantly being dropped and logged back in :|
<M-ms>
so do you know what the dnn perf test runs if you don't have that data? there's already some in the opencv_extra repo but you mean there's more?
<M-ms>
you mean a log of the opencv channel?
<M-ms>
yeah, I can paste it somewhere if you want
<jakub_golinowski>
M-ms, without the data, the first dnn tests are skipped
<jakub_golinowski>
(can be seen in the log of the run)
<jakub_golinowski>
there is a python script in opencv_extra/testdata/dnn/ for downloading the dnn test data
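[A hedged sketch of fetching that data; the script name download_models.py is an assumption based on the opencv_extra layout, so check the directory for the actual name:

    cd opencv_extra/testdata/dnn
    python download_models.py  # script name assumed; fetches dnn test data
]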