hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
galabc has quit [Quit: Leaving]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
twwright_ has joined #ste||ar
hkaiser has quit [Quit: bye]
diehlpk has quit [Ping timeout: 256 seconds]
twwright_ has quit [Quit: twwright_]
K-ballo has quit [Quit: K-ballo]
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
eschnett has quit [Quit: eschnett]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
<M-ms>
jakub_golinowski: yt? Is it the ml test you were running that contains the kmeans test? Did you find out anything new yesterday? I have a bit of time right now to look at it, so I'll see if I can reproduce it
<jakub_golinowski>
M-ms, this is in file test_math.cpp
<jakub_golinowski>
line 2802: class CV_KMeansSingularTest : public cvtest::BaseTest
<M-ms>
ok, thanks
zura has quit [Quit: Leaving]
<M-ms>
jakub_golinowski: two things to look into:
<M-ms>
1) I added some print statements to the hpx backend (starting, stopping, calling parallel_for) and it keeps starting and stopping the backend, it's not completely stuck. How long does it take for that test to run with other backends?
<M-ms>
2) The cout output is interleaved, as if at least two threads were starting and stopping the backend. Could you check whether the kmeans test or gtest spawns multiple threads?
<M-ms>
and for 1. you can also check if the kmeans function is making progress (printing some variables might help).
anushi has quit [Ping timeout: 264 seconds]
anushi has joined #ste||ar
<M-ms>
(also check that it's actually looping the same way for you, or if it's actually stuck)
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
<jakub_golinowski>
M-ms, thank you for the hints
<jakub_golinowski>
When I pause the program after it hangs (CPU usage drops to 0), I see 6 threads in total (thread 1 plus 5 threads from HPX)
<jakub_golinowski>
Also, I did not explicitly ask for more threads, and for the non-parallel_for cases I see only a single core being occupied, so I believe the test framework itself is single-threaded
<jakub_golinowski>
M-ms, for point 1): this test seems to be very quick with the other backends (pthreads 165 ms, tbb 212 ms), but for HPX with hpx_main included it takes 50 seconds! It does pass, though.
anushi has quit [Ping timeout: 256 seconds]
anushi has joined #ste||ar
<M-ms>
jakub_golinowski: that's interesting, so then printing something from kmeans might be useful (something each iteration, and total number of iterations for example)
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
<M-ms>
it looks like the kmeans test also doesn't check the result very strictly, so that may be a hint that something is still wrong in the backend (though the backend still looks ok to me)
<M-ms>
sorry I can't be of more help at the moment
<M-ms>
and about the next call, looks like tuesday would be easiest for me, would that still be okay for you?
<jakub_golinowski>
M-ms, yes Tuesday is OK for me
<M-ms>
jakub_golinowski: btw, if you're completely stuck try disabling that test and see if other tests after it fail
<jakub_golinowski>
M-ms, I think it might have to do with the start-stop backend not being fully smart, or with some race condition
<jakub_golinowski>
because the hpx_main backend is passing this test
anushi has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
anushi has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
K-ballo has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
mbremer_ has joined #ste||ar
<mbremer_>
@hkaiser: yt?
<hkaiser>
mbremer: here
<K-ballo>
what was the scheduler that needed the non-trivial atomics?
<K-ballo>
ABP?
david_pf_ has joined #ste||ar
<hkaiser>
yes
<mbremer_>
@hkaiser: Sorry chrome didn't ping me. I figured my bug out last night.
<mbremer_>
I just used the default move constructor. I think there are probably a few extra moves in migration that caused my pointers to be invalidated
eschnett has joined #ste||ar
<jakub_golinowski>
M-ms, hkaiser: Is it possible that there is a rare race condition in hpx::stop() that gets triggered when the HPX runtime is started and stopped many times in quick succession, with each run lasting only a very short time?
<ASamir>
heller: I have a question. The context pointer that I pass to the data transfer operation should point to an fi_context struct that the provider uses to track the request, right?
anushi has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
<nikunj1997>
hkaiser, it seems i'm back to square 1 wrt global object
<ASamir>
I've looked into HPX libfabric parcelport but didn't find a fi_context struct in the sender or receiver structs although the FI_CONTEXT mode is set.
<hkaiser>
nikunj1997: ok
<nikunj1997>
hkaiser, I can't seem to implement it by creating wrapper functions
ASamir has quit [Ping timeout: 264 seconds]
<hkaiser>
nikunj1997: anything I can do to help?
<nikunj1997>
hkaiser, I think the only way to implement it for both would be to create another toolchain
<nikunj1997>
hkaiser, and that introduces portability issues
<nikunj1997>
hkaiser, so I'm against it
<nikunj1997>
hkaiser, I can't think of any other way to implement it
<nikunj1997>
do you have any implementation ideas in mind?
<hkaiser>
no, sorry
<hkaiser>
and I agree - we should not even consider a new toolchain
<nikunj1997>
any resources that you'd like me to go through?
<nikunj1997>
hkaiser, the static/dynamic code will work fine for code without global objects, though, since the runtime system is initialized at the program entry point (i.e. C main) and not before it
<hkaiser>
yah, I understand
eschnett has quit [Quit: eschnett]
<github>
[hpx] K-ballo force-pushed logging from 519b391 to bf61d0f: https://git.io/vx6Yc