hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
galabc has quit [Quit: Leaving]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
twwright_ has joined #ste||ar
hkaiser has quit [Quit: bye]
diehlpk has quit [Ping timeout: 256 seconds]
twwright_ has quit [Quit: twwright_]
K-ballo has quit [Quit: K-ballo]
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
eschnett has quit [Quit: eschnett]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
nikunj97 has joined #ste||ar
jakub_golinowski has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/f4ypW
<github> hpx/gh-pages 2633fca StellarBot: Updating docs
zura has joined #ste||ar
<M-ms> jakub_golinowski: yt? is it the ml test you were running that contains the kmeans test? did you find out anything new yesterday? I have a bit of time right now to look at it so will see if I can reproduce it
<jakub_golinowski> M-ms, this is in file test_math.cpp
<jakub_golinowski> line 2802: class CV_KMeansSingularTest : public cvtest::BaseTest
<M-ms> ok, thanks
zura has quit [Quit: Leaving]
<M-ms> jakub_golinowski: two things to look into:
<M-ms> 1) I added some print statements to the hpx backend (starting, stopping, calling parallel_for) and it keeps starting and stopping the backend, it's not completely stuck. How long does it take for that test to run with other backends?
<M-ms> 2) The cout output is interleaved, as if there are at least two threads starting and stopping the backend. Could you check if the kmeans test or gtest is spawning multiple threads?
<M-ms> and for 1) you can also check if the kmeans function is making progress (printing some variables might help).
anushi has quit [Ping timeout: 264 seconds]
anushi has joined #ste||ar
<M-ms> (also check that it's actually looping the same way for you, or if it's actually stuck)
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
<jakub_golinowski> M-ms, thank you for the hints
<jakub_golinowski> when I paused the program after it hangs (CPU usage drops to 0) I see that there are 6 threads in total (thread 1 and 5 threads from HPX)
<jakub_golinowski> and also I did not explicitly ask for more threads, and for the non-parallel_for cases I see only a single core being occupied, so I believe that the test framework itself is single-threaded
<jakub_golinowski> M-ms, for point 1) -> this test seems to be very quick in other backends (pthreads 165ms, tbb 212ms) but for HPX with the hpx_main include it takes 50 seconds! But it passes correctly.
anushi has quit [Ping timeout: 256 seconds]
anushi has joined #ste||ar
<M-ms> jakub_golinowski: that's interesting, so then printing something from kmeans might be useful (something each iteration, and total number of iterations for example)
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
<M-ms> it looks like the kmeans test also doesn't check the result too strictly, it may be a hint that something is still wrong in the backend (but it still looks ok to me)
<M-ms> sorry I can't be of more help at the moment
<M-ms> and about the next call, looks like tuesday would be easiest for me, would that still be okay for you?
<jakub_golinowski> M-ms, yes Tuesday is OK for me
<M-ms> jakub_golinowski: btw, if you're completely stuck try disabling that test and see if other tests after it fail
<jakub_golinowski> M-ms, I think it might have to do with the start-stop backend not being fully smart or some race conditions
<jakub_golinowski> because the hpx_main backend is passing this test
anushi has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
anushi has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
K-ballo has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
mbremer_ has joined #ste||ar
<mbremer_> @hkaiser: yt?
<hkaiser> mbremer: here
<K-ballo> what was the scheduler that needed the non trivial atomics?
<K-ballo> ABP?
david_pf_ has joined #ste||ar
<hkaiser> yes
<mbremer_> @hkaiser: Sorry chrome didn't ping me. I figured my bug out last night.
<mbremer_> I just used the default move constructor. I think there are probably a few extra moves in migration that caused my pointers to be invalidated
eschnett has joined #ste||ar
<jakub_golinowski> M-ms, hkaiser: Is it possible that there is a rare race condition within hpx::stop() that is triggered when the HPX runtime is started and stopped multiple times, running only for a very short period of time after each start()?
<jakub_golinowski> In this simple example I observe that the program hangs after some (non-deterministic) number of runs: https://github.com/Jakub-Golinowski/opencv_hpx_backend/tree/master/hpx_start_stop
<jakub_golinowski> And also each time this situation happens there is the same configuration of HPX threads
<M-ms> jakub_golinowski: definitely possible (likely even), but if the kmeans test hangs every time it's most likely something different
<M-ms> We still have random timeouts on our tests
<jakub_golinowski> But as I said it hangs in case of simple counting to 10 for each start and stop
<jakub_golinowski> without any opencv and tests
<M-ms> Do you still have the segfaults with hpx_main?
<jakub_golinowski> I do believe so, I will investigate it now
<jakub_golinowski> I was taking screenshots of the stack traces for the 6 threads config now
<jakub_golinowski> It seems to be always 5 HPX threads -> 4 with the same stack trace and one with different
<hkaiser> jakub_golinowski: those are the (unrelated) service threads
<jakub_golinowski> but I am not sure what they are
<hkaiser> you can ignore them
<jakub_golinowski> ok, but the thread_1 is waiting infinitely in t.join
eschnett has quit [Quit: eschnett]
nikunj1997 has joined #ste||ar
nikunj97 has quit [Ping timeout: 240 seconds]
mbremer_ has quit [Quit: Page closed]
anushi has quit [Remote host closed the connection]
<github> [hpx] K-ballo force-pushed logging from 6c95363 to 519b391: https://git.io/vx6Yc
<github> hpx/logging 519b391 Agustin K-ballo Berge: pruning util/logging
anushi has joined #ste||ar
eschnett has joined #ste||ar
ASamir has joined #ste||ar
<ASamir> heller: I have a question. For the context pointer that I pass to the data transfer operation, It should contain an fi_context struct that is used by the provider in something related to tracking the request, right?
anushi has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
<nikunj1997> hkaiser, it seems i'm back to square 1 wrt global object
<ASamir> I've looked into HPX libfabric parcelport but didn't find a fi_context struct in the sender or receiver structs although the FI_CONTEXT mode is set.
<hkaiser> nikunj1997: ok
<nikunj1997> hkaiser, I can't seem to implement by creating wrapper functions
ASamir has quit [Ping timeout: 264 seconds]
<hkaiser> nikunj1997: anything I can do to help?
<nikunj1997> hkaiser, I think the only way to implement it for both would be to create another toolchain
<nikunj1997> hkaiser, and that introduces portability issues
<nikunj1997> hkaiser, so I'm against it
<nikunj1997> hkaiser, I can't think of any other way to implement it
<nikunj1997> do you have any implementation ideas in mind?
<hkaiser> no, sorry
<hkaiser> and I agree - we should not even consider a new toolchain
<nikunj1997> any resources that you'd like me to go through?
<nikunj1997> hkaiser, the code for static/dynamic will work fine for code without global objects, though, since the runtime system initializes at the program entry point (i.e. C main) and not before it
<hkaiser> yah, I understand
eschnett has quit [Quit: eschnett]
<github> [hpx] K-ballo force-pushed logging from 519b391 to bf61d0f: https://git.io/vx6Yc
<github> hpx/logging bf61d0f Agustin K-ballo Berge: pruning util/logging
<hkaiser> nikunj1997: I still need to get you in contact with that MS person
<nikunj1997> hkaiser, the main issue is of the iostream buffer. The ios_base::Init is not able to properly initialize the buffers
<hkaiser> Billy O'Neal
<hkaiser> sure, if not that, then something else
<hkaiser> all c++ code relies on the C++ runtime to be properly initialized
<nikunj1997> when should I get a mail from him?
<nikunj1997> *when should I expect
<hkaiser> nikunj1997: I have not sent the mail from my end
<hkaiser> will do later today - sorry
<nikunj1997> hkaiser, ok thanks. I'll ask him my questions related to MSVC.
<nikunj1997> hkaiser, I will continue trying to solve initialization issues
<nikunj1997> We have a lot of examples and test codes to work with
<nikunj1997> If I can make it work with all of them, then I think we should be able to handle most scenarios
<hkaiser> nikunj1997: email sent
<nikunj1997> hkaiser, thanks a lot
ASamir has joined #ste||ar
eschnett has joined #ste||ar
diehlpk has joined #ste||ar
nikunj1997 has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
HoloIRCUser has joined #ste||ar
HoloIRCUser has quit [Client Quit]
david_pf_ has quit [Ping timeout: 245 seconds]
david_pf_ has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
david_pf_ has quit [Ping timeout: 276 seconds]
<github> [hpx] K-ballo force-pushed logging from bf61d0f to 0032f70: https://git.io/vx6Yc
<github> hpx/logging 0032f70 Agustin K-ballo Berge: pruning util/logging
david_pf_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
jakub_golinowski has quit [Quit: Ex-Chat]
<K-ballo> whatever happened with -latomic being needed when using certain integers as atomic?
david_pf_ has quit [Quit: david_pf_]
<hkaiser> K-ballo: not sure, we've seen linker problems on some compilers
<K-ballo> as far as I recall, the feature test for -latomic needed to list a bunch more std::atomic specializations
jakub_golinowski has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
nikunj has quit [Quit: Leaving]
<hkaiser> K-ballo: that could be it, yes
jakub_golinowski has quit [Quit: Ex-Chat]
ASamir has quit [Quit: Leaving]