hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jaafar has quit [Ping timeout: 246 seconds]
jbjnr__ has joined #ste||ar
jbjnr_ has quit [Ping timeout: 240 seconds]
akheir1 has joined #ste||ar
wash[m]_ has joined #ste||ar
wash[m] has quit [*.net *.split]
akheir has quit [*.net *.split]
diehlpk_work has quit [*.net *.split]
wash[m]_ is now known as wash[m]
jaafar has joined #ste||ar
jaafar has quit [Client Quit]
jaafar has joined #ste||ar
jaafar has quit [Remote host closed the connection]
jaafar has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
weilewei09 has joined #ste||ar
<weilewei09> Hi, I have an inspect failure on CircleCI for my dist_object branch in HPX, and I cannot infer much from the error report. The link is https://circleci.com/gh/STEllAR-GROUP/hpx/81736?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link and the error message is: "#!/bin/bash -eo pipefail
<weilewei09> ./bin/inspect --all --output=./hpx_inspect_report.html /hpx/source
<weilewei09> Exited with code 1" Does anyone have an idea about this? Thanks for your attention!
jaafar has quit [Ping timeout: 268 seconds]
jaafar has joined #ste||ar
jaafar has quit [Client Quit]
jaafar has joined #ste||ar
<jbjnr__> weilewei09: no idea. looks like a fail of the test runner in some way. Probably you should ignore it for now. if you want to double check, then build the inspect tool locally (cmake -DHPX_WITH_TOOLS=ON . && make -j4 inspect) then run bin/inspect /path/to/hpx -a -o inspect.html
<simbergm> if you're logged in to circleci you should see an "artifacts" tab which usually contains error reports if there was an error (and that's the case now)
<simbergm> it's complaining about you using a "C-style assert macro"
<simbergm> you should instead be using HPX_ASSERT
K-ballo has joined #ste||ar
david_pfander has joined #ste||ar
david_pfander has quit [Ping timeout: 264 seconds]
hkaiser has joined #ste||ar
eschnett has quit [Quit: eschnett]
diehlpk has joined #ste||ar
eschnett has joined #ste||ar
aserio has joined #ste||ar
jbjnr_ has joined #ste||ar
jbjnr__ has quit [Ping timeout: 240 seconds]
<diehlpk> hkaiser, We need to find a better title for the hpxmp 2 paper
<hkaiser> diehlpk: ok
<hkaiser> diehlpk: 'Moving into the future - ensuring deep compatibility between OpenMP and AMTs'
david_pfander has joined #ste||ar
daissgr has joined #ste||ar
<aserio> hkaiser: Operation Bell Meeting?
<hkaiser> aserio: could you paste the link for the meeting here, please?
<aserio> hkaiser: ^^
<hkaiser> tks
diehlpk has quit [Ping timeout: 268 seconds]
nikunj has joined #ste||ar
jpenuchot has joined #ste||ar
daissgr has quit [Ping timeout: 252 seconds]
diehlpk has joined #ste||ar
<diehlpk> hkaiser, I think we should go with this title. aserio Any comments?
<hkaiser> ;-)
<hkaiser> it's at least flashy, if nothing else
<hkaiser> diehlpk: I have added the hpx section to the paper, pls let me know if you need more
diehlpk has quit [Ping timeout: 252 seconds]
diehlpk has joined #ste||ar
<diehlpk> hkaiser, Thanks, looks good to me
<diehlpk> I will read the paper on Tuesday again and try to make the introduction more related to the current results
aserio has quit [Ping timeout: 268 seconds]
diehlpk has quit [Ping timeout: 264 seconds]
aserio has joined #ste||ar
<hkaiser> simbergm: yt?
<simbergm> hkaiser: yeah
<hkaiser> simbergm: could you please explain to me under what circumstances the latch was racey?
<hkaiser> I can't figure it out
<simbergm> did the commit message make any sense?
<hkaiser> not really :/
<hkaiser> in the parallel algorithms we have two different types of threads touching the latch
<simbergm> ok, I'll try again :P
<hkaiser> the main thread that will eventually call wait, and all the others that call count_down (eventually one will call notify_all)
<simbergm> yep
<hkaiser> that notify_all wakes up the main thread if it's sitting in the CV
<hkaiser> so where is the race
<hkaiser> ?
<simbergm> note: race might not be the correct term here
<hkaiser> ok
<simbergm> basically wait can return before the last count_down calls notify_all
weilewei09_ has joined #ste||ar
<simbergm> so:
<hkaiser> how's that?
<simbergm> 1. count_down decrements the counter to zero outside the lock
<hkaiser> how can wait return before the counter reaches zero?
weilewei09_ has quit [Client Quit]
<simbergm> 2. at the same time (just after), wait checks the counter, sees that it's zero, and decides not to wait on the CV
<simbergm> 3. and after that count_down continues to the lock and calls notify_all
<hkaiser> ok, now I see it
<simbergm> but by then wait has already returned
<simbergm> ok
<hkaiser> thanks, makes sense now
<simbergm> another (I think incorrect) way of fixing it would be to again make sure that each thread keeps the latch alive
<hkaiser> yah
<hkaiser> or call count_down_and_wait on the main thread and use N+1 as the counter value
<hkaiser> but you're right, this is a problem in our implementation - thanks for figuring that out
<weilewei09> @simbergm @jbjnr__ thanks! I think that is the problem, I will check it and get back to you later
<simbergm> yeah, but is it any better than fixing count_down?
<hkaiser> simbergm: no, we need to fix count_down anyways
<hkaiser> simbergm: that flag of yours needs to be set back to false inside reset(), I think
<simbergm> I can't really think of a way to avoid the lock in at least some place :/
<simbergm> ah, good catch
<simbergm> I forgot that it can be reset
<hkaiser> simbergm: that also allows us to change the assert in reset, which can now check for isnotified == true
<simbergm> weilewei09: good! let us know if you have any other problems
<simbergm> hkaiser: I guess calling reset just after constructing is illegal (it is pointless)?
<hkaiser> right
<hkaiser> we can make it illegal ;-)
<simbergm> oh wait, it is already illegal :)
<hkaiser> yah, the counter has to be zero, currently
<hkaiser> this is really mostly to make sure nobody resets things in the middle of a sync operation but only after all threads have left the latch
<simbergm> sure, it definitely makes sense
<hkaiser> simbergm: we could also count down the counter only after notify_all has returned
<hkaiser> not sure...
<simbergm> I thought about that and decided against it but I've already forgotten why
<simbergm> ah, it gets messy
<simbergm> and doesn't help in the end
<simbergm> we need to check if counter_ == n
<hkaiser> yes
<simbergm> which might be false first but by the time we get to decrement it it might already be true
<simbergm> I couldn't get it to work
<hkaiser> nod
<hkaiser> darn atomics
<simbergm> mmh
<simbergm> technically I'd need to lock in reset if I want to set notified_ = false, no? because it's a plain bool
<simbergm> making it atomic is maybe not the end of the world
<hkaiser> grabbing the lock in reset is not the end of the world either
<hkaiser> it's called infrequently
<simbergm> good point
<simbergm> that sounds better
<simbergm> it shouldn't really be contended either unless it's called in the wrong place
<hkaiser> indeed
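Note: the fix discussed above can be sketched with standard C++ primitives. This is a minimal illustration, not HPX's latch: the class name sketch_latch and the helper run_workers are hypothetical, and std::mutex / std::condition_variable stand in for whatever HPX uses internally. The idea taken from the chat is that wait() waits on a notified_ flag that is only set under the lock, so it cannot return before the last count_down has reached notify_all, and that reset() takes the lock to clear the flag.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the latch under discussion; not HPX's code.
class sketch_latch
{
public:
    explicit sketch_latch(std::ptrdiff_t count)
      : counter_(count), notified_(count == 0)
    {
    }

    // Called by each worker. The decrement still happens outside the lock;
    // only the thread that reaches zero takes the lock.
    void count_down(std::ptrdiff_t n = 1)
    {
        std::ptrdiff_t const remaining = (counter_ -= n);
        assert(remaining >= 0);
        if (remaining == 0)
        {
            std::lock_guard<std::mutex> l(mtx_);
            notified_ = true;    // set under the lock, just before notifying
            cond_.notify_all();
        }
    }

    // Waits on notified_ rather than on counter_ == 0, so it cannot return
    // in the window between the final decrement and notify_all.
    void wait()
    {
        std::unique_lock<std::mutex> l(mtx_);
        cond_.wait(l, [this] { return notified_; });
    }

    // Takes the lock because notified_ is a plain bool; reset is expected
    // to be called rarely and only after all threads have left the latch.
    void reset(std::ptrdiff_t n)
    {
        std::lock_guard<std::mutex> l(mtx_);
        assert(notified_);
        counter_ = n;
        notified_ = (n == 0);
    }

private:
    std::atomic<std::ptrdiff_t> counter_;
    std::mutex mtx_;
    std::condition_variable cond_;
    bool notified_;
};

// Starts `workers` threads that each count down once and returns how many
// of them had finished their work by the time wait() returned.
int run_workers(int workers)
{
    std::atomic<int> done(0);
    sketch_latch l(workers);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
    {
        threads.emplace_back([&done, &l] {
            ++done;
            l.count_down();
        });
    }
    l.wait();
    int const seen = done.load();
    // Joining keeps the demo simple; the ordering guarantee matters most
    // when the latch is destroyed right after wait() returns.
    for (std::thread& t : threads)
        t.join();
    return seen;
}
```

Calling run_workers(8) returns 8, since every worker increments done before it counts down. The price of this variant is that wait() always takes the lock, which matches the remark above that the lock cannot be avoided everywhere.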
khuck has joined #ste||ar
<khuck> parsa: did you add the "pp" stuff to hpx?
<hkaiser> khuck: that was heller
<hkaiser> khuck: anything I can help with?
<hkaiser> khuck I think I know what's wrong
<hkaiser> could it be that apex is missing an include directory path (for some reason)?
<hkaiser> we might have forgotten to export that
aserio has quit [Ping timeout: 246 seconds]
<hkaiser> khuck: I'll have a look
<khuck> I've added ${HPX.pp_SOURCE_DIR}/include to the APEX CMakeLists.txt file
<khuck> that seems to have done the trick.
<khuck> I don't explicitly add the HPX include paths, so I don't know how those get added.
aserio has joined #ste||ar
aserio has quit [Client Quit]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> khuck: ok, I checked
<hkaiser> khuck: the build system should create links inside of <build_dir>/include that refer to the correct directories
<hkaiser> <build_dir>/include/hpx/ that is
<hkaiser> it should also add <src_root>/libs/pp/include to the include search path
<khuck> it should, or it does?
<khuck> because from what I can see, it doesn't
<hkaiser> it does for me
<khuck> for the apex directory?
<hkaiser> khuck: in any case pls report this as a ticket, if nothing happens (heller is mostly MIA) I'll revert that change to HPX tomorrow
<khuck> ok
<hkaiser> thanks
<hkaiser> khuck: who is generating the include path (-I directives) for compiling APEX?
<hkaiser> khuck: is that some manual setup?
<khuck> there's an apex/src/apex/CMakeLists.hpx file that is used
<khuck> I tried adding ${HPX.pp_SOURCE_DIR}/include explicitly, and that worked once, but not again
<hkaiser> khuck: can you import HPX as a cmake target? that should take care of things
<khuck> how would I do that?
<hkaiser> find_package(HPX) add_target_directory(APEX_target HPX) add_target_link_library(APEX_target HPX) or somesuch, need to have a look
<hkaiser> the HPX target name is lower case hpx (not upper case as shown above)
<hkaiser> khuck: calling hpx_setup_target(APEX-target) should do the trick as well
<hkaiser> khuck: I have to admit I lost track of how to do this kind of thing after recent changes :/
<khuck> ?
<hkaiser> after
<khuck> no, that gave me an error:
<khuck> CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
<khuck> "apex" of type STATIC_LIBRARY
<khuck> depends on "hpx" (weak)
<khuck> "hpx" of type SHARED_LIBRARY
<khuck> depends on "apex" (weak)
<khuck> At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
<hkaiser> right
<hkaiser> we link hpx against APEX
<hkaiser> forgot about that
<hkaiser> then we should add the include directories only, not the libraries, setup_target does both
<hkaiser> try adding a target_include_directories(apex, hpx) instead of the setup_target
<hkaiser> (no comma between apex and hpx)
<hkaiser> target_include_directories(apex hpx)
<khuck> hkaiser: another error
<khuck> CMake Error at apex/src/apex/CMakeLists.hpx:357 (target_include_directories):
<khuck> target_include_directories called with invalid arguments
<hkaiser> grrr
<hkaiser> the hpx target might not have been defined yet
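Note: target_include_directories expects a scope keyword (PRIVATE, PUBLIC, or INTERFACE) before the items, which likely explains the "invalid arguments" error above. One possible way to borrow only the include paths of the hpx target without linking against it is a generator expression. The fragment below is an untested sketch and not what was done in this conversation; it assumes the hpx target carries its include paths in INTERFACE_INCLUDE_DIRECTORIES.

```cmake
# Untested sketch; the target names apex and hpx are taken from the chat.
# PRIVATE is the scope keyword that the two-argument form was missing.
# The generator expression is evaluated at generate time, so hpx does not
# have to be defined yet at this point, and it adds no link dependency,
# so it should not recreate the apex <-> hpx cycle.
target_include_directories(apex PRIVATE
    $<TARGET_PROPERTY:hpx,INTERFACE_INCLUDE_DIRECTORIES>)
```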
<hkaiser> khuck: currently, you have listed the include directories manually here: https://github.com/khuck/xpress-apex/blob/c5e7cd56c54cf56b0b0dc4f5fe1c35c5ba36b363/src/apex/CMakeLists.hpx#L31
<khuck> yes, that was the fix I did an hour ago, it didn't work
<khuck> I haven't had to explicitly set an HPX include path before
<hkaiser> ok
<hkaiser> khuck: those are probably inherited from the 'parent' directory
<hkaiser> let me see
<hkaiser> and remove the target_include_directories from above
<khuck> no, that didn't work either
<hkaiser> khuck: could you paste the generated command line, pls?
<khuck> you mean the "make VERBOSE=1" output?
<hkaiser> yes
<hkaiser> for that file
<khuck> cd /home/users/khuck/src/phylanx/tools/buildbot/build-delphi-x86_64-Linux-gcc/hpx-Debug/apex/src/apex && /packages/gcc/7.1/bin/g++ -DAPEX_HAVE_ACTIVEHARMONY -DAPEX_HAVE_HPX_CONFIG -DAPEX_HAVE_OTF2 -DAPEX_HAVE_PAPI -DAPEX_HAVE_POWERCAP_POWER -DAPEX_HAVE_PROC -DBOOST_ALL_NO_LIB -DDEBUG -DHPX_COROUTINE_EXPORTS -DHPX_EXPORTS -DHPX_LIBRARY_EXPORTS -D_GNU_SOURCE -I/home/users/khuck/src/phylanx/tools/buildbot/src/hpx -isystem /usr
<khuck> /local/packages/boost/1.65.0-gcc7/include -I/home/users/khuck/src/phylanx/tools/buildbot/build-delphi-x86_64-Linux-gcc/hpx-Debug -I/home/users/khuck/src/phylanx/tools/buildbot/src/hpx/apex/src/apex -I/home/users/khuck/src/phylanx/tools/buildbot/build-delphi-x86_64-Linux-gcc/hpx-Debug/apex/src/apex -I/home/users/khuck/src/phylanx/tools/buildbot/src/hpx/apex/src/contrib -I/usr/local/packages/papi/papi-knl/5.5.0/include -I/usr/
<khuck> local/packages/activeharmony/4.6.0-gcc/include -I/usr/local/packages/otf2-2.1/include -fPIC -g -fPIC -std=c++17 -o CMakeFiles/apex.dir/apex.cpp.o -c /home/users/khuck/src/phylanx/tools/buildbot/src/hpx/apex/src/apex/apex.cpp
<hkaiser> khuck: the former line 1817 (the one you moved), could you change it to include_directories("${CMAKE_BINARY_DIR}/include"), please?
<hkaiser> this should still come before apex is configured, i.e. before line 1622 or so
<khuck> I don't think that will help, will it? The cat.hpp file is in libs/pp/include, in the source tree
<khuck> hpx/libs/pp/include/hpx/pp/cat.hpp
<hkaiser> khuck: that should help
<hkaiser> as I said, the build system creates links under <build_dir>/include that point to the directories in libs
<khuck> new error:
<hkaiser> good
<khuck> In file included from /home/users/khuck/src/phylanx/tools/buildbot/src/hpx/hpx/config.hpp:17:0,
<khuck> from /home/users/khuck/src/phylanx/tools/buildbot/src/hpx/src/hpx_wrap.cpp:6:
<khuck> /home/users/khuck/src/phylanx/tools/buildbot/src/hpx/hpx/config/attributes.hpp:9:10: fatal error: hpx/config/defines.hpp: No such file or directory
<khuck> #include <hpx/config/defines.hpp>
<hkaiser> progress ;-)
<khuck> ^~~~~~~~~~~~~~~~~~~~~~~~
<khuck> compilation terminated.
<hkaiser> ok, so it needs both directories
<hkaiser> the old one and the changed one
<khuck> source and binary?
<khuck> or binary with/without include?
<hkaiser> include_directories("${CMAKE_BINARY_DIR}") and include_directories("${CMAKE_BINARY_DIR}/include")
<khuck> that seems to have done it. I'll submit a patch/PR
<hkaiser> thanks!
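Note: the change that resolved the build can be summarized as the fragment below. It is a sketch based only on the messages above: the exact location in HPX's top-level CMakeLists.txt is not shown here, and the comments on which headers each path provides are inferred from the two error messages (hpx/pp/cat.hpp and hpx/config/defines.hpp).

```cmake
# Both entries need to appear before the APEX sources are configured.

# The original path; inferred to provide generated headers such as
# hpx/config/defines.hpp.
include_directories("${CMAKE_BINARY_DIR}")

# Added path; the build system creates links here that point to the
# module headers under libs/ (for example hpx/pp/cat.hpp).
include_directories("${CMAKE_BINARY_DIR}/include")
```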
khuck has quit []
eschnett has quit [Quit: eschnett]