aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | | HPX: A cure for performance impaired parallel applications | | Buildbot: | Log:
<diehlpk> Ok, hkaiser I will do it later
<diehlpk> zao, I invited you
parsa has quit [Quit: Zzzzzzzzzzzz]
<diehlpk> hkaiser, zao
parsa has joined #ste||ar
<hkaiser> try again but run with 'catch throw'
<hkaiser> that will stop at the spot where the initial exception is thrown
<hkaiser> the one in your listing is the rethrown exception after it has been handled in the scheduler
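A minimal gdb session illustrating the suggestion above (the binary name is hypothetical):

```
$ gdb --args ./my_hpx_app        # hypothetical application binary
(gdb) catch throw                # break where an exception is first thrown
(gdb) run
# at the stop, the backtrace shows the original throw site,
# not the rethrow in the scheduler
(gdb) backtrace
```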
eschnett has joined #ste||ar
<hkaiser> diehlpk: uhh
parsa has quit [Quit: Zzzzzzzzzzzz]
<diehlpk> hkaiser, I am not able to reproduce it on my local machine
<hkaiser> it's a strange one
<hkaiser> diehlpk: is it a debug/release mismatch?
<hkaiser> the hpx docker image has a debug build, yours seems to be a release build
<hkaiser> relwithdebinfo, that is
parsa has joined #ste||ar
<diehlpk> Ok, hpx is using this one here stellargroup/build_env:debian_clang
<hkaiser> yes, that's a debug build
<diehlpk> I am using stellargroup/hpx:dev
<hkaiser> that's a debug build as well
<hkaiser> the first is a barebone docker image, the second one has hpx built in
<hkaiser> (well, the first has the prerequisites installed)
<diehlpk> Ok, - docker run -v $PWD:/hpx -w /hpx/build -e "CIRCLECI=true" ${IMAGE_NAME} cmake -DCMAKE_BUILD_TYPE=Debug -DHPX_WITH_MALLOC=system -DPHPX_Doc=ON -DPHPX_Test=ON ..
<hkaiser> yes
<hkaiser> what CMAKE_BUILD_TYPE do you use?
<diehlpk> Debug
<hkaiser> hmm
<hkaiser> something is off, then
<diehlpk> This line above is from my circle-ci
<hkaiser> ah
<diehlpk> I will try to use the same kind of main as in the hpx example
<hkaiser> that's not the problem
<diehlpk> And we use the same line for HPXCL
<hkaiser> and it breaks there as well?
<diehlpk> I have to check it again, I removed the test cases there because they were not working and I can not remember why
<hkaiser> k
<hkaiser> this really looks like a release/debug mismatch to me
<diehlpk> Ok, I will check if there is a mismatch or not
<hkaiser> otoh, it all happens inside the hpx core library - hmmm
<hkaiser> no idea what's wrong
<diehlpk> But should hpx not complain if I combine my own code as debug with release hpx build or vice versa?
<hkaiser> yah, it should
<hkaiser> as I said, not sure what's going on
<hkaiser> should not link, actually
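One quick way to check for the suspected mismatch (build-tree paths here are hypothetical) is to compare `CMAKE_BUILD_TYPE` in both CMake caches:

```
# build type HPX itself was configured with
grep '^CMAKE_BUILD_TYPE' /opt/hpx/build/CMakeCache.txt
# build type of the application build tree
grep '^CMAKE_BUILD_TYPE' ./build/CMakeCache.txt
# if one says Debug and the other Release/RelWithDebInfo, that's the mismatch
```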
<diehlpk> Ok, I will investigate more tomorrow morning
<hkaiser> k
<diehlpk> The GSoC student is doing quite well with writing the paper
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
EverYoung has joined #ste||ar
<zao> I went and intentionally mismatched build-types in my own build of a simple executable, RWDI HPX with Debug project -
diehlpk has quit [Ping timeout: 264 seconds]
<zao> (results in a rather uninformative trace of a segfault)
EverYoung has quit [Remote host closed the connection]
nanashi55 has quit [Ping timeout: 248 seconds]
nanashi55 has joined #ste||ar
<zao> Debug HPX with RWDI project is more fun:
<zao> terminate called after throwing an instance of 'std::invalid_argument'
<zao> what(): hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<zao> Aborted (core dumped)
EverYoung has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
<zao> diehlpk_work: Are you aware of these warnings?
<zao> This feels race:y... only triggers outside of GDB
<zao> terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
<zao> what(): description is nullptr: HPX(bad_parameter)
<zao> And not reliably, like half of the times I run it.
<zao> (debug HPX, debug PDHPX)
<zao> This happens on a "working" run, after a few minutes:
<zao> Seems like it's spending its time slowly eating up my measly 32G of memory.
simbergm has joined #ste||ar
simbergm is now known as msimberg
david_pfander has joined #ste||ar
<heller> msimberg: still working on it ... got side tracked with this USL thingy ...
hkaiser has joined #ste||ar
<jbjnr> I just realized why gdb is so slow attaching to hpx process on daint when I use px:attach-debugger=exception
<jbjnr> it's because all threads are spinning in the schedule loop and using 100% cpu whilst gdb is trying to load.
<jbjnr> I wonder if we can use fancy new suspend runtime feature to halt hpx when an exception is hit and awaken it once we've attached the debugger!
<github> [hpx] StellarBot pushed 1 new commit to gh-pages:
<github> hpx/gh-pages e35141a StellarBot: Updating docs
<heller> jbjnr: go for it!
<jbjnr> msimberg: hope you're paying attention :)
<msimberg> jbjnr, heller: listening :)
<msimberg> heller, I'm also still working on it
<heller> msimberg: ok
<msimberg> I have the throttle and timed tests passing simultaneously, but not sure if this is the best way...
<msimberg> basically I took jbjnr's idea of different exit modes, so now the throttle test and shutdown remove processing units in different ways
<msimberg> are you going some other way?
<msimberg> jbjnr: your threads are correctly spinning at 100%? i.e. they actually have work to do?
<jbjnr> no, no work to do, but they sit in a wait state with the cpu consuming 100%
<msimberg> wondering if the idle backoff fix I had in mind would help in your case
<msimberg> basically make the backoff exponential
<jbjnr> I didn't look closely, but probably a spinlock with no backoff or no actual suspend of the underlying pthread
<msimberg> do you have IDLE_BACKOFF on?
<jbjnr> doubt it
<msimberg> if yes, you can try a one-line change to try it
<msimberg> ok
<jbjnr> yes. I should try setting HPX_HAVE_THREAD_MANAGER_IDLE_BACKOFF / WITH...XXX
<jbjnr> that ought to fix it.
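For reference, enabling the option mentioned above at configure time would look roughly like this (option name taken from the conversation; exact spelling may differ between HPX versions):

```
cmake -DCMAKE_BUILD_TYPE=Debug \
      -DHPX_WITH_THREAD_MANAGER_IDLE_BACKOFF=ON \
      ..
```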
<jbjnr> ignore my previous comments then
<msimberg> heller: see above question :)
<heller> msimberg: ahh, yeah, different exit modes make sense; I am perfectly ok with it
<heller> msimberg: I am trying a different approach
<msimberg> I'm thinking if it's even okay to allow remove_processing_unit outside of shutdown, and only allow suspend_processing_unit...
<msimberg> the difference being that a suspended pu will be woken up again to finish off its work
<msimberg> and then it can have strict termination criteria again
<msimberg> and secondly, when removing/suspending (doesn't matter which), should the scheduling_loop wait for suspended tasks? they could take arbitrarily long to finish, but then again one might have to wait for them to finish
<msimberg> if suspending the pu, suspended tasks get taken care of once it is resumed; but if removing the pu, one would have to make sure the pu gets added back again to finish the suspended tasks
<msimberg> otherwise suspended tasks would have to be stolen or something
<msimberg> heller?
<msimberg> sorry :)
<heller> hm?
<msimberg> well, no specific question but thoughts?
<msimberg> on the above?
<heller> let's make what we have to work first
<msimberg> ok, fair enough
<msimberg> should I wait for what you have or can I go ahead with my approach?
<msimberg> it needs some cleaning up first though...
<msimberg> would be curious how you tried to fix it as well
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
<heller> msimberg: no, go ahead, let's pick whoever finishes first ;)
<msimberg> heller: ok :)