hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
mbremer has quit [Quit: Leaving.]
nikunj1997 has quit [Quit: Leaving]
parsa is now known as parsa_
jaafar has joined #ste||ar
<simbergm>
heller_: yes! sounds like a decision
<simbergm>
we should try to agree on something similar to K-ballo's rule on 10 supported boost releases
<simbergm>
that's a bit over 3 years for boost
<simbergm>
gcc 4.9.4 is almost 3 years old and clang 3.9.1 is a bit over 2 years old
<Yorlik>
I can now build hpx using MSBuild but there is one peculiarity which I don't understand: I gave the compiler the Boost Directive "BOOST_AUTO_LINK_NOMANGLE" to not use mangled boost dll names, but looking at the import table of hpx.dll it's still asking for "boost_program_options-vc141-mt-x64-1_69.dll" instead of plain "boost_program_options.dll". Is there any way to control this behavior from CMake or am I doing
<Yorlik>
something wrong here?
<Yorlik>
It's the same with all imported boost dlls ofc.
<simbergm>
daint is down for maintenance today, so no pycicle until the evening
david_pfander has joined #ste||ar
jbjnr has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
hkaiser has joined #ste||ar
<Yorlik>
hkaiser: The vcpkg version of the hpx.dll still has the mangled boost library name in its import Directory (e.g. boost_program_options-vc141-mt-x64-1_69.dll). Is that intended?
<hkaiser>
Yorlik: no idea - vcpkg is configuring things in certain ways
<Yorlik>
I find myself in a situation where I have to keep two versions of the boost libraries - mangled and unmangled. Creating targets in my project I'll also have to deal with these mangled names or use two versions of the boost libraries.
<hkaiser>
simbergm: we discussed that with heller_ yesterday
<hkaiser>
we should consider adding #3663 to the release
<simbergm>
hkaiser: supported compilers?
<hkaiser>
simbergm: that as well, although I think this shouldn't change for a point release
<simbergm>
no, not for the point release, not that it matters because 1.2.0 already worked with those
<hkaiser>
#3663 is fixing a race in hpx::future that has caused most of the issues we're seeing
<simbergm>
yeah, that would be a good one
<simbergm>
I'll add that
<hkaiser>
thanks! and thanks that you're doing this!
<simbergm>
no problem
<hkaiser>
simbergm: you now should have a list of things needed for a point release
<hkaiser>
we've never had one before
<simbergm>
hkaiser: yes, more or less in my head at least
<jbjnr__>
hkaiser: was this race in hpx:future always there, or was it introduced in the changes to atomic future?
<hkaiser>
is that worth writing down? point_release_procedure?
<hkaiser>
jbjnr__: introduced by the atomic future state
<jbjnr__>
seems pretty fundamental to have a race in future<> :(
<simbergm>
I need to think about it still but I'll add whatever I come up with to the release procedure of course
<hkaiser>
jbjnr__: indeed
<hkaiser>
we've seen problems everywhere after all
<hkaiser>
simbergm: marvelous, thanks!
<simbergm>
diehlpk_work: comment on your fedora post: "Some dependency version problems in “rawhide”, but fedora29 seems to work."
<simbergm>
don't know what rawhide is or if we care...
<simbergm>
I restarted buildbot with the release branch until we get 1.2.1 out
<jbjnr__>
hkaiser: are you up late or in a non USA timezone?
<zao>
simbergm: rawhide is the Fedora rolling distribution, kind of like debian sid.
<zao>
So whatever soup of as-new-as-possible package versions, ticking rapidly.
<simbergm>
zao: ah, thanks, so probably that person was just hitting the problems we're trying to fix with 1.2.1...
<zao>
If there's version problems in rawhide, it might be indicative of future problems that one might want to address.
<hkaiser>
jbjnr__: can't sleep
K-ballo has joined #ste||ar
<heller_>
hkaiser: simbergm: should we disable our biggest offenders for now and open a ticket to fix them? That would be migrate_component and cancelable_action
<simbergm>
heller_: maybe, at least we'd spot other mistakes more easily
<simbergm>
migrate_component already has an issue
<jbjnr__>
Yorlik / hkaiser : I have not used my windows build for a while, so I tried a completely clean build of boost 1.68, and hpx and a binary of hwloc. Things were ok until link time. hpx tries to link against libboost_filesystem-vc141-mt-gd-x64-1_68.lib, but the boost lib is called boost_filesystem-vc141-mt-gd-x64-1_68.lib (without the 'lib' prefix). I tried setting Boost_LIB_PREFIX='' and BOOST_LIB_PREFIX='' but I can't remember the
<jbjnr__>
right way to fix this. Cmake detects the libs correctly, but I suppose the AUTO link stuff in boost screws up.
<jbjnr__>
I don't remember the right fix.
<Yorlik>
I used versioned build
<Yorlik>
oh libboost
<Yorlik>
So thats the static one
<K-ballo>
libboost is static, boost is shared
<heller_>
simbergm: the biggest problem with master failing all the time is, IMHO, that the docker image isn't updated, things like phylanx can't pick up the latest bugfixes then
<jbjnr__>
I'll add Boost_USE_STATIC_LIBS to my CMake then
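In CMake terms, what jbjnr__ describes looks roughly like this (version and component list assumed from the 1.68 build mentioned above):

```cmake
# Must be set before find_package(Boost) so FindBoost looks for the
# libboost_*-prefixed static libraries that MSVC auto-linking expects.
set(Boost_USE_STATIC_LIBS ON)
find_package(Boost 1.68 REQUIRED COMPONENTS filesystem program_options)
```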
<simbergm>
heller_: yeah, that's bad
<heller_>
right now, it seems like I am the only one that actually cares though ;)
<jbjnr__>
stop breaking master then!
<jbjnr__>
:)
<simbergm>
both failures seem to be pretty well isolated to whatever they're testing, and I've disabled broken examples as well...
<jbjnr__>
smile more. Commit less.
<heller_>
simbergm: yeah, eventually, those need fixing
<Yorlik>
I wish there was a good way to work with two repos - a private one and a project one. I'm committing a ton of silly stuff I'd love to not have in our team git.
<heller_>
the downside of disabling them is that we just forget about it...
<heller_>
Yorlik: you know, setup different remotes
<simbergm>
heller_: yeah, obviously, what I'm trying to say is that at least they shouldn't affect anything else (unlike, say, the future race)
<heller_>
git is a distributed version control...
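A sketch of heller_'s point, with made-up remote names and URLs: one working repo, two remotes, so scratch commits only ever get pushed to the private one.

```shell
# Throwaway repo with two remotes (both URLs are hypothetical).
rm -rf /tmp/demo-repo
mkdir -p /tmp/demo-repo && cd /tmp/demo-repo
git init -q .
git remote add team git@example.com:team/project.git
git remote add private git@example.com:yorlik/scratch.git
# Push scratch work only to 'private'; push cleaned-up branches to 'team'.
git remote -v
```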
<jbjnr__>
windows question. I have upgraded to visual studio 2017 - when I rerun cmake, I do not get the 'reload solution something has changed' option any more. Does it now do it silently under the hood, or do I have to manually unload the solution and then reload it?
<simbergm>
we might as well consider migrate_component an unstable part of hpx, like partitioned vector
<heller_>
simbergm: sure, the problem is, that migration is kind of an important feature, cancelling tasks is important as well, I think
<Yorlik>
heller_: Woops - missed that idea actually - but makes a ton of sense.
<simbergm>
heller_: ok, I didn't realize it was being used that much
<heller_>
not sure if it is used
<heller_>
but we advertise it :P
<K-ballo>
jbjnr__: you only get the "reload" prompt if something that needs reloading changes, like sources in the project or configuration options
<simbergm>
yeah... best would be fixing it of course, but that seems difficult
<simbergm>
which leaves us with two bad options: disable the tests or live with docker image not updating
<simbergm>
and since we already shipped 1.2.0 with broken migrate_component I would just consider it a known bug for now, i.e. I'd be okay with disabling it
<simbergm>
would be good to still have it built though
<heller_>
let's leave it in
<heller_>
it only takes one or two rebuilds to have it green again anyway
<heller_>
that reminds us that we need to fix this...
<simbergm>
sure
<simbergm>
I can have a look at cancelable_action again for the other timeouts there
<heller_>
it looks like there is a race between trying to set the interruption point and the call to suspend
<heller_>
and I am still not sure if that suspend is a yield now, or a real suspend ;)
<heller_>
when switching to this_thread::yield(), we get an unhandled exception...
<heller_>
ah, no. suspend() should be the equivalent to yield()
<simbergm>
yep, should be equivalent, but even a timed suspend should be fine
<Yorlik>
It seems when I built the INSTALL target FindHPX.camke wasn't created. Do I have to do something special to get it?
<Yorlik>
s/camke/cmake/g
<hkaiser>
Yorlik: there is no FindHPX.cmake, never was
<hkaiser>
configuring hpx creates <builddir>/lib/cmake/HPX/HPXConfig.cmake and friends
<heller_>
or <installdir>
<hkaiser>
that directory is where you point dependent cmake projects to
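For example, a dependent project can consume that directory like this (the path is illustrative, and the HPX::hpx target name is assumed from HPX's exported package config rather than confirmed here):

```cmake
# Point CMake at the exported config hkaiser mentions, e.g.
#   cmake -DHPX_DIR=<builddir>/lib/cmake/HPX ..
cmake_minimum_required(VERSION 3.10)
project(my_app CXX)
find_package(HPX REQUIRED)
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE HPX::hpx)
```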
<simbergm>
plus if it becomes a chore for me I need to make sure it's not
<simbergm>
and that's not by doing fewer releases or hating you...
<jbjnr__>
K-ballo: that's what I used to use ...
<hkaiser>
right... you need to do more releases, then
<K-ballo>
I do something like `b2 --build-type=complete address-model=32,64 -j10`; I have to look it up every time, guess which options take dashes, which don't, etc
<jbjnr__>
simbergm: why are you having everyone?
<jbjnr__>
^hating
<simbergm>
hkaiser: exactly ;)
<simbergm>
jbjnr__: I'm *not* hating anyone
<jbjnr__>
WHY WOULD YOU SAY THAT THOUGH? sOMEONE MUST HAVE THOUGHT YOU DID?
<jbjnr__>
OOPS
<jbjnr__>
CAPS LOCK
<jbjnr__>
grrr
<hkaiser>
Yorlik, K-ballo: I do 'vcpkg install boost:x64-windows'
<hkaiser>
jbjnr__: read back ^^
<Yorlik>
vcpkg worked nicely - but I want to have control over my build.
<hkaiser>
lol
<Yorlik>
:)
<K-ballo>
that reminds me I should stop building 32-bit boost, I only use boost for hpx nowadays
<heller_>
the completion handler might access this, which might have gone out of scope already
<heller_>
making completion handling completely static, avoids this entirely
<Yorlik>
I wonder if I made a mistake when building hpx because I am now getting linker errors - it seems boost symbols referenced from hpx are not found for some reason. I had that issue before with my program and added #define BOOST_ALL_DYN_LINK which fixed it for my program, but it seems the problem is coming back from the hpx side (if I am reading the error messages correctly)
<jbjnr__>
I compiled static and dynamic versions of boost, so now cmake is happy. But now I get link errors because it links to both. Once via the cmake library, the other by the auto linking built into boost. PITA
<hkaiser>
heller_: mind you, I have no objections to making this change; if however the object is gone, then we're in trouble much earlier than when invoking the completion handler
<K-ballo>
we should be disabling auto-linking privately for building hpx targets
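A minimal sketch of what K-ballo suggests, using Boost's documented BOOST_ALL_NO_LIB switch; the target name here is illustrative, not HPX's actual target:

```cmake
# BOOST_ALL_NO_LIB turns off Boost's MSVC auto-linking entirely, so only
# the libraries CMake puts on the link line are used. PRIVATE keeps the
# define from leaking into consumers. 'my_hpx_target' is hypothetical.
target_compile_definitions(my_hpx_target PRIVATE BOOST_ALL_NO_LIB)
```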
<hkaiser>
it's a member of the same object you're afraid is already gone
<jbjnr__>
hkaiser: I will, but this sort of thing should not be necessary
<heller_>
hkaiser: set_value, but it doesn't need the object
<hkaiser>
sure it does, calling a member on an object that is gone is UB, isn't it?
<heller_>
hkaiser: well, it might happen, that was the bug we ran into with sync_put_parcel
<heller_>
yes
<heller_>
the situation is different though: the object goes out of scope while the function is being executed ;)
<hkaiser>
sync_put_parcel was caused by the problem you fixed in #3663
<heller_>
yes
<hkaiser>
then let's keep it alive for the duration of the execution of the completion handlers
<hkaiser>
move the 'boost::intrusive_ptr<future_data_base> this_(this);' up in the call chain
<heller_>
I don't think that's the right call here
<hkaiser>
why not, that's what we were doing before, just too late as you're saying
<heller_>
so: either the completion handlers keep the shared state alive long enough by themselves, or it just doesn't matter if the object is alive at the time when the completion handler is being called
<hkaiser>
heller_: ok
<heller_>
and that's what we do in future::then and friends, where we need the shared state inside the continuation
<hkaiser>
ok, as I said, I don't object to this change
<hkaiser>
but it doesn't look like a change that has to go into the 1.2.1 release
<hkaiser>
except if this happens to fix the two failing tests ;-)
<heller_>
it is related to #3663
<hkaiser>
heller_: whatever, let simbergm decide
<heller_>
differently: it should have been part of #3663
<heller_>
sure
<heller_>
I don't mind. let's do a 1.3.0 ;)
<heller_>
hkaiser: #3663 fixed a problem where we accessed a member while the object went out of scope too early, turns out that we missed one incarnation of this, namely capturing this in an intrusive pointer when invoking the completion handlers on a new thread, which isn't necessary
<heller_>
so yes, this is nasty, but it's not UB
<hkaiser>
ok
<jbjnr__>
hkaiser: just fyi. I disabled auto linking, but now I have link errors for missing symbols. boost::system::error_category and program options etc etc
<jbjnr__>
it should not be this difficult to get right
<hkaiser>
so it expects the shared libraries but tries to link with the static ones?
<hkaiser>
or v.v.
<hkaiser>
jbjnr__: no idea what you're doing, I have never had this kind of problems
<jbjnr__>
I used to have this problem every time, but I can't find my 'good' settings for the windows machine.
<jbjnr__>
I usually have them in my wiki. After seeing the problems Yorlik was having I thought I'd try a clean build
<simbergm>
do you still want me to upload the archives to stellar.cct.lsu.edu or are you grabbing one from github?
<simbergm>
if all looks good this will be the last one
<diehlpk_work>
I can grab one from github, but for consistency the final release needs to be on the archives.
<simbergm>
of course
<simbergm>
just the rc
<diehlpk_work>
For some reason, I decided to use the stellar.cct.lsu.edu url in the fedora
<diehlpk_work>
Yes, I will use it from github
<Yorlik>
Is add_hpx_executable a full replacement for add_excutable? I'm having issues using it.
<Yorlik>
E.g. I usually add sources in an extra statement: target_sources( ...
<Yorlik>
I tried using the conventional add_executable and target_link_libraries with ${HPX_LIBRARIES}, but that didn't work
<simbergm>
if I have a store(release) in one thread and a load(acquire) in another that definitely happens after the store, I'm supposed to always see the value that I stored, no?
<simbergm>
*see the value stored in the first thread, in the second thread
hkaiser has quit [Quit: bye]
parsa_ is now known as parsa
<diehlpk_work>
simbergm, rc3 is running on the fedora build system. We should know tomorrow morning, if we can finalize the release
<simbergm>
good, thanks
<diehlpk_work>
So we would do a new build with the release candidate on Thursday and could submit to fedora on Friday
<diehlpk_work>
not the rc with the release
<heller_>
simbergm: you might still observe the old value on the load. The acquire only means that operations happening at the same time (more or less) completed
<heller_>
Yorlik: you can use hpx_setup_target
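A sketch of heller_'s suggestion (target and file names hypothetical): a plain CMake executable, with HPX's flags and libraries wired up afterwards.

```cmake
# Ordinary CMake target; hpx_setup_target attaches HPX's compile
# options and link dependencies to it, as an alternative to
# add_hpx_executable.
add_executable(hello_world main.cpp)
hpx_setup_target(hello_world)
```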
<Yorlik>
I realized there is this _exe and _component convention
<Yorlik>
I'm currently setting up the hello world example
<simbergm>
heller_: :/
<simbergm>
I'm failing to fix cancelable_action_client, but know what's going on
<heller_>
What's going on?
<simbergm>
it hangs in join, sometimes it adds exit callbacks after running them, and then ends up waiting for the callback to run
<simbergm>
is seq_cst supposed to be enough to see a modified value? (running now...)
<heller_>
Yes, with acquire/release you should see it eventually as well
aserio has quit [Ping timeout: 250 seconds]
<heller_>
I'd still go with a lock+cv
<simbergm>
hmm, but I don't want to sit there waiting for it
<simbergm>
so it's run_exit_funcs_ that should be visible
<simbergm>
in thread_data.cpp
<simbergm>
I need to think...
jbjnr has quit [Ping timeout: 240 seconds]
<heller_>
so the exit_thread_funcs_ is what's causing the problems?
aserio has joined #ste||ar
<Yorlik>
Alright - first successful app compile within my build system.
<Yorlik>
Seems apart from learning to build hapx I can now start learning hpx itself :)
<Yorlik>
s/hapx/hpx/g
<simbergm>
heller_ yeah, or ran_exitfuncs
<simbergm>
Not 100% sure about the name, just left work
<diehlpk_work>
You could check tomorrow morning, if these are all green and do the 1.2.1 release
<Yorlik>
LOL - helloworld --hpx:dump-config is fun ...
<Yorlik>
Thats the largest output I ever received from a helloworld :)
<Yorlik>
Do I understand correctly that exceptions happening on a remote node get rethrown locally?
<Yorlik>
like when getting a future which was computed remotely?
<K-ballo>
yes, exceptions are propagated when you call .get() on a future
<Yorlik>
I'm amazed. I was skimming over the docs for the last 2 hours or so. I want to try to create a multithreaded event-based Lua engine next week - I mean starting to ..
<Yorlik>
I see a lot of stuff we discussed internally already being implemented in HPC. I didn't expect to find an execution environment that advanced already being available.
<Yorlik>
We always thought "Yeah - would be nice if we could have it, but probably it's too much to do it"
<Yorlik>
Like the extensive exception handling, performance counters and logging.
<K-ballo>
HPX
<Yorlik>
It's going to be fun - I'm really looking forward to it.
<K-ballo>
HPC is High Performance Computing
<Yorlik>
I feel the time of messing with the build system is coming to an end finally.
<Yorlik>
Woops
<Yorlik>
I always do that - sry
<Yorlik>
BTW - what does it stand for? High Performance Extension?
<K-ballo>
it meant High Performance ParalleX once, nowadays I think it just names our framework (kinda how LLVM no longer stands for anything)
<Yorlik>
:)
<Yorlik>
I've got to go - thanks for all the explanations. :)
* Yorlik
heads out
jbjnr has joined #ste||ar
mbremer has joined #ste||ar
<mbremer>
Quick question: Is there a quick API call to reset all performance counters? I want to time idle rates and average task sizes, and would like to not time the initialization of the code.
<mbremer>
Maybe reset_active_counters?
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
<K-ballo>
iirc the function to evaluate them had a reset flag as well
hkaiser has quit [Quit: bye]
<mbremer>
K-ballo: Looking through the code, that seems to be the case. I guess my other question is whether all of these operations are synchronous? E.g. if I call reset_active_counters, will it happen globally? Or do I need to call the function on each locality?
<K-ballo>
I have no idea about perfcounters.. let me peek at the code
<K-ballo>
mbremer: don't know
<mbremer>
I'll ping Hartmut directly. I'm just a little unfamiliar with where exactly the performance counters sit in the hpx runtime. If they're just components etc etc etc. Thanks for your help though K-ballo
<jbjnr>
mbremer: I don't think there's a call to reset the perf counters, but having one would be quite useful
<jbjnr>
(even if individually - actually, there must be one for individual ones...)
aserio has quit [Quit: aserio]
<K-ballo>
someone asking about PDF documentation on the slack channel