hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
khuck has quit [Ping timeout: 250 seconds]
<simbergm>
bad news: my change to apply made some dataflow tests fail almost all the time, good news: we might actually be able to find out what's wrong with dataflow...
<simbergm>
there's no way that change in itself should be the source of those problems, right?
<simbergm>
it looks like the same failure that we've seen occasionally, except that it happens almost every time now
<simbergm>
heller: dataflow(f, std::vector<future<blah>>) is meant to wait for the futures in the vector, right? because if not the test is just broken
<heller>
yes, it is supposed to wait
<heller>
the continuation should only be executed once all inputs are ready
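The semantics described here (the continuation runs only once every input future is ready) can be illustrated with a minimal sketch in plain standard C++. This is not HPX code: `run_when_all_ready` and `is_ready` are hypothetical helper names, and `std::async` stands in for the HPX scheduler purely for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper (not an HPX API): non-blocking readiness check.
template <typename T>
bool is_ready(std::future<T> const& f)
{
    assert(f.valid());
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Hypothetical stand-in for dataflow(f, std::vector<future<T>>):
// returns a future for f(values), where f is invoked only after
// every input future has produced its value.
template <typename F, typename T>
auto run_when_all_ready(F f, std::vector<std::future<T>> inputs)
{
    return std::async(std::launch::async,
        [](F fn, std::vector<std::future<T>> in) {
            std::vector<T> values;
            values.reserve(in.size());
            for (auto& input : in)
                values.push_back(input.get());    // blocks until this input is ready
            return fn(std::move(values));         // runs after all inputs are ready
        },
        std::move(f), std::move(inputs));
}
```

With two `std::promise<int>` objects feeding the input vector, the future returned by `run_when_all_ready` should report not ready until both promises have been set, and only then yield the result of the continuation. Unlike this sketch, a real dataflow implementation would not block a thread while waiting.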
<simbergm>
mmh
<jbjnr_>
simbergm: great. I saw those dataflow fails on my guided executor branch and was worried it was that. But if it's on all the branches, I can relax.
<jbjnr_>
maybe I can take a look at one of them, I got some practice with dataflow etc. when I worked on the new executors ...
heller has quit [Read error: Connection reset by peer]
heller__ has joined #ste||ar
nikunj has joined #ste||ar
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
<heller__>
hkaiser: so ... what do you propose to solve the issue?
<hkaiser>
no idea
<hkaiser>
I would like to avoid having to set up a separate HPX build for the Phylanx tests
<heller__>
yes
<heller__>
limited resources are a problem
<heller__>
still, the two issues are orthogonal
<hkaiser>
so compiling HPX in c++14 would be a viable workaround
<hkaiser>
(on circleci)
<heller__>
it's only viable until there's a new problem
<hkaiser>
the actual problem is the way Klaus detects c++17 features
<heller__>
right
<heller__>
the change to clang7 should solve the problem (also comes with the gcc8 libstdc++)
<heller__>
unfortunately, there seems to be a problem with the new hpx_main setup...
<hkaiser>
do they have all algorithms by now?
<heller__>
that's my understanding
<heller__>
I only tested std::destroy, TBH
mcopik has quit [Remote host closed the connection]
<hkaiser>
k
<hkaiser>
there should be feature-test macros defined along with each of the algorithms
<hkaiser>
so Klaus could check those
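As a rough sketch of that idea, a library could test the standard feature-test macro instead of the compiler or language version. The example below assumes `__cpp_lib_raw_memory_algorithms` is the macro that covers `std::destroy`; `destroy_range` is a hypothetical helper name and is not taken from Blaze or HPX.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: destroys the objects in [first, last).
// Uses std::destroy only when the standard library advertises it via the
// feature-test macro, and falls back to explicit destructor calls otherwise.
template <typename T>
void destroy_range(T* first, T* last)
{
    assert(first <= last);    // precondition: a valid range within one array
#if defined(__cpp_lib_raw_memory_algorithms) && \
    __cpp_lib_raw_memory_algorithms >= 201606L
    std::destroy(first, last);
#else
    for (; first != last; ++first)
        first->~T();
#endif
}
```

Because the check is tied to what the standard library actually provides, the same source should build with a new compiler paired with an older libstdc++ as well as with a fully C++17-capable toolchain.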
nikunj has quit [Ping timeout: 252 seconds]
<heller__>
if we fix the failures for clang and lld appearing in llvm7, we could at least hide the problem for now
aserio has joined #ste||ar
<heller__>
hkaiser: what about freezing the blaze version?
<hkaiser>
heller__: for how long?
<heller__>
hkaiser: I'd suggest forever
<heller__>
hkaiser: it probably won't be the last time that external dependencies break something
<jbjnr_>
sorry I caused so much trouble.
<hkaiser>
heller__: that's not realistic
<hkaiser>
jbjnr_: not just your fault ;-)
<hkaiser>
it's heller__'s fault too
<jbjnr_>
I could've fixed the mpi stuff the easy way, but I chose to try and do it the 'right' way. Shan't make that mistake again!
<hkaiser>
heller__: but thanks for setting up clang7 on circleci
<heller__>
hkaiser: well, then you have to live with breakage...
<hkaiser>
heller__: I'll talk to klaus as well
<hkaiser>
for now freezing the blaze version is an option, but that can only be a stop-gap measure
<heller__>
hkaiser: i disagree
<heller__>
hkaiser: there only needs to be an automated mechanism that bumps the versions
<heller__>
That way, you ensure smooth operation
<heller__>
If you always assume that pulling the latest commit on master should work, that's bound to fail
<hkaiser>
heller__: that means that you force people to manually install blaze even if they have a (possibly newer) version already installed
<heller__>
We can't ensure that for hpx or blaze
<heller__>
No, that's what I mean
<hkaiser>
heller__: I'm not talking about latest master, I'm talking about released versions
<heller__>
Phylanx currently uses the latest master versions of hpx, blaze, pybind11 and highfive
<heller__>
For testing on circleci
<heller__>
And that's what I'm suggesting: use fixed versions there.
<hkaiser>
heller__: yes, this is not a necessary setup, it's for us to be sure that dependencies don't break our builds ;-)
<hkaiser>
heller__: ok, sure
<heller__>
Mission accomplished then :p
hkaiser has quit [Quit: bye]
<heller__>
The advantage: we test a curated set of dependencies. Nicely documented
<heller__>
New versions can be updated via PRs. They get checked via circle, and you can report potential problems upstream, without disturbing the normal workflow
<heller__>
This update could even be automated
nikunj has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
eschnett has joined #ste||ar
parsa[w] has quit [Read error: Connection reset by peer]
<simbergm>
that redirects to the generated docs from master now, but I'll change it to latest release once we have 1.2.0 out
<simbergm>
I'm not sure what would be a good place to document this...
<simbergm>
heller suggested I write a blog post about the move to sphinx, I could write about this there as well
<nikunj>
can we not get a free domain to host the documentation?
<simbergm>
jbjnr_: yeah, if you know something about dataflow please do have a look, I'm looking as well but starting from scratch
<jbjnr_>
I had a very quick look this morning, but the futures are created with make_ready_future, so the fact that is_ready is false is very strange. Implies the construction is flawed?
<simbergm>
no, it's push_back(dataflow(f, make_ready_future))
<simbergm>
so the future from dataflow is not ready immediately (and shouldn't be)
<jbjnr_>
I misread it then. sorry, was only a quick look
<jbjnr_>
will look again later maybe
<simbergm>
np
<hkaiser>
simbergm: could you update the main README to point there, pls?
<simbergm>
hkaiser: yeah, I'll do that
akheir has joined #ste||ar
<hkaiser>
simbergm: thanks, the README still points to the old docs
aserio has quit [Ping timeout: 252 seconds]
<simbergm>
hkaiser: yes, it does
<simbergm>
I'm going to update it on the release branch so that it's changed once we have the release out
<jbjnr_>
can we merge any of my prs before release. squeak squeak
<hkaiser>
simbergm: the README points to the docs in progress as well, that could be changed right away
<simbergm>
jbjnr_: on clang there's this: error: no member named 'free' in namespace 'std'; did you mean simply 'free'?
<simbergm>
if they work :)
<simbergm>
clang 3.8 that is
<jbjnr_>
missing cstdlib then. I fix now
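A minimal sketch of the pattern under discussion: `std::free` is declared in `<cstdlib>`, so that header has to be included before it can be used as a `std::unique_ptr` deleter. The `demangle` function below is a hypothetical illustration, not the actual HPX demangle helper, and it assumes a GCC-compatible compiler that provides `abi::__cxa_demangle` in `<cxxabi.h>`.

```cpp
#include <cassert>
#include <cstdlib>     // declares std::free; omitting it causes the error above
#include <memory>
#include <string>
#include <typeinfo>

#if defined(__GNUC__)
#include <cxxabi.h>    // abi::__cxa_demangle (GCC and Clang)
#endif

// Hypothetical helper: returns a readable type name, or the input unchanged
// if demangling fails or is not available on this compiler.
inline std::string demangle(char const* mangled)
{
    assert(mangled != nullptr);
#if defined(__GNUC__)
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so std::free releases it.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && text) ? std::string(text.get())
                                 : std::string(mangled);
#else
    return std::string(mangled);
#endif
}
```

Calling `demangle(typeid(int).name())` is expected to give `int`. The `std::unique_ptr` ensures the buffer is freed on every return path.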
<simbergm>
to all, keep opening prs to master still, I will just merge/cherry-pick to the release branch once that becomes relevant
<ste||ar-github>
hpx/demangle_helper 8239943 John Biddiscombe: Use std::unique_ptr in demangler
ste||ar-github has left #ste||ar [#ste||ar]
<khuck>
is there an ETA on the release?
<hkaiser>
khuck: before sc
<khuck>
thanks
<jbjnr_>
or when we fix ALL the bugs!
<hkaiser>
whatever comes first ;-)
<K-ballo>
sc then
<K-ballo>
there's only a month left
khuck has quit [Read error: Connection reset by peer]
khuck_ has joined #ste||ar
<jbjnr_>
jesus - I just tried turning on CUDA in my build and the list of cmake errors is huge
<jbjnr_>
do we test this?
<jbjnr_>
configure bombs out completely
<hkaiser>
jbjnr_: apparently we don't test it
<jbjnr_>
do we want cuda to be linked PUBLIC or PRIVATE? I vote PRIVATE initially.
<heller__>
Yes
<heller__>
No
<jbjnr_>
No
<jbjnr_>
Yes
<jbjnr_>
:)
<heller__>
Has to be public, because of headers and inline functions
<jbjnr_>
it affects transitivity
<jbjnr_>
we might link to cuda, but the user might not want to
<jbjnr_>
do we force cuda onto the user or let him add cuda himself if he wants it
<heller__>
He chose to enable it in the first place
<khuck_>
heller__: quick question - do we need any scratch space for the NERSC / ERCAP proposal? The format changed slightly from last year, and I don't think we requested it.
<khuck_>
heller__: we requested 1TB for project space, but unknown for scratch space.
<jbjnr_>
this is hard to argue with, ^^
<heller__>
khuck_: hmm. I usually build on scratch
<jbjnr_>
but if the system has cuda-enabled hpx on it and a user builds a hello world app, should they be forced to pull in cuda?
<heller__>
khuck_: having at least a TB would be appropriate
<khuck_>
agreed
<heller__>
jbjnr_: it should only be pulled in if it's a cuda app, no?
<heller__>
jbjnr_: hmm you have a point. We're only talking about the rt libs and friends, right?
<jbjnr_>
exactly, but making cuda target_link_libraries(PUBLIC...) forces all the includes, links etc on the user too
<heller__>
Yeah, those should be private
<jbjnr_>
ok, done
<heller__>
So we need to fix the headers, maybe
<heller__>
Not too sure there
<jbjnr_>
cmake 3.12 fixes all this for us with set(CUDA_LINK_LIBRARIES_KEYWORD "PRIVATE")
<jbjnr_>
don't know if this is in earlier cmake versions
<heller__>
If they are implemented properly already...
<heller__>
I think previous versions are pretty broken
<heller__>
Let's just document that we need 3.12 for cuda support.
<jbjnr_>
looks like it was added around cmake 3.9. good
<zao>
I'm guessing by the earlier discussion that these problems are known?
hkaiser has joined #ste||ar
<zao>
(building d4bd36da10d)
akheir has quit [Quit: Leaving]
aserio has quit [Quit: aserio]
<simbergm>
zao: yep, known
<zao>
Was a bit surprised that only those two tests failed building master :)
<khuck>
hkaiser: I am thoroughly confused now. Phylanx crashes with blaze 3.4. Or maybe it just crashes.
<khuck>
hkaiser: is blaze master fixed?
<khuck>
or are we still using 3.4?
<hkaiser>
khuck: blaze master is fixed
<hkaiser>
khuck: on circleci I'd like to get everything else under control first before going back to blaze master
<khuck>
yeah, I am seeing other failures. right now on my main test system everything is failing because there's a zombie HPX process out there, hogging the port. :/
<khuck>
hkaiser: " what(): <unknown>: HPX(network_error) " means that there is another HPX app running, right?
mcopik has quit [Ping timeout: 252 seconds]
mcopik has joined #ste||ar
<K-ballo>
is that all the information? if so we regressed
<khuck>
no, there's more
<K-ballo>
another HPX app running would mention something about binding ports or similar
<khuck>
good point
ste||ar-github has joined #ste||ar
<ste||ar-github>
[hpx] msimberg opened pull request #3493: Remove deprecated options for 1.2.0 part 2 (master...remove-deprecated-options-2) https://github.com/STEllAR-GROUP/hpx/pull/3493
ste||ar-github has left #ste||ar [#ste||ar]
<khuck>
K-ballo: found it. there was a hung 'fibonacci' test from another user - 6 days old.
<K-ballo>
heh
<hkaiser>
khuck: was this the issue you thought was caused by blaze?
<khuck>
no
<khuck>
the blaze issue prevented compilation
<khuck>
phylanx is currently building with blaze 3.4, but there is one failing test