hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
khuck has quit [Ping timeout: 250 seconds]
<simbergm> bad news: my change to apply made some dataflow tests fail almost all the time, good news: we might actually be able to find out what's wrong with dataflow...
<simbergm> there's no way that change in itself should be the source of those problems, right?
nanashi55 has quit [Ping timeout: 252 seconds]
nanashi55 has joined #ste||ar
<heller> simbergm: how do they fail?
eschnett_ has quit [Quit: eschnett_]
<heller> that's bad
<simbergm> it looks like the same that we've seen occasionally, except that it happens almost every time now
<simbergm> heller: dataflow(f, std::vector<future<blah>>) is meant to wait for the futures in the vector, right? because if not the test is just broken
<heller> yes, it is supposed to wait
<heller> the continuation should only be executed once all inputs are ready
<simbergm> mmh
<jbjnr_> simbergm: great. I saw those dataflow fails on my guided executor branch and was worried it was that. But if it's on all the branches, I can relax.
<jbjnr_> maybe I can take a look at one of them, I have some practice with dataflow etc when I worked on the new executors ...
mcopik has joined #ste||ar
mcopik has quit [Ping timeout: 268 seconds]
mcopik has joined #ste||ar
mcopik has quit [Ping timeout: 272 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell created sithhell-patch-1 (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/9d0e25221eef
<ste||ar-github> hpx/sithhell-patch-1 9d0e252 Thomas Heller: Changing Base docker image
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell pushed 1 new commit to sithhell-patch-1: https://github.com/STEllAR-GROUP/hpx/commit/4cf1c10c00f3e0363ee6249e8cbaa5b2f237ea68
<ste||ar-github> hpx/sithhell-patch-1 4cf1c10 Thomas Heller: Changing base Docker image for the HPX image
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell opened pull request #3491: Changing Base docker image (master...sithhell-patch-1) https://github.com/STEllAR-GROUP/hpx/pull/3491
ste||ar-github has left #ste||ar [#ste||ar]
heller has quit [Read error: Connection reset by peer]
heller__ has joined #ste||ar
nikunj has joined #ste||ar
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
<heller__> hkaiser: so ... what do you propose to solve the issue?
<hkaiser> no idea
<hkaiser> I would like to avoid having to set up a separate HPX build for the Phylanx tests
<heller__> yes
<heller__> limited resources are a problem
<heller__> still, the two issues are orthogonal
<hkaiser> so compiling HPX in c++14 would be a viable workaround
<hkaiser> (on circleci)
<heller__> it's only viable until there's a new problem
<hkaiser> the actual problem is the way Klaus detects c++17 features
<heller__> right
<heller__> the change to clang7 should solve the problem (also comes with the gcc8 libstdc++)
<heller__> unfortunately, there seems to be a problem with the new hpx_main setup...
<hkaiser> do they have all algorithms by now?
<heller__> that's my understanding
<heller__> I only tested std::destroy, TBH
mcopik has quit [Remote host closed the connection]
<hkaiser> k
<hkaiser> there should be feature macros defined going with each of the algorithms
<hkaiser> so Klaus could check those
nikunj has quit [Ping timeout: 252 seconds]
<heller__> if we fix the failures for clang and lld appearing in llvm7, we could at least hide the problem for now
aserio has joined #ste||ar
<heller__> hkaiser: what about freezing the blaze version?
<hkaiser> heller__: for how long?
<heller__> hkaiser: I'd suggest forever
<heller__> hkaiser: it probably won't be the last time that external dependencies break something
<jbjnr_> sorry I caused so much trouble.
<hkaiser> heller__: that's not realistic
<hkaiser> jbjnr_: not just your fault ;-)
<hkaiser> it's heller__'s fault too
<jbjnr_> I could've fixed the mpi stuff the easy way, but I chose to try and do it the 'right' way. Shan't make that mistake again!
<hkaiser> heller__: but thanks for setting up clang7 on circleci
<heller__> hkaiser: well, then you have to live with breakage...
<hkaiser> heller__: I'll talk to klaus as well
<hkaiser> for now freezing the blaze version is an option, but that can only be a stop-gap measure
<heller__> hkaiser: i disagree
<heller__> hkaiser: there only needs to be an automated mechanism that bumps the versions
<heller__> That way, you ensure smooth operation
<heller__> If you always assume that pulling the latest commit on master should work, that's bound to fail
<hkaiser> heller__: that means that you force people to manually install blaze even if they have a (possibly newer) version already installed
<heller__> We can't ensure that for hpx or blaze
<heller__> No, that's what I mean
<hkaiser> heller__: I'm not talking about latest master, I'm talking about released versions
<heller__> Phylanx currently uses the latest master versions of hpx, blaze, pybind11 and highfive
<heller__> For testing on circleci
<heller__> And that's what I'm suggesting: use fixed versions there.
<hkaiser> heller__: yes, this is not a necessary setup, it's for us to be sure that dependencies don't break our builds ;-)
<hkaiser> heller__: ok, sure
<heller__> Mission accomplished then :p
hkaiser has quit [Quit: bye]
<heller__> The advantage: we test a curated set of dependencies. Nicely documented
<heller__> New versions can be updated via PRs. They get checked via circle, and you can report potential problems upstream, without disturbing the normal workflow
<heller__> This update could even be automated
nikunj has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
eschnett has joined #ste||ar
parsa[w] has quit [Read error: Connection reset by peer]
parsa[w] has joined #ste||ar
jaafar has joined #ste||ar
RostamLog has joined #ste||ar
jaafar has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
<hkaiser> simbergm: yt?
<simbergm> hkaiser: here
<hkaiser> simbergm: where is the generated documentation hosted now?
khuck has joined #ste||ar
khuck has quit [Remote host closed the connection]
khuck has joined #ste||ar
david_pfander has quit [Ping timeout: 246 seconds]
<simbergm> that redirects to the generated docs from master now, but I'll change it to latest release once we have 1.2.0 out
<simbergm> I'm not sure what would be a good place to document this...
<simbergm> heller suggested I write a blog post about the move to sphinx, I could write about this there as well
<nikunj> can we not buy a free domain to host the documentation?
<simbergm> jbjnr_: yeah, if you know something about dataflow please do have a look, I'm looking as well but starting from scratch
<jbjnr_> I had a very quick look this morning, but the futures are created with make_ready_future, so the fact that is_ready is false is very strange. Implies the construction is flawed?
<simbergm> no, it's push_back(dataflow(f, make_ready_future))
<simbergm> so the future from dataflow is not ready immediately (and shouldn't be)
<jbjnr_> I misread it then. sorry, was only a quick look
<jbjnr_> will look again later maybe
<simbergm> np
<hkaiser> simbergm: could you update the main README to point there, pls?
<simbergm> hkaiser: yeah, I'll do that
akheir has joined #ste||ar
<hkaiser> simbergm: thanks, the README still points to the old docs
aserio has quit [Ping timeout: 252 seconds]
<simbergm> hkaiser: yes, it does
<simbergm> I'm going to update it on the release branch so that it's changed once we have the release out
<jbjnr_> can we merge any of my prs before release. squeak squeak
<hkaiser> simbergm: the README points to the docs in progress as well, that could be changed right away
<simbergm> jbjnr_: on clang there's this: error: no member named 'free' in namespace 'std'; did you mean simply 'free'?
<simbergm> if they work :)
<simbergm> clang 3.8 that is
<jbjnr_> missing cstdlib then. I fix now
<simbergm> to all, keep opening prs to master still, I will just merge/cherry-pick to the release branch once that becomes relevant
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco force-pushed demangle_helper from d7710d6 to 8239943: https://github.com/STEllAR-GROUP/hpx/commits/demangle_helper
<ste||ar-github> hpx/demangle_helper 8239943 John Biddiscombe: Use std::unique_ptr in demangler
ste||ar-github has left #ste||ar [#ste||ar]
<khuck> is there an ETA on the release?
<hkaiser> khuck: before sc
<khuck> thanks
<jbjnr_> or when we fix ALL the bugs!
<hkaiser> whatever comes first ;-)
<K-ballo> sc then
<K-ballo> there's only a month left
khuck has quit [Read error: Connection reset by peer]
khuck_ has joined #ste||ar
<jbjnr_> jesus - I just tried turning on CUDA in my build and the list of cmake errors is huge
<jbjnr_> do we test this?
<jbjnr_> configure bombs out completely
<hkaiser> jbjnr_: apparently we don't test it
<jbjnr_> do we want cuda to be linked PUBLIC or PRIVATE? I vote PRIVATE initially.
<heller__> Yes
<heller__> No
<jbjnr_> No
<jbjnr_> Yes
<jbjnr_> :)
<heller__> Has to be public, because of headers and inline functions
<jbjnr_> it affects transitivity
<jbjnr_> we might link to cuda, but the user might not want to
<jbjnr_> do we force cuda onto the user or let him add cuda himself if he wants it
<heller__> He chose to enable it in the first place
<khuck_> heller__: quick question - do we need any scratch space for the NERSC / ERCAP proposal? The format changed slightly from last year, and I don't think we requested it.
<khuck_> heller__: we requested 1TB for project space, but unknown for scratch space.
<jbjnr_> this is hard to argue with, ^^
<heller__> khuck_: hmm. I usually build on scratch
<jbjnr_> but if the system has cuda-enabled hpx on it and a user builds a hello world app. Should they be forced to pull in cuda?
<heller__> khuck_: having at least a TB would be appropriate
<khuck_> agreed
<heller__> jbjnr_: it should only be pulled in if it's a cuda app, no?
<heller__> jbjnr_: hmm you have a point. We're only talking about the rt libs and friends, right?
<jbjnr_> exactly, but making cuda target_link_libraries(PUBLIC...) forces all the includes, links etc on the user too
<heller__> Yeah, those should be private
<jbjnr_> ok, done
<heller__> So we need to fix the headers, maybe
<heller__> Not too sure there
<jbjnr_> cmake 3.12 fixes all this for us with set(CUDA_LINK_LIBRARIES_KEYWORD "PRIVATE")
<jbjnr_> don't know if this is in earlier cmake versions
<heller__> If they are implemented properly already...
<heller__> I think previous versions are pretty broken
<heller__> Let's just document that we need 3.12 for cuda support.
<jbjnr_> looks like it was added around cmake 3.9. good
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco created cuda_cmake_fix (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/81374c5b0441
<ste||ar-github> hpx/cuda_cmake_fix 81374c5 John Biddiscombe: Add CUDA_LINK_LIBRARIES_KEYWORD to allow PRIVATE keyword in linkage to cuda
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco opened pull request #3492: Add CUDA_LINK_LIBRARIES_KEYWORD to allow PRIVATE keyword in linkage t… (master...cuda_cmake_fix) https://github.com/STEllAR-GROUP/hpx/pull/3492
ste||ar-github has left #ste||ar [#ste||ar]
mcopik has joined #ste||ar
aserio has joined #ste||ar
khuck_ has quit [Remote host closed the connection]
khuck has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
<khuck> aserio: do you have a brief description of the Rostam cluster?
<aserio> khuck: do you mean like the types of nodes it contains?
<khuck> aserio: yes, but it can be short. i.e. x nodes of y processors, z memory.
<khuck> and if you have a web page that describes it, even better
<aserio> khuck: so this is the long form of the information: https://github.com/STEllAR-GROUP/hpx/wiki/Running-HPX-on-Rostam
<aserio> khuck: most students report on the node that they are running on
<khuck> thanks
<aserio> khuck: Perhaps we could just talk about Marvin and say ... and other nodes
<khuck> Here's what I am adding:
<khuck> Rostam cluster at LSU (47 node heterogeneous test cluster): https://github.com/STEllAR-GROUP/hpx/wiki/Running-HPX-on-Rostam
<khuck> Talapas cluster at UO (128 nodes, 28 cores per node): https://hpcf.uoregon.edu/content/talapas
<khuck> and that's enough
<aserio> Ok :)
<khuck> Alice wanted it
<khuck> I don't want it to look like we have other options. :)
<khuck> aserio: does HPX have compute allocations on other machines in Europe?
<aserio> The team at LSU does not
<khuck> fair enough
<aserio> lol I am not certain what heller__ has access to
<aserio> and jbjnr_ has access to his institutional resources
<khuck> right, but HPX doesn't have a project account to bill against.
<khuck> so that doesn't count.
<zao> 232 - tests.unit.lcos.local_dataflow (Failed)
<zao> 234 - tests.unit.lcos.local_dataflow_executor (Failed)
<zao> I'm guessing by the earlier discussion that these problems are known?
hkaiser has joined #ste||ar
<zao> (building d4bd36da10d)
akheir has quit [Quit: Leaving]
aserio has quit [Quit: aserio]
<simbergm> zao: yep, known
<zao> Was a bit surprised that only those two tests failed building master :)
<khuck> hkaiser: I am thoroughly confused now. Phylanx crashes with blaze 3.4. Or maybe it just crashes.
<khuck> hkaiser: is blaze master fixed?
<khuck> or are we still using 3.4?
<hkaiser> khuck: blaze master is fixed
<hkaiser> khuck: on circleci I'd like to get everything else under control first before going back to blaze master
<khuck> yeah, I am seeing other failures. right now on my main test system everything is failing because there's a zombie HPX process out there, hogging the port. :/
<khuck> hkaiser: " what(): <unknown>: HPX(network_error) " means that there is another HPX app running, right?
mcopik has quit [Ping timeout: 252 seconds]
mcopik has joined #ste||ar
<K-ballo> is that all the information? if so we regressed
<khuck> no, there's more
<K-ballo> another HPX app running would mention something about binding ports or similar
<khuck> good point
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg opened pull request #3493: Remove deprecated options for 1.2.0 part 2 (master...remove-deprecated-options-2) https://github.com/STEllAR-GROUP/hpx/pull/3493
ste||ar-github has left #ste||ar [#ste||ar]
<khuck> K-ballo: found it. there was a hung 'fibonacci' test from another user - 6 days old.
<K-ballo> heh
<hkaiser> khuck: was this the issue you thought being caused by blaze?
<khuck> no
<khuck> the blaze issue prevented compilation
<khuck> phylanx is currently building with blaze 3.4, but there is one failing test
<khuck> simple_als
<khuck> 0!=1
<khuck> (in the output)