aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoung has quit [Ping timeout: 258 seconds]
hkaiser has joined #ste||ar
jaafar_ has quit [Ping timeout: 255 seconds]
<jakemp> I made a short example of what seems to be the problem: https://gist.github.com/kempj/4f7122d7afdc5646033a5494777d832f
<jakemp> it just hangs at the end indefinitely
parsa has joined #ste||ar
<hkaiser> jakemp: that should work without problems, really
<hkaiser> except that it might race on the add itself
<hkaiser> well, no
<hkaiser> no race, should just work
<hkaiser> otoh, we're currently in the course of redoing all of the executors, so this might be the cause
<hkaiser> jakemp: why do you need the executor?
<jakemp> it's part of the openMP runtime
<jakemp> for compatibility I need the "threads" to run on a static scheduler, and the tasks on the default workstealing one.
<hkaiser> ok - you might have to wait for all the changes to be done
<hkaiser> is it urgent?
<jakemp> I just updated the runtime to the new dataflow and it stopped working in this path, worked fine for the others
<hkaiser> nod
<jakemp> somewhat. Is there another way to get this functionality?
<hkaiser> we now changed things to create the needed schedulers at startup, which will cause some changes to your code :/
<hkaiser> jakemp: can you talk to jbjnr tomorrow (he's asleep now), he'll explain what you need to do
<jakemp> sure thing.
<jakemp> thanks for the reply, it's good to know I'm not missing something obvious
<hkaiser> jakemp: sorry for the trouble, I believe overall the changes will improve things
<jakemp> yeah, one of the overheads I have is creating an executor every parallel region, this sounds like it would help with that.
<hkaiser> it will
<hkaiser> jakemp: nice to know you're still working on the omp runtime
<hkaiser> very useful
<hkaiser> jakemp: we also found a better way to init hpx on demand in the background - that might be something for you to look at as well
<jakemp> I think I saw an example doing that
<jakemp> yeah, that's it
<jakemp> I want to start with that, but I wanted to get everything working first.
<hkaiser> right
<jakemp> It's much cleaner, and should help with one of the OpenMP conformance issues I have
<hkaiser> I'd love to have your runtime updated and available
<jakemp> yeah, it's been a minute since I've looked into it. Did some work with OpenMP and HPX last summer and looking to publish something soonish.
<hkaiser> nice
<hkaiser> jakemp: here is the related ticket: https://github.com/STEllAR-GROUP/hpx/issues/2997
jakemp has quit [Ping timeout: 240 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
<zao> "CMake 3.3.2 or higher is required. You are running version 2.8.12.2"
<zao> CentOS 7 delivers... in all the wrong ways :)
<zao> Have a feeling that its hwloc is going to be "new" too :)
<hkaiser> lol
hkaiser has quit [Quit: bye]
jakemp has joined #ste||ar
parsa has joined #ste||ar
parsa has quit [Client Quit]
<zao> Welp... /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libc++.so
<zao> I wonder if I should just debian/ubuntu this.
parsa has joined #ste||ar
<zao> Ooh, now we're cooking...
<zao> Got a debian singularity container built now that can build HPX, and should serve as an adequate host for running tests.
<zao> Assuming that I didn't misunderstand their optimistic networking features.
nanashi55 has quit [Ping timeout: 255 seconds]
nanashi55 has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<zao> Is there a CMake target that removes intermediary object files and keeps tests in the build tree?
<zao> Or is that called 'find . -name '*.o' -delete' ?:)
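A rough Python counterpart to the `find . -name '*.o' -delete` one-liner above, as a sketch only. The function name `remove_object_files` and the `dry_run` flag are invented for illustration, and the only assumption about the build tree is that intermediate objects end in `.o`. Unlike `find`, it skips symlinks and directories that happen to match.

```python
from pathlib import Path


def remove_object_files(build_root, dry_run=False):
    """Delete every regular file ending in .o below build_root.

    Returns the number of bytes freed (or that would be freed in a dry run).
    Hypothetical helper, not part of the HPX build system.
    """
    freed = 0
    # rglob walks the tree recursively, like `find . -name '*.o'`.
    # The list() snapshot avoids deleting while the walk is still in progress.
    for obj in list(Path(build_root).rglob("*.o")):
        if obj.is_symlink() or not obj.is_file():
            continue  # leave symlinks and directories alone
        freed += obj.stat().st_size
        if not dry_run:
            obj.unlink()
    return freed
```

Calling it with `dry_run=True` first reports how much space would be reclaimed before anything is removed; test executables and other non-`.o` files stay in the tree.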
<zao> woodsoak@lin:/hp/woodsoak$ singularity exec --net --bind /hp/woodsoak/builds/42/42a588be80f1ebdbe3904f61ca065e5f57876d47:/tree /hp/woodsoak/prepare/ws-bleh/ /ws-build-rwdi.sh
<zao> woodsoak@lin:/hp/woodsoak# singularity exec --net --bind /hp/woodsoak/builds/05/051b74e660b7f3372b03a165510d6b6b2b07633f:/tree /hp/woodsoak/prepare/ws-bleh/ /ws-test.sh
<zao> This is way nicer than I had thought.
<zao> Running tests in a singularity container with isolated networking (only localhost) on a premade immutable software image, just mounting the tree to build and test into the image.
<zao> Best of all, no docker anywhere :P
<zao> Host OS is Ubuntu 17.10, container OS is debian latest.
<zao> Hey, what did you people do to the tests, only two failures and one timeout this run :P
EverYoung has quit [Ping timeout: 240 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
K-ballo has quit [Quit: K-ballo]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vF9XA
<github> hpx/gh-pages b8081c9 StellarBot: Updating docs
david_pfander has joined #ste||ar
david_pfander has quit [Ping timeout: 258 seconds]
tengumis has joined #ste||ar
tengumis has quit [Quit: Page closed]
hkaiser has joined #ste||ar
<github> [hpx] hkaiser pushed 2 new commits to fixing_dataflow: https://git.io/vF99L
<github> hpx/fixing_dataflow 9f262ba Hartmut Kaiser: Attempting to avoid data races in async_traversal while evaluating dataflow()
<github> hpx/fixing_dataflow 7d6ac20 Hartmut Kaiser: Solved one more minor sequencing problem
denisblank has joined #ste||ar
K-ballo has joined #ste||ar
<hkaiser> jbjnr: yt?
denisblank has quit [Quit: denisblank]
diehlpk has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to fixing_dataflow: https://git.io/vF9bE
<github> hpx/fixing_dataflow 38e6f22 Hartmut Kaiser: Using one atomic Boolean for each asynchronous call hierarchy...
eschnett has quit [Quit: eschnett]
parsa has joined #ste||ar
eschnett has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
<github> [hpx] K-ballo created fmtlib (+1 new commit): https://git.io/vFHTM
<github> hpx/fmtlib ee05ea1 Agustin K-ballo Berge: (draft)
diehlpk has quit [Ping timeout: 240 seconds]
<heller> hkaiser: did the atomic bool help?
<hkaiser> heller: no
<hkaiser> same as before
<github> [hpx] K-ballo force-pushed fmtlib from ee05ea1 to eca64d8: https://git.io/vFHIw
<github> hpx/fmtlib eca64d8 Agustin K-ballo Berge: (draft)
diehlpk has joined #ste||ar
<heller> hkaiser: hmmm
<heller> hkaiser: did you try with the clang_tidy branch?
<heller> I know how to get rid of those copies in for_loop now
<hkaiser> heller: I didn't, the error does not manifest itself on non-windows platforms as it's the iterator checking firing
<hkaiser> heller: ahh, good
<heller> So it should fire with ubsan or asan
<heller> How would I reproduce the problem?
<heller> The clang_tidy fixes are not GCC/clang specific
<heller> They should affect all platforms
<heller> hkaiser: fwiw, you can enable iterator checking with libstdc++ as well
<hkaiser> ok, will try that branch
<hkaiser> heller: run the lra example from the phylanx repo with -tN N > 1
<hkaiser> that should reproduce it
<hkaiser> it reliably reproduces things for me
<heller> Ok, are there docs on how to compile phylanx?
<hkaiser> cmake?
<heller> Prerequisites and such
<heller> Thanks, that's what I was looking for
<heller> I'm still failing to compile blas and lapack :/
<hkaiser> intel mkl
<heller> Ok
<heller> Good to know
<hkaiser> compiling the clang_tidy branch now
<heller> The clang_tidy branch will at least rule out some potential use after move scenarios
<heller> Inside hpx
<hkaiser> right
<hkaiser> don't think this will help, though
<hkaiser> heller: yep, unchanged
<heller> Ok, would have been too easy...
<heller> hkaiser: so it's always giving the correct results on non msvc platforms?
<diehlpk> what(): description is nullptr: HPX(bad_parameter)
<diehlpk> What does this error mean on circle-ci?
<heller> Where?
parsa has joined #ste||ar
<heller> hkaiser: trying with ubsan now...
<heller> hkaiser: could you look at #3007 and #2998 please?
<heller> hkaiser: is blaze header only?
gedaj has quit [Read error: Connection reset by peer]
gedaj has joined #ste||ar
<diehlpk> heller, When I run a test case
<heller> diehlpk: which one? where do I see the output?
<heller> diehlpk: this error usually means that you try to access some HPX-thread-specific stuff on a non-HPX thread (usually suspending or something)
<diehlpk> heller, The strange thing is that the same test is running perfectly on my local fedora
<heller> in essence: this error message means the caller of this function is not on an HPX thread
<diehlpk> Ok, why is it working locally but not on circle-ci?
<diehlpk> This is why I can not understand the error
<heller> hkaiser: is python3 a hard requirement?
<heller> diehlpk: ssh into the circle-ci worker and debug it?
<diehlpk> Yes, I will do that
<hkaiser> heller: yes
<hkaiser> heller: well, no
<hkaiser> the lra example does not require any python
<heller> hkaiser: How would I disable python? I am failing at compiling pybind11 right now
<hkaiser> heller: hmm, I think we don't support that atm :/
<hkaiser> just disable it in the cmake file
<heller> ok
<heller> I'll just compile python3 now
<hkaiser> k
<heller> note to self, having a ',' in a path is a no-go for a python prefix...
<heller> typos 4tw
<K-ballo> a comma??
<zao> :D
<heller> yeah ...
<heller> /opt/apps/x86_64/python3/3.6,3
<heller> instead of
<heller> /opt/apps/x86_64/python3/3.6.3
<zao> heller: Making progress on running tests on my AMD machine, btw. Built singularity containers last night to build and test.
<heller> ahh, singularity.lbl.gov?
<zao> Yup.
<heller> does SNIC support those?
<zao> On some sites.
<heller> beskow?
<zao> NSC, UPPMAX and HPC2N, as far as I know, mostly in a testing phase.
<zao> Doesn't seem like PDC (beskow) does yet.
<zao> I've got it on my plate to set it up on our other cluster some day, so figured I might as well try it at home.
<heller> "The Singularity software can import your Docker images without having Docker installed or being a superuser." -- The first image is showing sudo commands... ok
<zao> Hehe.
<zao> They've just pushed a somewhat breaking update to 2.4, so interface and requirements are a bit in flux.
<zao> I think they've got a mode where they use 'user binds' to do more things without root.
<zao> But in essence, you set your image up on a box where you have rights, and can run it as a plain user on the clusters later.
<zao> I'll keep playing around, if I get something nice exposed I'll honk.
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
jaafar_ has joined #ste||ar
<heller> zao: doesn't sound too bad
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller> hkaiser: phew, finally got a build running for lra...
<zao> This is nice... removing .o files from a build tree gets me 8.5G disk used instead of 20G.
<heller> hkaiser: phylanxd has been linking for 7 minutes now
<heller> we really, really have to do something about compile time
<jbjnr> get rid of templates?
<zao> Consider C :P
<zao> GNU or proper ld? :)
<heller> I think this is GNU ld, yeah; lld doesn't really give a lot of improvement
<heller> the biggest problem for me, I guess is the file system performance
<hkaiser> heller: that module contains ~50 or so components
<hkaiser> but it doesn't take that long on circleci or elsewhere
<K-ballo> linking for 7 minutes? wow
<heller> ok, might be related to the ubsan instrumentation then
<heller> hkaiser: circle uses lld, btw
<hkaiser> k
<K-ballo> all those warnings about unused captures in algorithms, were they always there?
<hkaiser> K-ballo: most of them are to keep things alive
<heller> yeah
<heller> they got in with the clang update
<K-ballo> ah, so bogus warnings.. I did not remember seeing them before
<heller> well
<heller> the warning has a point to a certain degree, the problem is that our use case (capturing to keep objects alive) triggers the warning where it shouldn't
<jbjnr> hkaiser/heller: in the thread queues - what's the diff between get_queue_length and get_thread_count
<heller> when you forgot to take out a now unused capture for example
<heller> jbjnr: I never really know without looking at the code ;)
<jbjnr> ok. I'll look
<hkaiser> jbjnr: get_queue_length is the number of currently enqueued threads
<heller> jbjnr: but a first guess would be: queue_length returns the number of pending tasks, and thread_count the number of items in the map
<hkaiser> get_thread_count might give you all of them, even the suspended, terminated, staged, etc.
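The distinction described above can be pictured with a toy model. This is a sketch under assumptions, not HPX's actual thread queue: the class `ToyThreadQueue`, the `State` enum and the `add` method are all hypothetical, and only the two accessor names are borrowed from the discussion. It follows the guess that the queue length counts enqueued (pending) threads while the thread count covers every entry in the map.

```python
from collections import deque
from enum import Enum


class State(Enum):
    STAGED = "staged"
    PENDING = "pending"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


class ToyThreadQueue:
    """Hypothetical model for illustration, not HPX's implementation."""

    def __init__(self):
        self.thread_map = {}    # every known thread id -> state
        self.pending = deque()  # ids that are enqueued and ready to run

    def add(self, tid, state):
        self.thread_map[tid] = state
        if state is State.PENDING:
            self.pending.append(tid)

    def get_queue_length(self):
        # Only the threads that are currently enqueued.
        return len(self.pending)

    def get_thread_count(self):
        # Everything in the map, whatever its state.
        return len(self.thread_map)


q = ToyThreadQueue()
q.add(1, State.PENDING)
q.add(2, State.SUSPENDED)
q.add(3, State.TERMINATED)
print(q.get_queue_length(), q.get_thread_count())  # prints "1 3"
```

In this model a suspended or terminated thread still shows up in the thread count but never in the queue length, which is the difference the two answers point at.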
<K-ballo> I imagine the warning only happens for instantiations where the capture types are trivial?
<hkaiser> K-ballo: could be
<heller> hkaiser: ok, for running lra, I get a completely different error ... the one with the wrapper heap
<hkaiser> :/
<hkaiser> heller: even with your fix?
<heller> yeah....
<hkaiser> heh
<heller> I don't get it ...
<heller> master + alignment fix that is
<hkaiser> heller: the async traversal leaks the shared state :/
<heller> :/
<heller> should we roll back the whole new async traversal until it is fixed?
<heller> sounds like it'll take quite some time to fix all this
<hkaiser> I have it fixed already
<hkaiser> the leak, that is
<heller> ah, in your branch?
<hkaiser> just locally for now
<hkaiser> I think I'm closing in on the iterator problem
<hkaiser> heller: I have it fixed now :D
<github> [hpx] hkaiser pushed 2 new commits to fixing_dataflow: https://git.io/vFHnH
<github> hpx/fixing_dataflow 3e800b6 Hartmut Kaiser: Merge remote-tracking branch 'remotes/origin/clang_tidy' into fixing_dataflow
<github> hpx/fixing_dataflow d0fb5b7 Hartmut Kaiser: Fixed checked iterator problem...
<github> [hpx] hkaiser force-pushed fixing_dataflow from d0fb5b7 to f1285f3: https://git.io/vMn3T
<github> hpx/fixing_dataflow f1285f3 Hartmut Kaiser: Fixed checked iterator problem...