aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<zao>
Running tests in a singularity container with isolated networking (localhost only) on a premade immutable software image, just mounting the tree to build and test into the image.
<zao>
Best of all, no docker anywhere :P
<zao>
Host OS is Ubuntu 17.10, container OS is debian latest.
<zao>
Hey, what did you people do to the tests, only two failures and one timeout this run :P
<heller>
I'm still failing to compile blas and lapack :/
<hkaiser>
intel mkl
<heller>
Ok
<heller>
Good to know
<hkaiser>
compiling the clang_tidy branch now
<heller>
The clang_tidy branch will at least rule out some potential use after move scenarios
<heller>
Inside hpx
<hkaiser>
right
<hkaiser>
don't think this will help, though
<hkaiser>
heller: yep, unchanged
<heller>
Ok, would have been too easy...
<heller>
hkaiser: so it's always giving the correct results on non-MSVC platforms?
<diehlpk>
what(): description is nullptr: HPX(bad_parameter)
<diehlpk>
What does this error mean on circle-ci?
<heller>
Where?
parsa has joined #ste||ar
<heller>
hkaiser: trying with ubsan now...
<heller>
hkaiser: could you look at #3007 and #2998 please?
<heller>
hkaiser: is blaze header only?
gedaj has quit [Read error: Connection reset by peer]
gedaj has joined #ste||ar
<diehlpk>
heller, When I run a test case
<heller>
diehlpk: which one? where do I see the output?
<heller>
diehlpk: this error usually means that you're trying to access some HPX-thread-specific stuff on a non-HPX thread (usually suspending or something)
<diehlpk>
heller, The strange thing is that the same test is running perfectly on my local fedora
<heller>
in essence: this error message means the caller of this function is not on an HPX thread
<diehlpk>
Ok, why is it working locally but not on circle-ci?
<diehlpk>
This is why I can not understand the error
<heller>
hkaiser: is python3 a hard requirement?
<heller>
diehlpk: ssh into the circle-ci worker and debug it?
<diehlpk>
Yes, I will do that
<hkaiser>
heller: yes
<hkaiser>
heller: well, no
<hkaiser>
the lra example does not require any python
<heller>
hkaiser: How would I disable python? I am failing at compiling pybind11 right now
<hkaiser>
heller: hmm, I think we don't support that atm :/
<hkaiser>
just disable it in the cmake file
<heller>
ok
<heller>
I'll just compile python3 now
<hkaiser>
k
<heller>
note to self: having a ',' in a path is a no-go for a python prefix...
<heller>
typos 4tw
<K-ballo>
a comma??
<zao>
:D
<heller>
yeah ...
<heller>
/opt/apps/x86_64/python3/3.6,3
<heller>
instead of
<heller>
/opt/apps/x86_64/python3/3.6.3
<zao>
heller: Making progress on running tests on my AMD machine, btw. Built singularity containers last night to build and test.
<heller>
ahh, singularity.lbl.gov?
<zao>
Yup.
<heller>
does SNIC support those?
<zao>
On some sites.
<heller>
beskow?
<zao>
NSC, UPPMAX and HPC2N, as far as I know, mostly in a testing phase.
<zao>
Doesn't seem like PDC (beskow) does yet.
<zao>
I've got it on my plate to set it up on our other cluster some day, so figured I might as well try it at home.
<heller>
"The Singularity software can import your Docker images without having Docker installed or being a superuser." -- The first image is showing sudo commands... ok
<zao>
Hehe.
<zao>
They've just pushed a somewhat breaking update to 2.4, so interface and requirements are a bit in flux.
<zao>
I think they've got a mode where they use 'user binds' to do more things without root.
<zao>
But in essence, you set your image up on a box where you have rights, and can run it as a plain user on the clusters later.
<zao>
I'll keep playing around, if I get something nice exposed I'll honk.
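The workflow zao describes (build the image on a box where you have root, then run it as a plain user on the cluster) might look roughly like this, assuming the Singularity 2.4 CLI; the image name, bind paths, and test command are made up for illustration:

```shell
# On a machine where you have root: import a Docker image
# without Docker installed (Singularity 2.4 `build` syntax).
sudo singularity build hpx-build.simg docker://debian:latest

# Later, on the cluster, as a plain user: bind-mount the source
# tree into the immutable image and run the tests inside it.
singularity exec --bind "$HOME/hpx:/work" hpx-build.simg \
    sh -c 'cd /work/build && ctest --output-on-failure'
```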
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
jaafar_ has joined #ste||ar
<heller>
zao: doesn't sound too bad
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller>
hkaiser: phew, finally got a build running for lra...
<zao>
This is nice... removing .o files from a build tree gets me 8.5G disk used instead of 20G.
<heller>
hkaiser: phylanxd has been linking for 7 minutes now
<heller>
we really, really have to do something about compile time
<jbjnr>
get rid of templates?
<zao>
Consider C :P
<zao>
GNU or proper ld? :)
<heller>
I think this is GNU ld, yeah, lld doesn't really bring a lot of improvement
<heller>
the biggest problem for me, I guess, is the file system performance
<hkaiser>
heller: that module contains ~50 or so components
<hkaiser>
but it doesn't take that long on circleci or elsewhere
<K-ballo>
linking for 7 minutes? wow
<heller>
ok, might be related to the ubsan instrumentation then
<heller>
hkaiser: circle uses lld, btw
<hkaiser>
k
<K-ballo>
all those warnings about unused captures in algorithms, were they always there?
<hkaiser>
K-ballo: most of them are to keep things alive
<heller>
yeah
<heller>
they got in with the clang update
<K-ballo>
ah, so bogus warnings.. I did not remember seeing them before
<heller>
well
<heller>
the warning has a point to a certain degree; the problem is that our use case (capturing to keep objects alive) triggers the warning where it shouldn't
<jbjnr>
hkaiser/heller: in the thread queues - what's the diff between get_queue_length and get_thread_count
<heller>
when you forgot to take out a now-unused capture, for example
<heller>
jbjnr: I never really know without looking at the code ;)
<jbjnr>
ok. I'll look
<hkaiser>
jbjnr: get_queue_length is the number of currently enqueued threads
<heller>
jbjnr: but a first guess would be: queue_length returns the number of pending tasks, and thread_count the number of items in the map
<hkaiser>
get_thread_count might give you all of them, even the suspended, terminated, staged, etc.
<K-ballo>
I imagine the warning only happens for instantiations where the capture types are trivial?
<hkaiser>
K-ballo: could be
<heller>
hkaiser: ok, for running lra, I get a completely different error ... the one with the wrapper heap
<hkaiser>
:/
<hkaiser>
heller: even with your fix?
<heller>
yeah....
<hkaiser>
heh
<heller>
I don't get it ...
<heller>
master + alignment fix that is
<hkaiser>
heller: the async traversal leaks the shared state :/
<heller>
:/
<heller>
should we roll back the whole new async traversal until it is fixed?
<heller>
sounds like it'll take quite some time to fix all this
<hkaiser>
I have it fixed already
<hkaiser>
the leak, that is
<heller>
ah, in your branch?
<hkaiser>
just locally for now
<hkaiser>
I think I'm closing in on the iterator problem
<hkaiser>
heller: I have it fixed now :D
<github>
[hpx] hkaiser pushed 2 new commits to fixing_dataflow: https://git.io/vFHnH