K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<zao>
hkaiser: I'm heading off to bed real soon now, but the test (numa_allocator_test) I used seems to loop forever with hwloc 2.3 and the current combination of the two PRs. Two Ctrl-C states of the process: https://gist.github.com/zao/3b9a60aea46e75af970bf0035500baf9
<zao>
This hwloc 2 is the unpatched version from before the original reporter's update to the ports, so it should reproduce the original problem if I go and remove the fix PR.
<zao>
(I've shot myself in the foot a bit by using the version from ports, can only have one version installed at a time :D )
<hkaiser>
yeah, I added some locks which might have been wrong :/
<hkaiser>
removed now
hkaiser has quit [Quit: bye]
shahrzad has quit [Remote host closed the connection]
teonnik has quit [Ping timeout: 246 seconds]
klaus[m] has quit [Ping timeout: 246 seconds]
parsa[m] has quit [Ping timeout: 240 seconds]
k-ballo[m] has quit [Ping timeout: 240 seconds]
ms[m] has quit [Ping timeout: 240 seconds]
mdiers[m] has quit [Ping timeout: 240 seconds]
gnikunj[m] has quit [Ping timeout: 240 seconds]
jpinto[m] has quit [Ping timeout: 248 seconds]
tiagofg[m] has quit [Ping timeout: 265 seconds]
rori has quit [Ping timeout: 260 seconds]
pedro_barbosa[m] has quit [Ping timeout: 268 seconds]
gonidelis[m] has quit [Ping timeout: 268 seconds]
klaus[m] has joined #ste||ar
rori has joined #ste||ar
bita has quit [Ping timeout: 264 seconds]
ms[m] has joined #ste||ar
ms[m] has quit [Ping timeout: 246 seconds]
klaus[m] has quit [Ping timeout: 265 seconds]
rori has quit [Ping timeout: 260 seconds]
powderluv has quit [Quit: powderluv]
teonnik has joined #ste||ar
parsa[m] has joined #ste||ar
k-ballo[m] has joined #ste||ar
mdiers[m] has joined #ste||ar
gnikunj[m] has joined #ste||ar
pedro_barbosa[m] has joined #ste||ar
tiagofg[m] has joined #ste||ar
jpinto[m] has joined #ste||ar
gonidelis[m] has joined #ste||ar
ms[m] has joined #ste||ar
klaus[m] has joined #ste||ar
rori has joined #ste||ar
<ms[m]>
mdiers: if you still have the debugger open with those threads (or a core dump) I'd also be interested in knowing which assertion on threads 61, 62, 64 is hit (I'm pretty sure it can only be the second), but particularly what the value of stacksize is
<mdiers[m]>
ms: Sorry for my late reply. The line numbers are at the end of each line, between the filename and the address in the callstacks: `scheduler_base.hpp:273`
<ms[m]>
mdiers: ah, indeed, thanks, so that's the expected assertion
<ms[m]>
do you have access to the value of stacksize?
<mdiers[m]>
ms: I'll let you know as soon as I catch it in the debugger again.
<diehlpk_work>
ms[m], I will push the rc to Fedora today.
<ms[m]>
mdiers: thanks
<ms[m]>
diehlpk_work: thanks as well
<ms[m]>
note that I've extended the stellar group signing key, but I'm not sure it's propagated to all/most servers yet
hkaiser has joined #ste||ar
<hkaiser>
ms[m], rori: thanks for all the work on the release!
<ms[m]>
hkaiser: thank you, all the hard work has been done before the release :)
<ms[m]>
note that I didn't include the freebsd environment or the hwloc PRs in the rc because I wasn't sure what their status was
<hkaiser>
ms[m]: nod
<hkaiser>
ms[m]: zao confirmed yesterday that the freebsd PR is fine, not sure what his verdict on the hwloc was
<ms[m]>
did you conclude yesterday that they were ready to go in or do they need further testing (they probably do, but are they as tested as we will get them right now)?
<ms[m]>
ok, so we can likely go ahead with that one then
<ms[m]>
let's wait a bit to see if we get a confirmation about the hwloc one (either from zao or one of the reporters)
<hkaiser>
nod
<gnikunj[m]>
hkaiser: K-ballo: why do some C++ features come through as warnings while others as errors when using, say, C++17/20 features on a compiler that defaults to C++14? For instance, using fold expressions on gcc 10.2 only produces a warning (the executable runs as expected) saying that I should enable -std=c++17 to use fold expressions
<hkaiser>
gnikunj[m]: ask the compiler developers
<K-ballo>
it's a conforming extension; use -pedantic-errors if you want an error instead of a warning
<hkaiser>
most likely because the feature was available before it got standardized, so this keeps existing code compiling
<gnikunj[m]>
aah, that makes sense.
<ms[m]>
gnikunj: also, clang is stricter than gcc about sticking only to features in the version you specify
<gnikunj[m]>
got it. I was really curious what went on in the discussions for them to decide to emit warnings for some and errors for others
<gnikunj[m]>
things related to constexpr always seem to throw errors while things like inline variables and fold expressions throw warnings and the executable behaves as expected
<K-ballo>
there are no conforming constexpr extensions
<K-ballo>
for instance, gcc's constexpr math extensions are non-conforming
<gnikunj[m]>
ohh. That reminds me of another question. Why aren't more algorithms constexpr?
<gnikunj[m]>
C++20 does bring more constexpr algorithms, but what is stopping most of the algorithms from being made constexpr?
<K-ballo>
mostly lack of proposals, but also some memcpy related concerns
<K-ballo>
all non-allocating algorithms should be constexpr by now
<gnikunj[m]>
but new and delete are now allowed with C++20 when used in the same constexpr function
<gnikunj[m]>
why is memcpy an issue then?
<K-ballo>
you can't memcpy in a constant expression
<K-ballo>
but that was resolved with is_constant_evaluated()
<K-ballo>
for constexpr allocating algorithms it will take some constexpr implementations first, so that reference implementations can be tested, in order for proposals to come forward
<gnikunj[m]>
makes sense. So we can expect some constexpr allocating algorithms by C++23?
<K-ballo>
assuming they are actually implementable (they should be, memory is only used temporarily), then by some C++next yes
<gnikunj[m]>
nod. Nice!
<zao>
ms[m]: just got up, gonna see how much work I must do before I can get something built again.
<hkaiser>
zao: no rush
<hkaiser>
ms[m]: yt?
<ms[m]>
hkaiser: here
<hkaiser>
ms[m]: wrt #5117
<hkaiser>
have you seen the stack backtraces mdiers[m] posted yesterday?
<ms[m]>
yep
<ms[m]>
we very briefly talked about it earlier today
<hkaiser>
ahh
<hkaiser>
it's a follow-up error, I'm just not sure what's causing what
<ms[m]>
we know which assertion it is (the second, and it could really only be that one)
<hkaiser>
right
<ms[m]>
yeah, I suspect so too
<hkaiser>
is the assert the cause or the effect?
<ms[m]>
hard to say
<ms[m]>
I'm really struggling to see what could be wrong if it's the assert that's the cause
<hkaiser>
the only explanation for the assert I have is that the thread_data went out of scope
<ms[m]>
right, that would make sense
<ms[m]>
but still, I've no idea what could cause that :/
<hkaiser>
me neither
<ms[m]>
in the stacktraces #62 is one level further down than what we saw earlier, i.e. in resume rather than notify_one
<ms[m]>
mdiers: yt? in the set of callstacks you posted yesterday, which one is the one with the segfault?
<hkaiser>
ms[m]: that could be caused by different inlining strategies the compiler applied in different places
<ms[m]>
right, my point is that if it's in fact resume rather than notify_one it might be the agent_ref pointer that's messed up, not something in future_data
<ms[m]>
not that that helps us much...
<hkaiser>
ahh, that would coincide with the assert, possibly
<zao>
For ref: hwloc1 hello_world_1 either runs correctly or hangs; 230 and 240 bail with `hpx::init: hpx::exception caught: failed to initialize machine affinity mask: HPX(kernel_error)`; 230-patched and 240-patched have the crash.