aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<github>
[hpx] biddisco force-pushed namespace_error from 01148c4 to 2e65f2a: https://git.io/vdMA2
<github>
hpx/namespace_error 2e65f2a John Biddiscombe: Fix a namespace compilation error when some schedulers are disabled
<github>
[hpx] biddisco opened pull request #2950: Fix a namespace compilation error when some schedulers are disabled (master...namespace_error) https://git.io/vdMAw
<heller>
jbjnr: we have bigger problems right, as it seems
<jbjnr>
?
<jbjnr>
don't understand your comment
<heller>
right now*
<heller>
the recent function changes seem to have broken distributed runs
<jbjnr>
yes. things are not behaving as expected. hello world in distributed gives extra output etc.
<heller>
yes
<heller>
trying
<jbjnr>
for my lockup, it seems that the scheduling loop gets stuck because the background thread does not complete properly
<heller>
ok
<heller>
I thought I fixed it
<heller>
i am looking into it right now
EverYoung has joined #ste||ar
hkaiser has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
<github>
[hpx] sithhell opened pull request #2952: Removing wrong call to cleanup_terminated_locked (master...fix_thread_map) https://git.io/vdDYz
<github>
[hpx] hkaiser created fixing_2947 (+1 new commit): https://git.io/vdDOm
<github>
hpx/fixing_2947 a5d12ef Hartmut Kaiser: Making sure any hpx.os_threads=N supplied through a --hpx::config file is taken into account...
<github>
[hpx] hkaiser opened pull request #2953: Making sure any hpx.os_threads=N supplied through a --hpx::config file is taken into account (master...fixing_2947) https://git.io/vdDOG
<hkaiser>
heller: great, thanks!
<heller>
very stupid :/
K-ballo has joined #ste||ar
<zao>
Blargh... seeing a ton of timeouts on distributed.tcp today.
<heller>
yeah
<heller>
we just reverted a bad commit
<heller>
zao: what should we do about std::rand in unit tests now?
<zao>
Kill with fire, replace reasonably mechanically with a MT + uniform int distribution where needed, and use a constant where you don't really need randomness?
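A minimal sketch of the replacement suggested above, assuming a test only needs reproducible pseudo-random input. The helper name `make_test_data` and the default seed are hypothetical and do not come from the HPX test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical test helper (name and default seed are assumptions).
// Fills a vector with values drawn uniformly from [lo, hi] using a
// Mersenne Twister seeded with a fixed seed, so a failing run can be
// repeated with the same data.
inline std::vector<int> make_test_data(
    std::size_t count, int lo, int hi, unsigned seed = 42u)
{
    assert(lo <= hi);    // uniform_int_distribution requires lo <= hi

    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);

    std::vector<int> data(count);
    for (int& v : data)
        v = dist(gen);
    return data;
}
```

Calling `make_test_data(100, 0, 9)` twice yields the same 100 values within one standard library. The exact sequence may differ between standard library implementations, because the distribution algorithm is implementation-defined.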
<hkaiser>
has std::rand been deprecated now?
<zao>
My personal opinion is that unless you intend to run a test a lot of times to find problems, randomization only leads to flapping.
<hkaiser>
zao: that's what we do
<zao>
Once-per-commit is way too seldom if the goal is to test different datasets.
<hkaiser>
run them very often ;)
<zao>
Well, you don't run them more than once in CI/buildbot?
<hkaiser>
zao: we have been doing that for years now, quite successfully, btw
<heller>
hkaiser: the problem is that apparently, we run into UB with some std::rand uses
<hkaiser>
heller: what?
<hkaiser>
how's that?
<zao>
hkaiser: The problem last week, was that std::rand returned a number close to INT_MAX.
<hkaiser>
ok
<zao>
Which we overflowed and ended up calling uniform_int_distribution(base - x, base + x)
<zao>
Which is UB.
<hkaiser>
we don't use uniform_distribution, do we?
<zao>
Found it due to blind luck where the seed for std::rand was such that it triggered the lingering bug.
<zao>
Might've misspelled the test name, didn't have any output handy.
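One way to guard against the failure mode described above, sketched under assumptions: the bounds `base - x` and `base + x` are computed in a wider type and clamped, so neither the signed arithmetic nor the distribution's precondition (lower bound <= upper bound) can be violated. The function name `random_around` is hypothetical, and the sketch assumes `int` is narrower than 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical helper: draw a value from [base - x, base + x],
// clamped to the representable range of int.
// Assumption: int is narrower than 64 bits, so the 64-bit sums cannot overflow.
inline int random_around(std::mt19937& gen, int base, int x)
{
    assert(x >= 0);

    // Widen before adding/subtracting to avoid signed overflow (UB).
    std::int64_t const lo64 = static_cast<std::int64_t>(base) - x;
    std::int64_t const hi64 = static_cast<std::int64_t>(base) + x;

    std::int64_t const int_min = std::numeric_limits<int>::min();
    std::int64_t const int_max = std::numeric_limits<int>::max();

    int const lo = static_cast<int>(std::max(lo64, int_min));
    int const hi = static_cast<int>(std::min(hi64, int_max));

    // Passing lo > hi to uniform_int_distribution is undefined behavior.
    assert(lo <= hi);

    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(gen);
}
```

With `base` close to `INT_MAX`, the upper bound is clamped to `INT_MAX` instead of wrapping around.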
<msimberg>
I was trying to account for all the threads that hpx starts, and with jbjnr we got to n worker threads, 2 (default) io pool threads, 2 timer pool threads, 2 parcel pool threads and the wait_helper to wait for finalize. Is this correct? Would hpx spawn threads for any other purposes?
<hkaiser>
zao: ok, thanks!
<hkaiser>
msimberg: no
<hkaiser>
msimberg: wait, there is also the main thread - but that's not started by hpx
aserio has quit [Read error: Connection reset by peer]
<msimberg>
no, meaning no other threads, or not correct? and yeah, I ignored the main thread
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
msimberg: 'no' as in hpx does not spawn any other threads
aserio has joined #ste||ar
<msimberg>
ok, thanks!
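The tally from this exchange can be written down as a small function. This is only a sketch of the arithmetic: the function name is hypothetical, the pool sizes are the defaults quoted above, and the main thread is excluded because it is not started by hpx.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper summarizing the counts quoted in the discussion.
// The defaults (2 io, 2 timer, 2 parcel, 1 wait_helper) are taken from
// the chat, not from the HPX sources.
inline std::size_t expected_hpx_threads(std::size_t worker_threads,
    std::size_t io_pool_threads = 2, std::size_t timer_pool_threads = 2,
    std::size_t parcel_pool_threads = 2, std::size_t wait_helper_threads = 1)
{
    return worker_threads + io_pool_threads + timer_pool_threads +
        parcel_pool_threads + wait_helper_threads;
}
```

For example, `expected_hpx_threads(4)` gives 4 + 2 + 2 + 2 + 1 = 11 threads started by hpx.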
eschnett has joined #ste||ar
<hkaiser>
aserio: yt?
<aserio>
hkaiser: yes
<hkaiser>
see pm, pls
EverYoung has joined #ste||ar
jaafar has quit [Ping timeout: 248 seconds]
gedaj has quit [Remote host closed the connection]
gedaj has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
diehlpk_work has joined #ste||ar
rod_t has joined #ste||ar
bibek_desktop has quit [Quit: Leaving]
Bibek has joined #ste||ar
david_pfander has quit [Ping timeout: 248 seconds]
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
<heller>
hkaiser: hmm, everything but the throttle thing is what I am able to fix right now :/
<heller>
removing and adding PUs dynamically might need a more thorough rework of the scheduling loop
<heller>
the scheduling loop is currently designed to really run from start to finish ...
<hkaiser>
ok
<hkaiser>
I need this functionality so I will work on it soon
<heller>
what do you need it for?
<hkaiser>
switching back & forth between MPI/OpenMP step and HPX step
<heller>
ok
<heller>
for that it would be overkill to kill off the entire thread anyways
<heller>
a condition variable which controls whether this thread is active or not sounds more suitable
<hkaiser>
nobody said we should kill the threads
<heller>
yes, that is how it was sketched to be implemented
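The condition-variable idea mentioned above could look roughly like the following. This is a generic standard-library sketch, not HPX's scheduling loop: `activity_gate` and `worker_loop` are hypothetical names, and the counter increment stands in for running one scheduled task.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical gate that a worker checks between work items. While the
// gate is suspended the worker sleeps on the condition variable instead
// of leaving its loop, so the OS thread stays alive.
class activity_gate
{
public:
    void suspend()
    {
        std::lock_guard<std::mutex> lk(mtx_);
        active_ = false;
    }

    void resume()
    {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            active_ = true;
        }
        cv_.notify_all();
    }

    void stop()
    {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

    // Blocks while suspended; returns false once stop() has been called.
    bool wait_until_active()
    {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return active_ || stopped_; });
        return !stopped_;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool active_ = true;
    bool stopped_ = false;
};

// Worker loop: performs one unit of work per iteration while the gate is open.
inline void worker_loop(activity_gate& gate, std::atomic<long>& work_done)
{
    while (gate.wait_until_active())
    {
        ++work_done;    // stand-in for executing one scheduled task
        std::this_thread::yield();
    }
}
```

A caller would run `worker_loop` on a `std::thread`, call `suspend()` before handing the cores to another runtime, `resume()` afterwards, and `stop()` followed by `join()` at shutdown.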
<github>
[hpx] sithhell pushed 1 new commit to fix_thread_map: https://git.io/vdDx6
<github>
hpx/fix_thread_map 36544eb Thomas Heller: Partially reverting background thread handling during shutdown...
<heller>
hkaiser: I'll do it properly then tomorrow
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
heller: yah, let's see how it goes
<heller>
hkaiser: let's merge what I have ASAP ... this should at least put master in a working condition again for the rest of the world
<hkaiser>
fine by me
<heller>
#2955
<heller>
I am getting more and more angry emails from my EU partners ;)
<hkaiser>
why?
<heller>
"nothing ever works! Do you even do CI!"?
aserio has quit [Read error: Connection reset by peer]
<heller>
My reply: "Yes, we do CI, that's how I know that it is currently broken"
<zao>
Coincident Irritation.
<hkaiser>
lol
<heller>
made them even angrier
<hkaiser>
idiots
<hkaiser>
know everything better, as usual
<heller>
yeah
<hkaiser>
how many lines of code do they have? 10? 20?
<K-ballo>
this channel is being recorded for quality purposes
<heller>
K-ballo: thanks ;)
<heller>
anyways
<heller>
hkaiser: quite a few actually
<heller>
the code that crosses the boundaries is always the hardest work
<hkaiser>
absolutely
<heller>
and we are in an unfortunate situation that the runtime is the place where the dots get connected
aserio has joined #ste||ar
<hkaiser>
heller: do you have to work off the top of master?
<hkaiser>
why not select a 'stable' commit?
<heller>
the recent pool executor changes forced us
<hkaiser>
otoh, they can't expect major changes in between releases to be 100% stable
<hkaiser>
that's nonsense
<heller>
the entire project is WIP, it's research
<heller>
code breaks
<heller>
always
<heller>
or is dead
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
aserio has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
<jbjnr>
hkaiser: just noticed this conversation and wanted to point out that adding an hpx::suspend and hpx::resume is what msimberg is heading for, so the two of you should have a Skype call soon. I can join too if I'm not presenting or anything.
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
jbjnr: absolutely
<hkaiser>
jbjnr: what class will expose those?
<hkaiser>
msimberg: ^^
hkaiser has quit [Client Quit]
aserio has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
rod_t has left #ste||ar [#ste||ar]
hkaiser has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 252 seconds]
<jbjnr>
heller: just fyi - I saw that you pushed another commit to the thread handling, but it does not fix the hangs for me. sorry.
<heller>
jbjnr: shit
<heller>
jbjnr: thanks for letting me know. Still the same reproduction?
rod_t has joined #ste||ar
<github>
[hpx] chinz07 opened pull request #2957: Fixing errors generated by mixing different attribute syntaxes (master...fixing_2956) https://git.io/vdyWV
EverYoun_ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
eschnett has quit [Quit: eschnett]
<github>
[hpx] hkaiser closed pull request #2943: Changing channel actions to be direct (master...channel_direct) https://git.io/vd6MX