aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 268 seconds]
hkaiser has joined #ste||ar
<parsa[w]>
hkaiser: i didn't get anywhere on shuffle
<hkaiser>
parsa[w]: ok
<hkaiser>
let's look together today
<parsa[w]>
if i have node_data<double> x with a dynamicvector<double> in it, do you expect auto& a = x.vector() and auto& b = x.vector() to refer to the same spot in memory?
<hkaiser>
the references will screw things up, as those will be bound to rvalues (binding a non-const reference to an rvalue is possible with msvc only, btw)
<hkaiser>
if the references were const, then yes, they would refer to the same matrix data
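A minimal, self-contained sketch of the distinction hkaiser describes, using a hypothetical stand-in for the node_data<double> being discussed rather than the real type: an accessor that returns by value hands every caller a fresh temporary, while a const-reference accessor always refers to the same stored data.

    #include <iostream>
    #include <vector>

    // Hypothetical stand-in for the node_data<double> discussed above; assumes
    // vector() returns the underlying data by value and cref() by const reference.
    struct node_data
    {
        std::vector<double> data_{1.0, 2.0, 3.0};
        std::vector<double> vector() const { return data_; }        // copy
        std::vector<double> const& cref() const { return data_; }   // reference
    };

    int main()
    {
        node_data x;

        // auto const& binds to the temporary returned by vector() (plain auto&
        // would only compile with MSVC's non-conforming extension); every call
        // produces a distinct copy, so a and b live at different addresses.
        auto const& a = x.vector();
        auto const& b = x.vector();
        std::cout << std::boolalpha << (&a == &b) << '\n';   // false

        // With a const-reference accessor both names refer to the same data.
        auto const& c = x.cref();
        auto const& d = x.cref();
        std::cout << (&c == &d) << '\n';                     // true
    }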
anushi has quit [Ping timeout: 245 seconds]
anushi has joined #ste||ar
jakub_golinowski has joined #ste||ar
<parsa[w]>
hkaiser: should i create a separate PR for the remainder of the primitives after #261?
<parsa[w]>
or should i put more in it
<hkaiser>
could you leave it separate, pls?
<simbergm>
hrm, unlocking a mutex from an os thread which is different from the one it was locked on seems to be undefined behaviour, or am I reading the std::mutex docs wrongly?
<hkaiser>
simbergm: why would you ever want to do that?
<simbergm>
wondering if the yield_whiles cause problems because they suspend the hpx thread and then the hpx thread might get moved to another os thread
<hkaiser>
does that involve a kernel mutex?
<simbergm>
yup, compat::mutex
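For anyone reading along, the constraint simbergm refers to is that std::mutex::unlock() must be called by the thread that owns the lock. A deliberately broken, HPX-independent sketch of the pattern that arises when an hpx thread locks a kernel mutex, suspends, and resumes on a different OS thread:

    #include <mutex>
    #include <thread>

    std::mutex m;

    int main()
    {
        m.lock();   // acquired on the main OS thread

        // Unlocking on a different OS thread is undefined behaviour for
        // std::mutex; the MSVC debug runtime asserts on exactly this case.
        // This is the shape of the problem when an hpx thread takes a kernel
        // mutex, suspends, and is resumed on another worker thread before
        // unlocking.
        std::thread t([] { m.unlock(); });
        t.join();
    }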
<hkaiser>
uhh ohh
<hkaiser>
that could be causing the problems zao was reporting
<simbergm>
has someone actually run the suspension tests on windows? because I do plenty of that stuff there :/
<hkaiser>
simbergm: zao has run hello_world many thousands of times recently on windows
<hkaiser>
that has exposed the issues
<simbergm>
yeah, I saw that, that's why I'm asking
<hkaiser>
nod
<simbergm>
if that's the cause the suspension tests should fail much faster
<hkaiser>
I have not done any stress testing of suspension
<hkaiser>
simbergm: we generally try not to suspend hpx threads with a lock held
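A minimal sketch of that guideline, with an illustrative scoped helper (called unlocker here purely for the example, not an actual HPX class): drop the lock before any call that may suspend, and reacquire it afterwards.

    #include <mutex>
    #include <thread>

    std::mutex mtx;

    // Stand-in for a call that may suspend the current (HPX) thread,
    // e.g. yield() or yield_while(...).
    void may_suspend() { std::this_thread::yield(); }

    // Illustrative helper only; it releases a held lock on entry and
    // reacquires it on exit, which is the shape of the unlock_guard pattern
    // mentioned later in this discussion.
    template <typename Mutex>
    struct unlocker
    {
        explicit unlocker(Mutex& m) : m_(m) { m_.unlock(); }
        ~unlocker() { m_.lock(); }
        Mutex& m_;
    };

    int main()
    {
        std::unique_lock<std::mutex> l(mtx);
        // ... work that needs the lock ...
        {
            unlocker<std::mutex> ul(mtx);   // drop the lock before suspending
            may_suspend();                  // no kernel mutex held while suspended
        }
        // lock is held again here
    }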
<zao>
simbergm: Not sure if you saw, but I failed much faster when running two localities with six threads each.
<hkaiser>
zao: which would make sense
<zao>
Blowing up with unlocking an unheld mutex instead of blowing up on destroying a busy mutex.
<hkaiser>
zao: that's a different problem, I believe
<zao>
Ok.
<hkaiser>
simbergm: do you have to hold that lock while suspending the thread?
<zao>
These suspension tests, are there specific test programs for those?
<simbergm>
zao: most tests in tests.unit.resource
<hkaiser>
anyways, gotta run
hkaiser has quit [Quit: bye]
<simbergm>
hkaiser: I guess I could find other ways, I hold it because I don't want to have multiple threads trying to suspend or resume simultaneously
<simbergm>
zao: actually more specifically tests.unit.resource.throttle should blow up if the mutexes are the problem
david_pfander has quit [Ping timeout: 245 seconds]
<zao>
I'll see if I can rig that up to run.
<simbergm>
zao: thanks, that would be very helpful
hkaiser has joined #ste||ar
<github>
[hpx] hkaiser force-pushed fixing_3182 from 7b7c183 to b8b3862: https://git.io/vAaOj
<github>
hpx/fixing_3182 b8b3862 Hartmut Kaiser: Fixing return type calculation for bulk_then_execute....
<zao>
Bah, initial CMake was without tests and no amount of re-running can get it to change its mind.
jakub_golinowski has quit [Ping timeout: 260 seconds]
<zao>
VS runtime explicitly validates that you're on the right thread.
<zao>
I need to hurry out, but I hope that this data helps.
<hkaiser>
nod, good thing it does
<simbergm>
zao: definitely does
<simbergm>
hkaiser: I'll work on this, sorry for not realizing earlier
<simbergm>
for the shutdown we have an unlock_guard when we explicitly call yield, but not for the suspend called by yield_while
<simbergm>
looks like it could be enough to just protect the stopped_ boolean, or would we want to protect more?
<hkaiser>
simbergm: couldn't you make it atomic in this case?
aserio has joined #ste||ar
<simbergm>
hkaiser: yeah, or that
<simbergm>
it's used in a few other places where we check stopped_ and/or terminated_
<simbergm>
was mainly wondering if it needs to protect something else I'm not realizing
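A sketch of the atomic variant hkaiser suggests, assuming the flag is only ever set and polled (no compound check-then-act that still needs the mutex); the names here are illustrative, not the actual HPX members.

    #include <atomic>
    #include <chrono>
    #include <thread>

    // Hypothetical scheduler state; stands in for the stopped_ member
    // discussed above, not for the real HPX data structure.
    struct scheduler_state
    {
        std::atomic<bool> stopped_{false};

        void stop() { stopped_.store(true, std::memory_order_release); }

        // A yield_while-style wait can poll the flag without owning any
        // mutex, so it does not matter on which OS thread the check resumes.
        void wait_until_stopped() const
        {
            while (!stopped_.load(std::memory_order_acquire))
                std::this_thread::yield();
        }
    };

    int main()
    {
        scheduler_state s;
        std::thread waiter([&] { s.wait_until_stopped(); });
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        s.stop();
        waiter.join();
    }

Whether this is sufficient depends on exactly the question raised above: if other state is read together with stopped_ under the same lock, the atomic alone is not enough.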
<hkaiser>
simbergm: nobody knows better than you ;)
<simbergm>
I'll have a look anyway
<hkaiser>
thanks!
<simbergm>
suspension I agree, shutdown I hope but I'm not certain :)
nikunj has joined #ste||ar
eschnett has quit [Quit: eschnett]
<nikunj>
@hkaiser: While running the init_globally example, I run into an unexpected run-time error: terminate called after throwing an instance of 'std::invalid_argument'
<nikunj>
what(): hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<hkaiser>
nikunj: heh
<hkaiser>
nikunj: so the example throws?
<nikunj>
Yes this is what the init_globally example throws at runtime
<hkaiser>
nikunj: ok, let me investigate - would you mind creating a ticket for this?
<nikunj>
@hkaiser: I surely will. From what I have investigated about this issue so far, it seems to come from a misbehaving thread
<hkaiser>
nikunj: I think we just have not updated the example after the recent changes to the resource partitioner in hpx
<hkaiser>
simbergm: you copy ^^ ?
jakub_golinowski has joined #ste||ar
jaafar has joined #ste||ar
jakub_golinowski has quit [Read error: Connection reset by peer]
<nikunj>
@hkaiser: Regarding implementing the GSoC project, I think util will consume most of the time, followed by runtime (mainly in hpx::thread and hpx::resource). performance_counter and lcos should take about the same time, and parallel should come next in order of time consumption.
<nikunj>
@hkaiser: Is this conclusion correct?
parsa[[w]] has joined #ste||ar
parsa[w] has quit [Ping timeout: 245 seconds]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
<simbergm>
hkaiser: yep, I will also investigate
<diehlpk_work>
To all GSoC students: I heard that some of you want to submit the proposal without showing it to us. I definitely do not recommend this. Please share the proposal with us before submission
<K-ballo>
I'm curious where one would hear such a thing
<diehlpk_work>
Had some chats with students and they wanted to do it like this. I did not ask why they plan to do it
<github>
[hpx] msimberg opened pull request #3209: Fix locking problems during shutdown (master...fix-shutdown-locks) https://git.io/vAMpJ
<simbergm>
zao: there's now #3209 with an additional fix for the windows shutdown problem, if you have time to stress test hello_world again that would be greatly appreciated
<simbergm>
next: suspension locks...
akheir has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 265 seconds]
aserio1 is now known as aserio
Smasher has joined #ste||ar
parsa has quit [Read error: Connection reset by peer]
parsa| has joined #ste||ar
<zao>
"Failed to connect to github.com port 443: Timed out"
<zao>
Welp, not gonna test that right now then :D
vamatya has joined #ste||ar
parsa| has quit [Quit: Zzzzzzzzzzzz]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
eschnett has joined #ste||ar
victor_ludorum has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
<zao>
0x00007ffa9e60ddf0 "f:\\dd\\vctools\\crt\\crtw32\\stdcpp\\thr\\mutex.c(51): mutex destroyed while busy"
<zao>
Main thread remaining, in teardown after main.
<zao>
Original problem.
<zao>
The thread that supposedly holds the mutex is another ID than the main thread, btw.
<zao>
Noteworthy is that the other process exited with "Process 1 failed with an unexpected error code of -1073741819 (expected 0)" when this assert fired.
<zao>
So this case is triggered by the other locality bailing out at some point?
<zao>
That's 0xc0000005 btw, access violation.
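For anyone following along, the conversion zao is doing: reinterpreting the signed exit code as an unsigned 32-bit value gives the Windows NTSTATUS for an access violation.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        std::int32_t exit_code = -1073741819;
        // Same bit pattern viewed as unsigned: 0xC0000005, i.e.
        // STATUS_ACCESS_VIOLATION on Windows.
        std::printf("0x%08X\n", static_cast<std::uint32_t>(exit_code));
    }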
<zao>
The double printout in the non-crashing case is curious too.
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
jakub_golinowski has joined #ste||ar
parsa has joined #ste||ar
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
nanashi55 has quit [Ping timeout: 265 seconds]
nanashi55 has joined #ste||ar
victor_ludorum has quit [Quit: Page closed]
nikunj has quit [Quit: Page closed]
<zao>
This is neat. If I kill the second process, the first process may just wedge forever or crash.
<zao>
I may have accidentally killed the first process there.
<hkaiser>
zao: now you're getting inventive ;)
<zao>
It seems that if the process is told to shut down, we try to look up a config entry which relies on having a working get_partitioner?
<hkaiser>
nod
<hkaiser>
I thought I fixed that
<hkaiser>
zao: ahh, it's #3202, not merged yet
<zao>
Ah yes, it's trying to throw std::invalid_argument there it seems.
<zao>
Was confused by the control flow.
<hkaiser>
nod
<hkaiser>
that's unrelated (and fixed)
<zao>
My hope would be to get to a point where I could provoke locality 0 into reliably breaking so I could record a time-travel trace of it.
<jbjnr>
hkaiser: heller_ After a very long day battling the snow over here in the uk (there's hardly any where I am), I have just returned from Warwick Uni where I had my PhD Defense. Passed. yay \o/
<hkaiser>
jbjnr: congrats!
<hkaiser>
Dr. John!
<jbjnr>
One of the examiners really liked the HPX serialization paper - but said - what a shame it was in such a shit conference!
<jbjnr>
thanks hkaiser
<hkaiser>
I'm glad for you that everything ended well