aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 248 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
pree has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
pree has quit [Remote host closed the connection]
pree has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
pree has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
pree has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<zao> tests.unit.resource.throttle has timed out twice in 104+1 runs, but that's known I believe.
hkaiser has joined #ste||ar
<hkaiser> zao: the executor failures are known - I hope heller works on those
<hkaiser> (at least he's been promising to work on those for a while now)
<zao> Is it possible to run multiple test suites at the same time on a machine?
<zao> w.r.t TCP ports and whatnot?
<hkaiser> hmmm, probably not
<zao> Was considering setting up a single-machine SLURM with oversubscription, so I could run like 2 or 4 nodes on the machine.
<zao> My local Slurm guru tells me it shouldn't be rocket surgery, but then you've got shared port space among nodes.
<hkaiser> could be implemented, though
<zao> Nothing important, just idly wondering.
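A minimal sketch of how concurrent HPX test runs could be kept from colliding on TCP ports: the parcelport port comes from the hpx.parcel.port configuration entry (default 7910), which can be overridden per run when initializing the runtime. The specific port value and the config-vector overload of hpx::init below are assumptions about the HPX version in use, not something taken from the discussion above.

    // Sketch: give each concurrent test run its own parcelport port so several
    // suites can share one machine without fighting over the TCP port space.
    // Assumes the hpx.parcel.port config key (default 7910); the value 7911 is
    // arbitrary and would be varied per run.
    #include <hpx/hpx_init.hpp>

    #include <string>
    #include <vector>

    int hpx_main()
    {
        // ... actual test body would run here ...
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        std::vector<std::string> const cfg = {
            "hpx.parcel.port=7911"    // unique port for this run
        };
        return hpx::init(argc, argv, cfg);
    }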
<heller> hkaiser: i'm working on those
<heller> I already pushed a partial patch, the only missing piece is the throttle stuff...
<hkaiser> heller: :D
<heller> I'll post a PR once the kids are in bed. Let's not waste too much time on this feature which won't be used for real anyways
<hkaiser> heller: remove the throttle scheduler
<heller> I'm not talking about that scheduler
<heller> I'm talking about the remove_processing_unit function provided by the RP
<hkaiser> huh?
<hkaiser> why do you think it's not needed? and btw, I wasn't even aware that this is a problem
<heller> See tests/unit/resource/throttle.cpp
<heller> Well, it works right now
<hkaiser> ahh, so it's not the throttle scheduler, but the RP functionality to remove PUs from a scheduler
<heller> Yes
<hkaiser> don't remove this, it's essential
<heller> Sure
<hkaiser> you said: 'Let's not waste too much time on this feature which won't be used for real anyways'
<heller> I never planned on removing it. Just on not testing it right now
<hkaiser> I will use it for real
<hkaiser> you lost me
<hkaiser> what are you fixing then?
<heller> I'm trying to fix it
<heller> Then I'm under the impression that I'm the only one using it
<hkaiser> what is 'it'?
<heller> Gtg, I'll get back to you later
<hkaiser> k
<zao> Ooh nice, distributed.tcp.migrate_component has failed outright once instead of sporadically timing out, https://gist.github.com/zao/0148bdf47a7372b17d8baef9eb300946
<zao> Hardware queues - cute
<hkaiser> zao: that's long overdue
pree has joined #ste||ar
pree_ has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
<heller> hkaiser: ok, so the test in tests/unit/resource/throttle.cpp is using the RP to turn cores on and off. This is what I use for the throttling in allscale now, so from my side the special scheduler can go
<heller> the problem is that the changes needed to make that work properly mess with the regular shutdown detection. Reverting the shutdown detection to more or less what we had before breaks this unit test
<heller> for some reason, the background threads aren't shut down properly when removing one specific scheduling loop
<heller> but anything else seems to work properly for now.
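For context, a rough sketch of the pattern tests/unit/resource/throttle.cpp exercises: take processing units of the default pool offline through the resource partitioner machinery and bring them back. Only remove_processing_unit is named above; the pool accessor, the add_processing_unit counterpart, and all signatures below are assumptions and will differ between HPX versions.

    // Hypothetical sketch of turning PUs off and back on at runtime, as the
    // throttle test does. Accessor names and signatures are assumed, not taken
    // from a specific HPX release.
    #include <hpx/hpx_init.hpp>
    #include <hpx/include/resource_partitioner.hpp>

    #include <cstddef>

    int hpx_main()
    {
        // Assumed accessor for the default thread pool managed by the RP.
        auto& pool = hpx::resource::get_thread_pool("default");
        std::size_t const num_pus = pool.get_os_thread_count();

        // Take every PU except the first offline, then bring them back.
        for (std::size_t pu = 1; pu != num_pus; ++pu)
            pool.remove_processing_unit(pu);    // named in the discussion above

        for (std::size_t pu = 1; pu != num_pus; ++pu)
            pool.add_processing_unit(pu);       // assumed counterpart

        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);
    }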
pree_ has quit [Ping timeout: 255 seconds]
<heller> and I can't find the place where this is happening right now :/
<heller> the thing is, I don't think anyone is actively using this feature right now... we should probably prioritize getting current master working again
pree_ has joined #ste||ar
<K-ballo> master is broken?
<heller> yes
jaafar has joined #ste||ar
pree_ is now known as pree
<github> [hpx] hkaiser created reporting_set_affinity_problems (+1 new commit): https://git.io/vdgJz
<github> hpx/reporting_set_affinity_problems a9079ca Hartmut Kaiser: Making error reporting during problems with setting affinity masks more verbose...
mcopik has joined #ste||ar
pree has quit [Quit: AaBbCc]
mcopik has quit [Ping timeout: 248 seconds]
mcopik has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<github> [hpx] hkaiser force-pushed reporting_set_affinity_problems from a9079ca to 0573ee9: https://git.io/vdgCV
<github> hpx/reporting_set_affinity_problems 0573ee9 Hartmut Kaiser: Making error reporting during problems with setting affinity masks more verbose...