hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
wash[m] has quit [Ping timeout: 252 seconds]
wash[m] has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
wash[m]_ has joined #ste||ar
wash[m] has quit [Ping timeout: 246 seconds]
wash[m]_ is now known as wash[m]
hkaiser has quit [Quit: bye]
Vir has quit [Ping timeout: 250 seconds]
Vir has joined #ste||ar
jbjnr has joined #ste||ar
Vir has quit [Ping timeout: 264 seconds]
<simbergm> jbjnr: yt? I'm slightly worried about this: http://cdash.cscs.ch//viewTest.php?onlyfailed&buildid=51840
<simbergm> they seem to time out every time, but only with that builder
<jbjnr> you shouldn't be. the test is completely broken anyway
<simbergm> how broken?
<jbjnr> oh hold on, that's 4 different tests
<simbergm> i.e. what do you know?
<jbjnr> the block_executor doesn't work properly with the RP if I recall correctly
<simbergm> also, if the scheduler branch is pretty far along it would probably be a good idea to open a PR already now so that we can review the parts that are ready
<simbergm> at all?
<jbjnr> those tests need updating. I did make flyby fixes to one of them but wish I hadn't, so this stuff has nothing to do with my numa allocator
<jbjnr> I just didn't like there being an old numa allocator and numa transpose test that didn't work
<simbergm> right :/
<simbergm> something changed though... I'll have a look
<jbjnr> I'll remove changes to that stuff from my PR - looking at the dashboard, it seems that the old tests used to at least pass
<jbjnr> even if they were not doing numa very well.
<simbergm> ok, I'm just confused by what might've caused the change... you only changed one of the transpose examples and the topology changes you made seem unrelated
<jbjnr> correct
<jbjnr> (we have so much old unmaintained code that can cause problems like this that hold us back from making changes - these are unmaintained examples and not proper unit tests)
<simbergm> yep, also correct
<simbergm> they're there to avoid them breaking without us knowing, but I'm not at all against removing them if they're not relevant anymore
<simbergm> *the examples as tests are there...
mdiers_ has joined #ste||ar
<simbergm> anyway, if you can remove the unrelated changes that'd be good, we can see if the examples still fail, and then decide if we should remove some of them
<simbergm> it'd be a shame to remove them completely without a replacement
<jbjnr> I was going to try to fix the numa_transpose - but it has a ton of code and uses the block_executor and its own numa allocator, plus a bunch of other stuff that doesn't really fit with the RP way of doing things
<simbergm> and open a PR with the scheduler changes ;) you can set it to a "draft" PR so that we don't merge it before you're done with it
<jbjnr> (launching threads on cores directly)
<simbergm> feel free to fix it, but it can go in another PR
<jbjnr> if I cannot get a simple numa allocator PR in, then there's no point in me submitting my scheduler fixes
<simbergm> sure, that's why it's good to keep the numa allocator PR free from unrelated changes so that we can get it in
<jbjnr> it still doesn't work on those docker containers and windows machines
<simbergm> can you tell by the output why?
<simbergm> either the numa allocator isn't general enough to run on a machine with one numa domain, or the test just doesn't make sense on whatever machine the circleci tests are run on? if the latter you can skip the test as long as you can detect whatever condition it is that breaks it
<jbjnr> simbergm: because ---- instead of 0000 - I need to put in a default fallback for machines that don't have the hwloc support that we use. It will end up just assuming numa node 0 for everything
<simbergm> what does ---- mean? hwloc can't detect anything about numa domains?
<jbjnr> in my unit test, I create an array, bind pages to numa nodes then create a string with the 'detected' numa node for each page and compare it to what it should get. If hwloc can't get the numa I return the string '-' instead of '0' or '1' etc, so the string compare of expected and detected fails.
<jbjnr> so far hwloc has worked on laptop/daint/ault/greina/dom/etc, but circleci manages to surprise us
<simbergm> jbjnr: ok, maybe skipping the test with a warning is better since the test doesn't make much sense after that? I just hope it won't then silently break on daint after hwloc changes something... not sure how we can get loud errors on daint but skip it on circleci
<jbjnr> The test is quite thorough - if something changes, it will trigger a fail - that's the reason I have '-' as an output as well. If hwloc fails to get the number, it triggers a fail. What would be better is to know why circleci fails - I presume due to container use, but I have no idea how to set up a container and test for that
<jbjnr> for windows we can easily just disable the test, or default to numa 0 - this is not a problem, but the container one is annoying, because using a default for that might cause silent fails on other machines in the future
<jbjnr> is there a help section anywhere in the docs on container use of hpx?
<simbergm> stellargroup/build_env:ubuntu gives you the same image we use on circleci
<jbjnr> thanks
heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller has joined #ste||ar
heller__ has joined #ste||ar
heller has quit [Ping timeout: 252 seconds]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has quit [Quit: eschnett]
<hkaiser> simbergm, heller__: this looks to be a useful technique for our module system: https://cristianadam.eu/20190501/bundling-together-static-libraries-with-cmake/
<simbergm> hkaiser: yeah, looks interesting (although annoyingly complicated)
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio1 is now known as aserio
hkaiser has quit [Quit: bye]
diehlpk has joined #ste||ar
<diehlpk> jbjnr, Do you join the meeting?
hkaiser has joined #ste||ar
diehlpk has quit [Ping timeout: 276 seconds]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
<diehlpk_work> simbergm, The English department advertised GSoD among their students. Some took classes in technical writing.
<simbergm> diehlpk_work: nice!
<simbergm> I'm going to try to finish the blog post tonight so you can send that around as well
jpenuchot has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
<diehlpk_work> simbergm, We have the first student interested
<diehlpk_work> She worked for one year as a student technical writer for a company
<diehlpk_work> Sounds promising
jpenuchot has quit [Ping timeout: 250 seconds]
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
jpenuchot has joined #ste||ar
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
<heller__> jbjnr: simbergm: Added thoughts on day2 (inline)
<simbergm> it's public now but you might want to run it by yourself or someone else (aserio?) before you post links to it elsewhere
<hkaiser> simbergm: thanks a lot!
<simbergm> heller__: thanks! we should do a call at some point soon to finalize the schedule
<simbergm> hkaiser: pleasure
<simbergm> jbjnr: did you ever advertise gsoc somehow through cscs/eth?
<heller__> Next Monday?
aserio has joined #ste||ar
<simbergm> fine by me
<simbergm> jbjnr: ?
<zao> Yay.
<hkaiser> guys: we have the hpx coordination meeting tomorrow
<hkaiser> just a heads up
<heller__> I'll join
<heller__> Please ping me if I happen to forget it
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
hkaiser has quit [Quit: bye]
jpenuchot has quit [Quit: Lost terminal]
<K-ballo> heller__: you are working now?
hkaiser has joined #ste||ar
aserio has joined #ste||ar
aserio has quit [Quit: aserio]