aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<ct-clmsn> @wash: ah right, containers!
<ct-clmsn> @wash: so this is an nvidia container yard?
<zao> I hear some fools are considering sticking GPUs into my cloud.
<ct-clmsn> i'm getting compile time errors from master, compiler can't find HWLOC_OBJ_NUMANODE
<zao> The enumerand should be reasonably old, git blame says late 2014.
<zao> How old is your hwloc?
<hkaiser> ct-clmsn: yah, jbjnr added code requiring a new version of hwloc
<hkaiser> we need to protect that somehow - will talk to John tomorrow
<ct-clmsn> @hkaiser: ah ok, i remembered something about that happening this week - wasn't sure if ya'll had tested for that
<hkaiser> ct-clmsn: I forgot to bring it up during review :/
<hkaiser> sorry for that
<ct-clmsn> np
<ct-clmsn> going through the ast transform code
<ct-clmsn> @hkaiser: i'm going to try to glue my stuff into the mix - it's going to look *really* weird b/c it's probably not going to look very much like the hpx coding style
<wash> ct-clmsn: yah
<ct-clmsn> @wash: nice!
<hkaiser> ct-clmsn: no problem
hkaiser has quit [Quit: bye]
<ct-clmsn> cheers ya'll
ct-clmsn has quit [Quit: Leaving]
EverYoung has quit [Ping timeout: 252 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has quit [Ping timeout: 246 seconds]
jbjnr has joined #ste||ar
<github> [hpx] biddisco pushed 1 new commit to master: https://git.io/vFteA
<github> hpx/master 6d4d6c5 John Biddiscombe: Protect HWLOC_MEMBIND_BYNODESET behind version check
<zao> Good moroning, all.
<jbjnr> good morning. pun intended?
<zao> I like the expression, but rarely intend to confer any additional meaning :)
<jbjnr> if you were familiar with British sitcoms, then "good Moaning" might be more appropriate
<zao> Heh, took a look at my test machine... I've apparently run 551 loops of the test suite for a single commit since 2017-10-16
<zao> Some day I'll figure out how to make it test something newer.
<jbjnr> ctest -D NightlyUpdate ??
<zao> Spinning on "ctest -T test" and saving out the XML files for now.
<zao> Hacking a bit on my own database to aggregate results in, to more persistently record them and make nifty queries against.
<jbjnr> you RE AWARE THAT WE HAVE A CDASH SERVER RUNNING FOR STORING TEST RESULTS? http://cdash.cscs.ch/index.php?project=HPX&date=2017-10-27
<jbjnr> OOPS
<jbjnr> caps lock
<jbjnr> sorry
<zao> Vaguely.
<jbjnr> nifty queries - maybe not, but at least the results are stored
<zao> I don't know if CDash can really do much of the kind of pivoting I want, and I get the feeling that it's
<zao> "one build, one result"
<jbjnr> what do you want to do?
<zao> The build itself is not overly interesting for me, I'm more curious about the stability of the tests.
<zao> A bunch of our tests are a bit flappy, I want hard data on how.
<jbjnr> k
<zao> Worst one I've seen had an incident rate of 1 in 400, having those creep up in CI is a bit icky.
<zao> On that subject, I don't like the tests that hang until timeout, or the tests that take a surprising amount of time to run.
<zao> Slows down the rate of testing until I can figure out how to run multiple suites at once.
<zao> Anyway, it's the only thing this accursed Ryzen is good for :)
david_pfander has joined #ste||ar
jbjnr has quit [Remote host closed the connection]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vFtJF
<github> hpx/gh-pages 50c1780 StellarBot: Updating docs
jbjnr has joined #ste||ar
jbjnr has quit [Remote host closed the connection]
<github> [hpx] biddisco pushed 1 new commit to master: https://git.io/vFtsO
<github> hpx/master e9881e7 John Biddiscombe: Improve hwloc version checks for HWLOC_OBJ_NUMANODE and HWLOC_MEMBIND_BYNODESET
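[editor's note] The two commits above guard newer hwloc names behind a version check. A minimal sketch of that kind of guard follows; it is not the code from those commits. It assumes hwloc encodes its API version as (major << 16) | (minor << 8) | patch and that HWLOC_OBJ_NUMANODE arrived with API version 0x00010b00 (hwloc 1.11), replacing the older HWLOC_OBJ_NODE; verify both against the hwloc NEWS file. The names numanode_min_api, has_numanode_enum and SKETCH_NUMA_OBJ_TYPE are hypothetical.

```cpp
// Sketch only: choose the NUMA object type name based on the hwloc API version.
// The hwloc header is included only if it is installed, so the version helper
// below can be compiled and checked without hwloc.
#if defined(__has_include)
#  if __has_include(<hwloc.h>)
#    include <hwloc.h>
#  endif
#endif

// Assumption: 0x00010b00 corresponds to hwloc 1.11.0, the release that is
// assumed here to have introduced HWLOC_OBJ_NUMANODE.
constexpr unsigned numanode_min_api = 0x00010b00;

// Hypothetical helper: true if an API version is new enough for the new name.
constexpr bool has_numanode_enum(unsigned api_version)
{
    return api_version >= numanode_min_api;
}

#if defined(HWLOC_API_VERSION)
#  if HWLOC_API_VERSION >= 0x00010b00
#    define SKETCH_NUMA_OBJ_TYPE HWLOC_OBJ_NUMANODE   // newer spelling
#  else
#    define SKETCH_NUMA_OBJ_TYPE HWLOC_OBJ_NODE       // older spelling
#  endif
#endif
```

Code that needs the enumerator would then refer to SKETCH_NUMA_OBJ_TYPE instead of naming either spelling directly.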
hkaiser has joined #ste||ar
jbjnr has joined #ste||ar
<msimberg> one reason the throttle test was getting stuck is that the mask is not updated in the loop
<msimberg> and the question: why is it not enough to do the threads::set(...) on line 475? i.e. why is 472-473 needed?
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
<hkaiser> msimberg: all of that code was written by heller, I looked but can't really make sense of what's going on, sorry
<msimberg> hkaiser: no worries, thanks anyway
eschnett has joined #ste||ar
<msimberg> but hkaiser, I have a general hpx question for you
<msimberg> how would you normally do an hpx release? how is it decided that it's time for 1.1 for example?
<msimberg> and more practically, is there anything I could do on that side to help? outside of fixing bugs and adding features, that is...
<hkaiser> msimberg: gtg, let's talk later, sorry
hkaiser has quit [Quit: bye]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
rod_t has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
eschnett has quit [Quit: eschnett]
<msimberg> jbjnr: thanks! missed that one...
<jbjnr> it doesn't get a lot of air-play - as they say ...
<zao> jbjnr: Say that hypothetically I'd want to shove in results into that there shiny CDash instance. How does one actually do so?
<jbjnr> ctest Submit ?
<zao> (I'm going to run to catch the bus now, but still curious in absence)
<jbjnr> -D or something
<jbjnr> experimental Update, Build, Submit and the results will appear in the dashboard
<jbjnr> if you use the hpx project url. Don't submit other stuff there please :)
<jbjnr> look in cdash setup or something in the hpx project root somewhere - the urls should be setup already for submission
patg[w] has joined #ste||ar
<jbjnr> msimberg: I volunteered to do an HPX release once, so aserio sent me that doc, so, I unvolunteered immediately and he did it :)
<jbjnr> around 0.99999 or thereabouts
<msimberg> jbjnr: step 29 is at least easy
<jbjnr> lol
<aserio> :)
<msimberg> btw jbjnr: 23 = the number of days since you shaved?
<aserio> jbjnr: Why the need for a release?
<msimberg> aserio: I was just asking in general
<jbjnr> no. 23 was the stack trace depth between calling then_execute and the function actually calling async_execute. Fortunately, hartmut told me I was emulating the wrong async and now it's only a couple of forwarding calls before the function is dispatched.
<msimberg> ah, almost...
gedaj has quit [Read error: Connection reset by peer]
<aserio> msimberg: Yea, we normally do a release before SC... This year we have been kept quite busy with new projects
<jbjnr> aserio: no need. msimberg was asking how it's done that's all - he comes from a commercial background and loves doing that sort of stuff apparently.
<jbjnr> I think he wants to volunteer for 1.1
<jbjnr> :)
<aserio> I don't think we would stop him...
<msimberg> mmh, loves
<patg[w]> Rule 1. Here don't mention anything that you don't want to be volunteered for
<jbjnr> Rule 2 - don't talk about Rule 1
parsa has joined #ste||ar
<msimberg> well, I did ask how I could help so I guess I'd be in trouble even without rule 1
<patg[w]> aserio: did you see the bibs work in the journal format now
<jbjnr> msimberg: it's about a month now and I cannot wait until halloween is over and I can shave it off. Unfortunately, everyone else seems to like it (family etc).
<msimberg> jbjnr: is it already the correct shape?
hkaiser has joined #ste||ar
gedaj has joined #ste||ar
<jbjnr> I've changed plans and will instead do Wolverine from X-Men (probably). But without the muscles
<aserio> patg[w]: I saw your email... let me try to build it
eschnett has joined #ste||ar
<aserio> patg[w]: You said you were able to build it?
<patg[w]> aserio: yes
<patg[w]> Are you having problems
<aserio> yea
<patg[w]> hmm
<patg[w]> And you pulled the changes
<aserio> are you using the make file?
<patg[w]> yes
<aserio> Ok let me see if that works for me
<aserio> I am just using the MikTex GUI
<patg[w]> It seems they should both work
<aserio> Not necessarily, the make file often is using specific packages to compile things
<aserio> patg[w]: It seems to have worked...
<patg[w]> aserio: hopefully we won't need it but I have a feeling we will not be accepted to ASPLOS
<aserio> Yea, we feel the same
<patg[w]> aserio: Bryce's rebuttal is very good though
<aserio> It was!
<patg[w]> aserio: I'll see you at SC right?
<aserio> Yep!
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller> msimberg: why should the mask change?
<heller> The other code is needed if no affinity mask was set
<msimberg> heller: it may be that I don't know enough about what exactly rp.get_pu_mask returns, so apologies if this is a silly question
<heller> There are no silly questions
<heller> I might have gotten it wrong as well
<msimberg> but at least if you go to the threads::set case, mask will have exactly one bit set in the num_thread position?
<msimberg> and if you've disabled that pu you'll be stuck in the while loop
<heller> Did the test succeed if you update the masks in each iteration?
<msimberg> yes
<heller> Hmmm
<heller> Ok, good catch
<heller> msimberg: excellent catch!
<msimberg> so I unreverted most of your changes and changed that to update the mask and it's happier
<heller> That's the missing piece
<heller> You are of course correct
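[editor's note] The bug discussed above is a mask computed once before a loop and never refreshed inside it. The sketch below illustrates that shape with plain bitmasks; it is not the HPX throttle code. The names mask_t, pu_mask, find_enabled_pu_stale and find_enabled_pu are hypothetical, and the attempt bound exists only so the stale variant terminates here, whereas the chat describes the original loop as getting stuck.

```cpp
#include <cstdint>

using mask_t = std::uint64_t;

// Hypothetical stand-in for a per-thread affinity lookup: exactly one bit set
// at position `pu` (pu must be below 64).
constexpr mask_t pu_mask(unsigned pu)
{
    return mask_t{1} << pu;
}

// Stale-mask shape: the mask is computed once before the loop, so if the
// starting PU is disabled no later iteration can ever succeed.
constexpr int find_enabled_pu_stale(mask_t enabled, unsigned start, unsigned num_pus)
{
    mask_t const mask = pu_mask(start);    // never refreshed
    for (unsigned tries = 0; tries != num_pus; ++tries)
    {
        if (mask & enabled)
            return static_cast<int>(start);
    }
    return -1;    // bounded here; an unbounded loop would spin forever
}

// Fixed shape: the mask is recomputed for the candidate PU on every iteration,
// wrapping around the PU range once.
constexpr int find_enabled_pu(mask_t enabled, unsigned start, unsigned num_pus)
{
    for (unsigned tries = 0; tries != num_pus; ++tries)
    {
        unsigned const pu = (start + tries) % num_pus;
        mask_t const mask = pu_mask(pu);   // refreshed each time
        if (mask & enabled)
            return static_cast<int>(pu);
    }
    return -1;    // no PU in the range is enabled
}
```

With PUs 1 and 3 enabled (mask 0b1010) and a start at PU 0, the stale variant gives up while the refreshed variant moves on to PU 1.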
<msimberg> the only remaining thing was if you use the local_priority_queue_scheduler it hangs in some cases when going over numa domains
<msimberg> because only the first pu will steal across domains
<heller> Yes
<heller> That's a general problem
<msimberg> hence I was asking about the rp.get_pu_mask... is it necessary? or would it be enough to just set the one bit?
<heller> Because we rely on stealing here...
<msimberg> thinking of the case when a lot of pus are disabled and there might(?) be unnecessary work with the rp.get_pu_mask
<heller> Yes, we need the mask
<msimberg> so in what cases would the first pu mask be empty?
<heller> There is this silly thing with logical and physical numberings and all this mess
<msimberg> hmm, so other way around: when would threads::set be wrong?
<heller> If the user runs the hpx application with the --hpx:bind=none parameter
<heller> If the physical id is not the same as num_thread+offset
<msimberg> I see
parsa has joined #ste||ar
<msimberg> heller: I wasn't sure if this was on purpose: https://github.com/STEllAR-GROUP/hpx/commit/78c3b59bc6b5047f3814c9a849e3a91b8694f3ca
<parsa> K-ballo: *ping*
<msimberg> I didn't see any problems with it but I assume it was just a typo?
<heller> Could be, yes
<msimberg> ok
<heller> I did a lot of trial and error. If it doesn't make sense to you, ignore it
<msimberg> all right
<msimberg> so far I just used the changes you already had for cleanup_terminated directly
<msimberg> but will have a closer look at that as well
<msimberg> heller: thanks, gtg
<heller> msimberg: thanks a lot. Much appreciated. Let me know once you have something. Then I'll test it with my code
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
david_pfander has quit [Ping timeout: 248 seconds]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
jaafar has joined #ste||ar
patg[w] has quit [Quit: Leaving]
Bibek has quit [Quit: Leaving]
Bibek has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
mbremer has joined #ste||ar
eschnett has joined #ste||ar
jbjnr has quit [Quit: ChatZilla 0.9.93 [Firefox 56.0.2/20171024165158]]
jbjnr has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
jaafar has joined #ste||ar
parsa has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
twwright_ has joined #ste||ar
twwright has quit [Ping timeout: 255 seconds]
twwright_ is now known as twwright
<github> [hpx] K-ballo force-pushed then-fwd-future from b5a3c9c to 559b8bf: https://git.io/vFq60
<github> hpx/then-fwd-future 559b8bf Agustin K-ballo Berge: Fix future used with continuation on .then()
mbremer has quit [Quit: Page closed]
<github> [hpx] hkaiser force-pushed local_new_fallback from 13880dd to 8e5be29: https://git.io/vFI3A
<github> hpx/local_new_fallback 8e5be29 Hartmut Kaiser: Fall back to creating local components using local_new...
wash has quit [Quit: leaving]
wash has joined #ste||ar
wash has quit [Ping timeout: 240 seconds]
wash has joined #ste||ar