aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<github>
[hpx] hkaiser force-pushed resource_partitioner from c75fb59 to 0d234ed: https://git.io/v7lfK
<github>
[hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7EhR
<github>
hpx/resource_partitioner 6147af2 Hartmut Kaiser: Making inspect happy
<jbjnr>
hkaiser: is up early again. Good morning.
<hkaiser>
jbjnr: hey
<jbjnr>
thanks for the ongoing RP cleanup. Almost there
<hkaiser>
sure, most welcome
<hkaiser>
jbjnr: I noticed that the rp branch does not have the changes Thomas' colleague pushed to the throttling scheduler
<jbjnr>
wasn't it removed?
<jbjnr>
We might have messed up a merge
<hkaiser>
it was replaced
<jbjnr>
ok.
<jbjnr>
is much missing?
<hkaiser>
shrug, have not looked yet
<jbjnr>
if you can remember the commit that made the changes in master, then I'll go through and fix stuff
<jbjnr>
I'll look on github for it
<jbjnr>
#2640 I expect
<hkaiser>
jbjnr: #2640 and #2686
<jbjnr>
there were a couple of commits on the RP branch where I found stuff that should have been deleted, but wasn't. I'll review them and see if I squashed the new stuff by mistake
<hkaiser>
ok, tks
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<jbjnr>
heller: seems very reasonable.
<heller>
yay
<jbjnr>
instead of "receive parcel", maybe "deserialize parcel" or "decode parcel" because the receive part is probably 'implied' by the 'receiving' arrow
<jbjnr>
HPX group and add them all there. job done
<heller>
nice!
* zao
publishes as Bobby Tables
<heller>
he
<heller>
jbjnr: can it also export to bibtex?
<jbjnr>
I think so. You create a group, add papers, then export all of them to a bib file. I haven't actually used it for years, but when I was on the SPHERIC steering committee, we maintained one. it's still there, but I didn't play - just posted the link in case it is helpful
<jbjnr>
yup. try the export button on that group page
<heller>
nice
pree_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
ajaivgeorge has joined #ste||ar
eschnett has joined #ste||ar
<hkaiser>
heller: the AGAS image is nice but not entirely correct
<heller>
hkaiser: why not?
<heller>
I mean sure, the destination is determined before encoding
<heller>
and the address does not have to be resolved, potentially
<heller>
I wanted to go for a simple image though
<heller>
hkaiser: or do you mean something else?
ajaivgeorge has quit [Quit: ajaivgeorge]
<hkaiser>
the resolve gid to destination is more like 'determine whether target is local'
<hkaiser>
otherwise we wouldn't have to talk to agas on the sender
<heller>
well, end transform gid to endpoint address
<heller>
*and
<hkaiser>
is that an agas op? we deliberately encode this info in the gid itself
<heller>
conceptually, it is the responsibility of the locality namespace, but we cache the endpoint information in the parcel handler
<heller>
the gid contains the prefix, yes
<hkaiser>
so no agas on the sender at all
<heller>
but if an action is local, it never reaches the parcel handler
<heller>
sure, querying the locality namespace
<hkaiser>
yes, agas on the 'receiver' only here as well
<hkaiser>
heller: shrug, you wanted an opinion, you got it - a phd does not have to reflect reality ;)
<heller>
the fact that we might not need to query AGAS on the receiver is an optimization (due to caching the address on the sender)
<heller>
the fact that we might not need to query AGAS on the receiver is an optimization as well ;)
<hkaiser>
yaya
<hkaiser>
all of agas is an optimization
<jbjnr>
when can we replace the agas backen with a proper distributed in memory key value store?
<jbjnr>
^backend
<hkaiser>
jbjnr: whenever somebody does it - I doubt however that this would give us any benefit except perhaps persistence
<jbjnr>
yes. you said that about parcelports and latencies too!
<jbjnr>
:)
<hkaiser>
jbjnr: prove it to me! ;)
<jbjnr>
not today, not tomorrow, but one day. Maybe.
<jbjnr>
maybe not
<hkaiser>
ok, deal
<heller>
hkaiser: which makes me think if we need the local agas cache at all ;)
<hkaiser>
heller: I think we did some measurements
<hkaiser>
but feel free to disable it, it's a config setting away
<heller>
well, it should be mostly irrelevant if we look up the address on the sender through the cache or on the receiver in the primary namespace (if it is not set explicitly in the case of a promise)
<heller>
anyway
<heller>
back to writing
<hkaiser>
heller: just try not to get carried away into storyland too much ;)
<heller>
you have a point that I should mention those optimizations in the text
<heller>
hkaiser: too late!
<hkaiser>
yah, I was afraid it was
<heller>
isn't this what writing this stuff is about?
<heller>
telling a nice story?
<hkaiser>
sure, as I said, a phd does not have to reflect reality
<heller>
that was a joke, no?
<hkaiser>
;)
<heller>
I should just write a single page: HPX is awesome, I spent the last 6 years working with it. Do you think I wasted my time? It's just the best thing since sliced bread, trust me.
<heller>
this would nicely match with the current raison d’être
<github>
[hpx] hkaiser closed pull request #2791: Circumvent scary warning about placement new (master...fixing_any_warning) https://git.io/v70sX
aserio has joined #ste||ar
<heller>
hkaiser: regarding #2796, the problem seems to be that promise_data never sets started_ to true, and therefore wait_until always returns deferred
<hkaiser>
ok
<hkaiser>
I thought you were writing? ;)
<hkaiser>
aserio: see pm, pls
zbyerly_ has joined #ste||ar
<hkaiser>
heller: I'll look into this now
akheir has joined #ste||ar
<heller>
hkaiser: the issue is the result of me wanting to write ;)
<hkaiser>
heller: interesting - is that the correct solution, though?
<heller>
No
<heller>
Fails if the launch policy is deferred
<heller>
That was just a quick hack to get kiril going
<jbjnr>
hkaiser: yt? I have a small problem with thread numbering that I believed I had fixed some time ago. Did you by any chance change anything to do with thread nums anywhere?
<jbjnr>
(recently)
<hkaiser>
jbjnr: here
<hkaiser>
jbjnr: I did
<hkaiser>
jbjnr: it was blowing up without that change
<jbjnr>
what was the change?
<hkaiser>
hold on, let me think
<jbjnr>
thanks. I have to disappear, but I'll check back here later.
<jbjnr>
then I'll fix it again for all the schedulers etc
<hkaiser>
jbjnr: otherwise the code was accessing some vector out of bounds
zbyerly_ has quit [Ping timeout: 260 seconds]
david_pfander1 has joined #ste||ar
zbyerly_ has joined #ste||ar
david_pfander1 has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
<github>
[hpx] hkaiser created fixing_2796 (+1 new commit): https://git.io/v7u9y
<github>
hpx/fixing_2796 49ff4d8 Hartmut Kaiser: Making sure future::wait_for et.al. work properly for action results...
pree_ has quit [Read error: Connection reset by peer]
<github>
[hpx] hkaiser opened pull request #2797: Making sure future::wait_for et.al. work properly for action results (master...fixing_2796) https://git.io/v7u9A
<hkaiser>
heller: ^^
<heller>
hkaiser: the test should add a small delay before checking the values
david_pfander has joined #ste||ar
<hkaiser>
why?
<hkaiser>
it waits for 3 seconds already
<hkaiser>
once wait_for returns the operation is finished
<hkaiser>
heller: ^
aserio has quit [Ping timeout: 258 seconds]
david_pfander has quit [Remote host closed the connection]
zbyerly_ has quit [Ping timeout: 240 seconds]
david_pfander has joined #ste||ar
pree_ has joined #ste||ar
aserio has joined #ste||ar
<aserio>
wash[m] parsa[w]: Meeting?
<aserio>
hkaiser:
<wash[m]>
aserio: ping, I am ready on skype
<hkaiser>
aserio: I'm on skype
akheir_ has joined #ste||ar
<heller>
hkaiser: sure, but theoretically, the task might not be finished yet
<hkaiser>
heller: no, then the future wouldn't have been ready yet, so the other test would fail
zbyerly_ has joined #ste||ar
<heller>
The wait could time out
vamatya has joined #ste||ar
<hkaiser>
heller: yah, but then the test will fail anyways
<hkaiser>
there is no safe way to prevent this
<heller>
It only fails because the function hasn't been called yet, no?
akheir_ has quit [Remote host closed the connection]
vamatya has quit [Ping timeout: 260 seconds]
zbyerly_ has quit [Ping timeout: 258 seconds]
jfbastien has joined #ste||ar
<github>
[hpx] hkaiser force-pushed resource_partitioner from 6147af2 to 8e59ff9: https://git.io/v7lfK
<github>
hpx/resource_partitioner 8e59ff9 Hartmut Kaiser: Making inspect happy
zbyerly_ has joined #ste||ar
<hkaiser>
heller: it waits for 3 seconds already, how much longer do you suggest to wait?
pree_ has quit [Read error: Connection reset by peer]
<hkaiser>
3 seconds should be enough for a local thread to be scheduled
<hkaiser>
aserio: Erica can probably tell you what project that is
<hkaiser>
aserio: ahh, it's Steve's AGAVE project - go kick him !
<aserio>
hkaiser: yea he is not in :(
<hkaiser>
is he travelling?
<aserio>
Let me check
zbyerly__ has joined #ste||ar
<aserio>
hkaiser: he is at the ET conference in Illinois
<aserio>
this week
<hkaiser>
aserio: in any case, should I contact Erica wrt the temp account?
<hkaiser>
darn
<aserio>
Sure
<aserio>
hkaiser: and as soon as we have it we will get the money :p
<hkaiser>
sure, doesn't matter
pree_ has joined #ste||ar
<heller>
hkaiser: I just dislike tests depending on timing in a non-deterministic system. Sure, it will probably succeed 99% of the time... But you never know
<hkaiser>
heller: any suggestions?
<heller>
Check the called atomic only after init returned?
<hkaiser>
heller: 'init'?
<heller>
hpx::init
<hkaiser>
heller: you lost me
<hkaiser>
ahh
<hkaiser>
well, either wait_for times out, in which case the next check will fail (not only the called atomic check), or it does not time out and all is well
<hkaiser>
so checking called after hpx::init does not help in any way
<hkaiser>
aserio: Irina suggests for a meeting today at 3 or 4pm, what time would work for you (in case you'd like to attend)?
<aserio>
hkaiser: sure, I can be there
<hkaiser>
is 3 pm ok?
<hkaiser>
aserio: ^^?
<aserio>
hkaiser: I will be there
<hkaiser>
k
<heller>
hkaiser: hu? If the wait times out, the status still shouldn't be deferred. Instead of a bool, you could just increment a counter, and check if it's 2 before returning from main
vamatya has joined #ste||ar
<heller>
But yes, it's just a nitpick. It should test calling into the remote locality though, no?
<hkaiser>
why?
<hkaiser>
the problem is reproducible on one locality
<quantomb>
aserio: both Mark and Juana are out today. I understand that Hartmut is as well. I think we should postpone the meeting so you do not have to walk all the way over here
<aserio>
I would be fine with that
<quantomb>
aserio: Bibek and I are meeting together now and I will talk with Ka Ming when he gets in.
<aserio>
Ok
<aserio>
thanks for the heads up!
pree_ has joined #ste||ar
<heller>
hkaiser: sure, but the test doesn't need to run on multiple localities ;)
mars0000 has joined #ste||ar
pree_ has quit [Ping timeout: 268 seconds]
diehlpk has joined #ste||ar
pree_ has joined #ste||ar
<hkaiser>
heller: that's fixed already
pree_ has quit [Ping timeout: 276 seconds]
<hkaiser>
aserio: yt?
<aserio>
yes
<hkaiser>
aserio: Steve has submitted the report and it was accepted
<aserio>
hkaiser: so do we now wait in suspense
<hkaiser>
aserio: could you please set up a kickoff phone call for Phylanx?
<hkaiser>
aserio: we should do the temp account anyways, just to get the ball rolling
<hkaiser>
I sent an email to Erica, but she's out today
<hkaiser>
aserio: funny thing is, Chris poked NSF which caused them to run around like chickens with their heads cut off ;)
<hkaiser>
at least 5 people involved over there...
patg[[w]] has joined #ste||ar
bibek_desktop has quit [Quit: Leaving]
bibek_desktop has joined #ste||ar
pree_ has joined #ste||ar
<pree_>
parsa[w] : yt ?
mars0000 has quit [Quit: mars0000]
mars0000 has joined #ste||ar
zbyerly__ has quit [Ping timeout: 240 seconds]
zbyerly_ has quit [Ping timeout: 246 seconds]
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Ping timeout: 246 seconds]
<hkaiser>
aserio: sorry for volunteering you ;)
pree_ has quit [Quit: AaBbCc]
mars0000 has quit [Quit: mars0000]
bibek_desktop has quit [Quit: Leaving]
bibek_desktop has joined #ste||ar
bibek_desktop has quit [Client Quit]
eschnett has quit [Quit: eschnett]
bibek_desktop has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
<zao>
/home/zao/stellar/hpx/tests/regressions/agas/duplicate_id_registration_1596.cpp:59:5: error: ‘ViewRegistrationListener’ is not a member of ‘hpx::server’
<zao>
Not sure where hpx::server comes in there, but I suspect the test needs to qualify ::server
wash has quit [Ping timeout: 260 seconds]
<hkaiser>
zao: does fully qualifying things work?
<zao>
I qualified ::server in HPX_REGISTER_ACTION_DECLARATION and HPX_REGISTER_ACTION, seems to build.
<hkaiser>
nod
mars0000 has joined #ste||ar
mars0000 has quit [Client Quit]
<zao>
Did you know that a whole lot of functions are deprecated and will be removed in the future? Every single bloody test compiled nags me about it :D
<zao>
(inclusive_scan, mostly)
<zao>
Lots of warnings that annoy me, let's see if I can make a pass to kill them off some day.
<zao>
Like the mm-prefetch that has been there forever, or the char* casts of string literals to build argvs.
<wash[m]>
NERSC just did an emergency shutdown for a wildfire. Crazy
<zao>
If your cooling can't handle a bit of external fire...
EverYoun_ has joined #ste||ar
<wash[m]>
The transformer had to be shut off
<wash[m]>
We shut down the entire lab. Including the particle accelerator
<wash[m]>
Literally 3 minutes notice
<zao>
Heh, fun.
EverYoung has quit [Ping timeout: 246 seconds]
<zao>
Seriously though, do you people know how to robustly build HPX in parallel?
<zao>
core can be built -j16, tests start out decent at -j8 but now I'm out of swap.
<zao>
How hard could it be to gate things based on how much memory it'll take to build :(
<zao>
Kind of bummed by having to build tests with -j2 or -j3
<github>
[hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7z5Q
<github>
hpx/resource_partitioner 6c32863 Hartmut Kaiser: Refactoring thread pool
aserio has quit [Quit: aserio]
<github>
[hpx] hkaiser force-pushed resource_partitioner from 6c32863 to 8a19601: https://git.io/v7lfK
<github>
hpx/resource_partitioner 8a19601 Hartmut Kaiser: Refactoring thread pool
<zao>
Welp, this machine can't build HPX effectively. The partitioned tests take over 8 bloody gigs to link.
<zao>
Each.
<zao>
9.7G for parallel segmented vector test.
<zao>
Fudge this, I'm turning the computer into a flowerpot instead. More productive.
<hkaiser>
zao: the power of templates!
parsa[w] has quit [Read error: Connection reset by peer]
<zao>
Are debug builds worse in some way?
<hkaiser>
might make sense to pre-generate partitioned_vector<double> so that it doesn't have to be built for each test