aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<github> [hpx] hkaiser force-pushed resource_partitioner from c75fb59 to 0d234ed: https://git.io/v7lfK
<github> hpx/resource_partitioner 0d234ed Hartmut Kaiser: Fixing warnings, re-implemented missing pieces...
vamatya has quit [Ping timeout: 260 seconds]
ajaivgeorge has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
eschnett has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 246 seconds]
mars0000 has joined #ste||ar
ajaivgeorge has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
zbyerly_ has joined #ste||ar
hkaiser has quit [Quit: bye]
https_GK1wmSU has joined #ste||ar
https_GK1wmSU has left #ste||ar [#ste||ar]
bikineev has joined #ste||ar
zbyerly_ has quit [Ping timeout: 255 seconds]
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
mars0000 has quit [Quit: mars0000]
mars0000 has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
vamatya has joined #ste||ar
mars0000 has quit [Ping timeout: 240 seconds]
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
vamatya has quit [Ping timeout: 240 seconds]
bikineev has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
ajaivgeorge has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
<github> [hpx] hkaiser closed pull request #2762: hpx::partitioned_vector serializer (master...pv_serializer) https://git.io/vQbET
<github> [hpx] hkaiser deleted pv_serializer at a5b25d0: https://git.io/v7EAq
david_pfander has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7EhR
<github> hpx/resource_partitioner 6147af2 Hartmut Kaiser: Making inspect happy
<jbjnr> hkaiser: is up early again. Good morning.
<hkaiser> jbjnr: hey
<jbjnr> thanks for the ongoing RP cleanup. Almost there
<hkaiser> sure, most welcome
<hkaiser> jbjnr: I noticed that the rp branch does not have the changes Thomas' colleague pushed to the throttling scheduler
<jbjnr> wasn't it removed?
<jbjnr> We might have messed up a merge
<hkaiser> it was replaced
<jbjnr> ok.
<jbjnr> is much missing?
<hkaiser> shrug, have not looked yet
<jbjnr> if you can remember the commit that made the changes in master, then I'll go through and fix stuff
<jbjnr> I'll look on github for it
<jbjnr> #2640 I expect
<hkaiser> jbjnr: #2640 and #2686
<jbjnr> there were a couple of commits on the RP branch where I found stuff that should have been deleted, but wasn't. I'll review them and see if I squashed the new stuff by mistake
<hkaiser> ok, tks
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<heller> what do you guys think?
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<jbjnr> heller: seems very reasonable.
<heller> yay
<jbjnr> instead of "receive parcel", maybe "deserialize parcel" or "decode parcel" because the receive part is probably 'implied' by the 'receiving' arrow
<heller> jbjnr: good idea
<heller> finally taking more and more shape!
<heller> cause reasons
<jbjnr> ah good. you changed 'prepare parcel' to encode and decode. much nicer
<heller> indeedly
<heller> also matches the code
<jbjnr> symmetry. lovely
<heller> not that I am good at naming ;)
<heller> inkscape is awesome, yes
<heller> also, I find jabref pretty neat recently
<jbjnr> by symmetry, I meant decode/decode parcel instead of prepare parcel and receive parcel.
<jbjnr> encode/decode^
<jbjnr> jabref. yes very handy
<heller> ha, ok
<heller> we should set up a mysql database we can share amongst us to ease the burden of maintaining an up to date bibliography
K-ballo has joined #ste||ar
<heller> especially handy for citing C++, or other software packages
<jbjnr> you just create an
<jbjnr> HPX group and add them all there. job done
<heller> nice!
* zao publishes as Bobby Tables
<heller> he
<heller> jbjnr: can it also export to bibtex?
<jbjnr> I think so. You create a group, add papers, then export all of them to a bib file. I haven't actually used it for years, but when I was on the SPHERIC steering committee, we maintained one. it's still there, but I didn't play - just posted the link in case it is helpful
<jbjnr> yup. try the export button on that group page
<heller> nice
pree_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
ajaivgeorge has joined #ste||ar
eschnett has joined #ste||ar
<hkaiser> heller: the AGAS image is nice but not entirely correct
<heller> hkaiser: why not?
<heller> I mean sure, the destination is determined before encoding
<heller> and the address does not have to be resolved, potentially
<heller> I wanted to go for a simple image though
<heller> hkaiser: or do you mean something else?
ajaivgeorge has quit [Quit: ajaivgeorge]
<hkaiser> the resolve gid to destination is more like 'determine whether target is local'
<hkaiser> otherwise we wouldn't have to talk to agas on the sender
<heller> well, end transform gid to endpoint address
<heller> *and
<hkaiser> is that an agas op? we deliberately encode this info in the gid itself
<heller> conceptually, it is the responsibility of the locality namespace, but we cache the endpoint information in the parcel handler
<heller> the gid contains the prefix, yes
<hkaiser> so no agas on the sender at all
<heller> but if an action is local, it never reaches the parcel handler
<heller> sure, querying the locality namespace
<hkaiser> yes, agas on the 'receiver' only here as well
<hkaiser> heller: shrug, you wanted an opinion, you got it - a phd does not have to reflect reality ;)
<heller> the fact that we might not need to query AGAS on the receiver is an optimization (due to caching the address on the sender)
<heller> the fact that we might not need to query AGAS on the receiver is an optimization as well ;)
<hkaiser> yaya
<hkaiser> all of agas is an optimization
<jbjnr> when can we replace the agas backen with a proper distributed in memory key value store?
<jbjnr> ^backend
<hkaiser> jbjnr: whenever somebody does it - I doubt however that this would give us any benefit except perhaps persistence
<jbjnr> yes. you said that about parcelports and latencies too!
<jbjnr> :)
<hkaiser> jbjnr: prove it to me! ;)
<jbjnr> not today, not tomorrow, but one day. Maybe.
<jbjnr> maybe not
<hkaiser> ok, deal
<heller> hkaiser: which makes me think if we need the local agas cache at all ;)
<hkaiser> heller: I think we did some measurements
<hkaiser> but feel free to disable it, it's a config setting away
<heller> well, it should be mostly irrelevant if we look up the address on the sender through the cache or on the receiver in the primary namespace (if it is not set explicitly in the case of a promise)
<heller> anyway
<heller> back to writing
<hkaiser> heller: just try not to get carried away into storyland too much ;)
<heller> you have a point that I should mention those optimizations in the text
<heller> hkaiser: too late!
<hkaiser> yah, I was afraid it was
<heller> isn't this what writing this stuff is about?
<heller> telling a nice story?
<hkaiser> sure, as I said, a phd does not have to reflect reality
<heller> that was a joke, no?
<hkaiser> ;)
<heller> I should just write a single page: HPX is awesome, I spent the last 6 years working with it. Do you think I wasted my time? It's just the best thing since sliced bread, trust me.
<heller> this would nicely match with the current raison d’être
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v7urp
<github> hpx/master 68bd40c Hartmut Kaiser: Fixing test for partitioned_vector serialization...
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v7uoJ
<github> hpx/master 838e557 Hartmut Kaiser: Merge pull request #2794 from STEllAR-GROUP/fixing_partitioned_vector_iteration...
<github> [hpx] hkaiser closed pull request #2791: Circumvent scary warning about placement new (master...fixing_any_warning) https://git.io/v70sX
aserio has joined #ste||ar
<heller> hkaiser: regarding #2796, the problem seems to be that promise_data never sets started_ to true, and therefore wait_until always returns deferred
<hkaiser> ok
<hkaiser> I thought you were writing? ;)
<hkaiser> aserio: see pm, pls
zbyerly_ has joined #ste||ar
<hkaiser> heller: I'll look into this now
akheir has joined #ste||ar
<heller> hkaiser: the issue is the result of me wanting to write ;)
<heller> hkaiser: thanks for looking into it!
<hkaiser> heller: interesting - is that the correct solution, though?
<heller> No
<heller> Fails if the launch policy is deferred
<heller> That was just a quick hack to get kiril going
<jbjnr> hkaiser: yt? I have a small problem with thread numbering that I believed I had fixed some time ago. Did you by any chance change anything to do with thread nums anywhere?
<jbjnr> (recently)
<hkaiser> jbjnr: here
<hkaiser> jbjnr: I did
<hkaiser> jbjnr: it was blowing up without that change
<jbjnr> what was the change?
<hkaiser> hold on, let me think
<jbjnr> thanks. I have to disappear, but I'll check back here later.
<jbjnr> then I'll fix it again for all the schedulers etc
<hkaiser> jbjnr: I think I changed this: https://github.com/STEllAR-GROUP/hpx/blob/73d768d5b058f0c040c58e868fb6de9eecebcbd8/hpx/runtime/threads/detail/thread_pool_impl.hpp#L340-L342, i.e. not passing the global threadnum, but the local (relative to pool) to the thread
<hkaiser> jbjnr: otherwise the code was accessing some vector out of bounds
zbyerly_ has quit [Ping timeout: 260 seconds]
david_pfander1 has joined #ste||ar
zbyerly_ has joined #ste||ar
david_pfander1 has quit [Ping timeout: 240 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
<github> [hpx] hkaiser created fixing_2796 (+1 new commit): https://git.io/v7u9y
<github> hpx/fixing_2796 49ff4d8 Hartmut Kaiser: Making sure future::wait_for et.al. work properly for action results...
pree_ has quit [Read error: Connection reset by peer]
<github> [hpx] hkaiser opened pull request #2797: Making sure future::wait_for et.al. work properly for action results (master...fixing_2796) https://git.io/v7u9A
<hkaiser> heller: ^^
<heller> hkaiser: the test should add a small delay before checking the values
david_pfander has joined #ste||ar
<hkaiser> why?
<hkaiser> it waits for 3 seconds already
<hkaiser> once wait_for returns the operation is finished
<hkaiser> heller: ^
aserio has quit [Ping timeout: 258 seconds]
david_pfander has quit [Remote host closed the connection]
zbyerly_ has quit [Ping timeout: 240 seconds]
david_pfander has joined #ste||ar
pree_ has joined #ste||ar
aserio has joined #ste||ar
<aserio> wash[m] parsa[w]: Meeting?
<aserio> hkaiser:
<wash[m]> aserio: ping, I am ready on skype
<hkaiser> aserio: I'm on skype
akheir_ has joined #ste||ar
<heller> hkaiser: sure, but theoretically, the task might not be finished yet
<hkaiser> heller: no, then the future wouldn't have been ready yet, so the other test would fail
zbyerly_ has joined #ste||ar
<heller> The wait could time out
vamatya has joined #ste||ar
<hkaiser> heller: yah, but then the test will fail anyways
<hkaiser> there is no safe way to prevent this
<heller> It only fails because the function hasn't been called yet, no?
akheir_ has quit [Remote host closed the connection]
vamatya has quit [Ping timeout: 260 seconds]
zbyerly_ has quit [Ping timeout: 258 seconds]
jfbastien has joined #ste||ar
<github> [hpx] hkaiser force-pushed resource_partitioner from 6147af2 to 8e59ff9: https://git.io/v7lfK
<github> hpx/resource_partitioner 8e59ff9 Hartmut Kaiser: Making inspect happy
zbyerly_ has joined #ste||ar
<hkaiser> heller: it waits for 3 seconds already, how much longer do you suggest to wait?
pree_ has quit [Read error: Connection reset by peer]
<hkaiser> 3 seconds should be enough for a local thread to be scheduled
<hkaiser> also reported as #2785
<hkaiser> something off with the bit masks under certain circumstances
<github> [hpx] K-ballo deleted iota_range at 1860039: https://git.io/v7uAj
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
<hkaiser> aserio: Erica can probably tell you what project that is
<hkaiser> aserio: ahh, it's Steve's AGAVE project - go kick him !
<aserio> hkaiser: yea he is not in :(
<hkaiser> is he travelling?
<aserio> Let me check
zbyerly__ has joined #ste||ar
<aserio> hkaiser: he is at the ET conference in Illinois
<aserio> this week
<hkaiser> aserio: in any case, should I contact Erica wrt the temp account?
<hkaiser> darn
<aserio> Sure
<aserio> hkaiser: and as soon as we have it we will get the money :p
<hkaiser> sure, doesn't matter
pree_ has joined #ste||ar
<heller> hkaiser: I just dislike tests that depend on timing in a non-deterministic system. Sure, it will probably succeed 99% of the time... But you never know
<hkaiser> heller: any suggestions?
<heller> Check the called atomic only after init returned?
<hkaiser> heller: 'init'?
<heller> hpx::init
<hkaiser> heller: you lost me
<hkaiser> ahh
<hkaiser> well, either wait_for times out, in which case the next check will fail (not only the called atomic check), or it does not time out and all is well
<hkaiser> so checking called after hpx::init does not help in any way
<hkaiser> aserio: Irina suggests for a meeting today at 3 or 4pm, what time would work for you (in case you'd like to attend)?
<aserio> hkaiser: sure, I can be there
<hkaiser> is 3 pm ok?
<hkaiser> aserio: ^^?
<aserio> hkaiser: I will be there
<hkaiser> k
<heller> hkaiser: hu? If the wait times out, the status still shouldn't be deferred. Instead of a bool, you could just increment a counter, and check if it's 2 before returning from main
vamatya has joined #ste||ar
<heller> But yes, it's just a nitpick. It should test calling into the remote locality though, no?
<hkaiser> why?
<hkaiser> the problem is reproducible on one locality
jfbastien has quit [Quit: Textual IRC Client: www.textualapp.com]
akheir has quit [Remote host closed the connection]
bibek_desktop has quit [Quit: Leaving]
bibek_desktop has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
<hkaiser> aserio: ok steve acknowledged to submit the report today
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v7zLC
<github> hpx/gh-pages 350383f StellarBot: Updating docs
ajaivgeorge has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
pree_ has quit [Ping timeout: 260 seconds]
<aserio> hkaiser: thank goodness
aserio has joined #ste||ar
quantomb has joined #ste||ar
<quantomb> aserio: yt?
<aserio> quantomb: yes
<quantomb> aserio: both Mark and Juana are out today. I understand that Hartmut is as well. I think we should postpone the meeting so you do not have to walk all the way over here
<aserio> I would be fine with that
<quantomb> aserio: Bibek and I are meeting together now and I will talk with Ka Ming when he gets in.
<aserio> Ok
<aserio> thanks for the heads up!
pree_ has joined #ste||ar
<heller> hkaiser: sure, but the test doesn't need to run on multiple localities ;)
mars0000 has joined #ste||ar
pree_ has quit [Ping timeout: 268 seconds]
diehlpk has joined #ste||ar
pree_ has joined #ste||ar
<hkaiser> heller: that's fixed already
pree_ has quit [Ping timeout: 276 seconds]
<hkaiser> aserio: yt?
<aserio> yes
<hkaiser> aserio: Steve has submitted the report and it was accepted
<aserio> hkaiser: so do we now wait in suspense?
<hkaiser> aserio: could you please set up a kickoff phone call for Phylanx?
<hkaiser> aserio: we should do the temp account anyways, just to get the ball rolling
<hkaiser> I sent an email to Erica, but she's out today
<hkaiser> aserio: funny thing is, Chris poked NSF which caused them to run around like chickens with their heads cut off ;)
<hkaiser> at least 5 people involved over there...
patg[[w]] has joined #ste||ar
bibek_desktop has quit [Quit: Leaving]
bibek_desktop has joined #ste||ar
pree_ has joined #ste||ar
<pree_> parsa[w] : yt ?
mars0000 has quit [Quit: mars0000]
mars0000 has joined #ste||ar
zbyerly__ has quit [Ping timeout: 240 seconds]
zbyerly_ has quit [Ping timeout: 246 seconds]
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Ping timeout: 246 seconds]
<hkaiser> aserio: sorry for volunteering you ;)
pree_ has quit [Quit: AaBbCc]
mars0000 has quit [Quit: mars0000]
bibek_desktop has quit [Quit: Leaving]
bibek_desktop has joined #ste||ar
bibek_desktop has quit [Client Quit]
eschnett has quit [Quit: eschnett]
bibek_desktop has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
<hkaiser> aserio: yt?
<aserio> hkaiser: yes
<hkaiser> aserio: do you have a minute to skype?
<aserio> sure
<zao> Blergh.
<zao> /home/zao/stellar/hpx/tests/regressions/agas/duplicate_id_registration_1596.cpp:59:5: error: ‘ViewRegistrationListener’ is not a member of ‘hpx::server’
<zao> Not sure where hpx::server comes in there, but I suspect the test needs to use ::server
wash has quit [Ping timeout: 260 seconds]
<hkaiser> zao: does fully qualifying things work?
<zao> I qualified ::server in HPX_REGISTER_ACTION_DECLARATION and HPX_REGISTER_ACTION, seems to build.
<hkaiser> nod
mars0000 has joined #ste||ar
mars0000 has quit [Client Quit]
<zao> Did you know that a whole lot of functions are deprecated and will be removed in the future? Every single bloody test compiled nags me about it :D
<zao> (inclusive_scan, mostly)
<zao> Lots of warnings that annoy me, let's see if I can make a pass to kill them off some day.
<zao> Like the mm-prefetch that has been there forever, or the char* casts of string literals to build argvs.
<wash[m]> NERSC just did an emergency shutdown for a wildfire. Crazy
<zao> If your cooling can't handle a bit of external fire...
EverYoun_ has joined #ste||ar
<wash[m]> The transformer had to be shut off
<wash[m]> We shut down the entire lab. Including the particle accelerator
<wash[m]> Literally 3 minutes notice
<zao> Heh, fun.
EverYoung has quit [Ping timeout: 246 seconds]
<zao> Seriously though, do you people know how to robustly build HPX in parallel?
<zao> core can be built -j16, tests start out decent at -j8 but now I'm out of swap.
<zao> How hard could it be to gate things based on how much memory it'll take to build :(
<zao> Kind of bummed by having to build tests with -j2 or -j3
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7z5Q
<github> hpx/resource_partitioner 6c32863 Hartmut Kaiser: Refactoring thread pool
aserio has quit [Quit: aserio]
<github> [hpx] hkaiser force-pushed resource_partitioner from 6c32863 to 8a19601: https://git.io/v7lfK
<github> hpx/resource_partitioner 8a19601 Hartmut Kaiser: Refactoring thread pool
<zao> Welp, this machine can't build HPX effectively. The partitioned tests take over 8 bloody gigs to link.
<zao> Each.
<zao> 9.7G for parallel segmented vector test.
<zao> Fudge this, I'm turning the computer into a flowerpot instead. More productive.
<hkaiser> zao: the power of templates!
parsa[w] has quit [Read error: Connection reset by peer]
<zao> Are debug builds worse in some way?
<hkaiser> might make sense to pre-generate partitioned_vector<double> so that it doesn't have to be built for each test
<hkaiser> zao: much worse
<hkaiser> \o/
<hkaiser> finally a decently used machine
<zao> hkaiser: That's the problem... 8c/16t, 16G of memory, at most two tests built at once with some swapping.
eschnett has joined #ste||ar
patg[[w]] has quit [Ping timeout: 260 seconds]
EverYoun_ has quit [Remote host closed the connection]
<zao> hkaiser: It seems like clang-6.0 and whatever it links with is _way_ better than the GNU clusterhug.
mcopik has quit [Ping timeout: 255 seconds]
<zao> I don't have any evidence, but I'm fairly sure I didn't run out of memory considering how fast this built.
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<zao> Or just lucky, I guess.
<zao> partitioned_vector_transform_scan_test compiling for just shy of 4G of memory, linking for well under 1G.
<zao> GCC build of the same test eats just shy of 7G to compile, just shy of 2G to link.
<zao> Welp, never running GCC ever again.
<K-ballo> clang-6.0 as in trunk clang?
<zao> Whatever I got from the llvm APT repo, so probably.
<zao> 450 - tests.unit.serialization.serialization_partitioned_vector (Failed)
<zao> 530 - tests.unit.host_.block_allocator (Timeout)
<zao> Just two failed tests, nifty.
<zao> clang version 6.0.0-svn309375-1~exp1 (trunk)
<zao> trunk indeed.
<zao> Few days old, it seems.
<zao> Time to hit the sack, g'night.
<hkaiser> zao: I'm working on reducing compile/link time for partitioned_vector tests