hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser created thread_local_allocator (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/579a4aa65733
<ste||ar-github> hpx/thread_local_allocator 579a4aa Hartmut Kaiser: Adding thread local allocator and use it for future shared states
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
ste||ar-github has left #ste||ar [#ste||ar]
<ste||ar-github> [hpx] hkaiser opened pull request #3406: Adding thread local allocator and use it for future shared states (master...thread_local_allocator) https://github.com/STEllAR-GROUP/hpx/pull/3406
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed thread_local_allocator from 579a4aa to 42f1259: https://github.com/STEllAR-GROUP/hpx/commits/thread_local_allocator
<ste||ar-github> hpx/thread_local_allocator 42f1259 Hartmut Kaiser: Adding thread local allocator and use it for future shared states
ste||ar-github has left #ste||ar [#ste||ar]
eschnett has joined #ste||ar
jaafar has joined #ste||ar
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser closed pull request #3405: Adding DHPX_HAVE_THREAD_LOCAL_STORAGE=ON to builds (master...cci_hpxmp_img) https://github.com/STEllAR-GROUP/hpx/pull/3405
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser deleted cci_hpxmp_img at 980a7ae: https://github.com/STEllAR-GROUP/hpx/commit/980a7ae
ste||ar-github has left #ste||ar [#ste||ar]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco created gcc8_warnings (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/732d999cff51
<ste||ar-github> hpx/gcc8_warnings 732d999 John Biddiscombe: Fix unused param and extra ; warnings emitted by gcc 8.x
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco opened pull request #3407: Fix unused param and extra ; warnings emitted by gcc 8.x (master...gcc8_warnings) https://github.com/STEllAR-GROUP/hpx/pull/3407
ste||ar-github has left #ste||ar [#ste||ar]
_bibek_ has joined #ste||ar
bibek has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
<heller> 8192 node job submitted
<heller> yay
david_pfander has joined #ste||ar
<zao> heller: Why do I want to get a small manycore ARM board to do indeterminate things with? :)
<heller> zao: burning money and energy?
<zao> We did consider grabbing a pile of Pis for our cloud, but more proper boards might be more appropriate.
<heller> right
* zao heads out into the maybe-rain
<heller> I think the best reason to grab those arm boards is for the looks
<heller> zao: you got rain? I am still waiting on it :/
daissgr has joined #ste||ar
<zao> Possibly up to 15-20mm.
<heller> all cloudy and windy, but no water from the heavens
<heller> ha, at least something ;)
<heller> zao: as for the cloud: why not go with one single potent X86 or power server?
<heller> way more efficient, no?
<zao> Yeah, it's more a bit of hybrid work with an “emerging tech” project
<zao> We run the cloud on old decommissioned cluster nodes and new Dell 1u machines, currently.
<zao> Not sure what arch people need for their silly cloud software.
<zao> I try my best to avoid it :)
jgolinowski has joined #ste||ar
<ms[m]1> jgolinowski: here now
<ms[m]1> Just the summary missing now?
<jgolinowski> ms[m]1, well I am thinking of including some screenshots of the MartyCam app
<jgolinowski> but yes, the summary is missing as well
<ms[m]1> Ok
<ms[m]1> Should I wait a bit still before reading it thoroughly?
<ms[m]1> Do you think you'll have it more or less finished today?
<jgolinowski> ms[m]1, so I think what is there now is what I want to be there
<jgolinowski> and more details about the app are somewhat orthogonal to what is there currently so I think you can read it as it is
<ms[m]1> OK, will do
<jgolinowski> so maybe give me 10 minutes for the summary
<jgolinowski> ms[m]1, also I wanted to somehow host the .html with the results of the DNN performance tests
<ms[m]1> Yeah, I'll read it in a couple of hours at the latest
<ms[m]1> Could be nice
<jgolinowski> but Blogger doesn't seem to be the best fit for that because it tries to embed the HTML as the blog post and it looks very ugly
<jgolinowski> So I thought maybe to put it under the stellar domain or something like that?
<ms[m]1> It's not possible to somehow have a custom page or files uploaded to blogger?
<jgolinowski> ms[m]1, so there is always this template and when I just tried to wipe it out it did not allow me
<jgolinowski> (in blogger)
<jgolinowski> the "best" up to now is this
<ms[m]1> OK, that's not too bad but I can definitely upload it to the stellar page if you send me the file
<jbjnr> jgolinowski: can you send the link to your martycam repo please. I seem to have misplaced it
<jbjnr> I want to quickly test it
<jgolinowski> this is the repo
<jgolinowski> it is in examples/qt_hpx_opencv
<jgolinowski> I recently updated the readme so the build instructions should be up to date
<jgolinowski> jbjnr, ^
<jbjnr> nikunj97 has added support for running tests without using hpx::init and I'm using it on my dca project. It's nice. I have modified some headers to pull in hpx_main when needed, and apart from that, tests that used to use std::thread now pass with HPX threads
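A minimal sketch of what "pulling in hpx_main" means, assuming the standard hpx/hpx_main.hpp mechanism rather than jbjnr's actual test code: including that header wraps main() so it already runs as an HPX thread, with no explicit hpx::init call.

    // Minimal sketch, assuming the hpx/hpx_main.hpp mechanism (not the actual dca tests).
    #include <hpx/hpx_main.hpp>       // wraps main() so it runs as an HPX thread
    #include <hpx/include/async.hpp>

    #include <cassert>

    int main()
    {
        // No hpx::init needed: main() is already an HPX thread here,
        // so HPX futures and async can be used directly.
        hpx::future<int> f = hpx::async([] { return 6 * 7; });
        assert(f.get() == 42);
        return 0;
    }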
<jbjnr> jgolinowski: thanks
<jbjnr> aha you did not clone martycam - is martycam just an example in there now?
<jbjnr> I had a lot of animal noise outside the house last night and want to get martycam running again to find out what it was/is next time
<jgolinowski> jbjnr, I cloned it initially but I needed an even smaller example at the beginning where I could copy the MartyCam piece by piece
<jgolinowski> since it is more or less done now
<jgolinowski> I can just paste the current version to my fork and update the PR
<jbjnr> cool
<jbjnr> just rebuilding opencv now
<heller> {what}: kernel launch failed: invalid device function: HPX(kernel_error)
<heller> shit
<heller> the HPX.Compute is totally broken
<heller> should I just ditch it from my thesis? Or just talk about the possibility and not show any evaluation results?
<jgolinowski> ms[m]1, I also updated the gsoc project abstract (https://summerofcode.withgoogle.com/projects/#5375652711104512)
<jgolinowski> and the summary is ready to read
jgolinowski has quit [Quit: Leaving]
<jbjnr> heller: unless you want to spend x months fixing it and rewriting it, then just state it as another use case that is being developed for heterogeneous computing ...
<jbjnr> (with an outline of its features/functionality)
<heller> yeah ... let's keep it that way
<heller> in that case: all measurements done
<heller> check
<heller> next: write the analysis of the results ;)
<heller> one more try though...
<heller> do we want to get HPX on compiler explorer?
<ms[m]1> heller: why not? How much work does it require, though?
<ms[m]1> jgolinowski_: just sent you an email with some small fixes
<heller> ms[m]1: I guess someone needs to write the PR
<heller> ms[m]1: if you are up to it, i'll ask matt godbolt for advice
<ms[m]1> heller: not immediately, but could be nice to have
<ms[m]1> a bash script, nice and simple
<heller> right
daissgr has quit [Ping timeout: 256 seconds]
daissgr has joined #ste||ar
<heller> regarding HPX.Compute: I'm just stupid and had the wrong compute architecture...
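"invalid device function" at kernel-launch time usually means the binary carries no device code for the GPU's compute capability, i.e. the -arch/-gencode flags used at build time don't match the device. A minimal host-side check, assuming the plain CUDA runtime API (a hypothetical standalone snippet, not part of HPX.Compute):

    // Print the device's compute capability so it can be compared against the
    // architectures the binary was actually built for.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        cudaDeviceProp prop{};
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        {
            std::printf("no CUDA device visible\n");
            return 1;
        }
        // "invalid device function" typically indicates the fatbinary lacks
        // code for this compute capability (wrong -arch/-gencode at build time).
        std::printf("device 0 compute capability: %d.%d\n", prop.major, prop.minor);
        return 0;
    }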
Vir has joined #ste||ar
daissgr has quit [Ping timeout: 256 seconds]
<heller> so yeah ... stream for GPUs is totally borked
<heller> hooray
K-ballo has joined #ste||ar
daissgr has joined #ste||ar
<jbjnr> why do I get these errors when I compile opencv with hpx, but not when I compile hpx itself? https://gist.github.com/biddisco/0d5acde86227909c7d964c64d18a06f9 I think all my flags are consistent.
<heller> jbjnr: looks like a -std=c++11 got in there somehow
<jbjnr> that's what I thought.
<jbjnr> it's laughable. There's a -std=c++14, then later a -std=c++17, and then a -std=c++11, all on the same command line. What a mess
<K-ballo> how many flags is enough flags?
hkaiser has joined #ste||ar
eschnett has joined #ste||ar
aserio has joined #ste||ar
jbjnr has quit [Ping timeout: 240 seconds]
<heller> hkaiser: FYI, I just successfully completed a 2048 node job
<hkaiser> heller: nice
<hkaiser> any problems?
<heller> hkaiser: if all goes according to plan, I'll have an 8k node job finished by the end of the night
<heller> hkaiser: long queue times ;)
<hkaiser> sure
<heller> will have the performance data in a few hours
<hkaiser> nod
<hkaiser> I created that PR with the thread allocator
<hkaiser> still WIP
<heller> yeah, saw it
<heller> will experiment with it next week
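The PR itself isn't quoted in this log; purely as an illustration of the technique named in its title, a "thread local allocator" can recycle single-object blocks through a per-thread free list, so allocating a future's shared state avoids the locked global heap on the hot path. A rough sketch with hypothetical names, not the code from #3406:

    // Illustrative sketch only -- not the code from PR #3406.
    #include <cstddef>
    #include <new>
    #include <vector>

    template <typename T>
    struct thread_local_allocator
    {
        using value_type = T;

        thread_local_allocator() = default;
        template <typename U>
        thread_local_allocator(thread_local_allocator<U> const&) noexcept {}

        static std::vector<void*>& free_list()
        {
            thread_local std::vector<void*> list;   // one cache per OS thread
            return list;
        }

        T* allocate(std::size_t n)
        {
            // Reuse a cached block for single-object allocations when possible.
            if (n == 1 && !free_list().empty())
            {
                void* p = free_list().back();
                free_list().pop_back();
                return static_cast<T*>(p);
            }
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }

        void deallocate(T* p, std::size_t n)
        {
            // Cache single blocks for reuse; larger allocations go back to the heap.
            // (Blocks still cached at thread exit are simply leaked in this sketch.)
            if (n == 1)
                free_list().push_back(p);
            else
                ::operator delete(p);
        }
    };

    template <typename T, typename U>
    bool operator==(thread_local_allocator<T> const&, thread_local_allocator<U> const&) noexcept { return true; }
    template <typename T, typename U>
    bool operator!=(thread_local_allocator<T> const&, thread_local_allocator<U> const&) noexcept { return false; }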
<heller> finishing the graphs and the text is my top priority
<hkaiser> sure, no rush - I'll be out starting today for a week anyways
<heller> also, our CUDA stuff is a total mess, the performance is more than awful
<heller> ok
<heller> an order of magnitude performance difference
<heller> to a complete native implementation
<hkaiser> ok, not surprised
<heller> I am
<heller> we had it almost on par when we started this...
<hkaiser> nobody has looked into it for over a year
<hkaiser> god knows what has changed
<hkaiser> dominic will start working with us in september
<heller> makes the case for a good performance regression testing infrastructure even clearer ...
<hkaiser> one goal is to organize another GB submission
<heller> with dominic as the lead?
<hkaiser> this should give us enough incentives to look into perf in many directions
<hkaiser> also, in phylanx Kevin is setting up perf regression testing, we can reuse that as well
<zao> Are there any neat EU conferences this fall I should trick my people into sending me to?
<heller> hkaiser: yeah, I know, we needed that a year ago :P
<K-ballo> just one year?
<hkaiser> heh
<heller> would be pretty interesting to replay the performance of our commit history ...
<hkaiser> sure, can be done
<hkaiser> should be even very useful ;-)
<heller> if we didn't have commits that break the build, we could leverage git bisect
<hkaiser> git bisect supports bad commits
<hkaiser> but requires manual intervention
<heller> sure, but if something breaks the compilation, is it a performance regression?
<heller> hard to tell...
<hkaiser> no, it isn't
<hkaiser> if it doesn't run, just go to the next commit
<heller> sure, but in which direction?
<hkaiser> git bisect knows
<heller> it knows only because you mark the commit as good or bad
<heller> something that doesn't compile is neither
<hkaiser> I think you can tell it to ignore the current commit
<hkaiser> git bisect skip
<heller> ahh, great
<heller> not perfect though, it chooses the next commit randomly
<heller> better than nothing though
<heller> hkaiser: from 2048 nodes to 3072, I got a 1.46x speedup :D
<heller> I'd call that 97% parallel efficiency :D
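For the record, the 97% figure is just the measured speedup divided by the ideal speedup for the node-count increase:

    E = \frac{S_\text{measured}}{S_\text{ideal}} = \frac{1.46}{3072/2048} = \frac{1.46}{1.5} \approx 0.973 \approx 97\%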
<heller> that's with the stencil code from the tutorials, btw
<heller> this will potentially be our first petaflop run ...
<hkaiser> heller: GB was a PF run, no?
<heller> hkaiser: we have no idea, since no one cared to actually put up a performance model
<hkaiser> right
<heller> and IIRC, the estimates were short of a teraflop
<heller> anyways
<hkaiser> so not everything is bad after all
<heller> from a single node, to 3072 nodes, I have a parallel efficiency of 95%
<heller> because that's how awesome this shit is
<hkaiser> is that with your pp modifications?
<heller> no, this is with vanilla master
<hkaiser> cool
<heller> but LF parcelport :D
<hkaiser> yah sure
<hkaiser> figures
<heller> well, it's a simple stencil, regular communication etc. not much overlap
<hkaiser> so it seems to work ;-) I was never sure
<heller> the speedup of oversubscription is 1.1x
<heller> and the exchange is a single, contiguous block of memory of 10k doubles
<heller> still a nice PoC
<heller> IMHO
<heller> on the x86 cluster with OmniPath, I got 99% efficiency for 256 nodes
<heller> the efficiency in general on the KNL is pretty bad though
<hkaiser> nice
<hkaiser> as expected
<hkaiser> they killed it for a reason
<heller> x86: 1.05991e+06, KNL: 907380
<heller> (MLUPS, on 256 nodes)
<heller> so looks like my compiler didn't vectorize :/
_bibek_ is now known as bibek
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
daissgr has quit [Quit: WeeChat 1.9.1]
david_pfander has quit [Quit: david_pfander]
daissgr has joined #ste||ar
eschnett has joined #ste||ar
<K-ballo> zao: meetingcpp?
<zao> K-ballo: tried that before, not relevant enough for our operations
<zao> Also, I really don’t like that Jens guy :)
daissgr has quit [Quit: WeeChat 1.9.1]
Vir has quit [Ping timeout: 265 seconds]
Vir has joined #ste||ar
<diehlpk_work> zao, What kind of conferences you are looking for?
Vir has quit [Ping timeout: 256 seconds]
jaafar has joined #ste||ar
<zao> Something related to HPC but still not horribly boring. We’re sending people to pgconf, ganeticon, OpenStack Summit
<zao> Meeting C++ didn’t have an HPC track, or anything that could be motivated as more than just regular dev content
<heller> zao: cluster2018?
<diehlpk_work> zao, https://fosdem.org/2019/
<diehlpk_work> The HPC dev room there is interesting
aserio has quit [Ping timeout: 255 seconds]
<zao> diehlpk_work: I’d love to go to the easybuild meetup which is adjacent to fosdem, but I’m obligated to go to Norway to ski with NeIC that time of year :D
<zao> But good suggestion otherwise
<heller> 14086892 PD sithhell stencil_we* 8192 2:00:00 0:00 2018-08-10T00:30:01 regular_0 avail_in_~1.3_hrs knl&quad&cache Resources
aserio has joined #ste||ar
<aserio> zao: yt?
<zao> aserio: not quite here, lounging
<zao> Saw your mail but didn’t manage to compose a reply just yet
<aserio> ah
<aserio> well then nvm :)
<aserio> Done for the day (the lounging part)
<aserio> ?
<zao> Last few days of vacation, spending them at parents and siblings, so no downtime :)
<aserio> I understand that!
<zao> I’d love an HPX t-shirt, as long as I get to use the slogan “the best library no-one can use” :P
<zao> I’ll mail you shortly, eating
Vir has joined #ste||ar
<diehlpk_work> zao, I understand, I know one of the maintainers