aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
parsa has joined #ste||ar
mcopik_ has quit [Ping timeout: 264 seconds]
EverYoung has quit [Ping timeout: 240 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
<wash> I heard the word CUDA
parsa has joined #ste||ar
hkaiser has quit [Quit: bye]
parsa has quit [Quit: Zzzzzzzzzzzz]
bikineev has quit [Remote host closed the connection]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
parsa has quit [Quit: Zzzzzzzzzzzz]
vamatya has joined #ste||ar
vamatya has quit [Ping timeout: 248 seconds]
AnujSharma has joined #ste||ar
bikineev has joined #ste||ar
AnujSharma has quit [Ping timeout: 240 seconds]
AnujSharma has joined #ste||ar
jbjnr has left #ste||ar [#ste||ar]
jbjnr has joined #ste||ar
david_pfander has joined #ste||ar
heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller has joined #ste||ar
AnujSharma has quit [Ping timeout: 246 seconds]
Matombo has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v5pgn
<github> hpx/gh-pages c773a03 StellarBot: Updating docs
AnujSharma has joined #ste||ar
bikineev has quit [Remote host closed the connection]
mcopik_ has joined #ste||ar
pree has joined #ste||ar
hkaiser has joined #ste||ar
Matombo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<github> [hpx] hkaiser pushed 2 new commits to master: https://git.io/v5pdt
<github> hpx/master 3e9049c Mario Lang: Fix typo in include path
<github> hpx/master 22327cf Hartmut Kaiser: Merge pull request #2909 from mlang/typo...
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v5pdm
<github> hpx/master 2e05958 Hartmut Kaiser: Merge pull request #2878 from STEllAR-GROUP/fix_rp_again...
parsa has joined #ste||ar
<diehlpk_work> Why is nesting parallel for loops not a good idea?
mcopik_ has quit [Ping timeout: 240 seconds]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
bikineev has joined #ste||ar
<heller> mbremer: the idle rates will go up once you go distributed as the message retrieval and sending part is counting as idle rate
<heller> mbremer: regarding multithreaded on or off, that currently doesn't have an effect, IIRC
hkaiser has quit [Quit: bye]
AnujSharma has quit [Quit: Leaving]
aserio has joined #ste||ar
mcopik_ has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
aserio1 is now known as aserio
hkaiser has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
bikineev has joined #ste||ar
eschnett has quit [Quit: eschnett]
troska1 has joined #ste||ar
troska1 is now known as troska_
eschnett has joined #ste||ar
david_pfander has quit [Ping timeout: 240 seconds]
rod_t has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
aserio has quit [Ping timeout: 246 seconds]
aserio has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v5hss
<github> hpx/master 0bc6e4e Hartmut Kaiser: Adding missing [endsect] to docs
parsa has quit [Quit: Zzzzzzzzzzzz]
<wash[m]> aserio: meeting today?
<heller> finally finished my slides!
<aserio> yes
<heller> let's see how fast the coffee shop uplink is ;)
<aserio> heller: lol are you presenting in an hour
<heller> aserio: no
<heller> aserio: the talk is tomorrow
<heller> aserio: also, 1:30 pm PDT
<aserio> :0 early!?!
<heller> sorry, 2:20
<heller> tomorrow is thursday the 21st, right?
<heller> aserio: who is the organizing genius now?
parsa has joined #ste||ar
<heller> aserio: also, timezones and shit
<heller> aserio: how do you conclude that the talk is in one hour though?
<heller> aserio: you are making me nervous now
akheir has joined #ste||ar
<aserio> heller: because you just finished it :p
<heller> ahh, lol
<heller> yeah, that makes sense
<heller> I am growing older you know ;)
<wash[m]> Don't believe I am muted aserio
<heller> aserio: the Mardi Gras talk was very special there, where my results graph was rendered when I walked up to the stage
<wash[m]> Aserio: is my audio working now?
<aserio> If you are talking we cannot hear it
hkaiser has joined #ste||ar
<aserio> Give me a second
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
<wash[m]> Yah I think the issue is on your end, hm
<aserio> We are working on it
EverYoung has joined #ste||ar
<aserio> wash[m]: say something
vamatya has joined #ste||ar
<zbyerly_> aserio, can you call me?
akheir has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
aserio has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
akheir has joined #ste||ar
hkaiser has joined #ste||ar
Vir has quit [Ping timeout: 240 seconds]
bikineev has joined #ste||ar
<heller> hkaiser: just sent a new PDF, could you please try again?
akheir has quit [Remote host closed the connection]
<github> [hpx] hkaiser created process_error_reporting (+1 new commit): https://git.io/v5hg4
<github> hpx/process_error_reporting 0f4b982 Hartmut Kaiser: Improve error reporting for process component on POSIX systems
<hkaiser> heller: still doesn't work :/
<heller> hkaiser: hmmm
<heller> you are using acrobat reader, I assume?
<heller> might be the missing movie files?
<github> [hpx] hkaiser opened pull request #2910: Improve error reporting for process component on POSIX systems (master...process_error_reporting) https://git.io/v5hgr
<heller> stupid latex
<github> [hpx] hkaiser force-pushed process_error_reporting from 0f4b982 to 8698202: https://git.io/v5hg5
<github> hpx/process_error_reporting 8698202 Hartmut Kaiser: Improve error reporting for process component on POSIX systems
<hkaiser> yah acrobat reader
akheir has joined #ste||ar
<hkaiser> sorry for the trouble...
<heller> np
EverYoung has quit [Ping timeout: 255 seconds]
<hkaiser> yah, thanks!
<heller> great
<heller> feedback welcome!
<heller> brb
EverYoung has joined #ste||ar
akheir has quit [Remote host closed the connection]
akheir has joined #ste||ar
bikineev has quit [Read error: No route to host]
bikineev has joined #ste||ar
<diehlpk_work> What is the easiest way to get a future from a parallel loop?
<diehlpk_work> Here is the example
<diehlpk_work> When dilatation is finished, force_int and one other method can be executed, and I'd like to get a future of the two dependent methods for synchronization
StefanLSU has joined #ste||ar
Vir has joined #ste||ar
<heller> diehlpk_work: par(task)
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
aserio has quit [Ping timeout: 246 seconds]
akheir has quit [Remote host closed the connection]
<heller> hkaiser: woohoo! look at that massive green!
<K-ballo> heller: ping
<heller> K-ballo: pong
<K-ballo> heller: do you remember the kind of issues you were seeing with cuda and the pack traversal stuff?
<heller> yes
<heller> they are fine in general for CUDA
aserio has joined #ste||ar
<heller> the problem occurs once they are instantiated from a static, file scope context
<heller> this reminds me ... I need to push this PR
<github> [hpx] sithhell pushed 1 new commit to cuda_clang: https://git.io/v5hXV
<github> hpx/cuda_clang 27549ee Thomas Heller: Merge branch 'master' into cuda_clang
<github> [hpx] sithhell opened pull request #2911: Fixing CUDA problems (master...cuda_clang) https://git.io/v5hXi
<heller> K-ballo: meaning, you can go ahead with your fixes
<heller> it's a problem in the EDG frontend of nvcc
<heller> the only way around is to not compile this stuff with nvcc
<K-ballo> I only have problems at the moment, not fixes
<K-ballo> there's some subtle issues with unwrapping when interacting with dataflow, some instantiation point related issues, recursive instantiation, or similar
<heller> mhm
<heller> the unwrapping code is very obscure to me :/
<heller> didn't have the time to fully grasp it yet
<heller> regarding recursive instantiations
<heller> this was supposed to be the major benefit of the new unwrapping code. might be that some specialization or so is missing so you end up with infinite recursion at some point
<K-ballo> I suspect something of the sort, but both dataflow and unwrapping are cryptic to follow
<heller> yeah...
eschnett has quit [Quit: eschnett]
StefanLSU has quit [Quit: StefanLSU]
pree has quit [Ping timeout: 248 seconds]
<K-ballo> heller: so is #2829 ok to merge now?
<mbremer> @heller: Just got to my PC. That's interesting about the idle rates. Is that because the thread is being suspended while the network controller copies messages over or is it just not being able to hide send latencies?
<mbremer> Also does tuning HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD matter?
<hkaiser> mbremer: it should, but might require some parameter sweeps
<mbremer> But it is presumably only along that one variable? I don't mind running parameter sweeps. Is it application dependent or more of a function of the network?
pree has joined #ste||ar
eschnett has joined #ste||ar
<hkaiser> mbremer: it decides at what threshold arrays are handled using a zero-copy scheme during serialization
<hkaiser> this adds some overhead for small arrays but helps for larger ones
<hkaiser> the current threshold is pulled out of thin air ;)
<mbremer> @hkaiser: kk, well I'll play around with it some then. Thanks!
EverYoun_ has quit [Remote host closed the connection]
<hkaiser> mbremer: jbjnr mentioned at some point that the current value is way too small for MPI, even more so for IB
EverYoung has joined #ste||ar
<hkaiser> mbremer: this will have an effect only if you send arrays of trivially copyable types (i.e. doubles or so)
<mbremer> That should help then. I'm only sending vectors around.
<mbremer> @hkaiser: Also have you looked at NSF's ECSS https://www.xsede.org/for-users/ecss
<mbremer> I was thinking it might be interesting to submit something to tune HPX performance?
pree has quit [Ping timeout: 240 seconds]
<hkaiser> mbremer: nice idea
<hkaiser> had not seen this before
pree has joined #ste||ar
<github> [hpx] aserio pushed 2 new commits to add_checkpoint: https://git.io/v5h77
<github> hpx/add_checkpoint 4526e9b aserio: Adding checkpoint to doxygen
<github> hpx/add_checkpoint ccf8f8d aserio: Edit ot hpx.idx
patg[[w]] has joined #ste||ar
pree has quit [Ping timeout: 240 seconds]
<mbremer> @hkaiser: Yeah it's nice, because they can pay one of their people to work on tuning stuff. I might talk to clint about it today as well. I'm not 100% sure HPX and stampede2 are set up optimally for one another at the moment
<hkaiser> good thinking
<zbyerly> mbremer, hkaiser i just got something back from TACC
<zbyerly> and i quote: "But I should warn you -- nobody here, not even the avx512 instruction set experts -- have done anything at all with gcc on KNL. "
<zbyerly> are you guys building with Intel?
<zao> My site only has GROMACS 2016.3 w/ GCC on the KNLs. Everything else is Intel 2017.early
<mbremer> zbyerly: Everything we've been doing is with gcc/7.1.0
<zao> No-one insane enough to desire HPX here yet.
pree has joined #ste||ar
<mbremer> Can you even build hpx with intel 17?
pree has quit [Quit: AaBbCc]
<K-ballo> mbremer: no, the intel 17 build is broken at the moment
<heller> mbremer: regarding idle rates, none of the above. We handle the messages in a background thread. The execution of this is currently considered as "idle"
<mbremer> That's what I remember hearing (vis-a-vis intel).
<heller> mbremer: regarding bad performance in distributed, the first thing you might want to do is to change small functions, like getter and setter to direct actions
<mbremer> @heller: Documentation? Right now I'm using channels and then just using a continuation to set what gets pulled out of the channel.
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 248 seconds]
<heller> mbremer: so you only use channels at the moment?
<heller> mbremer: no other components?
hkaiser has quit [Quit: bye]
<mbremer> Yup. It's an unstructured stencil kernel. I have submeshes which are components. But stencils only send messages via the channels
<heller> can you show some code?
<mbremer> Sure.
<mbremer> Let me add you to the repo. And I'll try to send you the salient snippets
<heller> direct actions are currently not documented :/
<heller> alright
<heller> that works
<heller> are you happy with the single node performance now?
<mbremer> Yeah I think so
<mbremer> The grain size still has to be a little large for what I would prefer
<mbremer> also 2 node performance seems good
<heller> alrighty
<heller> what should I build, and how would I run it?
<mbremer> But at 8 nodes it seems to start degrading. But that might be just effects that aren't showing up
<heller> sure
<heller> let me have a look
<mbremer> (I also added you to the repo, so you should have read access)
<heller> I do
<heller> thanks
<mbremer> And that's the perform_one_timestep call
<mbremer> Everything below that level should just be flops
<heller> ok
<mbremer> I'm happy to send you meshes and build instructions if the code hasn't scared you away yet :)
<heller> mbremer: do I need this occa stuff?
<heller> mbremer: I am never scared :P
<mbremer> No
<mbremer> Oh
<mbremer> Sorry make sure you're on the hpx or hpx_dev branch
<heller> right
<heller> just switched
<heller> do I need a BLAS library?
<mbremer> Yeah
<mbremer> I think it's blas...
<mbremer> (or lapack)
<mbremer> Are you trying to run on a Cray machine?
<heller> no, locally for now
<heller> well, on one of my test machines
<mbremer> How much memory? I probably need to send you a reasonably sized problem
<heller> mbremer: 128 GB
<heller> 16 cores
<mbremer> Ooo nice.
<mbremer> Ok just a second. I'll send you a problem (it's not too large, but it does give me good scaling on 1 KNL node)
<mbremer> @heller: Actually I have a meeting in 3 minutes. I'll ping you afterwards (I'm terribly sorry)
<heller> mbremer: ok, no problem
<heller> I'll try to get it built first
<mbremer> Should be a normal cmake build process
<heller> yes
<heller> building lapack at the moment
<mbremer> Set the prefix path to the hpx library... and it "should" work
mbremer has quit [Quit: Page closed]
hkaiser has joined #ste||ar
Matombo has joined #ste||ar
StefanLSU has joined #ste||ar
<hkaiser> rod_t: yt?
<rod_t> hkaiser: Yes!
<hkaiser> rod_t: hey
<hkaiser> one sec
<hkaiser> rod_t: could you install python setuptools in the docker image, pls?
<rod_t> hkaiser: sure.
<hkaiser> rod_t: thanks!
<github> [hpx] aserio created update_contributors (+2 new commits): https://git.io/v5jJ5
<github> hpx/update_contributors 8a243a3 aserio: Fixing emails and updating list...
<github> hpx/update_contributors 816a971 aserio: Fixing Bryce's email
<github> [hpx] aserio opened pull request #2912: Update contributors (master...update_contributors) https://git.io/v5jJA
<github> [hpx] sithhell created channel_direct (+1 new commit): https://git.io/v5jUi
<github> hpx/channel_direct 9b76d1d Thomas Heller: Changing channel actions to be direct
<rod_t> hkaiser: on phylanx image, right?
<hkaiser> yes, please - on both Dockerfiles
<rod_t> hkaiser: but it's already installed on the images, isn't it?
<rod_t> hkaiser: are you sure you're using python3?
<hkaiser> -- Found PythonInterp: /usr/bin/python3.5 (found version "3.5.3")
bikineev has joined #ste||ar
<hkaiser> rod_t: in the mean time I have added a workaround (https://github.com/STEllAR-GROUP/phylanx/blob/be47c6b318cc6dcb50aede8c1d10474d8dbd9b44/cmake/templates/setup.py.in#L15-L20), you'd need to back that out if you want to try things
<hkaiser> I'd like to stick with setuptools as the distutils stuff does not really work as needed
<heller> hkaiser: the hwloc error on circle-ci is really annoying ... should we turn hwloc off for circle?
<hkaiser> heller: the rp branch relies on hwloc, we will need to make it mandatory
mbremer has joined #ste||ar
<hkaiser> heller: are you in SF already?
<heller> yes
<hkaiser> cool
<heller> sitting at NERSC right now
<hkaiser> say hello to Alice
<heller> we have been mentioned once so far
<heller> not possible, she is in new york
<hkaiser> :D
<hkaiser> ahh, ok
<heller> second one coming up ;)
<hkaiser> say hello to John Shalf and Kathy Yelick, then ;)
<heller> I will
<heller> no idea what they look like :P
<heller> so far, no one talked to me yet ;)
<hkaiser> heh
<heller> wearing my STE||AR shirt though ;)
mbremer has quit [Client Quit]
<heller> representing!
<hkaiser> nice!
<hkaiser> she is mrs. pgas
<hkaiser> and john is generally a nice guy
<hkaiser> both fairly high up in LBNL's food chain
<heller> *nod*
<heller> we'll see. dinner tonight
StefanLSU has quit [Quit: StefanLSU]
<heller> second mention
<hkaiser> by whom?
<heller> thorsten kurth
<heller> first mention was by the director
<hkaiser> cool - so you know whom to talk to wrt a job ;)
<heller> :P
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<github> [hpx] aserio pushed 1 new commit to add_checkpoint: https://git.io/v5jqd
<github> hpx/add_checkpoint 7189f7d aserio: Expose iteraors for access to the data in checkpoint...
<rod_t> hkaiser: I'm not sure what exactly the problem is. I pulled stellargroup/phylanx:dev and checked on my local machine. setuptools is installed for python 3 but not 2. could the `python` command pointing to python2 be the cause of the problem? or is the absolute path to python3 used?
<github> [hpx] sithhell pushed 1 new commit to cuda_clang: https://git.io/v5jmz
<github> hpx/cuda_clang 2da9927 Thomas Heller: Moving source file to correct directory
bikineev has quit [Remote host closed the connection]
aserio has quit [Quit: aserio]
<github> [hpx] sithhell pushed 1 new commit to throttle_cores: https://git.io/v5jYj
<github> hpx/throttle_cores b198581 Thomas Heller: Merge branch 'master' into throttle_cores...
<heller> K-ballo: looking at #2829 right now. It hasn't been approved yet. Regarding the CUDA problems, I don't object
StefanLSU has joined #ste||ar
StefanLSU has quit [Client Quit]
Matombo has quit [Remote host closed the connection]
<github> [hpx] sithhell pushed 1 new commit to master: https://git.io/v5j3r
<github> hpx/master 157d3a0 Thomas Heller: Merge pull request #2900 from STEllAR-GROUP/numa_balanced...
patg[[w]] has quit [Ping timeout: 240 seconds]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
EverYoun_ has quit [Ping timeout: 264 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]