aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett has joined #ste||ar
mcopik_ has quit [Ping timeout: 240 seconds]
EverYoung has quit [Ping timeout: 255 seconds]
K-ballo has quit [Quit: K-ballo]
eschnett has quit [Quit: eschnett]
parsa has joined #ste||ar
StefanLSU has joined #ste||ar
hkaiser has quit [Quit: bye]
StefanLSU has quit [Quit: StefanLSU]
vamatya has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
StefanLSU has joined #ste||ar
vamatya has quit [Read error: Connection reset by peer]
vamatya has joined #ste||ar
StefanLSU has quit [Quit: StefanLSU]
vamatya has quit [Ping timeout: 260 seconds]
thundergroudon[m has quit [Ping timeout: 264 seconds]
taeguk[m] has quit [Ping timeout: 252 seconds]
thundergroudon[m has joined #ste||ar
taeguk[m] has joined #ste||ar
bikineev has joined #ste||ar
david_pfander has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
<diehlpk_work>
Ok, I will wait for him. Just started to clean up the hpxcl repo
<mbremer>
@hkaiser: yt?
eschnett has joined #ste||ar
EverYoung has joined #ste||ar
<hkaiser>
mbremer: here
<mbremer>
Woohoo!
<hkaiser>
wazzup?
<mbremer>
So I'm having some trouble with the scaling now at 8 nodes. Idle-rates seem to be going up. I'm resubmitting jobs (but they're stuck in the queues) to see if overhead changes.
<hkaiser>
nod
<hkaiser>
network overhead impacts things significantly
<mbremer>
Presumably, it's some MPI artifact. I was looking at the cmake toolchain file I slapped together from the CrayCMake file
<hkaiser>
the mpi stuff does not work too well on knls
<mbremer>
artifact being the wrong word...
<hkaiser>
mbremer: cray network?
<mbremer>
Intel omnipath
<mbremer>
Right now I have this line commented out `set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED OFF CACHE BOOL "")`
<hkaiser>
mbremer: jbjnr might be able to help getting the IB parcelport running for you
<mbremer>
should I keep it off?
<mbremer>
ahh
<mbremer>
yes
<mbremer>
that was another question
<mbremer>
who do I bother about that?
<hkaiser>
mbremer: talk to jbjnr
<hkaiser>
John will know
<mbremer>
Will do
aserio has quit [Ping timeout: 240 seconds]
<mbremer>
Also #set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING # "The threshhold in bytes to when perform zero copy optimizations (default: 128)")
<mbremer>
I have that commented out.
<mbremer>
Should I uncomment both of those settings?
<hkaiser>
the first one depends on the MPI library - heller might know more
<hkaiser>
the second one can be used to tune serialization, the best value there depends on many factors
<mbremer>
The mpi implementation is impi/17.0.3
<hkaiser>
again, jbjnr might be able to give more details
<hkaiser>
if having the MULTI_THREADED=ON works, fine, if not, set it to OFF
<mbremer>
Ah ok. I wasn't sure if you were setting it to off for performance reasons.
<hkaiser>
no
<hkaiser>
not that I'm aware of, but this is heller's baby, he knows best
<mbremer>
Well I'll bother jbjnr and heller when they're around
<hkaiser>
heller: is travelling, I think, jbjnr should be around
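[editor's note] For reference, the two toolchain-file entries discussed above could look like this once uncommented. This is only a sketch assembled from the lines mbremer quoted: the option names and the value 4096 come from the chat, while the ON value and the reworded description string are assumptions for illustration. hkaiser's advice was to try the multithreaded setting ON and fall back to OFF if it does not work with the MPI library in use.

```cmake
# Sketch only: both entries are taken from the chat above, not from a tested configuration.

# Depends on the MPI library. Assumption: start with ON, switch to OFF if it misbehaves.
set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED ON CACHE BOOL "")

# Serialization tuning knob; the best value depends on many factors.
# 4096 is the value quoted in the chat, 128 is the default it mentions.
set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING
    "Threshold in bytes above which zero-copy optimizations are used (default: 128)")
```

Whether either setting helps with the idle rates seen at 8 nodes is not established in the chat; both were left as questions for jbjnr and heller.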
hkaiser has quit [Quit: bye]
bikineev has quit [Remote host closed the connection]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
aserio1 is now known as aserio
hkaiser has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<diehlpk_work>
hkaiser, Can I nest hpx::parallel::for_each ?
bikineev has joined #ste||ar
<hkaiser>
diehlpk_work: yes
<hkaiser>
might not be a good idea, however
bikineev has quit [Remote host closed the connection]
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Client Quit]
rod_t has left #ste||ar [#ste||ar]
parsa has joined #ste||ar
<K-ballo>
hkaiser: is #2829 (pack traversal) waiting for anything in particular?
<github>
[hpx] K-ballo force-pushed pack-short-circuit from 6a48e50 to d81f532: https://git.io/v762b
<hkaiser>
K-ballo: heller was working on CUDA and was complaining about the unwrap stuff breaking things
<hkaiser>
I was afraid of creating even more havoc by merging the traversal stuff before he had fixed the unwrap problems
<K-ballo>
the unwrap stuff is breaking other things as well
<hkaiser>
nod
<K-ballo>
I fear if I mess with it, that other PR won't even recognize the code anymore
parsa has quit [Quit: Zzzzzzzzzzzz]
bikineev has joined #ste||ar
<hkaiser>
should we merge it?
<hkaiser>
the pack traversal that is
<K-ballo>
not if it breaks more stuff..
<hkaiser>
I'm not sure about that
<K-ballo>
heller: have you kept any logs? I'd like to see if those errors you saw are the same kind we are encountering
<hkaiser>
K-ballo: I think he has a branch where he put his changes
<K-ballo>
the overload sets for both dataflow and unwrapped/ing are a complete mess, I won't be able to figure out things without putting some order to it first, but maybe denis can
<hkaiser>
ok
<hkaiser>
then I'd say let's rather wait
<github>
[hpx] K-ballo pushed 1 new commit to pack-short-circuit: https://git.io/v5xAj