2021-08-06 22:55 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
00:05 <gonidelis[m]> hkaiser:
00:39 <hkaiser> gonidelis[m]: it's not specialized
00:41 <gonidelis[m]> yeah, scratch that... I missed a closing `>` in my reading
00:50 rtohid[m] has joined #ste||ar
00:51 diehlpk has joined #ste||ar
00:56 diehlpk has quit [Ping timeout: 255 seconds]
00:58 diehlpk has joined #ste||ar
01:03 diehlpk has quit [Ping timeout: 255 seconds]
01:31 diehlpk has joined #ste||ar
01:35 diehlpk has quit [Ping timeout: 255 seconds]
02:04 diehlpk has joined #ste||ar
02:09 diehlpk has quit [Ping timeout: 255 seconds]
02:26 diehlpk has joined #ste||ar
02:26 diehlpk has quit [Client Quit]
02:51 Yorlik_ has joined #ste||ar
02:55 Yorlik__ has quit [Ping timeout: 265 seconds]
03:53 hkaiser has quit [Quit: Bye!]
05:52 Yorlik__ has joined #ste||ar
05:56 Yorlik_ has quit [Ping timeout: 264 seconds]
07:55 Yorlik__ is now known as Yorlik
12:10 hkaiser has joined #ste||ar
20:39 diehlpk has joined #ste||ar
21:33 tufei has joined #ste||ar
22:04 diehlpk has quit [Quit: Leaving.]
22:07 diehlpk has joined #ste||ar
22:10 diehlpk has quit [Client Quit]
22:31 weilewei has joined #ste||ar
22:31 <hkaiser> this call, for instance, did find two different tag_invoke implementations: the one from MWGraph and ours
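A minimal, self-contained sketch of the tag_invoke pattern behind the ambiguity described above (the names cpo::my_algorithm and lib_a::widget are hypothetical, not HPX's API): a customization point object makes an unqualified tag_invoke call, and argument-dependent lookup collects every matching overload, which is how two independent implementations can both surface.

```cpp
// Hypothetical names throughout; HPX's real CPO machinery is more involved.
#include <iostream>
#include <utility>

namespace cpo {
    struct my_algorithm_t {
        template <typename T>
        auto operator()(T&& t) const
            -> decltype(tag_invoke(*this, std::forward<T>(t)))
        {
            // Unqualified call: ADL searches the namespaces associated with
            // the tag and with T, so two libraries shipping tag_invoke
            // overloads for related types can both be found.
            return tag_invoke(*this, std::forward<T>(t));
        }
    };
    inline constexpr my_algorithm_t my_algorithm{};
}

namespace lib_a {
    struct widget {};

    // lib_a's customization, found via ADL on widget.
    void tag_invoke(cpo::my_algorithm_t, widget)
    {
        std::cout << "lib_a::tag_invoke\n";
    }
}

int main()
{
    cpo::my_algorithm(lib_a::widget{});    // dispatches to lib_a's overload
}
```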
22:35 <weilewei> nvc++ std::execution::seq: 3.65969 sec. averaged over 10 runs.
22:35 <weilewei> nvc++ std::execution::par: 0.0699834 sec. averaged over 10 runs.
22:35 <weilewei> nvc++ std::execution::par_unseq: 0.0716834 sec. averaged over 10 runs.
22:35 <weilewei> hpx::execution::seq: 18.2115 sec. averaged over 10 runs.
22:35 <weilewei> hpx::execution::par: 0.277915 sec. averaged over 10 runs.
22:35 <weilewei> hpx::execution::par_unseq: 0.253258 sec. averaged over 10 runs.
22:35 <weilewei> kokkos::parallel_for transform: 0.562939 sec. averaged over 10 runs.
22:35 <weilewei> std::execution::seq: 16.2453 sec. averaged over 10 runs.
22:35 <weilewei> std::execution::par: 16.1129 sec. averaged over 10 runs.
22:35 <weilewei> std::execution::par_unseq: 16.3805 sec. averaged over 10 runs.
22:36 <weilewei> __gnu_parallel::transform: 0.460436 sec. averaged over 10 runs.
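For reference, a hedged sketch of how such a transform benchmark is commonly structured; the payload (x -> 2x + 1), vector size, and harness are illustrative assumptions, not the actual benchmark code from this run, though hpx::transform, hpx::chrono::high_resolution_timer, and the execution policies are real HPX facilities.

```cpp
// Sketch of a policy-comparison benchmark; payload and sizes are assumptions.
#include <hpx/algorithm.hpp>
#include <hpx/chrono.hpp>
#include <hpx/init.hpp>
#include <iostream>
#include <vector>

template <typename Policy>
double time_transform(Policy&& policy, std::vector<double> const& in,
    std::vector<double>& out, int iterations = 10)
{
    hpx::chrono::high_resolution_timer t;
    for (int i = 0; i != iterations; ++i)
    {
        hpx::transform(policy, in.begin(), in.end(), out.begin(),
            [](double x) { return 2.0 * x + 1.0; });
    }
    return t.elapsed() / iterations;    // seconds, averaged over iterations
}

int hpx_main()
{
    std::vector<double> in(1 << 26, 1.0), out(in.size());

    std::cout << "hpx::execution::seq: "
              << time_transform(hpx::execution::seq, in, out) << " sec\n";
    std::cout << "hpx::execution::par: "
              << time_transform(hpx::execution::par, in, out) << " sec\n";
    std::cout << "hpx::execution::par_unseq: "
              << time_transform(hpx::execution::par_unseq, in, out) << " sec\n";

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);    // accepts --hpx:threads, --hpx:print-bind
}
```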
22:36 <hkaiser> par is not too good either :/
22:37 <weilewei> really? hpx par runs at 0.277915 sec and std par runs at 16.1129 sec
22:38 <weilewei> it seems hpx is faster, but I don't know why std par is so slow.
22:38 <hkaiser> but the nvc++ version? is it running on gpu?
22:38 <weilewei> yes, nvc++ is running on gpu
22:38 <hkaiser> ahh, so apples and bananas
22:39 <weilewei> yeah, ignore nvc++ for now. Later I will run Kokkos with the nvc++ backend
22:39 <hkaiser> we have not necessarily optimized our seq execution
22:39 <hkaiser> I can have a look, however
22:39 <weilewei> ok, so hpx seq is slower than std seq, this is expected for now?
22:39 <hkaiser> also, you might want to try par_simd ;-) instead of par_unseq (which in our case is the same as par)
22:40 <hkaiser> Srinivas can help with that
22:40 <weilewei> ok, I will add par_simd to the todo list
22:40 <weilewei> if I would like to run hpx par on gpu, how does that work?
22:41 <hkaiser> or just simd for sequential execution with vectorization
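A hedged sketch of the simd and par_simd policies being suggested here (they come out of Srinivas' vectorization work; this assumes an HPX build configured with HPX_WITH_DATAPAR=ON, and exact availability varies by HPX version):

```cpp
// Sketch only: hpx::execution::simd vectorizes a single-threaded loop,
// hpx::execution::par_simd combines multi-threading with vectorization.
// Assumes HPX built with HPX_WITH_DATAPAR=ON; names vary by version.
#include <hpx/algorithm.hpp>
#include <vector>

void scale(std::vector<double>& v)
{
    // Vectorized, single-threaded (the "just simd" suggestion above).
    hpx::transform(hpx::execution::simd, v.begin(), v.end(), v.begin(),
        [](auto x) { return x * 2.0; });

    // Vectorized and parallel.
    hpx::transform(hpx::execution::par_simd, v.begin(), v.end(), v.begin(),
        [](auto x) { return x * 2.0; });
}
```

The generic lambda is deliberate: with datapar enabled, the callable may be handed a SIMD pack rather than a scalar, so the operation has to compile for both.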
22:41 <hkaiser> weilewei: we don't support that
22:41 <weilewei> adding Kokkos?
22:41 <hkaiser> not sure if kokkos-hpx supports that, ms[m]1 might know
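For context, a minimal sketch of why Kokkos is the fallback for GPU execution: the same parallel_for runs on whichever default execution space the build enables (e.g. Kokkos::Cuda on GPU, or the HPX/OpenMP backends on CPU). The kernel and size are illustrative assumptions.

```cpp
// Sketch: one kernel, backend chosen at configure time, not in the code.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[])
{
    Kokkos::initialize(argc, argv);
    {
        Kokkos::View<double*> v("v", 1 << 26);
        Kokkos::parallel_for("transform", v.extent(0),
            KOKKOS_LAMBDA(int i) { v(i) = 2.0 * v(i) + 1.0; });
        Kokkos::fence();    // wait for the (possibly asynchronous) kernel
    }
    Kokkos::finalize();
}
```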
22:43 <hkaiser> weilewei: btw, how many cores is that benchmark running on?
22:43 <weilewei> hmm, I have not specified it yet
22:43 <hkaiser> how many cores does that node have?
22:44 <weilewei> 64 cpu cores, 256 processors
22:44 <weilewei> on Perlmutter
22:45 <hkaiser> so it's most likely running on 64 threads (try --hpx:print-bind)
22:46 <weilewei> 127: PU L#254(P#127), Core L#127(P#63), Socket L#1(P#1), on pool "default"
22:49 <hkaiser> so 127 cores
22:50 <weilewei> why does it print locality 0 twice?
22:50 <hkaiser> that's 64 cores with 2 HT each
22:50 <hkaiser> I'd suggest specifying --hpx:threads=64
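A small sketch for cross-checking a run launched with --hpx:threads=64 --hpx:print-bind: hpx::get_num_worker_threads() is a real HPX query, while the surrounding program is illustrative.

```cpp
// Sketch: verify the worker-thread count the runtime actually picked up.
#include <hpx/hpx.hpp>
#include <hpx/hpx_init.hpp>
#include <iostream>

int hpx_main()
{
    // Number of OS worker threads this HPX runtime is using.
    std::cout << "worker threads: " << hpx::get_num_worker_threads() << "\n";
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // HPX consumes its own --hpx:* options from the command line.
    return hpx::init(argc, argv);
}
```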
22:50 <hkaiser> hmmm, not sure - how do you launch the test?
22:51 <hkaiser> what's your slurm command?
22:51 <hkaiser> I'll have a look at why it's printing things twice, probably a bug
22:51 <weilewei> #SBATCH -C gpu
22:51 <weilewei> #SBATCH -t 20:00
22:51 <weilewei> #SBATCH -N 1
22:51 <weilewei> #SBATCH --ntasks-per-node=1
22:51 <weilewei> #SBATCH -o parSTL.out
22:51 <weilewei> #SBATCH -e parSTL.err
22:51 <weilewei> cd /global/homes/w/wwei/src/parSTL
22:51 <weilewei> ./scripts.sh
22:52 <hkaiser> yah, most likely a bug - I'll investigate
22:52 <hkaiser> don't run using HTs
22:52 <weilewei> ok, I will use --hpx:threads=64 to run the experiments again
22:52 <weilewei> let's see how it changes
22:56 <weilewei> With --hpx:threads=64, it seems a bit slower than without. hpx::execution::seq: 17.0798 sec. averaged over 10 runs.
22:56 <weilewei> hpx::execution::par: 0.334277 sec. averaged over 10 runs.
22:56 <weilewei> hpx::execution::par_unseq: 0.341671 sec. averaged over 10 runs.
22:57 <hkaiser> a bit worse, ok
22:57 <weilewei> let me wait for the std results in the same job run
23:03 <weilewei> std::execution::seq: 16.138 sec. averaged over 10 runs.
23:03 <weilewei> std::execution::par: 16.0406 sec. averaged over 10 runs.
23:03 <weilewei> std::execution::par_unseq: 16.2172 sec. averaged over 10 runs.
23:04 <weilewei> well, hpx seq is close to std seq
23:19 weilewei has quit [Ping timeout: 260 seconds]
23:21 weilewei has joined #ste||ar
23:26 <weilewei> hkaiser: if I want to learn more about the sender/receiver work in HPX, who should be the point of contact?
23:29 <hkaiser> shreyas, myself, and certainly ms[m]1
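For orientation, a sketch in the style of the P2300 sender/receiver support HPX ships under hpx::execution::experimental; this is hedged, since exact headers and algorithm names have shifted across HPX versions (then, for instance, was earlier spelled transform).

```cpp
// Sketch of P2300-style sender/receiver composition as exposed in
// hpx::execution::experimental; names follow the proposal and may differ
// by HPX version.
#include <hpx/execution.hpp>
#include <hpx/init.hpp>
#include <iostream>
#include <utility>

int hpx_main()
{
    namespace ex = hpx::execution::experimental;

    // A sender describes work without running it; piping composes stages.
    auto work = ex::just(21)
        | ex::then([](int i) { return 2 * i; })
        | ex::then([](int i) { std::cout << "result: " << i << "\n"; });

    // sync_wait drives the sender to completion on the calling thread
    // (its namespace has also moved between HPX releases).
    ex::sync_wait(std::move(work));

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}
```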
23:30 <weilewei> Got it, I will learn a bit more for my next project :)
23:30 <weilewei> I will reach out later about this
23:31 <hkaiser> weilewei: as I said, we've never tried to make seq optimal
23:31 <weilewei> Got it, I will note that
23:42 weilewei has quit [Quit: Ping timeout (120 seconds)]