hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
nan77 has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
akheir has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
bita has quit [Quit: Leaving]
nikunj97 has joined #ste||ar
<nikunj97> heller1, I'm finally done with the report and finalized everything. We can now begin with our work :)
hkaiser has joined #ste||ar
nikunj97 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
rtohid has joined #ste||ar
akheir has joined #ste||ar
nan77 has joined #ste||ar
weilewei has joined #ste||ar
bita has joined #ste||ar
karame_ has joined #ste||ar
<nikunj97> hkaiser, did we have the meeting today?
<nikunj97> really sorry, I forgot to join
<hkaiser> nikunj97: np
<hkaiser> was a short meeting confirming what we discussed last time
<nikunj97> aah great!
<weilewei> got some fancy figures from the nvsystem profiling tool on DCA
<hkaiser> heh, no parallelism :-o
<weilewei> hkaiser how do you tell?
<hkaiser> from what's on the image, it shows only 3 or 4 cores
<hkaiser> so there might be parallelism after all
<weilewei> I think I use 6 cores
<hkaiser> can you see all of those used in the perf-plots?
<weilewei> this one is an overview of it
<weilewei> cpu 31 is busy all the time
<weilewei> other cpus are not busy
<weilewei> I think I only set 1 thread for the task, but asked for 7 cores in my run
<hkaiser> so no parallelism
<weilewei> Yes, it should be
<weilewei> because my nvlink code only works for 1 thread now
<hkaiser> nod
<weilewei> but I still want to see how it performs. Let me run larger tasks, then compare with and without nvlink
<hkaiser> bita: yt?
<hkaiser> bita: could we move our meeting back 15 mins, please?
<hkaiser> 1.15pm?
Hashmi has joined #ste||ar
<bita> hkaiser, sure
<hkaiser> bita: I'm ready whenever you are
<bita> I am gonna join now
diehlpk_work has quit [Ping timeout: 246 seconds]
<diehlpk_mobile[m> hkaiser: power outage in my neighborhood
<diehlpk_mobile[m> I will be available on my phone until the last few percent of my battery runs out
<hkaiser> diehlpk_mobile[m: ok
shahrzad has joined #ste||ar
<weilewei> a lot of hail dropping...
<bita> thank you
shahrzad has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
diehlpk_work has joined #ste||ar
shahrzad has quit [Ping timeout: 272 seconds]
<diehlpk_work> We have power again :)
Hashmi has quit [Quit: Connection closed for inactivity]
akheir1 has joined #ste||ar
akheir has quit [Ping timeout: 260 seconds]
nikunj97 has quit [Read error: Connection reset by peer]
rtohid has left #ste||ar [#ste||ar]
<weilewei> with the nvlink method, more time is spent on updating the G4 kernel function, and also more time on memory copies, because we have more peer-to-peer copies
<weilewei> time spent on memory copies: 50.3% (with nvlink) vs. 2.5% (without nvlink)
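For context, the peer-to-peer copies mentioned here are direct GPU-to-GPU transfers over NVLink. A minimal CUDA runtime sketch of such a copy, assuming two peer-capable devices; the buffer names and the G2-sized 30 MB figure are illustrative, not DCA's actual code:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Illustrative: copy a ~30 MB G2-sized buffer from GPU 1 to GPU 0.
        const size_t g2_bytes = 30 * 1024 * 1024;

        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, 0, 1);
        if (!can_access) { printf("no peer access between GPUs 0 and 1\n"); return 1; }

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 reach GPU 1's memory
        void* dst = nullptr; cudaMalloc(&dst, g2_bytes);

        cudaSetDevice(1);
        void* src = nullptr; cudaMalloc(&src, g2_bytes);

        // Device-to-device transfer; with peer access enabled this goes
        // directly over NVLink instead of staging through host memory.
        cudaMemcpyPeer(dst, 0, src, 1, g2_bytes);
        cudaDeviceSynchronize();

        cudaFree(src);
        cudaSetDevice(0); cudaFree(dst);
        return 0;
    }

Each such transfer is accounted as memory time in the profile, which is consistent with the memory share jumping from 2.5% to 50.3%.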
<hkaiser> weilewei: why more memory copy operations? did you say things go directly to the device?
<weilewei> hkaiser I think when you perform MPI_Isend(), it does a memory copy that copies the remote G2 to the local G2. Also, the ring algorithm requires copying the local G2 into a local send buffer, so more copies happen
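A minimal sketch of one step of such a ring exchange, with illustrative names and float buffers (not DCA's actual code): each rank stages its local G2 into a send buffer, then posts the nonblocking send/receive pair described above:

    #include <mpi.h>
    #include <vector>

    // One ring step: send local G2 to the right neighbor, receive the
    // remote G2 from the left neighbor. Illustrative only.
    void ring_step(std::vector<float>& local_g2, std::vector<float>& recv_g2,
                   std::vector<float>& send_buf, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        const int right = (rank + 1) % size;
        const int left  = (rank - 1 + size) % size;

        // The extra copy mentioned above: local G2 -> local send buffer,
        // so local_g2 stays usable while the transfer is in flight.
        send_buf = local_g2;

        MPI_Request reqs[2];
        MPI_Irecv(recv_g2.data(), static_cast<int>(recv_g2.size()), MPI_FLOAT,
                  left, 0, comm, &reqs[0]);
        MPI_Isend(send_buf.data(), static_cast<int>(send_buf.size()), MPI_FLOAT,
                  right, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

With a CUDA-aware MPI these buffers could live in device memory and the transfer could ride NVLink directly; either way, both the staging copy and the receive copy add to the memory-copy time.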
akheir1 has quit [Read error: Connection reset by peer]
akheir1 has joined #ste||ar
<hkaiser> weilewei: but what's the point of the nvlink, then?
<weilewei> hkaiser currently, I am testing the scenario where the total G4 can still fit into one GPU. But when G2 becomes larger (from 30MB now to 1GB), the total G4 (= G2*G2) can no longer fit into one node, and the original DCA (without NVLINK) can no longer perform the computation
<weilewei> At that point, one can only solve the memory-bound issue with NVLINK, and accumulating G4 also becomes impossible (according to the DCA team, it is not necessary to accumulate G4 at that point)
<weilewei> So with NVLINK, one suffers the penalty of much more communication time (MPI_Isend, etc.) but gains more usable memory.
<weilewei> hkaiser does that make sense? For small G4 sizes, using nvlink has no benefit at all; but when the size increases (and more science can be done), nvlink comes into play
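Taking the G4 = G2*G2 size relation above at face value, a back-of-the-envelope check of why the larger case no longer fits (the per-GPU capacity is an assumed figure, not from the discussion):

    #include <cstdio>

    int main() {
        // If G4's footprint scales as the square of G2's, growing G2 from
        // 30 MB to 1 GB grows G4 by (1024/30)^2, i.e. roughly 1165x,
        // far beyond what a single GPU (assumed 16 GB here) could hold.
        const double g2_small_mb = 30.0, g2_large_mb = 1024.0;
        const double ratio = g2_large_mb / g2_small_mb;
        printf("G4 growth factor: ~%.0fx\n", ratio * ratio);
        return 0;
    }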
<hkaiser> weilewei: ok
<weilewei> hkaiser thanks, we can talk more tomorrow when we have our meeting
<hkaiser> ok
nikunj97 has joined #ste||ar
<weilewei> https://docs.google.com/document/d/1xMTDjybwAN1x8XX5aFeo3XnF1k0-P0PrKPArfrnv2KY/edit?usp=sharing about 200+ seconds in total spent on MPI_Isend/recv, wait, and barrier out of a 960-second run
<weilewei> hkaiser
<weilewei> that's a lot of time in communication phase
<hkaiser> nod
<weilewei> on average, each MPI_Wait takes 0.006908625 seconds, but the max can go as high as 0.252 s. The standard deviation isn't clear, but this implicitly suggests potential load imbalance; sometimes a rank waits quite long to receive G2
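A hedged sketch of how per-wait statistics like these can be collected by hand with MPI_Wtime (an illustrative helper, not how the numbers above were actually gathered):

    #include <mpi.h>
    #include <algorithm>
    #include <cstdio>

    // Wrap MPI_Wait to track the average and maximum wait time,
    // the two statistics quoted above.
    struct WaitStats {
        double total = 0.0;
        double max   = 0.0;
        long   count = 0;

        void timed_wait(MPI_Request* req) {
            const double t0 = MPI_Wtime();
            MPI_Wait(req, MPI_STATUS_IGNORE);
            const double dt = MPI_Wtime() - t0;
            total += dt;
            max = std::max(max, dt);
            ++count;
        }

        void report() const {
            if (count > 0)
                printf("MPI_Wait: avg %.9f s, max %.3f s over %ld calls\n",
                       total / count, max, count);
        }
    };

Also tracking the spread per rank (e.g. the standard deviation of dt) would make the suspected load imbalance measurable rather than implicit.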
nikunj97 has quit [Read error: Connection reset by peer]
shahrzad has joined #ste||ar