<weilewei>
with the NVLink method, more time is spent in the G4 update kernel, and also more time on memory copies, because we have more peer-to-peer copies
<weilewei>
time spent on memory copies: 50.3% (with NVLink) vs. 2.5% (without NVLink)
<hkaiser>
weilewei: why more memory copy operations? did you say things go directly to the device?
<weilewei>
hkaiser I think when you perform MPI_Isend(), it does a memory copy that copies the remote G2 into the local G2. Also, the ring algorithm requires copying the local G2 into a local send buffer, so more copies happen
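For reference, a minimal sketch of such a ring exchange (hypothetical names and sizes, not the actual DCA++ code): each rank copies its local G2 into a send buffer and then circulates it with MPI_Isend/MPI_Irecv, which is where the extra copies come from.

    // ring_g2_exchange.cpp -- hedged sketch, not the DCA++ implementation
    #include <mpi.h>
    #include <vector>
    #include <cstring>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1 << 20;                // illustrative G2 element count
        std::vector<double> g2(n, rank);      // stand-in for the local G2
        std::vector<double> send_buf(n);
        std::vector<double> recv_buf(n);

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        // Extra copy: local G2 -> local send buffer (required by the ring scheme).
        std::memcpy(send_buf.data(), g2.data(), n * sizeof(double));

        for (int step = 0; step < size - 1; ++step) {
            MPI_Request reqs[2];
            MPI_Irecv(recv_buf.data(), n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(send_buf.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

            // ... roughly where contributions to G4 would be accumulated from recv_buf ...

            // The received G2 becomes the payload forwarded in the next step.
            send_buf.swap(recv_buf);
        }

        MPI_Finalize();
        return 0;
    }

With a CUDA-aware MPI and NVLink, these buffers could be device pointers, so each send/receive would turn into a GPU peer-to-peer copy, which would match the much higher memory-copy share reported above.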
<hkaiser>
weilewei: but what's the point of the nvlink, then?
<weilewei>
hkaiser currently, I am testing the scenario where the total G4 can still fit into one GPU. But when G2 becomes larger (from 30MB now to 1GB), the total G4 (= G2*G2) can no longer fit into one node, so the original DCA (without NVLink) can no longer perform the computation
<weilewei>
At that point, the memory-bound issue can only be solved with NVLink, and accumulating G4 also becomes impossible (according to the DCA team, it is not necessary to accumulate G4 at that point)
<weilewei>
So with NVLink, one suffers the penalty of a lot of communication time (MPI_Isend, etc.) but gains more usable memory.
<weilewei>
hkaiser does that make sense? For a small G4, using NVLink is no benefit at all; but when the size increases (and more science can be done), NVLink comes into play
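As a rough illustration of that decision point, a hedged sketch (the G4 size and the cudaMemGetInfo check are assumptions for illustration, not DCA++'s actual logic): if the full G4 fits into a single GPU's free memory, accumulate it locally; otherwise distribute it across GPUs over NVLink.

    // g4_fit_check.cpp -- hedged sketch; link against the CUDA runtime; numbers are illustrative
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);

        // Hypothetical G4 footprint; the real size depends on the DCA++ input.
        size_t g4_bytes = 12ull * 1024 * 1024 * 1024;

        if (g4_bytes <= free_bytes)
            std::printf("G4 fits on one GPU: accumulate locally (original path)\n");
        else
            std::printf("G4 does not fit: distribute G4 across GPUs (NVLink path)\n");
        return 0;
    }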
<hkaiser>
weilewei: ok
<weilewei>
hkaiser thanks, we can talk more tomorrow when we have our meeting
<weilewei>
that's a lot of time in the communication phase
<hkaiser>
nod
<weilewei>
on average, each MPI_Wait takes 0.006908625 seconds, but the max can go as high as 0.252 s; the standard deviation is not clear, but this implicitly points to potential load imbalance. Sometimes a rank waits long enough to receive G2
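One way per-wait numbers like these could be gathered is to wrap each wait in MPI_Wtime and track the mean and maximum per rank; below is a hedged sketch with made-up message sizes and iteration counts, not the actual instrumentation.

    // wait_timing.cpp -- hedged sketch of per-wait timing, not the real profiler output
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int iters = 100;
        const int n = 1 << 20;                       // illustrative message size
        std::vector<double> send_buf(n, rank), recv_buf(n);

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        double sum = 0.0, max_wait = 0.0;

        for (int i = 0; i < iters; ++i) {
            MPI_Request reqs[2];
            MPI_Irecv(recv_buf.data(), n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(send_buf.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

            double t0 = MPI_Wtime();
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  // time spent blocked here
            double dt = MPI_Wtime() - t0;

            sum += dt;
            if (dt > max_wait) max_wait = dt;
        }

        std::printf("rank %d: mean wait %.9f s, max wait %.9f s\n",
                    rank, sum / iters, max_wait);

        MPI_Finalize();
        return 0;
    }

A large gap between the mean and the maximum across ranks is what would hint at the load imbalance mentioned above.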