K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
parsa| has joined #ste||ar
srinivasyadav224 has joined #ste||ar
wash[m]_ has joined #ste||ar
zao_ has joined #ste||ar
mdiers[m]1 has joined #ste||ar
rainmaker6[m]1 has joined #ste||ar
bering[m]1 has joined #ste||ar
wash[m] has quit [*.net *.split]
zao has quit [*.net *.split]
mdiers[m] has quit [*.net *.split]
rainmaker6[m] has quit [*.net *.split]
rgoswami has quit [*.net *.split]
jedi18[m] has quit [*.net *.split]
Deepak1411[m] has quit [*.net *.split]
parsa has quit [*.net *.split]
srinivasyadav227 has quit [*.net *.split]
bering[m] has quit [*.net *.split]
ms[m] has quit [*.net *.split]
parsa| is now known as parsa
wash[m]_ is now known as wash[m]
zao_ is now known as zao
rgoswami has joined #ste||ar
jedi18[m] has joined #ste||ar
ms[m] has joined #ste||ar
Deepak1411[m] has joined #ste||ar
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
chuanqiu has joined #ste||ar
chuanqiu has quit [Client Quit]
<srinivasyadav224> gnikunj: yt?
<gnikunj[m]> srinivasyadav227: here
<srinivasyadav224> gnikunj: I am back to normal :), I started working on the roofline model. Can you please check this link https://docs.google.com/document/d/1ULUW4ZibZK9hDBd8TuCaZgT8k8eh_JLsSE0MRpAQMhs/edit?usp=sharing ?
<gnikunj[m]> the peak P calculated here is the single-core peak performance
<srinivasyadav224> yes
<gnikunj[m]> looks good as a starting point
<gnikunj[m]> try plotting it into a graph now
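For the plot itself, the roofline is just attainable GFLOP/s = min(peak, bandwidth x arithmetic intensity). Below is a minimal sketch that prints CSV points for any plotting tool; the peak and bandwidth values are placeholders to be replaced with the single-core figures measured on the actual machine.

    // Minimal roofline sketch: attainable GFLOP/s = min(P_peak, BW * AI).
    // peak_gflops and bw_gbytes are placeholders -- substitute the measured
    // single-core compute peak and single-core STREAM bandwidth.
    #include <algorithm>
    #include <cstdio>

    int main()
    {
        double const peak_gflops = 35.0;   // placeholder compute peak (GFLOP/s)
        double const bw_gbytes   = 10.0;   // placeholder memory bandwidth (GB/s)

        std::puts("arithmetic_intensity(FLOP/byte),attainable_GFLOPs");
        for (double ai = 0.0625; ai <= 16.0; ai *= 2.0)
        {
            std::printf("%g,%g\n", ai, std::min(peak_gflops, bw_gbytes * ai));
        }
    }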
<srinivasyadav224> yeah okay :)
<srinivasyadav224> in the table it showed it got to 20 GFLOPS. I think that's when the data transfer cost is minimum, i.e. from the L1 cache, right?
<gnikunj[m]> which table?
<srinivasyadav224> the output figure, last column, i.e. avg SIMD FLOPS, on the last page
<gnikunj[m]> yes, that seems like it's running out of the L1 cache
<gnikunj[m]> that's what we concluded in the last meet as well, I believe. Your machine had 640KB or so, which meant the array would easily fit into the L1 cache.
<srinivasyadav224> yeah, so if we increase the number of elements beyond what would fit in the cache, that would give us the DRAM roofline or peak, right?
<gnikunj[m]> if you increase the number such that it doesn't fit into the caches, yes
<gnikunj[m]> I'm looking into your numbers and things don't make sense to me
<gnikunj[m]> it seems that the memory bandwidth calculated is for the whole system and not a single thread
<srinivasyadav224> yeah, the STREAM test is using all the threads (40)?
<gnikunj[m]> yeah, we don't want that
<gnikunj[m]> could you use hwloc-bind to restrict it to a single core?
<gnikunj[m]> do: hwloc-bind core:0.PU:0 ./stream
<srinivasyadav224> actually I lost all the links that you posted in the Google Meet once we exited the meeting :(, sorry
<gnikunj[m]> this would ensure that only the single-core memory bandwidth is calculated
<gnikunj[m]> that's fine, you've been holding up just fine ;)
<gnikunj[m]> essentially we want to calculate the single-core memory bandwidth. This is because a single core can't saturate the memory bandwidth completely. Hence your single-core figures will never reach that 35 GFLOPS peak you showed.
<srinivasyadav224> <gnikunj[m] "do: hwloc-bind core:0.PU:0 ./str"> thanks :), this gave me 10 GB/s
<gnikunj[m]> if you calculate the memory bandwidth for a single core (using the above command), it should come out pretty low, at 10 GB/s or so. This would change the peak and the numbers will look more aligned to it.
<srinivasyadav224> what?? how did you know that?
<srinivasyadav224> I mean, how did you tell it's 10 GB/s?
<srinivasyadav224> just curious
<srinivasyadav224> it's almost exact 😲
<gnikunj[m]> and we can explain the 20 GFLOPS in two ways: 1) vector intrinsics saturate the memory bandwidth more -> more performance for a single core (you won't see a similar difference for 40 cores though), and 2) the vector fits into the L1 cache, so caching increases the performance
<gnikunj[m]> <srinivasyadav224 "I mean, how did you tell it's 10"> I have worked with stream results enough to predict the performance :P
<srinivasyadav224> <gnikunj[m] "I have worked with stream result"> that left my brain blank :)
<srinivasyadav224> wait, if the bandwidth is 10 GB/s then P will be 10 GFLOPS, right?
<gnikunj[m]> yes
<gnikunj[m]> point 1 explains why you see results beyond that point for simd intrinsics
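As a cross-check on the STREAM number, a rough single-threaded triad timing along these lines should land in the same ballpark when run under hwloc-bind as above. This is only a sketch, not a replacement for STREAM; the array size is illustrative.

    // Rough single-core bandwidth sanity check: time a triad-style pass over
    // arrays far larger than the caches and report GB/s moved.
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main()
    {
        std::size_t const n = std::size_t(1) << 24;   // ~128 MB per array of doubles
        std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.5);
        double const scalar = 3.0;

        // warm-up pass so first-touch page faults don't skew the timing
        for (std::size_t i = 0; i != n; ++i)
            a[i] = b[i] + scalar * c[i];

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i != n; ++i)
            a[i] = b[i] + scalar * c[i];
        auto t1 = std::chrono::steady_clock::now();

        double const seconds = std::chrono::duration<double>(t1 - t0).count();
        double const gbytes  = 3.0 * n * sizeof(double) / 1e9;   // 2 loads + 1 store
        std::printf("triad bandwidth: %.2f GB/s (a[0] = %f)\n", gbytes / seconds, a[0]);
    }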
<srinivasyadav224> with 33554432 elements, i.e. 128MB of data, I get 8.3 GFLOPS. My L3 cache is 5MB, so it wouldn't fit in any of the caches, right?
<gnikunj[m]> sounds right
<srinivasyadav224> okay, so the peak is 10 GFLOPS and this app is doing 8.3 GFLOPS? Is that good?
<gnikunj[m]> yes, this is decent first-pass performance
<gnikunj[m]> we can try to improve it further, and we also know the maximum we can legitimately get
<gnikunj[m]> that's why we asked you to plot the roofline model on this
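To put numbers on that conclusion: assuming 4-byte elements and an arithmetic intensity of roughly 1 FLOP/byte (which is what makes the 10 GB/s bandwidth correspond to a ~10 GFLOP/s ceiling), the working set is far outside the L3 and the measured figure sits at about 83% of the memory-bound roof.

    // Back-of-the-envelope check of the numbers quoted above.
    #include <cstdio>

    int main()
    {
        double const elements = 33554432.0;                  // element count from the chat
        double const mbytes   = elements * 4.0 / (1 << 20);  // assuming 4-byte floats
        double const ceiling  = 10.0;                        // GFLOP/s memory-bound roof
        double const measured = 8.3;                         // GFLOP/s measured

        std::printf("working set: %.0f MB (L3: 5 MB) -> DRAM-bound\n", mbytes);
        std::printf("fraction of memory-bound roof: %.0f%%\n", 100.0 * measured / ceiling);
    }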
<srinivasyadav224> <gnikunj[m] "that's why we asked you to plot "> yeah, I get it now :), I just started this afternoon, was down for about 2 days
<gnikunj[m]> it was bad for me too. I'm still sort of recovering but I'm able to work now I guess.
<gnikunj[m]> I'm glad you're back to health :)
<srinivasyadav224> I am back to 100% now, I will try to finish things up in the next few days
<hkaiser> srinivasyadav224: sounds great!
<srinivasyadav224> :)
<hkaiser> jedi18[m]: just a heads-up: I fixed the tests on the minmax PR in your repository
<jedi18[m]> hkaiser: Thanks a lot! What was the issue?
<hkaiser> for one, you forgot to change the segmented algorithms to tag_dispatch
<jedi18[m]> Oh right I did, sorry about that
<hkaiser> np
<jedi18[m]> Btw regarding my comment here https://github.com/STEllAR-GROUP/hpx/pull/5371
<hkaiser> the tests themselves run ok for me, no idea why they failed on the CI, let's see
<jedi18[m]> What can I do about that unused variable?
<jedi18[m]> Oh ok sure
<hkaiser> jedi18[m]: I added a suggestion on how to change to the PR
<jedi18[m]> Ah, that was a simple fix :D, thanks!
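The actual suggestion on the PR is not quoted in the log; for reference, the standard ways to quiet an unused-variable (or unused-parameter) warning in plain C++ look like this:

    // Common ways to silence an unused-variable warning in standard C++.
    void example([[maybe_unused]] int parameter)   // C++17 attribute
    {
        int unused_local = 42;
        (void) unused_local;                       // explicit discard, pre-C++17 style
    }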
<jedi18[m]> Btw, for overloads that return void, the parallel ones would return util::detail::algorithm_result<ExPolicy>, right?
<jedi18[m]> Is there something special about the algorithm_result<ExPolicy> returned from the base implementation, or is it the same as util::detail::algorithm_result<ExPolicy>::get(hpx::util::unused_type)?
<hkaiser> jedi18[m]: yes
<jedi18[m]> Oh ok thanks
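For context on the algorithm_result question: the point of the wrapper is that the same void-returning implementation can yield either void (synchronous policies) or future<void> (task policies). The following is a simplified, self-contained model of that pattern, not HPX's actual implementation, just an illustration of the shape the chat refers to.

    // Toy model of the algorithm_result pattern -- NOT HPX's real code.
    #include <future>
    #include <iostream>

    struct sequenced_policy {};   // stand-ins for HPX execution policies
    struct task_policy {};

    template <typename ExPolicy>
    struct algorithm_result                // synchronous policies: plain void
    {
        using type = void;
        static void get() {}
    };

    template <>
    struct algorithm_result<task_policy>   // task policies: future<void>
    {
        using type = std::future<void>;
        static std::future<void> get()
        {
            std::promise<void> p;
            p.set_value();
            return p.get_future();
        }
    };

    template <typename ExPolicy>
    typename algorithm_result<ExPolicy>::type run_void_algorithm(ExPolicy)
    {
        // ... the actual algorithm work would happen here ...
        return algorithm_result<ExPolicy>::get();
    }

    int main()
    {
        run_void_algorithm(sequenced_policy{});        // returns void
        auto f = run_void_algorithm(task_policy{});    // returns std::future<void>
        f.get();
        std::cout << "both overloads completed\n";
    }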
zao has quit [K-Lined]
wash[m] has quit [K-Lined]
zao has joined #ste||ar
<zao> Well, I'm apparently going to participate way less in this incarnation of this channel now.
<zao> Kind of hard when the popular and usable IRC service that I and others are using is not allowed on the network.
<hkaiser> zao: that's unfortunate
<hkaiser> if freenode goes down we'll need to look for alternatives anyways
<hkaiser> I have secured the #ste||ar* channels on Libera.Chat, btw
<zao> Excellent.
<zao> Gonna keep this caveman irssi running if people need to reach me :D
<hkaiser> great, thanks
<hkaiser> I'd suspect that people will start discussing matrix or other platforms again
<zao> I sent a longer email with my thoughts to ms[m] earlier tonight.
<hkaiser> could you cc me on that one as well, pls?
<zao> Forwarded.
<hkaiser> thanks
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
hkaiser has joined #ste||ar
zao has quit [*.net *.split]
zao has joined #ste||ar
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar