K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has joined #ste||ar
nanmiao has joined #ste||ar
nanmiao has quit [Quit: Connection closed]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 268 seconds]
jehelset has joined #ste||ar
bita has joined #ste||ar
sivoais has quit [Ping timeout: 252 seconds]
jehelset has quit [Remote host closed the connection]
sivoais has joined #ste||ar
jehelset has joined #ste||ar
bita has quit [Ping timeout: 276 seconds]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> gonidelis[m]: yt?
<gonidelis[m]> hkaiser: yes
<hkaiser> see pm pls
<gonidelis[m]> hkaiser: i have no pm
bita has joined #ste||ar
<hkaiser> hey ms[m], you around?
<ms[m]> hkaiser: yep, here
<ms[m]> what's up?
<hkaiser> see pm, pls
nanmiao has joined #ste||ar
<hkaiser> gonidelis[m]: see pm, pls
diehlpk_work has joined #ste||ar
<gnikunj[m]> ms: yt?
<hkaiser> gonidelis[m]: I'm ready whenever you are
<gonidelis[m]> ok
<gonidelis[m]> give me a sec
<rachitt_shah[m]> <ms[m] "rachitt_shah, dashohoxha, Roshee"> Hey everyone, when is the deadline for filling out this GSoD form?
nanmiao has quit [Quit: Connection closed]
<rachitt_shah[m]> ms[m] can you please share, if possible.
<gnikunj[m]> rachitt_shah[m]: iirc it's May 2nd
<ms[m]> rachitt_shah[m]: it's right there on the application form ;) may 11
<ms[m]> gnikunj: on my phone but write away
<rachitt_shah[m]> <ms[m] "rachitt_shah[m]: it's right ther"> Got it, didn't check. Sorry :(
<gnikunj[m]> ms: putting it here for you to see later. I did run some microbenchmarks. You can check the results and the benchmark here: https://gist.github.com/NK-Nikunj/f88c7ddd30d1f33438c33ea96ba32b4d
<rachitt_shah[m]> I assumed today would be the deadline.
<gnikunj[m]> turns out it's really slow to use default_executor in hpx-kokkos
<gnikunj[m]> ms: btw Kokkos parallel_for has a ton of contention when you have loops that iterate over a large range. When I had a loop over the range 0 to 1000, the Kokkos parallel_for took 2s to complete, where HPX could've done it in 0.02s
<gnikunj[m]> I've started looking into the code. If I find anything, I'll let you know.
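For context, a rough sketch of the kind of comparison being described: many tiny launches through Kokkos::parallel_for versus plain hpx::async over the same range. This is illustrative only; the actual benchmark and numbers are in the gist linked above, and the work inside each task is a placeholder.

    // Illustrative only: one launch per iteration, Kokkos vs. plain hpx::async.
    #include <Kokkos_Core.hpp>
    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <vector>

    int main(int argc, char* argv[])
    {
        Kokkos::initialize(argc, argv);
        {
            constexpr int n = 1000;

            // One Kokkos::parallel_for launch per iteration of the outer loop.
            for (int i = 0; i < n; ++i)
            {
                Kokkos::parallel_for("tiny", 1, KOKKOS_LAMBDA(int) { /* work */ });
            }
            Kokkos::fence();

            // The same shape with plain hpx::async and futures.
            std::vector<hpx::future<void>> futures;
            futures.reserve(n);
            for (int i = 0; i < n; ++i)
            {
                futures.push_back(hpx::async([] { /* work */ }));
            }
            hpx::wait_all(futures);
        }
        Kokkos::finalize();
        return 0;
    }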
nanmiao has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
<ms[m]> gnikunj: the overhead comparing kokkos::parallel_for to the manual loop with async is not surprising
<ms[m]> especially for cuda, creating those independent instances creates streams (which is a blocking call iirc on top of things) and that's expensive
<ms[m]> so just creating the executors for that manual loop is expensive
<ms[m]> for the hpx backend I thought it would be lighter weight, but looking at the code I realize I have some dynamic allocation there (related to handling of the global/independent instances and scratch space; it could perhaps be lazily allocated, not sure)
<ms[m]> which while relatively cheap, might add up in that case
<ms[m]> the hpx parallel_executor is pretty much as lightweight as it gets
<ms[m]> so I'd recommend you see how much of the overhead you get rid of by moving the creation of the kokkos executors outside the loop
<ms[m]> whatever is remaining may be related to the internal synchronization in the kokkos backend's parallel for loop (it uses e.g. a latch internally, and wraps that in a future, which for sure adds more overhead than just the plain hpx parallel_executor)
<ms[m]> I'd still also have to check that it actually just creates one task when the size of the range is 1 (and this could be optimized to run inline)
<ms[m]> in general my comment is that it's not really optimized for your use case of one-off tasks
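A minimal sketch of what hoisting the executor creation out of the loop could look like, assuming hpx-kokkos-style executors; the header and the default_executor name follow the hpx-kokkos naming mentioned above, but the exact API is an assumption here.

    // Sketch only: create the executor once and reuse it, instead of per iteration.
    #include <hpx/kokkos.hpp>          // hpx-kokkos (assumed main header)
    #include <hpx/include/async.hpp>

    void per_iteration_executor(int n)
    {
        for (int i = 0; i < n; ++i)
        {
            // A new executor (and, with CUDA, potentially a new stream) every time.
            hpx::kokkos::default_executor exec;
            hpx::async(exec, [] { /* work */ }).get();
        }
    }

    void hoisted_executor(int n)
    {
        // Created once, outside the loop, and reused for every iteration.
        hpx::kokkos::default_executor exec;
        for (int i = 0; i < n; ++i)
        {
            hpx::async(exec, [] { /* work */ }).get();
        }
    }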
Vir has quit [*.net *.split]
Vir has joined #ste||ar
<ms[m]> rachitt_shah[m]: no worries, looking forward to seeing your application!
<rachitt_shah[m]> > rachitt_shah[m]: no worries, looking forward to seeing your application!
<rachitt_shah[m]> Thank you ms! Will finish it over this weekend. Would there be further interviews too?
<ms[m]> rachitt_shah: depends on the quality and quantity of applications we get, but it's possible, yes
<rachitt_shah[m]> <ms[m] "rachitt_shah: depends on the qua"> Got it. Will update here with my questions.
jehelset has quit [Ping timeout: 268 seconds]
diehlpk_work has quit [Remote host closed the connection]
<gnikunj[m]> ms: it seems like it. What about kokkos tasks?
<ms[m]> gnikunj: maybe, it really depends what the final use case is
<ms[m]> also, there's nothing wrong with just using hpx's parallel_executor on the host side
<ms[m]> right tool for the right job and all that
<gnikunj[m]> ms: yeah, got it. I was trying to explore what I should use for the backend. Given that Kokkos tasks are specialized for handling single tasks, they should be better suited to my use case.
<gnikunj[m]> <ms[m] "right tool for the right job and"> yup that makes sense!
<ms[m]> could be better, but it goes in the other direction with very fine-grained tasks (stackless)
<gnikunj[m]> could you elaborate?
<ms[m]> you might not gain anything by that either
<gnikunj[m]> damn I see :/
<ms[m]> if it's just portability for single tasks that you're looking for you might not even need kokkos
<ms[m]> but if this is eventually supposed to be extended to bulk execution and various parallel algorithms kokkos probably makes sense
<ms[m]> I'd take a step back and think what you're really aiming for
<gnikunj[m]> yeah, I figured. I believe things will be better once I add a resilience execution space.
<gnikunj[m]> coz then we'll have resilience over bulk execution
<gnikunj[m]> <ms[m] "but if this is eventually suppos"> right
<ms[m]> and by "might not gain anything" I mean that it'll probably be faster than using the parallel algorithms, but it's still kind of overkill to use kokkos tasks for a single task
<ms[m]> in any case, I'd first try to figure out how much of those overheads come just from the creation of those kokkos executors/instances