K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jehelset has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
jehelset has joined #ste||ar
hkaiser has quit [Quit: bye]
jehelset has quit [Ping timeout: 240 seconds]
bita has quit [Ping timeout: 264 seconds]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
<gnikunj[m]>
ms: how do I run a Kokkos kernel on CUDA through the HPX backend? Btw, I was able to install HPX with the CUDA backend. Turns out if you use clang to compile, then you need to pass the C++ standard. Somewhere in there, the C++ standard is getting dropped/omitted.
<ms[m]>
where do you have to pass the C++ standard?
<gnikunj[m]>
while building HPX. It was complaining about allocator traits not being part of std when I passed in the GPU arch flag with CUDA_CLANG_FLAGS
<gnikunj[m]>
why do we need the CUDA backend enabled when trying to build hpx-kokkos then? I thought the purpose was to allow using CUDA by default if available. Is that right?
<ms[m]>
can you post the command line that you used to configure hpx?
<ms[m]>
the cuda futures need polling enabled, which is not enabled by default
<ms[m]>
we're looking at whether we can just enable it automatically, but for now you have to do it manually
<gnikunj[m]>
aah, got it
<gnikunj[m]>
how does it work by simply constructing a polling object?
<gnikunj[m]>
I don't see it getting passed anywhere else
<ms[m]>
also, I recommend you stick the #include <hpx/hpx_main.hpp> or an explicit hpx::init in there, otherwise kokkos will just initialize hpx without argc/argv
<ms[m]>
there are these things called constructors and destructors in C++ ;)
<ms[m]>
and global state...
<gnikunj[m]>
aah, so we're initializing global objects from constructors?
<gnikunj[m]>
<ms[m] "also, I recommend you stick the "> got it
<ms[m]>
gnikunj: regarding the cmake standard, if you set HPX_USE_CMAKE_CXX_STANDARD=ON you should set CMAKE_CXX_STANDARD as well because that's what'll be used (if you don't set it it'll be empty)
<ms[m]>
set HPX_USE_CMAKE_CXX_STANDARD=OFF if you don't want to set it at all
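As a concrete illustration of the two modes (a hypothetical configure invocation for illustration only, not the command line asked about earlier):

    # let CMake drive the standard: set both flags together
    cmake -DHPX_USE_CMAKE_CXX_STANDARD=ON -DCMAKE_CXX_STANDARD=17 <hpx-src>

    # or turn it off if you don't want to set CMAKE_CXX_STANDARD at all
    cmake -DHPX_USE_CMAKE_CXX_STANDARD=OFF <hpx-src>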
<gnikunj[m]>
ms: I know. When you don't provide HPX_USE_CMAKE_CXX_STANDARD and CMAKE_CXX_STANDARD, somehow the CXX standard set by default is not passed on to the build system
<ms[m]>
are you using an old cache?
<jaafar>
If I have a ready future and I call get() on it, can there still be a context switch?
<gnikunj[m]>
nope. I made sure to clear everything. Do you want me to reproduce it?
<hkaiser>
jaafar: possibly, but unlikely
<jaafar>
it should be fast, right?
<hkaiser>
jaafar: the future's data is protected by a spinlock which can cause a suspension if contended
<jaafar>
mmmm OK makes sense
<ms[m]>
gnikunj: yes please
<gnikunj[m]>
ms: on it
<ms[m]>
is this only with clang cuda? our ci with clang cuda does not set any of those flags and it works just fine (checked it manually now as well)
<hkaiser>
ms[m]: should we simply rely on CMAKE_CXX_STANDARD without additional HPX_ flag?
<gnikunj[m]>
it happens only with clang cuda, specifically when I add an additional GPU architecture flag using the clang CUDA flags option
<hkaiser>
ms[m]: also, thanks for all the work on the release, I'll take care of isocpp.org and vcpkg
<hkaiser>
Katie will do the blog post
<ms[m]>
hkaiser: thanks!
<ms[m]>
I have a blog post for stellar-group.org lined up
<hkaiser>
ahh, ok
<ms[m]>
I'm waiting for the docs to build before sending emails and posting the post
<hkaiser>
nod
<ms[m]>
and yes, perhaps we could just use CMAKE_CXX_STANDARD without any checks
<ms[m]>
I remember K-ballo had some reservations about it earlier though...
jehelset has joined #ste||ar
<hkaiser>
what's the rationale of having the additional HPX_ flag?
<ms[m]>
it was meant to discourage users from setting the CMAKE_CXX_STANDARD flag at all, but I'm not sure that's so important really
<hkaiser>
ms[m]: btw, we still have no power and have moved into a hotel (no heating), so I might have to skip/cancel tomorrow's HPX meeting
<ms[m]>
hkaiser: eek, ok, no problem
<hkaiser>
ms[m]: so you'll know why if I don't show up
<ms[m]>
how cold has it been over there?
<ms[m]>
yep, no worries
<hkaiser>
-5C
<gonidelis[m]>
good Celsius count
<ms[m]>
nice frosty winter weather :P
<hkaiser>
indeed, especially if it's the same inside ;-)
<ms[m]>
global warming is a hoax!
<hkaiser>
it definitely is!
<ms[m]>
you've had quite an eventful weather (and non-weather) year...
<hkaiser>
would be boring otherwise
<ms[m]>
:P
<hkaiser>
ms[m]: the CRTP base for segmented iterators is not sufficient to make the overload more specific
<ms[m]>
hkaiser: hmm, do you know why?
<ms[m]>
we could maybe use the tag_fallback version in the default implementations instead, but it may just shift the problem...
<ms[m]>
that would only consider the default implementations if there are no normal matching tag_invoke overloads
<hkaiser>
yes, and the default (non-segmented algorithms) would have to be the fallback
<hkaiser>
which would probably be ok
hkaiser has quit [Quit: bye]
<k-ballo[m]>
i read some worrisome mention of splitting repositories
<gonidelis[m]>
...
<k-ballo[m]>
the crtp base is a derived to base conversion, loses to an exact match
<ms[m]>
k-ballo[m]: ah yes, thank you
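A stand-alone illustration of that point (type and function names here are made up, not the actual HPX ones): two function templates compete, one deducing the iterator type exactly and one binding through a CRTP base; the exact match wins because the base overload needs a derived-to-base conversion:

    #include <iostream>

    template <typename Derived>
    struct segmented_iterator_base {};    // stand-in for the CRTP base

    struct my_iterator : segmented_iterator_base<my_iterator> {};

    // generic candidate: deduces the iterator type exactly (identity conversion)
    template <typename Iter>
    void dispatch(Iter const&) { std::cout << "generic overload\n"; }

    // segmented candidate: binds through the CRTP base (derived-to-base conversion)
    template <typename D>
    void dispatch(segmented_iterator_base<D> const&) { std::cout << "segmented overload\n"; }

    int main()
    {
        my_iterator it;
        dispatch(it);    // prints "generic overload": the exact match wins
    }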
<ms[m]>
you don't like the thought of multiple repositories?
<k-ballo[m]>
no
<ms[m]>
care to expand?
<k-ballo[m]>
it's a pain to develop against multiple repos
<ms[m]>
likewise against one massive repo...
<k-ballo[m]>
not my experience
hkaiser has joined #ste||ar
<jaafar>
I have another scheduler question :) I'm seeing in the scan code that we launch some independent tasks (the first pass), call them 1A, 2A, 3A...
<jaafar>
interleaved with those we are launching 1B, 2B, 3B which depend on 1A, (1A and 1B), (1A and 1B and 1C) etc.
<jaafar>
the A's don't have any dependencies
<jaafar>
I would expect that since we create 1B before, say, 11A, if both 1B and 11A can run we would choose 1B
<jaafar>
but in fact it looks like the scheduler considers 1B early on, finds that it is not runnable, and postpones it until all initially runnable tasks are complete
<jaafar>
so in practice we run maybe one "B" stage, then finish all the A's, then run all the B's serially
<jaafar>
is there some scheduler option that will get it to reconsider what the next task will be when dependencies become ready?
<hkaiser>
jaafar: the scheduler is eager
<hkaiser>
tasks are not even scheduled if they can't run
<hkaiser>
but once scheduled (at the end of the queue), tasks can get stolen by other cores from the back-end of the queue
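A rough stand-alone sketch of the dependency pattern being described (not the actual scan-partitioner code; here each "B" task depends on its "A" task and the previous "B" result), using hpx::async and hpx::dataflow; header paths are approximate and may vary between HPX versions:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>

    #include <cstddef>
    #include <vector>

    int main()
    {
        constexpr std::size_t n = 8;

        // "A" stage: independent tasks, runnable immediately
        std::vector<hpx::future<int>> a;
        for (std::size_t i = 0; i != n; ++i)
            a.push_back(hpx::async([i] { return static_cast<int>(i); }));

        // "B" stage: created up front, but each one only becomes runnable
        // once its A result and the previous B result are ready
        hpx::shared_future<int> prev = hpx::make_ready_future(0);
        for (std::size_t i = 0; i != n; ++i)
        {
            prev = hpx::dataflow(
                [](hpx::future<int> ai, hpx::shared_future<int> p) {
                    return ai.get() + p.get();
                },
                std::move(a[i]), prev).share();
        }

        return prev.get() == 28 ? 0 : 1;    // 0 + 1 + ... + 7
    }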
jehelset has quit [Disconnected by services]
<jaafar>
I see. So those B tasks essentially migrate to the end as they are created, because their dependencies are not met?
<jaafar>
It seems like artificial constraints might be the right solution here
<jaafar>
s/constraints/dependencies/
<hkaiser>
jaafar: might be
<jaafar>
I have a manual C++17 solution that I'm pretty happy with. I figure if I can only get HPX to behave the same way we'll have a winner
<hkaiser>
great!
<hkaiser>
jaafar: thanks for looking into this!
<jaafar>
it's sad how long it's taking me
jehelset has joined #ste||ar
jehelset has quit [Disconnected by services]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]