K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jehelset has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
jehelset has joined #ste||ar
hkaiser has quit [Quit: bye]
jehelset has quit [Ping timeout: 240 seconds]
bita has quit [Ping timeout: 264 seconds]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
<gnikunj[m]> ms: how do I run a Kokkos kernel on CUDA through the HPX backend? Btw, I was able to install HPX with the CUDA backend. Turns out if you use clang to compile, then you need to pass the C++ standard explicitly. Somewhere in there, the C++ standard is getting dropped.
<ms[m]> gnikunj: you don't run Kokkos CUDA kernels through the HPX backend, at least if you mean the HPX execution space by HPX backend, but you can do this: https://github.com/STEllAR-GROUP/hpx-kokkos/blob/98a42dbe4702f1ab9c4d50f81bb076c956319be8/tests/kokkos_async_parallel.cpp#L48-L50 and stick the execution space you want to use in the template parameters (CUDA by default if you have that enabled)
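A minimal sketch (not the linked test itself) of what "stick the execution space in the template parameters" looks like in plain Kokkos; the kernel, its label, and the raw device pointer are illustrative, and the hpx-kokkos async variants in the linked test return an hpx future instead of needing a fence:

    #include <Kokkos_Core.hpp>

    // Doubles n device-resident values; Kokkos::Cuda is named explicitly in
    // the policy's template parameters instead of relying on the default space.
    void scale(double* data, int n)
    {
        Kokkos::parallel_for(
            "scale", Kokkos::RangePolicy<Kokkos::Cuda>(0, n),
            KOKKOS_LAMBDA(int const i) { data[i] *= 2.0; });
        Kokkos::fence();
    }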
<ms[m]> where do you have to pass the C++ standard?
<gnikunj[m]> while building HPX. It was complaining about allocator traits not being part of std when I passed in the GPU arch flag with HPX_CUDA_CLANG_FLAGS
<gnikunj[m]> why do we need the CUDA backend enabled when trying to build hpx-kokkos then? I thought the purpose was to allow using CUDA by default if available. Is that right?
<ms[m]> can you post the command line that you used to configure hpx?
<gnikunj[m]> yes, wait
<gnikunj[m]> cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$HOME/Install/hpx-debug -DHPX_WITH_MALLOC=jemalloc -DHPX_WITH_CUDA=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DHPX_WITH_CUDA_CLANG=ON -DHPX_CUDA_CLANG_FLAGS=--cuda-gpu-arch=sm_75 -DCMAKE_CXX_STANDARD=17 -DHPX_USE_CMAKE_CXX_STANDARD=ON ..
<gnikunj[m]> if I don't have CMAKE_CXX_STANDARD set, it complains about things not being part of namespace std
<gnikunj[m]> ms: I am trying to run the following code and it hangs after printing the integers: https://gist.github.com/NK-Nikunj/0445177b129972ab8546621a80163025
<ms[m]> gnikunj: thanks, will have a look
<ms[m]> I suppose only the cuda examples/tests fail, or everything?
<gnikunj[m]> only the cuda stuff
<ms[m]> 👍️
<gnikunj[m]> I'm trying to make my code work as well. Seems like there are a lot of __host__ __device__ errors ;_;
<ms[m]> the cuda futures need polling enabled, which is not enabled by default
<ms[m]> we're looking at if we can just enable it automatically but for now you have to do it manually
<gnikunj[m]> aah, got it
<gnikunj[m]> how does it work by simply constructing a polling object?
<gnikunj[m]> I don't see it getting passed anywhere else
<ms[m]> also, I recommend you stick the #include <hpx/hpx_main.hpp> or an explicit hpx::init in there, otherwise kokkos will just initialize hpx without argc/argv
<ms[m]> there are these things called constructors and destructors in C++ ;)
<ms[m]> and global state...
<gnikunj[m]> aah, so we're initializing global objects from constructors?
<gnikunj[m]> <ms[m] "also, I recommend you stick the "> got it
<ms[m]> yeah, just setting a flag to true
<gnikunj[m]> now it works! Thanks!
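A hedged sketch of the fix being discussed: include hpx/hpx_main.hpp so HPX is initialized with argc/argv, and keep an RAII polling object alive around the CUDA work. The header paths, the "default" pool name, and the enable_user_polling type are assumptions about the HPX version in use here:

    #include <hpx/hpx_main.hpp>            // let HPX initialize with argc/argv
    #include <hpx/modules/async_cuda.hpp>
    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[])
    {
        Kokkos::initialize(argc, argv);
        {
            // RAII helper: the constructor turns CUDA event polling on for the
            // named thread pool, the destructor turns it off again ("just
            // setting a flag to true").
            hpx::cuda::experimental::enable_user_polling poll("default");

            // ... launch CUDA/Kokkos work returning hpx futures here ...
        }
        Kokkos::finalize();
        return 0;
    }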
hkaiser has joined #ste||ar
<ms[m]> woo \o/
<ms[m]> gnikunj: regarding the cmake standard, if you set HPX_USE_CMAKE_CXX_STANDARD=ON you should set CMAKE_CXX_STANDARD as well, because that's what will be used (if you don't set it, it'll be empty)
<ms[m]> set HPX_USE_CMAKE_CXX_STANDARD=OFF if you don't want to set it at all
<gnikunj[m]> ms: I know. When you don't provide HPX_USE_CMAKE_CXX_STANDARD and CMAKE_CXX_STANDARD, somehow the CXX standard set by default is not passed on to the build system
<ms[m]> are you using an old cache?
<jaafar> If I have a ready future and I call get() on it, can there still be a context switch?
<gnikunj[m]> nope. I made sure to clear everything. Do you want me to reproduce it?
<hkaiser> jaafar: possibly, but unlikely
<jaafar> it should be fast, right?
<hkaiser> jaafar: the future's data is protected by a spinlock which can cause a suspension if contended
<jaafar> mmmm OK makes sense
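For reference, a tiny self-contained example (made-up names) of the case being discussed, calling get() on an already-ready future:

    #include <hpx/hpx.hpp>

    int ready_value()
    {
        hpx::future<int> f = hpx::make_ready_future(42);
        // The shared state is guarded by a spinlock, so this is normally a
        // fast, non-suspending read, but a contended lock can still suspend.
        return f.get();
    }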
<ms[m]> gnikunj: yes please
<gnikunj[m]> ms: on it
<ms[m]> is this only with clang cuda? our ci with clang cuda does not set any of those flags and it works just fine (checked it manually now as well)
<hkaiser> ms[m]: should we simply rely on CMAKE_CXX_STANDARD without additional HPX_ flag?
<gnikunj[m]> it happens only with clang cuda, specifically when I add the extra GPU architecture flag using the CUDA clang flags option
<hkaiser> ms[m]: also, thanks for all the work on the release, I'll take care of isocpp.org and vcpkg
<hkaiser> Katie will do the blog post
<ms[m]> hkaiser: thanks!
<ms[m]> I have a blog post for stellar-group.org lined up
<hkaiser> ahh, ok
<ms[m]> I'm waiting for the docs to build before sending emails and posting the post
<hkaiser> nod
<ms[m]> and yes, perhaps we could just use CMAKE_CXX_STANDARD without any checks
<ms[m]> I remember K-ballo had some reservations about it earlier though...
jehelset has joined #ste||ar
<hkaiser> what's the rationale of having the additional HPX_ flag?
<ms[m]> it was meant to discourage users from setting the CMAKE_CXX_STANDARD flag at all, but I'm not sure that's so important really
<ms[m]> gnikunj: hmmm... and you say that doesn't happen if you leave out HPX_CUDA_CLANG_FLAGS?
<gnikunj[m]> right. It gives an error related to cuda then (essentially not able to determine the gpu architecture)
<gnikunj[m]> if I provide HPX_CUDA_CLANG_FLAGS, then I can't build without explicitly asking it to use a CXX standard.
<ms[m]> gnikunj: can I ask for one more log please? make VERBOSE=1
<ms[m]> with a clean in between :)
<gnikunj[m]> ms on it
K-ballo has quit [Ping timeout: 272 seconds]
bita has joined #ste||ar
<gnikunj[m]> ms: https://gist.github.com/NK-Nikunj/27e6929578c73e969a723ee5985585d3 (I deleted the unrelated stuff from make; just kept the make command that got us the error)
<ms[m]> gnikunj: thanks! although I was interested in the other stuff as well :P
<ms[m]> do you still have it?
<gnikunj[m]> aah, I'll have to redo it then :/
<gnikunj[m]> if it doesn't work out. I'll reproduce it again with VERBOSE=1
<hkaiser> ms[m]: the asio issue is a strange one!
<ms[m]> hkaiser: to say the least...
<ms[m]> gnikunj: ok, don't sweat it
<hkaiser> good catch, though, I wouldn't have thought of this
<ms[m]> I was just curious to see what flags end up on the non-cuda compilation as well
<ms[m]> hkaiser: thank stackoverflow ;)
<ms[m]> I don't really understand it tbh
<ms[m]> thanks!
<hkaiser> ms[m]: btw, we still have no power and have moved into a hotel (no heating), so I might have to skip/cancel tomorrow's HPX meeting
<ms[m]> hkaiser: eek, ok, no problem
<hkaiser> ms[m]: so you'll know why if I don't show up
<ms[m]> how cold has it been over there?
<ms[m]> yep, no worries
<hkaiser> -5C
<gonidelis[m]> good Celsius count
<ms[m]> nice frosty winter weather :P
<hkaiser> indeed, especially if it's the same inside ;-)
<ms[m]> global warming is a hoax!
<hkaiser> it definitely is!
<ms[m]> you've had quite an eventful weather (and non-weather) year...
<hkaiser> would be boring otherwise
<ms[m]> :P
<hkaiser> ms[m]: the CRTP base for segmented iterators is not sufficient to make the overload more specific
<ms[m]> hkaiser: hmm, do you know why?
<ms[m]> we could maybe use the tag_fallback version in the default implementations instead, but it may just shift the problem...
<ms[m]> that would only consider the default implementations if there are no normal matching tag_invoke overloads
<hkaiser> yes, and the default (non-segmented algorithms) would have to be the fallback
<hkaiser> which would probably be ok
hkaiser has quit [Quit: bye]
<k-ballo[m]> i read some worrisome mention of splitting repositories
<gonidelis[m]> ...
<k-ballo[m]> the crtp base is a derived-to-base conversion, which loses to an exact match
<ms[m]> k-ballo[m]: ah yes, thank you
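A standalone illustration (made-up names, not the HPX code) of why the CRTP-base overload loses: binding to the base needs a derived-to-base conversion, and overload resolution prefers an exact match:

    struct base {};                      // stand-in for the segmented-iterator CRTP base
    struct derived : base {};            // stand-in for a concrete segmented iterator

    int overload(base const&) { return 1; }     // generic overload taking the CRTP base
    int overload(derived const&) { return 2; }  // exact match for the concrete type

    int which() { return overload(derived{}); } // returns 2: exact match wins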
<ms[m]> you don't like the thought of multiple repositories?
<k-ballo[m]> no
<ms[m]> care to expand?
<k-ballo[m]> it's a pain to develop against multiple repos
<ms[m]> likewise against one massive repo...
<k-ballo[m]> not my experience
hkaiser has joined #ste||ar
<jaafar> I have another scheduler question :) I'm seeing in the scan code that we launch some independent tasks (the first pass), call them 1A, 2A, 3A...
<jaafar> interleaved with those we are launching 1B, 2B, 3B which depend on 1A, (1A and 1B), (1A and 1B and 1C) etc.
<jaafar> the A's don't have any dependencies
<jaafar> I would expect that since we create 1B before, say, 11A, if both 1B and 11A can run we would choose 1B
<jaafar> but in fact it looks like the scheduler considers 1B early on, finds that it is not runnable, and postpones it until all initially runnable tasks are complete
<jaafar> so in practice we run maybe one "B" stage, then finish all the A's, then run all the B's serially
<jaafar> is there some scheduler option that will get it to reconsider what the next task will be when dependencies become ready?
<hkaiser> jaafar: the scheduler is eager
<hkaiser> tasks are not even scheduled if they can't run
<hkaiser> but once scheduled (at the end of the queue), tasks can get stolen by other cores from the back-end of the queue
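A hedged, self-contained sketch (invented names, not the scan-partitioner code) of the kind of explicit chaining this implies, where each B task is only handed to the scheduler once the futures it depends on are ready, e.g. via hpx::dataflow:

    #include <hpx/hpx.hpp>
    #include <utility>
    #include <vector>

    // Chains B_k = f(A_k, B_{k-1}); hpx::dataflow defers scheduling each
    // continuation until all of its input futures have become ready.
    hpx::future<int> chain(std::vector<hpx::future<int>>& a_stage)
    {
        hpx::future<int> prev = hpx::make_ready_future(0);
        for (auto& a : a_stage)
        {
            prev = hpx::dataflow(
                [](hpx::future<int> ak, hpx::future<int> bk_prev) {
                    return ak.get() + bk_prev.get();
                },
                std::move(a), std::move(prev));
        }
        return prev;
    }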
jehelset has quit [Disconnected by services]
<jaafar> I see. So those B tasks essentially migrate to the end as they are created, because their dependencies are not met?
<jaafar> It seems like artificial constraints might be the right solution here
<jaafar> s/constraints/dependencies/
<hkaiser> jaafar: might be
<jaafar> I have a manual C++17 solution that I'm pretty happy with. I figure if I can only get HPX to behave the same way we'll have a winner
<hkaiser> great!
<hkaiser> jaafar: thanks for looking into this!
<jaafar> it's sad how long it's taking me
jehelset has joined #ste||ar
jehelset has quit [Disconnected by services]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
nanmiao11183 has quit [Quit: Connection closed]
jehelset has joined #ste||ar
hkaiser has quit [Quit: bye]