K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jehelset has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
jehelset has joined #ste||ar
hkaiser has quit [Quit: bye]
jehelset has quit [Ping timeout: 240 seconds]
bita has quit [Ping timeout: 264 seconds]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]
<gnikunj[m]>
ms: how do I run a Kokkos kernel on CUDA through the HPX backend? Btw, I was able to install HPX with the CUDA backend. Turns out if you use clang to compile, then you need to pass the C++ standard. Somewhere in there, the C++ standard is getting dropped/omitted.
<ms[m]>
where do you have to pass the C++ standard?
<gnikunj[m]>
while building HPX. It was complaining about allocator traits not being part of std when I passed in the GPU arch flag with CUDA_CLANG_FLAGS
<gnikunj[m]>
why do we need the CUDA backend enabled when trying to build hpx-kokkos then? I thought the purpose was to allow using CUDA by default if available. Is that right?
<ms[m]>
can you post the command line that you used to configure hpx?
<ms[m]>
the cuda futures need polling enabled, which is not enabled by default
<ms[m]>
we're looking at whether we can just enable it automatically, but for now you have to do it manually
<gnikunj[m]>
aah, got it
<gnikunj[m]>
how does it work by simply constructing a polling object?
<gnikunj[m]>
I don't see it getting passed anywhere else
<ms[m]>
also, I recommend you stick the #include <hpx/hpx_main.hpp> or an explicit hpx::init in there, otherwise kokkos will just initialize hpx without argc/argv
<ms[m]>
there are these things called constructors and destructors in C++ ;)
<ms[m]>
and global state...
<gnikunj[m]>
aah, so we're initializing global objects from constructors?
<gnikunj[m]>
<ms[m] "also, I recommend you stick the "> got it
<ms[m]>
gnikunj: regarding the cmake standard, if you set HPX_USE_CMAKE_CXX_STANDARD=ON you should set CMAKE_CXX_STANDARD as well because that's what'll be used (if you don't set it it'll be empty)
<ms[m]>
set HPX_USE_CMAKE_CXX_STANDARD=OFF if you don't want to set it at all
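As a concrete illustration of the two modes (a hypothetical configure invocation for illustration only, not the command line asked about earlier):

    # let CMake drive the standard: set both flags together
    cmake -DHPX_USE_CMAKE_CXX_STANDARD=ON -DCMAKE_CXX_STANDARD=17 <hpx-src>

    # or turn it off if you don't want to set CMAKE_CXX_STANDARD at all
    cmake -DHPX_USE_CMAKE_CXX_STANDARD=OFF <hpx-src>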
<gnikunj[m]>
ms: I know. When you don't provide HPX_USE_CMAKE_CXX_STANDARD and CMAKE_CXX_STANDARD, somehow the CXX standard set by default is not passed on to the build system
<ms[m]>
are you using an old cache?
<jaafar>
If I have a ready future and I call get() on it, can there still be a context switch?
<gnikunj[m]>
nope. I made sure to clear everything. Do you want me to reproduce it?
<hkaiser>
jaafar: possibly, but unlikely
<jaafar>
it should be fast, right?
<hkaiser>
jaafar: the future's data is protected by a spinlock which can cause a suspension if contended
<jaafar>
mmmm OK makes sense
<ms[m]>
gnikunj: yes please
<gnikunj[m]>
ms: on it
<ms[m]>
is this only with clang cuda? our ci with clang cuda does not set any of those flags and it works just fine (checked it manually now as well)
<hkaiser>
ms[m]: should we simply rely on CMAKE_CXX_STANDARD without additional HPX_ flag?
<gnikunj[m]>
it happens only with clang cuda, specifically when I add an additional GPU architecture flag using the clang CUDA flags option
<hkaiser>
ms[m]: also, thanks for all the work on the release, I'll take care of isocpp.org and vcpkg
<hkaiser>
Katie will do the blog post
<ms[m]>
hkaiser: thanks!
<ms[m]>
I have a blog post for stellar-group.org lined up
<hkaiser>
ahh, ok
<ms[m]>
I'm waiting for the docs to build before sending emails and posting the post
<hkaiser>
nod
<ms[m]>
and yes, perhaps we could just use CMAKE_CXX_STANDARD without any checks
<ms[m]>
I remember K-ballo had some reservations about it earlier though...
jehelset has joined #ste||ar
<hkaiser>
what's the rationale of having the additional HPX_ flag?
<ms[m]>
it was meant to discourage users from setting the CMAKE_CXX_STANDARD flag at all, but I'm not sure that's so important really
<hkaiser>
ms[m]: btw, we still have no power and have moved into a hotel (no heating), so I might have to skip/cancel tomorrow's HPX meeting
<ms[m]>
hkaiser: eek, ok, no problem
<hkaiser>
ms[m]: so you'll know why if I don't show up
<ms[m]>
how cold has it been over there?
<ms[m]>
yep, no worries
<hkaiser>
-5C
<gonidelis[m]>
good Celsius count
<ms[m]>
nice frosty winter weather :P
<hkaiser>
indeed, especially if it's the same inside ;-)
<ms[m]>
global warming is a hoax!
<hkaiser>
it definitely is!
<ms[m]>
you've had quite an eventful weather (and non-weather) year...
<hkaiser>
would be boring otherwise
<ms[m]>
:P
<hkaiser>
ms[m]: the CRTP base for segmented iterators is not sufficient to make the overload more specific
<ms[m]>
hkaiser: hmm, do you know why?
<ms[m]>
we could maybe use the tag_fallback version in the default implementations instead, but it may just shift the problem...
<ms[m]>
that would only consider the default implementations if there are no normal matching tag_invoke overloads
<hkaiser>
yes, and the default (non-segmented algorithms) would have to be the fallback
<hkaiser>
which would probably be ok
hkaiser has quit [Quit: bye]
<k-ballo[m]>
i read some worrisome mention of splitting repositories
<gonidelis[m]>
...
<k-ballo[m]>
the crtp base is a derived to base conversion, loses to an exact match
<ms[m]>
k-ballo[m]: ah yes, thank you
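A stand-alone illustration of that point (type and function names here are made up, not the actual HPX ones): two function templates compete, one deducing the iterator type exactly and one binding through a CRTP base; the exact match wins because the base overload needs a derived-to-base conversion:

    #include <iostream>

    template <typename Derived>
    struct segmented_iterator_base {};    // stand-in for the CRTP base

    struct my_iterator : segmented_iterator_base<my_iterator> {};

    // generic candidate: deduces the iterator type exactly (identity conversion)
    template <typename Iter>
    void dispatch(Iter const&) { std::cout << "generic overload\n"; }

    // segmented candidate: binds through the CRTP base (derived-to-base conversion)
    template <typename D>
    void dispatch(segmented_iterator_base<D> const&) { std::cout << "segmented overload\n"; }

    int main()
    {
        my_iterator it;
        dispatch(it);    // prints "generic overload": the exact match wins
    }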
<ms[m]>
you don't like the thought of multiple repositories?
<k-ballo[m]>
no
<ms[m]>
care to expand?
<k-ballo[m]>
it's a pain to develop against multiple repos
<ms[m]>
likewise against one massive repo...
<k-ballo[m]>
not my experience
hkaiser has joined #ste||ar
<jaafar>
I have another scheduler question :) I'm seeing in the scan code that we launch some independent tasks (the first pass), call them 1A, 2A, 3A...
<jaafar>
interleaved with those we are launching 1B, 2B, 3B which depend on 1A, (1A and 1B), (1A and 1B and 1C) etc.
<jaafar>
the A's don't have any dependencies
<jaafar>
I would expect that since we create 1B before, say, 11A, if both 1B and 11A can run we would choose 1B
<jaafar>
but in fact it looks like the scheduler considers 1B early on, finds that it is not runnable, and postpones it until all initially runnable tasks are complete
<jaafar>
so in practice we run maybe one "B" stage, then finish all the A's, then run all the B's serially
<jaafar>
is there some scheduler option that will get it to reconsider what the next task will be when dependencies become ready?
<hkaiser>
jaafar: the scheduler is eager
<hkaiser>
tasks are not even scheduled if they can't run
<hkaiser>
but once scheduled (at the end of the queue), tasks can get stolen by other cores from the back-end of the queue
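A rough stand-alone sketch of the dependency pattern being described (not the actual scan-partitioner code; here each "B" task depends on its "A" task and the previous "B" result), using hpx::async and hpx::dataflow; header paths are approximate and may vary between HPX versions:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>

    #include <cstddef>
    #include <vector>

    int main()
    {
        constexpr std::size_t n = 8;

        // "A" stage: independent tasks, runnable immediately
        std::vector<hpx::future<int>> a;
        for (std::size_t i = 0; i != n; ++i)
            a.push_back(hpx::async([i] { return static_cast<int>(i); }));

        // "B" stage: created up front, but each one only becomes runnable
        // once its A result and the previous B result are ready
        hpx::shared_future<int> prev = hpx::make_ready_future(0);
        for (std::size_t i = 0; i != n; ++i)
        {
            prev = hpx::dataflow(
                [](hpx::future<int> ai, hpx::shared_future<int> p) {
                    return ai.get() + p.get();
                },
                std::move(a[i]), prev).share();
        }

        return prev.get() == 28 ? 0 : 1;    // 0 + 1 + ... + 7
    }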
jehelset has quit [Disconnected by services]
<jaafar>
I see. So those B tasks essentially migrate to the end as they are created, because their dependencies are not met?
<jaafar>
It seems like artificial constraints might be the right solution here
<jaafar>
s/constraints/dependencies/
<hkaiser>
jaafar: might be
<jaafar>
I have a manual C++17 solution that I'm pretty happy with. I figure if I can only get HPX to behave the same way we'll have a winner
<hkaiser>
great!
<hkaiser>
jaafar: thanks for looking into this!
<jaafar>
it's sad how long it's taking me
jehelset has joined #ste||ar
jehelset has quit [Disconnected by services]
jehelset has joined #ste||ar
jehelset has quit [Remote host closed the connection]