K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
bita has joined #ste||ar
<gnikunj[m]>
hkaiser here now
<gnikunj[m]>
Idk how it slipped my notifications :/
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
bita has quit [Ping timeout: 240 seconds]
<gnikunj[m]>
ms: I have HPX_WITH_CUDA and HPX_WITH_CUDA_COMPUTE turned ON but I don't see the header hpx/compute/cuda.hpp in the installed directory. Do I need to turn on any other cmake option?
<ms[m]>
gnikunj: no... I think I can reproduce that, and it's very suspicious, I'll have a look
<ms[m]>
thanks for pointing that out!
<gnikunj[m]>
thanks!
jpinto[m] has quit [Quit: Idle for 30+ days]
<ms[m]>
gnikunj: fixed (on master), sorry and thank you, that was a very good catch!
<gnikunj[m]>
ms: that was quick. Thanks!
hkaiser has joined #ste||ar
<hkaiser>
Thanks ms[m] and rori for preparing the release candidate!
K-ballo has quit [Read error: Connection reset by peer]
<ms[m]>
basically the hpx_async_base module gets included in the hpx_parallelism and hpx_full libraries, hence the odr violation
<ms[m]>
you can simply remove it and add hpx_parallelism in DEPENDENCIES
<hkaiser>
ok, sounds reasonable
<hkaiser>
I will try that - thanks
<hkaiser>
I would never have thought of this possibility :/
<ms[m]>
it's possibly something we could detect earlier, but I don't know how easy it would be
<hkaiser>
nah, that's what we have the sanitizers for - good that we added them
<ms[m]>
indeed, I like that odr violation one!
<hkaiser>
so on linux, hpx_core/hpx_parallelism are not shared libraries?
<ms[m]>
and I hope it's actually that, it's just the first thing that popped into my mind (there have been similar ones earlier)
<ms[m]>
they are shared libraries
<hkaiser>
I assumed that things exported from a shared library (async is a symbol exported from one) wouldn't be duplicated in any case
<hkaiser>
but that's probably my misconception caused by assuming Windows dll visibility rules apply everywhere
<ms[m]>
the duplication comes from how we construct the shared libraries from the modules (static libraries): we have to make sure that each module is only included in one of the shared libraries, but then I don't know how the linker deals with e.g. preloading an allocator library that also exports malloc and friends...
<hkaiser>
ms[m]: ahh, so the shared libraries are linked from a list of static libraries, which is causing some of the object files to end up in two of them
<hkaiser>
that's something to watch out for
<hkaiser>
IOW, the DEPENDENCIES clause in add_hpx_module should never contain any modules from another module category
<hkaiser>
that's something we should try to detect indeed
<ms[m]>
exactly, that's why there's a separate MODULE_DEPENDENCIES in addition to DEPENDENCIES when creating modules; the former should only include modules from within the shared library that the module itself belongs to
<ms[m]>
right, exactly that :D
<hkaiser>
ok, I'll take a stab at this
<hkaiser>
thanks for the explanations
<ms[m]>
yep
<ms[m]>
ok, nice, thank you!
<srinivasyadav227>
anyone working on "Adapt parallel algorithms to C++20" (#4822)?
<hkaiser>
srinivasyadav227: yes, gonidelis[m] is working on it, but there is a sufficient amount of work for more than one person
<ms[m]>
freenode_srinivasyadav227[m]: gonidelis[m] is
<ms[m]>
too slow...
<srinivasyadav227>
hkaiser: ok, I will try to work on it! :-)
<hkaiser>
srinivasyadav227: please coordinate with gonidelis[m]
<srinivasyadav227>
hkaiser: ok
<hkaiser>
srinivasyadav227: I am planning to add two more related tickets (also as gsoc projects): disentangle the segmented algorithm implementations from the parallel algorithms, and add support for the par_unseq execution policy (vectorize the parallel algorithms)
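For context, a minimal sketch (not from the log) of what these tickets touch: calling an HPX parallel algorithm CPO with the par and par_unseq execution policies. The header names and the availability of hpx::execution::par_unseq are assumptions about a recent HPX master.

    // Illustrative only; assumes a recent HPX providing the C++20-style CPOs.
    #include <hpx/hpx_main.hpp>   // starts the HPX runtime around main()
    #include <hpx/algorithm.hpp>  // hpx::for_each and the other algorithm CPOs
    #include <hpx/execution.hpp>  // execution policy objects

    #include <vector>

    int main()
    {
        std::vector<int> v(1000, 1);

        // parallel execution
        hpx::for_each(hpx::execution::par, v.begin(), v.end(),
            [](int& i) { i *= 2; });

        // par_unseq (assumed available); the proposed ticket is about actually
        // vectorizing the implementations dispatched for this policy
        hpx::for_each(hpx::execution::par_unseq, v.begin(), v.end(),
            [](int& i) { i += 1; });

        return 0;
    }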
<hkaiser>
ms[m]: I have another nitpick for 5142, sorry
<hkaiser>
srinivasyadav227: yes, google has reduced the period of performance to 7 weeks, iirc - however we will be able to extend this for another 4 paid weeks using our funds
<gnikunj[m]>
hkaiser: I'm back. Apologies for the short notice. Yes, I'll join today's meeting. I'm currently on my way to run the prototype on hpx+cuda to see if it works without failures there as well.
<hkaiser>
cool, thanks
<hkaiser>
gnikunj[m]: please look over the email I sent
<gnikunj[m]>
yes, reading it
<srinivasyadav227>
hkaiser: yea so far it's clear! thanks :)
<srinivasyadav227>
Just curious! Can anyone attend the meeting? I wanted to be a spectator since it's related to CUDA
<hkaiser>
srinivasyadav227: that is an internal project meeting, sorry
<srinivasyadav227>
ok, np :-)
<hkaiser>
srinivasyadav227: if it were up to me, sure - but I can't drop just anybody on our clients
<srinivasyadav227>
no no, it's fine! 😀
<gnikunj[m]>
hkaiser: the email is well explained. If the gpu code runs fine, we already have more than we anticipated ;). The 2nd link you provided is broken btw.
<hkaiser>
urgs
<hkaiser>
could you respond to all and fix the link, pls?
<gnikunj[m]>
sure
<hkaiser>
gnikunj[m]: works for me :/
<gnikunj[m]>
are you sure? You have an 80 character line break btw, so the tors.hpp part ends up on the next line and is not part of the link.
<hkaiser>
that's your email client breaking things
<gnikunj[m]>
at least, I can't open the link in the email I received
<gnikunj[m]>
could be a cct to gmail conversion issue then.
<hkaiser>
nod
<gnikunj[m]>
should I write an email then? (or leave it as is?)
<hkaiser>
just leave it, they will ask, if needed
<gnikunj[m]>
sounds good.
<gnikunj[m]>
hkaiser: ms how do I provide the gpu architecture to cmake to build the cuda backend for hpx?
<gnikunj[m]>
default architecture does not compile for me
<gnikunj[m]>
the default is sm_20 btw, which needs CUDA version 7 or 8 (the current CUDA version is 11.2)
<gnikunj[m]>
diehlpk_work: what version of HPX are you using? Looks like an old version.
<diehlpk_work>
gnikunj[m], one specific commit from last September
<gnikunj[m]>
yeah, that's why. Use a newer version.
<gnikunj[m]>
it's most likely due to the CPOs that we added lately (hkaiser will know more)
<diehlpk_work>
Octo-Tiger cannot use a newer version, and it works for Gregor
<gnikunj[m]>
not sure then. Btw it's not a CPO issue. parallel_execution_tag was previously in hpx::parallel::execution (which was later deprecated in favor of hpx::execution). It could be due to that as well. Are you sure Gregor uses the same commit version?
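For what it's worth, a hedged sketch of the namespace change being described; the exact old and new spellings here are assumptions about the HPX versions involved, but the gist is that code written against the old names no longer compiles against a newer HPX.

    // Illustration only; spellings are assumed, check the HPX version in use.
    #include <hpx/execution.hpp>

    // old, now-deprecated names:
    //   hpx::parallel::execution::par
    //   hpx::parallel::execution::parallel_execution_tag
    //
    // current names:
    using policy_t   = hpx::execution::parallel_policy;        // type of hpx::execution::par
    using category_t = hpx::execution::parallel_execution_tag;

    int main() { return 0; }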
<zao>
gnikunj[m]: Oh, thought it was a problem with `make`, seems like it's just the CUDA compiler being upset as usual.
<gnikunj[m]>
When will we have a cuda compiler that does what it's meant to do :/
<zao>
Always chasing the C++ wavefront :)
<gonidelis[m]>
hkaiser: gnikunj[m] is the meeting today or tomorrow? i am confused...
<gnikunj[m]>
gonidelis[m]: tomorrow
<gonidelis[m]>
cool
<hkaiser>
gonidelis[m]: what meeting?
<gonidelis[m]>
i read gnikunj[m] referencing a meeting today but i only have an email for tomorrow
<hkaiser>
gonidelis[m]: ahh, tomorrow is the group meeting, yes
<gonidelis[m]>
hkaiser: ok. one more thing. just read your tickets. if we completely disentangle segmented algos from parallel ones, does that mean that we completely remove the underscore dispatching calls?
<hkaiser>
yes
<hkaiser>
replacing those with tag_invoke overloads
<gonidelis[m]>
hmm... ok. so in such a case there are absolutely zero dependencies between segmented and parallel algos
<hkaiser>
correct
<gonidelis[m]>
and no harm is done if i change the result type of the parallel overload for example ;p
<hkaiser>
except that the segmented ones rely on the local ones, but not vice versa
<gonidelis[m]>
ok ok
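To make "replacing those with tag_invoke overloads" concrete, here is a generic, self-contained sketch of the tag_invoke customization pattern. The names (mylib, for_each_t, segmented_iterator) are invented for the example and are not HPX's actual code; the point is that the segmented overload is found by argument-dependent lookup, so it needs no compile-time coupling to the local/parallel implementation.

    #include <utility>

    namespace mylib {

        // the customization point object; in HPX this role is played by the
        // public algorithm entry points such as hpx::for_each
        struct for_each_t
        {
            template <typename... Ts>
            decltype(auto) operator()(Ts&&... ts) const
            {
                // dispatch to whatever tag_invoke overload ADL finds for the
                // argument types
                return tag_invoke(*this, std::forward<Ts>(ts)...);
            }
        };

        inline constexpr for_each_t for_each{};
    }

    namespace myseg {

        struct segmented_iterator {};

        // the "segmented" overload, discovered via ADL; neither this code nor
        // the local/parallel implementation needs to know about the other
        void tag_invoke(mylib::for_each_t, segmented_iterator, segmented_iterator)
        {
            // segmented implementation would go here
        }
    }

    int main()
    {
        myseg::segmented_iterator first, last;
        mylib::for_each(first, last);   // resolves to myseg::tag_invoke
        return 0;
    }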
<gonidelis[m]>
hkaiser: i saw you merged the two gsoc projects we were talking about
<gonidelis[m]>
we mention "This project is different from the project Parallel Algorithms and Ranges " but we don't have a " Parallel Algorithms and Ranges" project any more ;p
<gonidelis[m]>
is there any way for me to make suggestions? or do i just edit the thing?
<hkaiser>
just edit the thing
<gonidelis[m]>
hkaiser: thanks ;)
<hkaiser>
gonidelis[m]: thank *you*
<zao>
gnikunj[m]: What CUDA version was that with? I'm not having any failures with HPX master, CUDA 11.1.1, GCC 10.2.0, Boost 1.74.0
<gnikunj[m]>
I'm using clang
<gnikunj[m]>
not gcc
<gnikunj[m]>
cuda 11.2
<gnikunj[m]>
boost 1.73
<zao>
Your cmake output said 1.75 :)
<zao>
What Clang too?
<diehlpk_work>
Having a thorough and well thought out list of Project Ideas is the most important part of your application.
<diehlpk_work>
So please, everyone, help get our project ideas well organized
<gonidelis[m]>
diehlpk_work: it looks pretty good actually. The algorithm side looks pretty stacked to me. Idk, maybe being more explicit about the prerequisites and how users could work their way up to a minimum level for every project could help. But then the page might get a bit chaotic...
<diehlpk_work>
gonidelis[m], Thanks
<zao>
gnikunj[m]: /eb/software/Boost/1.75.0-GCC-10.2.0/include/boost/move/detail/type_traits.hpp(884): error: data member initializer is not allowed
<zao>
Weeee.
<zao>
Had to build CUDA 11.2.1 to get clang 11 support to repro it.
<zao>
You're kind of on the bleeding edge of support as 11.1.1 doesn't do that compiler
<zao>
CUDA 11.2.1 compiles fine with the same setup but with the underlying GCC 10.2.0 instead