K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
bita has joined #ste||ar
<gnikunj[m]> hkaiser here now
<gnikunj[m]> Idk how it slipped my notifications :/
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
bita has quit [Ping timeout: 240 seconds]
<gnikunj[m]> ms: I have HPX_WITH_CUDA and HPX_WITH_CUDA_COMPUTE turned ON but I don't see the header hpx/compute/cuda.hpp in the installed directory. Do I need to turn any other cmake option on?
<ms[m]> gnikunj: no... I think I can reproduce that, and it's very suspicious, I'll have a look
<ms[m]> thanks for pointing that out!
<gnikunj[m]> thanks!
jpinto[m] has quit [Quit: Idle for 30+ days]
<ms[m]> gnikunj: fixed (on master), sorry and thank you, that was a very good catch!
<gnikunj[m]> ms: that was quick. Thanks!
hkaiser has joined #ste||ar
<hkaiser> Thanks ms[m] and rori for preparing the release candidate!
K-ballo has quit [Read error: Connection reset by peer]
K-ballo has joined #ste||ar
<hkaiser> ms[m]: any idea what could have happened here: https://github.com/STEllAR-GROUP/hpx/pull/5153/checks?check_run_id=1859043009? I'm clueless...
<ms[m]> (maybe other similar ones)
<ms[m]> basically the hpx_async_base module gets included in the hpx_parallelism and hpx_full libraries, hence the odr violation
<ms[m]> you can simply remove it and add hpx_parallelism in DEPENDENCIES
<hkaiser> ok, sounds reasonable
<hkaiser> I will try that - thanks
<hkaiser> I would never have thought of this possibility :/
<ms[m]> it's possibly something we could detect earlier, but I don't know how easy it would be
<hkaiser> na, that's what we have the sanitizers for - good we have added them
<ms[m]> indeed, I like that odr violation one!
<hkaiser> so on linux, hpx_core/hpx_parallelism are not shared libraries?
<ms[m]> and I hope it's actually that, it's just the first thing that popped into my mind (there's been similar ones earlier)
<ms[m]> they are shared libraries
<hkaiser> I assumed that things that are exported from a shared library (async is a symbol exported from one) wouldn't be duplicated in any case
<hkaiser> but that's probably my misconception caused by assuming Windows dll visibility rules apply everywhere
<ms[m]> the duplication comes from when we construct the shared libraries from the modules (static libraries) where we have to make sure that each module is only included in one of the shared libraries, but then I don't know how the linker deals with e.g. preloading an allocator library which also exports malloc and friends...
<hkaiser> ms[m]: ahh, so the shared libraries are linked from a list of static libraries, which is causing some of the object files to end up in two of them
<hkaiser> that's something to watch out for
<hkaiser> IOW, the DEPENDENCIES clause in add_hpx_module should never contain any modules from another module category
<hkaiser> that's something we should try to detect indeed
<ms[m]> exactly, that's why there's a separate MODULE_DEPENDENCIES in addition to DEPENDENCIES when creating modules, the former should only include modules from within the shared library that the module itself belongs to
<ms[m]> right, exactly that :D
<hkaiser> ok, I'll take a stab at this
<hkaiser> thanks for the explanations
<ms[m]> yep
<ms[m]> ok, nice, thank you!
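
A minimal sketch of the problem described in the exchange above, assuming illustrative file names and build commands (not HPX's actual CMake logic): a definition compiled into a static "module" library is copied into every shared library that links that archive, so a program loading two such shared libraries ends up with two definitions of the same entity, which is the ODR violation the sanitizer reports.

    // async_base.cpp: hypothetical stand-in for a module source file.
    //
    // Build sketch (illustrative commands only):
    //   g++ -c -fPIC async_base.cpp -o async_base.o
    //   ar rcs libhpx_async_base.a async_base.o
    //   g++ -shared -o libhpx_parallelism.so -Wl,--whole-archive libhpx_async_base.a -Wl,--no-whole-archive
    //   g++ -shared -o libhpx_full.so        -Wl,--whole-archive libhpx_async_base.a -Wl,--no-whole-archive
    //
    // Both shared libraries now carry their own copy of the definitions
    // below; a program that links or preloads both sees two definitions of
    // the same entities, which is what the ODR sanitizer flags.

    int async_base_counter = 0;       // object with external linkage

    int bump_async_base_counter()     // function with external linkage
    {
        return ++async_base_counter;
    }
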
<srinivasyadav227> anyone working on "Adapt parallel algorithms to C++20" (#4822)?
<hkaiser> srinivasyadav227: yes, gonidelis[m] is working on it, but there is a sufficient amount of work for more than one person
<ms[m]> freenode_srinivasyadav227[m]: gonidelis[m] is
<ms[m]> too slow...
<srinivasyadav227> hkaiser: ok, I will try to work on it! :-)
<hkaiser> srinivasyadav227: please coordinate with gonidelis[m]
<srinivasyadav227> hkaiser: ok
<hkaiser> srinivasyadav227: I am planning to add two more tickets (also as gsoc projects) that are related: disentangle the segmented algorithm implementations from the parallel algorithms, and work on support for the par_unseq execution policy (vectorize the parallel algorithms)
<hkaiser> ms[m]: I have another nitpick for 5142, sorry
<ms[m]> hkaiser: :D
<hkaiser> ... or not :/
<ms[m]> no worries, I appreciate it, so tell me!
<srinivasyadav227> hkaiser: yea cool!
<hkaiser> could this take a string const& ?
<hkaiser> just avoiding another allocation, possibly
<ms[m]> it could, yes
<ms[m]> yeah, that makes sense
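
For context, a minimal sketch of the suggestion above; the function name is hypothetical, not the one from #5142. Passing std::string by value copies (and typically heap-allocates) when the caller already holds a string, while a std::string const& parameter binds to it directly:

    #include <string>

    // hypothetical API under review
    void set_pool_name_by_value(std::string name) { (void) name; }   // copies for lvalue arguments
    void set_pool_name(std::string const& name) { (void) name; }     // binds directly, no copy

    int main()
    {
        std::string const name = "a-reasonably-long-pool-name";
        set_pool_name_by_value(name);   // makes a copy of 'name' here
        set_pool_name(name);            // no copy, no extra allocation
        return 0;
    }
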
<hkaiser> srinivasyadav227: see #5156 and #5157, I'll create the gsoc projects later today
<hkaiser> srinivasyadav227: I have also marked #3364 as a possible gsoc project
<zao> Preparing in good time for GSoC, eh? :)
<zao> I guess that it's actually soon.
<srinivasyadav227> hkaiser: yup, saw them just now, so #3364 is like the major release to C++20, and #5156, #5157, #4822 and others are parts of it, right?
<srinivasyadav227> zao: this season the work and timeline have been reduced, right 😀, so early prep helps me :-)
<zao> I have no idea, I'm blissfully far from it :)
<srinivasyadav227> ok :)
<jaafar> Out of curiosity - I see the HPX build system has stuff for asan - has anyone tried running ubsan?
<hkaiser> srinivasyadav227: yes, google has reduced the period of performance to 7 weeks, iirc - however we will be able to extend this for another 4 paid weeks using our funds
<srinivasyadav227> hkaiser: that's really nice!
<hkaiser> srinivasyadav227: I have added some language explaining this here: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-(GSoC)-2021
shahrzad has joined #ste||ar
<gnikunj[m]> hkaiser: I'm back. Apologies for the short notice. Yes, I'll join today's meeting. I'm currently on my way to run the prototype on hpx+cuda to see if it works without failures there as well.
<hkaiser> cool, thanks
<hkaiser> gnikunj[m]: please look over the email I sent
<gnikunj[m]> yes, reading it
<srinivasyadav227> hkaiser: yea so far its clear! thanks :)
<srinivasyadav227> Just curious! Can anyone attend the meeting? ... wanted to be a spectator as it's related to CUDA
<hkaiser> srinivasyadav227: that is an internal project meeting, sorry
<srinivasyadav227> ok, np :-)
<hkaiser> srinivasyadav227: if it were up to me, sure - but I can't just drop somebody on our clients
<srinivasyadav227> no no, its fine! 😀
<gnikunj[m]> hkaiser: the email is well explained. If the gpu code runs fine, we already have more than we anticipated ;). The 2nd link you provided is broken btw.
<hkaiser> urgs
<hkaiser> could you respond to all and fix the link, pls?
<gnikunj[m]> sure
<hkaiser> gnikunj[m]: works for me :/
<gnikunj[m]> are you sure? You have an 80-character line break btw. So the tors.hpp part is on the next line and is not part of the link.
<hkaiser> that's your email client breaking things
<gnikunj[m]> at least, I can't open the link on the email I received
<gnikunj[m]> could be cct to gmail conversion issue then.
<hkaiser> nod
<gnikunj[m]> should I write an email then? (or leave it as is?)
<hkaiser> just leave it, they will ask, if needed
<gnikunj[m]> sounds good.
<gnikunj[m]> hkaiser: ms: how do I provide the gpu architecture to cmake to build the cuda backend for hpx?
<gnikunj[m]> default architecture does not compile for me
<gnikunj[m]> default is sm_20 btw, which needs CUDA version 7 or 8 (the current version is 11.2)
<ms[m]> gnikunj: cmake -DCUDA_NVCC_FLAGS=-arch=sm_XX
<ms[m]> where is sm_20 the default?
<gnikunj[m]> sm_20 is what my cmake took as default so I thought that must be set as default
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<srinivasyadav227> gnikunj[m]: sm_20 supports CUDA 7 and later
<zao> Do current CUDA toolchains still support deprecated and obsolete models?
<srinivasyadav227> not later*, I think it's supported till 7
<zao> (I haven't had a GPU wired up to my VMs for a long while now)
<gnikunj[m]> zao: looks like it's a libdevice not found issue
<srinivasyadav227> zao: with later versions of CUDA, they don't support compiling for them
<gnikunj[m]> wait, I didn't see that it's an nvcc flag. ms: is there an equivalent flag for clang?
<gnikunj[m]> or do you insist on using nvcc for anything cuda?
<ms[m]> gnikunj: no, I prefer clang
<ms[m]> I think we have -DHPX_CUDA_CLANG_FLAGS=--cuda-gpu-arch=sm_XX
<gnikunj[m]> let me try it
<gnikunj[m]> HPX_WITH_CUDA_CLANG_FLAGS looks like it's working fine
<gnikunj[m]> ms: HPX_WITH_CUDA_CLANG_FLAGS isn't working either :/
<gnikunj[m]> now I'm getting weird C++ errors about things not being in the std namespace (possibly due to the omission of the CXX version)
<diehlpk_work> ms[m], yet?
<diehlpk_work> Have you experience with hpx-kokkos on Summit?
<diehlpk_work> I get the following error because kokkos cannot find some hpx headers. I assume this is related to a wrong HPX version, or?
<gnikunj[m]> hkaiser: ms zao could you please help me with the following make error? Is it something on my end or the build system? https://gist.github.com/NK-Nikunj/add34f1c8c5a944e6e24fcb8bd932fd0
<gnikunj[m]> diehlpk_work: what version of HPX are you using? Looks like an old version.
<diehlpk_work> gnikunj[m], one specific commit from last September
<gnikunj[m]> yeah, that's why. Use a newer version.
<gnikunj[m]> it's most likely due to the CPOs that we added lately (hkaiser will know more)
<diehlpk_work> Octo-Tiger can not use a newer version and it works for Gregor
<gnikunj[m]> not sure then. Btw it's not a CPO issue. parallel_execution_tag was previously in hpx::execution::parallel (which was later deprecated for hpx::execution). It could be due to that as well. Are you sure Gregor uses the same commit version?
<zao> gnikunj[m]: Oh, thought it was a problem with `make`, seems like it's just the CUDA compiler being upset as usual.
<gnikunj[m]> When will we have a cuda compiler that does what it's meant to do :/
<zao> Always chasing the C++ wavefront :)
<gonidelis[m]> hkaiser: gnikunj[m] is the meeting today or tomorrow? i am confused...
<gnikunj[m]> gonidelis[m]: tomorrow
<gonidelis[m]> cool
<hkaiser> gonidelis[m]: what meeting?
<gonidelis[m]> i read gnikunj[m] referencing a meeting today but i only have an email for tomorrow
<hkaiser> gonidelis[m]: ahh, tomorrow is the group meeting, yes
<gonidelis[m]> hkaiser: ok. one more thing. just read your tickets. if we completely disentangle the segmented algos from the parallel ones, does that mean that we completely remove the underscore dispatching calls?
<hkaiser> yes
<hkaiser> replacing those with tag_invoke overloads
<gonidelis[m]> hmm... ok. so in such a case there are absolutely zero dependencies between segmented and parallel algos
<hkaiser> correct
<gonidelis[m]> and no harm is done if i change the result type of the parallel overload for example ;p
<hkaiser> except that the segmented ones rely on the local ones, but not vice versa
<gonidelis[m]> ok ok
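
To make the tag_invoke remark above more concrete, here is a rough, hypothetical sketch of that dispatch scheme; the names and the segmented-iterator trait are made up, this is not HPX's actual code. The algorithm is a customization point object, the local/parallel implementation is one tag_invoke overload, and the segmented implementation is a second, constrained overload living in its own header, so the underscore dispatching helpers are no longer needed and the dependency stays one-directional (segmented calls local, never the reverse):

    #include <type_traits>
    #include <utility>

    namespace sketch {

        // made-up trait: true for "segmented" iterators
        template <typename It>
        struct is_segmented_iterator : std::false_type {};

        // the customization point object
        struct my_for_each_t
        {
            template <typename ExPolicy, typename It, typename F>
            auto operator()(ExPolicy&& policy, It first, It last, F&& f) const
            {
                // picks the best matching tag_invoke overload via ADL
                return tag_invoke(*this, std::forward<ExPolicy>(policy),
                    first, last, std::forward<F>(f));
            }
        };
        inline constexpr my_for_each_t my_for_each{};

        // local/parallel implementation: generic overload
        template <typename ExPolicy, typename It, typename F,
            std::enable_if_t<!is_segmented_iterator<It>::value, int> = 0>
        It tag_invoke(my_for_each_t, ExPolicy&&, It first, It last, F&& f)
        {
            for (; first != last; ++first)
                f(*first);
            return first;
        }

        // segmented implementation: lives in a separate header and is only
        // viable for segmented iterators; it would walk the segments and
        // invoke my_for_each on each local part, so the dependency is
        // strictly segmented -> local
        template <typename ExPolicy, typename It, typename F,
            std::enable_if_t<is_segmented_iterator<It>::value, int> = 0>
        It tag_invoke(my_for_each_t, ExPolicy&&, It first, It /*last*/, F&&)
        {
            // ... per-segment dispatch elided ...
            return first;
        }

    }   // namespace sketch
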
<gonidelis[m]> hkaiser: i saw you merged the two gsoc projects we were talking about
<gonidelis[m]> we mention "This project is different from the project Parallel Algorithms and Ranges" but we don't have a "Parallel Algorithms and Ranges" project any more ;p
<gonidelis[m]> is there any way for me to make suggestions? or do i just edit the thing?
<hkaiser> just edit the thing
<gonidelis[m]> hkaiser: thanks ;)
<hkaiser> gonidelis[m]: thank *you*
<zao> gnikunj[m]: What CUDA version was that with? I'm not having any failures with HPX master, CUDA 11.1.1, GCC 10.2.0, Boost 1.74.0
<gnikunj[m]> I'm using clang
<gnikunj[m]> not gcc
<gnikunj[m]> cuda 11.2
<gnikunj[m]> boost 1.73
<zao> Your cmake output said 1.75 :)
<zao> What Clang too?
<diehlpk_work> Having a thorough and well thought out list of Project Ideas is the most important part of your application.
<diehlpk_work> So please, everyone, help to get our project ideas well organized
<diehlpk_work> shahrzad, hkaiser, ms[m] gnikunj[m] gonidelis[m]
<gonidelis[m]> diehlpk_work: it looks pretty good actually. The algorithm side looks pretty stacked to me. Idk, maybe being more explicit about the prerequisites and how users could work their way up to the minimum level for every project could help. But then the page might get a bit chaotic...
<diehlpk_work> gonidelis[m], Thanks
<zao> gnikunj[m]: /eb/software/Boost/1.75.0-GCC-10.2.0/include/boost/move/detail/type_traits.hpp(884): error: data member initializer is not allowed
<zao> Weeee.
<zao> Had to build CUDA 11.2.1 to get clang 11 support to repro it.
<zao> You're kind of on the bleeding edge of support as 11.1.1 doesn't do that compiler
<zao> CUDA 11.2.1 compiles fine with the same setup but with the underlying GCC 10.2.0
* zao invents sleep