hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
weilewei has quit [Remote host closed the connection]
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
adi_ahj has joined #ste||ar
hkaiser has quit [Quit: bye]
adi_ahj has left #ste||ar [#ste||ar]
nikunj has joined #ste||ar
tarzeau has quit [*.net *.split]
Vir has quit [Ping timeout: 268 seconds]
Vir has joined #ste||ar
tarzeau has joined #ste||ar
adi_ahj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
adi_ahj has quit [Ping timeout: 268 seconds]
ritvik has joined #ste||ar
ritvik has quit [Remote host closed the connection]
<mdiers_> do any of you have experience with a 16-gpu node with nvidia and opencl?
<tarzeau> mdiers_: i thought 10 gpus is the maximum per node, but wow. i've only had 8gpu nodes so far (nvidia only, no opencl)
<tarzeau> (and also only with the 11/12 gb mem per gpu, there's some with more memory)
<mdiers_> tarzeau: yes, but 8x k80 -> 16 gpus
<zao> K80s and their compute engine stuff, sole cause of like half of our support questions about CUDA devices :D
<mdiers_> tarzeau: i have a scaling problem here; it's independent of memory size and rather depends on the opencl api calls and smaller problem sizes.
<mdiers_> zao: interesting to hear ;-)
<tarzeau> zao: we also have/had a lot of support questions about CUDA and software
<tarzeau> mdiers_: what opencl software is that exactly? don't know any opencl users here...
<zao> I wouldn't know without consulting documentation how many devices I would get with --gres=gpu:k80:1, probably two.
<tarzeau> simbergm: i'd still like to have some test software (not the hpx demos) that links against/uses libhpx, to test whether my hacked-up hpx debian package works at all (or you can test it?). trying an ubuntu 18.04 backport failed for some reason; it should work with 20.04 (not released yet) or debian sid
<tarzeau> zao: i know that amd epyc boards only support two nvidia gpus, and intel up to 8 (but yeah i'm behind then)
<tarzeau> and our multi-gpu machines are used with one user or one job per gpu; i haven't seen users use multiple gpus in one job yet
<tarzeau> (however i've seen that hpx supports cuda (cmake options), so i'll try that next)
<heller> DGX-2 has 16 GPUs, FWIW
<heller> V100 that is
<mdiers_> tarzeau: i use one process and distribute one job/task per gpu. that's where my question about binding tasks to threads comes from.
<mdiers_> thank you very much, now i have an idea that i can check out again.
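(A rough sketch of that "one process, one HPX task per GPU" pattern, not mdiers_'s actual code; do_work_on_device() is a hypothetical stand-in for the per-device OpenCL setup and kernel launches:)

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>

    #include <vector>

    // hypothetical: select one OpenCL device, create its context/queue and
    // enqueue the kernels for that GPU
    void do_work_on_device(int device)
    {
        (void) device;
    }

    int main()
    {
        int const device_count = 16;    // e.g. 8x K80 -> 16 logical GPUs
        std::vector<hpx::future<void>> tasks;
        for (int d = 0; d != device_count; ++d)
            tasks.push_back(hpx::async(do_work_on_device, d));    // one task per GPU
        hpx::wait_all(tasks);
        return 0;
    }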
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 258 seconds]
mdiers_1 has joined #ste||ar
mdiers_1 has quit [Read error: Connection reset by peer]
mdiers_ has quit [Ping timeout: 265 seconds]
mdiers_ has joined #ste||ar
adi_ahj has joined #ste||ar
hkaiser has joined #ste||ar
<jbjnr> I have fixed function annotations for dataflow_frame and for packaged_continuation - I think I deserve a new T-shirt for that.
<hkaiser> jbjnr: absolutely!
<hkaiser> what's your size?
<jbjnr> XL lol
<jbjnr> I like baggy!
<hkaiser> also, send me your address
<jbjnr> I wasn't serious.
<hkaiser> I was
<jbjnr> I just wanted everyone to know I did something worthwhile for once
<jbjnr> apex is still shagged though
<hkaiser> let's talk to Kevin
<jbjnr> I wonder if it actually works on multinode even with normal networking on
<jbjnr> the task counts are also wrong in the apex summary. I'll add a note to my issue about it
<hkaiser> jbjnr: I wouldn't know why not, it has no notion of running in distributed
<hkaiser> jaafar: task counts are tricky as apex uses tasks to collect task counts
<hkaiser> jbjnr: ^^
<hkaiser> sorry jaafar
<jbjnr> I guessed it would
<jbjnr> it displays the counts from rank 0, but I am surprised that most stuff is missing from vampir since the node 0 summary looks ok
<hkaiser> everything is rank 0 for apex
<hkaiser> each locality is its own rank 0 in a sense
<jbjnr> it passes rank stuff through to otf2 though
<hkaiser> but that's something to work on
<hkaiser> ahh ok
<jbjnr> otf2 knows it is an mpi job
<jbjnr> kevin will hopefully know what's up
<hkaiser> send him email
<jbjnr> I thought he'd see the github issues
<jbjnr> (via email)
<hkaiser> he might ;-)
<hkaiser> I usually tell him in an email to look at github
<jbjnr> will do
<hkaiser> jbjnr: also, somebody on hpx-users was asking about dataflow support for annotated_function
<hkaiser> if this is fixed now, could you ask him to test your changes?
<jbjnr> pushed a PR just now
<jbjnr> ah hpx-users. I'll look
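(A minimal sketch of dataflow with an annotated function, along the lines of what jbjnr's fix enables; header paths are as of roughly HPX 1.4 and the "my_task" name is illustrative:)

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/util/annotated_function.hpp>

    #include <utility>

    int main()
    {
        hpx::future<int> f = hpx::make_ready_future(41);

        // the annotation should now be carried through dataflow_frame, so the
        // continuation shows up under "my_task" in APEX/Vampir traces
        hpx::future<int> g = hpx::dataflow(
            hpx::util::annotated_function(
                [](hpx::future<int> v) { return v.get() + 1; }, "my_task"),
            std::move(f));

        return g.get() == 42 ? 0 : 1;
    }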
<hkaiser> simbergm: meeting now?
<simbergm> hkaiser: yep
<simbergm> jbjnr, heller: meeting
<jbjnr> ok. coming now, 1 minute
<jbjnr> simbergm: please send me the zoom room number on slack. thanks. sorry, every time I lose the number and can't find it
<jbjnr> ooh, must be in my emails
<simbergm> or 8093809300...121212@zoomcrc.com
<simbergm> well, without the 121212
<heller> Be there in a few minutes
adi_ahj has quit [Quit: adi_ahj]
kordejong has quit [Quit: WeeChat 2.7]
hkaiser has quit [Quit: bye]
adi_ahj has joined #ste||ar
adi_ahj has quit [Quit: adi_ahj]
hkaiser has joined #ste||ar
<diehlpk_work> simbergm, heller jbjnr hkaiser I invited you for GSoC 2020, please join and become my co-organizers
<hkaiser> diehlpk_work: thanks, I did register just now
<diehlpk_work> yeah, we finished 33% of the application because we have two org admins :)
<diehlpk_work> simbergm, I added this year's questions to the end of the wiki page.
<diehlpk_work> There is only one new one and this one is easy to answer
hkaiser has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
<simbergm> diehlpk_work: thanks, I'll try to look at it over the weekend
<diehlpk_work> simbergm, which is the matrix channel for hpx?
<simbergm> you joined the right one
nikunj has joined #ste||ar
<jaafar> lol
<jaafar> nobody needs to apologize for accidentally mentioning me
<jaafar> I'm just a lurker/fanboi
<hkaiser> jaafar: ;-)
nikunj has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
weilewei has joined #ste||ar
<weilewei> when trying to build tests against an existing hpx installed locally by spack, I get this error:
<weilewei> [ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.o
<weilewei> In file included from /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8: /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error: hpx/include/components.hpp: No such file or directory  #include <hpx/include/components.hpp>
<weilewei> why is this header file hpx/include/components.hpp somehow not generated?
<hkaiser> no, it's not generated
<weilewei> yea, it is not installed in include/hpx/
<hkaiser> is anything installed into hpx/include at all?
<weilewei> apply.hpp config.hpp hpx.hpp hpx_suspend.hpp preprocessor sync_launch_policy_dispatch.hpp async.hpp dataflow.hpp hpx_finalize.hpp hpx_user_main_config.hpp preprocessor.hpp throw_exception.hpp async_launch_policy_dispatch.hpp error.hpp
<weilewei> hpx_init.hpp include runtime traits compat error_code.hpp hpx_init_impl.hpp lcos runtime.hpp util components exception.hpp hpx_main.hpp lcos_fwd.hpp runtime_fwd.hpp util_fwd.hpp components_fwd.hpp
<weilewei> exception_fwd.hpp hpx_main_impl.hpp parallel runtime_impl.hpp version.hpp compute exception_info.hpp hpx_start.hpp performance_counters state.hpp config exception_list.hpp hpx_start_impl.hpp plugins sync.hpp
<hkaiser> that's a different directory
<weilewei> ah, nothing there, and I did not copy these lines to my own cmakelists.txt
<hkaiser> so hpx/include/* should end up in $HPX_INSTALL_ROOT/hpx/include
<hkaiser> well in order to install HPX you shouldn't need to change the cmake files
<hkaiser> or even $HPX_INSTALL_ROOT/include/hpx/include
<weilewei> right, so I should not touch the https://github.com/STEllAR-GROUP/hpx/blob/master/CMakeLists.txt this file to build tests?
<hkaiser> why should you?
<weilewei> I'm probably thinking about it the wrong way.. let me check
<hkaiser> you build and install hpx from one directory, then you create your cmake file and build the tests again, this time using the installed version
<weilewei> I did not touch the CMakeLists.txt for the first time though, I am editing it for the second time
<hkaiser> why should you install things in that case?
<hkaiser> it should find the headers from the installation you performed first
<weilewei> Yes, that's weird, for the second time, I do not need to install tests at all
<weilewei> the components.hpp is in the dir of /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/include/hpx/include
<weilewei> but the second time did not find it
<hkaiser> well, then your find_package(HPX) didn't find the proper base directory, not sure why
hkaiser has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
<hkaiser> weilewei: back here
<weilewei> hkaiser ok
<weilewei> the cmd line is this
<weilewei> [ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.o
<weilewei> cd /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/build/tests/regressions/build &&
<weilewei> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-8.1.1/spectrum-mpi-10.3.0.1-20190611-55ygkz53evhcwy3txeis32gc3kzu7wy6/bin/mpic++ -DBOOST_ALL_NO_LIB -Dtest_server_1950_EXPORTS -I/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests -isystem
<weilewei> /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/boost-1.70.0-vqtwnz6pc4meb7xhrvcsj3l5zusdh6nt/include -isystem /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/include/hpx -fPIC -std=c++14 -o
<weilewei> CMakeFiles/test_server_1950.dir/server_1950.cpp.o -c /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp
<weilewei> In file included from /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8: /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error: hpx/include/components.hpp: No such file or directory  #include <hpx/include/components.hpp>
<hkaiser> please don't post this here, link it somewhere
<hkaiser> so what is the used base dir?
<weilewei> when I do cmake the second time to build tests, I pass -DHPX_DIR=/gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/lib64/cmake/HPX/
<hkaiser> what dirs are listed in the HPXConfig.cmake in that directory?
<weilewei> you mean the content of HPXConfig.cmake?
<hkaiser> yes
<hkaiser> the first INCLUDE_DIR should be the correct one, is that on the generated command lines?
<hkaiser> anyways, gtg - sorry
<weilewei> the second one goes one level deeper
<hkaiser> that is needed as well
<weilewei> hkaiser no worries, thanks
<weilewei> Ok, let me think about how I should pass that one to the command lines
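(For reference, a minimal downstream CMakeLists.txt sketch for building a test against an installed HPX, along the lines of what hkaiser describes; the project and target names here are just illustrative, not weilewei's actual files:)

    cmake_minimum_required(VERSION 3.10)
    project(hpx_module_test CXX)

    # point CMake at the installation, e.g.
    #   cmake -DHPX_DIR=<hpx-install-prefix>/lib64/cmake/HPX ..
    find_package(HPX REQUIRED)

    add_executable(test_server_1950 server_1950.cpp)

    # pulls in HPX's include directories (both <prefix>/include and
    # <prefix>/include/hpx, as listed in HPXConfig.cmake) and the
    # libraries to link against
    hpx_setup_target(test_server_1950)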
hkaiser has quit [Quit: bye]