hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
weilewei has quit [Remote host closed the connection]
nikunj has quit [Read error: Connection reset by peer]
adi_ahj has quit [Ping timeout: 268 seconds]
ritvik has joined #ste||ar
ritvik has quit [Remote host closed the connection]
<mdiers_>
do any of you have experience with a 16x-gpu node with nvidia and opencl?
<tarzeau>
mdiers_: i thought 10 gpus is the maximum per node, but wow. i've only had 8gpu nodes so far (nvidia only, no opencl)
<tarzeau>
(and also only with the 11/12 gb mem per gpu, there's some with more memory)
<mdiers_>
tarzeau: yes, but 8x k80 -> 16 gpus
<zao>
K80s and their compute engine stuff, sole cause of like half of our support questions about CUDA devices :D
<mdiers_>
tarzeau: i have a scaling problem here: it's independent of memory size, but rather depends on the opencl api calls and smaller problem sizes.
<mdiers_>
zao: interesting to hear ;-)
<tarzeau>
zao: we also have/had a lot of support questions about CUDA and software
<tarzeau>
mdiers_: what opencl software is that exactly? don't know any opencl users here...
<zao>
I wouldn't know without consulting documentation how many devices I would get with --gres=gpu:k80:1, probably two.
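For context, a hedged sketch of what checking that might look like; the actual device count per gres unit depends on the site's gres.conf, and whether a K80 (two GK210 dies on one board) is enumerated per-board or per-die is a site-level assumption here:

```shell
# Request one K80 gres unit and list the CUDA devices the job sees.
# Whether this prints one or two devices depends on how gres.conf
# enumerates the K80's two dies (assumption: per-die enumeration).
srun --gres=gpu:k80:1 nvidia-smi -L
```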
<tarzeau>
simbergm: i'd still like to have some test software (not the hpx demos) that links against/uses libhpx, to test whether my hacked-together hpx debian package works at all (or you could test it?). trying an ubuntu 18.04 backport failed for some reason; it should work with 20.04 (not released yet) or debian sid
<tarzeau>
zao: i know that amd epyc boards only support two nvidia gpus, and intel up to 8 (but yeah i'm behind then)
<tarzeau>
and our multi-gpu machines are used one user/or one job per gpu, i haven't seen users use multiple gpus with one job yet
<tarzeau>
(however i've seen hpx supports (cmake options) cuda, so i'll try that next)
<heller>
DGX-2 has 16 GPUs, FWIW
<heller>
V100 that is
<mdiers_>
tarzeau: i use one process and distribute one job/task per gpu. hence my question about task-to-thread binding.
<mdiers_>
thank you very much, now i have an idea that i can check out again.
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 258 seconds]
mdiers_1 has joined #ste||ar
mdiers_1 has quit [Read error: Connection reset by peer]
mdiers_ has quit [Ping timeout: 265 seconds]
mdiers_ has joined #ste||ar
adi_ahj has joined #ste||ar
hkaiser has joined #ste||ar
<jbjnr>
I have fixed function annotations for dataflow_frame and for packaged_continuation - I think I deserve a new T-shirt for that.
<hkaiser>
jbjnr: absolutely!
<hkaiser>
what's your size?
<jbjnr>
XL lol
<jbjnr>
I like baggy!
<hkaiser>
also, send me your address
<jbjnr>
I wasn't serious.
<hkaiser>
I was
<jbjnr>
I just wanted everyone to know I did something worthwhile for once
<jbjnr>
apex is still shagged though
<hkaiser>
let's talk to Kevin
<jbjnr>
I wonder if it actually works on multinode even with normal networking on
<jbjnr>
the task counts are also wrong in the apex summary. I'll add a note to my issue about it
<hkaiser>
jbjnr: I wouldn't know why not, it has no notion of running in distributed
<hkaiser>
jaafar: task counts are tricky as apex uses tasks to collect task counts
<hkaiser>
jbjnr: ^^
<hkaiser>
sorry jaafar
<jbjnr>
I guessed it would
<jbjnr>
it displays the counts from rank 0, but I am surprised that most stuff is missing from vampir since the node 0 summary looks ok
<hkaiser>
everything is rank 0 for apex
<hkaiser>
each locality is its own rank 0 in a sense
<jbjnr>
it passes rank stuff through to otf2 though
<hkaiser>
but that's something to work on
<hkaiser>
ahh ok
<jbjnr>
otf2 knows it is an mpi job
<jbjnr>
kevin will hopefully know what's up
<hkaiser>
send him email
<jbjnr>
I thought he'd see the github issues
<jbjnr>
(via email)
<hkaiser>
he might ;-)
<hkaiser>
I usually tell him in an email to look at github
<jbjnr>
will do
<hkaiser>
jbjnr: also, somebody on hpx-users was asking for the dataflow support of annotated_function
<hkaiser>
if this is fixed now, could you ask him to test your changes?
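A hypothetical minimal sketch of the usage being discussed (not the actual PR): wrapping the callable passed to hpx::dataflow in an annotated_function so the continuation shows up under a readable name in APEX/profiler output. Header paths and the annotated_function namespace vary between HPX versions; this follows the HPX 1.3/1.4-era layout.

```cpp
// Sketch only: assumes an installed HPX with APEX-era headers.
#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/util.hpp>

#include <utility>

int main()
{
    hpx::future<int> a = hpx::make_ready_future(1);
    hpx::future<int> b = hpx::make_ready_future(2);

    // The fix under discussion is about dataflow propagating this
    // annotation to the task it spawns.
    auto sum = hpx::dataflow(
        hpx::util::annotated_function(
            [](hpx::future<int> x, hpx::future<int> y) {
                return x.get() + y.get();
            },
            "annotated_sum"),  // label visible in task profilers
        std::move(a), std::move(b));

    return sum.get() == 3 ? 0 : 1;
}
```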
<jbjnr>
pushed a PR just now
<jbjnr>
ah hpx-users. I'll look
<hkaiser>
simbergm: meeting now?
<simbergm>
hkaiser: yep
<simbergm>
jbjnr: heller meeting
<jbjnr>
ok. coming now, 1 minute
<jbjnr>
simbergm: please send me the zoom room number on slack. thanks. sorry, every time I lose the number and can't find it
<diehlpk_work>
simbergm, heller jbjnr hkaiser I invited you for GSoC 2020, please join and become my co-organizers
<hkaiser>
diehlpk_work: thanks, I did register just now
<diehlpk_work>
yeah, we finished 33% of the application because we have two org admins :)
<diehlpk_work>
simbergm, I added this year's questions to the end of the wiki page.
<diehlpk_work>
There is only one new one and this one is easy to answer
hkaiser has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
<simbergm>
diehlpk_work: thanks, I'll try to look at it over the weekend
<diehlpk_work>
simbergm, which is the matrix channel for hpx?
<simbergm>
you joined the right one
nikunj has joined #ste||ar
<jaafar>
lol
<jaafar>
nobody needs to apologize for accidentally mentioning me
<jaafar>
I'm just a lurker/fanboi
<hkaiser>
jaafar: ;-)
nikunj has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
weilewei has joined #ste||ar
<weilewei>
when I am trying to build tests against existing hpx installed by spack locally, I got this error:
<weilewei>
[ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.o
In file included from /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8:
/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error:
<weilewei>
hpx/include/components.hpp: No such file or directory
 #include <hpx/include/components.hpp>
<weilewei>
why is this header file hpx/include/components.hpp not found somehow?
<weilewei>
I think I'm doing it the wrong way maybe.. let me check
<hkaiser>
you build and install hpx from one directory, then you create your cmake file and build the tests again, this time using the installed version
<weilewei>
I did not touch the CMakeLists.txt for the first time though, I am editing it for the second time
<hkaiser>
why should you install things in that case?
<hkaiser>
it should find the headers from the installation you performed first
<weilewei>
Yes, that's weird, for the second time, I do not need to install tests at all
<weilewei>
the components.hpp is in the dir of /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/include/hpx/include
<weilewei>
but the second time did not find it
<hkaiser>
well, then your find_package(HPX) didn't find the proper base directory, not sure why
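For reference, a minimal downstream CMakeLists.txt sketch for building against an installed HPX, under the assumption of an HPX 1.3-era install; the project/target names are illustrative, and the exact helper provided by HPXConfig.cmake may differ between HPX versions:

```cmake
cmake_minimum_required(VERSION 3.13)
project(hpx_regression_tests CXX)

# Either pass HPX_DIR on the command line, e.g.
#   cmake -DHPX_DIR=<spack-prefix>/lib64/cmake/HPX ..
# or point CMAKE_PREFIX_PATH at the spack install prefix itself.
find_package(HPX REQUIRED)

add_executable(test_server_1950 server_1950.cpp)

# hpx_setup_target attaches HPX's include directories and link
# libraries to the target (helper name per the HPX CMake docs of
# that era; newer versions also offer imported targets).
hpx_setup_target(test_server_1950)
```

If find_package succeeds but headers like hpx/include/components.hpp are still not found, the include directories recorded in HPXConfig.cmake are the first thing to check, which is where the conversation goes next.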
hkaiser has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
<hkaiser>
weilewei: back here
<weilewei>
hkaiser ok
<weilewei>
the cmd line is this
<weilewei>
[ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.o
cd /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/build/tests/regressions/build &&
<weilewei>
CMakeFiles/test_server_1950.dir/server_1950.cpp.o -c /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp
In file included from
<weilewei>
/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8:
/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error: hpx/include/components.hpp: No such file or directory
 #include <hpx/include/components.hpp>
<hkaiser>
please don't post this here, link it somewhere
<hkaiser>
so what is the used base dir?
<weilewei>
when I do cmake the second time to build tests, I pass -DHPX_DIR=/gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/lib64/cmake/HPX/
<hkaiser>
what dirs are listed in the HPXConfig.cmake in that directory?
<weilewei>
you mean the content of HPXConfig.cmake?