#ste||ar on 2020-01-17 — irc logs at irclog.cct.lsu.edu

2019-12-03 02:04 hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:43 weilewei has quit [Remote host closed the connection]

01:14 parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

01:19 parsa has joined #ste||ar

02:37 adi_ahj has joined #ste||ar

02:49 hkaiser has quit [Quit: bye]

02:50 adi_ahj has left #ste||ar [#ste||ar]

05:29 nikunj has joined #ste||ar

07:18 tarzeau has quit [*.net *.split]

07:21 Vir has quit [Ping timeout: 268 seconds]

07:22 Vir has joined #ste||ar

08:05 tarzeau has joined #ste||ar

08:24 adi_ahj has joined #ste||ar

08:24 nikunj has quit [Read error: Connection reset by peer]

09:05 adi_ahj has quit [Ping timeout: 268 seconds]

09:08 ritvik has joined #ste||ar

09:26 ritvik has quit [Remote host closed the connection]

09:28 <mdiers_> does anyone of you have experience with 16x-gpu-node with nvidia and opencl?

09:29 <tarzeau> mdiers_: i thought 10 gpus is the maximum per node, but wow. i've only had 8gpu nodes so far (nvidia only, no opencl)

09:29 <tarzeau> (and also only with the 11/12 gb mem per gpu, there's some with more memory)

09:30 <mdiers_> tarzeau: yes, but 8x k80 -> 16 gpus

09:33 <zao> K80s and their compute engine stuff, sole cause of like half of our support questions about CUDA devices :D

09:34 <mdiers_> tarzeau: i have a scaling problem here, memory size independent, rather dependent on the opencl api calls and smaller problem sizes.

09:35 <mdiers_> zao: interesting to hear ;-)

09:36 <tarzeau> zao: we also have/had a lot of support questions about CUDA and software

09:37 <tarzeau> mdiers_: what opencl software is that exactly? don't know any opencl users here...

09:37 <zao> I wouldn't know without consulting documentation how many devices I would get with --gres=gpu:k80:1, probably two.

09:38 <tarzeau> simbergm: i'd still like to have some test software (not the hpx demos) that link/use libhpx to test my raped hpx debian package, if it works at all (or you can test it?) however trying a ubuntu 18.04 backport failed for some reason, should work with 20.04 (not released yet) or debian sid

09:39 <tarzeau> zao: i know that amd epyc boards only support two nvidia gpus, and intel up to 8 (but yeah i'm behind then)

09:39 <tarzeau> and our multi-gpu machines are used one user/or one job per gpu, i haven't seen users use multiple gpus with one job yet

09:40 <tarzeau> (however i've seen hpx supports (cmake options) cuda, so i'll try that next)

09:57 <heller> DGX-2 has 16 GPUs, FWIW

09:59 <heller> V100 that is

10:15 <mdiers_> tarzeau: i use one process and distribute on job/task per gpu. so here comes my request for the task to thread bound.

10:16 <mdiers_> thank you very much, now i have an idea that i can check out again.

10:48 nikunj has joined #ste||ar

12:58 nikunj has quit [Ping timeout: 258 seconds]

13:12 mdiers_1 has joined #ste||ar

13:12 mdiers_1 has quit [Read error: Connection reset by peer]

13:13 mdiers_ has quit [Ping timeout: 265 seconds]

13:14 mdiers_ has joined #ste||ar

13:22 adi_ahj has joined #ste||ar

13:24 hkaiser has joined #ste||ar

14:30 <jbjnr> I have fixed functioan annotations for dataflow_frame and for packaged_continuation - I think I deserve a new T-shirt for that.

14:33 <hkaiser> jbjnr: absolutely!

14:33 <hkaiser> what's your size?

14:34 <jbjnr> XL lol

14:34 <jbjnr> I like baggy!

14:34 <hkaiser> also, send me your address

14:34 <jbjnr> I wasn't serious.

14:34 <hkaiser> I was

14:34 <jbjnr> I just wanted everypone to know I did something worthwhile for once

14:34 <jbjnr> apex is still shagged though

14:35 <hkaiser> let's talk to Kevin

14:35 <jbjnr> I woder if it actually works on multinode even with normal networking on

14:35 <jbjnr> ^wonder

14:35 <jbjnr> the task counts are also wrong in the apex summary. I'll add a note to my issue about it

14:35 <hkaiser> jbjnr: I wouldn't know why not, it has no notion of running in distributed

14:36 <hkaiser> jaafar: task counts are tricky as apex uses tasks to collect task counts

14:36 <hkaiser> jbjnr: ^^

14:36 <hkaiser> sorry jaafar

14:36 <jbjnr> I guessed it would

14:37 <jbjnr> it displays the counts from rano 0, but I am surprised that most stuff is missing from vampir since the node 0 summary looks ok

14:37 <hkaiser> everything is rank 0 for apex

14:37 <hkaiser> each locality is its own rank 0 in a sense

14:38 <jbjnr> it passes rank stuf fthrough to otf2 though

14:38 <hkaiser> but that's something to work on

14:38 <hkaiser> ahh ok

14:38 <jbjnr> otf2 knows it is an mpi job

14:38 <jbjnr> kevin will hopefully know what's up

14:38 <hkaiser> send him email

14:38 <jbjnr> I thought he'd see the github issues

14:39 <jbjnr> (via email)

14:39 <hkaiser> he might ;-)

14:40 <hkaiser> I usually tell him in an email to look at github

14:42 <jbjnr> will do

14:52 <hkaiser> jbjnr: also, somebody on hpx-users was asking for the dataflow support of annotated_function

14:53 <hkaiser> if this is fixed now, could you ask him to test your changes?

14:53 <jbjnr> push a PR jsut now

14:53 <jbjnr> ah hpx-users. I'll look

15:00 <hkaiser> simbergm: meeting now?

15:00 <simbergm> hkaiser: yep

15:00 <simbergm> jbjnr: hellermeeting

15:01 <jbjnr> ok. coming now, 1 minute

15:05 <jbjnr> simbergm: please send me zoom room num on slack. thanks. sorry every time, I lose the nunmber and can't fid it

15:06 <jbjnr> oog. must be in emaisl

15:09 <simbergm> jbjnr: https://cscs.zoom.us/my/aquarium

15:09 <simbergm> or 8093809300...121212@zoomcrc.com

15:09 <simbergm> well, without the 121212

15:10 <heller> Be there in a few minutes

15:54 adi_ahj has quit [Quit: adi_ahj]

16:02 kordejong has quit [Quit: WeeChat 2.7]

16:20 hkaiser has quit [Quit: bye]

16:26 adi_ahj has joined #ste||ar

16:44 adi_ahj has quit [Quit: adi_ahj]

16:52 hkaiser has joined #ste||ar

17:15 <diehlpk_work> simbergm, heller jbjnr hkaiser I invited you for GSoC 2020, please join and become my co-organizers

17:18 <hkaiser> diehlpk_work: thanks, I did register just now

17:19 <diehlpk_work> yeah, we finished 33% of the application because we have two org admins :)

17:21 <diehlpk_work> simbergm, I added this year's questions to the end of wiki page.

17:22 <diehlpk_work> There is only one new one and this one is easy to answer

17:24 hkaiser has quit [Ping timeout: 245 seconds]

17:24 hkaiser has joined #ste||ar

17:38 <simbergm> diehlpk_work: thanks, I'll try to look at it over the weekend

17:39 <diehlpk_work> simbergm, which is the matrix channel for hpx?

17:40 <simbergm> you joined the right one

17:50 nikunj has joined #ste||ar

18:38 <jaafar> lol

18:38 <jaafar> nobody needs to apologize for accidentally mentioning me

18:39 <jaafar> I'm just a lurker/fanboi

18:39 <hkaiser> jaafar: ;-)

18:58 nikunj has quit [Quit: Leaving]

19:56 hkaiser has quit [Quit: bye]

20:42 hkaiser has joined #ste||ar

21:03 weilewei has joined #ste||ar

21:59 <weilewei> when I am trying to build tests against existing hpx installed by spack locally, I got this error:

21:59 <weilewei> [ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.oIn file included from /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8:/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error:

21:59 <weilewei> hpx/include/components.hpp: No such file or directory #include <hpx/include/components.hpp>

21:59 <weilewei> why this header file hpx/include/components.hpp> is not generated somehow?

22:00 <hkaiser> no, it's not generated

22:00 <hkaiser> it should get installed from here: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/include/components.hpp

22:01 <weilewei> yea, it is not installed in include/hpx/

22:02 <hkaiser> is anything installed into hpx/include at all?

22:02 <weilewei> apply.hpp config.hpp hpx.hpp hpx_suspend.hpp preprocessor sync_launch_policy_dispatch.hppasync.hpp dataflow.hpp hpx_finalize.hpp hpx_user_main_config.hpp preprocessor.hpp throw_exception.hppasync_launch_policy_dispatch.hpp error.hpp

22:02 <weilewei> hpx_init.hpp include runtime traitscompat error_code.hpp hpx_init_impl.hpp lcos runtime.hpp utilcomponents exception.hpp hpx_main.hpp lcos_fwd.hpp runtime_fwd.hpp util_fwd.hppcomponents_fwd.hpp

22:02 <weilewei> exception_fwd.hpp hpx_main_impl.hpp parallel runtime_impl.hpp version.hppcompute exception_info.hpp hpx_start.hpp performance_counters state.hppconfig exception_list.hpp hpx_start_impl.hpp plugins sync.hpp

22:02 <hkaiser> that's a different directory

22:03 <hkaiser> see https://github.com/STEllAR-GROUP/hpx/blob/master/CMakeLists.txt#L1830-L1837

22:04 <weilewei> ah, nothing there, and I did not copy these lines to my own cmakelists.txt

22:04 <hkaiser> so hpx/include/* should end up in $HPX_INSTALL_ROOT/hpx/include

22:04 <hkaiser> well in order to install HPX you shouldn't need to change the cmake files

22:05 <hkaiser> or even $HPX_INSTALL_ROOT/include/hpx/include

22:05 <weilewei> right, so I should not touch the https://github.com/STEllAR-GROUP/hpx/blob/master/CMakeLists.txt this file to build tests?

22:05 <hkaiser> why should you?

22:06 <weilewei> I think it in a wrong way maybe.. let me check

22:06 <hkaiser> you build and install hpx from one directory, then you create your cmake file and build the tests again, this time using the installed version

22:08 <weilewei> I did not touch the CMakeLists.txt for the first time though, I am editing it for the second time

22:13 <hkaiser> why should you install things in that case?

22:13 <hkaiser> it should find the headers from the installation you performed first

22:14 <weilewei> Yes, that's weird, for the second time, I do not need to install tests at all

22:15 <weilewei> the components.hpp is in the dir of /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/include/hpx/include

22:16 <weilewei> but the second time did not find it

22:17 <hkaiser> well, then your find_package(HPX) didn't find the proper base directory, not sure why

22:22 hkaiser has quit [Ping timeout: 260 seconds]

22:25 hkaiser has joined #ste||ar

22:26 <hkaiser> weilewei: back here

22:26 <weilewei> hkaiser ok

22:27 <weilewei> the cmd line is this

22:27 <weilewei> [ 13%] Building CXX object tests/regressions/build/CMakeFiles/test_server_1950.dir/server_1950.cpp.ocd /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/build/tests/regressions/build &&

22:27 <weilewei> /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-8.1.1/spectrum-mpi-10.3.0.1-20190611-55ygkz53evhcwy3txeis32gc3kzu7wy6/bin/mpic++ -DBOOST_ALL_NO_LIB -Dtest_server_1950_EXPORTS -I/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests -isystem

22:27 <weilewei> /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/boost-1.70.0-vqtwnz6pc4meb7xhrvcsj3l5zusdh6nt/include -isystem /gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/include/hpx -fPIC -std=c++14 -o

22:27 <weilewei> CMakeFiles/test_server_1950.dir/server_1950.cpp.o -c /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cppIn file included from

22:27 <weilewei> /gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.cpp:8:/gpfs/alpine/cph102/proj-shared/weile/dev/src/hpx_module_test/tests/regressions/build/server_1950.hpp:9:10: fatal error: hpx/include/components.hpp: No such file or directory #include <hpx/include/components.hpp>

22:27 <hkaiser> please don't post this here, link it somewhere

22:27 <hkaiser> so what is the used base dir?

22:29 <weilewei> when I do cmake the second time to build tests, I pass -DHPX_DIR=/gpfs/alpine/cph102/proj-shared/weile/dev/src/spack/opt/spack/linux-rhel7-power9le/gcc-8.1.1/hpx-1.3.0-n4qbahppmqvcdttazr3ujbe6gfakbwm7/lib64/cmake/HPX/

22:31 <hkaiser> what dirs are listed in the HPXConfig.cmake in that directory?

22:33 <weilewei> you mean the content of HPXConfig.cmake?

22:34 <hkaiser> yes

22:34 <weilewei> https://gist.github.com/weilewei/bda0830c4f55b295d4af2c41627a34a1#gistcomment-3142096

22:35 <hkaiser> the firs INCLUDE_DIR should be the correct one, is that on the generated command lines?

22:36 <hkaiser> anyways, gtg - sorry

22:36 <weilewei> the second one goes one level deeper

22:36 <hkaiser> that is needed as well

22:36 <weilewei> hkaiser no worries, thanks

22:37 <weilewei> Ok, let me think about that how should I pass that one to command lines

22:37 hkaiser has quit [Quit: bye]