hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
diehlpk has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
diehlpk_work has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk_work has quit [Remote host closed the connection]
diehlpk has quit [Quit: Leaving.]
diehlpk has joined #ste||ar
hkaiser has quit [Quit: Bye!]
diehlpk has left #ste||ar [#ste||ar]
K-ballo has joined #ste||ar
<dkaratza[m]>
ms: what is wrong with the array indexing?
<ms[m]>
dkaratza: you tell me ;)
<ms[m]>
I think the minimal change would be to swap the `*` and `+` in the indexing, but please have a closer look
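[note: the example under discussion is not shown in this log, so the following is only a minimal sketch of the kind of row-major indexing the `*`/`+` swap presumably refers to. It assumes the matrices are stored as flat vectors; `flat_index` and `multiply` are hypothetical names and are not taken from the HPX example.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major offset of element (row, col) in a matrix with `cols` columns.
// Multiply the row by the row length first, then add the column.
// Writing `row + cols * col` instead would address the wrong element.
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols)
{
    assert(col < cols);
    return row * cols + col;
}

// Naive product C = A * B on flat row-major storage.
// A is m x k, B is k x n, and the result C is m x n.
inline std::vector<double> multiply(std::vector<double> const& A,
    std::vector<double> const& B, std::size_t m, std::size_t k, std::size_t n)
{
    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            for (std::size_t p = 0; p < k; ++p)
            {
                C[flat_index(i, j, n)] +=
                    A[flat_index(i, p, k)] * B[flat_index(p, j, n)];
            }
        }
    }
    return C;
}
```

[note: each matrix is indexed with its own column count (`k` for A, `n` for B and C), which is easy to get wrong when the matrices are not square.]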
<dkaratza[m]>
<ms[m]> "I think the minimal change would..." <- ok thanks i will take a look
<dkaratza[m]>
now i encounter a different problem, if u can help
<ms[m]>
dkaratza: ask away
<dkaratza[m]>
when i try to run my example to see what is wrong with the indexing i use `cmake -DCMAKE_PREFIX_PATH=/path/to/hpx/installation ..` and i get a cmake error `CMake Error at CMakeLists.txt:3: By not providing "FindHPX.cmake"...`
<dkaratza[m]>
this means that my path is wrong?
<ms[m]>
where and why are you running that command?
<dkaratza[m]>
i follow the instructions we have at the quickstart for the hello world example to run the matrix multiplication example
<dkaratza[m]>
:/
<ms[m]>
ok, that's a slightly different setup... but we can try that as well
<ms[m]>
the easiest way to build that matrix multiplication example is to go to your hpx build directory and build the `matrix_multiplication` target
<dkaratza[m]>
i had it working, but i dont know what went wrong and now i cannot use it
<dkaratza[m]>
ms[m]: ohh
<dkaratza[m]>
how do i build the target?
<ms[m]>
make/ninja matrix_multiplication
<ms[m]>
the binary will go into the bin directory in your build directory
<dkaratza[m]>
<ms[m]> "make/ninja matrix_multiplication" <- i am just in the build directory but there is no such target
<ms[m]>
dkaratza: short call? easier to debug that way
<dkaratza[m]>
<ms[m]> "dkaratza: short call? easier..." <- sure
<ms[m]>
dkaratza: all right, now?
<dkaratza[m]>
yup
<dkaratza[m]>
should i use your link?
<ms[m]>
dkaratza: yep, same one
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
diehlpk_work has joined #ste||ar
<diehlpk_work>
ms[m], Can I disable cublas in HPX?
<ms[m]>
diehlpk_work: currently no, why?
<diehlpk_work>
ms[m], Do we use the official find cuda or our own script?
hkaiser has joined #ste||ar
<diehlpk_work>
It appears that with CUDA 11.4 the path is different and HPX does not find cublas anymore
<ms[m]>
we use `FindCUDAToolkit`, so if it doesn't find it you'll have to give it some hints or figure out if it's a bug in the module
<hkaiser>
diehlpk_work: well, it's the standard cmake FindCUDA script we're using
<ms[m]>
the problem sounds vaguely familiar, will give you a link if I figure out what it was
<diehlpk_work>
ms[m], Thanks
<diehlpk_work>
hkaiser, I get the following strange error on Perlmutter
<diehlpk_work>
Error: /pscratch/sd/d/diehlpk/OctoTigerBuildChain/build/hpx/include/hpx/hardware/timestamp/linux_x86_64.hpp(31): error: asm operand type size(4) does not match type/size implied by constraint 'a'
<diehlpk_work>
and I don't really have any idea what is going on there
<diehlpk_work>
ms[m], I found the issue and could solve it
<diehlpk_work>
But I am not sure if that is a nice solution
<diehlpk_work>
With CUDA 11.4 it seems that one can split the cuda libs into different paths
<diehlpk_work>
So the core libs are installed into one path and all math libs are installed in a different folder
<diehlpk_work>
It seems that CMake does not check the second folder for the math libs
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 265 seconds]
<ms[m]>
diehlpk_work: sounds... good-ish
<diehlpk_work>
I hope that CMake will fix that
<dkaratza[m]>
ms: I am now fixing the cmake variable `HPX_WITH_CXX_STANDARD`. I will add as a description the following: "Set a specific C++ standard version e.g. ``HPX_WITH_CXX_STANDARD=20``. The default value is 17, as |hpx| relies on C++17." what do you think?
<ms[m]>
sounds good, maybe change the last sentence to "The default and minimum value is 17."?
<ms[m]>
dkaratza: ^
<dkaratza[m]>
ms[m]: sure
<ms[m]>
diehlpk_work: did you find an issue about it?
<dkaratza[m]>
ms: also updated matrix multiplication. i think now it's fine
<ms[m]>
dkaratza: thanks!
<diehlpk_work>
ms[m], I found some tickets where other people have similar issues. I have not checked if there is a ticket specific to CMake
hkaiser_ has quit [Quit: Bye!]
<diehlpk_work>
There is some serious issue with the parcelports
<diehlpk_work>
It seems that HPX needs the tcp parcelport and one cannot disable it
hkaiser has joined #ste||ar
<diehlpk_work>
hkaiser, Why can I not disable the tcp parcelport?
<diehlpk_work>
if I disable the tcp parcelport all my applications, even the hello world, segfault on startup
<hkaiser>
diehlpk_work: I can't answer this question without investigating
<hkaiser>
how did you disable the tcp pp?
<diehlpk_work>
hkaiser, I just used the cmake option
<hkaiser>
which one?
<diehlpk_work>
-DHPX_WITH_PARCELPORT_MPI=ON
<diehlpk_work>
-DHPX_WITH_PARCELPORT_TCP=OFF
<hkaiser>
ok
<hkaiser>
can you run ./octotiger --hpx:info on one locality, please?
<diehlpk_work>
Sure
<diehlpk_work>
However, I do not see any output
<diehlpk_work>
I only get srun: error: nid002173: tasks 1-3: Segmentation fault
<diehlpk_work>
If I do not specify TCP=OFF the code runs, but crashes later
<hkaiser>
diehlpk_work: one locality, please
<diehlpk_work>
hkaiser, Even on one locality, I see the above message
<diehlpk_work>
I do not get any other message
<hkaiser>
doesn't make sense, why does it talk about task1-3, then?
<diehlpk_work>
Because I use four localities for the four A100s on one node
<diehlpk_work>
Should I run with only one GPU?
<hkaiser>
one locality, please
<diehlpk_work>
I need to recompile hpx first, I compiled again with tcp on
<hkaiser>
you can add --hpx:exit as well, that will not even start doing any work, then
<hkaiser>
just prints the info
pedro_barbosa[m] has quit [Ping timeout: 240 seconds]
srinivasyadav227 has quit [Ping timeout: 252 seconds]
LorenDB[m] has quit [Ping timeout: 250 seconds]
gonidelis[m] has quit [Ping timeout: 240 seconds]
PatrickDiehl[m] has quit [Ping timeout: 240 seconds]