hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
diehlpk has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
diehlpk_work has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk_work has quit [Remote host closed the connection]
diehlpk has quit [Quit: Leaving.]
diehlpk has joined #ste||ar
hkaiser has quit [Quit: Bye!]
diehlpk has left #ste||ar [#ste||ar]
K-ballo has joined #ste||ar
<dkaratza[m]> ms: what is wrong with the array indexing?
<ms[m]> dkaratza: you tell me ;)
<ms[m]> I think the minimal change would be to swap the `*` and `+` in the indexing, but please have a closer look
<dkaratza[m]> <ms[m]> "I think the minimal change would..." <- ok thanks i will take a look
<dkaratza[m]> now i encounter a different problem, if u can help
<ms[m]> dkaratza: ask away
<dkaratza[m]> when i try to run my example to see what is wrong with the indexing i use `cmake -DCMAKE_PREFIX_PATH=/path/to/hpx/installation ..` and i get a cmake error `CMake Error at CMakeLists.txt:3: By not providing "FindHPX.cmake"...`
<dkaratza[m]> this means that my path is wrong?
<ms[m]> where and why are you running that command?
<dkaratza[m]> i follow the instructions we have at the quickstart for the hello world example to run the matrix multiplication example
<dkaratza[m]> :/
<ms[m]> ok, that's a slightly different setup... but we can try that as well
<ms[m]> the easiest way to build that matrix multiplication example is to go to your hpx build directory and build the `matrix_multiplication` target
<dkaratza[m]> i had it working, but i dont know what went wrong and now i cannot use it
<dkaratza[m]> ms[m]: ohh
<dkaratza[m]> how do i build the target?
<ms[m]> make/ninja matrix_multiplication
<ms[m]> the binary will go into the bin directory in your build directory
<dkaratza[m]> <ms[m]> "make/ninja matrix_multiplication" <- i am just in the build directory but there is no such target
<ms[m]> dkaratza: short call? easier to debug that way
<dkaratza[m]> <ms[m]> "dkaratza: short call? easier..." <- sure
<ms[m]> dkaratza: all right, now?
<dkaratza[m]> yup
<dkaratza[m]> should i use your link?
<ms[m]> dkaratza: yep, same one
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
diehlpk_work has joined #ste||ar
<diehlpk_work> ms[m], Can I disable cublas in HPX?
<ms[m]> diehlpk_work: currently no, why?
<diehlpk_work> ms[m], Do we use the official find cuda or our own script?
hkaiser has joined #ste||ar
<diehlpk_work> It occurs that with CUDA 11.4 the path is different and HPX does not find cublas anymore
<ms[m]> we use FindCUDAToolkit, so if it doesn't find it you'll have to give it some hints or figure out if it's a bug in the module
<hkaiser> diehlpk_work: well, it's the standard cmake FindCUDA script we're using
<ms[m]> the problem sounds vaguely familiar, will give you a link if I figure out what it was
<diehlpk_work> ms[m], Thanks
<diehlpk_work> hkaiser, I get the following strange error on Perlmutter
<diehlpk_work> Error: /pscratch/sd/d/diehlpk/OctoTigerBuildChain/build/hpx/include/hpx/hardware/timestamp/linux_x86_64.hpp(31): error: asm operand type size(4) does not match type/size implied by constraint 'a'
<diehlpk_work> and I don't really have any idea what is going on there
<diehlpk_work> ms[m], I found the issue and could solve it
<diehlpk_work> But I am not sure if that is a nice solution
<diehlpk_work> With CUDA 11.4 it seems that one can split the cuda libs into different paths
<diehlpk_work> So the core libs are installed into one path and all math libs are installed in a different folder
<diehlpk_work> It seems that CMake does not check the second folder for the math libs
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 265 seconds]
<ms[m]> diehlpk_work: sounds... good-ish
<diehlpk_work> I hope that CMake will fix that
<dkaratza[m]> ms: I am now fixing the cmake variable `HPX_WITH_CXX_STANDARD`. I will add as a description the following: "Set a specific C++ standard version e.g. ``HPX_WITH_CXX_STANDARD=20``. The default value is 17, as |hpx| relies on C++17." what do you think?
<ms[m]> sounds good, maybe change the last sentence to "The default and minimum value is 17."?
<ms[m]> dkaratza: ^
<dkaratza[m]> ms[m]: sure
<ms[m]> diehlpk_work: did you find an issue about it?
<dkaratza[m]> ms: also updated matrix multiplication. i think now its fine
<ms[m]> dkaratza: thanks!
<diehlpk_work> ms[m], I found some tickets where other people had similar issues. I have not checked if there is a CMake-specific ticket
hkaiser_ has quit [Quit: Bye!]
<diehlpk_work> There is some serious issue with the parcelports
<diehlpk_work> It seems that HPX needs the tcp parcelport and one cannot disable it
hkaiser has joined #ste||ar
<diehlpk_work> hkaiser, Why can I not disable the tcp parcelport?
<diehlpk_work> if I disable the tcp parcelport all my applications, even the hello world, segfault on startup
<hkaiser> diehlpk_work: I can't answer this question without investigating
<hkaiser> how did you disable the tcp pp?
<diehlpk_work> hkaiser, I just used the cmake option
<hkaiser> which one?
<diehlpk_work> -DHPX_WITH_PARCELPORT_MPI=ON
<diehlpk_work> -DHPX_WITH_PARCELPORT_TCP=OFF
<hkaiser> ok
<hkaiser> can you run ./octotiger --hpx:info on one locality, please?
<diehlpk_work> Sure
<diehlpk_work> However, I do not see any output
<diehlpk_work> I only get srun: error: nid002173: tasks 1-3: Segmentation fault
<diehlpk_work> If I do not specify TCP=OFF the code runs, but crashes later
<hkaiser> diehlpk_work: one locality, please
<diehlpk_work> hkaiser, Even on one locality, I see the above message
<diehlpk_work> I do not get any other message
<hkaiser> doesn't make sense, why does it talk about task1-3, then?
<diehlpk_work> Because I use four localities for the four A100 on one node
<diehlpk_work> Should I run with only one GPU?
<hkaiser> one locality, please
<diehlpk_work> I need to recompile hpx first, I compiled again with tcp on
<hkaiser> you can add --hpx:exit as well, that will not even start doing any work, then
<hkaiser> just prints the info
pedro_barbosa[m] has quit [Ping timeout: 240 seconds]
srinivasyadav227 has quit [Ping timeout: 252 seconds]
LorenDB[m] has quit [Ping timeout: 250 seconds]
gonidelis[m] has quit [Ping timeout: 240 seconds]
PatrickDiehl[m] has quit [Ping timeout: 240 seconds]
rori[m] has quit [Ping timeout: 240 seconds]
gdaiss[m] has quit [Ping timeout: 250 seconds]
bhumit[m] has quit [Ping timeout: 250 seconds]
dkaratza[m] has quit [Ping timeout: 260 seconds]
jedi18[m] has quit [Ping timeout: 260 seconds]