<weilewei>
hkaiser I think my code is facing double de-allocation again, see error log here: https://gist.github.com/weilewei/1949941f8d63c51f39cba25f97640ada. The overall logic is to copy the G_ array (a.k.a. G2) to sendbuff_G_; the copy first de-allocates sendbuff_G_, then re-allocates it, and finally does a memcopy (all on the GPU). However, when the program
<weilewei>
is doing the de-allocation, it finds sendbuff_G_ has already been de-allocated, so it triggers the error.
<hkaiser>
use c++ managed pointers
<hkaiser>
unique_ptr or shared_ptr depending on the situation
<hkaiser>
so this will not happen
<weilewei>
hkaiser for example how?
<hkaiser>
unique_ptr automatically deallocates at destruction, no double deallocation can happen
<hkaiser>
have you not listened to what your mom is telling you? ;-)
<hkaiser>
NO RAW POINTERS!
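(For illustration - a minimal sketch of the unique_ptr idea for a device buffer, assuming plain cudaMalloc/cudaFree allocations; the names sendbuff_G/G2 come from the discussion above, everything else is made up:)
```
#include <cstddef>
#include <cuda_runtime.h>
#include <memory>

// deleter that releases device memory exactly once
struct cuda_deleter
{
    void operator()(double* p) const
    {
        if (p)
            cudaFree(p);
    }
};

using device_buffer = std::unique_ptr<double, cuda_deleter>;

device_buffer make_device_buffer(std::size_t n)
{
    double* raw = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&raw), n * sizeof(double));
    return device_buffer(raw);
}

// re-filling the send buffer: the assignment frees the old block (once) and
// adopts the new one, so no manual cudaFree is needed anywhere
void refresh_send_buffer(device_buffer& sendbuff_G, double const* G2, std::size_t n)
{
    sendbuff_G = make_device_buffer(n);
    cudaMemcpy(sendbuff_G.get(), G2, n * sizeof(double), cudaMemcpyDeviceToDevice);
}
```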
<weilewei>
but I don't have any explicit destruction in my code
<hkaiser>
who is deallocating then, if not your code?
<weilewei>
I am not sure who else in the program is deallocating that sendbuffer
<weilewei>
updateG4 is an async call into a kernel, however, it does not touch the sendbuff
<hkaiser>
weilewei: well, somebody has to deallocate things for it getting deallocated twice
<weilewei>
hkaiser right, in this case, how to track that thief down?
<hkaiser>
set a break point on free() and wait for the pointer to come by
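(Roughly, with gdb or cuda-gdb - break on the deallocation routine and print a backtrace on every hit, then look for the hit that frees the send buffer's address:)
```
(gdb) break cudaFree      # or: break free, depending on who owns the buffer
(gdb) commands
> backtrace
> continue
> end
(gdb) run
```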
bita has joined #ste||ar
<weilewei>
well... this double-deallocation error only happens when running multi-threaded with multiple ranks and the iteration count is large enough, I need to think about that
nan11 has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
shahrzad has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
<zao>
Hrm, colleagues report that `nproc --all` output has changed recently, possibly after last night's kernel update.
<zao>
If an 8C/16T machine boots with SMT on, `nproc --all` says 16; after turning off SMT via `smt/control` in `/sys`, `/proc/cpuinfo` reports 8 cores but `nproc --all` still says 16.
nikunj97 has joined #ste||ar
<Yorlik>
What methods do you use to get to the bottom of memory leaks in an HPX application that work on Windows?
<Yorlik>
I tried using the "#include <crtdbg.h>" with "_CrtDumpMemoryLeaks();" method,
<Yorlik>
but when adding "#define _CRTDBG_MAP_ALLOC" to get detailed information, a ton of compile errors
<Yorlik>
pop up all over the place.
<Yorlik>
Not sure if that is because I'm using jemalloc, just thought I'd ask.
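(For reference, the setup that method expects on MSVC - the macro must be defined before any standard headers are pulled in, which is exactly what makes it so intrusive:)
```
// must come before any standard headers so the allocation macros take effect
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

int main()
{
    // ... application code ...
    _CrtDumpMemoryLeaks();   // dumps leaked blocks, with file/line info when the macro is active
    return 0;
}
```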
<Yorlik>
I'm also interested in switching to jemalloc entirely and using it for memory debugging,
<Yorlik>
but the configuration process on Windows is a bit different than on Linux.
<Yorlik>
Ideas?
hkaiser has joined #ste||ar
<Yorlik>
hkaiser: YT?
jbjnr has left #ste||ar ["User left"]
<hkaiser>
Yorlik: hey
<hkaiser>
g'morning
<Yorlik>
Morning!
<Yorlik>
I had just a quick question about finding memory leaks in Visual Studio
<Yorlik>
the default crtdebug method fails with a ton of compile errors
<Yorlik>
At least if I want to enable _CRTDBG_MAP_ALLOC
<Yorlik>
jemalloc config on windows is ~special
<Yorlik>
So - I'm in search of a reliable method to pinpoint the leak
<hkaiser>
use crtdebug without jemalloc
<Yorlik>
I kinda know where it is - probably I'm doing something wrong with our Lua bindings
<zao>
I guess this is only tangentially HPX-related, but have any of you fine people looked at Conan for dependencies, and how bad is it? :P
<hkaiser>
zao: we've had some discussions with the conan people a while back, but nothing has materialized so far (nobody felt the need to investigate)
<zao>
I see there are some attempts at conanfiles out there, the most up-to-date one targets 1.3.0
nikunj has quit [Ping timeout: 244 seconds]
nikunj has joined #ste||ar
<hkaiser>
right, as said - it was a while back
<hkaiser>
I think the conan guys did that at that time
<K-ballo>
I'm using conan for dependencies in a project, self-hosted repository, we produce recipes for all our dependencies... works ok
<zao>
Getting VSCode remotes with a shared codebase to interact well with module systems is turning out to be all sorts of "fun".
<zao>
Rust has spoiled me :P
<hkaiser>
ms[m]: I can't say anything about #4564, please go ahead as you see fit
<ms[m]>
hkaiser: ok, thanks
<Yorlik>
hkaiser: Does #define _CRTDBG_MAP_ALLOC work for you? I can't get it to work with HPX - even when jemalloc is off
<zao>
Are you building a _DEBUG build too?
<Yorlik>
Yes
<zao>
I'd kind of expect that you'd need to build dependencies with it too.
<zao>
A core problem of it is that it turns `malloc` into a macro, which reportedly is ... unhealthy for some code.
<Yorlik>
I made an HPX debug build without jemalloc for that purpose - cleaned all dirs to really have a blank slate
<Yorlik>
It seems to even touch all the ::free functions I have in my object pools
<Yorlik>
I think I'll abandon this method - it looks way too messy to me.
<hkaiser>
Yorlik: try using the vld library
<Yorlik>
vld is outdated - but you say it still works?
<Yorlik>
They kinda stopped 2 years ago or so
<hkaiser>
I have used it before with good results, it's been a while, however
<Yorlik>
I'll give it a shot. The default crtdbg method is broken for us
<hkaiser>
Yorlik: on Windows jemalloc does not replace malloc/free - we use it explicitly through C++ allocators
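(A minimal sketch of what "explicitly through C++ allocators" can look like - this is not HPX's actual allocator, and it assumes a jemalloc build that exposes the je_-prefixed API:)
```
#include <jemalloc/jemalloc.h>
#include <cstddef>
#include <new>
#include <vector>

// minimal C++ allocator routing all allocations through jemalloc
template <typename T>
struct je_allocator
{
    using value_type = T;

    je_allocator() = default;
    template <typename U>
    je_allocator(je_allocator<U> const&) noexcept {}

    T* allocate(std::size_t n)
    {
        if (void* p = je_malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) noexcept { je_free(p); }
};

template <typename T, typename U>
bool operator==(je_allocator<T> const&, je_allocator<U> const&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(je_allocator<T> const&, je_allocator<U> const&) noexcept { return false; }

// usage: std::vector<double, je_allocator<double>> v(1024);
```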
<Yorlik>
But I already know the source of my leaks: It happens deep inside Lua when using custom userdata objects which seem not to get cleaned up properly.
<hkaiser>
mimalloc is fully automatic, not sure if it can track leaks, though
<Yorlik>
I might have to re-visit it
<Yorlik>
jemalloc works nicely as explicit lua allocator - even on windows
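(A sketch of what that looks like, again assuming the je_-prefixed jemalloc API; the allocator function follows Lua's documented lua_Alloc contract:)
```
#include <jemalloc/jemalloc.h>

extern "C" {
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
}

// lua_Alloc callback backed by jemalloc
static void* lua_je_alloc(void* /*ud*/, void* ptr, size_t /*osize*/, size_t nsize)
{
    if (nsize == 0)
    {
        je_free(ptr);                  // Lua asks us to release the block
        return nullptr;
    }
    return je_realloc(ptr, nsize);     // new block if ptr == NULL, resize otherwise
}

int main()
{
    lua_State* L = lua_newstate(&lua_je_alloc, nullptr);
    luaL_openlibs(L);
    // ... run scripts ...
    lua_close(L);
}
```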
<Yorlik>
How complicated would it be to start experimenting with kokkos to compute on my local graphics card?
<hkaiser>
Yorlik: download it and use it
<Yorlik>
Would I need anything additional, like CUDA stuff?
<hkaiser>
weilewei: yah, they ported the clang libc++ to the device
<hkaiser>
Yorlik: you most likely will need cuda (if you have nvidia gpu)
<Yorlik>
OK
<hkaiser>
Yorlik: not sure if it works on windows, though
<weilewei>
hkaiser oh, that's nice and I will watch Bryce's talk then to understand better
<Yorlik>
Aw
<hkaiser>
code-wise it might, but the build system will not know anything about MSVC
rtohid has joined #ste||ar
akheir has joined #ste||ar
<hkaiser>
bita: yt?
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<weilewei>
Am I missing something? /gpfs/alpine/proj-shared/cph102/weile/dev/src/Ring_example_MPI_CUDA/gpuDirect_hpx.cpp:30:11: error: 'enable_user_polling' is not a member of 'hpx::mpi' hpx::mpi::enable_user_polling enable_polling;
<hkaiser>
mpi::experimental
<weilewei>
IC... sorry about that
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
<hkaiser>
weilewei: I don't think you still need that
<hkaiser>
look at the tests to see how it's done
<hkaiser>
it's much simpler now
<weilewei>
hkaiser right, in hpx tests, it is hpx::mpi::experimental::enable_user_polling enable_polling;
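(So the fix for the compile error above is just the namespace - roughly, based on the HPX tests mentioned here, and keeping in mind hkaiser's note that the explicit polling guard may no longer be required; header names vary between HPX versions:)
```
#include <hpx/hpx_init.hpp>
#include <hpx/modules/async_mpi.hpp>   // header name varies across HPX versions

int hpx_main()
{
    // enable MPI polling on the HPX schedulers for the lifetime of this scope
    hpx::mpi::experimental::enable_user_polling enable_polling;

    // ... launch MPI work through the hpx::mpi::experimental facilities ...

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}
```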
nikunj has quit [Ping timeout: 244 seconds]
nikunj has joined #ste||ar
<weilewei>
hkaiser in hpx libs, I have no problem running mpi_ring_async_executor_test (no double mpi init), but for my program here: https://github.com/weilewei/Ring_example_MPI_CUDA/blob/hpx_mpi_async/G2_ring_hpx.cpp, it still complains: Open MPI has detected that this process has attempted to initialize MPI (via MPI_INIT or MPI_INIT_THREAD) more than once.
<weilewei>
This is erroneous
<weilewei>
The only difference I can think of is that it is using hpx_main, not hpx_init
<hkaiser>
can you set a break point on MPI_Init[_thread] and wait until it comes by to get a stack-backtrace?
<hkaiser>
weilewei: I'm not sure I understand this
<hkaiser>
does it come by the MPI_Init twice?
<weilewei>
hkaiser my impression is that after the program hits MPI_Init, the next step crashes
<hkaiser>
how's that?
<hkaiser>
does it call MPI_Init_thread instead?
<weilewei>
I don't know actually...
<hkaiser>
did you set a breakpoint on MPI_Init_thread?
<weilewei>
I set it on MPI_Init, because I did not use MPI_Init_thread
<hkaiser>
weilewei: the mpi::experimental stuff uses MPI_Init_thread
<hkaiser>
also, since everything is multi-threaded you should use the threaded version
<weilewei>
hkaiser if hpx uses MPI_Init_thread and the application also calls MPI_Init_thread, that would lead to a double call to MPI_Init_thread, is that correct?
<weilewei>
hkaiser but I remember an earlier version of the hpx mpi future stuff might not have used MPI_Init_thread, that's what worked in my previous sample code.
<hkaiser>
weilewei: just set the breakpoint on both functions
<hkaiser>
weilewei: I think HPX is linked against a different MPI version than the application
<weilewei>
hkaiser they are the same
nan11 has quit [Remote host closed the connection]
<hkaiser>
they are not, the addresses of MPI_Init_thread are different in both break points
nan11 has joined #ste||ar
<weilewei>
hkaiser but I compile hpx and my application with the same spectrum-mpi version...
<weilewei>
Also, it seems MPI_Init_thread is hit three times, two come from hpx and one comes from the application
<hkaiser>
but why?
<hkaiser>
try stepping through the code there
<hkaiser>
all MPI_Init calls are protected by MPI_Initialized(), so it shouldn't be called more than once
<weilewei>
hkaiser I switched from hpx_main to hpx::init, and now the double mpi init issue goes away
<hkaiser>
interesting
<hkaiser>
but that does not explain what is wrong in the previous code
<hkaiser>
ahh, I know what's up
<weilewei>
Ah, why?
<hkaiser>
weilewei: do you protect the MPI_Init in main with MPI_Initialized?
<weilewei>
no, I did not put MPI_Initialized in my application
<weilewei>
Do I need to?
<hkaiser>
using hpx_main.hpp will cause HPX to be initialized before main() is executed
<hkaiser>
so your MPI_Init is the second one
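(That is the difference between the two modes - roughly, as a sketch; exact signatures vary by HPX version:)
```
// Variant A: hpx_main.hpp -- the HPX runtime is started *before* main() runs,
// so an MPI_Init placed inside main() is necessarily the second initialization.
//
//     #include <hpx/hpx_main.hpp>
//     int main(int argc, char* argv[]) { /* already running on an HPX thread */ }

// Variant B: explicit hpx::init -- the application controls the ordering and
// can initialize MPI before the HPX runtime comes up.
#include <hpx/hpx_init.hpp>

int hpx_main(int argc, char* argv[])
{
    // HPX (and its MPI polling support) is up at this point
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // plain C++ here: initialize MPI (or anything else) first if needed
    return hpx::init(argc, argv);
}
```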
<weilewei>
Right, that was my guess at the beginning, so I should check MPI_Initialized first and only call my MPI_Init if MPI has not been initialized yet, something like this
nan11 has quit [Remote host closed the connection]
<hkaiser>
something like that, yes
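(i.e. something along these lines - the helper name is made up:)
```
#include <mpi.h>

// call MPI_Init only if nobody (e.g. the HPX runtime) has initialized MPI already
void ensure_mpi_initialized(int* argc, char*** argv)
{
    int initialized = 0;
    MPI_Initialized(&initialized);
    if (!initialized)
        MPI_Init(argc, argv);
}
```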
<weilewei>
hkaiser ok, dca with hpx mpi futures seems to be running now after this trick
<weilewei>
now it is time to try to break MPI_wait using hpx mpi future
nan11 has joined #ste||ar
rtohid has quit [Remote host closed the connection]
rtohid has joined #ste||ar
akheir has quit [Quit: Leaving]
karame_ has quit [Remote host closed the connection]
<hkaiser>
weilewei: \o/
weilewei has quit [Remote host closed the connection]