hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<weilewei> hkaiser I think my code is facing double de-allocation again, see error log here: https://gist.github.com/weilewei/1949941f8d63c51f39cba25f97640ada. The overall logic is to copy the G_ array (a.k.a. G2) into sendbuff_G_: the copy first deallocates sendbuff_G_, then reallocates it, and finally does a memcpy (all on the GPU). However, when the program
<weilewei> deallocates, it finds sendbuff_G_ has already been deallocated, which triggers the error.
<hkaiser> use c++ managed pointers
<hkaiser> unique_ptr or shared_ptr depending on the situation
<hkaiser> so this will not happen
<weilewei> hkaiser for example how?
<hkaiser> unique_ptr automatically deallocates at destruction, no double deallocation can happen
<hkaiser> have you not listened to what your mom is telling you? ;-)
<hkaiser> NO RAW POINTERS!
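A minimal sketch of the RAII approach suggested here, applied to a GPU buffer; the CUDA runtime calls and the make_device_buffer helper are illustrative assumptions, not code from either project:

    #include <cuda_runtime.h>
    #include <cstddef>
    #include <memory>

    // Deleter that releases device memory exactly once, at destruction.
    struct cuda_deleter {
        void operator()(void* p) const noexcept { cudaFree(p); }
    };

    using device_buffer = std::unique_ptr<void, cuda_deleter>;

    device_buffer make_device_buffer(std::size_t bytes) {
        void* p = nullptr;
        cudaMalloc(&p, bytes);
        return device_buffer(p);
    }

    // Reassigning frees the previous allocation exactly once, so a
    // deallocate/reallocate/memcpy sequence cannot double-free:
    //   sendbuff = make_device_buffer(new_size);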
<weilewei> but I don't have destruction in my code
<hkaiser> who is deallocating then, if not your code?
<weilewei> I am not sure who else in the program is deallocating that sendbuffer
<hkaiser> find out
<weilewei> unless some asynchronous operation happens in this section of the code; however, I feel like each step is synchronous: https://github.com/STEllAR-GROUP/DCA/blob/distG4_pr/include/dca/phys/dca_step/cluster_solver/shared_tools/accumulation/tp/tp_accumulator_gpu.hpp#L564-L588
<weilewei> updateG4 is an async kernel call; however, it does not touch sendbuff
<hkaiser> weilewei: well, somebody has to deallocate things for them to get deallocated twice
<weilewei> hkaiser right, in this case, how to track that thief down?
<hkaiser> set a break point on free() and wait for the pointer to come by
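In gdb that could look like the sketch below; the pointer value is hypothetical, and for device allocations the breakpoint would go on cudaFree rather than free:

    (gdb) break cudaFree
    (gdb) condition 1 $rdi == 0x7fffdeadbeef   # x86-64: first argument arrives in $rdi; address is hypothetical
    (gdb) run
    (gdb) backtrace                            # when it fires, shows who is freeing that pointer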
bita has joined #ste||ar
<weilewei> well... this double-deallocation error only happens with multiple threads and multiple ranks, and when the iteration count is large enough. I need to think about that
<hkaiser> that's tough, then
shahrzad has joined #ste||ar
bita has quit [Quit: Leaving]
shahrzad has quit [Ping timeout: 240 seconds]
shahrzad has joined #ste||ar
<hkaiser> weilewei: I know how you feel
hkaiser has quit [Quit: bye]
<weilewei> hkaiser thanks!
shahrzad has quit [Ping timeout: 240 seconds]
nan11 has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
shahrzad has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
<zao> Hrm, colleagues report that `nproc --all` output has changed recently, possibly after last night's kernel update.
<zao> If an 8C/16T machine boots with SMT on, `nproc --all` says 16; after turning off SMT via `smt/control` for the CPU in `/sys`, `/proc/cpuinfo` reports 8 cores but `nproc --all` still says 16.
nikunj97 has joined #ste||ar
<Yorlik> What methods do you use to get to the bottom of memory leaks in HPX applications on Windows?
<Yorlik> I tried using the "#include <crtdbg.h>" with "_CrtDumpMemoryLeaks();" method,
<Yorlik> but when adding "#define _CRTDBG_MAP_ALLOC" to get detailed information, a ton of compile errors
<Yorlik> pop up all over the place.
<Yorlik> Not sure if that is because I'm using jemalloc, just thought I'd ask.
<Yorlik> I'm also interested in using jemalloc exclusively and using it for memory debugging,
<Yorlik> but the configuration process on Windows is a bit different than on Linux.
<Yorlik> Ideas?
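For reference, a minimal sketch of the CRT debug-heap setup being described (MSVC only); the catch is that _CRTDBG_MAP_ALLOC must be defined before the CRT headers in every translation unit:

    #define _CRTDBG_MAP_ALLOC   // must precede the CRT headers
    #include <stdlib.h>
    #include <crtdbg.h>

    int main() {
        // report any unfreed blocks automatically at process exit
        _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
        // ... application code ...
        return 0;
    }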
hkaiser has joined #ste||ar
<Yorlik> hkaiser: YT?
jbjnr has left #ste||ar ["User left"]
<hkaiser> Yorlik: hey
<hkaiser> g'morning
<Yorlik> Morning!
<Yorlik> I had just a quick question about finding memory leaks in Visual Studio
<Yorlik> the default crtdbg method fails with a ton of compile errors
<Yorlik> At least if I want to enable _CRTDBG_MAP_ALLOC
<Yorlik> jemalloc config on Windows is ~special
<Yorlik> So - I'm in search of a reliable method to pinpoint it
<hkaiser> use crtdebug without jemalloc
<Yorlik> I kinda know where it is - probably I'm using our Lua bindings incorrectly
<Yorlik> OK
<Yorlik> Makes sense
<Yorlik> Thanks!
<Yorlik> I'll check it out - thanks for the link!
<zao> I guess this is only tangentially HPX-related, but have any of you fine people looked at Conan for dependencies, and how bad is it? :P
<hkaiser> zao: we've had some discussions with the conan people a while back, but nothing has materialized so far (nobody felt the need to investigate)
<zao> I see there are some attempts at conanfiles out there; the most up-to-date one targets 1.3.0
nikunj has quit [Ping timeout: 244 seconds]
nikunj has joined #ste||ar
<hkaiser> right, as said - it was a while back
<hkaiser> I think the conan guys did that at that time
<K-ballo> I'm using conan for dependencies in a project, self-hosted repository, we produce recipes for all our dependencies... works ok
<zao> Getting VSCode remotes with a shared codebase to interact well with module systems is turning out to be all sorts of "fun".
<zao> Rust has spoiled me :P
<hkaiser> ms[m]: I can't say anything about #4564, please go ahead as you see fit
<ms[m]> hkaiser: ok, thanks
<Yorlik> hkaiser: Does #define _CRTDBG_MAP_ALLOC work for you? I can't get it to work with HPX - even when jemalloc is off
<zao> Are you building a _DEBUG build too?
<Yorlik> Yes
<zao> I'd kind of expect that you'd need to build dependencies with it too.
<zao> A core problem of it is that it turns `malloc` into a macro, which reportedly is ... unhealthy for some code.
<Yorlik> I made an HPX debug build without jemalloc for that purpose - cleaned all dirs to really have a blank slate
<Yorlik> It seems to even touch all the ::free functions I have in my object pools
<Yorlik> I think I'll abandon this method - it looks way too messy to me.
<hkaiser> Yorlik: try using the vld library
<Yorlik> vld is outdated - but you say it still works?
<Yorlik> They kinda stopped 2 years ago or so
<hkaiser> I have used it before with good results, it's been a while, however
<Yorlik> I'll give it a shot. The default crtdbg method is broken for us
<hkaiser> Yorlik: on Windows jemalloc does not replace malloc/free - we use it explicitly through C++ allocators
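A minimal sketch of that explicit-allocator approach, assuming a jemalloc build with the je_ prefix (as in the allocator pasted later in this log); an illustration, not HPX's actual allocator:

    #include <jemalloc/jemalloc.h>
    #include <cstddef>
    #include <new>

    template <typename T>
    struct je_allocator {
        using value_type = T;
        je_allocator() = default;
        template <typename U>
        je_allocator(je_allocator<U> const&) noexcept {}

        T* allocate(std::size_t n) {
            if (void* p = je_malloc(n * sizeof(T)))
                return static_cast<T*>(p);
            throw std::bad_alloc();
        }
        void deallocate(T* p, std::size_t) noexcept { je_free(p); }
    };

    template <typename T, typename U>
    bool operator==(je_allocator<T> const&, je_allocator<U> const&) noexcept { return true; }
    template <typename T, typename U>
    bool operator!=(je_allocator<T> const&, je_allocator<U> const&) noexcept { return false; }

    // usage: std::vector<double, je_allocator<double>> v;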
<Yorlik> But I already know the source of my leaks: it happens deep inside Lua when using custom userdata objects, which seem not to get cleaned up properly.
<hkaiser> mimalloc is fully automatic, not sure if it can track leaks, though
<Yorlik> I might have to re-visit it
<Yorlik> jemalloc works nicely as explicit lua allocator - even on windows
<Yorlik> I'm just giving this function to Lua:
<Yorlik> extern "C" static void* custom_l_alloc( void* ud, void* ptr, size_t osize, size_t nsize ) {
<Yorlik> (void)ud;
<Yorlik> je_free( ptr );
<Yorlik> (void)osize; /* not used */
<Yorlik> if ( nsize == 0 ) {
<Yorlik> return nullptr;
<Yorlik> }
<Yorlik> else
<Yorlik> return je_realloc( ptr, nsize );
<Yorlik> }
<Yorlik> Didn't use it for my main application.
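For context, such an allocator is handed to Lua at state creation through the standard C API; a sketch (custom_l_alloc is the function above):

    #include <lua.hpp>

    void create_state() {
        // Lua routes every internal allocation through custom_l_alloc
        lua_State* L = lua_newstate(custom_l_alloc, nullptr /* ud */);
        // ... open libraries, run scripts ...
        lua_close(L);   // releases everything the state still owns
    }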
<hkaiser> are lua objects reference counted?
<Yorlik> Yes
<hkaiser> so that might be your issue
<hkaiser> how do you manage the reference counts?
<Yorlik> I'm relying on our Lua Bindings
<Yorlik> It might be the case there's an issue, or I'm doing something wrong
<hkaiser> c++ bindings?
<Yorlik> Yes. We use Sol3
<hkaiser> ok - they should have gotten things right
<Yorlik> sol is pretty good.
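A minimal sol3 sketch of the pattern under discussion: a C++ type registered as a usertype, with instances created from Lua owned and collected by the Lua GC (the Counter type is hypothetical):

    #include <sol/sol.hpp>

    struct Counter { int value = 0; };

    int main() {
        sol::state lua;
        lua.open_libraries(sol::lib::base);
        // userdata created from Lua is owned by the Lua GC, not by C++
        lua.new_usertype<Counter>("Counter", "value", &Counter::value);
        lua.script("local c = Counter.new() c.value = 42");
        lua.collect_garbage();   // Counter's destructor runs here, not earlier
        return 0;
    }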
<Yorlik> hkaiser: Got 5244 leaks reported for my app, 6 for hpx
<hkaiser> the hpx ones are most probably globals that get free'd after the memory tracing has ended
<hkaiser> Yorlik: but pls feel free to give us the traces, we'll have a look
<Yorlik> Sure
<Yorlik> As gist?
weilewei has joined #ste||ar
karame_ has joined #ste||ar
<hkaiser> Yorlik: some of those are caused by your code
<Yorlik> In Leak 4 my code shows up, indeed
<hkaiser> 5 and 6 as well
<Yorlik> Also 5 and 6
<hkaiser> leak 1 I don't understand, Leak 2 and 3 are globals that eventually get released
<Yorlik> I haven't done any thorough analysis yet.
<hkaiser> but thanks
<Yorlik> It might be that the core problem is something in the destruction of the LuaEngines
<hkaiser> I feel vindicated ;-)
<Yorlik> :(
bita has joined #ste||ar
nan11 has joined #ste||ar
<weilewei> hkaiser the hpx mpi async test runs fine on Summit, no double mpi init occurs, thanks
<hkaiser> weilewei: ok - that one explicitly calls MPI_Init before starting HPX
<weilewei> hkaiser right, that one
<hkaiser> that's the same as for dca++, I guess
<hkaiser> not sure what's different for you, however
<weilewei> I will try running it now
<Yorlik> hkaiser: All my "leaks" are gone if I destroy my Lua states before exiting - seems I need to call the GC more often .... :)
<hkaiser> good
<Yorlik> Still, this stuff connected with hpx is there - I'll figure it out
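For reference, forcing a full collection cycle so pending finalizers run is one call in the C API (sol3's lua.collect_garbage() wraps the same thing):

    #include <lua.hpp>

    // run a complete garbage-collection cycle, including finalizers
    void run_full_gc(lua_State* L) {
        lua_gc(L, LUA_GCCOLLECT, 0);
    }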
<weilewei> Sources say SC20 might be held as normal, with some degree of confidence :)
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
<hkaiser> weilewei: that's surprising
<K-ballo> would people still go..?
<weilewei> hkaiser but it is also a developing situation, so I personally think no one can guarantee anything
<hkaiser> right
<hkaiser> ms[m]: yt?
nikunj97 has quit [Ping timeout: 244 seconds]
<weilewei> Bryce is giving Cuda C++ lib talk tonight via Zoom: https://www.meetup.com/ACCU-Bay-Area/events/269904471/
<Yorlik> How complicated would it be to start experimenting with Kokkos to compute on my local graphics card?
<hkaiser> Yorlik: download it and use it
<Yorlik> Would I need anything additional, like CUDA stuff?
<hkaiser> weilewei: yah, they ported the clang libc++ to the device
<hkaiser> Yorlik: you most likely will need cuda (if you have an nvidia gpu)
<Yorlik> OK
<hkaiser> Yorlik: not sure if it works on windows, though
<weilewei> hkaiser oh, that's nice; I will watch Bryce's talk then to understand it better
<Yorlik> Aw
<hkaiser> codewise it might, but the buildsystem will not know anything about msvc
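A minimal Kokkos sketch, assuming a Kokkos build configured with the CUDA backend (and, per the caveat above, a non-MSVC toolchain):

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
        Kokkos::initialize(argc, argv);
        {
            // runs on the default execution space - the GPU when CUDA is enabled
            Kokkos::parallel_for("hello", 16, KOKKOS_LAMBDA(int i) {
                printf("hello from iteration %d\n", i);
            });
        }
        Kokkos::finalize();
        return 0;
    }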
rtohid has joined #ste||ar
akheir has joined #ste||ar
<hkaiser> bita: yt?
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<weilewei> Am I missing something? /gpfs/alpine/proj-shared/cph102/weile/dev/src/Ring_example_MPI_CUDA/gpuDirect_hpx.cpp:30:11: error: 'enable_user_polling' is not a member of 'hpx::mpi' hpx::mpi::enable_user_polling enable_polling;
<hkaiser> mpi::experimental
<weilewei> IC... sorry about that
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
<hkaiser> weilewei: I don't think you still need that
<hkaiser> look at the tests to see how it's done
<hkaiser> it's much simpler now
<weilewei> hkaiser right, in hpx tests, it is hpx::mpi::experimental::enable_user_polling enable_polling;
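A minimal sketch around that guard; only the enable_user_polling name is taken from the test cited above, and the include path is an assumption (it has moved between HPX versions):

    #include <hpx/hpx_init.hpp>
    #include <hpx/modules/async_mpi.hpp>   // assumed header; varies across HPX versions

    int hpx_main(int argc, char* argv[])
    {
        {
            // RAII guard: MPI polling is enabled on the HPX scheduler
            // for the lifetime of this scope
            hpx::mpi::experimental::enable_user_polling enable_polling;
            // ... launch MPI operations as futures here ...
        }   // polling is disabled again here
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);
    }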
nikunj has quit [Ping timeout: 244 seconds]
nikunj has joined #ste||ar
<weilewei> hkaiser in hpx libs, I have no problem running mpi_ring_async_executor_test (no double mpi init), but for my program here: https://github.com/weilewei/Ring_example_MPI_CUDA/blob/hpx_mpi_async/G2_ring_hpx.cpp, it still complains: Open MPI has detected that this process has attempted to initialize MPI (via MPI_INIT or MPI_INIT_THREAD) more than once.
<weilewei> This is erroneous
<weilewei> The only difference I can think of is that it uses hpx_main, not hpx_init
<hkaiser> can you set a break point on MPI_Init[_thread] and wait until it comes by to get a stack-backtrace?
<weilewei> let me try
<bita> hkaiser, yes
<bita> sorry I missed your ping
<hkaiser> bita: nvm, found it - thanks!
<bita> :) :+1
<hkaiser> weilewei: I'm not sure I understand this
<hkaiser> does it come by the MPI_Init twice?
<weilewei> hkaiser my impression is that the program hits MPI_Init and then the next step crashes
<hkaiser> how's that?
<hkaiser> does it call MPI_Init_thread instead?
<weilewei> I don't know actually...
<hkaiser> did you set a breakpoint on MPI_Init_thread?
<weilewei> I set it on MPI_Init, because I did not use MPI_Init_thread
<hkaiser> weilewei: the mpi::experimental stuff uses MPI_Init_thread
<hkaiser> also, since everything is multi-threaded you should use the threaded version
<weilewei> hkaiser so if hpx uses MPI_Init_thread and the application also calls MPI_Init_thread, that leads to the double call to MPI_Init_thread? Is that correct?
<weilewei> hkaiser but I remember an earlier version of the hpx mpi future stuff might not have used MPI_Init_thread; that's what worked in my previous sample code.
<hkaiser> weilewei: just set the breakpoint on both functions
<weilewei> hold on I should set one more breakpoint at MPI_Init
<weilewei> (gdb) b MPI_Init -> Function "MPI_Init" not defined. Since I replaced MPI_Init with MPI_Init_thread, gdb can't place a breakpoint on MPI_Init
<hkaiser> ok
<hkaiser> so where does the MPI_Init_thread call come from?
<hkaiser> the second one, that is?
<hkaiser> look up the stack and try to find out
<weilewei> hpx::util::mpi_environment::init
<hkaiser> weilewei: ^^
<weilewei> hkaiser IC
<hkaiser> does it happen there?
<weilewei> let me verify a bit more
<weilewei> the second one is the correct link
<hkaiser> weilewei: I think HPX is linked against a different MPI version than the application
<weilewei> hkaiser they are the same
nan11 has quit [Remote host closed the connection]
<hkaiser> they are not, the addresses of MPI_Init_thread are different in both break points
nan11 has joined #ste||ar
<weilewei> hkaiser but I compile hpx and my application with the same spectrum-mpi version...
<weilewei> Also, it seems MPI_Init_thread is hit three times: two come from hpx and one comes from the application
<hkaiser> but why?
<hkaiser> try stepping through the code there
<hkaiser> all MPI_Init calls are protected by MPI_Initialized(), so it shouldn't be called more than once
<weilewei> hkaiser I switched from hpx_main to hpx::init, and now the double mpi init issue goes away
<hkaiser> interesting
<hkaiser> but that does not explain what is wrong in the previous code
<hkaiser> ahh, I know what's up
<weilewei> Ah, why?
<hkaiser> weilewei: do you protect the MPI_Init in main with MPI_Initialized?
<weilewei> no, I did not put MPI_Initialized in my application
<weilewei> Do I need to?
<hkaiser> using hpx_main.hpp will cause HPX to be initialized before main() is executed
<hkaiser> so your MPI_Init is the second one
<weilewei> Right, that was my guess at the beginning, so I should check MPI_Initialized first - skip my MPI_Init if it's already initialized, otherwise do MPI_Init - something like this
nan11 has quit [Remote host closed the connection]
<hkaiser> something like that, yes
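For the record, MPI_Initialized reports through an out-parameter rather than a return value, so the guard looks roughly like this (the init_mpi_once wrapper is a hypothetical helper name):

    #include <mpi.h>

    void init_mpi_once() {
        int initialized = 0;
        MPI_Initialized(&initialized);
        if (!initialized) {
            int provided = 0;
            // threaded variant, since HPX runs multi-threaded
            MPI_Init_thread(nullptr, nullptr, MPI_THREAD_MULTIPLE, &provided);
        }
    }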
<weilewei> hkaiser ok, dca with hpx mpi futures seems to be running now after this trick
<weilewei> now it is time to try to break MPI_Wait using hpx mpi futures
nan11 has joined #ste||ar
rtohid has quit [Remote host closed the connection]
rtohid has joined #ste||ar
akheir has quit [Quit: Leaving]
karame_ has quit [Remote host closed the connection]
<hkaiser> weilewei: \o/
weilewei has quit [Remote host closed the connection]
rtohid has left #ste||ar [#ste||ar]