#ste||ar on 2020-04-27 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

01:30 Nikunj__ has quit [Read error: Connection reset by peer]

02:06 hkaiser has quit [Quit: bye]

06:18 <mdiers[m]> My application has a performance drop of 30% between 11 Mar and 17 Apr. can you find anything in this range in the hpx tests over the period? Maybe it came in during the revision of the executors? (block_executor<local_priority_queue_attached_executor>,static_chunk_size)

07:11 <ms[m]> mdiers: very possible, but I can't see anything that stands out particularly (our performance tests aren't very comprehensive either)

07:12 <ms[m]> one question, are you explicitly using block_executor<local_priority_queue_attached_executor> or are you leaving the executor as the default (the default is not local_priority_...)

07:12 <ms[m]> ?

07:37 <mdiers[m]> ms: thanks for the hint. i will have a look now, i actually use the default. at some point i had a problem with the policy, and had explicitly specified the default. i will adapt it.

07:40 <mdiers[m]> ms: on 11 Mar, the local_priority_queue_attached_executor was the default of the block_executor.

07:41 <ms[m]> mdiers: right, it changed in the executors cleanup pr

07:41 <ms[m]> so it's possible that I made the old default slower, but the new default hopefully faster...

07:42 <ms[m]> in any case, if you can try the new default (restricted_thread_pool_executor) that'd be great

07:44 <mdiers[m]> ms: I'm already on it

07:44 <ms[m]> thanks!

07:52 <mdiers[m]> ms: i test on a single-numa system. on the quad-numa-system the new version now scales, but cannot make a direct performance comparison.

07:54 <mdiers[m]> <ms[m] "in any case, if you can try the "> 1% difference between block_executor<local_priority_queue_attached_executor> and hpx::compute::host::block_executor<hpx::parallel::execution::restricted_thread_pool_executor>

07:56 <mdiers[m]> the same with a direct restricted_thread_pool_executor

08:18 hkaiser has joined #ste||ar

08:42 Nikunj__ has joined #ste||ar

08:44 Hashmi has joined #ste||ar

08:49 <ms[m]> mdiers: same bad performance or same good?

08:53 mcopik has joined #ste||ar

08:53 mcopik has quit [Client Quit]

08:53 <ms[m]> hkaiser: no sleep, eh? :/

08:53 <hkaiser> ms[m]: yah :/

08:56 <ms[m]> we need more boring issues for you to take care of ("type: boring" or "type: makes hartmut fall asleep")

08:57 <hkaiser> lol

09:04 <mdiers[m]> ms: ahh i'm going crazy: i'm rowing back, it's probably only my special case where i use the gpus with opencl.

09:24 Nikunj__ has quit [Read error: Connection reset by peer]

10:52 <mdiers[m]> ms: so, have now the test once again cleanly one after another. whether with or without opencl, I get a performance loss of now 20%.

10:52 <mdiers[m]> * My application has a performance drop of 20% between 11 Mar and 17 Apr. can you find anything in this range in the hpx tests over the period? Maybe it came in during the revision of the executors? (block_executor<local_priority_queue_attached_executor>,static_chunk_size)

10:54 Hashmi has quit [Quit: Connection closed for inactivity]

10:54 Nikunj__ has joined #ste||ar

10:54 <mdiers[m]> ms: will continue tomorrow morning

10:57 nikunj97 has joined #ste||ar

10:59 Nikunj__ has quit [Ping timeout: 244 seconds]

11:01 nikunj has joined #ste||ar

11:02 nikunj97 has quit [Ping timeout: 246 seconds]

11:37 nikunj97 has joined #ste||ar

11:41 nikunj has quit [Ping timeout: 240 seconds]

14:21 akheir has joined #ste||ar

14:24 nan11 has joined #ste||ar

14:32 weilewei has joined #ste||ar

14:41 rtohid has joined #ste||ar

14:44 nikunj97 has quit [Read error: Connection reset by peer]

14:45 <weilewei> hkaiser does hpx mpi future need mpi when compiling HPX? If I do not provide mpi module on my hpx build script, then it says "MPI could not be found but was requested by your configuration, please specify MPI_ROOT to point to the root of your MPI installation"

14:45 nikunj has joined #ste||ar

14:46 <weilewei> I set -DHPX_WITH_NETWORKING=OFF -DHPX_WITH_PARCELPORT_MPI=OFF

14:47 <hkaiser> weilewei: yes, it needs mpi

14:47 <weilewei> If I provide mpi module in the script, then hpx will be built with networking:on, however, I don't want this way. because when I run my application, it will warns mpi is started twice

14:48 <weilewei> hkaiser but in the earlier version, hpx mpi future does not require mpi when building hpx, that's what I remembered

14:49 <hkaiser> weilewei: it always required mpi, iirc

14:49 <weilewei> hkaiser then how should I avoid the error that mpi is started twice? because the main

14:50 <hkaiser> weilewei: no, if you disable networking, then networking will be off

14:50 <hkaiser> you see an error? you didn't say so

14:50 <weilewei> hkaiser right, I see the error "mpi is started twice" even when I disabled networking in hpx

14:51 <hkaiser> what error do you see? can I see the full output, pls?

14:51 <weilewei> hkaiser https://gist.github.com/weilewei/ce2c4fa47375433da846f412f83a458e

14:52 <hkaiser> grrr

14:53 <hkaiser> weilewei: is that HPX master?

14:53 <weilewei> hkaiser I am using your branch fixing_4539

14:53 <hkaiser> weilewei: also, did you solve the hpx_wrap issue?

14:53 <hkaiser> weilewei: ok, I'll have a look later today

14:53 <weilewei> hkaiser yes, the compilation error goes away now

14:54 <weilewei> hkaiser thanks

14:54 <hkaiser> could you comment on the ticket, pls?

14:54 <weilewei> ok, will do

14:58 <hkaiser> weilewei: thanks

14:58 <hkaiser> I have closed the ticket now

14:59 <weilewei> hkaiser thanks.

15:11 nikunj97 has joined #ste||ar

15:14 nikunj has quit [Ping timeout: 244 seconds]

15:14 shahrzad has joined #ste||ar

15:31 <hkaiser> weilewei: yt?

15:31 <weilewei> hkaiser yes

15:31 <hkaiser> I can't reproduce the mpi_init issue with mpi_ring_async_executor_test, can you?

15:33 <weilewei> hkaiser let me try to run that test

15:39 bita has joined #ste||ar

15:40 bita_ has joined #ste||ar

15:41 shahrzad has quit [Ping timeout: 240 seconds]

15:41 <weilewei> hkaiser Am I missing any flags to build mpi_ring_async_executor_test? HPX_WITH_TESTS=ON, HPX_WITH_TESTS_UNIT=ON

15:42 <hkaiser> HPX_MPI_WITH_FUTURE=On?

15:42 <weilewei> yes, I do have this one

15:42 <hkaiser> should be built (does for me)?

15:43 <hkaiser> HPX_MPI_WITH_TESTS=On (however this should be the default)

15:44 karame_ has joined #ste||ar

15:45 <weilewei> hkaiser let me clean up my build dir and try it again

15:51 shahrzad has joined #ste||ar

16:26 akheir has quit [*.net *.split]

16:29 K-ballo has quit [Remote host closed the connection]

16:29 K-ballo has joined #ste||ar

16:31 <shahrzad> hkaiser: Can I have a 15 min meeting with you today or tomorrow? I'm kinda stuck.

16:31 <hkaiser> sec

16:32 <hkaiser> shahrzad: how about today 12:30pm?

16:32 <shahrzad> hkaiser: It's great, thanks!

16:33 <hkaiser> shahrzad ok, same link as for our regular meting

16:34 <shahrzad> hkaiser: Alright.

16:36 <weilewei> hkaiser I got this build error: https://gist.github.com/weilewei/e3011e07102854ef5295de68ef8a7632

16:38 <hkaiser> weilewei: pls update from the branch, I fixied this 10 minutes ago

16:38 <weilewei> hkaiser ok, let me try it again

16:40 <weilewei> Another questions is about threaded ring G algorithm, I suspect MPI_Isend/recv might not be able to aware of threads locally. I am thinking to achieve the following scenarios: in multithreaded ringG, local thread sends data to corresponding thread in right-hand side neighbor, such that we have implicitly constructed multiple MPI communicator. For

16:40 <weilewei> example, we have two ranks, and each rank has 2 threads. Thread 0 from rank 0 issues MPI_Isend with tag associated with send_tag (thread_id+1 = 1) to thread 0 from rank 0 that has recv_tag (thread_id+1=1). However, this threaded ring G algorithm just breaks. See the sample program:

16:40 <weilewei> https://github.com/weilewei/Ring_example_MPI_CUDA/blob/thread_mpi/threads_mpi.cpp

16:42 <hkaiser> how does it 'break'?

16:43 <weilewei> either hangs, or some errors like this: https://gist.github.com/weilewei/e34c0ba562cb913a3eee94a413d986bc

16:44 <weilewei> or Cuda failure /__SMPI_build_dir_______________________________________/ibmsrc/pami/ibm-pami/buildtools/pami_build_port/../pami/components/devices/ibvdevice/CudaIPCPool.h:205: 'invalid resource handle'

16:44 <weilewei> from my experience, it seems the send and recv ends are not matched together

16:54 shahrzad has quit [Ping timeout: 244 seconds]

16:56 bita_ has quit [Quit: Leaving]

17:02 karame_ has quit [Remote host closed the connection]

17:09 karame_ has joined #ste||ar

17:23 <hkaiser> weilewei: is your MPI implementation thread-safe? is it initialized using MPI_Init_thread?

17:29 shahrzad has joined #ste||ar

17:30 <Yorlik> Can we mabe get rid of this annoying warning? include\hpx\threading\jthread.hpp(258): warning C4267: 'return': conversion from 'size_t' to 'unsigned int', possible loss of data

17:31 <Yorlik> Shall I make an issue for it?

17:32 <weilewei> hkaiser let me try again

17:35 <weilewei> hkaiser for your branch, I got this error: https://gist.github.com/weilewei/e3011e07102854ef5295de68ef8a7632#gistcomment-3271997

17:35 <K-ballo> Yorlik: please, then share the link

17:38 <Yorlik> K-ballo: https://github.com/STEllAR-GROUP/hpx/issues/4569

17:43 <weilewei> hkaiser after adding MPI_Init_thread, it works just fine, thanks!

17:57 shahrzad has quit [Ping timeout: 264 seconds]

18:17 weilewei has quit [Remote host closed the connection]

18:20 weilewei has joined #ste||ar

18:31 nan11 has quit [Remote host closed the connection]

18:41 nikunj97 has quit [Read error: Connection reset by peer]

18:58 nan11 has joined #ste||ar

19:03 rtohid has quit [Remote host closed the connection]

19:06 rtohid has joined #ste||ar

19:15 <hkaiser> jbjnr: pls see #4575 for a possible solution to your compilation issue

19:38 <nan11> hkaiser, could you plz take a look at for the dist_diag tests for 2 loc and 4 loc? https://github.com/STEllAR-GROUP/phylanx/blob/dist_diag_1d/tests/unit/plugins/dist_matrixops/dist_diag_2_loc.cpp https://github.com/STEllAR-GROUP/phylanx/blob/dist_diag_1d/tests/unit/plugins/dist_matrixops/dist_diag_4_loc.cpp

19:39 <hkaiser> nan11: ok, what am I looking for?

19:42 <nan11> 1. Is the input tiled vector correct? 2. Is the output for different tiling types correct

19:44 <hkaiser> nan11: why is the result 4 columns wide if the input has only 3 columns? (test_diag_1d_0)

19:45 <hkaiser> ahh, you're asking for the first sub-diagonal... - makes sense

19:45 <nan11> yep

19:45 <nan11> k=1

19:46 <hkaiser> looks fine to me

19:46 <nan11> Okay. Thanks

19:52 <diehlpk_work> hkaiser, see pm

19:55 <hkaiser> diehlpk_work: saw that - I'm working on a proposal right now - not much time for anything else...

20:00 <diehlpk_work> hkaiser, ok, but you will attend the meeting tomorrow?

20:01 <diehlpk_work> weilewei, Do you know how long Summit will be available before the new machine will be up?

20:01 <weilewei> diehlpk_work not until 2022

20:02 <weilewei> until 2022

20:02 <weilewei> but HPX hours on DCA project on Summit will be ended 12/30/2020

20:03 <diehlpk_work> Ok, got that. It is more about what to do with these node hours

20:04 <weilewei> then need to submit renewal. but anyhow, my impression is an evaluation of HPX hours will be placed at the end of this year

20:04 <weilewei> diehlpk_work ok

20:04 <diehlpk_work> Since the INCITE proposal next year, might be for the new machine

20:07 <hkaiser> diehlpk_work: I'll try

22:15 rtohid has left #ste||ar [#ste||ar]

22:41 shahrzad has joined #ste||ar

23:01 shahrzad has quit [Ping timeout: 244 seconds]

23:10 shahrzad has joined #ste||ar

23:14 shahrzad has quit [Ping timeout: 240 seconds]

23:41 jaafar_ has quit [Quit: Konversation terminated!]

23:42 jaafar has joined #ste||ar