#ste||ar on 2019-03-18 — irc logs at irclog.cct.lsu.edu

2018-08-26 23:03 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:25 jaafar has quit [Remote host closed the connection]

00:27 jaafar has joined #ste||ar

00:27 jaafar has quit [Remote host closed the connection]

00:28 jaafar has joined #ste||ar

01:06 diehlpk has joined #ste||ar

02:22 eschnett has joined #ste||ar

02:39 quaz0r has quit [Ping timeout: 272 seconds]

02:56 diehlpk has quit [Ping timeout: 250 seconds]

03:23 hkaiser has quit [Quit: bye]

03:46 eschnett has quit [Ping timeout: 255 seconds]

03:52 quaz0r has joined #ste||ar

04:02 nikunj97 has quit [Ping timeout: 246 seconds]

04:04 nikunj has joined #ste||ar

04:32 nikunj has quit [Ping timeout: 246 seconds]

05:11 nikunj has joined #ste||ar

10:53 <simbergm> what on earth did hkaiser do? our cuda builds are (almost) fixed now...

10:55 <simbergm> heller: 3662 should be ready to go, no? you're not planning any further changes?

11:46 <heller> simbergm: no, it should be ready

12:09 <simbergm> heller: thanks, let's give it a try, it's had to wait long enough...

12:56 hkaiser has joined #ste||ar

12:57 <hkaiser> heller, K-ballo: I'd need a second pair of eyes looking over this change: https://github.com/STEllAR-GROUP/hpx/pull/3745/commits/04159888215039d3f2834b2a86027d64d510b6c4

12:57 <jbjnr_> hkaiser: you pinged me last night - need anything?

12:57 <jbjnr_> I would like to chat to you at some point too

12:57 <hkaiser> jbjnr_: could you run your benchmarks on top of #3745, please?

12:58 <hkaiser> jbjnr_: sure, any time

12:58 <jbjnr_> will do

12:58 <hkaiser> thanks

12:58 <heller> hkaiser: something seems to be wrong ther

12:58 <heller> hkaiser: doesn't build

12:58 <hkaiser> stupid MS compiler again, most probably

12:58 <hkaiser> builds for me

12:58 <heller> no idea

12:59 <heller> looks odd

12:59 <hkaiser> I fixed some compilation errors this morning, however

12:59 <heller> ah, ok

13:00 <jbjnr_> hkaiser: heller Are there known problems with dataflow? Raffaele tells me that the GPU version of cholesky gives the wrong answers sometimes when they use dataflow, but the right answers when they use when_all().then()

13:00 <K-ballo> hkaiser: I suspect it ought to be possible to combine those assert checks with the atomic modifications, otherwise looks good to me

13:00 <jbjnr_> is dataflow known to be racey?

13:00 <hkaiser> K-ballo: I didn't care for atomicity wrt the asserts

13:00 <hkaiser> jbjnr_: not really

13:01 <jbjnr_> well it is now then!

13:01 <hkaiser> nothing would work if dataflow had a race

13:01 <jbjnr_> no. I'm wrong. Raffaele just told me that have a problem with when_all as well. So ignore what I just wrote.

13:01 <hkaiser> it has no synchronization anyways

13:02 <jbjnr_> it just apears more frequently with one than the other it seems. Sorry. This news was hot off the press ...

13:03 <heller> jbjnr_: we fixed a problem with that for 1.2.1 maybe he is running with an old version?

13:04 <jbjnr_> It was master from a few weeks back I think, but I will check

13:32 aserio has joined #ste||ar

13:35 hkaiser has quit [Quit: bye]

13:59 <mdiers_> hkaiser: yt? (for hpx in combination with slurm)

14:03 <jbjnr_> mdiers_: can I help?

14:05 <mdiers_> jbjnr_: perhaps, i have now been able to eliminate my performance problem under slurm by configuring slurm with pmix.

14:05 <jbjnr_> what is the nature of your problem?

14:08 <mdiers_> for pmix an adjustment for the mpi recognition is necessary : hpx.parcel.mpi.env=MV2_COMM_WORLD_RANK,PMI_RANK,OMPI_COMM_WORLD_SIZE,ALPS_APP_PE,PMIX_RANK

14:08 <mdiers_> now i'm stuck with a problem with hyperthreading

14:09 hkaiser has joined #ste||ar

14:10 <hkaiser> K-ballo: better now?

14:10 <K-ballo> sure

14:10 <mdiers_> normally the number of physical cores is determined, but under slurm it runs under the logical number

14:13 <jbjnr_> nothing changes under slurm unless you are using some kind of numactl instructions in your srun command. make sure you run one process per node. Is the machine you are runing on unusual in any respect?

14:13 <mdiers_> jbjnr_: and the difference comes from handle_num_threads() in command_line_handling line 328

14:15 <mdiers_> jbjnr_: have the same behavior on different nodes

14:17 <jbjnr_> does it ignore your --hpx:threads=xxx from the command line

14:19 <mdiers_> the problem is, that the slurm environment the logical cores provides (variable batch_threads in command_line_handling:319 ), this will reset the default_threads in command_line_handling:339

14:20 <mdiers_> jbjnr_: --hpx:threads=xxx woks if the configuration of slurm is identical to the threads::topology

14:21 <jbjnr_> (on't use slurm --threads-per-task and that kind of thing)

14:22 <jbjnr_> ^don't

14:22 <mdiers_> jbjnr_: but we want to feed a heterogeneous cluster, which makes it difficult with --hpx:threads=.

14:23 <jbjnr_> --hpx:threads=cores, each node will choose the amount for that node. Is slurm breaking that?

14:23 <mdiers_> directly with mpirun there is not this behavior

14:23 <jbjnr_> isn't there an ignore-batch environment flag

14:23 <hkaiser> mdiers_: all depends on the environment

14:23 <hkaiser> slurm sets some env variables we try to interpret

14:25 <jbjnr_> mdiers_: --hpx:ignore-batch-env

14:28 <mdiers_> hkaiser: Yes, I've noticed that too.

14:29 <mdiers_> jbjnr_: sounds good, also works so far. now we have a special case where we have to start our processes per socket, but unfortunately it doesn't work

14:35 <mdiers_> i am just a bit confused because of the different behavior of hpx under mpirun and slurm, in both variants the number of logical cores is passed via the environment variables, and hpx goes to the physical cores under mpirun and to the logical cores under slurm.

14:40 <hkaiser> mdiers_: this could be a bug

14:40 <jbjnr_> if you use "--hpx:threads=cores --hpx:ignore-batch-env" then on every node it will launch one thread per core. If it isn't doing that, then please explain what it is doing better cos I don't quite follow what you mean by "logical cores under slurm"

14:41 <hkaiser> mdiers_: also, you can specify a node-specific command line option even in batch mode, if that helps

14:42 <hkaiser> --hpx:N:option will be applied to node 'N" only

14:52 <mdiers_> jbjnr_: yes "--hpx:threads=cores --hpx:ignore-batch-env" works for homogeneous nodes, but unfortunately for our heterogeneous clusters it's a bit expensive

14:54 <mdiers_> hkaiser: yes, I already tried it, it also works, unfortunately also something in our case expensive

15:03 <mdiers_> Actually, I'm only slightly confused by that difference:

15:03 <mdiers_> mpirun -host epyc ./helloworld --> hpx::get_os_thread_count() 24

15:03 <mdiers_> srun --nodelist=epyc ./helloworld --> hpx::get_os_thread_count() 48

15:06 <mdiers_> in both cases the corresponding environmental variable has a value of 48 ( checked with mpirun -host epyc env and srun --nodelist=epyc env )

15:10 tianyi93 has quit [Quit: Leaving]

15:10 <jbjnr_> srun --nodelist=epyc ./helloworld --hpx:threads=cores

15:11 <jbjnr_> ^ --hpx:ignore-batch-env

15:11 <jbjnr_> then what does it report

15:12 <mdiers_> jbjnr_: yes, that works, but is unfortunately somewhat difficult to parameterize in heterogeneous clusters

15:12 <jbjnr_> hkaiser: if I have an action, that returns a future and I attach a continuation to it, but I don't need a future from the continuation - is there a .then version of apply?

15:12 <jbjnr_> or must I returna a future and discard it

15:13 <mdiers_> has to break up now, I'll get back to you tomorrow. thanks a lot

15:14 <jbjnr_> mdiers_: it's not hard at all. It's the same for every node!

15:39 akheir has quit [Quit: Konversation terminated!]

15:47 akheir has joined #ste||ar

15:58 bibek has joined #ste||ar

16:14 <hkaiser> jbjnr_: no, we don't have a fire&forget continuation

16:15 <jbjnr_> ok

16:15 <jbjnr_> ta

16:16 bibek has quit [Quit: Konversation terminated!]

16:17 bibek has joined #ste||ar

16:37 aserio1 has joined #ste||ar

16:40 aserio has quit [Ping timeout: 268 seconds]

16:40 aserio1 is now known as aserio

16:43 bibek has quit [Quit: Konversation terminated!]

16:52 <diehlpk_work> simbergm, Could you please join https://groups.google.com/group/season-of-docs-announce?refresh=1 to be informed about Google Season of Docs

16:53 <diehlpk_work> We would need to start next month with our application

16:56 <simbergm> diehlpk_work: yeah, thanks for reminding me!

17:34 aserio has quit [Ping timeout: 252 seconds]

17:35 <Yorlik> Newbie question: Could I do Line 36 (depending on line 17) in a more simple way than just using a templated initializer function? https://wandbox.org/permlink/fhPgrEss0iZNxyA5

17:39 david_pfander has quit [Ping timeout: 250 seconds]

17:49 aserio has joined #ste||ar

17:50 bibek has joined #ste||ar

18:09 bibek has quit [Quit: Konversation terminated!]

18:11 bibek has joined #ste||ar

19:03 aserio has quit [Ping timeout: 264 seconds]

19:09 aserio has joined #ste||ar

19:19 aserio has quit [Ping timeout: 264 seconds]

19:25 hkaiser has quit [Quit: bye]

19:30 aserio has joined #ste||ar

20:02 bibek has quit [Quit: Konversation terminated!]

20:03 bibek has joined #ste||ar

20:09 bibek has quit [Quit: Konversation terminated!]

20:14 bibek has joined #ste||ar

20:36 hkaiser has joined #ste||ar

20:37 <parsa> Yorlik: not sure what you mean by simple, but you could do this: https://wandbox.org/permlink/al0uzVa6WdaggxWT

20:38 <Yorlik> Ahh yes - thats what I was looking for

20:38 <Yorlik> Thanks a lot !

20:39 <Yorlik> I felt my way was overly compliated and not precisely enough expressing the intention

20:40 * Yorlik is stumbling through custom memory allocator land and has fun with it.

21:35 <heller> hkaiser: #3745 done

21:35 <heller> hkaiser: might have some implication on the hpxMP backend

21:36 aserio has quit [Quit: aserio]

21:38 <hkaiser> heller: yah, saw that - I'll resolve this with Tianyi directly

21:39 <hkaiser> ... and thanks!

21:39 <heller> np ;)

21:39 <heller> do we chat tomorrow?

22:03 <hkaiser> heller: yes, I'd love to

22:03 <heller> excellent