#ste||ar on 2020-08-06 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

00:25 nanmiao11 has quit [Remote host closed the connection]

00:26 nanmiao11 has joined #ste||ar

01:16 Yorlik has quit [Ping timeout: 240 seconds]

01:18 <weilewei> With mpirun, how do I limit number of cores that run application? For example, if I have 56 cores, but I only want to run 7 cores on that machine?

01:18 <weilewei> Or how to limit that actually

01:33 <hkaiser> weilewei: with hpx?

01:34 <hkaiser> generally you should be able to do that through slurm

01:34 <weilewei> hkaiser no, just regular application

01:34 <hkaiser> mpirun should pick it up

01:34 <weilewei> ok, let me explore sbatch

01:38 hkaiser_ has joined #ste||ar

01:40 hkaiser has quit [Ping timeout: 260 seconds]

01:46 hkaiser_ has quit [Quit: bye]

02:14 weilewei has quit [Remote host closed the connection]

03:01 akheir has quit [Quit: Leaving]

03:32 kale[m] has quit [Ping timeout: 265 seconds]

03:41 kale[m] has joined #ste||ar

04:37 nanmiao11 has quit [Ping timeout: 245 seconds]

07:28 <zao> mpirun deals with the number of processes to spawn, which you can indicate with `-np` if you want a number different from the default or environment.

07:28 <zao> SLURM has knobs for the number of tasks/processes, --ntasks (-n), and for cores per task, --cpus-per-task (-c). You would set those accordingly to have configurations like "one task per node with all its cores" or "N tasks per node, one core each" or something in between like "7 tasks per node, each with 4 cores".

07:28 <zao> Bah, seems like weilewei went offline :D

10:13 <gdaiss[m]> @hkaiser Should we move the Kokkos meeting to next Thursday (or another day that week) as Mikael is on vacation this week?

10:51 hkaiser has joined #ste||ar

12:10 <ms[m]> hkaiser: how was your schedule next week? are you away all week or just at the usual hpx meeting time? I wouldn't be there for the potential meeting today, but should we try to have one tomorrow or next week (if you're around)? also rori_[m] jbjnr gdaiss (we can have the kokkos meeting the same day)

12:11 <rori> 👍️

12:16 <gdaiss[m]> <ms[m] "hkaiser: how was your schedule n"> for the kokkos meeting I would prefer next week over tomorrow! My calendar for tomorrow is already completely full!

12:19 <hkaiser> ms[m]: I'm around next week

12:19 <hkaiser> no plans to be away atm

12:20 <hkaiser> ms[m]: should we cancel today's meeting, then?

12:23 <ms[m]> hkaiser: yes, unless you'd like to have it anyway (I don't mind really)

12:23 <hkaiser> ok, I'll cancel it for today, should we plan for next Thursday?

12:24 <hkaiser> ms[m]: ^^

12:25 <ms[m]> thursday works, but not that time

12:25 <ms[m]> another day might be easier

12:25 <ms[m]> sorry :/

12:25 <hkaiser> ok, let's skip it then and reconvene in 2 weeks

12:25 <ms[m]> all right, works too

12:26 <ms[m]> thanks!

12:26 <hkaiser> : np

12:31 Yorlik has joined #ste||ar

13:36 <gonidelis[m]> hkaiser: you are on a roll ;p I need to step up my adaptation game

13:36 nanmiao11 has joined #ste||ar

13:42 <hkaiser> gonidelis[m]: ;-)

13:44 <gonidelis[m]> hkaiser: Should for_each be checked here? https://github.com/STEllAR-GROUP/hpx/issues/4822

13:44 <gonidelis[m]> cause of mikel's impl

13:45 <hkaiser> once it's merged, yes

13:45 <gonidelis[m]> ahh right

13:45 <hkaiser> he still needs to fix a conflict

13:58 kale[m] has quit [Ping timeout: 240 seconds]

13:59 kale[m] has joined #ste||ar

14:18 nanmiao11 has quit [Ping timeout: 245 seconds]

14:21 nanmiao11 has joined #ste||ar

14:45 <gnikunj[m]> hkaiser: yt?

14:45 <hkaiser> here

14:46 <gnikunj[m]> hkaiser: I'm seeing a lot of contention with my distributed benchmarks (which I kind of expected)

14:46 <hkaiser> contention? where?

14:46 <gnikunj[m]> the apis work best when I run with 20ms or more worth of grain size

14:47 <gnikunj[m]> per distributed call

14:47 <hkaiser> ahh, including networking?

14:47 <hkaiser> what parcelport?

14:47 <gnikunj[m]> yes

14:47 <gnikunj[m]> tcp

14:47 <gnikunj[m]> should I switch to mpi?

14:47 <hkaiser> try MPI

14:47 <hkaiser> might improve things

14:48 <gnikunj[m]> good idea, let me

14:49 <hkaiser> gnikunj[m]: also, the remote actions work best if local work is overlapping things

14:50 <hkaiser> I wouldn't suggest exposing/measuring pure latencies

14:50 <gnikunj[m]> I see, so I should add some local work on remote invocations

14:50 <hkaiser> or measure just the overhead introduced by replay/replicate compared to non-replay/non-replicate
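A minimal sketch of the "measure only the overhead" idea from the message above, written against the C++ standard library only. `replay_sketch`, `seconds_for`, and `relative_replay_overhead` are hypothetical names invented for illustration; they are not HPX APIs and they run everything locally, with no networking or parcelport involved. The sketch assumes the plain workload takes a measurable amount of time, otherwise the ratio is meaningless.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not an HPX API): call f up to max_attempts times and
// return the first result that does not throw. The final attempt lets any
// exception propagate to the caller. f is always called at least once.
template <typename F>
auto replay_sketch(std::size_t max_attempts, F&& f) -> decltype(f())
{
    for (std::size_t attempt = 1; attempt < max_attempts; ++attempt)
    {
        try
        {
            return f();
        }
        catch (...)
        {
            // Swallow the failure and try again.
        }
    }
    return f();
}

// Wall-clock seconds spent calling f `iterations` times.
template <typename F>
double seconds_for(std::size_t iterations, F&& f)
{
    assert(iterations > 0);
    auto const start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i != iterations; ++i)
    {
        f();
    }
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Overhead of the replay wrapper relative to the plain call (0.05 means 5%).
// Assumes the plain run takes non-zero time.
template <typename F>
double relative_replay_overhead(std::size_t iterations, F&& f)
{
    double const plain = seconds_for(iterations, f);
    double const wrapped =
        seconds_for(iterations, [&f]() { (void) replay_sketch(3, f); });
    return (wrapped - plain) / plain;
}
```

Reporting the ratio rather than the absolute time keeps the comparison focused on what the wrapper adds, which is the quantity suggested above, instead of on raw latency.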

14:51 <hkaiser> gnikunj[m]: how is qbc working for you?

14:51 <gnikunj[m]> qbc?

14:52 <hkaiser> qbc.loni.org

14:52 <gnikunj[m]> aah I'm able to login now

14:52 <gnikunj[m]> I haven't done any benchmarking there. I'm playing with rostam right now

14:54 <hkaiser> right

14:56 diehlpk has joined #ste||ar

14:56 diehlpk has quit [Changing host]

15:00 <gnikunj[m]> hkaiser: mpi backend works really nicely. With tcp I was having trouble with lower grain sizes. With mpi I can go as low as 5ms for the grain size.

15:02 <hkaiser> nod, good

15:10 diehlpk has quit [Ping timeout: 256 seconds]

15:58 diehlpk_work has quit [Ping timeout: 244 seconds]

16:00 diehlpk has joined #ste||ar

16:00 diehlpk has quit [Changing host]

16:00 diehlpk has joined #ste||ar

16:03 weilewei has joined #ste||ar

16:09 diehlpk_work has joined #ste||ar

16:11 <gnikunj[m]> hkaiser: I've added some local work to the benchmark. This should be a more real world scenario. I'll run a script now to get some results for various configurations. Let's discuss them in tomorrow's call.

16:12 <hkaiser> gnikunj[m]: ok

16:14 <zao> weilewei: 09:28 <zao> mpirun deals with the number of processes to spawn, which you can indicate with `-np` if you want a number different from the default or environment.

16:14 <zao> 09:28 <zao> SLURM has knobs for the number of tasks/processes, --ntasks (-n), and for cores per task, --cpus-per-task (-c). You would set those accordingly to have configurations like "one task per node with all its cores" or "N tasks per node, one core each" or something in between like "7 tasks per node, each with 4 cores".

16:16 <weilewei> zao thanks!

16:20 <zao> The manual page for sbatch isn't that bad, and most HPC sites have some sort of writeup on submit file design.

16:25 <gonidelis[m]> Should this be changed from `result_type()` to `result_type{}` ?

16:25 <gonidelis[m]> https://github.com/STEllAR-GROUP/hpx/blob/496e0a38bc60912e6d241a5f2aa7acab83567aab/libs/algorithms/include/hpx/parallel/algorithms/transform.hpp#L489-L491

16:25 <hkaiser> gonidelis[m]: that's a matter of style in this case

16:26 <gonidelis[m]> according to https://github.com/STEllAR-GROUP/hpx/blob/496e0a38bc60912e6d241a5f2aa7acab83567aab/libs/algorithms/include/hpx/parallel/util/result_types.hpp#L66-L76

16:26 <hkaiser> in this case both forms are equivalent

16:27 <gonidelis[m]> Well we were discussing with k-ballo last week that here https://github.com/STEllAR-GROUP/hpx/blob/48ae89f109ec89cc5c74ab0b603c7f753c4ed360/libs/algorithms/include/hpx/parallel/algorithms/copy.hpp#L454

16:27 <gonidelis[m]> `in_out_result()` wouldn't work

16:27 <hkaiser> sure, not in c++14

16:27 <gonidelis[m]> the thing is that the compiler complains

16:28 <K-ballo> complains about ()?

16:28 <gonidelis[m]> sec

16:28 <K-ballo> return T(); and return T{}; will both work, but the semantics are subtly different in some cases

16:28 <gonidelis[m]> https://github.com/STEllAR-GROUP/hpx/blob/48ae89f109ec89cc5c74ab0b603c7f753c4ed360/libs/algorithms/tests/unit/algorithms/transform_binary_tests.hpp#L67-L72

16:28 <K-ballo> I can't find any in_out_result()

16:29 <gonidelis[m]> here for example

16:29 <weilewei> zao right, I run sbatch --help in different platforms, it is helpful. It is just I started with jsrun on Summit and get used to it

16:29 <gonidelis[m]> K-ballo: that's why we use `{}`

16:29 <K-ballo> I can't find any in_out_result{} either

16:29 <K-ballo> did you mean in_out_result(arguments, here)?

16:29 <gonidelis[m]> in the example above, should `hpx::util::get<0>(result)` be changed to `result.in1` etc... ?

16:29 <gonidelis[m]> in_out_result<args>{}

16:30 <K-ballo> link to an actual line please

16:30 <weilewei> on Summit, one uses bsub to submit work, and then one can use jsrun -c to limit number of cores per rank

16:31 <gonidelis[m]> K-ballo: doesn't that mean that I can use `in_in_out_result<templ args>{args}` ?

16:31 <gonidelis[m]> https://github.com/STEllAR-GROUP/hpx/blob/496e0a38bc60912e6d241a5f2aa7acab83567aab/libs/algorithms/include/hpx/parallel/util/result_types.hpp#L55

16:31 <K-ballo> I can't tell what the actual question is..

16:31 <K-ballo> T() vs T{} has a different meaning than T(args) vs T{args}

16:31 <K-ballo> I don't see any `result_type()` in the links above

16:33 <gonidelis[m]> K-ballo: https://github.com/STEllAR-GROUP/hpx/blob/496e0a38bc60912e6d241a5f2aa7acab83567aab/libs/algorithms/include/hpx/parallel/algorithms/transform.hpp#L486-L491

16:33 <gonidelis[m]> that's the case that I am working on

16:34 <K-ballo> I see no `result_type()` in there, only `result_type(args)`

16:34 <gonidelis[m]> should this be `result_type{things_here}`

16:34 <gonidelis[m]> K-ballo: yeah that's what I am talking about

16:34 diehlpk has quit [Ping timeout: 244 seconds]

16:34 <K-ballo> ok, `result_type()` is a separate language construct, different rules, different semantics

16:35 <K-ballo> if that result_type is an `in_out_result` it needs to be using braces, yes

16:35 <gonidelis[m]> yeah sure...

16:35 <K-ballo> fwiw, never change a `T()` into a `T{}` unless you know why you are doing it

16:35 <gonidelis[m]> as you can see I have a `using result_type = util::in_in_out_result<FwdIter1B, FwdIter2, FwdIter3>`

16:36 <gonidelis[m]> you mean `T(args)` to `T{args}` or just plain `T()` to `T{}`?

16:36 <K-ballo> plain T(), it's its own language construct distinct from T(args)

16:37 <gonidelis[m]> K-ballo: so it's not `in_out_result`, but rather `in_in_out_result`

16:37 <gonidelis[m]> K-ballo: got it thanks ;)
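The distinction discussed above can be sketched in standard C++. `in_out_result_sketch` is a hypothetical stand-in for an aggregate result type (a plain struct with no constructors), not the actual HPX definition. Before C++20 such a type can only be built from arguments with braces, which matches the "not in c++14" remark. Separately, `T()` is value-initialization while `T{}` is list-initialization, and with arguments the two spellings can select different constructors, as the `std::vector` pair shows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical aggregate, standing in for a result type without constructors.
template <typename I, typename O>
struct in_out_result_sketch
{
    I in;
    O out;
};

template <typename I, typename O>
in_out_result_sketch<I, O> make_result(I in, O out)
{
    using result_type = in_out_result_sketch<I, O>;
    // Braces perform aggregate initialization. result_type(in, out) would
    // need a constructor before C++20, so it is rejected in C++14 and C++17.
    return result_type{in, out};
}

// T(args) vs T{args}: same arguments, different constructors.
inline std::size_t paren_size()
{
    return std::vector<int>(3, 1).size();    // three elements, each 1
}

inline std::size_t brace_size()
{
    return std::vector<int>{3, 1}.size();    // two elements: 3 and 1
}

inline void check_initialization_forms()
{
    auto const r = make_result(1, 2.5);
    assert(r.in == 1 && r.out == 2.5);
    assert(paren_size() == 3);
    assert(brace_size() == 2);
}
```

This is why a blanket swap of parentheses for braces is risky: for an aggregate the braces are required, while for a type with an initializer-list constructor they change the result.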

17:10 bita_ has joined #ste||ar

17:56 kale[m] has quit [Ping timeout: 260 seconds]

17:57 kale[m] has joined #ste||ar

18:44 <gonidelis[m]> hkaiser: yt?

18:44 <hkaiser> here

18:49 <gonidelis[m]> https://github.com/STEllAR-GROUP/hpx/pull/4855 I have created the CPO for the plain hpx::transform (not ranges::transform yet). I have updated the status of my repo but the compiler complains about a certain `container_algos` test (where I don't think it should be complaining). I think it's a minor detail on the way to a stable version of my code. Anyway, the error is at

18:49 <gonidelis[m]> `tests/unit/container_algorithms/transform_range_binary.cpp:50:27` because I changed `hpx::util::get<0>(result) == std::end(c1)` to `result.in1 == std::end(c1)`...

18:49 <gonidelis[m]> Could you give a tip?

18:50 <gonidelis[m]> hkaiser: ^^

19:27 hkaiser has quit [Ping timeout: 244 seconds]

19:35 hkaiser has joined #ste||ar

19:40 hkaiser has quit [Ping timeout: 264 seconds]

19:41 hkaiser has joined #ste||ar

19:54 <gonidelis[m]> hkaiser: did you happen to miss my message maybe?

19:55 <hkaiser> could be

19:55 <gonidelis[m]> https://github.com/STEllAR-GROUP/hpx/pull/4855 I have created the CPO for the plain hpx::transform (not ranges::transform yet). I have updated the status of my repo but the compiler complains about a certain container_algos test (where I don't think it should be complaining). I think it's a minor detail on the way to a stable version of my code. Anyway, the error is at

19:55 <gonidelis[m]> tests/unit/container_algorithms/transform_range_binary.cpp:50:27 because I changed hpx::util::get<0>(result) == std::end(c1) to result.in1 == std::end(c1)...

19:55 <gonidelis[m]> Could you give a tip?

19:55 <hkaiser> the compiler error could only be caused by a mismatch between what the algorithm returns and what you believe it returns
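A small sketch of the kind of mismatch described above, using `std::tuple` as a stand-in for a tuple-like return type and a hypothetical `in_in_out_result_sketch` struct for the named-member form. Neither is the actual HPX type, and whether the algorithm exercised by the failing test returns one or the other is an assumption that has to be checked against the real signature.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical struct with named members (illustration only).
template <typename I1, typename I2, typename O>
struct in_in_out_result_sketch
{
    I1 in1;
    I2 in2;
    O out;
};

// Tuple-like return: elements are reached with get<N>(result).
template <typename I1, typename I2, typename O>
std::tuple<I1, I2, O> returns_tuple(I1 a, I2 b, O c)
{
    return std::make_tuple(a, b, c);
}

// Struct return: elements are reached by member name.
template <typename I1, typename I2, typename O>
in_in_out_result_sketch<I1, I2, O> returns_struct(I1 a, I2 b, O c)
{
    return in_in_out_result_sketch<I1, I2, O>{a, b, c};
}

inline bool tuple_and_struct_agree()
{
    std::vector<int> c1{1, 2, 3};
    std::vector<int> c2{4, 5, 6};
    std::vector<int> d(3);
    assert(c1.size() == c2.size() && c2.size() == d.size());

    auto const t = returns_tuple(c1.end(), c2.end(), d.end());
    auto const s = returns_struct(c1.end(), c2.end(), d.end());

    // t.in1 would not compile: a tuple has no member named in1.
    // std::get<0>(s) would not compile: the struct is not tuple-like.
    return std::get<0>(t) == c1.end() && s.in1 == c1.end() &&
        std::get<2>(t) == d.end() && s.out == d.end();
}
```

If the test still calls an overload that returns the tuple-like form, `result.in1` fails in exactly this way, so the accessor in the test and the return type of the overload it calls need to change together.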

19:57 diehlpk has joined #ste||ar

19:57 diehlpk has quit [Changing host]

19:57 diehlpk has joined #ste||ar

19:58 <gonidelis[m]> hpx::parallel::transform returns either an `in_out_result` or an `in_in_out_result`

20:00 <nanmiao11> hkaiser I am not able to join the room. It says the meeting has problem. Have you joined in?

20:00 <hkaiser> nanmiao11: yes

20:01 <hkaiser> it's not a problem if you can't join today, we'll talk about money, mostly

20:01 <nanmiao11> OK.

20:37 diehlpk has quit [Ping timeout: 260 seconds]

20:42 mcopik has joined #ste||ar

20:42 mcopik has quit [Client Quit]

21:06 bita_ has quit [Ping timeout: 256 seconds]

21:26 <weilewei> wow, I got accepted as a SCinet volunteer again this year

21:26 <weilewei> "virtual" volunteer

21:27 <nanmiao11> Congrats!

21:27 <weilewei> nanmiao11 thanks!

21:57 kale[m] has quit [Ping timeout: 240 seconds]

21:57 kale[m] has joined #ste||ar

22:09 <gonidelis[m]> hkaiser: it would help a lot if you could take a look at my updated PR and leave some comments before our meeting tomorrow morning. I will have extra updates by then...

22:12 <hkaiser> will do later tonight

22:14 kale[m] has quit [Read error: Connection reset by peer]

22:14 kale[m] has joined #ste||ar

22:19 <gonidelis[m]> hkaiser: thank you

22:22 t37 has joined #ste||ar

23:16 kale[m] has quit [Ping timeout: 256 seconds]

23:39 t37 has quit [Remote host closed the connection]