hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
primef5 has joined #ste||ar
primef5 has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
Yorlik_ has joined #ste||ar
Yorlik_ has quit [Read error: Connection reset by peer]
mdiers_ has joined #ste||ar
heller2 has quit [Quit: killed]
simbergm has quit [Quit: killed]
rori has quit [Quit: killed]
Yorlik has joined #ste||ar
tarzeau has quit [Ping timeout: 268 seconds]
kordejong has joined #ste||ar
rori has joined #ste||ar
tarzeau has joined #ste||ar
gdaiss[m] has joined #ste||ar
hkaiser has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
<hkaiser> K-ballo: yt?
jaafar has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 256 seconds]
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
primef5 has joined #ste||ar
<K-ballo> I'm here now
hkaiser has joined #ste||ar
primef5 has quit [Ping timeout: 240 seconds]
<Yorlik> Do parallel loops accept some sort of hints about how to chop up the portions? I am thinking about how to schedule an array full of stuff and chop it into workloads. The problem is that the times can vary a lot, both from frame to frame and within the array.
<hkaiser> Yorlik: you can pass a special policy that can make that decision
<Yorlik> I will probably just start with a naive parloop, but my intuition tells me I will need to respect some runtime metrics to optimize it.
<hkaiser> the static_chunk_size policy is a predefined one (https://github.com/STEllAR-GROUP/hpx/blob/master/libs/execution/include/hpx/execution/executors/static_chunk_size.hpp), but you can create your own easily
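A minimal sketch of what passing such a policy looks like (header paths and namespaces follow the HPX version linked above and may differ in newer releases; the data and lambda are placeholders):

    #include <hpx/include/parallel_for_each.hpp>
    #include <hpx/execution/executors/static_chunk_size.hpp>
    #include <vector>

    void update_frame(std::vector<double>& values)
    {
        // hand out fixed chunks of 64 iterations per task instead of
        // letting the library's heuristic decide
        hpx::parallel::execution::static_chunk_size chunk(64);

        hpx::parallel::for_each(
            hpx::parallel::execution::par.with(chunk),
            values.begin(), values.end(),
            [](double& v) { v *= 1.01; });   // stand-in for the per-entity update
    }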
<Yorlik> I'm probably overcomplicating the problem
<Yorlik> The default parloop does some measuring - is that correct?
<Yorlik> After all I will have at least 4 worker threads and a frametime of 16 ms minimum.
<hkaiser> Yorlik: I don't think so, the default is to use the static_chunk_size policy, I believe
<Yorlik> So I probably should do some runtime instrumentation?
<Yorlik> I mean - any task length between 200 and 800 us would be very good already.
<hkaiser> so static_chunk_size is the default, but you can try using the auto_chunk_size policy (which is measuring things)
<hkaiser> or simply create your own
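Swapping in the measuring policy is the same one-line change (again a sketch against that HPX version; the default-constructed policy uses its built-in measurement heuristic):

    // auto_chunk_size times a fraction of the iterations and derives
    // the chunk size from that measurement
    hpx::parallel::execution::auto_chunk_size chunk;

    hpx::parallel::for_each(
        hpx::parallel::execution::par.with(chunk),
        values.begin(), values.end(),
        [](double& v) { v *= 1.01; });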
<Yorlik> From your experience, doing this every frame (minimum 16.6 ms, maybe 100 ms if we go for 10 fps) - how would you estimate the benefit of doing something custom instead of just using auto_chunk_size?
<hkaiser> shrug
<hkaiser> depends
<Yorlik> I have a weird feeling I might be overoptimizing here.
<Yorlik> Gotta measure anyways :D
<hkaiser> I'd go with the defaults and see how it performs
<Yorlik> Ya. That's probably best.
<Yorlik> Is there a way to instrument the chunks, like mean runtime?
<hkaiser> then you can measure on the first invocation and use the timings for subsequent invocations of the same loop
<hkaiser> good question, I don't think we have a hook there, but this could be done
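One way to follow the measure-once, reuse-later suggestion without any library hook is to time the whole loop on the first frame and derive a fixed chunk size from it. A rough sketch (the ~500 us target, the lambda, and the names are placeholders; the timing is the wall time of the whole parallel run, so it is only a crude per-iteration estimate):

    #include <algorithm>
    #include <chrono>
    #include <cstddef>

    std::size_t tuned_chunk = 0;   // 0 == not measured yet

    void run_frame(std::vector<double>& values)
    {
        namespace ex = hpx::parallel::execution;

        if (tuned_chunk == 0)
        {
            // first frame: run with defaults and time the whole loop
            auto t0 = std::chrono::steady_clock::now();
            hpx::parallel::for_each(ex::par, values.begin(), values.end(),
                [](double& v) { v *= 1.01; });
            double us = std::chrono::duration<double, std::micro>(
                std::chrono::steady_clock::now() - t0).count();

            // aim for roughly 500 us of work per chunk on later frames
            double us_per_iter =
                std::max(us / static_cast<double>(values.size()), 0.01);
            tuned_chunk = static_cast<std::size_t>(500.0 / us_per_iter) + 1;
        }
        else
        {
            // subsequent frames: reuse the tuned chunk size
            hpx::parallel::for_each(
                ex::par.with(ex::static_chunk_size(tuned_chunk)),
                values.begin(), values.end(),
                [](double& v) { v *= 1.01; });
        }
    }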
<Yorlik> When you have very variable update times / chunk times it might help. I have a weird feeling that any heuristic would have to deal with a very chaotic time distribution.
<hkaiser> 'f' is your lambda
<Yorlik> Would it be possible to dynamically change the chunk size while the loop is executing?
<hkaiser> the auto_chunk_size policy currently measures 1% of the iterations to decide how to chunk things
<hkaiser> for small iterations this works well, for large iterations this causes issues
<Yorlik> I am thinking of a scheduler collecting the last 4 task runtimes and adjusting the next created chunks accordingly
<Yorlik> So - you say like every 100th chunk is measured?
<hkaiser> look at the auto_chunk_size policy, your 4 tasks would have to go there (instead of the 1%)
<hkaiser> we wanted to make that configurable a long time ago, so here's your chance ;-)
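A rough sketch of what a hand-rolled policy could look like, modeled on the executor-parameters interface in the static_chunk_size header linked above (the exact get_chunk_size signature has changed between HPX versions, and the 500 us target and per-iteration estimate are placeholders, not library defaults):

    #include <cstddef>
    #include <type_traits>

    // decide chunk sizes from a per-iteration estimate, e.g. fed back
    // from the previous frame's measurements
    struct frame_chunk_size
    {
        explicit frame_chunk_size(double est_us_per_iter)
          : est_us_per_iter_(est_us_per_iter) {}

        // called by the algorithm to decide how many iterations form one
        // task; 'f' is a callable the policy may invoke to time a single
        // iteration (unused in this sketch)
        template <typename Executor, typename F>
        std::size_t get_chunk_size(Executor&, F&&,
            std::size_t /*cores*/, std::size_t count) const
        {
            double target_us = 500.0;   // aim for ~500 us per chunk
            std::size_t chunk =
                static_cast<std::size_t>(target_us / est_us_per_iter_) + 1;
            return chunk < count ? chunk : count;
        }

        double est_us_per_iter_;
    };

    // register the type as an executor-parameters object
    namespace hpx { namespace parallel { namespace execution {
        template <>
        struct is_executor_parameters<frame_chunk_size> : std::true_type {};
    }}}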
<Yorlik> Alright - I'll study that. Probably what HPX gives me here is already good enough for us.
<Yorlik> Thanks ! :)
<Yorlik> algorithm_result is a future I assume?
<Yorlik> Seems like the for_each is finished when all chunks are finished, right?
<hkaiser> Yorlik: right
<hkaiser> algorithm_result is a future for async algorithms
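In other words, with a task policy the call returns immediately and the future it hands back becomes ready once all chunks have finished (sketch, reusing the placeholder data and lambda from above):

    auto fut = hpx::parallel::for_each(
        hpx::parallel::execution::par(hpx::parallel::execution::task),
        values.begin(), values.end(),
        [](double& v) { v *= 1.01; });

    // ... overlap other per-frame work here ...

    fut.wait();   // all chunks of this loop are done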