hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
primef5 has joined #ste||ar
primef5 has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
Yorlik_ has joined #ste||ar
Yorlik_ has quit [Read error: Connection reset by peer]
mdiers_ has joined #ste||ar
heller2 has quit [Quit: killed]
simbergm has quit [Quit: killed]
rori has quit [Quit: killed]
Yorlik has joined #ste||ar
tarzeau has quit [Ping timeout: 268 seconds]
kordejong has joined #ste||ar
rori has joined #ste||ar
tarzeau has joined #ste||ar
gdaiss[m] has joined #ste||ar
hkaiser has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
<hkaiser> K-ballo: yt?
jaafar has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 256 seconds]
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
primef5 has joined #ste||ar
<K-ballo> I'm here now
hkaiser has joined #ste||ar
primef5 has quit [Ping timeout: 240 seconds]
<Yorlik> Do parallel loops accept some sort of hints about how to chop up the portions? I am thinking about how to schedule an array full of stuff and chop it into workloads. The problem is that the times can vary a lot, both from frame to frame and within the array.
<hkaiser> Yorlik: you can pass a special policy that can make that decision
<Yorlik> I will probably just start with a naive parloop, but my intuition tells me I will need to respect some runtime metrics to optimize it.
<hkaiser> the static_chunk_size policy is a predefined one (https://github.com/STEllAR-GROUP/hpx/blob/master/libs/execution/include/hpx/execution/executors/static_chunk_size.hpp), but you can create your own easily
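A minimal sketch of what passing such a policy looks like (header paths and namespaces follow the HPX version linked above and may differ in newer releases; the data and lambda are placeholders):

    #include <hpx/include/parallel_for_each.hpp>
    #include <hpx/execution/executors/static_chunk_size.hpp>
    #include <vector>

    void update_frame(std::vector<double>& values)
    {
        // hand out fixed chunks of 64 iterations per task instead of
        // letting the library's heuristic decide
        hpx::parallel::execution::static_chunk_size chunk(64);

        hpx::parallel::for_each(
            hpx::parallel::execution::par.with(chunk),
            values.begin(), values.end(),
            [](double& v) { v *= 1.01; });   // stand-in for the per-entity update
    }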
<Yorlik> I'm probably overcomplicating the problem
<Yorlik> The default parloop does some measuring - is that correct?
<Yorlik> After all I will have at least 4 worker threads and a frametime of 16 ms minimum.
<hkaiser> Yorlik: I don't think so, the default is to use the static_chunk_size policy, I believe
<Yorlik> So I probably should do some runtime instrumentation?
<Yorlik> I mean - any task length between 200 and 800 us would be very good already.
<hkaiser> so static_chunk_size is the default, but you can try using the auto_chunk_size policy (which is measuring things)
<hkaiser> or simply create your own
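Swapping in the measuring policy is the same one-line change (again a sketch against that HPX version; the default-constructed policy uses its built-in measurement heuristic):

    // auto_chunk_size times a fraction of the iterations and derives
    // the chunk size from that measurement
    hpx::parallel::execution::auto_chunk_size chunk;

    hpx::parallel::for_each(
        hpx::parallel::execution::par.with(chunk),
        values.begin(), values.end(),
        [](double& v) { v *= 1.01; });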
<Yorlik> From your experience, doing this every frame (minimum 16.6 ms, maybe 100 ms if we go for 10 fps) - how would you estimate the benefit of doing something custom instead of just using auto_chunk_size?
<hkaiser> shrug
<hkaiser> depends
<Yorlik> I have a weird feeling I might be overoptimizing here.
<Yorlik> Gotta measure anyways :D
<hkaiser> I'd go with the defaults and see how it performs
<Yorlik> Ya. That's probably best.
<Yorlik> Is there a way to instrument the chunks, like mean runtime?
<hkaiser> then you can measure on the first invocation and use the timings for subsequent invocations of the same loop
<hkaiser> good question, I don't think we have a hook there, but this could be done
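One way to follow the measure-once, reuse-later suggestion without any library hook is to time the whole loop on the first frame and derive a fixed chunk size from it. A rough sketch (the ~500 us target, the lambda, and the names are placeholders; the timing is the wall time of the whole parallel run, so it is only a crude per-iteration estimate):

    #include <algorithm>
    #include <chrono>
    #include <cstddef>

    std::size_t tuned_chunk = 0;   // 0 == not measured yet

    void run_frame(std::vector<double>& values)
    {
        namespace ex = hpx::parallel::execution;

        if (tuned_chunk == 0)
        {
            // first frame: run with defaults and time the whole loop
            auto t0 = std::chrono::steady_clock::now();
            hpx::parallel::for_each(ex::par, values.begin(), values.end(),
                [](double& v) { v *= 1.01; });
            double us = std::chrono::duration<double, std::micro>(
                std::chrono::steady_clock::now() - t0).count();

            // aim for roughly 500 us of work per chunk on later frames
            double us_per_iter =
                std::max(us / static_cast<double>(values.size()), 0.01);
            tuned_chunk = static_cast<std::size_t>(500.0 / us_per_iter) + 1;
        }
        else
        {
            // subsequent frames: reuse the tuned chunk size
            hpx::parallel::for_each(
                ex::par.with(ex::static_chunk_size(tuned_chunk)),
                values.begin(), values.end(),
                [](double& v) { v *= 1.01; });
        }
    }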
<Yorlik> When you have very variable update times / chunk times it might help. I have a weird feeling that any heuristic would have to deal with a very chaotic time distribution.
<hkaiser> 'f' is your lambda
<Yorlik> Would it be possible to dynamically change the chunk size while the loop is executing?
<hkaiser> the auto_chunk_size policy currently measures 1% of the iterations to decide how to chunk things
<hkaiser> for small iterations this works well, for large iterations this causes issues
<Yorlik> I am thinking of a scheduler collecting the last 4 task runtimes and adjusting the next created chunks accordingly
<Yorlik> So - you say like every 100th chunk is measured?
<hkaiser> look at the auto_chunk_size policy, your 4 tasks would have to go there (instead of the 1%)
<hkaiser> we wanted to make that configurable a long time ago, so here's your chance ;-)
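A rough sketch of what a hand-rolled policy could look like, modeled on the executor-parameters interface in the static_chunk_size header linked above (the exact get_chunk_size signature has changed between HPX versions, and the 500 us target and per-iteration estimate are placeholders, not library defaults):

    #include <cstddef>
    #include <type_traits>

    // decide chunk sizes from a per-iteration estimate, e.g. fed back
    // from the previous frame's measurements
    struct frame_chunk_size
    {
        explicit frame_chunk_size(double est_us_per_iter)
          : est_us_per_iter_(est_us_per_iter) {}

        // called by the algorithm to decide how many iterations form one
        // task; 'f' is a callable the policy may invoke to time a single
        // iteration (unused in this sketch)
        template <typename Executor, typename F>
        std::size_t get_chunk_size(Executor&, F&&,
            std::size_t /*cores*/, std::size_t count) const
        {
            double target_us = 500.0;   // aim for ~500 us per chunk
            std::size_t chunk =
                static_cast<std::size_t>(target_us / est_us_per_iter_) + 1;
            return chunk < count ? chunk : count;
        }

        double est_us_per_iter_;
    };

    // register the type as an executor-parameters object
    namespace hpx { namespace parallel { namespace execution {
        template <>
        struct is_executor_parameters<frame_chunk_size> : std::true_type {};
    }}}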
<Yorlik> Alright - I'll study that. Probably what HPX gives me here is already good enough for us.
<Yorlik> Thanks ! :)
<Yorlik> algorithm_result is a future I assume?
<Yorlik> Seems like the for_each is finished when all chunks are finished, right?
<hkaiser> Yorlik: right
<hkaiser> algorithm_result is a future for async algorithms
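In other words, with a task policy the call returns immediately and the future it hands back becomes ready once all chunks have finished (sketch, reusing the placeholder data and lambda from above):

    auto fut = hpx::parallel::for_each(
        hpx::parallel::execution::par(hpx::parallel::execution::task),
        values.begin(), values.end(),
        [](double& v) { v *= 1.01; });

    // ... overlap other per-frame work here ...

    fut.wait();   // all chunks of this loop are done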