#ste||ar on 2020-02-17 — irc logs at irclog.cct.lsu.edu

2019-12-03 02:04 hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:11 weilewei has quit [Ping timeout: 260 seconds]

00:22 <hkaiser> jaafar: pls feel free to merge wave to master or let me know if you want me to do it

00:55 <jaafar> thanks hkaiser :)

00:55 <jaafar> waiting for appveyor

01:29 hkaiser has quit [Quit: bye]

01:55 diehlpk has joined #ste||ar

02:58 <Yorlik> This:

02:58 <Yorlik> mfut4.then( [&]( ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );

02:58 <Yorlik> gives me this error:

02:58 <Yorlik> error C2672: 'hpx::lcos::future<bool>::then': no matching overloaded function found.

02:58 <Yorlik> What's wrong with this?

02:58 <Yorlik> Couldn't I just give any lambda to then() ?

03:36 diehlpk has quit [Ping timeout: 240 seconds]

05:38 <jaafar> Yorlik: does that future have a value associated with it?

05:38 <Yorlik> Yes

05:39 <jaafar> that would be my first guess, in which case [&](auto){...} might work

05:39 <Yorlik> I found a workaround, but still would like to use the .then construct

05:39 <Yorlik> I'll try - thanks !

05:39 <jaafar> maybe just accept (but discard) an argument

05:42 <Yorlik> Now i'm getting a runtime error: access violation

05:44 <Yorlik> OK - this finally worked:

05:44 <Yorlik> mfut4.then( [&]( decltype(mfut4) mfut4 ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );

05:44 <Yorlik> ---

05:44 <Yorlik> copied ...

05:44 <Yorlik> :(

05:45 <Yorlik> OK - this also works:

05:45 <Yorlik> mfut4.then( [&]( decltype(mfut4)& mfut4 ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );

05:45 <Yorlik> So - seems I'm good now :D

05:46 <Yorlik> Thanks jaafar

05:50 <tarzeau> simbergm: finally got hpx 1.4.1-1 into debian new queue: https://ftp-master.debian.org/new.html awaiting ftp-master copyright review

07:17 rori has joined #ste||ar

08:12 <Yorlik> Is it possible to configure counter display from the ini file?

08:42 <Yorlik> Interesting - lowering loglevel down to 1 gave another nice fps boost ( was 4 )

08:42 <Yorlik> Being at around 17/18 ms / frame now

08:56 <Yorlik> OK - it starts getting interesting ...

09:18 <Yorlik> I removed the throttle from my frame time and am just doing a single memcpy

09:18 <Yorlik> of my data to a temporary buffer variable on the stack.

09:18 <Yorlik> memcpy( (void*)&buffer, (void*)e, sizeof( e_type ) ); // e is a data*

09:18 <Yorlik> So I can be sure all data gets moved through memory:

09:18 <Yorlik> I am now getting frametimes of 5-10 ms, which results in a

09:18 <Yorlik> memory bandwidth of ~ 12-18 GB/sec (110.4 mb /6..9ms).

09:18 <Yorlik> I had to fix a bunch of issues on my side, mostly newbie problems and also

09:18 <Yorlik> get gid of the extensive logging.

09:18 <Yorlik> Interestingly enough the counters still show the HPX threads being pretty idle,

09:18 <Yorlik> probably there is more room for improvements:

09:18 <Yorlik> ... total}/count/cumulative,8,71.463037,[s],1.12685e+06

09:18 <Yorlik> ... total}/idle-rate,8,71.464059,[s],2105,[0.01%]

09:18 <Yorlik> ... total}/time/average,8,71.464075,[s],200067,[ns]

09:18 <Yorlik> At least the result is no longer horrible, if I didn't mess up the measuring. :)

09:20 K-ballo has quit [Quit: K-ballo]

09:20 K-ballo has joined #ste||ar

10:46 nikunj has joined #ste||ar

11:07 <nikunj> simbergm, yt?

11:23 nikunj has quit [Ping timeout: 260 seconds]

11:43 nikunj has joined #ste||ar

11:52 <heller1> Yorlik: the how long is the duration of your test?

11:59 <Yorlik> Not very long - I'm eyeballing it, letting it run for a couple of mionutes. It's not like a fully scientific analysis

12:00 <Yorlik> Also I still have a bug in my variance. I need to go ver it and make it correct, but the mean is precise - never had any issues with it. e.g. when I was padding a frame the times were exactly as expected.

12:01 <Yorlik> When I made tests with the heap there were no signs of any leaks or something at least.

12:02 <Yorlik> What still makes me wonder is, why the counters show so much idle time. Something is still strange.,

12:02 <Yorlik> heller1 ^^

12:03 <heller1> Yorlik: yeah ... they probably also include startup and shutdown, which adds some significant idle time

12:04 <Yorlik> And the updater doesn't do much

12:04 <Yorlik> How could I increase the number of loops in a parloop task?

12:04 <Yorlik> Or tweak the task time

12:04 <Yorlik> ?

12:06 <heller1> increase the number of loops?

12:06 <heller1> just increase the iteration space?

12:06 <heller1> I think your test looks good so far

12:06 <Yorlik> Yes - I'd like to make the chinks of work larger

12:06 <Yorlik> chunks

12:09 <Yorlik> And : Can I add the performance counters to the ini? I hate doing it in the command line all the time

12:09 <Yorlik> I failed trying to figure that out

12:29 <simbergm> Yorlik: not sure about the ini file but you can pass command line parameters in the config vector to `hpx::init/start`

12:29 <simbergm> looking for an example...

12:30 <simbergm> that would be hardcoded in the application of course but at least saves you adding it manually all the time for testing purposes

12:31 <Yorlik> I could make it a custom ini setting in my own ini then.,

12:31 <Yorlik> I am using boost program optiuons too.

12:55 parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

12:58 parsa has joined #ste||ar

13:00 hkaiser has joined #ste||ar

13:09 <hkaiser> simbergm: g'morning

13:10 <simbergm> hkaiser: hey!

13:10 <hkaiser> hey

13:11 <hkaiser> you have been busy!

13:11 <hkaiser> quick q: what is the state of the fedora fix in the base image?

13:11 <simbergm> :P

13:11 <hkaiser> or ubuntu

13:11 <simbergm> I'm good at looking busy

13:12 <hkaiser> ;-)

13:12 <simbergm> hkaiser: I merged the potential fix to the image but it might not have propagated into the hpx image yet

13:13 <simbergm> I merged some things earlier today so one of those should hopefully update it...

13:13 <hkaiser> ok, master is still broken on github

13:13 <hkaiser> ok, let's see - thanks!

13:15 <simbergm> hmm, hpx or phylanx master?

13:15 <hkaiser> hpx

13:15 <simbergm> "last update 3 hours ago" (https://hub.docker.com/r/stellargroup/hpx/tags)

13:15 <hkaiser> might not have propagated yet, let's wait

13:15 <simbergm> link please?

13:16 <hkaiser> btw, wrt #4380: it's probably my fault, I'll look

13:17 <hkaiser> I most likely used the wrong base constructor for the future shared state

13:17 <hkaiser> simbergm: I copied things from malke_future_ready, which is most likely the reason for the future to be ready to begin with

13:18 <hkaiser> I think removing the inplace{} argument will do the right thing, but I'll get back to you

13:18 <simbergm> hkaiser: no worries, I figured that was the reason, but couldn't find an example of how to make it not ready

13:19 <simbergm> I can give that a quick try as well

13:19 <simbergm> thanks for helping out with this!

13:19 <hkaiser> sure, any time - I want to look busy myself ;-)

13:22 <Yorlik> Fancy image: https://imgur.com/a/Yvi3z4R

13:22 <hkaiser> Yorlik: nice!

13:23 <Yorlik> hkaiser: How could I increase the size of the parloop chunks?

13:23 <Yorlik> I believe there still is overhead, since the HPX threads idle

13:23 <hkaiser> there is the static_chunk_size execution parameter

13:24 <hkaiser> Yorlik: I'll create an example for you

13:24 <Yorlik> Thanks !

13:26 <Yorlik> That's my current loop:

13:26 <Yorlik> hpx::parallel::for_loop( hpx::parallel::execution::par( hpx::parallel::execution::task ), 0, m_e_type::maxindex, &update_entity<I> )

13:27 <Yorlik> I also wonder if it wouldmake sense to tweak the parameters in the ini file

13:27 <Yorlik> Like max_idle_loop_count and the like. OFC this workload is super artificial and a real scenario would requzire re-tweaking.

13:37 <simbergm> btw, hkaiser, when you have time feedback on https://github.com/STEllAR-GROUP/hpx/pull/4270/files#r380167118 would be appreciated

13:38 <hkaiser> right

13:38 <hkaiser> will get back to you on that one

13:39 <simbergm> thanks!

13:45 <Yorlik> simbergm: I just realized I can easily add these command line switches in Visual Studio in a json file (launch.vs.json)

13:46 <Yorlik> OFC making batch files manually also is an option, e.g. on Linux.

13:48 <Yorlik> hkaiser: I think my worker tasks are a bit too short for full efficiency:

13:48 <Yorlik> worker-thread#0}/time/average,21,200.333931,[s],192618,[ns]

13:48 <Yorlik> worker-thread#1}/time/average,21,200.333935,[s],156592,[ns]

13:48 <Yorlik> worker-thread#3}/time/average,21,200.317100,[s],151580,[ns]

13:48 <Yorlik> worker-thread#2}/time/average,21,200.333939,[s],156918,[ns]

13:48 <hkaiser> yah

13:49 <Yorlik> I am thinking about writing an internal monito later which pulls the counters and tweaks the loop while the system is running.

13:51 <simbergm> Yorlik: your tasks are about 150-200 us long which is very much on the short side for our schedulers (depends on the number of worker threads though)

13:51 <Yorlik> The point is, I always have enough work.

13:51 <simbergm> we usually recommend at least a ms as a conservative number since that usually gives you almost perfect efficiency with our schedulers

13:51 <Yorlik> Because once a frame is done, the next immediately starts

13:52 <simbergm> par(task).with(static_chunk_size(N)) to set the chunk size

13:52 <simbergm> modulo namespaces

13:52 <Yorlik> I don't understand how the loops can starve at all.

13:52 <simbergm> task overheads

13:53 <Yorlik> Just append to this: hpx::parallel::for_loop( hpx::parallel::execution::par( hpx::parallel::execution::task ), 0, m_e_type::maxindex, &update_entity<I> )

13:53 <simbergm> most likely

13:53 <Yorlik> I'll give it a shot

13:53 <hkaiser> Yorlik: why par(task) btw - you're waiting for it to finish anyways

13:53 <simbergm> yep, exactly

13:54 <Yorlik> Could I give it a target task length instead of a static chunk size?

13:54 <Yorlik> Like 500us or 1ms

13:55 <hkaiser> Yorlik: you can if you use .with(auto_chunk_size(chrono::milliseconds(1))) or similar

13:55 <hkaiser> but this will spend 1% of the iterations to measure how long it takes

13:58 <Yorlik> I'll check both out

13:59 <Yorlik> where do I append the "with"? after the policy?

14:00 <hkaiser> par.with()

14:01 <hkaiser> or par(task).with()

14:02 K-ballo has quit [Quit: K-ballo]

14:06 mdiers_ has joined #ste||ar

14:10 K-ballo has joined #ste||ar

14:12 <Yorlik> Like this?

14:12 <Yorlik> futures.push_back( // Collect futures from

14:12 <Yorlik> .with( auto_chunk_size( 1ms ) ), //

14:12 <Yorlik> hpx::parallel::for_loop( // ParLoop with:

14:12 <Yorlik> hpx::parallel::execution::task ) //

14:12 <Yorlik> hpx::parallel::execution::par( // Execution Policy:

14:12 <Yorlik> 0, // starting at:

14:12 <Yorlik> m_e_type::maxindex, // ending at:

14:12 <Yorlik> &update_entity<I> // calling:

14:12 <Yorlik> ) ); // end loop() // end push_back

14:12 <Yorlik> // error C3861: 'auto_chunk_size': identifier not found

14:12 <Yorlik> hkaiser: ^^

14:13 <Yorlik> Do i need to compile HPX to support this?

14:14 <Yorlik> OK - found out it's a header ...

14:21 <Yorlik> Now I'm confused: https://stellar-group.github.io/hpx-docs/latest/html/libs/execution/api.html?highlight=auto_chunk_size#include-hpx-execution-executors-auto-chunk-size-hpp

14:21 <Yorlik> Seems not to work for me

14:21 <Yorlik> Did this change since 1.4.0?

14:21 <Yorlik> ;y system (1.4.0) has hpx/parallel/executors/auto_chunk_size.hpp

14:22 <Yorlik> But it still tells me: error C3861: 'auto_chunk_size': identifier not found

14:25 <zao> Don't you need to qualify it?

14:26 <Yorlik> OK .. using hpx::parallel::execution::auto_chunk_size ...

14:26 <Yorlik> And it starts running again ... :)

14:28 nikunj97 has joined #ste||ar

14:28 <Yorlik> Still:

14:29 <Yorlik> ../total}/count/cumulative,9,80.100221,[s],331798

14:29 <Yorlik> ../total}/time/average,9,80.097535,[s],722743,[ns]

14:29 <Yorlik> ../total}/idle-rate,9,80.100236,[s],2470,[0.01%]

14:29 <Yorlik> I wonder if the low idle rate comes from my main thread waiting for the futures.

14:29 <Yorlik> And it's an artifact

14:29 <Yorlik> ??

14:30 <Yorlik> Can I remove a specific task from the counter statistics?

14:36 <hkaiser> Yorlik: not really

14:36 <hkaiser> idle-rate is not too bad now

14:36 <hkaiser> 24% is ok-ish

14:36 <Yorlik> How exactly to I have to read the output of it?

14:37 <hkaiser> what output?

14:37 <Yorlik> This: ../total}/idle-rate,9,80.100236,[s],2470,[0.01%]

14:37 <Yorlik> Its 5 fields

14:37 <Yorlik> ../total}/idle-rate, -->9, --> 80.100236, --> [s], --> 2470, --> [0.01%]

14:37 <Yorlik> ?????

14:38 <Yorlik> I didn't find any info how to read the counter outputs

14:38 <hkaiser> the docs say: "These lines have 6 fields, the counter name, the sequence number of the counter invocation, the time stamp at which this information has been sampled, the unit of measure for the time stamp, the actual counter value, and an optional unit of measure for the counter value."

14:38 <Yorlik> Woops?

14:38 <Yorlik> FFSRTFM

14:39 <Yorlik> IC

14:40 <Yorlik> So its sequence 9, at 80.. seconds, and the value is 2470*0.01 = 24.70 %

14:41 <hkaiser> yes

14:41 <Yorlik> OK - overlooked it in the Docs ... thanks for helping out - I need food now

14:42 <hkaiser> any time

14:42 <Yorlik> BBL :D

14:42 hkaiser has quit [Quit: bye]

14:43 mdiers_ has quit [Quit: mdiers_]

14:44 mdiers_ has joined #ste||ar

14:54 <Yorlik> So - it's not much difference with 2,3 or 4 OS Threads, more gets inefficient and less also - in between only the idle rate changes, so I think I'm prtobably memory bound now, or is that a wrong conclusion?

14:54 <Yorlik> However - I'll read up after meal ... BBL

14:54 nikunj97 has quit [Ping timeout: 268 seconds]

14:57 nikunj97 has joined #ste||ar

15:26 mdiers_ has quit [Remote host closed the connection]

15:35 mdiers_ has joined #ste||ar

15:39 <diehlpk_work> simbergm, Will there be another hpx 1.4.1 rc?

15:39 <diehlpk_work> Or when do we anticipate the final release?

16:01 hkaiser has joined #ste||ar

16:17 nikunj97 has quit [Ping timeout: 272 seconds]

16:26 nikunj97 has joined #ste||ar

16:32 <hkaiser> simbergm: : #4380 should be fine now

16:57 <simbergm> hkaiser: excellent, thanks! I'll give it a try

16:57 <simbergm> diehlpk_work: assuming I don't screw anything up again there won't be another rc and I'll do the release on wednesday

17:02 <hkaiser> simbergm: parsa repported that installing HPX doesn't work in Release if a PREFIX_PATH was specified

17:02 <hkaiser> works fine in Debug

17:02 <simbergm> mmh, `CMAKE_PREFIX_PATH`?

17:03 <hkaiser> yes, that's what I meant

17:03 <simbergm> or `CMAKE_INSTALL_PREFIX`?

17:03 <hkaiser> sec

17:03 <simbergm> can you ask him to open an issue?

17:03 <hkaiser> parsa?

17:03 <simbergm> hkaiser: ^

17:03 <hkaiser> I asked him already, he will do that shortly, I hope

17:07 <simbergm> hkaiser: thanks

17:42 RostamLog has joined #ste||ar

17:54 <hkaiser> simbergm: #4392

18:02 <Yorlik> When launching an action with parameters on a local id_type - does HPX automagically skip the serialization of the parameters?

18:05 rori has quit [Ping timeout: 246 seconds]

18:07 kordejong has quit [Ping timeout: 240 seconds]

18:07 simbergm has quit [Ping timeout: 240 seconds]

18:07 heller1 has quit [Ping timeout: 256 seconds]

18:21 <hkaiser> Yorlik: yes

18:34 kordejong has joined #ste||ar

19:40 heller1 has joined #ste||ar

19:43 <hkaiser> Yorlik: I modified auto_chunk_size to allow for specifying the number of iterations to use for measurement, see #4395

19:46 kordejong has quit [Quit: killed]

19:46 heller1 has quit [Quit: killed]

20:19 kordejong has joined #ste||ar

20:20 nikunj97 has quit [Ping timeout: 260 seconds]

20:30 hkaiser has quit [Quit: bye]

20:35 parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

20:37 parsa has joined #ste||ar

21:43 hkaiser has joined #ste||ar

23:24 hkaiser has quit [Ping timeout: 240 seconds]