hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/'
weilewei has quit [Ping timeout: 260 seconds]
<hkaiser> jaafar: pls feel free to merge wave to master or let me know if you want me to do it
<jaafar> thanks hkaiser :)
<jaafar> waiting for appveyor
hkaiser has quit [Quit: bye]
diehlpk has joined #ste||ar
<Yorlik> This:
<Yorlik> mfut4.then( [&]( ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );
<Yorlik> gives me this error:
<Yorlik> error C2672: 'hpx::lcos::future<bool>::then': no matching overloaded function found.
<Yorlik> What's wrong with this?
<Yorlik> Couldn't I just give any lambda to then() ?
diehlpk has quit [Ping timeout: 240 seconds]
<jaafar> Yorlik: does that future have a value associated with it?
<Yorlik> Yes
<jaafar> that would be my first guess, in which case [&](auto){...} might work
<Yorlik> I found a workaround, but still would like to use the .then construct
<Yorlik> I'll try - thanks !
<jaafar> maybe just accept (but discard) an argument
<Yorlik> Now I'm getting a runtime error: access violation
<Yorlik> OK - this finally worked:
<Yorlik> mfut4.then( [&]( decltype(mfut4) mfut4 ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );
<Yorlik> ---
<Yorlik> copied ...
<Yorlik> :(
<Yorlik> OK - this also works:
<Yorlik> mfut4.then( [&]( decltype(mfut4)& mfut4 ) { std::cout << "mfut4: " << mfut4.get( ) << std::endl; } );
<Yorlik> So - seems I'm good now :D
<Yorlik> Thanks jaafar
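For reference, a minimal self-contained sketch of the pattern that resolved this (assuming HPX 1.4-style headers): the continuation handed to .then() receives the future itself as its argument, so an argument-less lambda has no matching overload.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/lcos.hpp>
    #include <iostream>

    int main()
    {
        hpx::future<bool> mfut = hpx::make_ready_future(true);

        // .then() invokes the continuation with the (moved) future as its
        // argument - take it by value or rvalue reference instead of
        // capturing the outer future by reference
        auto cont = mfut.then(
            [](hpx::future<bool>&& f) {
                std::cout << "mfut: " << f.get() << std::endl;
            });

        cont.get();
        return 0;
    }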
<tarzeau> simbergm: finally got hpx 1.4.1-1 into debian new queue: https://ftp-master.debian.org/new.html awaiting ftp-master copyright review
rori has joined #ste||ar
<Yorlik> Is it possible to configure counter display from the ini file?
<Yorlik> Interesting - lowering the log level to 1 gave another nice fps boost (was 4)
<Yorlik> I'm at around 17-18 ms/frame now
<Yorlik> OK - it starts getting interesting ...
<Yorlik> I removed the throttle from my frame time and am just doing a single memcpy
<Yorlik> of my data to a temporary buffer variable on the stack.
<Yorlik> memcpy( (void*)&buffer, (void*)e, sizeof( e_type ) ); // e is a data*
<Yorlik> So I can be sure all data gets moved through memory:
<Yorlik> I am now getting frame times of 5-10 ms, which results in a
<Yorlik> memory bandwidth of ~12-18 GB/s (110.4 MB / 6-9 ms).
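(Sanity check on that figure: 110.4 MB per frame / 9 ms ≈ 12.3 GB/s, and 110.4 MB / 6 ms ≈ 18.4 GB/s, consistent with the quoted ~12-18 GB/s range.)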
<Yorlik> I had to fix a bunch of issues on my side, mostly newbie problems and also
<Yorlik> get rid of the extensive logging.
<Yorlik> Interestingly enough the counters still show the HPX threads being pretty idle,
<Yorlik> so there is probably more room for improvement:
<Yorlik> ... total}/count/cumulative,8,71.463037,[s],1.12685e+06
<Yorlik> ... total}/idle-rate,8,71.464059,[s],2105,[0.01%]
<Yorlik> ... total}/time/average,8,71.464075,[s],200067,[ns]
<Yorlik> At least the result is no longer horrible, if I didn't mess up the measuring. :)
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
nikunj has joined #ste||ar
<nikunj> simbergm, yt?
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
<heller1> Yorlik: how long is the duration of your test?
<Yorlik> Not very long - I'm eyeballing it, letting it run for a couple of minutes. It's not a fully scientific analysis.
<Yorlik> Also I still have a bug in my variance. I need to go over it and make it correct, but the mean is precise - never had any issues with it, e.g. when I was padding a frame the times were exactly as expected.
<Yorlik> When I made tests with the heap there were no signs of any leaks or something at least.
<Yorlik> What still makes me wonder is why the counters show so much idle time. Something is still strange.
<Yorlik> heller1 ^^
<heller1> Yorlik: yeah ... they probably also include startup and shutdown, which adds some significant idle time
<Yorlik> And the updater doesn't do much
<Yorlik> How could I increase the number of loops in a parloop task?
<Yorlik> Or tweak the task time
<Yorlik> ?
<heller1> increase the number of loops?
<heller1> just increase the iteration space?
<heller1> I think your test looks good so far
<Yorlik> Yes - I'd like to make the chunks of work larger
<Yorlik> And : Can I add the performance counters to the ini? I hate doing it in the command line all the time
<Yorlik> I failed trying to figure that out
<simbergm> Yorlik: not sure about the ini file but you can pass command line parameters in the config vector to `hpx::init/start`
<simbergm> looking for an example...
<simbergm> that would be hardcoded in the application of course but at least saves you adding it manually all the time for testing purposes
<Yorlik> I could make it a custom ini setting in my own ini then.
<Yorlik> I am using boost program options too.
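A hedged sketch of what simbergm suggests - hpx::init has overloads taking a vector of extra ini-style configuration settings, which saves retyping options on every run ("hpx.os_threads=4" is just an example entry; whether the counter-printing options specifically can be expressed this way is not settled in this log):

    #include <hpx/hpx_init.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    int hpx_main(int argc, char* argv[])
    {
        std::cout << "running on " << hpx::get_os_thread_count()
                  << " OS threads" << std::endl;
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // ini-style settings baked into the application instead of
        // being passed on the command line each time
        std::vector<std::string> const cfg = {
            "hpx.os_threads=4"    // example setting only
        };
        return hpx::init(argc, argv, cfg);
    }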
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> simbergm: g'morning
<simbergm> hkaiser: hey!
<hkaiser> hey
<hkaiser> you have been busy!
<hkaiser> quick q: what is the state of the fedora fix in the base image?
<simbergm> :P
<hkaiser> or ubuntu
<simbergm> I'm good at looking busy
<hkaiser> ;-)
<simbergm> hkaiser: I merged the potential fix to the image but it might not have propagated into the hpx image yet
<simbergm> I merged some things earlier today so one of those should hopefully update it...
<hkaiser> ok, master is still broken on github
<hkaiser> ok, let's see - thanks!
<simbergm> hmm, hpx or phylanx master?
<hkaiser> hpx
<simbergm> "last update 3 hours ago" (https://hub.docker.com/r/stellargroup/hpx/tags)
<hkaiser> might not have propagated yet, let's wait
<simbergm> link please?
<hkaiser> btw, wrt #4380: it's probably my fault, I'll look
<hkaiser> I most likely used the wrong base constructor for the future shared state
<hkaiser> simbergm: I copied things from make_ready_future, which is most likely the reason for the future to be ready to begin with
<hkaiser> I think removing the inplace{} argument will do the right thing, but I'll get back to you
<simbergm> hkaiser: no worries, I figured that was the reason, but couldn't find an example of how to make it not ready
<simbergm> I can give that a quick try as well
<simbergm> thanks for helping out with this!
<hkaiser> sure, any time - I want to look busy myself ;-)
<Yorlik> Fancy image: https://imgur.com/a/Yvi3z4R
<hkaiser> Yorlik: nice!
<Yorlik> hkaiser: How could I increase the size of the parloop chunks?
<Yorlik> I believe there still is overhead, since the HPX threads idle
<hkaiser> there is the static_chunk_size execution parameter
<hkaiser> Yorlik: I'll create an example for you
<Yorlik> Thanks !
<Yorlik> That's my current loop:
<Yorlik> hpx::parallel::for_loop( hpx::parallel::execution::par( hpx::parallel::execution::task ), 0, m_e_type::maxindex, &update_entity<I> )
<Yorlik> I also wonder if it would make sense to tweak the parameters in the ini file
<Yorlik> Like max_idle_loop_count and the like. OFC this workload is super artificial and a real scenario would require re-tweaking.
<simbergm> btw, hkaiser, when you have time feedback on https://github.com/STEllAR-GROUP/hpx/pull/4270/files#r380167118 would be appreciated
<hkaiser> right
<hkaiser> will get back to you on that one
<simbergm> thanks!
<Yorlik> simbergm: I just realized I can easily add these command line switches in Visual Studio in a json file (launch.vs.json)
<Yorlik> OFC making batch files manually also is an option, e.g. on Linux.
<Yorlik> hkaiser: I think my worker tasks are a bit too short for full efficiency:
<Yorlik> worker-thread#0}/time/average,21,200.333931,[s],192618,[ns]
<Yorlik> worker-thread#1}/time/average,21,200.333935,[s],156592,[ns]
<Yorlik> worker-thread#3}/time/average,21,200.317100,[s],151580,[ns]
<Yorlik> worker-thread#2}/time/average,21,200.333939,[s],156918,[ns]
<hkaiser> yah
<Yorlik> I am thinking about writing an internal monitor later which pulls the counters and tweaks the loop while the system is running.
<simbergm> Yorlik: your tasks are about 150-200 us long which is very much on the short side for our schedulers (depends on the number of worker threads though)
<Yorlik> The point is, I always have enough work.
<simbergm> we usually recommend at least a ms as a conservative number since that usually gives you almost perfect efficiency with our schedulers
<Yorlik> Because once a frame is done, the next immediately starts
<simbergm> par(task).with(static_chunk_size(N)) to set the chunk size
<simbergm> modulo namespaces
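Spelled out with namespaces, a minimal sketch against HPX 1.4 (the data vector and the chunk size of 1024 are arbitrary example choices):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/parallel_for_loop.hpp>
    #include <hpx/include/parallel_executor_parameters.hpp>
    #include <cstddef>
    #include <vector>

    int main()
    {
        namespace ex = hpx::parallel::execution;
        std::vector<double> data(1000000, 1.0);

        // fix the chunk size at 1024 iterations per task so each task is
        // long enough to amortize the scheduling overhead
        hpx::parallel::for_loop(
            ex::par.with(ex::static_chunk_size(1024)),
            std::size_t(0), data.size(),
            [&](std::size_t i) { data[i] *= 2.0; });

        return 0;
    }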
<Yorlik> I don't understand how the loops can starve at all.
<simbergm> task overheads
<Yorlik> Just append to this: hpx::parallel::for_loop( hpx::parallel::execution::par( hpx::parallel::execution::task ), 0, m_e_type::maxindex, &update_entity<I> )
<simbergm> most likely
<Yorlik> I'll give it a shot
<hkaiser> Yorlik: why par(task) btw - you're waiting for it to finish anyways
<simbergm> yep, exactly
<Yorlik> Could I give it a target task length instead of a static chunk size?
<Yorlik> Like 500us or 1ms
<hkaiser> Yorlik: you can if you use .with(auto_chunk_size(chrono::milliseconds(1))) or similar
<hkaiser> but this will spend 1% of the iterations to measure how long it takes
<Yorlik> I'll check both out
<Yorlik> where do I append the "with"? after the policy?
<hkaiser> par.with()
<hkaiser> or par(task).with()
K-ballo has quit [Quit: K-ballo]
mdiers_ has joined #ste||ar
K-ballo has joined #ste||ar
<Yorlik> Like this?
<Yorlik> futures.push_back(                              // Collect futures from
<Yorlik>   hpx::parallel::for_loop(                      // ParLoop with:
<Yorlik>     hpx::parallel::execution::par(              // Execution Policy:
<Yorlik>       hpx::parallel::execution::task )          //
<Yorlik>       .with( auto_chunk_size( 1ms ) ),          //
<Yorlik>     0,                                          // starting at:
<Yorlik>     m_e_type::maxindex,                         // ending at:
<Yorlik>     &update_entity<I>                           // calling:
<Yorlik> ) );                                            // end loop() // end push_back
<Yorlik> // error C3861: 'auto_chunk_size': identifier not found
<Yorlik> hkaiser: ^^
<Yorlik> Do i need to compile HPX to support this?
<Yorlik> OK - found out it's a header ...
<Yorlik> Seems not to work for me
<Yorlik> Did this change since 1.4.0?
<Yorlik> My system (1.4.0) has hpx/parallel/executors/auto_chunk_size.hpp
<Yorlik> But it still tells me: error C3861: 'auto_chunk_size': identifier not found
<zao> Don't you need to qualify it?
<Yorlik> OK .. using hpx::parallel::execution::auto_chunk_size ...
<Yorlik> And it starts running again ... :)
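For reference, the fully qualified form that ended up compiling - a fragment, assuming Yorlik's surrounding declarations (futures, m_e_type, update_entity<I>) and the auto_chunk_size header mentioned above:

    // requires <hpx/include/parallel_for_loop.hpp> and
    // <hpx/parallel/executors/auto_chunk_size.hpp>
    using namespace std::chrono_literals;
    namespace ex = hpx::parallel::execution;

    futures.push_back(
        hpx::parallel::for_loop(
            ex::par(ex::task).with(ex::auto_chunk_size(1ms)),
            0, m_e_type::maxindex,
            &update_entity<I>));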
nikunj97 has joined #ste||ar
<Yorlik> Still:
<Yorlik> ../total}/count/cumulative,9,80.100221,[s],331798
<Yorlik> ../total}/time/average,9,80.097535,[s],722743,[ns]
<Yorlik> ../total}/idle-rate,9,80.100236,[s],2470,[0.01%]
<Yorlik> I wonder if the low idle rate comes from my main thread waiting for the futures.
<Yorlik> And it's an artifact
<Yorlik> ??
<Yorlik> Can I remove a specific task from the counter statistics?
<hkaiser> Yorlik: not really
<hkaiser> idle-rate is not too bad now
<hkaiser> 24% is ok-ish
<Yorlik> How exactly do I have to read the output of it?
<hkaiser> what output?
<Yorlik> This: ../total}/idle-rate,9,80.100236,[s],2470,[0.01%]
<Yorlik> It's 6 fields
<Yorlik> ../total}/idle-rate, -->9, --> 80.100236, --> [s], --> 2470, --> [0.01%]
<Yorlik> ?????
<Yorlik> I didn't find any info how to read the counter outputs
<hkaiser> the docs say: "These lines have 6 fields, the counter name, the sequence number of the counter invocation, the time stamp at which this information has been sampled, the unit of measure for the time stamp, the actual counter value, and an optional unit of measure for the counter value."
<Yorlik> Woops?
<Yorlik> FFSRTFM
<Yorlik> IC
<Yorlik> So it's sequence 9, at 80.. seconds, and the value is 2470*0.01 = 24.70 %
<hkaiser> yes
<Yorlik> OK - overlooked it in the Docs ... thanks for helping out - I need food now
<hkaiser> any time
<Yorlik> BBL :D
hkaiser has quit [Quit: bye]
mdiers_ has quit [Quit: mdiers_]
mdiers_ has joined #ste||ar
<Yorlik> So - there's not much difference between 2, 3 or 4 OS threads; more gets inefficient and so does less - in between only the idle rate changes. So I think I'm probably memory-bound now, or is that a wrong conclusion?
<Yorlik> However - I'll read up after meal ... BBL
nikunj97 has quit [Ping timeout: 268 seconds]
nikunj97 has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
<diehlpk_work> simbergm, Will there be another hpx 1.4.1 rc?
<diehlpk_work> Or when do we anticipate the final release?
hkaiser has joined #ste||ar
nikunj97 has quit [Ping timeout: 272 seconds]
nikunj97 has joined #ste||ar
<hkaiser> simbergm: : #4380 should be fine now
<simbergm> hkaiser: excellent, thanks! I'll give it a try
<simbergm> diehlpk_work: assuming I don't screw anything up again there won't be another rc and I'll do the release on Wednesday
<hkaiser> simbergm: parsa reported that installing HPX doesn't work in Release if a PREFIX_PATH was specified
<hkaiser> works fine in Debug
<simbergm> mmh, `CMAKE_PREFIX_PATH`?
<hkaiser> yes, that's what I meant
<simbergm> or `CMAKE_INSTALL_PREFIX`?
<hkaiser> sec
<simbergm> can you ask him to open an issue?
<hkaiser> parsa?
<simbergm> hkaiser: ^
<hkaiser> I asked him already, he will do that shortly, I hope
<simbergm> hkaiser: thanks
RostamLog has joined #ste||ar
<hkaiser> simbergm: #4392
<Yorlik> When launching an action with parameters on a local id_type - does HPX automagically skip the serialization of the parameters?
rori has quit [Ping timeout: 246 seconds]
kordejong has quit [Ping timeout: 240 seconds]
simbergm has quit [Ping timeout: 240 seconds]
heller1 has quit [Ping timeout: 256 seconds]
<hkaiser> Yorlik: yes
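A minimal sketch of what that means in practice (square and square_action are hypothetical names invented for illustration):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/actions.hpp>
    #include <hpx/include/async.hpp>

    int square(int x) { return x * x; }
    HPX_PLAIN_ACTION(square, square_action);

    int main()
    {
        // hpx::find_here() names the local locality, so the argument and
        // the result skip the serialization round-trip entirely
        hpx::future<int> f = hpx::async(square_action{}, hpx::find_here(), 7);
        return f.get() == 49 ? 0 : 1;
    }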
kordejong has joined #ste||ar
heller1 has joined #ste||ar
<hkaiser> Yorlik: I modified auto_chunk_size to allow for specifying the number of iterations to use for measurement, see #4395
kordejong has quit [Quit: killed]
heller1 has quit [Quit: killed]
kordejong has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 240 seconds]