hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<hkaiser>
Yorlik: here
<Yorlik>
Would you have a moment to help me understand the profiler results from my server? It seems my memory bandwidth is much smaller than it should be, and it also seems I am spending a ton of time in HPX functions
<Yorlik>
It seems like 54% of the time is in hpx.dll
<hkaiser>
Yorlik: could be, if you don't have enough work
<hkaiser>
look at idle rates
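(For reference: HPX exposes idle rates through its performance-counter framework. A hedged CLI sketch, using a hypothetical binary name `my_app`; the counter path and flags are as documented by HPX, and idle-rate counters require HPX built with `HPX_WITH_THREAD_IDLE_RATES=ON`:

```
# Hypothetical binary name; prints per-worker-thread idle rates at shutdown
./my_app --hpx:threads=4 \
    --hpx:print-counter=/threads{locality#0/worker-thread#*}/idle-rate
```

A per-thread breakdown like this is what reveals the imbalance discussed below, where one core does nearly all the work.)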
<Yorlik>
I am just looping over the objects and loading them.
<Yorlik>
And repeat that over and over again.
<Yorlik>
I think the parallel for call could probably be optimized
<Yorlik>
That's 1 second, should be roughly 10-13 frames
<hkaiser>
10000 threads overall
<hkaiser>
one core is doing almost all of the work
<hkaiser>
core2 has 16% idle rate (which is ok), the rest aren't doing anything
<Yorlik>
That matches my measurements from when I did manual thread id prints, from std::thread and HPX too
<Yorlik>
How can I fix this?
<hkaiser>
shrug
<Yorlik>
I'd like to round robin the cores
<Yorlik>
I mean it's your runtime - you should be able to tell me :D
<hkaiser>
work should get stolen if there is any
<Yorlik>
The result is low framerate
<hkaiser>
Yorlik: sum all the work and compare to execution (wall clock) time
<hkaiser>
that will give you a sense of how much work was done
<hkaiser>
the frame rate is low as everything is done on one core
<hkaiser>
(well, almost everything)
<Yorlik>
What is "work" in your book here? I simply wanted to maximize the loops/sec here.
<hkaiser>
the idle-rate gives you the ratio of thread-execution-time over wall-clock-time
<Yorlik>
So they're lazy for some reason
<Yorlik>
Wall clock is 1000 ms here
<hkaiser>
no, just not enough work
<Yorlik>
What is work?
<hkaiser>
tasks
<Yorlik>
I mean -- loading the objects at 1.5 GB/s is lame !
<hkaiser>
4000 tasks, 4 cores, that's about 1000 tasks per core
<hkaiser>
average length is about 500 microsecs
<hkaiser>
that means that you're using only half of your compute resources
<hkaiser>
you would get the same when running on just 2 cores
<Yorlik>
This test is not so much about computing but memory bandwidth. I don't understand why the bandwidth is so low
<hkaiser>
bandwidth is low because you don't do a lot of work
<Yorlik>
So doing more work would make it faster? That sounds crazy
<Yorlik>
After all I just want it to loop as fast as possible over all the entities
<hkaiser>
I don't know what you're doing - but there is no reason why using hpx should limit your bandwidth
<Yorlik>
The cores are actually at > 90% in resource monitor
<hkaiser>
Yorlik: yah, that is because the hpx scheduler tries to run things constantly, that is a red herring
<Yorlik>
So thats the idle time
<hkaiser>
well, sure - instead of suspending the thread, it keeps running in case new work is created
<hkaiser>
so to the OS it looks as if the thread was 'doing things'
<Yorlik>
Why is it idle and not running the next task of the parallel loop instead?
<hkaiser>
because there is no 'next task' otherwise it would run it
<Yorlik>
I don't understand why the memory bandwidth is so low. I'll do a test with an empty update function
<hkaiser>
Yorlik: is that a release build? or debug?
<Yorlik>
There is not much difference between them
<hkaiser>
I doubt that
<Yorlik>
frametime is about 73-100 ms
<hkaiser>
there is usually a factor of 10 between them
<Yorlik>
I get the same numbers with the update function being empty - I'm definitely burning time elsewhere. So I'm not really measuring memory bandwidth
<Yorlik>
I'll double check the entire call chain
<Yorlik>
OK - my frametime is down to 20-30 ms now and I don't know why. I had just played with some smallish things and I actually undid them, and the time stays low, which kinda corresponds to ~4.0 GB/sec. Now I'm scared
<Yorlik>
Maybe the server shaped up just by me looking at it ... :D
hkaiser has quit [Quit: bye]
<Yorlik>
Arrived at 4.9 GB/sec. Heap profiling "off" helps a lot ... :D
<zao>
:D
nikunj has joined #ste||ar
<Yorlik>
Still - it should be faster, imo :D
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
nikunj97 has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
nikunj97 has quit [Ping timeout: 265 seconds]
nikunj97 has joined #ste||ar
<Yorlik>
Is it possible to cancel a scheduled task? E.g. for a timer application?
<Yorlik>
I mean a task that has yielded.
hkaiser has joined #ste||ar
<Yorlik>
When I'm starting a lambda as hpx async, does the return type of the lambda determine the template parameter of the future?
<Yorlik>
Like hpx::future<lambda_rettype>
<Yorlik>
NVM - figured it out
nikunj97 has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
<hkaiser>
Yorlik: yes
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]