aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
Vir has quit [Read error: Connection reset by peer]
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
simbergm has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
simbergm has joined #ste||ar
Smasher has quit [Remote host closed the connection]
<jbjnr>
probably no different if your test has no other work going on
<jbjnr>
it would only help if the queues were being contended
<heller_>
yeah
<heller_>
so, what I discovered is that those separate task_ and work_ queues cost time
<heller_>
that is, when doing an hpx::async([](){}), it ends up in the task_ queue, only to be placed in the work_ queue once wait_or_add_new is called
<heller_>
for my simple test, this accounts for 30% of the runtime
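A minimal sketch of the staging heller_ is describing - hypothetical code, not the actual HPX thread_queue:

    #include <deque>
    #include <functional>

    // hypothetical sketch, not HPX's real thread_queue: async() stages a
    // task in task_queue_; it only becomes runnable once wait_or_add_new()
    // moves it into work_queue_ - the extra hop measured above
    struct two_stage_queue
    {
        std::deque<std::function<void()>> task_queue_;  // staged task descriptions
        std::deque<std::function<void()>> work_queue_;  // runnable work

        void create_task(std::function<void()> f)
        {
            task_queue_.push_back(std::move(f));
        }

        // called by an idle worker: converts staged tasks into work
        bool wait_or_add_new()
        {
            bool added = !task_queue_.empty();
            while (!task_queue_.empty())
            {
                work_queue_.push_back(std::move(task_queue_.front()));
                task_queue_.pop_front();
            }
            return added;
        }
    };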
<jbjnr>
yes, and this happens every N cycles, so there's always a delay
<jbjnr>
it's on my list of things to look into
<jbjnr>
we have to get the cholesky small block sizes running faster before end of March
<heller_>
ok
<heller_>
I want to have nice numbers for my thesis, essentially 4 weeks ago
<jbjnr>
I'll send you and hk an email about it
<jbjnr>
when do you start new job?
<heller_>
no idea yet
<jbjnr>
have you got one yet?
<heller_>
nope
<jbjnr>
(new job I mean)
<jbjnr>
ah
<jbjnr>
why not?
<heller_>
if I only knew ;)
<jbjnr>
handed in PhD yet?
hkaiser has joined #ste||ar
<heller_>
jbjnr: no
<jbjnr>
<sigh>
<heller_>
stop bothering me
<jbjnr>
I'm not bothering, just curious. that's all.
<heller_>
;)
<jbjnr>
Didn't nag you like your mother or anything
<heller_>
I am running those low level micro benchmarks right now
<jbjnr>
that's hkaiser's job
<heller_>
and well, the numbers just suck
<jbjnr>
hpx::launch::sync might have been a better choice
<jbjnr>
fork was wrong
<jbjnr>
k
<jbjnr>
bbiab
<heller_>
sync doesn't create any new task
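For reference, the difference in question - a sketch assuming a regular HPX program:

    #include <hpx/include/async.hpp>

    void launch_difference()
    {
        // creates a new HPX task that goes through the scheduler queues
        auto f1 = hpx::async([] { return 42; });

        // runs the callable inline on the calling thread: no new task,
        // no queue traffic - hence "sync doesn't create any new task"
        auto f2 = hpx::async(hpx::launch::sync, [] { return 42; });

        f1.get();
        f2.get();
    }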
<hkaiser>
heller_: you're down to micro-optimization :/
<heller_>
yes
<heller_>
hkaiser: I have a problem: I'm describing super fine-grained dependencies and stuff, and then, when running benchmarks, the numbers don't support those claims :(
<hkaiser>
lol
<hkaiser>
there is that
<heller_>
but I found a very nice representation for varying granularity
<hkaiser>
ok?
<heller_>
well, the numbers aren't too bad, I guess
<heller_>
a heat map
<heller_>
still generating numbers, will show you in a second
<hkaiser>
cool, thanks
<hkaiser>
jbjnr: why is running the micro-benchmarks my job?
<jbjnr>
hkaiser: running the benchmarks is not your job - nagging heller to finish his PhD is your job. Sorry, the conversation got mixed up across multiple comments
<hkaiser>
jbjnr: nod, heller_ already said as much
<hkaiser>
jbjnr: thanks for your trust on this
<heller_>
the hot spots are: 1) wait_or_add_new 2) thread_map_.insert
<hkaiser>
heller_: as expected
<heller_>
the first shouldn't even appear for this simple benchmark, IMHO
<hkaiser>
it's being called if queues run empty
<heller_>
which shouldn't be the case, as there is always work
<jbjnr>
tell me what "wait_or_add_new" actually does please, it is on my list of things to look at
<heller_>
it transforms "tasks" into "work"
<heller_>
more or less
<jbjnr>
bit more detail please
<jbjnr>
but thanks
<hkaiser>
heller_: towards the end utilization tapers off
<heller_>
for this specific, simple benchmark, I always have at least 1 task that can be run
<heller_>
what's happening is that we first push the task description into the scheduler queue, then 'get()' suspends; obviously there is no other work to be performed, so this task description needs to be turned into work by wait_or_add_new
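In code, that round trip looks roughly like this (a sketch assuming a plain HPX program):

    #include <hpx/include/async.hpp>

    void round_trip()
    {
        // the task description is pushed into the scheduler queue ...
        auto f = hpx::async([] {});

        // ... get() suspends this HPX thread; with no other work around,
        // the worker calls wait_or_add_new() to turn the staged
        // description into a runnable thread before f becomes ready
        f.get();
    }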
<hkaiser>
heller_: yah, the 'runnow' is used somewhat inconsistently ;)
<hkaiser>
I was planning to look into this at some point
<heller_>
it needs to go ;)
<hkaiser>
I don't think we should always immediately create a thread
<hkaiser>
that can easily overwhelm the system
<heller_>
I think we should always immediately create a thread, but lazily allocate the stack
<hkaiser>
we should use a different criterion than 'gut feeling', though
<hkaiser>
heller_: that's one way
<heller_>
and have a thread local allocator for the stack
<heller_>
so we don't have any contention there
<hkaiser>
stacks are cached anyways
<heller_>
yes
<heller_>
this caching, I think, can be done without locking
<heller_>
just pop off thread-locally when you need it, and push back thread-locally when the task is terminated
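A minimal sketch of that thread-local stack cache (all names hypothetical):

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // hypothetical per-worker stack cache: each worker pops from and
    // pushes to its own free list, so the hot path needs no locks or
    // atomics
    struct stack_cache
    {
        static constexpr std::size_t stack_size = 64 * 1024;
        std::vector<void*> free_list_;

        void* allocate()
        {
            if (!free_list_.empty())
            {
                void* s = free_list_.back();  // fast path: reuse, no contention
                free_list_.pop_back();
                return s;
            }
            return std::malloc(stack_size);   // slow path: fresh allocation
        }

        void deallocate(void* s)              // called when the task terminates
        {
            free_list_.push_back(s);
        }
    };

    thread_local stack_cache local_stacks;    // one cache per worker thread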
<hkaiser>
ok
<jbjnr>
yes
<jbjnr>
who is going to do this? I need it now.
<jbjnr>
but I must work on something else for a while
<heller_>
it's not an easy task, I think
<heller_>
quite involved
<jbjnr>
should be fairly straightforward - for one of us at least.
<jbjnr>
not GSoC anyway :)
<heller_>
the problem is that a lot of those things are very tightly connected ...
<heller_>
thread_data, thread_queue and stack allocation mostly
<hkaiser>
the stack-handling can be done independently
<jbjnr>
the real question would be - can thread_map be removed?
<jbjnr>
is there a better way
<hkaiser>
jbjnr: I don't see how - if you have an idea - great
<heller_>
from what I can see, thread_map is only needed for diagnostics, that is to have a handle on the suspended threads
<heller_>
hkaiser: btw, do you remember why thread_data needs to be refcounted?
<hkaiser>
heller_: suspended threads - yes - but not just diagnostics
<hkaiser>
thread_id is an intrusive pointer
<heller_>
which is refcounted
<hkaiser>
yes
<heller_>
but why?
<hkaiser>
to keep things alive?
<heller_>
but what for?
<hkaiser>
so it does not get deleted prematurely?
<heller_>
shouldn't the lifetime end, once the thread has been terminated?
<hkaiser>
there was a corner case - don't remember
<heller_>
probably a race condition for setting the thread state (or similar) for a terminated thread
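For context, the mechanism in question - a sketch of the intrusive-refcounting pattern, not the actual HPX definitions:

    #include <atomic>
    #include <boost/intrusive_ptr.hpp>

    // sketch: thread_data carries its own reference count, and the thread
    // id type is an intrusive_ptr to it, so every outstanding handle
    // keeps the thread object alive - even past termination
    struct thread_data
    {
        std::atomic<long> count_{0};
        // ... thread state, stack, continuation, etc.
    };

    inline void intrusive_ptr_add_ref(thread_data* p)
    {
        p->count_.fetch_add(1, std::memory_order_relaxed);
    }

    inline void intrusive_ptr_release(thread_data* p)
    {
        // this fetch_sub is the kind of traffic showing up in the profiles
        if (p->count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;
    }

    using thread_id_type = boost::intrusive_ptr<thread_data>;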
<jbjnr>
once a thread completes, it should be set to terminated by the thread it is running on - a race between whom?
<hkaiser>
could be
<hkaiser>
others might still hold a handle to it
<heller_>
I can't think of a use case where you need 'thread_data *' to be valid after the thread has been terminated that's not a bug
<heller_>
what for?
<hkaiser>
shrug
<hkaiser>
as I said, I forgot
<heller_>
too bad ...
<jbjnr>
try to remember before I start cleaning it up
<heller_>
the refcounting is something that appears in the profiles as well :/
<hkaiser>
switch to a raw pointer and see what happens...
<hkaiser>
heller_: yah, probably in the lower 2%
<heller_>
nope
<hkaiser>
shrug
<heller_>
hpx::threads::intrusive_ptr_release appears as the second entry in vtune's 'bottom-up' view
<hkaiser>
k
<hkaiser>
that might be timing the allocator
<heller_>
half of the time inside wait_or_add_new
<hkaiser>
heller_: if release itself is an issue, why don't you see addref, then?
<heller_>
I see it
<heller_>
6th entry
<heller_>
release: 54 ms, add_ref: 45 ms
<hkaiser>
as I said - if you can improve things - be my guest
<hkaiser>
no hard feelings, really
<jbjnr>
email sent
<heller_>
hkaiser: don't want to waste time just to discover that ref counting is needed after all :/
<hkaiser>
heller_: two changes needed: a) change the typedef, b) delete explicitly on removal from the map
<hkaiser>
shouldn't be much
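Sketched out (with hypothetical names, not the real HPX headers), that experiment would be:

    #include <unordered_set>

    struct thread_data { /* thread state */ };

    // a) change the typedef: raw pointer instead of intrusive_ptr
    using thread_id_type = thread_data*;

    // b) delete explicitly on removal from the map, since no refcount
    //    keeps the object alive any more
    void remove_thread(std::unordered_set<thread_id_type>& thread_map,
        thread_id_type id)
    {
        thread_map.erase(id);
        delete id;  // previously dropped by the last intrusive_ptr_release
    }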
<hkaiser>
jbjnr: why should I go to that workshop ?
<heller_>
hkaiser: and clean up all places which call .get() ;)
<hkaiser>
the compiler will tell you
<jbjnr>
because if you want HPC people to be interested in HPX: lots of big names will be there (probably), and I always say things like "it's good, but this is shit and so is that" - we need someone who really cares about HPX to defend it
eschnett has joined #ste||ar
<hkaiser>
jbjnr: ok, but shouldn't I give the talk, then?
<jbjnr>
no. It has to be an HPC talk
<hkaiser>
ahh, and I'm not an HPC person, I know
<jbjnr>
correct
<hkaiser>
well, have fun, then
<heller_>
jbjnr: I am looking into scheduling
<heller_>
let's see...
<jbjnr>
just talk to your boss and mention that if he's invited this year, then he should bring you as well - or send you in his place
<hkaiser>
no idea why I should do that besides going to Hawaii
<hkaiser>
I hate travelling, especially if it's pointless
<jbjnr>
because it's about networking
<hkaiser>
jbjnr: cool, so it's right up your alley
<jbjnr>
no. I'm a social misfit.
<jbjnr>
networking is not my thing
<hkaiser>
and I'm not an HPC person
<hkaiser>
networking is not my thing either, I barely know how to use a socket
<jbjnr>
no, but you're good at being important
<hkaiser>
lol
<jbjnr>
and you are good at arguing and discussing stuff. Which I'm not. I'm only good at ranting and stuff.
<hkaiser>
jbjnr: when is that workshop?
Vir has joined #ste||ar
<jbjnr>
end of March sometime. Can't remember exactly
<jbjnr>
26-29 ish
<hkaiser>
will not happen, I'm in Japan mid-March, that's more travel than I can handle already
<jbjnr>
ok. You win.
<jbjnr>
what's in Japan? C++ meeting?
<hkaiser>
a workshop at some stupid conference
<hkaiser>
giving a talk there...
<jbjnr>
your standards knowledge would have been good for SOS.
<hkaiser>
next year, perhaps
<jbjnr>
by next year they'll all be using kokkos unless we can stop them at this year's SOS meeting.
<hkaiser>
jbjnr: forget about kokkos
<jbjnr>
can't. Sandia, Oak Ridge, everyone else. They love it
<hkaiser>
it was shotgun-married to the nonsense David Hollman is working on - forgot the name
<hkaiser>
DARMA
<heller_>
kokkos doesn't have suspendable tasks, right?
<heller_>
hkaiser: so, regarding thread_map_, what usage other than diagnostics does it have?
eschnett has quit [Quit: eschnett]
<hkaiser>
heller_: it keeps the thread alive while it's suspended
<heller_>
so we are running in circles ;)
<hkaiser>
if you remove the ref-counting, then the map is the only place that knows when to delete the thread
<heller_>
I tend to think that it should be safe to delete a thread once it is terminated
<hkaiser>
ok
<github>
[hpx] hkaiser deleted fix_stack_overhead at 1aee866: https://git.io/vNC7J
eschnett has joined #ste||ar
<heller_>
hmm, lambda_to_action is still UB
<hkaiser>
it works ;)
<hkaiser>
as long as there are no captures it shouldn't be UB
<heller_>
it is
<hkaiser>
what's UB?
<heller_>
since it is calling a function through a nullptr
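The construct in question, sketched - a hypothetical reconstruction, not the actual lambda_to_action source:

    // a captureless lambda is stateless, so it is tempting to "call" it
    // without ever constructing one, by dereferencing a null pointer to
    // its closure type - but dereferencing a null pointer is undefined
    // behaviour even if operator() never touches any state
    template <typename F>
    void call_through_nullptr()
    {
        F* f = nullptr;
        (*f)();  // UB: the standard gives this no defined meaning
    }

The conforming route for a captureless lambda is its implicit conversion to a plain function pointer, which can be called without any closure object at all.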