aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
Vir has quit [Read error: Connection reset by peer]
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
simbergm has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
simbergm has joined #ste||ar
Smasher has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vNC4D
<github> hpx/gh-pages 6187b44 StellarBot: Updating docs
<heller_> simbergm: great! the PR you merged broke everything but gcc 4.9 ;)
<heller_> please revert, needs a better solution
<simbergm> heller_: yeah, quite impressive :P okay, will revert right away
<simbergm> jbjnr: do you know what's up here: http://cdash.cscs.ch/testDetails.php?test=11594145&build=72817
<jbjnr> no idea. invalid user id? I wonder if the disk space / inode error is reappearing with a new disguise ...
<simbergm> mmh, could be... not sure yet if it's just temporary problems, will see soon
<heller_> jbjnr: you know what I figured out just now?
<jbjnr> That the earth is not flat?
<jbjnr> please tell
<jbjnr> ...
<heller_> hpx::async([](){}).get(); <-- this takes in the order of 3000 cycles
<jbjnr> a lot happens, so I'm not completely surprised
<heller_> while just doing a context switch, takes roughly 300 cycles
<jbjnr> yup
<heller_> something's wrong there
<jbjnr> creating a task and managing all those queues/schedulers etc costs $$$
<heller_> we really should be able to do better here...
<jbjnr> hpx::async(hpx::launch::fork,[](){}).get()
<jbjnr> how much?
<heller_> give me a second
<jbjnr> probably no different if your test has no other work going on
<jbjnr> it would only help if the queues were being contended
<heller_> yeah
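A measurement along these lines can be reproduced with a small self-contained program; the sketch below is illustrative only (the iteration count, std::chrono-based timing instead of raw cycle counters, and the hpx_main boilerplate are choices made here, not heller_'s actual benchmark):

    #include <hpx/hpx_init.hpp>
    #include <hpx/include/async.hpp>

    #include <chrono>
    #include <cstddef>
    #include <iostream>

    template <typename Policy>
    double ns_per_task(Policy policy, std::size_t iterations)
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (std::size_t i = 0; i != iterations; ++i)
        {
            // create an empty task, then suspend until it has run
            hpx::async(policy, []() {}).get();
        }
        auto stop = std::chrono::high_resolution_clock::now();
        return std::chrono::duration<double, std::nano>(stop - start).count() /
            static_cast<double>(iterations);
    }

    int hpx_main(int argc, char* argv[])
    {
        std::size_t const n = 100000;
        std::cout << "launch::async: " << ns_per_task(hpx::launch::async, n)
                  << " ns/task\n";
        std::cout << "launch::fork:  " << ns_per_task(hpx::launch::fork, n)
                  << " ns/task\n";
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);    // runs hpx_main on an HPX worker thread
    }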
<heller_> so, what I discovered, is that those different task_ and work_ queues cost time
<heller_> that is, when doing an hpx::async([](){}), it ends up in the task_ queue, only to be placed in the work_ queue once wait_or_add_new is called
<heller_> for my simple test, this accounts for 30% of the runtime
<jbjnr> yes, and this happens every N cycles, so there's always a delay
<jbjnr> it's on my list of things to look into
<jbjnr> we have to get the Cholesky small block sizes running faster before end of March
<heller_> ok
<heller_> I want to have nice numbers for my thesis, essentially 4 weeks ago
<jbjnr> I'll send you and hk an email about it
<jbjnr> when do you start new job?
<heller_> no idea yet
<jbjnr> have you got one yet?
<heller_> nope
<jbjnr> (new job I mean)
<jbjnr> ah
<jbjnr> why not?
<heller_> if I only knew ;)
<jbjnr> handed in PhD yet?
hkaiser has joined #ste||ar
<heller_> jbjnr: no
<jbjnr> <sigh>
<heller_> stop bothering me
<jbjnr> I'm not bothering, just curious. that's all.
<heller_> ;)
<jbjnr> Didn't nag you like your mother or anything
<heller_> I am running those low level micro benchmarks right now
<jbjnr> that's hkaiser's job
<heller_> and well, the numbers just suck
<jbjnr> hpx::launch::sync might have been a better choice
<jbjnr> fork was wrong
<jbjnr> k
<jbjnr> bbiab
<heller_> sync doesn't create any new task
<hkaiser> heller_: you're down to micro-optimization :/
<heller_> yes
<heller_> hkaiser: I have a problem: I describe super fine-grained dependencies and stuff, and then, when running benchmarks, the numbers don't support those claims :(
<hkaiser> lol
<hkaiser> there is that
<heller_> but I found a very nice representation for varying granularity
<hkaiser> ok?
<heller_> well, the numbers aren't too bad, I guess
<heller_> a heat map
<heller_> still generating numbers, will show you in a second
<hkaiser> cool, thanks
<hkaiser> jbjnr: why is running the micro-benchmarks my job?
<heller_> hkaiser: nagging me is your job
<hkaiser> ahh!
<hkaiser> makes sense
<heller_> hkaiser: your job is also to let us micro optimize ;)
<hkaiser> yah, we've been generating this kind of graph for the parcel coalescing - works well
<heller_> way nicer to have multiple graphs in the same plot
<hkaiser> heller_: it's a way to visualize the 3d graph we talked about
<heller_> yup
<heller_> it's essentially the 3d plot when looking at it "from above"
<hkaiser> nod
<heller_> after thinking a little more about those numbers, they aren't that bad after all
<heller_> they could just be *way* better ;)
<hkaiser> sure
<hkaiser> if you find a way to improve basic overheads - great
<heller_> I did indeed profile a little
<heller_> running on a single core
<jbjnr> hkaiser: running the benchmarks is not your job - nagging heller to finish his PhD is your job. Sorry, conversation got mixed up in multiple comments
<hkaiser> jbjnr: nod, heller_ already said as much
<hkaiser> jbjnr: thanks for your trust on this
<heller_> the hot spots are: 1) wait_or_add_new 2) thread_map_.insert
<hkaiser> heller_: as expected
<heller_> the first shouldn't even appear for this simple benchmark, IMHO
<hkaiser> it's being called if queues run empty
<heller_> which shouldn't be the case, as there is always work
<jbjnr> tell me what "wait_or_add_new" actually does please, it is on my list of things to look at
<heller_> it transforms "tasks" into "work"
<heller_> more or less
<jbjnr> bit more detail please
<jbjnr> but thanks
<hkaiser> heller_: towards the end utilization tapers off
<heller_> for this specific, simple benchmark, I always have at least 1 task that can be run
<heller_> what's happening is that we first push the task description into the scheduler queue; then 'get()' suspends; obviously there is no other work to be performed, so this task description needs to be turned into work by wait_or_add_new
<hkaiser> heller_: yah, the 'runnow' is used somewhat inconsistently ;)
<hkaiser> I was planning to look into this at some point
<heller_> it needs to go ;)
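For orientation, a heavily simplified, conceptual sketch of that two-stage path (all names are illustrative; this is not HPX's actual thread_queue code): async() only stages a cheap task description, and the runnable work item is created later, when a worker runs dry:

    #include <deque>
    #include <functional>
    #include <utility>

    // cheap: just the callable, no stack, no bookkeeping
    struct task_description
    {
        std::function<void()> f;
    };

    // expensive: what a worker actually runs (stack, id, state, ... omitted)
    struct runnable_thread
    {
        std::function<void()> f;
    };

    struct two_stage_queue
    {
        std::deque<task_description> new_tasks_;    // filled by async()
        std::deque<runnable_thread> work_;          // drained by the workers

        // async path: only stage a description
        void create_task(std::function<void()> f)
        {
            new_tasks_.push_back(task_description{std::move(f)});
        }

        // called when work_ runs dry: turn staged descriptions into runnable work
        bool wait_or_add_new()
        {
            bool added = false;
            while (!new_tasks_.empty())
            {
                work_.push_back(runnable_thread{std::move(new_tasks_.front().f)});
                new_tasks_.pop_front();
                added = true;
            }
            return added;
        }
    };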
<hkaiser> I don't think we should always immediately create a thread
<hkaiser> that can easily overwhelm the system
<heller_> I think we should always immediately create a thread, but lazily allocate the stack
<hkaiser> we should use a different criterion than 'gut feeling', though
<hkaiser> heller_: that's one way
<heller_> and have a thread local allocator for the stack
<heller_> so we don't have any contention there
<hkaiser> stacks are cached anyways
<heller_> yes
<heller_> this caching, I think, can be done without locking
<heller_> just pop off thread-locally when you need it, and push back thread-locally when the task is terminated
<hkaiser> ok
<jbjnr> yes
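A minimal sketch of the per-worker stack cache being proposed (the names, the stack size, and the use of malloc are illustrative assumptions, not HPX's actual allocator); because each worker only touches its own free list, acquire and release never take a lock:

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    struct stack_chunk
    {
        void* base;
        std::size_t size;
    };

    constexpr std::size_t stack_size = 64 * 1024;    // illustrative

    // one free list per worker thread -> acquire/release never take a lock
    thread_local std::vector<stack_chunk> stack_free_list;

    stack_chunk acquire_stack()
    {
        if (!stack_free_list.empty())
        {
            stack_chunk s = stack_free_list.back();   // pop off thread-locally
            stack_free_list.pop_back();
            return s;
        }
        return stack_chunk{std::malloc(stack_size), stack_size};
    }

    void release_stack(stack_chunk s)
    {
        stack_free_list.push_back(s);                 // push back thread-locally
    }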
<jbjnr> who is going to do this? I need it now.
<jbjnr> but I must work on something else for a while
<heller_> it's not an easy task, I think
<heller_> quite involved
<jbjnr> should be fairly straightforward - for one of us at least.
<jbjnr> not GSoC anyway :)
<heller_> the problem is that a lot of those things are very tightly connected ...
<heller_> thread_data, thread_queue and stack allocation mostly
<hkaiser> the stack-handling can be done independently
<jbjnr> the real question would be - can thread_map be removed?
<jbjnr> is there a better way
<hkaiser> jbjnr: I don't see how - if you have an idea - great
<heller_> from what I can see, thread_map is only needed for diagnostics, that is to have a handle on the suspended threads
<heller_> hkaiser: btw, do you remember why thread_data needs to be refcounted?
<hkaiser> heller_: suspended threads - yes - but not just diagnostics
<hkaiser> thread_id is an intrusive pointer
<heller_> which is refcounted
<hkaiser> yes
<heller_> but why?
<hkaiser> to keep things alive?
<heller_> but what for?
<hkaiser> so it does not get deleted prematurely?
<heller_> shouldn't the lifetime end, once the thread has been terminated?
<hkaiser> there was a corner case - don't remember
<heller_> probably a race condition for setting the thread state (or similar) for a terminated thread
<jbjnr> once a thread completes, it should be set terminated by the thread it is running on - race between who?
<hkaiser> could be
<hkaiser> others might still hold a handle to it
<heller_> I can't think of a use case where you need 'thread_data *' to be valid after the thread has been terminated that's not a bug
<heller_> what for?
<hkaiser> shrug
<hkaiser> as I said, I forgot
<heller_> too bad ...
<jbjnr> try to remember before I start cleaning it up
<heller_> the refcounting is something that appears in the profiles as well :/
<hkaiser> switch to a raw pointer and see what happens...
<hkaiser> heller_: yah, probably in the lower 2%
<heller_> nope
<hkaiser> shrug
<heller_> hpx::threads::intrusive_ptr_release appears as the second entry in vtune's 'bottom-up' view
<hkaiser> k
<hkaiser> that might be timing the allocator
<heller_> half of the time inside wait_or_add_new
<hkaiser> heller_: if release itself is an issue, why don't you see addref, then?
<heller_> I see it
<heller_> 6th entry
<heller_> release: 54 ms, add_ref: 45 ms
<hkaiser> as I said - if you can improve things - be my guest
<hkaiser> no hard feelings, really
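For context, the intrusive-refcounting pattern behind such a handle looks roughly like the sketch below (simplified, using boost::intrusive_ptr; this is not HPX's actual thread_data). Every copy of the handle performs an atomic increment and every destruction an atomic decrement, which is what surfaces as intrusive_ptr_add_ref/intrusive_ptr_release in the profile:

    #include <atomic>
    #include <boost/intrusive_ptr.hpp>

    struct thread_data_sketch
    {
        std::atomic<long> count{0};
        // stack, state, ... omitted
    };

    // every copy of the handle pays one atomic RMW ...
    inline void intrusive_ptr_add_ref(thread_data_sketch* p)
    {
        p->count.fetch_add(1, std::memory_order_relaxed);
    }

    // ... and every destruction pays another, plus the delete on the last one
    inline void intrusive_ptr_release(thread_data_sketch* p)
    {
        if (p->count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;
    }

    // a 'thread id' handle with shared ownership, as discussed above
    using thread_id_sketch = boost::intrusive_ptr<thread_data_sketch>;

    // thread_id_sketch id(new thread_data_sketch());   // refcount goes 0 -> 1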
<jbjnr> email sent
<heller_> hkaiser: don't want to waste time just to discover that ref counting is needed after all :/
<hkaiser> heller_: two changes needed: a) change the typedef, b) delete explicitly on removal from the map
<hkaiser> shouldn't be much
<hkaiser> jbjnr: why should I go to that workshop ?
<heller_> hkaiser: and clean up all places which call .get() ;)
<hkaiser> the compiler will tell you
<jbjnr> because if you want HPC people to be interested in HPX, then lots of big names will be there (probably) and I always say things like "it's good, but this is shit and so is that" and we need someone who really cares about HPX to defend it
eschnett has joined #ste||ar
<hkaiser> jbjnr: ok, but shouldn't I give the talk, then?
<jbjnr> no. It has to be an HPC talk
<hkaiser> ahh, and I'm not an HPC person, I know
<jbjnr> correct
<hkaiser> well, have fun, then
<heller_> jbjnr: I am looking into scheduling
<heller_> let's see...
<jbjnr> just talk to your boss and mention that if he's invited this year, then he should bring you as well - or send you in his place
<hkaiser> no idea why I should do that besides going to Hawaii
<hkaiser> I hate travelling, especially if it's pointless
<jbjnr> because it's about networking
<hkaiser> jbjnr: cool, so it's right up your alley
<jbjnr> no. I'm a social misfit.
<jbjnr> networking is not my thing
<hkaiser> and I'm not an HPC person
<hkaiser> networking is not my thing either, I barely know how to use a socket
<jbjnr> no, but you're good at being important
<hkaiser> lol
<jbjnr> and you are good at arguing and discussing stuff. Which I'm not. I'm only good at ranting and stuff.
<hkaiser> jbjnr: when is that workshop?
Vir has joined #ste||ar
<jbjnr> end march sometime. can't remember exactly
<jbjnr> 26-29 ish
<hkaiser> will not happen, I'm in Japan mid March, that's more travel than I can handle already
<jbjnr> ok. You win.
<jbjnr> what's in Japan? C++ meeting?
<hkaiser> a workshop at some stupid conference
<hkaiser> giving a talk there...
<jbjnr> your standards knowledge would have been good for SOS.
<hkaiser> next year, perhaps
<jbjnr> by next year they'll all be using Kokkos unless we can stop them at this year's SOS meeting.
<hkaiser> jbjnr: forget about kokkos
<jbjnr> can't. Sandia, Oak Ridge, everyone else. They love it
<hkaiser> it was shotgun married with the nonsense David Holland is working on - forgot the name
<hkaiser> DARMA
<heller_> kokkos doesn't have suspendable tasks, right?
<heller_> hkaiser: so, regarding thread_map_, what other usage than diagnostics does it have?
eschnett has quit [Quit: eschnett]
<hkaiser> heller_: it keeps the thread alive while it's suspended
<heller_> so we are running in circles ;)
<hkaiser> if you remove the ref-counting, then the map is the only place that knows when to delete the thread
<heller_> I tend to think that it should be safe to delete a thread once it is terminated
<hkaiser> ok
<github> [hpx] hkaiser deleted fix_stack_overhead at 1aee866: https://git.io/vNC7J
eschnett has joined #ste||ar
<heller_> hmm, lambda_to_action is still UB
<hkaiser> it works ;)
<hkaiser> as long as there are no captures it shouldn't be UB
<heller_> it is
<hkaiser> what's UB?
<heller_> since it is calling a function through a nullptr
<hkaiser> you're right :/
<hkaiser> was not aware Antoine did _this_
<heller_> I think I even mocked it during review
<hkaiser> heller_: what would we do without your foresight
<heller_> hkaiser: getting lost!
<hkaiser> indeed
<heller_> anyways
<heller_> antoine is still with you, isn't he?
<hkaiser> heller_: no
<hkaiser> he's long gone
<heller_> oh, ok
<heller_> who is going to fix this now ;)?
<hkaiser> heller_: whoever needs it to be fixed ;)
<hkaiser> heller_: but feel free to drop him a note, I'm sure he'll feel responsible
<heller_> hkaiser: it breaks my ubsan tests :p
<jbjnr> I seem to recall Antoine posting a link to this http://pfultz2.com/blog/2014/09/02/static-lambda/ when he did the nullptr dereferencing
<jbjnr> (or something similar)
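For context, the pattern under discussion is invoking a capture-less lambda's operator() through a null object pointer so the call can be reconstructed from the type alone. A minimal illustration (not HPX's lambda_to_action code) of why that is UB, and of the well-defined alternative for capture-less lambdas:

    #include <iostream>

    int main()
    {
        auto l = [](int x) { return x + 1; };
        using L = decltype(l);

        // UB: calling a member function through a null object pointer, even
        // though a capture-less lambda has no state the body could touch.
        L* p = nullptr;
        // int bad = (*p)(41);          // undefined behavior -- flagged by UBSan
        (void) p;

        // well-defined: a capture-less lambda converts to a plain function pointer
        int (*fp)(int) = l;             // equivalently: auto fp = +l;
        std::cout << fp(41) << '\n';    // prints 42
        return 0;
    }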
mcopik has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
vamatya has joined #ste||ar
<heller_> jbjnr: yeah ... paul fultz doesn't call anything through a nullptr
<heller_> hkaiser: so far so good. complete test suite runs like a charm. with undefined behavior sanitizer.
<heller_> address sanitizer and a gcc release build is next
<heller_> so no test seems to rely on that
<hkaiser> on what?
<heller_> thread_id_type keeping thread_data alive
<hkaiser> ahh, cool
<heller_> simplifies a lot
<heller_> ok ... address sanitizer is more picky here ...
<heller_> doesn't really like those changes, might be valid bugs after all
<heller_> one source of bugs is that now thread_id_type isn't properly initialized :/
<Guest88606> [hpx] hkaiser created fixing_3102 (+1 new commit): https://git.io/vNWCD
<Guest88606> hpx/fixing_3102 204d29c Hartmut Kaiser: Adding support for generic counter_raw_values performance counter type...
<hkaiser> heller_: make it a custom type with pointer semantics
<hkaiser> similar to intrusive_ptr, just without reference counting
<hkaiser> you could have just made the addref and release functions empty :-)
<heller_> erm, yes ;)
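Two ways to realize that suggestion, sketched with illustrative names (neither is HPX's actual code): a thin, non-owning handle with pointer semantics, or keeping the intrusive_ptr typedef and making its hooks no-ops while the thread map deletes thread_data explicitly:

    // Option a) an explicit, non-owning handle with pointer semantics
    struct thread_data_sketch { /* stack, state, ... omitted */ };

    class thread_id_sketch
    {
        // default-initialized, to avoid the uninitialized-id issues mentioned above
        thread_data_sketch* ptr_ = nullptr;

    public:
        thread_id_sketch() = default;
        explicit thread_id_sketch(thread_data_sketch* p) : ptr_(p) {}

        thread_data_sketch* get() const { return ptr_; }
        thread_data_sketch* operator->() const { return ptr_; }
        explicit operator bool() const { return ptr_ != nullptr; }
    };

    // Option b) keep boost::intrusive_ptr<thread_data_sketch> as the typedef but
    // make the hooks empty, so no atomic refcounting is done; the thread map then
    // has to delete the thread_data explicitly when it removes the entry:
    //
    //     inline void intrusive_ptr_add_ref(thread_data_sketch*) {}
    //     inline void intrusive_ptr_release(thread_data_sketch*) {}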
ct-clmsn has quit [Quit: Leaving]
<heller_> hkaiser: did you ever have a problem with the new async traversal again?
<heller_> there still seems to be a small race condition
<heller_> here
<heller_> resumer might finish before returning, letting the refcount drop to zero
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
<hkaiser> heller_: have not seen that one
<hkaiser> resumer() can't return before frame() has finished executing, afaics
<hkaiser> btw, I'm not sure why he used a lambda in the first place, it's invoked in place anyways
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 256 seconds]
Smasher has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Read error: Connection reset by peer]
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi55 has joined #ste||ar