aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
Vir has quit [Read error: Connection reset by peer]
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
simbergm has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
simbergm has joined #ste||ar
Smasher has quit [Remote host closed the connection]
<jbjnr>
probably no different if your test has no other work going on
<jbjnr>
it would only help if the queues were being contended
<heller_>
yeah
<heller_>
so, what I discovered is that those separate task_ and work_ queues cost time
<heller_>
that is, when doing an hpx::async([](){}), it ends up in the task_ queue, only to be placed in the work_ queue once wait_or_add_new is called
<heller_>
for my simple test, this accounts for 30% of the runtime
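A minimal sketch of the staging heller_ is describing - hypothetical code, not the actual HPX thread_queue:

    #include <deque>
    #include <functional>

    // hypothetical sketch, not HPX's real thread_queue: async() stages a
    // task in task_queue_; it only becomes runnable once wait_or_add_new()
    // moves it into work_queue_ - the extra hop measured above
    struct two_stage_queue
    {
        std::deque<std::function<void()>> task_queue_;  // staged task descriptions
        std::deque<std::function<void()>> work_queue_;  // runnable work

        void create_task(std::function<void()> f)
        {
            task_queue_.push_back(std::move(f));
        }

        // called by an idle worker: converts staged tasks into work
        bool wait_or_add_new()
        {
            bool added = !task_queue_.empty();
            while (!task_queue_.empty())
            {
                work_queue_.push_back(std::move(task_queue_.front()));
                task_queue_.pop_front();
            }
            return added;
        }
    };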
<jbjnr>
yes, and this happens every N cycles, so there's always a delay
<jbjnr>
it's on my list of things to look into
<jbjnr>
we have to get the cholesky small block sizes running faster before end of March
<heller_>
ok
<heller_>
I want to have nice numbers for my thesis, essentially 4 weeks ago
<jbjnr>
I'll send you and hk an email about it
<jbjnr>
when do you start new job?
<heller_>
no idea yet
<jbjnr>
have you got one yet?
<heller_>
nope
<jbjnr>
(new job I mean)
<jbjnr>
ah
<jbjnr>
why not?
<heller_>
if I only knew ;)
<jbjnr>
handed in PhD yet?
hkaiser has joined #ste||ar
<heller_>
jbjnr: no
<jbjnr>
<sigh>
<heller_>
stop bothering me
<jbjnr>
I'm not bothering, just curious. that's all.
<heller_>
;)
<jbjnr>
Didn't nag you like your mother or anything
<heller_>
I am running those low level micro benchmarks right now
<jbjnr>
that's hkaiser's job
<heller_>
and well, the numbers just suck
<jbjnr>
hpx::launch::sync might have been a better choice
<jbjnr>
fork was wrong
<jbjnr>
k
<jbjnr>
bbiab
<heller_>
sync doesn't create any new task
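For reference, the difference in question - a sketch assuming a regular HPX program:

    #include <hpx/include/async.hpp>

    void launch_difference()
    {
        // creates a new HPX task that goes through the scheduler queues
        auto f1 = hpx::async([] { return 42; });

        // runs the callable inline on the calling thread: no new task,
        // no queue traffic - hence "sync doesn't create any new task"
        auto f2 = hpx::async(hpx::launch::sync, [] { return 42; });

        f1.get();
        f2.get();
    }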
<hkaiser>
heller_: you're down to micro-optimization :/
<heller_>
yes
<heller_>
hkaiser: I have a problem: I'm describing super fine-grained dependencies and stuff, and then, when running benchmarks, the numbers don't support those claims :(
<hkaiser>
lol
<hkaiser>
there is that
<heller_>
but I found a very nice representation for varying granularity
<hkaiser>
ok?
<heller_>
well, the numbers aren't too bad, I guess
<heller_>
a heat map
<heller_>
still generating numbers, will show you in a second
<hkaiser>
cool, thanks
<hkaiser>
jbjnr: why is running the micro-benchmarks my job?
<jbjnr>
hkaiser: running the benchmarks is not your job - nagging heller to finish his PhD is your job. Sorry, the conversation got mixed up across multiple comments
<hkaiser>
jbjnr: nod, heller_ already said as much
<hkaiser>
jbjnr: thanks for your trust on this
<heller_>
the hot spots are: 1) wait_or_add_new 2) thread_map_.insert
<hkaiser>
heller_: as expected
<heller_>
the first shouldn't even appear for this simple benchmark, IMHO
<hkaiser>
it's being called if queues run empty
<heller_>
which shouldn't be the case, as there is always work
<jbjnr>
tell me what "wait_or_add_new" actually does please, it is on my list of things to look at
<heller_>
it transforms "tasks" into "work"
<heller_>
more or less
<jbjnr>
bit more detail please
<jbjnr>
but thanks
<hkaiser>
heller_: towards the end utilization tapers off
<heller_>
for this specific, simple benchmark, I always have at least 1 task that can be run
<heller_>
what's happening is that we first push the task description into the scheduler queue, then 'get()' suspends; obviously there is no other work to be performed, so this task description needs to be turned into work by wait_or_add_new
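In code, that round trip looks roughly like this (a sketch assuming a plain HPX program):

    #include <hpx/include/async.hpp>

    void round_trip()
    {
        // the task description is pushed into the scheduler queue ...
        auto f = hpx::async([] {});

        // ... get() suspends this HPX thread; with no other work around,
        // the worker calls wait_or_add_new() to turn the staged
        // description into a runnable thread before f becomes ready
        f.get();
    }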
<hkaiser>
heller_: yah, the 'runnow' is used somewhat inconsistently ;)
<hkaiser>
I was planning to look into this at some point
<heller_>
it needs to go ;)
<hkaiser>
I don't think we should always immediately create a thread
<hkaiser>
that can easily overwhelm the system
<heller_>
I think we should always immediately create a thread, but lazily allocate the stack
<hkaiser>
we should use a different criterion than 'gut feeling', though
<hkaiser>
heller_: that's one way
<heller_>
and have a thread local allocator for the stack
<heller_>
so we don't have any contention there
<hkaiser>
stacks are cached anyways
<heller_>
yes
<heller_>
this caching, I think, can be done without locking
<heller_>
just pop off thread-locally when you need it, and push back thread-locally when the task is terminated
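A minimal sketch of that thread-local stack cache (all names hypothetical):

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // hypothetical per-worker stack cache: each worker pops from and
    // pushes to its own free list, so the hot path needs no locks or
    // atomics
    struct stack_cache
    {
        static constexpr std::size_t stack_size = 64 * 1024;
        std::vector<void*> free_list_;

        void* allocate()
        {
            if (!free_list_.empty())
            {
                void* s = free_list_.back();  // fast path: reuse, no contention
                free_list_.pop_back();
                return s;
            }
            return std::malloc(stack_size);   // slow path: fresh allocation
        }

        void deallocate(void* s)              // called when the task terminates
        {
            free_list_.push_back(s);
        }
    };

    thread_local stack_cache local_stacks;    // one cache per worker thread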
<hkaiser>
ok
<jbjnr>
yes
<jbjnr>
who is going to do this? I need it now.
<jbjnr>
but I must work on something else for a while
<heller_>
it's not an easy task, I think
<heller_>
quite involved
<jbjnr>
should be fairly straightforward - for one of us at least.
<jbjnr>
not GSoC anyway :)
<heller_>
the problem is that a lot of those things are very tightly connected ...
<heller_>
thread_data, thread_queue and stack allocation mostly
<hkaiser>
the stack-handling can be done independently
<jbjnr>
the real question would be - can thread_map be removed?
<jbjnr>
is there a better way
<hkaiser>
jbjnr: I don't see how - if you have an idea - great
<heller_>
from what I can see, thread_map is only needed for diagnostics, that is to have a handle on the suspended threads
<heller_>
hkaiser: btw, do you remember why thread_data needs to be refcounted?
<hkaiser>
heller_: suspended threads - yes - but not just diagnostics
<hkaiser>
thread_id is an intrusive pointer
<heller_>
which is refcounted
<hkaiser>
yes
<heller_>
but why?
<hkaiser>
to keep things alive?
<heller_>
but what for?
<hkaiser>
so it does not get deleted prematurely?
<heller_>
shouldn't the lifetime end, once the thread has been terminated?
<hkaiser>
there was a corner case - don't remember
<heller_>
probably a race condition for setting the thread state (or similar) for a terminated thread
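For context, the mechanism in question - a sketch of the intrusive-refcounting pattern, not the actual HPX definitions:

    #include <atomic>
    #include <boost/intrusive_ptr.hpp>

    // sketch: thread_data carries its own reference count, and the thread
    // id type is an intrusive_ptr to it, so every outstanding handle
    // keeps the thread object alive - even past termination
    struct thread_data
    {
        std::atomic<long> count_{0};
        // ... thread state, stack, continuation, etc.
    };

    inline void intrusive_ptr_add_ref(thread_data* p)
    {
        p->count_.fetch_add(1, std::memory_order_relaxed);
    }

    inline void intrusive_ptr_release(thread_data* p)
    {
        // this fetch_sub is the kind of traffic showing up in the profiles
        if (p->count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;
    }

    using thread_id_type = boost::intrusive_ptr<thread_data>;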
<jbjnr>
once a thread completes, it should be set to terminated by the thread it is running on - a race between whom?
<hkaiser>
could be
<hkaiser>
others might still hold a handle to it
<heller_>
I can't think of a use case where you need 'thread_data *' to be valid after the thread has been terminated that's not a bug
<heller_>
what for?
<hkaiser>
shrug
<hkaiser>
as I said, I forgot
<heller_>
too bad ...
<jbjnr>
try to remember before I start cleaning it up
<heller_>
the refcounting is something that appears in the profiles as well :/
<hkaiser>
switch to a raw pointer and see what happens...
<hkaiser>
heller_: yah, probably in the lower 2%
<heller_>
nope
<hkaiser>
shrug
<heller_>
hpx::threads::intrusive_ptr_release appears as the second entry in vtune's 'bottom-up' view
<hkaiser>
k
<hkaiser>
that might be timing the allocator
<heller_>
half of the time inside wait_or_add_new
<hkaiser>
heller_: if release itself is an issue, why don't you see addref, then?
<heller_>
I see it
<heller_>
6th entry
<heller_>
release: 54 ms, add_ref: 45 ms
<hkaiser>
as I said - if you can improve things - be my guest
<hkaiser>
no hard feelings, really
<jbjnr>
email sent
<heller_>
hkaiser: don't want to waste time just to discover that ref counting is needed after all :/
<hkaiser>
heller_: two changes needed: a) change the typedef, b) delete explicitly on removal from the map
<hkaiser>
shouldn't be much
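Sketched out (with hypothetical names, not the real HPX headers), that experiment would be:

    #include <unordered_set>

    struct thread_data { /* thread state */ };

    // a) change the typedef: raw pointer instead of intrusive_ptr
    using thread_id_type = thread_data*;

    // b) delete explicitly on removal from the map, since no refcount
    //    keeps the object alive any more
    void remove_thread(std::unordered_set<thread_id_type>& thread_map,
        thread_id_type id)
    {
        thread_map.erase(id);
        delete id;  // previously dropped by the last intrusive_ptr_release
    }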
<hkaiser>
jbjnr: why should I go to that workshop ?
<heller_>
hkaiser: and clean up all places which call .get() ;)
<hkaiser>
the compiler will tell you
<jbjnr>
because if you want HPC people to be interested in HPX: lots of big names will be there (probably), and I always say things like "it's good, but this is shit and so is that" - we need someone who really cares about HPX to defend it
eschnett has joined #ste||ar
<hkaiser>
jbjnr: ok, but shouldn't I give the talk, then?
<jbjnr>
no. It has to be an HPC talk
<hkaiser>
ahh, and I'm not an HPC person, I know
<jbjnr>
correct
<hkaiser>
well, have fun, then
<heller_>
jbjnr: I am looking into scheduling
<heller_>
let's see...
<jbjnr>
just talk to your boss and mention that if he's invited this year, then he should bring you as well - or send you in his place
<hkaiser>
no idea why I should do that besides going to Hawaii
<hkaiser>
I hate travelling, especially if it's pointless
<jbjnr>
because it's about networking
<hkaiser>
jbjnr: cool, so it's right up your alley
<jbjnr>
no. I'm a social misfit.
<jbjnr>
networking is not my thing
<hkaiser>
and I'm not an HPC person
<hkaiser>
networking is not my thing either, I barely know how to use a socket
<jbjnr>
no, but you're good at being important
<hkaiser>
lol
<jbjnr>
and you are good at arguing and discussing stuff. Which I'm not. I'm only good at ranting and stuff.
<hkaiser>
jbjnr: when is that workshop?
Vir has joined #ste||ar
<jbjnr>
end of March sometime. Can't remember exactly
<jbjnr>
26-29 ish
<hkaiser>
will not happen, I'm in Japan mid-March, that's more travel than I can handle already
<jbjnr>
ok. You win.
<jbjnr>
what's in Japan? C++ meeting?
<hkaiser>
a workshop at some stupid conference
<hkaiser>
giving a talk there...
<jbjnr>
your standards knowledge would have been good for SOS.
<hkaiser>
next year, perhaps
<jbjnr>
by next year they'll all be using kokkos unless we can stop them at this year's SOS meeting.
<hkaiser>
jbjnr: forget about kokkos
<jbjnr>
can't. Sandia, Oak Ridge, everyone else. They love it
<hkaiser>
it was shotgun-married to the nonsense David Hollman is working on - forgot the name
<hkaiser>
DARMA
<heller_>
kokkos doesn't have suspendable tasks, right?
<heller_>
hkaiser: so, regarding thread_map_, what usage other than diagnostics does it have?
eschnett has quit [Quit: eschnett]
<hkaiser>
heller_: it keeps the thread alive while it's suspended
<heller_>
so we are running in circles ;)
<hkaiser>
if you remove the ref-counting, then the map is the only place that knows when to delete the thread
<heller_>
I tend to think that it should be safe to delete a thread once it is terminated
<hkaiser>
ok
<github>
[hpx] hkaiser deleted fix_stack_overhead at 1aee866: https://git.io/vNC7J
eschnett has joined #ste||ar
<heller_>
hmm, lambda_to_action is still UB
<hkaiser>
it works ;)
<hkaiser>
as long as there are no captures it shouldn't be UB
<heller_>
it is
<hkaiser>
what's UB?
<heller_>
since it is calling a function through a nullptr
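The construct in question, sketched - a hypothetical reconstruction, not the actual lambda_to_action source:

    // a captureless lambda is stateless, so it is tempting to "call" it
    // without ever constructing one, by dereferencing a null pointer to
    // its closure type - but dereferencing a null pointer is undefined
    // behaviour even if operator() never touches any state
    template <typename F>
    void call_through_nullptr()
    {
        F* f = nullptr;
        (*f)();  // UB: the standard gives this no defined meaning
    }

The conforming route for a captureless lambda is its implicit conversion to a plain function pointer, which can be called without any closure object at all.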