hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<heller_> so far, they are offering us one base server (t1.small.x86) and free access to their spot market (with lowest priority). We could then combine buildbot_travis and pycicle to drive builds on those servers (with the hope of high reliability) and switch to a weekly/monthly schedule for tests on daint and other larger resources with less reliable availability
<heller_> what do you think?
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] PixelOfDeath opened pull request #3543: added Ubuntu dependency list to readme (master...readme_add_ubuntu_dependenc_list) https://github.com/STEllAR-GROUP/hpx/pull/3543
ste||ar-github has left #ste||ar [#ste||ar]
quaz0r has quit [Ping timeout: 245 seconds]
quaz0r has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 252 seconds]
nanashi55 has joined #ste||ar
david_pfander has joined #ste||ar
<simbergm> heller_: sounds cool, but I have a feeling it won't be enough
<simbergm> we should already be testing more on each PR (although we could be more efficient about it)
<simbergm> paying is I guess still an option...
<heller_> Yes
<simbergm> Atom C2550 :)
<heller_> The thing I'm unhappy about with pycicle is that the resources are too unreliable
<heller_> Yes, look up the other options to which we would have access
<heller_> Through the spot market...
<simbergm> ok, so what would that mean? whenever they have free instances we get some time there with lowest priority?
<heller_> And yes, we need faster turnaround times...
<heller_> Yes
<heller_> Lowest priority means that whenever a paying customer comes around, we get suspended
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg pushed 2 new commits to gh-pages: https://github.com/STEllAR-GROUP/hpx/compare/262273854669...7c261ce5fb01
<ste||ar-github> hpx/gh-pages 12c1faf Mikael Simberg: Update redirect to point to latest release
<ste||ar-github> hpx/gh-pages 7c261ce Mikael Simberg: Delete boostbook documentation
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> And resumed eventually
<simbergm> do they give any numbers on how useful that might be? with > 2h jobs?
<heller_> No
<heller_> We'd have to try it out
<simbergm> sure
<heller_> Or as you said, be more efficient with it ;)
<heller_> For example, get a new spot instance for each step we currently have in our workflow
<simbergm> yeah, with pycicle we wouldn't need to build every PR every time master is updated either
<heller_> Just some ideas... At the end of the day, if we'd invest about $500 a month, I think we could end up with a decent set of resources
<simbergm> definitely
<heller_> Right. Pycicle could then only run some longer-lasting integration tests, performance regressions, etc.
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg merged release into master: https://github.com/STEllAR-GROUP/hpx/compare/0c5eeed78de9...a40fc75290d3
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg closed pull request #3329: fix dox for creating component instances through 'new_' is (master...master) https://github.com/STEllAR-GROUP/hpx/pull/3329
ste||ar-github has left #ste||ar [#ste||ar]
quaz0r has quit [Ping timeout: 240 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg pushed 5 new commits to master: https://github.com/STEllAR-GROUP/hpx/compare/a40fc75290d3...c97b98e00c8f
<ste||ar-github> hpx/master 08931a2 Christopher Hinz: Removing the 'hpx_is_target' macro...
<ste||ar-github> hpx/master 6c4694f Christopher Hinz: Remove SubProject.cmake...
<ste||ar-github> hpx/master d4c68b3 Christopher Hinz: Replace most macros with functions...
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> simbergm: I think in order to make that efficient, we really need to modularize HPX
quaz0r has joined #ste||ar
<simbergm> hmm, not sure if it's necessary but it would be really nice
<simbergm> talked a bit about that with K-ballo, he'd like to do that as well but it's a biiiiig task
<simbergm> too many circular dependencies
<heller_> I think it's the only viable way: even if a given component fails, we are not blocked on the other components
<heller_> yes
<simbergm> but maybe it would be possible to start at the top rather than the bottom
<simbergm> i.e. instead of trying to start with separating out the core, we start with separating e.g. the parallel algorithms
<simbergm> or something like that
<simbergm> would make it easier to know exactly which component is causing a failure as well
<heller_> yup
<heller_> and easier for others to choose only what they need
<simbergm> we can guess pretty well now, but it wouldn't hurt at least
<heller_> so yes, it will be a shitload of work
<heller_> but i think totally worth it
<simbergm> yeeep
<simbergm> it's not going to be easier in the future
<heller_> exactly
<heller_> also, it allows us to get rid of a lot of technical debt
<simbergm> we want it, but convincing management to let us spend part of our time on it might not be as easy
<heller_> redesign components with the things we learned in the past
<heller_> yeah...
<heller_> only if we can show the benefits
<simbergm> but it doesn't have to be a full time thing
<heller_> so here is something I am working on right now
<simbergm> I think we know the benefits and people have even asked about that here in the group
<heller_> making our tasking system and scheduling faster
<heller_> for that, I decided to start from scratch to really analyze what's going on
<simbergm> yay! that sounds good
<heller_> the benefits are clear: We get a scalable scheduler in the end
<simbergm> ok, so what do you know so far?
<heller_> not a lot :P
<heller_> most of my time is still caught up with the EU project
<simbergm> "it's bad"
<simbergm> ah ok, so you're still doing that alongside phylanx?
<heller_> what I know so far: our context switching is painfully slow. Most of it is attributed to the state machine for the thread state
<heller_> yes...
<heller_> this whole atomic business looks wasteful. And the fast path (start a thread, let it run to completion) is overly complicated that way
<heller_> this is mostly attributed to the fact that we can have concurrent state changes
<heller_> so the premise of my work: Once a task goes from staged to pending, it cannot be stolen anymore
<heller_> operations like resuming a task can only be done on that very same core
<heller_> this means that the resume operation needs to be a task pinned to the core that owns the task
<simbergm> yeah, that is most likely a win regardless of getting rid of atomics etc
<simbergm> simplifies many things
<heller_> so the only concurrency in the system would be the staged queues. This has a few nice resulting properties
<heller_> exactly
<heller_> stack management can be completely localized, reducing potential cache misses
<heller_> we don't run into issues with compilers that cache accesses to thread_local variables
<heller_> and once a task goes from staged to pending, the scheduling is mighty fast (should be in the order of one function call)
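A minimal C++ sketch of the simplified thread-state machine described above (names and layout are illustrative assumptions, not HPX's actual types): once a task has left the shared staged queue, every further state change happens on the owning core, so no atomics are needed for it.

    #include <cstdint>

    // Illustrative only: the point is that all transitions after 'staged'
    // are core-local, so a plain (non-atomic) store is sufficient.
    enum class thread_state : std::uint8_t
    {
        staged,     // sits in a shared queue; the only state other cores touch
        pending,    // owned by exactly one core, waiting in its local queue
        active,     // currently running on the owning core
        suspended,  // waiting for a resume posted back to the owning core
        terminated
    };

    struct task
    {
        thread_state state = thread_state::staged;

        // Called only on the owning core: no synchronization required.
        void set_state_local(thread_state s) { state = s; }
    };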
<simbergm> you'd still need a context switch?
<simbergm> or after that?
<heller_> I still need a context switch
<heller_> i need to switch to the new task and back to the scheduling loop
<heller_> but if the task itself doesn't suspend, we can do nice tricks
<simbergm> exactly, I misinterpreted your "scheduling should be one function call"
<heller_> the costs should be in that order
<simbergm> yes, we could even avoid the context switches when a task finishes
<heller_> since that simplified context switch *almost* looks like a function call
<heller_> yeah, I've been thinking about that ...
<simbergm> it complicates things, but could be a big win
<heller_> not allocate a stack for each task, only allocate when suspension is required
<simbergm> and can only be done in some cases
<heller_> I think it can be done
<simbergm> but I would think the common case would be that a task always just runs to completion, in which case it could save a lot of time
<heller_> exactly
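A conceptual sketch of that fast path, assuming a task body that can report whether it actually needs to suspend (all names here are hypothetical, not HPX API): a task that runs to completion behaves like an ordinary function call, and only a suspending task pays for a stack and a real context switch.

    #include <functional>

    enum class task_result { done, needs_suspension };

    using task_body = std::function<task_result()>;

    void execute(task_body const& body)
    {
        // Fast path: plain function call, no stack allocated, no context
        // switch beyond entering and leaving the scheduling loop.
        if (body() == task_result::done)
            return;

        // Slow path: the task asked to suspend, so it would be promoted to a
        // full stackful context here (stack allocation or reuse, registration
        // for a resume on this same core). Omitted in this sketch.
    }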
<simbergm> ideas, ideas...
<simbergm> but you're not abandoning the lazy thread init idea? we should still try to push those changes, it gives a simpler base to start from
<heller_> not exactly
<heller_> just simplifying it
<heller_> right now, we have no idea whether a thread needs to get a new stack or not when in the scheduling loop
<heller_> because all we know is that it is pending
<heller_> with what I sketched, we know exactly when the task needs a new stack
<heller_> when going from staged->run (skipping the pending step)
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/7d067ccabe986bf3a8bde2070e0637452185741b
<ste||ar-github> hpx/gh-pages 7d067cc StellarBot: Updating Sphinx docs
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> s/run/active/g
<heller_> scheduling could then look like this: 1) if we have tasks in the pending queue, run them (no concurrency here); 2) if there are no tasks in the pending queue, try to get some from our staged queue (will need synchronization); if nothing is in our staged queue, steal from neighbors.
<heller_> the tasks only get a stack when put from staged->run. The stack is recycled once the task terminates
<heller_> `run -> suspended` and `run -> pending` can only be done on `this_thread`. `suspended -> pending` is done from another thread
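A simplified, self-contained sketch of that loop (hypothetical types; a real implementation would use lock-free or bounded queues, stack contexts and a proper stealing policy): only the staged queue is synchronized, while the pending queue is purely core-local.

    #include <deque>
    #include <functional>
    #include <mutex>
    #include <optional>
    #include <vector>

    using task = std::function<void()>;

    struct scheduler
    {
        std::deque<task> pending;          // core-local: no synchronization
        std::deque<task> staged;           // shared: producers and thieves
        std::mutex staged_mtx;
        std::vector<scheduler*> neighbors; // stealing candidates

        std::optional<task> pop_staged()
        {
            std::lock_guard<std::mutex> lk(staged_mtx);
            if (staged.empty())
                return std::nullopt;
            task t = std::move(staged.front());
            staged.pop_front();
            return t;
        }

        void run_one()
        {
            // 1) Local pending tasks first: plain deque operations, no locks.
            if (!pending.empty())
            {
                task t = std::move(pending.front());
                pending.pop_front();
                t();   // in the real design this task already owns a stack
                return;
            }
            // 2) Otherwise refill from our own staged queue (the only
            //    synchronized step); staged->active is where a recycled
            //    stack would be attached in the real design.
            if (auto t = pop_staged()) { (*t)(); return; }
            // 3) Nothing local at all: steal staged work from a neighbor.
            for (scheduler* n : neighbors)
                if (auto t = n->pop_staged()) { (*t)(); return; }
        }
    };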
<heller_> there needs to be additional state to also support timed suspension and to take into account that we can have two reasons for `suspended -> pending`: one is the timeout, the other is a resume due to a condition.
<heller_> this can be easily handled by adding another state: https://i.imgur.com/vLLjtS2.png
<heller_> a task goes into the depleted state if there is another task that might still hold a handle (either the timer or the 'conditional' resumer). But this reference counting is again just a thread-local operation without any additional synchronization required
<heller_> simbergm: https://i.imgur.com/tu8YPzY.png <-- annotated state diagram
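A hypothetical sketch of that extra state (again, illustrative names only, not taken from the linked diagram or HPX itself): a task whose other handle (the timer or the conditional resumer) may still exist sits in 'depleted', and since both releases happen on the owning core the count needs no atomics.

    #include <cstdint>

    enum class thread_state : std::uint8_t
    {
        staged, pending, active, suspended,
        depleted,    // logically done, but another handle may still exist
        terminated
    };

    struct suspended_task
    {
        thread_state state = thread_state::suspended;
        int handles = 2;   // e.g. the timer and the conditional resumer

        // Runs only on the owning core, never concurrently, so a plain
        // integer decrement is enough; no atomic reference counting.
        void release_handle()
        {
            if (--handles == 0)
                state = thread_state::terminated;   // safe to recycle now
            else
                state = thread_state::depleted;
        }
    };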
<heller_> that's essentially what I have so far ... just thought experiments...
<heller_> but I think it's a great simplification, and we get a nice chance to concentrate on optimizing the different parts
<heller_> so, for example, optimizing the synchronization strategy on the staged queue
<heller_> or the local stack reuse
<heller_> this should work out though, as we really minimize the possible contention points
<heller_> accompanied with something like this: https://gist.github.com/sithhell/84e0a5e941c41ac3c9be703781aa0a55
<simbergm> heller_: I'm mostly following, but you have to go easy on me
<simbergm> I'm thinking what's the best next step, to make sure this doesn't stay a thought experiment
<simbergm> only stealing staged tasks is one thing we can do already now fairly easily
<simbergm> this whole restructuring of the task state sounds like a bigger thing, but I'm not sure
<simbergm> I like the current lazy thread init/remove wait_or_add_new changes even just because they simplify things
<simbergm> but of course we're going to take some wrong turns along the way
<heller_> ok
<heller_> it's not too much of a deviation from what we have right now
<heller_> it mostly affects the scheduler internals only
<simbergm> right, you're a better judge right now of how much work this could be
<simbergm> all I can say is that it sounds good
<simbergm> one thing that we should think about with the split between staged and pending is fairness
<simbergm> a running/pending thread that is only yielding will keep staged threads from running until it is ready
<simbergm> although semantically I guess staged and pending threads should be equal
<heller_> (with respect to it not staying a thought experiment, I fully agree)
<heller_> the difference in semantics is the required synchronization
<heller_> fairness is a good point
<heller_> it's a trade-off between contention (the synchronization required for the staged queues) and not starving the system
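One possible way to handle that trade-off, sketched under the assumption of the scheduler outlined above (hypothetical member functions, illustrative tuning value): drain at most a bounded burst of local pending tasks before paying for the synchronized staged queue, so yielding tasks cannot starve newly staged work.

    #include <cstddef>

    constexpr std::size_t pending_burst = 16;   // illustrative tuning knob

    // Scheduler is any type exposing the three operations used below;
    // left as a template so the sketch stays self-contained.
    template <typename Scheduler>
    void run_with_fairness(Scheduler& sched)
    {
        std::size_t ran = 0;
        while (sched.has_pending() && ran < pending_burst)
        {
            sched.run_pending_one();   // cheap, core-local, no locks
            ++ran;
        }
        sched.refill_from_staged();    // synchronized step, at most once here
    }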
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/b5299bb126ec3f4e2054552b66c24f0aece660cf
<ste||ar-github> hpx/gh-pages b5299bb StellarBot: Updating Sphinx docs
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> in any case ... the review for allscale is next week on friday, then i'll work almost full time on this
<simbergm> ok
<simbergm> good luck for that!
<heller_> simbergm: what I always envision is to run a similar scheduling loop on GPUs ;)
<simbergm> we'll have the hpx call on the 27th as well, that'll be a good opportunity to discuss
<heller_> yes
<simbergm> some day :)
<heller_> with the exception that we currently can't suspend/resume tasks on GPUs, but that shouldn't be a big deal for the algorithms to run on GPUs anyway
<simbergm> I don't know if you were informed about this, but I did a visit to sandia
<heller_> yes, I know
<simbergm> ok
<heller_> did you get any results so far?
<simbergm> it's a bit scruffy, but it workds
<simbergm> works
<heller_> performance wise?
<simbergm> it's slow, but working on setting up some more apps/benchmarks
<heller_> ok
<heller_> kokkos should benefit greatly from those things
<heller_> as its primary use case is data parallelism
<simbergm> yes and no, performance wise they can do much better because it's just fork-join
<simbergm> but they're interested in looking at what we have and experimenting with something like futures in their api
<heller_> right
<heller_> there's nothing preventing us from getting better at fork-join as well ...
<heller_> and I think this is the right step in that direction
<heller_> in the fork-join model, having futures is total overkill, for example
<heller_> unless you do an asynchronous fork-join
<heller_> simbergm: the problem there, however, is that our tasking already comes with so much overhead that we'll always lose against this simple data-parallel fork-join model
<heller_> at least on fine granularities
<jbjnr__> heller_: simbergm I've just seen all this writing but cannot read it because I have to go to the other building. Can you create an issue with these ideas in it so we can read them properly, comment, and put together a plan between us?
<jbjnr__> (I mean the scheduler clean up stuff)
<heller_> jbjnr__: I will very soon (promise)
<simbergm> probably not a bad idea
<heller_> yeah
<heller_> I am planning on doing a series of blog posts on that topic...
<simbergm> ooh, nice
<simbergm> which reminds me, I said I would do some... :/
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg opened pull request #3544: Post 1.2.0 version bumps (master...version-bump) https://github.com/STEllAR-GROUP/hpx/pull/3544
ste||ar-github has left #ste||ar [#ste||ar]
nanashi55 has quit [Ping timeout: 252 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 268 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 250 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 244 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 272 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 244 seconds]
nanashi55 has joined #ste||ar
david_pfander has joined #ste||ar
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi55 has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> see: https://isocpp.org/
<hkaiser> thanks again simbergm
<hkaiser> also, please upvote on reddit (https://www.reddit.com/r/cpp/)
<simbergm> whew, I posted one as well but it's gone again now :)
<simbergm> (on reddit that is)
<simbergm> thanks hkaiser
<hkaiser> simbergm: ohh sorry
<simbergm> upvoted
<simbergm> no problem
<hkaiser> didn't know you already did this
<simbergm> I was just being slow, you reminded me
<simbergm> I registered after I saw your isocpp post
<simbergm> next time I'll really post the link to reddit, now I already have an account ;)
<hkaiser> heh
<heller_> hkaiser: hey! How is the demo going?
<heller_> (and the rest of course)
parsa has joined #ste||ar
<hkaiser> heller_: running
<hkaiser> heller_: it's not really self-explanatory, though
nikunj has joined #ste||ar
david_pfander has quit [Ping timeout: 245 seconds]
hkaiser has quit [Quit: bye]
<zao> simbergm: I wonder about the reddit announcement, is it technically possible to have the first blurb and/or the highlights inline in the post, to maybe foster a bit of discussion?
<zao> Oh, wait, it was hkaiser that posted it :)
<simbergm> zao: that's a good point, but I think not officially
<simbergm> they have the option of submitting a "post", "image or video" or a "link"
<zao> (I of course don't know the capabilities of reddit, nor the rules of the subreddit ^^)
<simbergm> and with the link option there's no place to put a longer description ;?
<zao> I see.
<simbergm> hmm, ;? was not what I was going for but maybe it fits
<zao> Evolutionary emoticonology.
<simbergm> I guess we could put a link into a post, but I don't know if that's the right way to do it (I don't really know the etiquette on reddit either)
<zao> Ditto, just idly musing.
jaafar has quit [Quit: Konversation terminated!]
preejackie has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
preejackie has left #ste||ar ["WeeChat 1.9.1"]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
preejackie has joined #ste||ar
preejackie has quit [Read error: Connection reset by peer]
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has joined #ste||ar
parsa has joined #ste||ar
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi55 has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller_> simbergm: btw, the deadline for the scheduler things to get into master is january ;)
<heller_> as I have to give a talk on it mid february ;)
<simbergm> I like that :P
<simbergm> what is it for? it's not your defense, is it...? (I thought you might have done that already)
<heller_> no, not my defense
<simbergm> ah, nice
<simbergm> did you already have your defense?
<heller_> no
<heller_> the thesis got delivered to the reviewers this week
<simbergm> ooh, so you're almost free now
<heller_> yes
<heller_> static linking is broken...
jbjnr_ has joined #ste||ar
jbjnr__ has quit [Ping timeout: 268 seconds]
jbjnr_ has quit [Read error: Connection reset by peer]
parsa[w] has quit [Read error: Connection reset by peer]
parsa[w] has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]