hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<heller_> so far, they are offering us one base server (t1.small.x86) and free access to their spot market (with lowest priority). We could then combine buildbot_travis and pycicle to drive builds on those servers (with the hope of high reliability) and switch to a weekly/monthly schedule for tests on daint and other larger resources with less reliable availability
<heller_> what do you think?
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] PixelOfDeath opened pull request #3543: added Ubuntu dependency list to readme (master...readme_add_ubuntu_dependenc_list) https://github.com/STEllAR-GROUP/hpx/pull/3543
ste||ar-github has left #ste||ar [#ste||ar]
quaz0r has quit [Ping timeout: 245 seconds]
quaz0r has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 252 seconds]
nanashi55 has joined #ste||ar
david_pfander has joined #ste||ar
<simbergm> heller_: sounds cool, but I have a feeling it won't be enough
<simbergm> we should already be testing more on each PR (although we could be more efficient about it)
<simbergm> paying is I guess still an option...
<heller_> Yes
<simbergm> Atom C2550 :)
<heller_> The thing I'm unhappy about with pycicle is that the resources are too unreliable
<heller_> Yes, look up the other options to which we would have access
<heller_> Through the spot market...
<simbergm> ok, so what would that mean? whenever they have free instances we get some time there with lowest priority?
<heller_> And yes, we need faster turnaround times...
<heller_> Yes
<heller_> Lowest priority means that whenever a paying customer comes around, we get suspended
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg pushed 2 new commits to gh-pages: https://github.com/STEllAR-GROUP/hpx/compare/262273854669...7c261ce5fb01
<ste||ar-github> hpx/gh-pages 12c1faf Mikael Simberg: Update redirect to point to latest release
<ste||ar-github> hpx/gh-pages 7c261ce Mikael Simberg: Delete boostbook documentation
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> And resumed eventually
<simbergm> do they give any numbers on how useful that might be? with > 2h jobs?
<heller_> No
<heller_> We'd have to try it out
<simbergm> sure
<heller_> Or as you said, be more efficient with it ;)
<heller_> For example, get a new spot instance for each step we currently have in our workflow
<simbergm> yeah, with pycicle we wouldn't need to build every PR every time master is updated either
<heller_> Just some ideas... At the end of the day, if we'd invest about $500 a month, I think we could end up with a decent set of resources
<simbergm> definitely
<heller_> Right. Pycicle could then only run some longer-lasting integration tests, performance regressions, etc.
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg merged release into master: https://github.com/STEllAR-GROUP/hpx/compare/0c5eeed78de9...a40fc75290d3
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg closed pull request #3329: fix dox for creating component instances through 'new_' is (master...master) https://github.com/STEllAR-GROUP/hpx/pull/3329
ste||ar-github has left #ste||ar [#ste||ar]
quaz0r has quit [Ping timeout: 240 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg pushed 5 new commits to master: https://github.com/STEllAR-GROUP/hpx/compare/a40fc75290d3...c97b98e00c8f
<ste||ar-github> hpx/master 08931a2 Christopher Hinz: Removing the 'hpx_is_target' macro...
<ste||ar-github> hpx/master 6c4694f Christopher Hinz: Remove SubProject.cmake...
<ste||ar-github> hpx/master d4c68b3 Christopher Hinz: Replace most macros with functions...
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> simbergm: I think in order to make that efficient, we really need to modularize HPX
quaz0r has joined #ste||ar
<simbergm> hmm, not sure if it's necessary but it would be really nice
<simbergm> talked a bit about that with K-ballo, he'd like to do that as well but it's a biiiiig task
<simbergm> too many circular dependencies
<heller_> I think it's the only viable way: even if a given component fails, we are not blocked on the other components
<heller_> yes
<simbergm> but maybe it would be possible to start at the top rather than the bottom
<simbergm> i.e. instead of trying to start with separating out the core, we start with separating e.g. the parallel algorithms
<simbergm> or something like that
<simbergm> would make it easier to know exactly which component is causing a failure as well
<heller_> yup
<heller_> and easier for others to choose only what they need
<simbergm> we can guess pretty well now, but it wouldn't hurt at least
<heller_> so yes, it will be a shitload of work
<heller_> but i think totally worth it
<simbergm> yeeep
<simbergm> it's not going to be easier in the future
<heller_> exactly
<heller_> also, it allows us to get rid of a lot of technical debt
<simbergm> we want it, but convincing management to let us spend part of our time on it might not be as easy
<heller_> redesign components with the things we learned in the past
<heller_> yeah...
<heller_> only if we can show the benefits
<simbergm> but it doesn't have to be a full time thing
<heller_> so here is something I am working on right now
<simbergm> I think we know the benefits and people have even asked about that here in the group
<heller_> making our tasking system and scheduling faster
<heller_> for that, I decided to start from scratch to really analyze what's going on
<simbergm> yay! that sounds good
<heller_> the benefits are clear: We get a scalable scheduler in the end
<simbergm> ok, so what do you know so far?
<heller_> not a lot :P
<heller_> most of my time is still caught up with the EU project
<simbergm> "it's bad"
<simbergm> ah ok, so you're still doing that alongside phylanx?
<heller_> what I know so far: our context switching is painfully slow. Most of it is attributed to the state machine for the thread state
<heller_> yes...
<heller_> this whole atomic business looks wasteful. And the fast path (start a thread, let it run to completion) is overly complicated that way
<heller_> this is mostly attributed to the fact that we can have concurrent state changes
<heller_> so the premise of my work: Once a task goes from staged to pending, it cannot be stolen anymore
<heller_> operations like resuming a task can only be done on that very same core
<heller_> this means that the resume operation needs to be a task pinned to the core that owns the task
<simbergm> yeah, that is most likely a win regardless of getting rid of atomics etc
<simbergm> simplifies many things
<heller_> so the only concurrency in the system would be the staged queues. This has a few nice resulting properties
<heller_> exactly
<heller_> stack management can be completely localized, reducing potential cache misses
<heller_> we don't run into issues with compilers that cache accesses to thread_local variables
<heller_> and once a task goes from staged to pending, the scheduling is mighty fast (should be in the order of one function call)
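A minimal C++ sketch of the simplified thread-state machine described above (names and layout are illustrative assumptions, not HPX's actual types): once a task has left the shared staged queue, every further state change happens on the owning core, so no atomics are needed for it.

    #include <cstdint>

    // Illustrative only: the point is that all transitions after 'staged'
    // are core-local, so a plain (non-atomic) store is sufficient.
    enum class thread_state : std::uint8_t
    {
        staged,     // sits in a shared queue; the only state other cores touch
        pending,    // owned by exactly one core, waiting in its local queue
        active,     // currently running on the owning core
        suspended,  // waiting for a resume posted back to the owning core
        terminated
    };

    struct task
    {
        thread_state state = thread_state::staged;

        // Called only on the owning core: no synchronization required.
        void set_state_local(thread_state s) { state = s; }
    };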
<simbergm> you'd still need a context switch?
<simbergm> or after that?
<heller_> I still need a context switch
<heller_> i need to switch to the new task and back to the scheduling loop
<heller_> but if the task itself doesn't suspend, we can do nice tricks
<simbergm> exactly, I misinterpreted your "scheduling should be one function call"
<heller_> the costs should be in that order
<simbergm> yes, we could even avoid the context switches when a task finishes
<heller_> since that simplified context switch *almost* looks like a function call
<heller_> yeah, I've been thinking about that ...
<simbergm> it complicates things, but could be a big win
<heller_> not allocate a stack for each task, only allocate when suspension is required
<simbergm> and can only be done in some cases
<heller_> I think it can be done
<simbergm> but I would think the common case would be that a task always just runs to completion, in which case it could save a lot of time
<heller_> exactly
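A conceptual sketch of that fast path, assuming a task body that can report whether it actually needs to suspend (all names here are hypothetical, not HPX API): a task that runs to completion behaves like an ordinary function call, and only a suspending task pays for a stack and a real context switch.

    #include <functional>

    enum class task_result { done, needs_suspension };

    using task_body = std::function<task_result()>;

    void execute(task_body const& body)
    {
        // Fast path: plain function call, no stack allocated, no context
        // switch beyond entering and leaving the scheduling loop.
        if (body() == task_result::done)
            return;

        // Slow path: the task asked to suspend, so it would be promoted to a
        // full stackful context here (stack allocation or reuse, registration
        // for a resume on this same core). Omitted in this sketch.
    }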
<simbergm> ideas, ideas...
<simbergm> but you're not abandoning the lazy thread init idea? we should still try to push those changes, it gives a simpler base to start from
<heller_> not exactly
<heller_> just simplifying it
<heller_> right now, we have no idea whether a thread needs to get a new stack or not when in the scheduling loop
<heller_> because all we know is that it is pending
<heller_> with what I sketched, we know exactly when the task needs a new stack
<heller_> when going from staged->run (skipping the pending step)
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/7d067ccabe986bf3a8bde2070e0637452185741b
<ste||ar-github> hpx/gh-pages 7d067cc StellarBot: Updating Sphinx docs
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> s/run/active/g
<heller_> scheduling could then look like this: 1) if we have tasks in the pending queue, run them (no concurrency here); 2) if there are no tasks in the pending queue, try to get some from our staged queue (will need synchronization); if nothing is in our staged queue, steal from neighbors.
<heller_> the tasks only get a stack when put from staged->run. The stack is recycled once the task terminates
<heller_> `run -> suspended` and `run -> pending` can only be done on `this_thread`. `suspended -> pending` is done from another thread
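A simplified, self-contained sketch of that loop (hypothetical types; a real implementation would use lock-free or bounded queues, stack contexts and a proper stealing policy): only the staged queue is synchronized, while the pending queue is purely core-local.

    #include <deque>
    #include <functional>
    #include <mutex>
    #include <optional>
    #include <vector>

    using task = std::function<void()>;

    struct scheduler
    {
        std::deque<task> pending;          // core-local: no synchronization
        std::deque<task> staged;           // shared: producers and thieves
        std::mutex staged_mtx;
        std::vector<scheduler*> neighbors; // stealing candidates

        std::optional<task> pop_staged()
        {
            std::lock_guard<std::mutex> lk(staged_mtx);
            if (staged.empty())
                return std::nullopt;
            task t = std::move(staged.front());
            staged.pop_front();
            return t;
        }

        void run_one()
        {
            // 1) Local pending tasks first: plain deque operations, no locks.
            if (!pending.empty())
            {
                task t = std::move(pending.front());
                pending.pop_front();
                t();   // in the real design this task already owns a stack
                return;
            }
            // 2) Otherwise refill from our own staged queue (the only
            //    synchronized step); staged->active is where a recycled
            //    stack would be attached in the real design.
            if (auto t = pop_staged()) { (*t)(); return; }
            // 3) Nothing local at all: steal staged work from a neighbor.
            for (scheduler* n : neighbors)
                if (auto t = n->pop_staged()) { (*t)(); return; }
        }
    };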
<heller_> there needs to be additional state to also support timed suspension and to take into account that we can have two reasons for `suspended -> pending`: one is the timeout, the other is a resume due to a condition.
<heller_> this can be easily handled by adding another state: https://i.imgur.com/vLLjtS2.png
<heller_> a task goes into the depleted state if there is another task that might still hold a handle (either the timer or the 'conditional' resumer). But this reference counting is again just a thread-local operation without any additional synchronization required
<heller_> simbergm: https://i.imgur.com/tu8YPzY.png <-- annotated state diagram
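A hypothetical sketch of that extra state (again, illustrative names only, not taken from the linked diagram or HPX itself): a task whose other handle (the timer or the conditional resumer) may still exist sits in 'depleted', and since both releases happen on the owning core the count needs no atomics.

    #include <cstdint>

    enum class thread_state : std::uint8_t
    {
        staged, pending, active, suspended,
        depleted,    // logically done, but another handle may still exist
        terminated
    };

    struct suspended_task
    {
        thread_state state = thread_state::suspended;
        int handles = 2;   // e.g. the timer and the conditional resumer

        // Runs only on the owning core, never concurrently, so a plain
        // integer decrement is enough; no atomic reference counting.
        void release_handle()
        {
            if (--handles == 0)
                state = thread_state::terminated;   // safe to recycle now
            else
                state = thread_state::depleted;
        }
    };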
<heller_> that's essentially what I have so far ... just thought experiments...
<heller_> but I think it's a great simplification, and we get a nice chance to concentrate on optimizing the different parts
<heller_> so, for example, optimizing the synchronization strategy on the staged queue
<heller_> or the local stack reuse
<heller_> this should work out though, as we really minimize the possible contention points
<heller_> accompanied with something like this: https://gist.github.com/sithhell/84e0a5e941c41ac3c9be703781aa0a55
<simbergm> heller_: I'm mostly following, but you have to go easy on me
<simbergm> I'm thinking what's the best next step, to make sure this doesn't stay a thought experiment
<simbergm> only stealing staged tasks is one thing we can do already now fairly easily
<simbergm> this whole restructuring of the task state sounds like a bigger thing, but I'm not sure
<simbergm> I like the current lazy thread init/remove wait_or_add_new changes even just because they simplify things
<simbergm> but of course we're going to take some wrong turns along the way
<heller_> ok
<heller_> it's not too much of a deviation from what we have right now
<heller_> it mostly affects the scheduler internals only
<simbergm> right, you're a better judge right now of how much work this could be
<simbergm> all I can say is that it sounds good
<simbergm> one thing that we should think about with the split between staged and pending is fairness
<simbergm> a running/pending thread that is only yielding will keep staged threads from running until it is ready
<simbergm> although semantically I guess staged and pending threads should be equal
<heller_> (with respect to it not staying a thought experiment, I fully agree)
<heller_> the difference in semantics is the required synchronization
<heller_> fairness is a good point
<heller_> it's a trade-off between contention (the synchronization required for the staged queues) and not starving the system
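One possible way to handle that trade-off, sketched under the assumption of the scheduler outlined above (hypothetical member functions, illustrative tuning value): drain at most a bounded burst of local pending tasks before paying for the synchronized staged queue, so yielding tasks cannot starve newly staged work.

    #include <cstddef>

    constexpr std::size_t pending_burst = 16;   // illustrative tuning knob

    // Scheduler is any type exposing the three operations used below;
    // left as a template so the sketch stays self-contained.
    template <typename Scheduler>
    void run_with_fairness(Scheduler& sched)
    {
        std::size_t ran = 0;
        while (sched.has_pending() && ran < pending_burst)
        {
            sched.run_pending_one();   // cheap, core-local, no locks
            ++ran;
        }
        sched.refill_from_staged();    // synchronized step, at most once here
    }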
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/b5299bb126ec3f4e2054552b66c24f0aece660cf
<ste||ar-github> hpx/gh-pages b5299bb StellarBot: Updating Sphinx docs
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> in any case ... the review for allscale is next week on friday, then i'll work almost full time on this
<simbergm> ok
<simbergm> good luck for that!
<heller_> simbergm: what I always envision is to run a similar scheduling loop on GPUs ;)
<simbergm> we'll have the hpx call on the 27th as well, that'll be a good opportunity to discuss
<heller_> yes
<simbergm> some day :)
<heller_> with the exception that we currently can't suspend/resume tasks on GPUs, but that shouldn't be a big deal for the algorithms to run on GPUs anyway
<simbergm> I don't know if you were informed about this, but I did a visit to sandia
<heller_> yes, I know
<simbergm> ok
<heller_> did you get any results so far?
<simbergm> it's a bit scruffy, but it workds
<simbergm> works
<heller_> performance wise?
<simbergm> it's slow, but working on setting up some more apps/benchmarks
<heller_> ok
<heller_> kokkos should benefit greatly from those things
<heller_> as its primary use case is data parallelism
<simbergm> yes and no, performance wise they can do much better because it's just fork-join
<simbergm> but they're interested in looking at what we have and experimenting with something like futures in their api
<heller_> right
<heller_> there's nothing preventing us from getting better at fork-join as well ...
<heller_> and I think this is the right step in that direction
<heller_> in the fork-join model, having futures is total overkill, for example
<heller_> unless you do an asynchronous fork-join
<heller_> simbergm: the problem there, however, is that our tasking already comes with so much overhead that we'll always lose against this simple data-parallel fork-join model
<heller_> at least on fine granularities
<jbjnr__> heller_: simbergm I've just seen all this writing but cannot read it because I have to go to the other building. Can you create an issue with these ideas in it so we can read them properly, comment, and put together a plan between us?
<jbjnr__> (I mean the scheduler clean up stuff)
<heller_> jbjnr__: I will very soon (promise)
<simbergm> probably not a bad idea
<heller_> yeah
<heller_> I am planning on doing a series of blog posts on that topic...
<simbergm> ooh, nice
<simbergm> which reminds me, I said I would do some... :/
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] msimberg opened pull request #3544: Post 1.2.0 version bumps (master...version-bump) https://github.com/STEllAR-GROUP/hpx/pull/3544
ste||ar-github has left #ste||ar [#ste||ar]
nanashi55 has quit [Ping timeout: 252 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 268 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 250 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 244 seconds]
david_pfander has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 272 seconds]
nanashi55 has joined #ste||ar
nanashi55 has quit [Ping timeout: 244 seconds]
nanashi55 has joined #ste||ar
david_pfander has joined #ste||ar
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi55 has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> see: https://isocpp.org/
<hkaiser> thanks again simbergm
<hkaiser> also, please upvote on reddit (https://www.reddit.com/r/cpp/)
<simbergm> whew, I posted one as well but it's gone again now :)
<simbergm> (on reddit that is)
<simbergm> thanks hkaiser
<hkaiser> simbergm: ohh sorry
<simbergm> upvoted
<simbergm> no problem
<hkaiser> didn't know you already did this
<simbergm> I was just being slow, you reminded me
<simbergm> I registered after I saw your isocpp post
<simbergm> next time I'll really post the link to reddit, now I already have an account ;)
<hkaiser> heh
<heller_> hkaiser: hey! How is the demo going?
<heller_> (and the rest of course)
parsa has joined #ste||ar
<hkaiser> heller_: running
<hkaiser> heller_: it's not really self-explanatory, though
nikunj has joined #ste||ar
david_pfander has quit [Ping timeout: 245 seconds]
hkaiser has quit [Quit: bye]
<zao> simbergm: I wonder about the reddit announcement, is it technically possible to have the first blurb and/or the highlights inline in the post, to maybe foster a bit of discussion?
<zao> Oh, wait, it was hkaiser that posted it :)
<simbergm> zao: that's a good point, but I think not officially
<simbergm> they have the option of submitting a "post", "image or video" or a "link"
<zao> (I of course don't know the capabilities of reddit, nor the rules of the subreddit ^^)
<simbergm> and with the link option there's no place to put a longer description ;?
<zao> I see.
<simbergm> hmm, ;? was not what I was going for but maybe it fits
<zao> Evolutionary emoticonology.
<simbergm> I guess we could put a link into a post, but I don't know if that's the right way to do it (I don't really know the etiquette on reddit either)
<zao> Ditto, just idly musing.
jaafar has quit [Quit: Konversation terminated!]
preejackie has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
preejackie has left #ste||ar ["WeeChat 1.9.1"]
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
preejackie has joined #ste||ar
preejackie has quit [Read error: Connection reset by peer]
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has joined #ste||ar
parsa has joined #ste||ar
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi55 has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller_> simbergm: btw, the deadline for the scheduler things to get into master is january ;)
<heller_> as I have to give a talk on it mid february ;)
<simbergm> I like that :P
<simbergm> what is it for? it's not your defense, is it...? (I thought you might have done that already)
<heller_> no, not my defense
<simbergm> ah, nice
<simbergm> did you already have your defense?
<heller_> no
<heller_> the thesis got delivered to the reviewers this week
<simbergm> ooh, so you're almost free now
<heller_> yes
<heller_> static linking is broken...
jbjnr_ has joined #ste||ar
jbjnr__ has quit [Ping timeout: 268 seconds]
jbjnr_ has quit [Read error: Connection reset by peer]
parsa[w] has quit [Read error: Connection reset by peer]
parsa[w] has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]