hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
nikunj has quit [Quit: Bye]
jbjnr_ has joined #ste||ar
jbjnr_ has quit [Client Quit]
Coldblackice has joined #ste||ar
Coldblackice_ has quit [Ping timeout: 265 seconds]
rori has joined #ste||ar
<heller> simbergm: yt?
<simbergm> heller: yep
<heller> simbergm: good to merge #4137?
<heller> simbergm: the exception thingy has been fixed. tests look good
<simbergm> heller: yep, looks good to go
<heller> great
Coldblackice has quit [Read error: Connection reset by peer]
Coldblackice has joined #ste||ar
<heller> simbergm: ahh ... sorry, this is just pulling ;)
<heller> thought it would push...
<simbergm> :P
<heller> simbergm: is there a problem with the daint builders?
<simbergm> heller: not as far as I know, why?
<heller> simbergm: doesn't seem to run
<simbergm> oh they'll run, it just takes some time
<simbergm> too many jobs if I were to trigger them immediately
<heller> ah, ok
<simbergm> (too many files actually most of the time, I run out of quota)
<heller> ahh, now I see them run
<heller> one thing that might be interesting is azure pipelines
<hkaiser> heller, simbergm: is the meeting this morning still on?
<hkaiser> or did you agree to move it?
<simbergm> hkaiser: I don't think we agreed on anything, but moving it would probably be good
<simbergm> if we want heller to join we should move it
<hkaiser> k
<heller> I can't join today
<hkaiser> tomorrow? jbjnr?
<jbjnr> tomorrow heller is here at CSCS, so afternoon for us would be fine
<heller> hkaiser: BTW, I think I fixed the exception issue, could you have a look please?
<jbjnr> 4pm CSCS time would be ideal
<hkaiser> heller: #4137?
<hkaiser> simbergm: does 4pm work for you, tomorrow?
<heller> hkaiser: yes
<simbergm> hkaiser: yep
<jbjnr> 4pm it is then. thanks
<hkaiser> ok, let's do it tomorrow 9am/4pm, then
<heller> Great
<jbjnr> heller: check email too please
<hkaiser> I'll let aserio know
<simbergm> also rori ^
<hkaiser> yah, sorry - rori - can you make that?
<hkaiser> jbjnr: also any news on the meeting with Thomas at SC?
<heller> jbjnr: reply sent
<jbjnr> I started a doodle poll, but didn't finish it to get a good time. I'll get that done asap
<hkaiser> jbjnr: thanks
<jbjnr> hkaiser: latest work on scheduler https://pasteboard.co/ICepJsW.png
<zao> The overhead is off the charts :P
<jbjnr> results for other machines to follow
<hkaiser> jbjnr: nice!
<heller> jbjnr: that's pretty amazing!
<heller> Did you look at the tokio post from last night?
<jbjnr> the hierarchical launch is now down to 0.1us - (lowest on daint is 0.08) that's not bad at all.
<jbjnr> tokio post? who are you asking that question to?
<heller> To you
<jbjnr> no idea what you are referring to
<rori> yep
<jbjnr> tokyo the city?
<heller> That one
<heller> We should leverage that...
<jbjnr> only skimmed through the text but it looks very similar to what we do
<jbjnr> I will bookmark that for a thorough read.
<heller> Yes, they have some nice optimizations there
<hkaiser> heller: #4137 is fine (and already merged anyways)
<heller> Oh, ok ;)
<hkaiser> ahh no
<hkaiser> sorry
<hkaiser> heller: looked again
<hkaiser> so what's the difference now compared to before?
<hkaiser> I thought the TSS has to be cleaned up before the result value was set?
<hkaiser> otherwise it is back to what it was before, I think
<hkaiser> heller: ^^
<heller> No, before we reset the self pointer and the coroutine state
<heller> I'll double check
<hkaiser> ahh, makes sense
<hkaiser> ok, all is well, then
<jbjnr> what's more important than the raw speed numbers of the scheduler is our ability to layer affinity on top of it.
<jbjnr> anyone - is there a way to clear a std::function and make it empty after you're done with it?
<zao> Swap with a temporary?
<jbjnr> yuck!
<zao> (or assign one, I guess)
<jbjnr> I am currently assigning [](){} but I'd rather clear it and make it properly uninitialized
<simbergm> can you assign {}?
<zao> godbolt says "yes" :)
<jbjnr> ooh. what does it mean though?
<jbjnr> does that make it uninitialized again?
<jbjnr> zao: please send godbolt snippet
<zao> Same as assigning a default-constructed one, I reckon.
<hkaiser> simbergm: see my comment on #4138, pls
<simbergm> hkaiser: yep, replying to it at the moment
<simbergm> makes sense...
<simbergm> two other options: serialization module holds just the basics for serialization and serialization_impls (needs a better name, but...) would hold the actual implementations
<simbergm> or have x_serialization for each module x that has something that needs to be serialized
<simbergm> we'll end up with lots more modules like that...
<zao> Seems to build with GCC, ICC and MSVC.
<hkaiser> simbergm: nod
<jbjnr> thanks simbergm zao , that's nice. It does the right thing with if(f) and marks it as unassigned. I'll use it
<simbergm> but since serialization is independent of many things now keeping it the way you have it now might be okay
<simbergm> hkaiser: ^
<hkaiser> jbjnr: use hpx::util::function which has a reset function
<simbergm> we can go with this for now and I'll deal with it later if it becomes a problem
<jbjnr> thanks hkaiser
<hkaiser> simbergm: what problems do you anticipate?
<simbergm> hkaiser: no problem, just an unnecessary dependency
<hkaiser> simbergm: we would have dependencies either way
<hkaiser> simbergm: but yah, having a separate x_serialization module would solve this - I'm on the fence here
<simbergm> well, anything that's local shouldn't need serialization
<hkaiser> simbergm: is 'local' a compile-time property?
<simbergm> but this is still much better than depending on all the rest of the distributed stuff (thank you!)
<simbergm> potentially
<heller> Even if you just have a local-only instance, you might still want to use serialization
<heller> I'm not a fan of xxx_serialization modules
<simbergm> for?
<hkaiser> simbergm: saving local state, checkpointing
<heller> I don't know, sending complex C++ data structures over the wire with, for example, MPI?
<heller> Or any other networking library
<hkaiser> I wanted to avoid having serialization depend on everything else
<heller> serialization is a core module after all
<simbergm> good points
<simbergm> to be clear, I'm happy with this as it is, I just thought we could avoid that dependency from a quick look
<hkaiser> simbergm: right - I thought about that but came up empty-handed
<hkaiser> except by cheating
<hkaiser> include the header explicitly and make the user add the dependency to serialization if needed
<simbergm> "the header" = which one?
<heller> That sounds scary
<hkaiser> memory/serialization/intrusive_ptr.hpp
<hkaiser> i.e. the code that actually depends on serialization
<simbergm> oh... yeah, that sounds like nasty cheating
<heller> Also, consider that the same would apply for the data structure module and functional
<hkaiser> yes, that's on my list of things to do
<hkaiser> but I wanted to have a decision on how to go ahead first
<jbjnr> (if this conversation was in slack, you could embed snippets, links, etc. <sigh>)
<heller> I don't like that at all, why not make serialization as central as datastructures and assert and friends
<hkaiser> jbjnr: come on
<jbjnr> because 99% of people won't use it on a node
<hkaiser> heller: right, you would have to add serialization as a dependency for anything that requires serialization
<jbjnr> an examples/serialization/mpi demo that encodes a set of vectors, sends it over the wire using MPI, then decodes it would be quite a good selling point
<heller> The alternative is to push that burden onto users, which isn't appealing either
<hkaiser> heller: right
<heller> And as said in the comment, it should be very lightweight
<hkaiser> right, it is
<heller> If it's not, then we need to fix that issue
<heller> As in, don't pay for what you don't use.
<heller> Where building that module should be negligible and including it should have only minimal impact on compile times
<rori> hey ! are the compression plugins still in use ?
<heller> I'm not aware, why do you ask?
<rori> to know whether I should spend time fixing the build ^^ but if you don't know, I will
<hkaiser> rori: I think we should keep'em
<hkaiser> simbergm, heller: so do we agree to leave things as proposed in the PR?
<hkaiser> jbjnr: such an example is trivial
<simbergm> hkaiser: yep, I'm happy with leaving it the way it is
<hkaiser> simbergm: ok, thanks
<rori> ?
<jbjnr> "such an example is trivial" - famous last words. I expect Boris Johnson used the same phrase when planning his brexit strategy.
<hkaiser> lol
<heller> hkaiser: ok
<hkaiser> (jbjnr: he never intended to pull through with the brexit anyways ;-) )
<hkaiser> jbjnr: but seriously: 2 lines serialization and 2 for deserialization plus serialization support for your types
<jbjnr> he did and I'm glad it's only 2 lines.
<hkaiser> ok
<jbjnr> (he doesn't care at all about the country, only making himself look like superman and saving the world)
hkaiser has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
hkaiser has joined #ste||ar
bita has joined #ste||ar
bita has quit [Quit: Leaving]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
aserio1 is now known as aserio
aserio has quit [Ping timeout: 246 seconds]
rori has quit [Quit: WeeChat 1.9.1]
jbjnr_ has joined #ste||ar
<jaafar> hkaiser: do you have some time to restart our conversation re: launch policies?
aserio has joined #ste||ar
<jaafar> I'll just stick my questions here and we can operate... asynchronously... haha
<jaafar> "launch::sync should be synchronous, except for remote operations, where its equivalent to async().get()"
<jaafar> I see scan_partitioner.hpp using it this way:
<jbjnr_> jaafar: what is your question?
<jaafar> finalitems.push_back(dataflow(hpx::launch::sync, ...))
<jaafar> which, it seems to me, is unlikely to block
<jaafar> or if it does the algorithm works much differently than I thought :)
<jbjnr_> launch::sync is used for a case like do_this.then(do_that) - normally, do that is spawned as a new task and gets queued like all other tasks
<jbjnr_> but with do_this.then(launch::sync, do_that), then do_that is called directly on termination of do_this
<jaafar> jbjnr_: can you explain how it works in the context of "dataflow"?
<jbjnr_> it's like a future that's not really a future
<jbjnr_> dataflow, the same
<jbjnr_> dataflow is really a version of when_all(this1, this2).then(do_that)
<jaafar> so I'm correct to say that dataflow(hpx::launch::sync, ...) is *itself* non-blocking
<jbjnr_> so if you use sync on dataflow, then either this1 or this2 will call do_that
<jbjnr_> (depending on which finishes second in this example)
<jaafar> they will call, and not do it via supplying the promise value?
<jaafar> so they are both in the same thread?
<jbjnr_> yes, it just chains two tasks into one, but it still returns a future to the end of the second one, so it is nonblocking
<jaafar> OK, I think I understand. dataflow() is itself non-blocking; the launch policy supplied simply tells what to do when the inputs are available
<jaafar> not what happens right now
<jbjnr_> this1 runs as one task in a thread, this2 runs as a task in a thread, 'that' runs in the same thread as the last one to finish
<jaafar> OK great
<jbjnr_> hold on ...
<jaafar> and "async" would mark the task ready to go, but not actually switch to it
<jbjnr_> correct
<jaafar> I should say continuation
<jaafar> What does "fork" do?
<jbjnr_> fork stops the current task right now, then switches directly to the new one
<jaafar> how is that different from "sync"?
<jbjnr_> then resumes the old one afterwards
<jbjnr_> sync runs one task when another one ends (but on the same thread)
<jbjnr_> fork doesn't wait till one task ends, it interrupts the current task and switches to the new one
<jaafar> when does that happen? at the call to dataflow(hpx::launch::fork, ...) or when the inputs are available?
<jbjnr_> fork is probably meaningless in the context of a continuation
<jbjnr_> dataflow(fork, ...) is probably meaningless!
<jaafar> OK
<jbjnr_> but async(fork, stuff)
<jbjnr_> would be like
<jbjnr_> this_thread.suspend, stuff.run_now
<jaafar> seems like you could just call stuff()
<jbjnr_> then resume this thread when stuff finishes
<jaafar> why not just do the work directly?
<jbjnr_> although technically this thread would go onto the queue so might not be resumed right away
<jaafar> ah so here we are using "thread" in a special HPX way right?
<jaafar> this is not std/boost::thread
<zao> Sounds like a useful concept, but I'll save you my bikeshed on the name :D
<jbjnr_> usually, you would do the work directly, but there might be a case where you've broken your application into "tasks" and might want to drop everything and fork
<jbjnr_> I've never used it
<jaafar> OK!
<jaafar> Last question
<jbjnr_> just calling the function directly would make more sense as you point out
<jaafar> I think async policy can accept a priority argument
<jbjnr_> yes, via an executor param
<jaafar> How is that used?
<jbjnr_> high, normal, low
<jbjnr_> the scheduler maintains 3 queues
<jbjnr_> and high Priority (HP) gets taken before normal or low
<zao> Would there be any benefits in stack height?
<jbjnr_> to use it, grep for thread_priority_critical
<jbjnr_> in the code and look at an example
<jbjnr_> zao: benefits where? when forking or sync?
<jaafar> I found I could do finalitems.push_back(dataflow(hpx::launch::async(threads::thread_priority::thread_priority_low), ...)
<zao> Forking.
<jaafar> and get different results
<jbjnr_> yes, fork creates a new stack frame, but calling a function uses the existing one - good thinking
<jaafar> a new stack frame or a new stack?
aserio has quit [Ping timeout: 246 seconds]
<jbjnr_> different results?
<jaafar> performance results
<jbjnr_> new stack means new stack frame in this context
<jbjnr_> new memory with reassigned stack pointer to point to it
<jaafar> I feel like calling a function generally creates a new stack frame :)
aserio has joined #ste||ar
<jbjnr_> true
<jbjnr_> I confuse easily
<jaafar> so the comment that "calling a function uses the existing one" confused me
Coldblackice_ has joined #ste||ar
<jaafar> so does fork create a new stack?
<jbjnr_> yes
<jaafar> OK gotcha
<jaafar> I guess I don't need to know about fork then
<jbjnr_> no
<jaafar> thanks for the explanation
<jbjnr_> it's useless really
<jaafar> so I could use thread priorities to manipulate the order my async tasks were scheduled in?
<jbjnr_> someone must have a reason for it as it was added to the standard proposal
<jaafar> I was looking for that
<jbjnr_> priority is your friend - we use a high_priority executor for all communications with mpi for example, and also for tasks that generate many children and must be done first when they go into queues
<jbjnr_> otherwise queues drain, then a parent task goes in and generates children, but the queues are temporarily empty
Coldblackice has quit [Ping timeout: 246 seconds]
<jaafar> This is very helpful. I think "sync" will be very useful to me. Some of the work needs to be done in the same thread if possible.
<jaafar> and the priorities too
<jbjnr_> sync is useful when you have a short task that must be run after another finishes, I use it to trigger other stuff like dataflow(sync, blah, blah, trigger_something)
<jbjnr_> you know that the trigger will happen as soon as the tasks complete and it won't be created as a new 'trigger task' that goes to the back of the queue and waits for ages before happening
<jbjnr_> but don't chain too many sync calls together, otherwise as zao points out your stack will be used up (I think) - sync doesn't create a new stack frame AFAIK
<jbjnr_> and putting many sync calls one after the other prevents any work stealing from happening
<jbjnr_> as you have just created very long functions really!
<jbjnr_> gtg
<jbjnr_> bbiab
hkaiser has quit [Ping timeout: 245 seconds]
<jaafar> Yeah my big thing here is keeping the cache warm
<jaafar> so I'd like the second phase of an algorithm to start on the same thread as soon as possible
<simbergm> jaafar: btw: https://en.wikipedia.org/wiki/Work_stealing#Child_stealing_vs._continuation_stealing
<simbergm> async is child stealing, fork is parent/continuation stealing
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
<jaafar> simbergm: I understood from jbjnr_'s explanation that "sync" was continuation stealing
aserio1 has quit [Ping timeout: 265 seconds]
<simbergm> jaafar: async(sync, f) is equivalent to a direct function call
<simbergm> fork is continuation stealing because f is executed immediately on this thread but the continuation (what comes after async(fork, f)) can be stolen
<jaafar> simbergm: I am trying to understand this in the context of dataflow(sync, ...) vs dataflow(fork, ...)
<jaafar> the way jbjnr_ described it sounds like dataflow(sync, ...) is "continuation stealing" as the Wikipedia article described
<jaafar> in the sense that the remaining arguments to dataflow() are executed immediately by whichever thread supplied the last required data
<jaafar> without an intervening reschedule etc.
<jaafar> do I have that right?
aserio has joined #ste||ar
<jaafar> (I do understand that async(sync, ...) is blocking, just trying to understand about dataflow)
hkaiser has joined #ste||ar
jbjnr_ has quit [Ping timeout: 245 seconds]
aserio has quit [Quit: aserio]
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
K-ballo has quit [Client Quit]