hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
nikunj has quit [Quit: Bye]
jbjnr_ has joined #ste||ar
jbjnr_ has quit [Client Quit]
Coldblackice has joined #ste||ar
Coldblackice_ has quit [Ping timeout: 265 seconds]
rori has joined #ste||ar
<heller> simbergm: yt?
<simbergm> heller: yep
<heller> simbergm: good to merge #4137?
<heller> simbergm: the exception thingy has been fixed. tests look good
<simbergm> heller: yep, looks good to go
<heller> great
Coldblackice has quit [Read error: Connection reset by peer]
Coldblackice has joined #ste||ar
<heller> simbergm: ahh ... sorry, this is just pulling ;)
<heller> thought it would push...
<simbergm> :P
<heller> simbergm: is there a problem with the daint builders?
<simbergm> heller: not as far as I know, why?
<heller> simbergm: doesn't seem to run
<simbergm> oh they'll run, it just takes some time
<simbergm> too many jobs if I were to trigger them immediately
<heller> ah, ok
<simbergm> (too many files actually most of the time, I run out of quota)
<heller> ahh, now I see them run
<heller> one thing that might be interesting is azure pipelines
<hkaiser> heller, simbergm: is the meeting this morning still on?
<hkaiser> or did you agree to move it?
<simbergm> hkaiser: I don't think we agreed on anything, but moving it would probably be good
<simbergm> if we want heller to join we should move it
<hkaiser> k
<heller> I can't join today
<hkaiser> tomorrow? jbjnr?
<jbjnr> tomorrow heller is here at CSCS, so afternoon for us would be fine
<heller> hkaiser: BTW, I think I fixed the exception issue, could you have a look please?
<jbjnr> 4pm CSCS time would be ideal
<hkaiser> heller: #4137?
<hkaiser> simbergm: does 4pm work for you, tomorrow?
<heller> hkaiser: yes
<simbergm> hkaiser: yep
<jbjnr> 4pm it is then. thanks
<hkaiser> ok, let's do it tomorrow 9am/4pm, then
<heller> Great
<jbjnr> heller: check email too please
<hkaiser> I'll let aserio know
<simbergm> also rori ^
<hkaiser> yah, sorry - rori - can you make that?
<hkaiser> jbjnr: also any news on the meeting with Thomas at SC?
<heller> jbjnr: reply sent
<jbjnr> I started a doodle poll, but didn't finish it to get a good time. I'll get that done asap
<hkaiser> jbjnr: thanks
<jbjnr> hkaiser: latest work on scheduler https://pasteboard.co/ICepJsW.png
<zao> The overhead is off the charts :P
<jbjnr> results for other machines to follow
<hkaiser> jbjnr: nice!
<heller> jbjnr: that's pretty amazing!
<heller> Did you look at the tokio post from last night?
<jbjnr> the hierarchical launch is now down to 0.1us - (lowest on daint is 0.08) that's not bad at all.
<jbjnr> tokio post? who are you asking that question to?
<heller> To you
<jbjnr> no idea what you are referring to
<rori> yep
<jbjnr> tokyo the city?
<heller> That one
<heller> We should leverage that...
<jbjnr> only skimmed through the text but it looks very similar to what we do
<jbjnr> I will bookmark that for a thorough read.
<heller> Yes, they have some nice optimizations there
<hkaiser> heller: #4137 is fine (and already merged anyways)
<heller> Oh, ok ;)
<hkaiser> ahh no
<hkaiser> sorry
<hkaiser> heller: looked again
<hkaiser> so what's the difference now compared to before?
<hkaiser> I thought the TSS has to be cleaned up before the result value was set?
<hkaiser> otherwise it is back to what it was before, I think
<hkaiser> heller: ^^
<heller> No, before we reset the self pointer and the coroutine state
<heller> I'll double check
<hkaiser> ahh, makes sense
<hkaiser> ok, all is well, then
<jbjnr> what's more important than the raw speed numbers of the scheduler is our ability to layer affinity on top of it.
<jbjnr> anyone - is there a way to clear a std::function and make it empty after you're done with it?
<zao> Swap with a temporary?
<jbjnr> yuck!
<zao> (or assign one, I guess)
<jbjnr> I am currently assigning [](){} but I'd rather clear it and make it properly uninitialized
<simbergm> can you assign {}?
<zao> godbolt says "yes" :)
<jbjnr> ooh. what does it mean though?
<jbjnr> does that make it uninitialized again?
<jbjnr> zao: please send godbolt snippet
<zao> Same as assigning a default-constructed one, I reckon.
<hkaiser> simbergm: see my comment on #4138, pls
<simbergm> hkaiser: yep, replying to it at the moment
<simbergm> makes sense...
<simbergm> two other options: serialization module holds just the basics for serialization and serialization_impls (needs a better name, but...) would hold the actual implementations
<simbergm> or have x_serialization for each module x that has something that needs to be serialized
<simbergm> we'll end up with lots more modules like that...
<zao> Seems to build with GCC, ICC and MSVC.
<hkaiser> simbergm: nod
<jbjnr> thanks simbergm zao , that's nice. It does the right thing with if(f) and marks it as unassigned. I'll use it
<simbergm> but since serialization is independent of many things now keeping it the way you have it now might be okay
<simbergm> hkaiser: ^
<hkaiser> jbjnr: use hpx::util::function which has a reset function
<simbergm> we can go with this for now and I'll deal with it later if it becomes a problem
<jbjnr> thanks hkaiser
<hkaiser> simbergm: what problems do you anticipate?
<simbergm> hkaiser: no problem, just an unnecessary dependency
<hkaiser> simbergm: we would have dependencies either way
<hkaiser> simbergm: but yah, having a separate x_serialization module would solve this - I'm on the fence here
<simbergm> well, anything that's local shouldn't need serialization
<hkaiser> simbergm: is 'local' a compile-time property?
<simbergm> but this is still much better than depending on all the rest of the distributed stuff (thank you!)
<simbergm> potentially
<heller> Even if you just have a local-only instance, you might still want to use serialization
<heller> I'm not a fan of xxx_serialization modules
<simbergm> for?
<hkaiser> simbergm: saving local state, checkpointing
<heller> I don't know, sending complex C++ data structures over the wire with, for example, MPI?
<heller> Or any other networking library
<hkaiser> I wanted to avoid having serialization depend on everything else
<heller> serialization is a core module after all
<simbergm> good points
<simbergm> to be clear, I'm happy with this as it is, I just thought we could avoid that dependency from a quick look
<hkaiser> simbergm: right - I thought about that but came up empty-handed
<hkaiser> except by cheating
<hkaiser> include the header explicitly and make the user add the dependency to serialization if needed
<simbergm> "the header" = which one?
<heller> That sounds scary
<hkaiser> memory/serialization/intrusive_ptr.hpp
<hkaiser> i.e. the code that actually depends on serialization
<simbergm> oh... yeah, that sounds like nasty cheating
<heller> Also, consider that the same would apply for the data structure module and functional
<hkaiser> yes, that's on my list of things to do
<hkaiser> but I wanted to have a decision on how to go ahead first
<jbjnr> (if this conversation was in slack, you could embed snippets, links, etc. <sigh>)
<heller> I don't like that at all, why not make serialization as central as datastructures and assert and friends
<hkaiser> jbjnr: come on
<jbjnr> because 99% of people won't use it on a node
<hkaiser> heller: right, you would have to add serialization as a dependency for anything that requires serialization
<jbjnr> an examples/serialization/mpi demo that encodes a set of vectors, sends it over the wire using MPI, then decodes it would be quite a good selling point
<heller> The alternative is to push that burden onto users, which isn't appealing either
<hkaiser> heller: right
<heller> And as said in the comment, it should be very lightweight
<hkaiser> right, it is
<heller> If it's not, then we need to fix that issue
<heller> As in, don't pay for what you don't use.
<heller> Where building that module should be negligible and including it should have only minimal impact on compile times
<rori> hey ! are the compression plugins still in use ?
<heller> I'm not aware, why do you ask?
<rori> to know whether I should spend time fixing the build ^^ but if you don't know, I will
<hkaiser> rori: I think we should keep'em
<hkaiser> simbergm, heller: so do we agree to leave things as proposed in the PR?
<hkaiser> jbjnr: such an example is trivial
<simbergm> hkaiser: yep, I'm happy with leaving it the way it is
<hkaiser> simbergm: ok, thanks
<rori> ?
<jbjnr> "such an example is trivial" - famous last words. I expect Boris Johnson used the same phrase when planning his brexit strategy.
<hkaiser> lol
<heller> hkaiser: ok
<hkaiser> (jbjnr: he never intended to pull through with the brexit anyways ;-) )
<hkaiser> jbjnr: but seriously: 2 lines serialization and 2 for deserialization plus serialization support for your types
<jbjnr> he did and I'm glad it's only 2 lines.
<hkaiser> ok
<jbjnr> (he doesn't care at all about the country, only making himself look like superman and saving the world)
hkaiser has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
hkaiser has joined #ste||ar
bita has joined #ste||ar
bita has quit [Quit: Leaving]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
aserio1 is now known as aserio
aserio has quit [Ping timeout: 246 seconds]
rori has quit [Quit: WeeChat 1.9.1]
jbjnr_ has joined #ste||ar
<jaafar> hkaiser: do you have some time to restart our conversation re: launch policies?
aserio has joined #ste||ar
<jaafar> I'll just stick my questions here and we can operate... asynchronously... haha
<jaafar> "launch::sync should be synchronous, except for remote operations, where its equivalent to async().get()"
<jaafar> I see scan_partitioner.hpp using it this way:
<jbjnr_> jaafar: what is your question?
<jaafar> finalitems.push_back(dataflow(hpx::launch::sync, ...))
<jaafar> which, it seems to me, is unlikely to block
<jaafar> or if it does the algorithm works much differently than I thought :)
<jbjnr_> launch::sync is used for a case like do_this.then(do_that) - normally, do that is spawned as a new task and gets queued like all other tasks
<jbjnr_> but with do_this.then(launch::sync, do_that), then do_that is called directly on termination of do_this
<jaafar> jbjnr_: can you explain how it works in the context of "dataflow"?
<jbjnr_> it's like a future that's not really a future
<jbjnr_> dataflow, the same
<jbjnr_> dataflow is really a version of when_all(this1, this2).then(do_that)
<jaafar> so I'm correct to say that dataflow(hpx::launch::sync, ...) is *itself* non-blocking
<jbjnr_> so if you use sync on dataflow, then either this1 or this2 will call do_that
<jbjnr_> (depending on which finishes second in this example)
<jaafar> they will call, and not do it via supplying the promise value?
<jaafar> so they are both in the same thread?
<jbjnr_> yes, it just chains two tasks into one, but it still returns a future to the end of the second one, so it is nonblocking
<jaafar> OK, I think I understand. dataflow() is itself non-blocking; the launch policy supplied simply tells what to do when the inputs are available
<jaafar> not what happens right now
<jbjnr_> this1 runs as one task in a thread, this2 runs as a task in a thread, 'that' runs in the same thread as the last one to finish
<jaafar> OK great
<jbjnr_> hold on ...
<jaafar> and "async" would mark the task ready to go, but not actually switch to it
<jbjnr_> correct
<jaafar> I should say continuation
<jaafar> What does "fork" do?
<jbjnr_> fork stops the current task right now, then switches directly to the new one
<jaafar> how is that different from "sync"?
<jbjnr_> then resumes the old one afterwards
<jbjnr_> sync runs one task when another one ends (but on the same thread)
<jbjnr_> fork doesn't wait till one task ends, it interrupts the current task and switches to the new one
<jaafar> when does that happen? at the call to dataflow(hpx::launch::fork, ...) or when the inputs are available?
<jbjnr_> fork is probably meaningless in the context of a continuation
<jbjnr_> dataflow(fork, ...) is probably meaningless!
<jaafar> OK
<jbjnr_> but async(fork, stuff)
<jbjnr_> would be like
<jbjnr_> this_thread.suspend, stuff.run_now
<jaafar> seems like you could just call stuff()
<jbjnr_> then resume this thread when stuff finishes
<jaafar> why not just do the work directly?
<jbjnr_> although technically this thread would go onto the queue so might not be resumed right away
<jaafar> ah so here we are using "thread" in a special HPX way right?
<jaafar> this is not std/boost::thread
<zao> Sounds like a useful concept, but I'll save you my bikeshed on the name :D
<jbjnr_> usually, you would do the work directly, but there might be a case where you've broken your application into "tasks" and might want to drop everything and fork
<jbjnr_> I've never used it
<jaafar> OK!
<jaafar> Last question
<jbjnr_> just calling the function directly would make more sense as you point out
<jaafar> I think async policy can accept a priority argument
<jbjnr_> yes, via an executor param
<jaafar> How is that used?
<jbjnr_> high, normal, low
<jbjnr_> the scheduler maintains 3 queues
<jbjnr_> and high Priority (HP) gets taken before normal or low
<zao> Would there be any benefits in stack height?
<jbjnr_> to use it, grep for thread_priority_critical
<jbjnr_> in the code and look at an example
<jbjnr_> zao: benefits where? when forking or sync?
<jaafar> I found I could do finalitems.push_back(dataflow(hpx::launch::async(threads::thread_priority::thread_priority_low), ...)
<zao> Forking.
<jaafar> and get different results
<jbjnr_> yes, fork creates a new stack frame, but calling a function uses the existing one - good thinking
<jaafar> a new stack frame or a new stack?
aserio has quit [Ping timeout: 246 seconds]
<jbjnr_> different results?
<jaafar> performance results
<jbjnr_> new stack means new stack frame in this context
<jbjnr_> new memory with reassigned stack pointer to point to it
<jaafar> I feel like calling a function generally creates a new stack frame :)
aserio has joined #ste||ar
<jbjnr_> true
<jbjnr_> I confuse easily
<jaafar> so the comment that "calling a function uses the existing one" confused me
Coldblackice_ has joined #ste||ar
<jaafar> so does fork create a new stack?
<jbjnr_> yes
<jaafar> OK gotcha
<jaafar> I guess I don't need to know about fork then
<jbjnr_> no
<jaafar> thanks for the explanation
<jbjnr_> it's useless really
<jaafar> so I could use thread priorities to manipulate the order my async tasks were scheduled in?
<jbjnr_> someone must have a reason for it as it was added to the standard proposal
<jaafar> I was looking for that
<jbjnr_> priority is your friend - we use a high_priority executor for all communications with mpi for example, and also for tasks that generate many children and must be done first when they go into queues
<jbjnr_> otherwise queues drain, then a parent task goes in and generates children, but the queues are temporarily empty
Coldblackice has quit [Ping timeout: 246 seconds]
<jaafar> This is very helpful. I think "sync" will be very useful to me. Some of the work needs to be done in the same thread if possible.
<jaafar> and the priorities too
<jbjnr_> sync is useful when you have a short task that must be run after another finishes, I use it to trigger other stuff like dataflow(sync, blah, blah, trigger_something)
<jbjnr_> you know that the trigger will happen as soon as the tasks complete and it won't be created as a new 'trigger task' that goes to the back of the queue and waits for ages before happening
<jbjnr_> but don't chain too many sync calls together, otherwise as zao points out your stack will be used up (I think) - sync doesn't create a new stack frame AFAIK
<jbjnr_> and putting many sync calls one after the other prevents any work stealing from happening
<jbjnr_> as you have just created very long functions really!
<jbjnr_> gtg
<jbjnr_> bbiab
hkaiser has quit [Ping timeout: 245 seconds]
<jaafar> Yeah my big thing here is keeping the cache warm
<jaafar> so I'd like the second phase of an algorithm to start on the same thread as soon as possible
<simbergm> jaafar: btw: https://en.wikipedia.org/wiki/Work_stealing#Child_stealing_vs._continuation_stealing
<simbergm> async is child stealing, fork is parent/continuation stealing
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
<jaafar> simbergm: I understood from jbjnr_'s explanation that "sync" was continuation stealing
aserio1 has quit [Ping timeout: 265 seconds]
<simbergm> jaafar: async(sync, f) is equivalent to a direct function call
<simbergm> fork is continuation stealing because f is executed immediately on this thread but the continuation (what comes after async(fork, f)) can be stolen
<jaafar> simbergm: I am trying to understand this in the context of dataflow(sync, ...) vs dataflow(fork, ...)
<jaafar> the way jbjnr_ described it sounds like dataflow(sync, ...) is "continuation stealing" as the Wikipedia article described
<jaafar> in the sense that the remaining arguments to dataflow() are executed immediately by whichever thread supplied the last required data
<jaafar> without an intervening reschedule etc.
<jaafar> do I have that right?
aserio has joined #ste||ar
<jaafar> (I do understand that async(sync, ...) is blocking, just trying to understand about dataflow)
hkaiser has joined #ste||ar
jbjnr_ has quit [Ping timeout: 245 seconds]
aserio has quit [Quit: aserio]
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
K-ballo has quit [Client Quit]