hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
adi_ahj has joined #ste||ar
adi_ahj has quit [Client Quit]
adi_ahj has joined #ste||ar
adi_ahj has quit [Client Quit]
diehlpk has joined #ste||ar
<diehlpk> hkaiser, Can you take care of the HPX and performance counter section for the SC20 paper?
<hkaiser> diehlpk: sure
<diehlpk> I added the template and the astro guys will add the scientific story by Thursday
<diehlpk> So we can read them before we meet on Monday
<hkaiser> cool
<hkaiser> thanks for pushing this
<diehlpk> I will finish the book proposal by Friday so you can edit it next week
<diehlpk> hkaiser, is the release branch the candidate for HPX 1.4?
<diehlpk> I will run a fedora build before we release
<hkaiser> yes, it's the release branch
<diehlpk> Ok, let me start a fedora build
adi_ahj has joined #ste||ar
hkaiser has quit [Quit: bye]
<diehlpk> Once these two builds are green, I am fine to build HPX 1.4 on the Fedora build system
adi_ahj has quit [Quit: adi_ahj]
<diehlpk> jbjnr_, Do you think you can get libfabric working with hpx 1.4 or any version later by the end of next week?
<diehlpk> We intend to start the first runs for the SC20 paper the week after
<diehlpk> and will you be able to contribute to the SC20 paper?
diehlpk has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
<jbjnr_> diehlpk_work: I can try to get it working, but I have a lot on my plate at the moment so I won't make any promises
mdiers_ has joined #ste||ar
<mdiers_> Hello, short simple question: I need to bind an async task to a specific worker thread (without it switching between cores). Is there a short example?
<simbergm> mdiers_: hpx::threads::executors::default_executor exec(hpx::threads::thread_schedule_hint(thread_number)); hpx::async(exec, mytask);
<simbergm> that's the closest you can get to at the moment
<simbergm> with the default scheduler the task is not guaranteed to stay on the core (if it yields or suspends)
<simbergm> with `--hpx:queuing=static` or `--hpx:queuing=static-priority` it will stay, but no stealing will happen at all
<simbergm> jbjnr_ has a new scheduler that has some support for properly binding tasks to threads but it's still work in progress
<simbergm> would be good if you try it out though
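A minimal sketch of the pattern simbergm describes above, assuming the HPX 1.3/1.4-era names hpx::threads::executors::default_executor, hpx::threads::thread_schedule_hint, and hpx::get_worker_thread_num (header paths vary between versions, so the umbrella headers are used):
```cpp
// Sketch only: run a task with a worker-thread hint. With the default
// scheduler the hint is advisory, so the task may still be stolen.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

#include <iostream>

int main()
{
    // Ask the scheduler to place the task on worker thread 1.
    hpx::threads::executors::default_executor exec(
        hpx::threads::thread_schedule_hint(1));

    auto f = hpx::async(exec, []() {
        // Prints the worker thread the task actually ran on.
        std::cout << hpx::get_worker_thread_num() << std::endl;
    });
    f.get();

    return 0;
}
```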
rori has joined #ste||ar
<mdiers_> simbergm: thanks, the first approach with `default_executor exec(thread_schedule_hint(1u)); while (true) { async(exec, [](){ cout << get_worker_thread_num() << endl; }); }` results in different thread ids
nikunj97 has quit [Read error: Connection reset by peer]
<simbergm> mdiers_: yeah, especially if that's the only work you're doing the tasks are definitely going to be stolen
<simbergm> that's why it's a "hint"...
<mdiers_> simbergm: ;-) as far as I understand it now. Otherwise I could do it via user-defined thread pools, like in resource_partitioner/tests/unit/named_pool_executor.cpp?
Dr_Q has joined #ste||ar
<Dr_Q> Hi, I would like to know how I could join the Stellar Group?
<Dr_Q> What are the prerequisites?
<simbergm> mdiers_: if you need guarantees that tasks will stay on a core the scheduler itself has to support it (like jbjnr_'s scheduler; it's `--hpx:queuing=shared-priority` I think together with a hint `thread_priority_bound`)
<simbergm> an executor isn't enough to guarantee that behaviour, it can only pass things like hints on to the actual scheduler
<simbergm> if you only need one thread you can create a custom thread pool with a single thread which also makes sure the tasks stay on that core
<simbergm> tasks are not stolen across different thread pools
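For the dedicated single-thread pool simbergm mentions, a rough sketch loosely following resource_partitioner/tests/unit/named_pool_executor.cpp; the pool name "bound-pool" is made up, and the exact resource partitioner and pool_executor calls are assumptions that differ between HPX versions:
```cpp
// Sketch only: a custom pool with a single PU. Tasks submitted to its
// executor stay on that pool, since work is not stolen across pools.
#include <hpx/hpx_init.hpp>
#include <hpx/hpx.hpp>

#include <iostream>

int hpx_main(int argc, char* argv[])
{
    // Executor bound to the named pool created in main() below.
    hpx::threads::executors::pool_executor exec("bound-pool");

    auto f = hpx::async(exec, []() {
        std::cout << hpx::get_worker_thread_num() << std::endl;
    });
    f.get();

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Carve one processing unit out into its own pool before the runtime starts.
    hpx::resource::partitioner rp(argc, argv);
    rp.create_thread_pool("bound-pool");
    rp.add_resource(rp.numa_domains()[0].cores()[0].pus()[0], "bound-pool");

    return hpx::init(argc, argv);
}
```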
<simbergm> Dr_Q: there's no formal way of joining the stellar group, it's more just a collection of people working on hpx, phylanx, and related projects
<simbergm> (the stellar group is actually two things: one is a research group at LSU, the other is the informal collection of people mentioned above...)
<simbergm> the best way is to start contributing to e.g. hpx ;)
<simbergm> Dr_Q: anything particular you're interested in?
Dr_Q is now known as jmgomez
<jmgomez> I am working on low-latency software, but I also need to focus on high performance and parallel programming. So I just wanted to contribute my experience and learn from you all.
jmgomez has left #ste||ar [#ste||ar]
<jbjnr_> mdiers_: please paste your example into a gist and I will modify it to use the bound tasks prototype with the new scheduler
<simbergm> hkaiser: yt?
weilewei has quit [Remote host closed the connection]
<mdiers_> jbjnr_: Thanks, I'll just be a moment, I just have to do a few other things.
hkaiser has joined #ste||ar
adi_ahj has joined #ste||ar
adi_ahj has quit [Quit: adi_ahj]
adi_ahj has joined #ste||ar
<jbjnr_> hkaiser: yt?
<hkaiser> here
<jbjnr_> hkaiser: dataflow doesn't pick up annotated tasks https://gist.github.com/biddisco/a99ba3613962b6cc1858cedbdfd80ff9
<jbjnr_> or rather apex doesn't display them - when they are inside dataflow
nikunj has joined #ste||ar
<jbjnr_> it is working for async
<hkaiser> ok
<jbjnr_> but not dataflow.
<hkaiser> does async work with your executor?
<jbjnr_> I will have to go through the code again, but if you have any quick tips, please share
<jbjnr_> async is fine and .then is fine
<hkaiser> ok
<jbjnr_> actually, I'm not certain that .then is ok
<jbjnr_> I'll check
<hkaiser> jbjnr_: dataflow creates the thread here:
<jbjnr_> looks like .then is also broken
<jbjnr_> balls.
<hkaiser> .then() dispatches through executor::post as well
<jbjnr_> thanks. I' will investigate
<hkaiser> jbjnr_: I can have a look, if you want - would need a small self-contained example, though
nikunj has quit [Ping timeout: 265 seconds]
<diehlpk_work> simbergm, The Fedora builds for HPX 1.4 are fine
hkaiser has quit [Quit: bye]
<jbjnr_> hkaiser: it looks like dataflow wraps the callable in so many other callables that the annotation type is no longer seen when it gets to the post level
<K-ballo> "so many"
<jbjnr_> well, 15 levels of call stack, maybe only a couple of actual invokes
nikunj has joined #ste||ar
<K-ballo> I like that it sounds like a matter of quantity, as if N levels of wrapping were ok but N+1 is too many
<jbjnr_> unfortunately, it only takes one level of wrapping for the type of the inner function to be lost
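For context, a minimal sketch of the annotation pattern being debugged, assuming hpx::util::annotated_function; the real reproducer (including the custom executor, which this omits) is in the gist linked above:
```cpp
// Sketch only: the same kind of annotated callable launched via async and
// via dataflow. Per the discussion, APEX shows the name for the async case,
// while dataflow's extra layers of wrapping hide it from executor::post.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

#include <utility>

int main()
{
    auto work = hpx::util::annotated_function(
        [](int x) { return x + 1; }, "annotated_work");

    // Expected to show up in APEX as "annotated_work".
    hpx::future<int> f1 = hpx::async(work, 41);

    // dataflow hands the predecessor future itself to the continuation.
    auto cont = hpx::util::annotated_function(
        [](hpx::future<int> f) { return f.get() + 1; }, "annotated_cont");

    // Reportedly shows up unnamed: the annotation is lost in the wrapping.
    hpx::future<int> f2 = hpx::dataflow(cont, std::move(f1));

    (void) f2.get();
    return 0;
}
```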
<diehlpk_work> simbergm, I just checked and the failed builds on s390x are due to https://pagure.io/fedora-infrastructure/issue/8522
ritvik99 has joined #ste||ar
<simbergm> diehlpk_work: thanks, very good to hear!
ritvik99 has quit [Remote host closed the connection]
ritvik99 has joined #ste||ar
adi_ahj has quit [Quit: adi_ahj]
<mdiers_> jbjnr_: here is the small example: https://gist.github.com/m-diers/c557801344f5652e12a8850c51a23b21
hkaiser has joined #ste||ar
<diehlpk_work> nikunj, yt?
<nikunj> diehlpk_work, here
<nikunj> did you get the ssh access?
<diehlpk_work> Not yet
<nikunj> diehlpk_work, btw did you send them the email?
<diehlpk_work> Yes, I sent them the document and he told me he will forward it to Japan so they can prepare the testbed
<nikunj> great! so we should have the access soon. Did you tell Karame about it?
<diehlpk_work> Yes
ritvik99 has quit [Ping timeout: 265 seconds]
<nikunj> diehlpk_work, ok
<nikunj> meanwhile, do you have something that I should do?
<diehlpk_work> No, we have to wait for them
<nikunj> ok
nikunj has quit [Ping timeout: 268 seconds]
<heller> diehlpk_work: what's the idea for the SC paper?
<diehlpk_work> Follow up on parsa's SC workshop paper, do long-term runs, and show:
<diehlpk_work> 1) Show that AGAS is not expensive for large simulations
<diehlpk_work> 2) Show performance of libfabric by using hpx performance counters
<diehlpk_work> 3) APEX measurements
weilewei has joined #ste||ar
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
RostamLog has joined #ste||ar
<diehlpk_work> jbjnr_, Do you know the next deadline for the Piz Daint allocation proposals?
<diehlpk_work> simbergm, ?
<jbjnr_> every 6 months, so April 2020 I'd say.
RostamLog has joined #ste||ar
<diehlpk_work> The SC workshop paper is published
<heller> There's also a supermuc call
<heller> diehlpk_work: how are you planning to show that AGAS has no significant overhead? The important question here is: overhead compared to what?
<heller> diehlpk_work: and a negative point for not being open access ;)
<diehlpk_work> parsa, Can you send heller your sc19 workshop paper
rori has quit [Quit: bye]
<diehlpk_work> All my papers are available as preprints
nikunj has joined #ste||ar
<hkaiser> send it to me as well, I would like to add it to our publications page
<hkaiser> parsa: ^^
<hkaiser> heller: why isn't execution_agent::do_resume calling set_thread_state() instead of implementing it itself?
<heller> hkaiser: to avoid coroutine_self
adi_ahj has quit [Quit: adi_ahj]
<hkaiser> set_thread_state doesn't use coroutine_self
<heller> I thought it did...
<hkaiser> would you mind if I changed that?
<heller> Right, only the timed one
<heller> I'm impartial there. But IIRC the implementation I wrote is a bit shorter and clearer, at least that was my intention
adi_ahj has joined #ste||ar
<hkaiser> heller: yes
<hkaiser> the rescheduling if the thread is active is cleaner, I would adapt that
<hkaiser> the rest is essentially copy&paste
<heller> Ok, fair enough
<hkaiser> heller: also, I might have to make coroutines depend on the basic_execution module
<hkaiser> I hope this is ok
<hkaiser> I need access to the base agent from the coroutine
<heller> Hmm, that's not good
<hkaiser> right, I don't like that
<hkaiser> but if agent is the new self then we need that
<heller> If you need to access the base agent from the coroutine, why do you need to have the base agent depend on the coroutine then?
<heller> Why?
<hkaiser> no the other way around
<hkaiser> coroutine needs to depend on agent
<heller> Wouldn't the coroutine module need to depend on the agent?
<heller> So the other way around?
<heller> Which is as intended
adi_ahj has quit [Ping timeout: 265 seconds]
<hkaiser> yes
<hkaiser> ok
adi_ahj has joined #ste||ar
RostamLog has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
adi_ahj has quit [Quit: adi_ahj]
nikunj has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar