#ste||ar on 2017-11-02 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:00 parsa has quit [Quit: Zzzzzzzzzzzz]

00:17 EverYoung has quit [Ping timeout: 246 seconds]

00:19 EverYoung has joined #ste||ar

00:26 EverYoung has quit [Ping timeout: 246 seconds]

00:30 EverYoung has joined #ste||ar

00:30 EverYoung has quit [Remote host closed the connection]

00:31 EverYoung has joined #ste||ar

00:38 parsa has joined #ste||ar

00:44 parsa has quit [Quit: Zzzzzzzzzzzz]

00:52 parsa has joined #ste||ar

01:13 EverYoung has quit [Ping timeout: 252 seconds]

01:50 diehlpk has quit [Ping timeout: 240 seconds]

01:55 eschnett has quit [Quit: eschnett]

02:00 eschnett has joined #ste||ar

02:35 hkaiser has quit [Ping timeout: 248 seconds]

02:57 parsa has quit [Quit: Zzzzzzzzzzzz]

03:12 parsa has joined #ste||ar

03:12 parsa has quit [Client Quit]

03:40 EverYoung has joined #ste||ar

03:41 EverYoung has quit [Remote host closed the connection]

03:42 EverYoung has joined #ste||ar

03:48 parsa has joined #ste||ar

04:00 parsa has quit [Quit: Zzzzzzzzzzzz]

04:13 parsa has joined #ste||ar

04:41 gedaj has quit [Quit: leaving]

04:43 gedaj has joined #ste||ar

05:33 gedaj has quit [Quit: leaving]

05:33 gedaj has joined #ste||ar

05:39 gedaj has quit [Quit: leaving]

05:39 gedaj has joined #ste||ar

05:45 gedaj has quit [Quit: leaving]

05:46 gedaj has joined #ste||ar

05:50 gedaj has quit [Client Quit]

05:51 gedaj has joined #ste||ar

05:51 gedaj has quit [Client Quit]

05:52 gedaj has joined #ste||ar

05:59 parsa has quit [Quit: Zzzzzzzzzzzz]

07:03 EverYoung has quit [Ping timeout: 252 seconds]

07:35 jaafar has quit [Ping timeout: 248 seconds]

08:46 david_pfander has joined #ste||ar

09:07 <github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vFs6d

09:07 <github> hpx/gh-pages 397636a StellarBot: Updating docs

09:30 msimberg has quit [Ping timeout: 248 seconds]

09:45 msimberg has joined #ste||ar

09:49 <heller> msimberg: hey

09:49 <heller> msimberg: how is it going?

10:05 <jbjnr> group meeting - he'll be out at lunchtime :(

10:27 <msimberg> heller: I found the reason for the last (for now) lockup

10:28 <heller> msimberg: nice!

10:28 <msimberg> not sure what the best way to fix it but I have it somehow working at least

10:28 <heller> glad to hear you found the issues

10:28 <heller> ok, what's the last reason?

10:28 <msimberg> cleanup_threads wants the thread map to be empty

10:28 <msimberg> otherwise it returns false

10:28 <msimberg> so it was locking up when removing the first pu

10:29 <msimberg> because hpx_main goes on the first one

10:29 <heller> yeah...

10:29 <msimberg> and it stays in the thread map even if it gets stolen

10:29 <heller> yup, only destroyed threads get removed

10:29 <msimberg> so for now cleanup_threads returns true if terminated_items_count is 0

10:30 <heller> ok

10:30 <msimberg> I think that's nicer because it's a bit confusing that cleanup_threads cares about anything else but cleaning up threads

10:30 <heller> I agree

10:31 <msimberg> I think it still works because the termination waits until there are 1 + os threads left

10:31 <heller> yes

10:31 <heller> termination was one of the bigger issues with my patches back then

10:32 <msimberg> there are conflicting interests for when dynamically removing a pu and removing one when shutting down the whole runtime

10:32 <msimberg> but gtg now

10:32 <msimberg> I'll push my changes to a branch if you want to try it out

10:32 <heller> msimberg: yup, that's why i said earlier that we should keep it seperate

11:27 hkaiser has joined #ste||ar

11:27 msimberg has quit [Ping timeout: 248 seconds]

11:39 msimberg has joined #ste||ar

12:10 <heller> hkaiser: may I put your attention to the failing binbacking tests on buildbot ;)?

12:11 <hkaiser> will look once I find the time

12:12 <heller> seems to be related to the local_new changes

12:12 <hkaiser> uhh ohh

12:12 <hkaiser> will look

12:12 <hkaiser> can't be much

12:12 <heller> at least they failed after the merge

13:01 <github> [hpx] sithhell created ubsan (+1 new commit): https://git.io/vFGJo

13:01 <github> hpx/ubsan afa254d Thomas Heller: Fixing integer overflows

13:02 <heller> hkaiser: do you want me to dry run the HPX tutorial?

13:02 <hkaiser> heller: what for?

13:02 <heller> aka proof read it ;)

13:02 <hkaiser> ahh, sure

13:02 <heller> just wanted to offer help

13:03 <github> [hpx] sithhell opened pull request #2979: Fixing integer overflows (master...ubsan) https://git.io/vFGJQ

13:03 <heller> and I am really interested in the jupyter integration

13:05 <hkaiser> nod

13:06 <hkaiser> I don't think this will go well :/

13:08 <heller> what's the problem there?

13:11 <jbjnr> (lol)

13:12 <hkaiser> jbjnr: you misunderstand - the problem is not the technology

13:12 <heller> do you have the material ready yet?

13:12 <hkaiser> think so, yes

13:14 <jbjnr> heller: are you doing the tutorial at HLRS next year?

13:14 <heller> jbjnr: if either you or jbjnr (or both!) do it with me, yes!

13:14 <jbjnr> I got into trouble for our one this year as nothing worked, so if you/we do one at HLRS I will prepare new material.

13:15 <jbjnr> I have some ideas ...

13:15 <heller> ok

13:15 <heller> what was the reaction?

13:16 <heller> jbjnr: i'll only say "yes" if I don't have to do it alone

13:17 <jbjnr> I'll get clearance and get back to you

13:18 <heller> thanks

13:19 <diehlpk_work> heller, I can also support you

13:32 hkaiser has quit [Quit: bye]

13:36 aserio has joined #ste||ar

14:06 <wash[m]> aserio: we got kicked off

14:07 <aserio> wash[m]: yea I tried to fix audio

14:07 <aserio> didnt work

14:07 <wash[m]> Call in by phone

14:09 <heller> Operation Bell call going on right now or in an hour>

14:19 eschnett has quit [Quit: eschnett]

14:19 patg[w] has joined #ste||ar

14:45 eschnett has joined #ste||ar

14:53 aserio has quit [Ping timeout: 240 seconds]

15:20 hkaiser has joined #ste||ar

15:20 <hkaiser> jbjnr: here now

15:21 <heller> PSA: I updated the stellar-group/build_env:debian_clang to use clang-5.0.0 + lld in the hope for faster build times

15:22 <heller> I am just now testing it with latest HPX master

15:22 <heller> please report any problems this upgrade might entail

15:22 <zao> heller: I'm wondering a bit... would there be value in being able to ask a bot to run a subset of tests repeatedly for a commit?

15:22 <zao> Feels like I'm reinventing plain boring buildbot/cdash otherwise.

15:23 diehlpk has joined #ste||ar

15:23 <heller> well

15:23 <zao> Trying to figure out how to get some value out of the Ryzen box :P

15:23 <heller> buildbot is able to do just that

15:23 <heller> theoretically

15:24 <heller> zao: what you describe sounds like a cronjob

15:24 <heller> as of the value: Yes, I think there is a lot of value there to identify more race conditions etc.

15:24 <zao> Traditional builds are of the "one build, one full testrun", right?

15:25 <zao> I'm thinking of having a box where one can request soaking of some tests to build confidence.

15:25 <zao> And if it doesn't have any particular jobs, just track heads and run their suites.

15:26 <heller> that sounds incredible valuable to me

15:27 patg[w] has quit [Quit: Leaving]

15:27 <heller> I'd also like to have an easy answer to the question: "since which commit did this test fail?"

15:27 <zao> That's a good one.

15:29 <zao> Things I need to figure out there is the interface to request jobs, and what kind of queries one can make. Also how to run multiple HPXes on the same machine without them colliding port-wise.

15:29 <heller> easy as in: just tell me on the dashboard: - Which tests have been failing repeatedly (and since when) - Which tests fail only occasionally, maybe starting with commit XXX

15:29 <jbjnr> hkaiser: ok

15:29 <zao> I see.

15:29 <heller> multiple HPXes: only configuring the MPI parcelport should work

15:30 <zao> Would that exclude some TCP-only tests?

15:30 <heller> yes

15:30 <hkaiser> jbjnr: skype?

15:30 <heller> but there are no TCP-only tests

15:30 <zao> Ah.

15:30 <heller> everything that runs with TCP is also run with MPI

15:30 <zao> Could've sworn there were ones with "tcp" in the name :)

15:30 <zao> I guess that I might be able to shove the run into a container, and not have to care.

15:31 <heller> yes, and the same with mpi, if you'd have enabled it

15:31 <heller> thats another option

15:31 <heller> for request interface: I'd look into the github API. you can subscribe to multiple events

15:32 <heller> for example, PRs created, comments to commits made etc.

15:32 <heller> this should contain most informaiton

15:32 <heller> and you'd could say in a comment: "zao, please soak test this commit"

15:32 <heller> or something along those lines

15:33 <heller> mighty herder of cats, please give me some confidence!

15:35 <zao> Ah yes, the stuff you can hook in if you control the repo?

15:35 <zao> I've just been looking at the RSS feed.

15:35 <heller> yes, this stuff

15:36 <heller> there is an rss feed?

15:36 <heller> ncie

15:37 <heller> ok, no significant improvement for building core

15:50 <msimberg> heller: current state for the throttle test is here: https://github.com/msimberg/hpx/tree/fix-throttle-test

15:51 <msimberg> I also added a simple counter for the background threads, needs cleaning up though

15:51 <msimberg> if you have time I have a couple of questions about the scheduler

15:55 <zao> Things like https://github.com/stellar-group/hpx/commits/master.atom exist, but they only cover integrations and commits, sadly.

15:55 aserio has joined #ste||ar

15:59 wash has joined #ste||ar

16:01 diehlpk has quit [Remote host closed the connection]

16:05 EverYoung has joined #ste||ar

16:05 <heller> msimberg: thanks for the branch

16:05 EverYoung has quit [Remote host closed the connection]

16:05 <heller> msimberg: ask your questions

16:05 EverYoung has joined #ste||ar

16:06 <msimberg> heller: just as a warning I haven't run all the tests yet

16:06 <heller> Examples got a 4 minutes improvement

16:06 <heller> msimberg: sure

16:07 <msimberg> first, when would this case happen: https://github.com/msimberg/hpx/blob/fd390e005847c2c76a7133bf389f7e89ce62c2af/hpx/runtime/threads/detail/scheduling_loop.hpp#L418

16:07 <msimberg> when I was running some of the tests it never happens, but I would've thought the background thread would be "suspended" whenever it's not running

16:07 <heller> msimberg: when a background thread suspends

16:08 <heller> No

16:08 <msimberg> it seems to be pending most of the time

16:08 <msimberg> but how that would get triggered then?

16:08 <msimberg> but do you know...

16:09 <heller> It happens, when for example, the code executed needs to wait on something. For example calling future.get or waiting on a spinlock

16:10 <heller> The primary case where this happens is with some direct actions

16:10 wash has quit [Quit: leaving]

16:10 <heller> We then put the task onto the regular queues to avoid starvation there

16:11 wash has joined #ste||ar

16:11 <msimberg> ah okay, I guess this could get triggered in the background thread if there's some agas stuff or similar going on

16:11 <heller> Yes, for example

16:11 <msimberg> I'm running only on one node so never see it

16:11 <msimberg> ok

16:11 <msimberg> second, when would the scheduler be in the suspended state? there were very few places where state_suspended is used...

16:12 <heller> We currently don't use that, iirc

16:12 <msimberg> and the runtime and scheduler use the same state enum, right? some states seem unused?

16:12 <heller> Yes

16:13 <msimberg> there was one place for the this_thread_executor that uses it

16:13 <msimberg> anyway

16:14 <msimberg> I'm asking because it would be useful to use the suspended/suspending state for suspending with condition variables, but I wasn't sure if it's okay to use state_suspended or if it needs another one

16:14 <heller> I would say yes

16:16 <msimberg> okay, thanks

16:16 <msimberg> I'll try it out and see if it works anyway

16:16 <heller> Good

16:21 <msimberg> heller: one more to clarify

16:21 <msimberg> https://github.com/msimberg/hpx/blob/fd390e005847c2c76a7133bf389f7e89ce62c2af/hpx/runtime/threads/detail/scheduling_loop.hpp#L717-L727

16:22 <heller> Yes

16:22 <heller> Ahhh, right

16:22 <msimberg> this piece of code doesn't explicitly schedule the background_thread (as it says)

16:23 <msimberg> based on what you said earlier, an LCO in this context would be e.g. a get that blocks and once that is ready it will really be added to the thread queues, correct?

16:23 <msimberg> i.e. it will be added to the queue even if background_thread itself wasn't originally in the queue

16:24 <heller> Ok, when an hpx suspends, it doesn't sit in any queue. The thread id is being held by some data structure (usually condition_variable) which will set it back to pending eventually and put it into a queue

16:24 <heller> Right

16:25 <heller> You need another thread to have the thread transition from suspended to pending

16:26 <msimberg> okay, that helps a lot, I haven't looked so much at the lcos yet

16:26 <msimberg> thanks!

16:26 <heller> Your welcome

16:26 <heller> You are

16:46 parsa has joined #ste||ar

16:52 jaafar has joined #ste||ar

17:01 hkaiser has quit [Quit: bye]

17:09 EverYoung has quit [Ping timeout: 252 seconds]

17:11 EverYoung has joined #ste||ar

17:14 parsa has quit [Quit: Zzzzzzzzzzzz]

17:20 david_pfander has quit [Ping timeout: 260 seconds]

17:30 EverYoun_ has joined #ste||ar

17:33 EverYoung has quit [Ping timeout: 252 seconds]

17:33 aserio has quit [Ping timeout: 246 seconds]

18:06 hkaiser has joined #ste||ar

18:11 <github> [hpx] K-ballo force-pushed fixing-2904 from fd485f2 to 98dc643: https://git.io/vFGxe

18:11 <github> hpx/fixing-2904 98dc643 Agustin K-ballo Berge: Temporarily disable allocator rebinding in pack traversal

18:30 eschnett has quit [Ping timeout: 260 seconds]

18:31 EverYoun_ has quit [Remote host closed the connection]

18:32 EverYoung has joined #ste||ar

18:44 eschnett has joined #ste||ar

18:58 parsa has joined #ste||ar

19:22 <heller> Docker compile is killing everything :/

19:42 hkaiser has quit [Quit: bye]

19:42 <github> [hpx] K-ballo force-pushed then-fwd-future from 75fcf30 to 654091a: https://git.io/vFq60

19:42 <github> hpx/then-fwd-future d631d51 Agustin K-ballo Berge: Fix future used with continuation on .then()

19:42 <github> hpx/then-fwd-future 96fa9e5 Agustin K-ballo Berge: Improve .then interface

19:42 <github> hpx/then-fwd-future e2fef6a Agustin K-ballo Berge: Improve error messages caused by misuse of .then

19:44 hkaiser has joined #ste||ar

19:44 parsa has quit [Quit: Zzzzzzzzzzzz]

19:52 <heller> weeh, 30 minutes improvement

19:53 <K-ballo> heller: on what?

19:53 <heller> circle ci

19:53 <heller> 4:30 to 4:00

19:53 <K-ballo> what did you change/

19:53 <heller> I updated to clang 5.0.1 and lld

19:54 <K-ballo> sweet

19:55 <heller> do you know if that enables lto by default?

19:55 <K-ballo> no idea

19:56 <zao> I saw chandler explicitly used -flto=thin when demoing in the recent pacific++ presentation, not sure what it does default, if at all.

19:56 <zao> (that was also trunk)

19:58 EverYoun_ has joined #ste||ar

19:59 mbremer has joined #ste||ar

20:02 EverYoung has quit [Ping timeout: 252 seconds]

20:03 <mbremer> Hi, I'm trying to figure out what percentage of my application is dedicated to overhead. Could I represent this by /threads/time/average-overhead divided by /threads/time/average?

20:10 hkaiser has quit [Quit: bye]

20:34 EverYoun_ has quit [Remote host closed the connection]

20:35 EverYoung has joined #ste||ar

20:43 parsa has joined #ste||ar

20:45 aserio has joined #ste||ar

20:59 parsa has quit [Quit: Zzzzzzzzzzzz]

21:02 hkaiser has joined #ste||ar

21:28 mbremer has quit [Quit: Page closed]

21:32 aserio has quit [Ping timeout: 252 seconds]

21:35 EverYoun_ has joined #ste||ar

21:36 <heller> zao: lto-thin is a Google optimization for multi threaded linking

21:38 EverYoung has quit [Ping timeout: 258 seconds]

21:38 <zao> Advertised as the kind of thing that did LTO at all.

21:38 <zao> ( https://www.youtube.com/watch?v=uZI_Qla4pNA )

21:41 <heller> zao: there is a cppcon talk about it. Pretty cool

21:42 <zao> I've got a lot to catch up on.

21:56 EverYoun_ has quit [Remote host closed the connection]

21:57 EverYoung has joined #ste||ar

22:00 patg[w] has joined #ste||ar

22:15 patg[w] has quit [Quit: Leaving]

22:15 <parsa[w]> hkaiser: #99 is ready

22:20 <github> [hpx] K-ballo opened pull request #2981: Temporarily disable allocator rebinding in pack traversal (master...fixing-2904) https://git.io/vFZ0A

22:43 <K-ballo> why do I get an *error* about finalize being deprecated?

22:43 <hkaiser> uhh

22:43 <hkaiser> that's not right

22:45 <K-ballo> dataflow::finalize (with v1 executors)

22:46 <K-ballo> SDL checks, turns deprecated warnings into errors, I've hit this before

23:02 <jbjnr> hkaiser: what are the smart executors that you will talk about at espm workshop at SC?

23:03 <jbjnr> I just got an email with the programme and saw an hpx paper mentioned

23:04 <jbjnr> I like the sound of "Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs"

23:04 aserio has joined #ste||ar

23:08 <hkaiser> jbjnr: it's Zahra's work, using ML for finding the right parameters

23:10 <hkaiser> K-ballo: ahh yes, that one is right

23:12 EverYoun_ has joined #ste||ar

23:13 <jbjnr> ML?

23:14 <jbjnr> right parameters for what? chunk sizes or something of that kind?

23:15 EverYoung has quit [Ping timeout: 258 seconds]

23:16 EverYoun_ has quit [Ping timeout: 258 seconds]

23:19 EverYoung has joined #ste||ar

23:23 EverYoung has quit [Ping timeout: 255 seconds]

23:27 gedaj has quit [Quit: leaving]

23:31 eschnett has quit [Quit: eschnett]

23:31 aserio has quit [Quit: aserio]