#ste||ar on 2017-09-04 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:04 EverYoung has joined #ste||ar

00:09 EverYoung has quit [Ping timeout: 246 seconds]

00:36 EverYoung has joined #ste||ar

00:49 diehlpk has joined #ste||ar

00:52 EverYoung has quit [Ping timeout: 252 seconds]

01:52 vamatya has quit [Ping timeout: 240 seconds]

02:20 K-ballo has quit [Quit: K-ballo]

02:20 hkaiser has quit [Quit: bye]

02:46 diehlpk has quit [Ping timeout: 240 seconds]

04:06 vamatya has joined #ste||ar

04:07 vamatya_ has joined #ste||ar

04:10 vamatya has quit [Ping timeout: 246 seconds]

04:49 EverYoung has joined #ste||ar

04:54 EverYoung has quit [Ping timeout: 246 seconds]

05:24 Matombo has joined #ste||ar

06:09 vamatya_ has quit [Ping timeout: 248 seconds]

06:20 Matombo has quit [Remote host closed the connection]

06:51 EverYoung has joined #ste||ar

06:55 EverYoung has quit [Ping timeout: 246 seconds]

07:02 <heller> jbjnr: got a running stream benchmark on daint again

07:02 <heller> including GPU support

07:39 david_pfander has joined #ste||ar

07:51 bikineev has joined #ste||ar

07:57 bikineev has quit [Read error: Connection reset by peer]

08:02 bikineev has joined #ste||ar

08:04 <jbjnr> heller: what's your skype id?

08:04 <jbjnr> we are setting everything up here

08:04 <jbjnr> we have a laptop ready for your face

08:05 bikineev_ has joined #ste||ar

08:06 bikineev_ has quit [Remote host closed the connection]

08:07 bikineev has quit [Ping timeout: 240 seconds]

08:11 david_pfander has quit [Ping timeout: 240 seconds]

08:20 <heller> jbjnr: heller52

08:21 <jbjnr> ok, I'll send an invite from the cscs machine/account

08:21 <heller> thanks

08:21 david_pfander has joined #ste||ar

08:26 <jbjnr> Will is giving a quick intro etc

08:26 <jbjnr> On our team is Alan, from Nvidia

08:26 <jbjnr> he will help us port our kernels, if we can explain them well enough to him

08:26 <heller> great

08:27 <heller> I am only having problems with cuda + clang in debug mode

08:27 <jbjnr> ok, we were deciding that relwithdebinfo would be our mode of choice for the week

08:28 <jbjnr> I'll do an hpx install that we will use at this end, we hope all of us can use the same basic set of binaries if poss

08:29 <heller> ok

08:29 <heller> I have some adjustements

08:29 <heller> and we'll probably need to adjust the install over the week over time

08:29 <jbjnr> well, we'll probably have our own octo builds as we tweak stuff

08:29 <jbjnr> yes^

08:29 <heller> MPI support is missing for example

08:29 <jbjnr> adjustments for sure

08:29 <jbjnr> We will concentrate on single node to begin with

08:29 <heller> sure

08:30 <heller> jbjnr: speak up that we intend to use clang

08:30 <jbjnr> nb. there is a 128 node reservation on daint, but only eurohack accounts can access it :(

08:30 <jbjnr> I announced that at the mentors meeting this morning

08:30 <jbjnr> (cuda clang)

08:30 <heller> ok

08:30 <heller> np

08:32 <jbjnr> just fyi https://github.com/fomics/EuroHack17

08:34 <heller> I'm on the slack already

08:40 <jbjnr> great

08:40 <jbjnr> talk coming to an end, no idea if you can hear it

08:50 david_pfander has quit [Ping timeout: 255 seconds]

08:52 david_pfander has joined #ste||ar

09:10 pree has joined #ste||ar

09:38 Matombo has joined #ste||ar

10:24 bikineev has joined #ste||ar

10:31 bikineev has quit [Remote host closed the connection]

10:34 bikineev has joined #ste||ar

10:36 bikineev_ has joined #ste||ar

10:39 bikineev has quit [Ping timeout: 240 seconds]

10:52 EverYoung has joined #ste||ar

10:52 bikineev_ has quit [Ping timeout: 248 seconds]

10:57 EverYoung has quit [Ping timeout: 255 seconds]

11:12 K-ballo has joined #ste||ar

11:43 diehlpk has joined #ste||ar

11:45 bikineev has joined #ste||ar

11:49 bikineev has quit [Ping timeout: 248 seconds]

12:08 bikineev has joined #ste||ar

12:15 hkaiser has joined #ste||ar

12:27 <github> [hpx] sithhell force-pushed cuda_clang from 1b08a20 to f6b1a6a: https://git.io/v5ukg

12:27 <github> hpx/cuda_clang f6b1a6a Thomas Heller: Making Clang + CUDA work...

12:33 heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

12:33 Matombo has quit [Ping timeout: 246 seconds]

12:35 heller has joined #ste||ar

12:37 mcopik has joined #ste||ar

12:39 pree has quit [Ping timeout: 240 seconds]

12:46 pree has joined #ste||ar

12:46 pree has quit [Remote host closed the connection]

12:47 pree has joined #ste||ar

12:49 Matombo has joined #ste||ar

12:53 pree has quit [Ping timeout: 240 seconds]

12:54 Matombo has quit [Remote host closed the connection]

12:54 EverYoung has joined #ste||ar

12:58 EverYoung has quit [Ping timeout: 246 seconds]

13:05 diehlpk has quit [Ping timeout: 248 seconds]

13:06 bikineev_ has joined #ste||ar

13:06 bikineev has quit [Ping timeout: 255 seconds]

14:02 <github> [hpx] sithhell pushed 1 new commit to cuda_clang: https://git.io/v5uZL

14:02 <github> hpx/cuda_clang 56a0a30 Thomas Heller: Fixing ICE with nvcc

14:09 bikineev_ has quit [Ping timeout: 240 seconds]

14:14 <jbjnr> hkaiser: heller I have not been able to find a reasonable explanation for the double peaks in our task times https://pasteboard.co/GI0Dk5x.png - is there any conceivable way that when runnin hpx on many threads - it could accidentally run the task twice - due to a race in the deep internals?

14:15 <heller> very unlikely

14:15 <jbjnr> indeed.

14:15 <hkaiser> unlikely indeed

14:15 <jbjnr> just posted on slack that stream ok now, thanks for boost patch

14:16 <heller> semi ok

14:16 <heller> performance sucks

14:16 <hkaiser> jbjnr: could be a matter of critical tasks bein gexecuted too late, holdin gback everything else

14:16 <heller> which worries me

14:16 <heller> i'd really check the hardware counters to see what kind of cache misses or other memory transfer we are dealing with here

14:16 <jbjnr> no.

14:17 <heller> likwid would be a perfect tool to check this

14:17 <jbjnr> the cache cannot explain it and the task execution cannot causse it - the time is started inside the lambda, and stopped at the end of the lambda

14:17 <jbjnr> memory bw calcs do not alow for the scale of the slowdown - cache not the cause

14:18 <hkaiser> no suspension?

14:18 <jbjnr> if we run a small time, it takes 8ms and we see a peak at 8 and another at 16

14:18 <heller> please just check it

14:18 <jbjnr> if we run a big tile that takes 30, we see a peak at 30 and another at 60

14:18 <hkaiser> loading tlbs?

14:18 <jbjnr> only explanation is two threads bound to one code

14:18 <jbjnr> core^

14:18 <jbjnr> but diagnostics disprove this

14:19 <jbjnr> as I can dump out the core with each task

14:19 <jbjnr> and they all are differnt and correct

14:19 <hkaiser> TLBs?

14:19 <heller> translation lookaside buffers?

14:19 <hkaiser> yes

14:20 <heller> instruction cache misses?

14:20 <jbjnr> none of these would cause a 2x delay - they would add some overhead, but not scale with tile size

14:20 <heller> well

14:20 <jbjnr> bbiab

14:20 <hkaiser> jbjnr: TLBs would scale

14:21 <heller> we'll only know for sure once we actually look at the counters

14:22 <heller> hkaiser: btw, nvcc on daint works now. as well as cuda clang

14:22 <hkaiser> cool, what did you change?

14:22 <heller> hooray for spending hours and hours in front of compiler error messages ;)

14:22 <heller> really nothing

14:22 <hkaiser> heller: you're my hero

14:22 <heller> now, we need to bring back the performance :P

14:22 <hkaiser> uhh, so why did it start to work?

14:23 <heller> well, the strange segfaults last week were on my local test system

14:23 <heller> now I am running on daint

14:23 <hkaiser> and the compilation problems in unwrap?

14:23 <heller> they are only showing up with the binpacking distribution policy

14:23 <hkaiser> ahh, because that ties in the actions, right?

14:24 <heller> the assertion (in the EDG frontend) is coming out out of a file named "scope_tks.c"

14:24 <hkaiser> lol

14:24 <hkaiser> very helpful

14:24 <heller> the policies are statically initialized, right?

14:24 <hkaiser> might be

14:24 <heller> with a global that is, at namespace scope

14:25 <hkaiser> don't remember

14:25 <heller> yeah, they are

14:25 <hkaiser> ok

14:25 <heller> so this is my guess: it is ok when unwrap is called from within function scope

14:25 <hkaiser> heller: could you comment about your findings on the related tickets, pls?

14:25 <heller> but the assert fails once it is instantiated from a static scope

14:25 <heller> not static scope

14:25 <heller> file scope

14:25 <hkaiser> k

14:26 <heller> to be precise: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/runtime/components/binpacking_distribution_policy.hpp#L469

14:26 <heller> this is were the instantiation happens

14:27 <heller> https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/runtime/components/binpacking_distribution_policy.hpp#L100-L106

14:27 <hkaiser> k, should we blend this out for cuda, then?

14:27 <heller> ^^ this is one of the problematic lines that lead to the ICE

14:27 <heller> yes

14:27 <heller> just came to my mind that the real solution to the problem is to move this into a seperate TU

14:27 <hkaiser> heller: well, this unwrap is just a convenience, could be removed alltogether

14:28 <heller> conclusion: explaining problems to other people helps in finding solutions ;)

14:28 <heller> well sure

14:28 <hkaiser> glad to serve as a rubber-duck

14:28 <zao> *quack*

14:28 <heller> but that would just solve the symptoms

14:28 <heller> we know the reason now

14:28 <hkaiser> indeed

14:28 <hkaiser> but we can't solve the cause anyways

14:29 <heller> no

14:29 <heller> there are plenty of other unwraps in that file though

14:29 <heller> and cuda clang is sooo much nicer

14:30 <heller> I'll properly test and file the PR tomorrow

14:30 <heller> gtg now

14:30 <hkaiser> ok, thanks

14:31 <heller> also, the thing with the static is just an assumption for now which needs to be verified

14:31 <hkaiser> heller: the important part is that cuda is unblocked now

14:31 <heller> yes

14:32 <heller> the guys in lugano had something to work with since the very first hour, so a close call again ;)

14:32 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/v5uC5

14:32 <github> hpx/master ee5303e Hartmut Kaiser: Merge pull request #2871 from STEllAR-GROUP/fix_throttle...

14:33 <heller> hkaiser: buildbot seems to work again, which is very nice

14:33 <heller> now showing a few failing tests

14:33 <hkaiser> heller: yah, akheir has invested a lot of time

14:33 <heller> good

14:33 <hkaiser> some things are still pending

14:33 <heller> ok

14:34 <heller> I couldn't spot the issues I reported anymore

14:34 <hkaiser> heller: http://irclog.cct.lsu.edu/ste~b~~b~ar/2017-09-02#1504358322-1504358628;

14:34 <heller> so all failures should be considered as the real thing now

14:34 <heller> yeah, saw the message

14:34 <hkaiser> k

14:35 <heller> I hope I can get around the throttle thingy by this week

14:35 <heller> I get reports that the throttling scheduler the IBM guy wrote now hangs as well :P

14:37 bikineev has joined #ste||ar

15:11 parsa has joined #ste||ar

15:25 <jbjnr> heller: slack ping

15:36 diehlpk has joined #ste||ar

15:44 <diehlpk> hkaiser, heller jbjnr zbyerly When should we skype tomorrow? Deadline for the paper is in 10 days

15:44 <hkaiser> diehlpk: any time works for me

15:44 <diehlpk> heller, Would 6pm your time work for you agin?

15:58 parsa has quit [Quit: Zzzzzzzzzzzz]

16:02 ajaivgeorge has joined #ste||ar

16:11 bikineev has quit [Read error: Connection timed out]

16:20 jaafar has joined #ste||ar

16:22 ajaivgeorge has quit [Quit: ajaivgeorge]

16:50 diehlpk has quit [Ping timeout: 260 seconds]

17:27 jaafar has quit [Ping timeout: 240 seconds]

17:33 diehlpk has joined #ste||ar

17:34 david_pfander has quit [Ping timeout: 246 seconds]

17:58 bikineev has joined #ste||ar

18:02 parsa has joined #ste||ar

18:04 Matombo has joined #ste||ar

18:08 Matombo has quit [Ping timeout: 248 seconds]

18:08 bikineev has quit [Remote host closed the connection]

18:20 Matombo has joined #ste||ar

18:29 K-ballo1 has joined #ste||ar

18:35 Matombo has quit [Ping timeout: 255 seconds]

18:37 diehlpk has quit [Ping timeout: 246 seconds]

18:37 K-ballo has quit [*.net *.split]

18:37 K-ballo1 is now known as K-ballo

18:37 patg has joined #ste||ar

18:38 patg is now known as Guest67730

18:38 Guest67730 is now known as patg

18:39 patg is now known as Guest6416

18:39 Guest6416 is now known as patg

18:59 Matombo has joined #ste||ar

19:01 patg has quit [Quit: This computer has gone to sleep]

19:08 diehlpk has joined #ste||ar

19:09 <github> [hpx] K-ballo created format (+2 new commits): https://git.io/v5uSY

19:09 <github> hpx/format ee5da64 Agustin K-ballo Berge: Wrap boost::format uses in traditional (variadic) function call syntax

19:09 <github> hpx/format b1dab66 Agustin K-ballo Berge: Add inspect check for unguarded boost::format usage

19:15 Matombo has quit [Ping timeout: 240 seconds]

19:17 Matombo has joined #ste||ar

19:21 patg has joined #ste||ar

19:22 pree has joined #ste||ar

19:41 <heller> diehlpk: no

19:48 Matombo has quit [Ping timeout: 240 seconds]

19:48 diehlpk has quit [Ping timeout: 240 seconds]

19:48 Matombo has joined #ste||ar

19:51 <heller> 18:00 doesn't work. 16:00 would be fine

20:15 pree has quit [Ping timeout: 240 seconds]

20:16 david_pfander has joined #ste||ar

20:28 pree has joined #ste||ar

20:39 bikineev has joined #ste||ar

20:41 pree has quit [Quit: AaBbCc]

20:42 diehlpk has joined #ste||ar

20:43 <diehlpk> heller, Any other time?

20:43 <diehlpk> hkaiser, and I are available the whole day

20:44 <hkaiser> diehlpk: he said 16:00 works for him

20:46 <heller> Yes

20:50 <diehlpk> Ok, so let us meet at this time. I will commit my chnages before the meeting

20:50 <diehlpk> We have two pages now

20:51 <hkaiser> I might be 5-10 minutes late for this

20:51 <hkaiser> but will try to b eon time

20:51 <heller> Sure, no problem

20:51 <heller> I'll work on it tomorrow

20:55 <jbjnr> heller: in general, which is better, generic context (boost) coroutines, on, or off.

20:55 <jbjnr> or hkaiser ^

20:55 <hkaiser> jbjnr: depends on platform

20:56 <jbjnr> x86

20:56 <hkaiser> linux?

20:56 <jbjnr> mostly

20:56 <hkaiser> then leave it off

20:56 <jbjnr> cray linux anyway

20:56 <jbjnr> ok

20:56 <jbjnr> what's the main diff

20:56 <hkaiser> ours is a tick faster

20:56 <jbjnr> ok thanks

21:03 patg has quit [Read error: Connection reset by peer]

21:06 patg has joined #ste||ar

21:07 david_pfander has quit [Ping timeout: 240 seconds]

21:09 bikineev has quit [Remote host closed the connection]

21:29 bikineev has joined #ste||ar

21:32 diehlpk has quit [Ping timeout: 240 seconds]

21:49 <jbjnr> hkaiser: fyi - hpx::bind=scatter looks like it is broken. Not sure, but pu mask does not seem right to me ...

21:50 <hkaiser> ok

21:50 <hkaiser> pls create a ticket

21:52 <jbjnr> ok, still doing tests.

21:57 <patg> use balanced

21:58 <jbjnr> balanced is wrong too

21:59 <jbjnr> this is not good

22:03 bikineev has quit [Remote host closed the connection]

22:08 Matombo has quit [Remote host closed the connection]

22:33 <hkaiser> jbjnr: --bind=balanced distributes thread across cores in a balanced way, not across sockets - at least that's the intent.

22:33 <hkaiser> from what I can see from the bitmasks you posted all is correct

22:40 <hkaiser> jbjnr: so this is not a bug

22:56 EverYoung has joined #ste||ar

23:00 EverYoung has quit [Ping timeout: 246 seconds]

23:14 mbremer has joined #ste||ar

23:17 mbremer has quit [Client Quit]

23:47 diehlpk has joined #ste||ar