aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
vamatya has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
vamatya has quit [Ping timeout: 240 seconds]
eschnett_ has joined #ste||ar
eschnett has quit [Ping timeout: 240 seconds]
eschnett_ is now known as eschnett
K-ballo has quit [Quit: K-ballo]
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
patg has joined #ste||ar
patg is now known as Guest15479
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
Guest15479 has quit [Read error: Connection reset by peer]
patg has joined #ste||ar
patg is now known as Guest25792
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
taeguk has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller> wash: sure
<heller> That's the plan ;)
taeguk has quit [Quit: Page closed]
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
bikineev has joined #ste||ar
Matombo has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vQ7Yc
<github> hpx/gh-pages 2ad5d87 StellarBot: Updating docs
david_pfander1 has joined #ste||ar
david_pfander has quit [Ping timeout: 260 seconds]
david_pfander1 is now known as david_pfander
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
bikineev has joined #ste||ar
<heller> david_pfander: see mail
<heller> i've identified some more "good" commits in the meantime
<heller> david_pfander: see here, all marked green are good.
<heller> the last commit that finished retesting is "6aec8ce"
<david_pfander> heller: saw your mail, I guess we should discuss that after it is more clear what the future of octotiger will be
<heller> yes
<heller> david_pfander: well, it doesn't hurt to take responsibility, this will help in future projects as well ;)
<david_pfander> heller: what do you mean with that?
<heller> it's very frustrating to dump hours and hours into debugging problems which should have been seen months ago...
<david_pfander> heller: tell me about it...
<heller> well, the unit tests have been failing for months now
<heller> "unit tests"
<heller> without a real attempt to fix them
<heller> the general opinion was: "This has to be the fault of HPX, FIX IT!"
hkaiser has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller> hkaiser: good morning
<heller> see PM please
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
mcopik has quit [Ping timeout: 240 seconds]
jbjnr has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller> jbjnr: hey! how did your presentation go?
<jbjnr> fine. I gave an internal presentation of the zero-copy serialization work to our group
<jbjnr> I have some questions:
<jbjnr> if I disable the timer pool and the io-pool - what will stop working?
<hkaiser> timed suspension
<hkaiser> and all code relying on the io-pool, like octotiger
<hkaiser> jbjnr: why do you want to disable that? those threads are suspended most of the time
<jbjnr> Raffaele collected the task timings for all the tasks that do a dgemm on a single tile of the matrix and got a plot that looks like a two-humped camel. One hump corresponds to the time that it takes to do a dgemm on a tile of the matrix. The second hump is what we would get if a thread was being suspended by the OS for a timeslice. Running the matrix code on fewer threads improves the situation and improves the runtime, so I wanted to try disabling the timer pool and io-pool to see if it made any difference.
<jbjnr> are there any other threads that can run?
<jbjnr> (other codes that do not use hpx do not show this multi-hump timing)
<jbjnr> I've checked the affinity binding and can confirm that worker threads are pinned to OS cores uniquely (which I suspected might have been a possibility, but it turns out no)
<jbjnr> (I mean if two worker threads were bound to the same core, that would hose things in the manner we see)
ajaivgeorge has joined #ste||ar
<heller> jbjnr: interesting
eschnett has quit [Quit: eschnett]
<heller> jbjnr: that would mean that the timer and/or IO threads are periodically scheduled for some reason
<heller> jbjnr: I don't see a reason why you shouldn't be able to (temporarily) disable those two helper threads to see if it makes a difference
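A minimal sketch of one way to try this, assuming the hpx.threadpools.* configuration keys and the hpx::init overload that accepts extra configuration entries behave as expected in this HPX version (shrinking the pools to one thread, since fully disabling them may not be supported):

```cpp
// Sketch only: the hpx.threadpools.* values below are assumptions.
#include <hpx/hpx_init.hpp>

#include <string>
#include <vector>

int hpx_main(int argc, char* argv[])
{
    // ... run the tiled dgemm benchmark here ...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Equivalent to passing --hpx:ini=hpx.threadpools.io_pool_size=1 etc.
    std::vector<std::string> const cfg = {
        "hpx.threadpools.io_pool_size=1",
        "hpx.threadpools.timer_pool_size=1"
    };
    return hpx::init(argc, argv, cfg);
}
```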
<jbjnr> I have removed io pool and timer pool, but there are still extra threads running
david_pfander has quit [Ping timeout: 268 seconds]
<heller> jbjnr: there is the main thread, which is just waiting on the runtime to shut down again
<jbjnr> and 4 more
<heller> jbjnr: there is currently no way to disable that
<heller> that would be the TCP parcelport threads?
<jbjnr> HPX_HAVE_NETWORKING=OFF
<heller> hmmm
<jbjnr> but tcp might still be active.
<jbjnr> tracking that down now
<heller> ok
<heller> maybe some CUDA related threads?
<jbjnr> does agas have any dedicated threads at all?
<heller> no, AGAS runs purely as HPX threads
<hkaiser> jbjnr: kill the main thread (the one running main())!
<hkaiser> jbjnr: FWIW hpx launches the following threads: main() (launched by the OS), 2 threads each per IO, timer, and TCP, and one thread per --hpx:threads=N
<hkaiser> so either kill main or the ones run by the scheduler - voilà, look ma, no threads
<hkaiser> jbjnr: may I see those humps, pls?
denis_blank has joined #ste||ar
<jbjnr> the red line is the mean. It tells us that we're only getting 60%-ish of peak. The matrix should be solved in 8 ms if we ran at peak constantly
<hkaiser> jbjnr: does this show results from many runs?
<jbjnr> this is one run
<hkaiser> ok, what do you measure then? many dgemm's?
<jbjnr> there are thousands of matrix multiplies in each run and each one is timed
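For reference, a sketch of the kind of per-task measurement described here; dgemm_tile() and record_sample() are hypothetical stand-ins for the real benchmark code:

```cpp
#include <hpx/util/high_resolution_timer.hpp>

#include <cstddef>

// Hypothetical hooks standing in for the actual tile multiply and the
// histogram sink used in the benchmark.
void dgemm_tile(double const* A, double const* B, double* C, std::size_t n);
void record_sample(double seconds);

void timed_tile_dgemm(double const* A, double const* B, double* C, std::size_t n)
{
    hpx::util::high_resolution_timer t;   // starts timing on construction
    dgemm_tile(A, B, C, n);               // one 512x512 tile multiply (assumed size)
    record_sample(t.elapsed());           // elapsed seconds feed the histogram
}
```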
<hkaiser> ok
<hkaiser> I don't think this is caused by stray threads being scheduled
<hkaiser> this is most likely caused by bad NUMA placement
<hkaiser> jbjnr: have you looked at cross NUMA memory traffic?
<jbjnr> I'd like to test the numa theory. I am worried that we do not schedule child tasks on the right threads
<jbjnr> ^no
<hkaiser> hpx does not do anything by default to run tasks on the numa domain where the data is located
<jbjnr> I know
<hkaiser> use executors and hpx::compute allocators
<jbjnr> I need to try fixing that, but I have to eliminate other options first
<hkaiser> NUMA is the most obvious and hurtful option
<hkaiser> I doubt the thread scheduling hurts you that much
<hkaiser> even more so as those threads sleep almost all the time
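A rough sketch of what is being suggested, assuming the host-side hpx::compute API of this HPX version (names and headers may differ): allocate the data NUMA-aware and run the work through an executor bound to the same domains.

```cpp
#include <hpx/include/compute.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <cstddef>

void numa_aware_touch(std::size_t n)
{
    // one target per NUMA domain of this node
    auto targets = hpx::compute::host::numa_domains();

    using allocator_type = hpx::compute::host::block_allocator<double>;
    using executor_type  = hpx::compute::host::block_executor<>;

    allocator_type alloc(targets);   // memory is placed per NUMA domain
    executor_type  exec(targets);    // tasks run on the domain owning the data

    hpx::compute::vector<double, allocator_type> v(n, 0.0, alloc);

    hpx::parallel::for_each(
        hpx::parallel::execution::par.on(exec),
        v.begin(), v.end(),
        [](double& x) { x = 1.0; });
}
```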
<jbjnr> the two peaks are too far apart in timing to be just a numa problem. the tile size is 512x512 double and fits into cache
<jbjnr> so there should only be the overhead of a single block fetch
<hkaiser> how do you know?
<hkaiser> well, reading stuff from the other numa domain is almost twice as slow as reading from your own memory
<jbjnr> well, I may be wrong. If many blocks are being fetched, then the memory BW will be a bottleneck and could cause this.
<hkaiser> the next (non-obvious) problem may be caused by the tasking nature of HPX
K-ballo has joined #ste||ar
<hkaiser> this may cause excessive TLB reloading which can hurt you massively
<jbjnr> but 2MB at 100GB/s should be only 1 or 2 ms delay
<jbjnr> we've got 10ms delay here
<heller> jbjnr: so this is a histogram of the timing of the different dgemm kernels?
bikineev has quit [Ping timeout: 260 seconds]
<jbjnr> yes essentially
hkaiser has quit [Read error: Connection reset by peer]
hkaiser_ has joined #ste||ar
<hkaiser_> jbjnr: all I'm saying is that I doubt the threads are the culprit
<heller> are they potentially suspending somehow?
<heller> jbjnr: I am taking sides with hkaiser_ ;)
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<jbjnr> question: when a task completes - and triggers a continuation - do we know which worker thread the completing task is running on? It would be trivial in my scheduler to use the same queue for the continuation as a first choice to run on
<jbjnr> instead of the round robin we use currently
<heller> yes
<jbjnr> this would help numa placement
<hkaiser_> on the thread which makes the future ready
<heller> jbjnr: good idea!
<heller> not when it has async launch policy
<hkaiser_> well, then it's scheduled on the current thread, I believe
<heller> not sure about that right now
<hkaiser_> let's have a look
<jbjnr> I'm talking about the queues in the thread pool. When a task is added to the queue, we use curr_queue++ and the tasks are added round robin, but if I know where the current one is ....
<hkaiser_> not always
<jbjnr> correct
<heller> as far as I can see, there is no thread specified
<hkaiser_> nod, I agree
<hkaiser_> right
<heller> jbjnr: this defaults to num_thread == -1, and only then our current schedulers do the round robin thing
<jbjnr> yes, there we do not pass the current queue number into create_thread
<jbjnr> yes, that's what I want, to replace the -1 with the one we are on now
<heller> so you can either change the thread placement policy locally (in that code) or in your scheduler globally
<jbjnr> I could get the thread num_tss and use that?
<heller> you have to figure out if the code on line 353 or on line 321 is called though ;)
<heller> one sec
<heller> ok, can't find the place where I did that last time...
<heller> where is the actual thread scheduling happening though?
<heller> jbjnr: btw, what is the median timing for the other solutions?
<jbjnr> ?
<jbjnr> don't understand the question
<jbjnr> so the thread init data comes from here and is just default initialized https://github.com/STEllAR-GROUP/hpx/blob/bb7ab04e938727f8f49354991ca520069083cfb2/hpx/runtime/applier/apply.hpp#L195
<jbjnr> I can just get the current OS thread from thread_num_tss, convert it to a pool index, and then make sure it goes into that queue first.
<heller> yes
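A sketch of the idea, under the assumption that the current worker number can simply replace the -1 default before it reaches the scheduler; hpx::get_worker_thread_num() is existing API, the plumbing into create_thread() is left out:

```cpp
#include <hpx/include/runtime.hpp>

#include <cstddef>

std::size_t preferred_queue_for_continuation()
{
    // Index of the OS worker thread the current HPX thread runs on; returns
    // std::size_t(-1) when not called from an HPX worker, which is exactly
    // the schedulers' existing "pick a queue round-robin" value.
    return hpx::get_worker_thread_num();
}
```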
<jbjnr> but is that apply function only called for a continuation, or ...
<heller> FWIW, the round robin scheduling is the most plausible explanation
<heller> you potentially have cross-domain atomic instructions to schedule those new threads
<heller> which leads to cache line invalidations etc.
<heller> which would very well explain this intense delay and the two humps
<jbjnr> yes, but we're 10ms slow. A cache line fail is tiny compared to that
<heller> not if it accumulates
<jbjnr> there is not that much contention
<jbjnr> these tasks take 8ms each
<jbjnr> and are running 10ms slow
<jbjnr> milli, not micro
<hkaiser_> jbjnr: look at cross numa-domain memory traffic at least once - that will answer your question - no need to second-guess
<heller> the contention happens when all want to schedule a new task, no?
<jbjnr> no
<heller> sure
<heller> curr_queue_ is a shared atomic, for example, the queues are implemented using atomics
<jbjnr> we see a lovely task trace, with all the tasks being scheduled beautifully, but a significant number of them just take twice as long as they should
<heller> which explains it
<heller> you don't see this delay in the task trace
<jbjnr> yes we do
<jbjnr> the contention appears as a staggering of the tasks
<heller> you see it because some tasks take longer, sure
<jbjnr> this is just tasks taking longer
<heller> but you don't see this effect as holes in the task trace
<heller> because scheduling a new task inside of the task still counts towards the task's time
<heller> there is no suspension or anything going on. the instructions just take longer
<jbjnr> you see them as holes because Raffaele's task profiling doesn't get created until the task is running
<heller> so the task traces are purely within Raffaele's user code, and they don't include, for example, the attaching of continuations?
<jbjnr> essentially yes
<heller> ok
<heller> then this theory is bullshit
<jbjnr> we can tell the difference between tasks starting late and tasks taking longer.
<heller> did you figure out what those remaining 4 OS threads are?
<jbjnr> there's a boost::asio::service thread that I thought I had disabled, and possibly an APEX thread
<jbjnr> I got distracted.
<jbjnr> if HPX_HAVE_NETWORKING is OFF, should the tcp threads still be created, or skipped altogether?
<zao> From Raymond Chen himself - »There's also std::future, but its lack of composability makes it a poor choice for asynchronous programming.«
<zao> (about the PPL, but still amusing to see)
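To illustrate the composability point: an hpx::future can be combined and continued without blocking, which a plain std::future (lacking .then/when_all) cannot do. A small sketch:

```cpp
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/util/tuple.hpp>

#include <utility>

hpx::future<int> composed_sum()
{
    hpx::future<int> a = hpx::async([] { return 21; });
    hpx::future<int> b = hpx::async([] { return 21; });

    using both_type = hpx::util::tuple<hpx::future<int>, hpx::future<int>>;

    // Compose the two results asynchronously; with std::future the only
    // option would be to block in get().
    return hpx::when_all(std::move(a), std::move(b))
        .then([](hpx::future<both_type> both) {
            both_type futures = both.get();
            return hpx::util::get<0>(futures).get()
                 + hpx::util::get<1>(futures).get();
        });
}
```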
<heller> jbjnr: they shouldn't run if you disabled networking
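A small sketch of the compile-time switch being discussed; the HPX_HAVE_NETWORKING spelling follows the log, and the guarded behaviour is the expectation rather than a guarantee:

```cpp
#include <hpx/config.hpp>

bool networking_compiled_in()
{
#if defined(HPX_HAVE_NETWORKING)
    return true;    // parcelports (and their service threads) exist
#else
    return false;   // no parcelport threads should be created at all
#endif
}
```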
<heller> zao: source?
diehlpk_work has joined #ste||ar
<diehlpk_work> Next week I will be at a conference trying to advertise HPX in the engineering community, so I will not be available via IRC
mcopik has joined #ste||ar
Guest25792 has quit [Quit: This computer has gone to sleep]
zbyerly_ has quit [Remote host closed the connection]
vamatya has joined #ste||ar
<jbjnr> hkaiser_: do you know if apex allows collection of papi type data yet?
<jbjnr> (HPX5 git repo is not getting much activity these days)
hkaiser_ has quit [Quit: bye]
<K-ballo> I love the new state of the branches btw.. can we drop 5 or 6 more? or is it asking for too much?
vamatya has quit [Ping timeout: 260 seconds]
<jbjnr> hmm. I thought I deleted all mine, but some are still there
<diehlpk_work> O'Reilly is not interested in publishing the HPX book
<K-ballo> lol, I was just kidding, didn't mean yours specifically jbjnr
<jbjnr> K-ballo: well if I do a count of branches in my local tree, including all remote clone branches, it's 647, which frankly is a trifle large, so a bit of a cleanup is not a bad thing
<K-ballo> wow :|
<K-ballo> I assume you've never done a prune on sync?
<jbjnr> and HPX is just one of several projects that I have too many branches in :(
<jbjnr> I don't prune often. Too afraid of losing something that "might be useful one day"
<K-ballo> then I can see getting to 647 for HPX rather quickly
<jbjnr> indeed
<K-ballo> I have like 39.. counting by hand, not sure how to measure properly
<jbjnr> I have a similar number for paraview/vtk/vtkm etc
<K-ballo> I prune pretty much every time I fetch from upstream
<jbjnr> 39 sounds about right for local branches. I also counted the remote copies etc etc. Say 100 each for 5 or 6 repos in all.
<jbjnr> so most are copies
<K-ballo> I count 9 local refs, 9 in origin, 21 in upstream
<K-ballo> local + origin = 15 unique
aserio has joined #ste||ar
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
zbyerly_ has joined #ste||ar
<K-ballo> why is StellarBot blocked on github?
<heller> what do you mean?
bikineev has quit [Ping timeout: 258 seconds]
hkaiser has quit [Quit: bye]
ajaivgeorge has quit [Quit: ajaivgeorge]
<wash> aserio: ping
hkaiser has joined #ste||ar
<aserio> wash: here (but in a meeting)
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
aserio1 is now known as aserio
<diehlpk_work> In 9 days the second evaluation starts: denis_blank ABresting thundergroudon[m] taeguk[m]
bikineev has joined #ste||ar
<denis_blank> diehlpk_work: Ok, thanks for noticing
aserio has quit [Ping timeout: 246 seconds]
bikineev has quit [Ping timeout: 260 seconds]
mars0000 has joined #ste||ar
aserio has joined #ste||ar
bikineev has joined #ste||ar
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
denis_blank has quit [Quit: denis_blank]
zbyerly_ has quit [Ping timeout: 276 seconds]
bikineev has quit [Ping timeout: 240 seconds]
mars0000 has quit [Quit: mars0000]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<ABresting> diehlpk_work: +1
vamatya has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
<wash> aserio: We should touch base re the paper sometime next week
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
<github> [hpx] hkaiser force-pushed simple_base_lco from 6430975 to e4a4c2c: https://git.io/vQ5mv
<github> hpx/simple_base_lco e4a4c2c Hartmut Kaiser: Leave managed base_lco_with_value the default...
aserio has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Read error: Connection reset by peer]
ajaivgeorge has joined #ste||ar
antoinet has joined #ste||ar
antoinet has quit [Remote host closed the connection]
K-ballo has quit [Read error: Connection reset by peer]
K-ballo has joined #ste||ar
ajaivgeorge_ has joined #ste||ar
ajaivgeorge has quit [Read error: Connection reset by peer]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has quit [Client Quit]
mars0000 has joined #ste||ar
<diehlpk_work> Does anyone know why my code compiles and links with gcc, but with clang I get this:
<diehlpk_work> In function `void output<problem::Quasistatic<material::Elastic> >(IO::deck::PD*, problem::Quasistatic<material::Elastic>*)':
ajaivgeorge has quit [Client Quit]
ajaivgeorge has joined #ste||ar
<diehlpk_work> undefined reference
mars0000 has quit [Quit: mars0000]
<heller> diehlpk_work: I don't see an undefined reference.
<heller> And please, use a paste site
<heller> Looking at my crystal ball, you probably link against a library which uses an ABI-incompatible stdlib
<heller> That is, at least one of your dependencies was compiled with gcc
<heller> And/or a different flavor of C++
<diehlpk_work> No, the lib is compiled with clang too
<heller> You didn't export the symbol?
<diehlpk_work> Will check after the meeting, but why does it work with gcc then?
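One common cause of an undefined reference like this (a guess, not a diagnosis; symbol visibility or ABI differences between the toolchains are other candidates): the function template output<> is defined in a .cpp file, so no instantiation for problem::Quasistatic<material::Elastic> ends up in any object file. A sketch using the names from the log, with a hypothetical file layout:

```cpp
// output.cpp (hypothetical layout; only pointers are used, so forward
// declarations of the project types are enough to make this self-contained)
namespace IO { namespace deck { struct PD; } }
namespace material { struct Elastic; }
namespace problem { template <typename Material> struct Quasistatic; }

template <typename Problem>
void output(IO::deck::PD* deck, Problem* problem)
{
    (void) deck;
    (void) problem;
    // ... real implementation writes the results ...
}

// Explicit instantiation next to the definition, so the symbol the failing
// translation unit links against actually exists.
template void output<problem::Quasistatic<material::Elastic>>(
    IO::deck::PD*, problem::Quasistatic<material::Elastic>*);
```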
hkaiser has quit [Quit: bye]
EverYoung has quit [Remote host closed the connection]
ajaivgeorge has quit [Ping timeout: 268 seconds]
aserio has quit [Ping timeout: 246 seconds]
ajaivgeorge has joined #ste||ar
aserio has joined #ste||ar
mars0000 has joined #ste||ar
EverYoung has joined #ste||ar
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<aserio> hkaiser: yt?
mcopik has quit [Ping timeout: 260 seconds]
bikineev has joined #ste||ar
aserio has quit [Quit: aserio]
diehlpk_work has quit [Quit: Leaving]
Matombo has quit [Remote host closed the connection]
mars0000 has quit [Quit: mars0000]
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
EverYoun_ has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser_ has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar