aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
zbyerly__ has quit [Ping timeout: 240 seconds]
mcopik has quit [Ping timeout: 260 seconds]
parsa[w] has quit [Read error: Connection reset by peer]
hkaiser has quit [Quit: bye]
pagrubel has joined #ste||ar
pagrubel has quit [Ping timeout: 276 seconds]
K-ballo has quit [Quit: K-ballo]
vamatya_ has quit [Ping timeout: 246 seconds]
bikineev has quit [Remote host closed the connection]
mcopik has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v7rP1
<github> hpx/gh-pages a72fb30 StellarBot: Updating docs
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
david_pf_ has joined #ste||ar
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
<github> [hpx] hkaiser closed pull request #2799: Fix a unit test failure on GCC in tuple_cat (master...tuple_cat_fix) https://git.io/v7Vx9
<github> [hpx] hkaiser closed pull request #2802: Fix FreeBSD 11 (master...fix-freebsd-11) https://git.io/v7rYW
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
http_GK1wmSU has joined #ste||ar
http_GK1wmSU has left #ste||ar [#ste||ar]
bikineev has quit [Remote host closed the connection]
<hkaiser> jbjnr: you around?
pagrubel has joined #ste||ar
<jbjnr> here for a min
<jbjnr> hkaiser: ^^
<jbjnr> what's up
<hkaiser> jbjnr: just a q
<hkaiser> how did you envision exposing oversubscription to the rp?
<jbjnr> if the user says "--hpx:threads=1" but in int main the user adds the same thread to more than 1 pool, it would throw unless the user added --hpx:allow-oversubscription
<jbjnr> (or similar)
<hkaiser> ok, that's the command line, but how would the rp do that?
<jbjnr> currently the RP only provides access to the threads that hpx:threads=N gives from the "old" thread affinity/binding
<hkaiser> nod
<jbjnr> so if the user wanted to get all N cores and ignore the hpx:threads option, we'd have to add some new functions
<hkaiser> and it throws if two pools want to use the same core
<hkaiser> is oversubscription a feature of the rp, the pool or is it PU-related?
<jbjnr> yes, the user code might need two pools, and add one numa domain to one, and another to the other, but on a single socket machine ... unless they allow oversubscription ...
<jbjnr> feature of RP I guess
<hkaiser> ok
<jbjnr> because you can currently add the same PU to N pools
<hkaiser> right
<jbjnr> if you want - but we have not "handled" it yet
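A minimal sketch of the oversubscription case being discussed, assuming a resource-partitioner interface along the lines of the one described above (pool creation plus per-PU add_resource calls); the header and exact signatures are assumptions, not confirmed from this branch.

    // Illustrative sketch only: the accessor names follow the general shape of
    // the resource partitioner discussed above, but the header and signatures
    // are assumptions.
    #include <hpx/hpx.hpp>   // assumed umbrella header pulling in the partitioner

    void setup_pools(hpx::resource::partitioner& rp)
    {
        rp.create_thread_pool("pool-a");
        rp.create_thread_pool("pool-b");

        // first PU of the first core on the first NUMA domain
        auto const& pu = rp.numa_domains()[0].cores()[0].pus()[0];

        rp.add_resource(pu, "pool-a");

        // Adding the same PU to a second pool is the oversubscription case: the
        // proposal above is that this throws unless the user passed something
        // like --hpx:allow-oversubscription on the command line.
        rp.add_resource(pu, "pool-b");
    }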
<hkaiser> I know, but I'd like to implement that
<jbjnr> me too
<jbjnr> feel free to start
<hkaiser> also we will need dynamic footprints for a pool
<jbjnr> all we need to do is generate a pu_mask and get the thread indexing right
<jbjnr> for dynamic indexing I have a nice idea
<jbjnr> dynamic pools I mean
<hkaiser> our pools should already support that
<jbjnr> We should use a system similar to the way Qt layouts work, where a pool can be stretchy or fixed
<jbjnr> each Qt widget on screen can expand, or not and the user constrains it
<hkaiser> yes
<jbjnr> when we create pools, we should use similar semantics
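A hypothetical sketch of the Qt-layout analogy for pool sizing: fixed pools keep exactly the PUs they were given, expanding pools share whatever is left over, the way Qt distributes extra space. All names here are invented for illustration and are not part of HPX.

    // Hypothetical sketch; names are invented for illustration only.
    #include <cstddef>

    enum class pool_size_policy
    {
        fixed,      // pool keeps exactly the PUs it was given
        expanding   // pool may absorb PUs left over by other pools
    };

    struct pool_request
    {
        std::size_t min_pus;        // hard lower bound for this pool
        pool_size_policy policy;
    };

    // Extra PUs handed to one request: fixed pools get none, expanding pools
    // split the leftover evenly, mirroring Qt's stretch semantics.
    std::size_t extra_pus_for(pool_request const& req, std::size_t leftover,
        std::size_t num_expanding)
    {
        return req.policy == pool_size_policy::expanding && num_expanding != 0
            ? leftover / num_expanding
            : 0;
    }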
<hkaiser> that's what we have for the nested schedulers using the old resource_manager
<jbjnr> (currently there is a problem in the schedulers etc, that if I add 2 cores to a pool, but they are core 0 on domain 0, and core 0 on domain 1 - then the indexing might be dodgy)
<jbjnr> (I need to look into that)
<hkaiser> the pools don't care, the rp should get that right
<jbjnr> yes. I'm just not sure if we create them all in the right order and get the pu numbering right for all cases
<hkaiser> let's write tests
<jbjnr> so far all my pools have been 'contiguous' etc
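A sketch of the kind of unit test being suggested: a pool built from core 0 of domain 0 and core 0 of domain 1 should get a sane PU mask and thread indexing even though the PU numbers are not contiguous. The mask and topology values are stand-ins under an assumed 2-domain, 4-cores-per-domain layout, not HPX code.

    // Stand-in test sketch; assumes 2 NUMA domains x 4 cores, 1 PU per core.
    #include <cassert>
    #include <cstdint>
    #include <vector>

    using pu_mask = std::uint64_t;   // one bit per PU, assuming <= 64 PUs

    int main()
    {
        // core 0 of domain 0 is PU 0, core 0 of domain 1 is PU 4
        std::vector<int> pool_pus = {0, 4};

        pu_mask mask = 0;
        for (int pu : pool_pus)
            mask |= pu_mask(1) << pu;

        // Pool threads are indexed 0..N-1 in the order the PUs were added,
        // even though the underlying PU numbers are not contiguous.
        assert(mask == 0b10001);
        assert(pool_pus[0] == 0);   // pool thread 0 -> PU 0
        assert(pool_pus[1] == 4);   // pool thread 1 -> PU 4
        return 0;
    }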
<jbjnr> ran tests this morning and got good mpi 1,2 ranks, but code locks up on N=4
<jbjnr> is there anything we need to check when running in distributed that we might have messed up
<hkaiser> not sure what you mean
<jbjnr> (I am using run-hpx-main on all ranks - the code needs it)
<hkaiser> we need tests for a set of use cases to support
<jbjnr> yes
<hkaiser> manual testing using the example does not scale
<jbjnr> (my tests don't work on N=4, so I am worried that I might have messed up and we create pools more than once by mistake or something like that.)
<hkaiser> turn it into a test and commit
<jbjnr> (matrix code btw)
<jbjnr> but initial results look much better than before - got some normal scaling instead of the massive drop on n=2
<hkaiser> good
<jbjnr> still super shit compared to parsec
<jbjnr> :(
<hkaiser> so you carve out one or two cores just for the network?
<jbjnr> yes. I disable all hpx networking and then put all mpi tasks onto an mpi pool
<jbjnr> cos raffaele has wrapped all his mpi comms in small tasks
<hkaiser> that will not scale ever
<jbjnr> should work the same as raw mpi really
<hkaiser> as in a futurized execution tree, if some of the tasks do mpi they will block the whole execution
<jbjnr> that's why they are in their own pool, and the DAG has been carefully generated to handle them
<hkaiser> ok
<jbjnr> this is why I am doing all this work in the first place
<hkaiser> jbjnr: even if the mpi runs on separate cores, those tasks will still block all of the futurized execution graph as the mpi will 'cut through' this tree
<jbjnr> the tasks that need the mpi data cannot run without it - they are blocked on those tasks
<jbjnr> there is nothing we can do about that
<jbjnr> but moving them onto their own pool at least stops the blocking mpi calls from blocking other work that is not using the mpi data
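A hedged sketch of the pattern described here: blocking MPI calls wrapped in small tasks that run only on a dedicated "mpi" pool, so they never occupy default-pool cores. The executor type used to target a named pool has gone by different names across HPX versions, so `pool_executor` and its namespace are assumptions, as is the umbrella header.

    // Sketch only: treat the executor name/namespace as an assumption.
    #include <hpx/hpx.hpp>
    #include <mpi.h>

    hpx::future<void> async_mpi_recv(void* buf, int count, MPI_Datatype type,
        int source, int tag, MPI_Comm comm)
    {
        // Executor bound to the dedicated "mpi" pool created via the resource
        // partitioner; the blocking MPI_Recv runs there, off the default pool.
        hpx::threads::executors::pool_executor mpi_exec("mpi");

        return hpx::async(mpi_exec, [=]() {
            MPI_Recv(buf, count, type, source, tag, comm, MPI_STATUS_IGNORE);
        });
    }

    // Tasks that need the received data attach continuations as usual:
    //   async_mpi_recv(...).then([](hpx::future<void>&&) { /* use the data */ });

The point matches the discussion above: the dependent tasks still wait on the data, but unrelated work on the default pool is no longer blocked behind the MPI calls.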
<hkaiser> ok
<jbjnr> so far
<jbjnr> must fix n=4,8,16, ...
<jbjnr> parsec is just so much better, it is really depressing,
<hkaiser> jbjnr: do you know why?
mcopik has quit [Ping timeout: 260 seconds]
<jbjnr> if I say "because hpx is a bit shit" then you will just shout at me. They schedule tasks before execution, so they optimize the dag, but I am truly stunned by the scale of the difference - I can get 870GFlops on one node - they get 1000 - it all seems to be to do with numa placement as far as I can tell. and the scheduling
<jbjnr> bbiab
<hkaiser> jbjnr: we know hpx is shit
<hkaiser> and yah, numa placement is a big thing
zbyerly__ has joined #ste||ar
mcopik has joined #ste||ar
david_pf_ has quit [Quit: david_pf_]
mcopik has quit [Ping timeout: 255 seconds]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to partitioned_vector: https://git.io/v7ovn
<github> hpx/partitioned_vector 1860cc2 Hartmut Kaiser: Fixing test (cherry pick from master)
<jbjnr> (hkaiser: when I said numa placement earlier - I should really include all cache related effects)
<jbjnr> hkaiser: I have 3 threads that I cannot account for. if I disable IO_POOL and TIMER_POOL and run with threads=1, I expect one app thread that gets suspended once the worker threads kick in - any idea what the other two are?
pree_ has joined #ste||ar
<hkaiser> tcp threads
<hkaiser> 2 of them
<hkaiser> jbjnr: ^^
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7ofo
<github> hpx/resource_partitioner fe41eeb Hartmut Kaiser: Fixing inspect problems
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
<heller> jbjnr: also, the parsec code does not include the manual mpi message passing
<hkaiser> heller: did you see my message the other day?
<jbjnr> hkaiser: tcp threads. ok. How certain are you that they are sleeping and not interfering with anything?
<jbjnr> message the other day?
<hkaiser> 100% certain, like the io-pool and timer threads
<hkaiser> jbjnr: I asked heller
<jbjnr> asked what?
<hkaiser> heller: did you see my message the other day?
<hkaiser> ^^
<jbjnr> sorry
<jbjnr> didn't see the heller bit. just read on from the line before
<hkaiser> np
<heller> hkaiser: which one?
<hkaiser> about your use of intra- and inter-
<jbjnr> hkaiser: " also, the parsec code does not include the manual mpi message passing" what do you mean by this?
<heller> ah yes, saw it, thanks!
<heller> already fixed, thanks
<hkaiser> jbjnr: heller was saying that ;)
<heller> jbjnr: didn't we discuss this last time?
<heller> jbjnr: the parsec code is written 100% in parsec, no manual MPI in there?
<jbjnr> it uses mpi just like everyone else
<jbjnr> it's just interspersed with their task scheduling done beforehand
<heller> ok
<heller> so the message passing is 100% identical for both the HPX and the Parsec version?
<heller> sorry if this question annoys you, it's still not 100% clear to me
<hkaiser> heller: jbjnr is seeing perf differences on a single node already - that needs to be fixed first
<heller> sure
<heller> hkaiser: so what's your verdict so far?
<hkaiser> have read abstract/intro so far only
<jbjnr> it's not fair. I work so hard, and I'm still so shit https://pasteboard.co/GEqM2XW.png
<hkaiser> reads nicely
<hkaiser> I like the first half of the abstract, the second half is getting weaker
<heller> good
<heller> so no reason to ditch the whole thing and start over again ;)
<hkaiser> jbjnr: numa effects?
<hkaiser> ;)
<heller> did you check what happens when staying on one numa node?
<heller> or a KNL?
<hkaiser> jbjnr: would also explain why the differences become larger for more data
<jbjnr> cache coherency and numa, but I'm running out of ideas to tweak things - I have one more thing to try ...
<heller> jbjnr: two localities per node
<hkaiser> yah try that
<jbjnr> hpx doesn't work with 2 localities per node
<hkaiser> sure it does
<hkaiser> why shouldn't it?
<jbjnr> network always bombs out with binding fail errors
<hkaiser> depends on how you invoke things
<hkaiser> could also be a problem related to shoshana's new startup code
<hkaiser> hold on, I tried running hello-world this way, seems to work
<jbjnr> I would not be able to run the matrix stuff out of the box, would need to make some changes to the row/col distribution etc
bikineev has joined #ste||ar
<hkaiser> heller: buildbot is greenish again
<heller> hkaiser: thanks!
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
mcopik has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bibek_desktop_ has joined #ste||ar
pree_ has quit [Ping timeout: 260 seconds]
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
<jbjnr> last set of jobs finished - final results from this run https://pasteboard.co/GEr3Y6d.png not much change in the graphs
bibek_desktop has quit [*.net *.split]
auviga has quit [*.net *.split]
ABresting has quit [*.net *.split]
zbyerly__ has quit [Ping timeout: 248 seconds]
<hkaiser> jbjnr: what did you change in the scheduler for this?
<jbjnr> in these runs, I'm using my scheduler and an mpi pool with 1 thread reserved. the main difference in the scheduler is the placement of tasks and stealing
<jbjnr> I'm not finished with the scheduler yet, but getting a bit fed up.
pree_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
<hkaiser> jbjnr: I hear you, it's like a piece of soap in the shower
<jbjnr> lol
thundergroudon[m has quit [Ping timeout: 255 seconds]
taeguk[m] has quit [Ping timeout: 246 seconds]
<hkaiser> you might have to use one pool per numa domain and be careful about placing tasks
bikineev has joined #ste||ar
<hkaiser> so the main difference is the dedicated core for the tasks which do MPI calls
<jbjnr> I'm almost doing that - in my scheduler, I allocate HPQueues based on numa domain and can control the stealing, so it is almost like having two pools.
auviga has joined #ste||ar
ABresting has joined #ste||ar
<hkaiser> but if the tasks run on the other numa domain you could be still in trouble
<jbjnr> the mpi pool is helping, but the scheduling improvements make a big difference too. the speedup in some cases is very significant over the old hpx
<jbjnr> ^^yes, the problem is that if you constrain tasks to one domain, then the other cores are idle, and finding a good balance has been tricky
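An illustrative sketch of the stealing policy described here: high-priority queues allocated per NUMA domain, with a worker taking from its own queue first, then other cores in its domain, and crossing domains only when allowed. Data structures and names are invented for illustration, not taken from the actual scheduler.

    // Illustrative sketch of NUMA-aware stealing order; not the real scheduler.
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <optional>
    #include <vector>

    struct task { /* payload omitted */ };

    struct numa_queues
    {
        std::vector<std::deque<task>> per_core;   // one queue per core in this domain
        std::mutex mtx;
    };

    // Steal order: own queue, then other cores in the same NUMA domain, then
    // other domains only when cross-domain stealing is enabled.
    std::optional<task> get_next_task(std::vector<numa_queues>& domains,
        std::size_t my_domain, std::size_t my_core, bool allow_cross_numa_steal)
    {
        auto try_pop = [&](std::size_t d, std::size_t c) -> std::optional<task> {
            std::lock_guard<std::mutex> l(domains[d].mtx);
            auto& q = domains[d].per_core[c];
            if (q.empty())
                return std::nullopt;
            task t = q.front();
            q.pop_front();
            return t;
        };

        if (auto t = try_pop(my_domain, my_core))
            return t;

        for (std::size_t c = 0; c < domains[my_domain].per_core.size(); ++c)
            if (c != my_core)
                if (auto t = try_pop(my_domain, c))
                    return t;

        if (allow_cross_numa_steal)
            for (std::size_t d = 0; d < domains.size(); ++d)
                if (d != my_domain)
                    for (std::size_t c = 0; c < domains[d].per_core.size(); ++c)
                        if (auto t = try_pop(d, c))
                            return t;

        return std::nullopt;
    }

Keeping cross-domain stealing off makes the single pool behave "almost like two pools", as described above, at the cost of idle cores when one domain runs dry.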
<hkaiser> well, you don't know whether this is caused by the scheduler alone
<hkaiser> the one-node diffs are minimal
pagrubel is now known as patg[[w]]
<jbjnr> in these plots the one node diffs are not evident, in my other runs they are. I have so many settings to adjust I don't know what I'm actually doing any more.
<hkaiser> nod
<hkaiser> jbjnr: are those changes generalizable?
<jbjnr> IMHO we should throw away the six schedulers in hpx and replace them with mine. so yes, very generalizable!
<hkaiser> yes, let's do that
<jbjnr> :)
<jbjnr> I wasn't serious
<hkaiser> I was
<hkaiser> we use just one scheduler 100% anyways
<jbjnr> Let me do some graphs with all six though before we pursue this further
<hkaiser> sure
<jbjnr> I need to fix mine so that it 'always' outperforms the other - currently there are some combinations of params that are worse
<hkaiser> k
<jbjnr> which is why I say that I don't know what I'm doing. The single node diffs should usually be larger
<jbjnr> (also - not all codes use high priority tasks the way this matrix stuff does, so other schedulers might be appropriate)
<patg[[w]]> jbjnr: I'd be interested in your graphs
<jbjnr> patg[[w]]: I'll make sure you see them if and when I do them
<patg[[w]]> If and when???
vamatya_ has joined #ste||ar
patg[[w]] has quit [Quit: Leaving]
pat[[w]] has joined #ste||ar
<jbjnr> it takes some work!
<hkaiser> jbjnr: do you have to stick with mpi for this?
<hkaiser> can't you additionally use your pp?
<pat[[w]]> jbjnr: what is the status of your pp?
<jbjnr> hkaiser: cscs is not interested in my PP and I am under orders to make HPX work with MPI - hence the new pools and RP work
<hkaiser> understand
<hkaiser> so heller will need to fix those
<jbjnr> pat[[w]]: the PP should work ok, can still produce lockups we think at high ranks and intensive thread counts, but not sure - not tested recently
<jbjnr> hkaiser: I'm still going to work on the PP, just because cscs says no, that won't stop me
<jbjnr> they are wrong about mpi+x
<hkaiser> lol
<hkaiser> sure they are
<jbjnr> when they realize they were wrong, I'll be there to save them!
<hkaiser> jbjnr: we've got the first small chunk of the big project, btw
<jbjnr> yay \o/
<jbjnr> can I tell people?
<jbjnr> is there an "announcement"
<jbjnr> thanks. small $$$ though
<hkaiser> as said small chunk
<hkaiser> keeps us afloat, though, it's just for one year - so not too bad
vamatya_ has quit [Ping timeout: 240 seconds]
pat[[w]] has quit [Quit: Leaving]
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
<jbjnr> hkaiser: looks like I screwed up and used some of the new scheduler (but no mpi pool) numbers in the plot for the graph in the old hpx scheduler line, so that's why they seem almost the same.
<jbjnr> (^just fyi)
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
<hkaiser> jbjnr: ok
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev_ has quit [Ping timeout: 246 seconds]
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
thundergroudon[m has joined #ste||ar
bikineev_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev_ has quit [Read error: Connection reset by peer]
bikineev_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
taeguk[m] has joined #ste||ar
pree_ has quit [Quit: AaBbCc]
bikineev_ has quit [Ping timeout: 246 seconds]
bikineev has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner: https://git.io/v7onN
<github> hpx/resource_partitioner cc295bf Hartmut Kaiser: Enable over-subscription of pus in resource_partitioner...
Reazul has quit [*.net *.split]
Reazul has joined #ste||ar
eschnett has quit [Quit: eschnett]
mcopik has quit [Ping timeout: 240 seconds]