hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
nanashi55 has quit [Ping timeout: 244 seconds]
nanashi55 has joined #ste||ar
K-ballo has quit [Ping timeout: 244 seconds]
K-ballo has joined #ste||ar
david_pfander has joined #ste||ar
eschnett_ has quit [Read error: Connection reset by peer]
eschnett_ has joined #ste||ar
nikunj has joined #ste||ar
ste||ar-github has joined #ste||ar
ste||ar-github has left #ste||ar [#ste||ar]
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/3a281b1bf990911a408018945a23269a247797b4
<ste||ar-github> hpx/gh-pages 3a281b1 StellarBot: Updating docs
hkaiser has joined #ste||ar
<hkaiser> heller: g'morning
<heller> hkaiser: good mornin
<hkaiser> I'd like to merge #3472 asap
<hkaiser> it fixes master
<heller> hkaiser: no way!
<heller> hkaiser: I already did it
<hkaiser> :D
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell pushed 1 new commit to master: https://github.com/STEllAR-GROUP/hpx/commit/19534c7fa32dcbf053c2d966fa2521d3f9ce6a57
<ste||ar-github> hpx/master 19534c7 Thomas Heller: Merge pull request #3472 from STEllAR-GROUP/clang_test_workaround...
ste||ar-github has left #ste||ar [#ste||ar]
<hkaiser> thanks
<heller> no problem
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell deleted clang_test_workaround at c0f1254: https://github.com/STEllAR-GROUP/hpx/commit/c0f1254
ste||ar-github has left #ste||ar [#ste||ar]
eschnett_ is now known as eschnett
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
eschnett_ has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
nikunj has quit [Remote host closed the connection]
aserio has quit [Ping timeout: 264 seconds]
<hkaiser> heller: any idea how to fix the jemalloc issue?
<hkaiser> (the hpx_option gone wrong)
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser created fixing_jemalloc_prefix (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/aa1f44bd72c2
<ste||ar-github> hpx/fixing_jemalloc_prefix aa1f44b Hartmut Kaiser: Fixing invalid cmake code if no jemalloc prefix was given
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser opened pull request #3473: Fixing invalid cmake code if no jemalloc prefix was given (master...fixing_jemalloc_prefix) https://github.com/STEllAR-GROUP/hpx/pull/3473
ste||ar-github has left #ste||ar [#ste||ar]
<hkaiser> heller: this ^^ should solve it, pls verify
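(Context for the fix above: the bug was CMake code that assumed a jemalloc prefix variable was always set. A minimal sketch of the kind of guard such a fix typically adds — the variable name `JEMALLOC_ROOT` and the directory layout here are assumptions for illustration, not taken from the actual PR:)

```cmake
# Only emit prefix-dependent code when a jemalloc prefix was actually given;
# an unguarded reference to an empty JEMALLOC_ROOT expands to invalid CMake code.
if(JEMALLOC_ROOT)
  include_directories(${JEMALLOC_ROOT}/include)
  link_directories(${JEMALLOC_ROOT}/lib)
endif()
```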
aserio has joined #ste||ar
diehlpk has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
diehlpk has joined #ste||ar
david_pfander has quit [Ping timeout: 244 seconds]
aserio has quit [Ping timeout: 260 seconds]
diehlpk has quit [Ping timeout: 252 seconds]
aserio has joined #ste||ar
<heller> hkaiser: hey
<hkaiser> hey
<heller> appear.in?
<hkaiser> heller: I'm in
<hkaiser> heller: can't hear you
diehlpk has joined #ste||ar
hkaiser has quit [Quit: bye]
<heller> for the record: The MPI parcelport on Cray machines just sucks
<heller> plain and simple...
<heller> it needs to go ASAP
<khuck> heller: is libfabric ready for prime-time?
<heller> khuck: I think so yes.
<heller> I was using it on a 9k node run recently
<aserio> heller: slow is better than broken
<heller> even the TCP parcelport is faster than my implementation
<heller> than MPI with my application
<heller> aserio: and really, I am using the libfabric parcelport exclusively on cori, no problems there
<heller> consolidating the parcelports is something for post 1.2
<heller> ha
<heller> that's a very old comment ;)
<heller> verbs should work, as well as libfabric
diehlpk has quit [Ping timeout: 272 seconds]
<heller> grr, I take that back ... just got a segfault with libfabric :/
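(The exchange above is about swapping one parcelport for another at runtime. A hypothetical sketch of how that is typically done through HPX's `hpx.parcel.*` configuration section — the exact key names, especially for libfabric, are assumptions and should be checked against the docs of your HPX version:)

```ini
[hpx.parcel]
mpi.enable = 0
libfabric.enable = 1
```

Equivalent settings can usually also be passed on the command line, e.g. `--hpx:ini=hpx.parcel.mpi.enable=0`.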
hkaiser has joined #ste||ar
eschnett_ has quit [Quit: eschnett_]
mbremer has joined #ste||ar
jbjnr has quit [Read error: Connection reset by peer]
jbjnr has joined #ste||ar
mbremer has left #ste||ar [#ste||ar]
aserio has quit [Quit: aserio]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] biddisco force-pushed guided_executor from 11940fe to a3f161f: https://github.com/STEllAR-GROUP/hpx/commits/guided_executor
<ste||ar-github> hpx/guided_executor 47697eb John Biddiscombe: Add new guided_pool_executor that passes thread_schedule_hint to the scheduler...
<ste||ar-github> hpx/guided_executor 6665463 John Biddiscombe: Add overload for pool_numa_hint that can be used with lambdas
<ste||ar-github> hpx/guided_executor 58a2cd4 John Biddiscombe: Implementing then_execute for guided executor .then continuations with numa hints
ste||ar-github has left #ste||ar [#ste||ar]
quaz0r has quit [Ping timeout: 272 seconds]
<heller> khuck: do you remember if we have our input files still lying around somewhere?
<khuck> yes
<khuck> I remember :)
<khuck> can you access /project/projectdirs/xpress/hpx-lsu-cori-II ?
<heller> yes
<khuck> check the scaling subdirectory
<khuck> I copied your scripts, but hacked many of them
<khuck> my run results are in /global/cscratch1/sd/khuck/xpress/bell
<khuck> not sure if I ever got it to run to completion at scale
<khuck> (with apex)
<heller> hmm
<heller> I'm missing the restart file for 12 LoR
<khuck> ah
<khuck> let me look
<heller> got it
<heller> well, I calculated the file size...
<heller> should be 140 gigs
<khuck> kinda big.
<khuck> I think I always copied the files from your $SCRATCH and/or project directory
quaz0r has joined #ste||ar
quaz0r has quit [Ping timeout: 252 seconds]
<khuck> @hkaiser good news, or bad news?
<khuck> hkaiser: let's try that again... would you like the good news or the bad news?
<khuck> (the bad news isn't that bad)
akheir has quit [Quit: Leaving]
<hkaiser> khuck: what's the bad news?
<khuck> HPX still calls a handful of POSIX routines from worker threads
<heller> hkaiser: text pushed
<khuck> but phylanx no longer does
<hkaiser> heller: thanks! bravo! much appreciated!
<heller> :D
<hkaiser> khuck: thanks for this - I was not aware of those
<khuck> np
<heller> 1.4 GB/s aggregated bandwidth isn't that much ;)
<hkaiser> khuck: those calls look like they happen before the runtime is up, though
<khuck> hkaiser: they were on the OS thread that was a "worker". Like I said, they are pretty minor - they hardly showed up on the profile and I had to hunt for them
<khuck> sorry, across two different threads. but workers nonetheless
<hkaiser> khuck: ok
<hkaiser> I'll try to identify them
<heller> late module loading or ini parsing, maybe?
<hkaiser> the parse_command_line should happen right at the start, no idea why it shows up as running on a worker thread
<khuck> it's also possible it happens before that thread is labeled as a "worker"
quaz0r has joined #ste||ar
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell closed pull request #3473: Fixing invalid cmake code if no jemalloc prefix was given (master...fixing_jemalloc_prefix) https://github.com/STEllAR-GROUP/hpx/pull/3473
ste||ar-github has left #ste||ar [#ste||ar]