aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk has quit [Ping timeout: 264 seconds]
diehlpk has joined #ste||ar
gedaj has joined #ste||ar
parsa has joined #ste||ar
parsa| has joined #ste||ar
parsa has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
parsa| has quit [Ping timeout: 248 seconds]
gedaj has quit [Remote host closed the connection]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
hkaiser has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
jaafar has joined #ste||ar
gedaj has quit [Read error: Connection reset by peer]
gedaj_ has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has quit [Ping timeout: 252 seconds]
<msimberg>
heller: I suppose the throttle test worked at some point? it hangs already removing the first pu, nothing to do with shutdown
<heller>
msimberg: yes, it hangs when trying to remove the first PU
<heller>
msimberg: the reason it hangs is that it tries to shutdown the specific PU
<heller>
which is the same as is happening during shutdown
<msimberg>
ah, I misunderstood your discussions then
<heller>
trying to change it to make it work led to the problems at shutdown
<heller>
the shutdown problems should be fixed at the moment
<msimberg>
I see
<msimberg>
heller: throttle test should work on master?
<heller>
no
<heller>
that's the problem ;)
<msimberg>
ok, I'll continue looking into it, at least we're talking about the same thing :)
<msimberg>
thanks
<msimberg>
heller: tried address sanitizer on some hpx code
<msimberg>
is this a false positive?
<msimberg>
Direct leak of 200 byte(s) in 200 object(s) allocated from:
<msimberg>
#0 0x7f0a4f06e532 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99532)
<msimberg>
#1 0x7f0a4e3190e4 in hpx::get_thread_itt_domain() /home/simbergm/src/hpx/src/runtime.cpp:1243
<heller>
no it's not a false positive
<heller>
the thread local never gets freed
<heller>
i ran into the same problem recently
<msimberg>
ok, should I file an issue? there doesn't seem to be anything on gh
<heller>
no there isn't
<heller>
it is kind of a false positive, since all memory gets released at program end
<heller>
and the thread local is supposed to live that long
david_pfander has joined #ste||ar
<msimberg>
ok
<msimberg>
heller: does that also apply when you're initializing hpx multiple times? then I suppose it's a proper leak...
<heller>
hmm
<heller>
good question
<msimberg>
I think I'll open an issue :)
<heller>
msimberg: you can even fix it
<msimberg>
heller: yeah definitely, I'll try to do that
<heller>
msimberg: in src/runtime.cpp add a reset_itt_domain() call to reset the thread specific pointer. In runtime/threads/detail/scheduling_loop.hpp, call this reset function once the scheduling loop exits
<msimberg>
cleanup_terminated(delete_all=true) returns false as long as there is at least one hpx thread on any scheduler pu and hpx_main is still running
<msimberg>
why should all scheduler queues be cleaned up and not only the pu in which the scheduling loop is running?
<heller>
msimberg: correct. I tried to change this, this led to all kinds of problems
<heller>
msimberg: I think we should treat shutdown and suspending a core differently
<hkaiser>
msimberg: really? I thought cleanup_terminated was core-specific
<hkaiser>
anyways, gtg - will look later
hkaiser has quit [Quit: bye]
<msimberg>
hmm, okay
<msimberg>
heller: suspending is quite different from the current remove_processing_unit... should the remove_processing_unit ever work other than at shutdown?
<heller>
msimberg: currently, I don't think so
<heller>
the throttle test should really just suspend a core
<msimberg>
right, and it just happens to use remove_processing_unit because that already existed?
<msimberg>
heller: and suspending or removing pus dynamically must rely on work stealing, no? what is supposed to happen if you try to remove or suspend the thread you're running on?
<heller>
yes, I thought it might have been a good idea to use the already existing PU
<msimberg>
and what was the problem if you try to only do cleanup_terminated on the one pu rather than all? hangs in other places?
<heller>
what I already implemented is that suspending a thread you are currently running on should work
<heller>
also scheduling of new threads to the suspended queue is avoided
<heller>
yeah, the biggest problem was that I either had to relax shutdown conditions
<heller>
or that the background thread wasn't shut down correctly
<msimberg>
heller: ok
<msimberg>
but without work stealing it won't work?
<heller>
it will work without work stealing
<msimberg>
hmm, okay
<msimberg>
how?
<msimberg>
how does the suspending thread get moved to another pu?
<zao>
« Best MSVC flag no one knows about: /d2cgsummary. Tells you what was slow to compile! »
<github>
[hpx] hkaiser created local_new_fallback (+1 new commit): https://git.io/vdxxb
<github>
hpx/local_new_fallback 548a22e Hartmut Kaiser: Fall back to creating local components using local_new...
<github>
[hpx] hkaiser opened pull request #2971: Fall back to creating local components using local_new (master...local_new_fallback) https://git.io/vdxxx
<github>
[hpx] hkaiser closed pull request #2920: Adding test to trigger problem reported in #2916 (master...fixing_2916) https://git.io/vdZ8P
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 252 seconds]
msimberg has joined #ste||ar
EverYoung has joined #ste||ar
patg[[w]] has quit [Quit: Leaving]
msimberg has quit [Ping timeout: 258 seconds]
msimberg has joined #ste||ar
<jbjnr>
K-ballo: "I believe I misunderstood what your issue was" yes, probably - so the new question would be - do you know how I can get the args from the continuation in such a way that's compatible with the normal async (or hkaiser if you're there)
<jbjnr>
and sorry - fell asleep last night unexpectedly early.
<jbjnr>
(too much walking around castles and stuff.)
hkaiser has quit [Quit: bye]
<K-ballo>
jbjnr: no idea, as far as I can tell what you are seeing is by design
<jbjnr>
hmmm. ok - then I'll give up waiting for you to fix it for me and start looking for a solution myself!
<K-ballo>
yeah, or discuss it with hkaiser, he knows the design
<K-ballo>
as far as I can tell, it's not a bug, so I won't have solutions for you
<K-ballo>
we just so happen to be seeing similar failures caused by random bugs due to ... whatever it is
<jbjnr>
ok
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
<zao>
K-ballo: Nope.
<zao>
I'm still fiddling around with my soak testing box. Turns out that putting stuff in a database is hard :)
eschnett has quit [Quit: eschnett]
parsa has joined #ste||ar
<heller>
zao: tell me once you have a satisfying scheme ;)
<zao>
Very first thing I try with hoisting common data out of a table is "well, this is a long-standing problem people have with postgresql, this is the hack used nowadays":D
eschnett has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has joined #ste||ar
<zao>
Also trying to figure out what information in Test.xml is actually interesting to keep over time.
<jbjnr>
hkaiser: since the example I was playing with is not "wrong" - can you point me to a method I can use to extract the correct args from the continuation case of async - that way I can forward them on to the numa hint function.
<jbjnr>
is there an 'unwrap' type anywhere that can give me what I want?
<hkaiser>
decltype(unwrapped(...))
<hkaiser>
unwrapping(), that is
<hkaiser>
or unwrap() ?
<hkaiser>
;)
<hkaiser>
the correct arguments for a continuation are just the future the continuation was attached to
<jbjnr>
I think we've established that those are not what are passed into async_execute though.
<jbjnr>
I suspect I need to have a look a bit deeper into deferred_call and related types
<hkaiser>
jbjnr: that is exactly the problem, I think
<hkaiser>
do you have a small use case now?
<hkaiser>
how about the small example I posted the other day
<hkaiser>
?
<jbjnr>
well the simple test from yesterday highlights what I see and all I want is simple Ts... args I can forward to the hint function
<hkaiser>
jbjnr: so the example you posted yesterday shows the problem you're having?
<hkaiser>
jbjnr: ahh, I start to understand what your problem is
<jbjnr>
hurrah!
<jbjnr>
both futures factory and async are able to cope with these args, so I guess I just need to poke around in that code and see if I can overload things to get what I want.
<hkaiser>
jbjnr: give me a sec, you might need to implement a then_execute on your executor, but let me play with this
<hkaiser>
jbjnr: the async_execute on your executor is not executing the original function, but a wrapper function which first calls your function and then does additional things (makes the future ready etc.)
<jbjnr>
ah yes. that's what I must be missing
<jbjnr>
^ yes
<jbjnr>
I will grep then_execute
<hkaiser>
and the original arguments are bound somewhere in there
<hkaiser>
jbjnr: I just looked we have no example for then_execute :/
<jbjnr>
now I know where to look though. it should help
<jbjnr>
packaged_continuation and then_execute. I will dig around.
<jbjnr>
Currently on vacation, so not making a lot of progress. Back on Thurs.
<hkaiser>
then_execute should receive your original arguments, but it also has to handle the actual 'then' aspect, i.e. delay things until the 'predecessor' future has become ready
<hkaiser>
otoh, this shouldn't be hard as you have to call dataflow anyways
<jbjnr>
thanks again. I will spend time playing with that tomorrow. should have most of the evening to look at it.
<hkaiser>
sorry for me not getting what your problem was
<jbjnr>
no worries. I've been away anyway, so it had to wait a few days in any case.
<jbjnr>
terrible conference btw
<jbjnr>
good excuse for a vacation though.
<zao>
"LOGGING LIB internal error - should NEVER happen."