aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk has quit [Ping timeout: 264 seconds]
diehlpk has joined #ste||ar
gedaj has joined #ste||ar
parsa has joined #ste||ar
parsa| has joined #ste||ar
parsa has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
parsa| has quit [Ping timeout: 248 seconds]
gedaj has quit [Remote host closed the connection]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
hkaiser has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
jaafar has joined #ste||ar
gedaj has quit [Read error: Connection reset by peer]
gedaj_ has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has quit [Ping timeout: 252 seconds]
<msimberg> heller: I suppose the throttle test worked at some point? it hangs already removing the first pu, nothing to do with shutdown
<heller> msimberg: yes, it hangs when trying to remove the first PU
<heller> msimberg: the reason it hangs is that it tries to shutdown the specific PU
<heller> which is the same as is happening during shutdown
<msimberg> ah, I misunderstood your discussions then
<heller> trying to change it to make it work, led to the problems at shutdown
<heller> the shutdown problems should be fixed at the moment
<msimberg> I see
<msimberg> heller: throttle test should work on master?
<heller> no
<heller> that's the problem ;)
<msimberg> ok, I'll continue looking into it, at least we're talking about the same thing :)
<msimberg> thanks
<msimberg> heller: tried address sanitizer on some hpx code
<msimberg> is this a false positive?
<msimberg> Direct leak of 200 byte(s) in 200 object(s) allocated from:
<msimberg> #0 0x7f0a4f06e532 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99532)
<msimberg> #1 0x7f0a4e3190e4 in hpx::get_thread_itt_domain() /home/simbergm/src/hpx/src/runtime.cpp:1243
<heller> no it's not a false positive
<heller> the thread local never gets freed
<heller> i ran into the same problem recently
<msimberg> ok, should I file an issue? there doesn't seem to be anything on gh
<heller> no there isn't
<heller> it is kind of a false positive, since all memory gets released at program end
<heller> and the thread local is supposed to live that long
david_pfander has joined #ste||ar
<msimberg> ok
<msimberg> heller: does that also apply when you're initializing hpx multiple times? then I suppose it's a proper leak...
<heller> hmm
<heller> good question
<msimberg> I think I'll open an issue :)
<heller> msimberg: you can even fix it
<msimberg> heller: yeah definitely, I'll try to do that
<heller> msimberg: in src/runtime.cpp add a reset_itt_domain() call to reset the thread specific pointer. In runtime/threads/detail/scheduling_loop.hpp, call this reset function once the scheduling loop exits
<msimberg> heller: okay, thanks
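A minimal sketch of the fix heller describes, in plain standard C++ rather than HPX code. It assumes the leaked object is a lazily allocated thread-local pointer; `domain`, `get_thread_domain`, `reset_thread_domain`, `scheduling_loop` and the `live_domains` counter are hypothetical stand-ins, not the actual names or signatures in src/runtime.cpp or scheduling_loop.hpp.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the per-thread object that is never freed.
struct domain
{
    int id;
};

// Illustration only: counts thread-local objects that are currently allocated.
std::atomic<int> live_domains{0};

// Lazily allocated thread-local pointer (assumed shape of the leak).
thread_local domain* tls_domain = nullptr;

domain& get_thread_domain()
{
    if (tls_domain == nullptr)
    {
        tls_domain = new domain{0};
        ++live_domains;
    }
    return *tls_domain;
}

// The proposed reset function: frees the thread-local object, if any.
void reset_thread_domain()
{
    if (tls_domain != nullptr)
    {
        delete tls_domain;
        tls_domain = nullptr;
        --live_domains;
    }
}

// Stand-in for a scheduling loop that touches the thread-local while it
// runs and resets it once the loop exits.
void scheduling_loop()
{
    get_thread_domain();
    // ... run work items here ...
    reset_thread_domain();
}
```

With the reset in place, running the loop repeatedly (as when the runtime is initialized more than once) leaves no object behind, which is the case msimberg raises.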
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vdAd1
<github> hpx/gh-pages 875227d StellarBot: Updating docs
K-ballo has joined #ste||ar
msimberg has quit [Ping timeout: 248 seconds]
msimberg has joined #ste||ar
hkaiser has joined #ste||ar
msimberg has quit [Ping timeout: 252 seconds]
msimberg has joined #ste||ar
parsa has joined #ste||ar
<msimberg> cleanup_terminated(delete_all=true) returns false as long as there is at least one hpx thread on any scheduler pu and hpx_main is still running
<msimberg> why should all scheduler queues be cleaned up and not only the pu in which the scheduling loop is running?
<heller> msimberg: correct. I tried to change this, this led to all kinds of problems
<heller> msimberg: I think we should treat shutdown and suspending a core differently
<hkaiser> msimberg: really? I thought cleanup_terminated was core-specific
<hkaiser> anyways, gtg - will look later
hkaiser has quit [Quit: bye]
<msimberg> hmm, okay
<msimberg> heller: suspending is quite different from the current remove_processing_unit... should the remove_processing_unit ever work other than at shutdown?
<heller> msimberg: currently, I don't think so
<heller> the throttle test should really just suspend a core
<msimberg> right, and it just happens to use remove_processing_unit because that already existed?
<msimberg> heller: and suspending or removing pus dynamically must rely on work stealing, no? what is supposed to happen if you try to remove or suspend the thread you're running on?
<heller> yes, I thought it might have been a good idea to use the already existing PU
<msimberg> and what was the problem if you try to only do cleanup_terminated on the one pu rather than all? hangs in other places?
<heller> what I already implemented is that suspending a thread you are currently running on should work
<heller> also scheduling of new threads to the suspended queue is avoided
<heller> yeah, the biggest problem was that I either had to relax the shutdown conditions
<heller> or that the background thread wasn't shut down correctly
<msimberg> heller: ok
<msimberg> but without work stealing it won't work?
<heller> it will work without work stealing
<msimberg> hmm, okay
<msimberg> how?
<msimberg> how does the suspending thread get moved to another pu?
parsa has quit [Quit: Zzzzzzzzzzzz]
<heller> msimberg: ahh, it kind of depends on thread stealing after all ;)
<msimberg> indeed
<msimberg> but then I'm not sure if suspending yourself would ever be useful...
Guest17093 has quit [Quit: This computer has gone to sleep]
<heller> sure, since we "suspended" the current thread, meaning, we won't ever reschedule it there again
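A toy, single-threaded sketch of the behaviour discussed above: a suspended PU receives no new work, and whatever is still queued on it only moves elsewhere when another PU steals it. This is not the HPX scheduler; `pu_queue`, `toy_scheduler` and their members are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-PU queue holding task ids.
struct pu_queue
{
    std::deque<int> tasks;
    bool suspended = false;
};

struct toy_scheduler
{
    std::vector<pu_queue> pus;

    explicit toy_scheduler(std::size_t n) : pus(n) {}

    // Place a task on the hinted PU, skipping suspended PUs.
    // Returns the PU index used, or pus.size() if every PU is suspended.
    std::size_t schedule(int task, std::size_t hint)
    {
        for (std::size_t i = 0; i != pus.size(); ++i)
        {
            std::size_t idx = (hint + i) % pus.size();
            if (!pus[idx].suspended)
            {
                pus[idx].tasks.push_back(task);
                return idx;
            }
        }
        return pus.size();
    }

    void suspend(std::size_t idx)
    {
        pus[idx].suspended = true;
    }

    // An active PU takes one task from some suspended PU's queue.
    // Returns true if a task was moved.
    bool steal(std::size_t thief)
    {
        if (pus[thief].suspended)
            return false;
        for (std::size_t i = 0; i != pus.size(); ++i)
        {
            if (i != thief && pus[i].suspended && !pus[i].tasks.empty())
            {
                pus[thief].tasks.push_back(pus[i].tasks.back());
                pus[i].tasks.pop_back();
                return true;
            }
        }
        return false;
    }
};
```

Without the `steal` step, the task left on the suspended queue would never run, which is the dependency on stealing that heller concedes.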
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
msimberg has quit [Ping timeout: 255 seconds]
hkaiser has joined #ste||ar
rod_t has quit [Remote host closed the connection]
rod_t has joined #ste||ar
bobakk3r has joined #ste||ar
bobakk3r has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
bobakk3r has joined #ste||ar
bobakk3r has quit [Remote host closed the connection]
rod_t has left #ste||ar [#ste||ar]
EverYoung has joined #ste||ar
david_pfander has quit [Ping timeout: 252 seconds]
<zao> « Best MSVC flag no one knows about: /d2cgsummary. Tells you what was slow to compile! »
<github> [hpx] hkaiser created local_new_fallback (+1 new commit): https://git.io/vdxxb
<github> hpx/local_new_fallback 548a22e Hartmut Kaiser: Fall back to creating local components using local_new...
<github> [hpx] hkaiser opened pull request #2971: Fall back to creating local components using local_new (master...local_new_fallback) https://git.io/vdxxx
<github> [hpx] hkaiser closed pull request #2920: Adding test to trigger problem reported in #2916 (master...fixing_2916) https://git.io/vdZ8P
<github> [hpx] hkaiser deleted fixing_2916 at 30a4c49: https://git.io/vdxpJ
rod_t has joined #ste||ar
<K-ballo> zao: did you try it on hpx yet?
patg[[w]] has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 252 seconds]
msimberg has joined #ste||ar
EverYoung has joined #ste||ar
patg[[w]] has quit [Quit: Leaving]
msimberg has quit [Ping timeout: 258 seconds]
msimberg has joined #ste||ar
<jbjnr> K-ballo: "I believe I misunderstood what your issue was" yes. probably - so the new question would be - do you know how I can get the args from the continuation in such a way that's compatible with the normal async (or hkaiser if you're there)
<jbjnr> and sorry - fell asleep last night unexpectedly early.
<jbjnr> (too much walking around castles and stuff.)
hkaiser has quit [Quit: bye]
<K-ballo> jbjnr: no idea, as far as I can tell what you are seeing is by design
<jbjnr> hmmm. ok - then I'll give up waiting for you to fix it for me and start looking for a solution myself!
<K-ballo> yeah, or discuss it with hkaiser, he knows the design
<K-ballo> as far as I can tell, it's not a bug, so I won't have solutions for you
<K-ballo> we just so happen to be seeing similar failures caused by random bugs due to ... whatever it is
<jbjnr> ok
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
<zao> K-ballo: Nope.
<zao> I'm still fiddling around with my soak testing box. Turns out that putting stuff in a database is hard :)
eschnett has quit [Quit: eschnett]
parsa has joined #ste||ar
<heller> zao: tell me once you have a satisfying scheme ;)
<zao> Very first thing I try with hoisting common data out of a table is "well, this is a long-standing problem people have with postgresql, this is the hack used nowadays":D
eschnett has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has joined #ste||ar
<zao> Also trying to figure out what information in Test.xml is actually interesting to keep over time.
<zao> Trying my best to not write a buildbot :D
<jbjnr> hkaiser: yt?
msimberg has quit [Ping timeout: 260 seconds]
<hkaiser> jbjnr: here
<jbjnr> hkaiser: since the example I was playing with is not "wrong" - can you point me to a method I can use to extract the correct args from the continuation case of async - that way I can forward them on to the numa hint function.
<jbjnr> is there an 'unwrap' type anywhere that can give me what I want?
<hkaiser> decltype(unwrapped(...))
<hkaiser> unwrapping(), that is
<hkaiser> or unwrap() ?
<hkaiser> ;)
<hkaiser> the correct arguments for a continuation are just the future the continuation was attached to
<jbjnr> I think we've established that those are not what are passed into async_execute though.
<jbjnr> I suspect I need to have a look a bit deeper into deferred_call and related types
<hkaiser> jbjnr: that is exactly the problem, I think
<hkaiser> do you have a small use case now?
<hkaiser> how about the small example I posted the other day
<hkaiser> ?
<jbjnr> well the simple test from yesterday highlights what I see and all I want is simple Ts... args I can forward to the hint function
<hkaiser> jbjnr: so the example you posted yesterday shows the problem you're having?
<jbjnr> https://gist.github.com/biddisco/712fea6d794de0419bb7c31261b416e9 - the first output is from async - it gets simple arguments - the second outputs are from the continuation - not simple arguments. I need to convert them to simple ones.
<hkaiser> ok
<jbjnr> that's the output from the simple test
<jbjnr> of yesterday
aserio has quit [Quit: aserio]
<hkaiser> jbjnr: ahh, I start to understand what your problem is
<jbjnr> hurrah!
<jbjnr> both futures factory and async are able to cope with these args, so I guess I just need to poke around in that code and see if I can overload things to get what I want.
<hkaiser> jbjnr: give me a sec, you might need to implement a then_execute on your executor, but let me play with this
<hkaiser> jbjnr: the async_execute on your executor is not executing the original function, but a wrapper function which first calls your function and then does additional things (makes the future ready etc.)
<jbjnr> ah yes. that's what I must be missing
<jbjnr> ^ yes
<jbjnr> I will grep then_execute
<hkaiser> and the original arguments are bound somewhere in there
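A small sketch of why the executor sees different arguments in the two cases, using a toy executor in standard C++ instead of HPX. It assumes the continuation path hands the executor a wrapper with the arguments already bound inside; `counting_executor`, `add`, `run_plain` and `run_wrapped` are hypothetical, and this `async_execute` runs synchronously for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical toy executor: records how many arguments it was handed.
struct counting_executor
{
    std::size_t last_arg_count = 0;

    // Runs the callable immediately; a real executor would schedule it.
    template <typename F, typename... Ts>
    auto async_execute(F&& f, Ts&&... ts)
        -> decltype(std::forward<F>(f)(std::forward<Ts>(ts)...))
    {
        last_arg_count = sizeof...(Ts);
        return std::forward<F>(f)(std::forward<Ts>(ts)...);
    }
};

int add(int a, int b)
{
    return a + b;
}

// Plain async-like case: the executor sees the function and its arguments.
int run_plain(counting_executor& exec)
{
    return exec.async_execute(add, 1, 2);
}

// Continuation-like case: the executor only sees a wrapper with the
// arguments bound inside, so there is nothing to forward to a hint function.
int run_wrapped(counting_executor& exec)
{
    int a = 1;
    int b = 2;
    auto wrapper = [a, b]() {
        int result = add(a, b);
        // A real wrapper would also do the extra work, e.g. make a future ready.
        return result;
    };
    return exec.async_execute(wrapper);
}
```

In the wrapped case the argument count seen by the executor is zero, which matches jbjnr's observation that the continuation does not deliver simple arguments.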
<hkaiser> jbjnr: I just looked, we have no example for then_execute :/
<jbjnr> now I know where to look though. it should help
<jbjnr> packaged_continuation and then_execute. I will dig around.
<jbjnr> Currently on vacation, so not making a lot of progress. Back on Thurs.
<hkaiser> jbjnr: if an executor implements then_execute, it will be invoked here: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/executors/execution.hpp#L458
<hkaiser> if it doesn't implement it, the functionality is emulated here (by creating a wrapper which is invoked by post/async_execute): https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/executors/execution.hpp#L434-L445
<jbjnr> thanks
jaafar has joined #ste||ar
<hkaiser> then_execute should receive your original arguments, but it also has to handle the actual 'then' aspect, i.e. delay things until the 'predecessor' future has become ready
<hkaiser> otoh, this shouldn't be hard as you have to call dataflow anyways
<jbjnr> thanks again. I will spend time playing with that tomorrow. should have most of the evening to look at it.
<hkaiser> sorry for me not getting what your problem was
<jbjnr> no worries. I've been away anyway, so it had to wait a few days in any case.
<jbjnr> terrible conference btw
<jbjnr> good excuse for a vacation though.
<zao> "LOGGING LIB internal error - should NEVER happen."
<zao> Running partitioned_vector_subview_test - https://gist.github.com/zao/604f3f952d03694001b1a29188d1fa2b
<zao> Something has scribbled hard into important state there :D
<zao> Only found it because the binary dung there isn't valid UTF-8.
bobakk3r has joined #ste||ar
<zao> Three megabytes of nonsense, tons of strings from what seems to be the binary.
<zao> I'm quite a number of commits behind I think, f3567170269f869b2bd2f9230e157b6810a594be
<zao> This is why we can't have nice things, like textual test output :D
patg[[w]] has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
EverYoun_ has quit [Ping timeout: 255 seconds]
parsa has joined #ste||ar
<zao> NAME USED AVAIL REFER MOUNTPOINT
<zao> stuff/postgres 640M 2.09T 640M /media/stuff/postgres
<zao> From 231M of raw XML, decoded and uncompressed.
rod_t has left #ste||ar [#ste||ar]
patg[[w]] has quit [Quit: Leaving]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYou__ has joined #ste||ar
EverYou__ has quit [Remote host closed the connection]
EverYou__ has joined #ste||ar
EverYoun_ has quit [Ping timeout: 248 seconds]