aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk has quit [Ping timeout: 264 seconds]
diehlpk has joined #ste||ar
gedaj has joined #ste||ar
parsa has joined #ste||ar
parsa| has joined #ste||ar
parsa has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
parsa| has quit [Ping timeout: 248 seconds]
gedaj has quit [Remote host closed the connection]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
hkaiser has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
jaafar has joined #ste||ar
gedaj has quit [Read error: Connection reset by peer]
gedaj_ has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has quit [Ping timeout: 252 seconds]
<msimberg>
heller: I suppose the throttle test worked at some point? it hangs already removing the first pu, nothing to do with shutdown
<heller>
msimberg: yes, it hangs when trying to remove the first PU
<heller>
msimberg: the reason it hangs is that it tries to shutdown the specific PU
<heller>
which is the same as is happening during shutdown
<msimberg>
ah, I misunderstood your discussions then
<heller>
trying to change it to make it work led to the problems at shutdown
<heller>
the shutdown problems should be fixed at the moment
<msimberg>
I see
<msimberg>
heller: throttle test should work on master?
<heller>
no
<heller>
that's the problem ;)
<msimberg>
ok, I'll continue looking into it, at least we're talking about the same thing :)
<msimberg>
thanks
<msimberg>
heller: tried address sanitizer on some hpx code
<msimberg>
is this a false positive?
<msimberg>
Direct leak of 200 byte(s) in 200 object(s) allocated from:
<msimberg>
#0 0x7f0a4f06e532 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99532)
<msimberg>
#1 0x7f0a4e3190e4 in hpx::get_thread_itt_domain() /home/simbergm/src/hpx/src/runtime.cpp:1243
<heller>
no it's not a false positive
<heller>
the thread local never gets freed
<heller>
i ran into the same problem recently
<msimberg>
ok, should I file an issue? there doesn't seem to be anything on gh
<heller>
no there isn't
<heller>
it is kind of a false positive, since all memory gets released at program end
<heller>
and the thread local is supposed to live that long
david_pfander has joined #ste||ar
<msimberg>
ok
<msimberg>
heller: does that also apply when you're initializing hpx multiple times? then I suppose it's a proper leak...
<heller>
hmm
<heller>
good question
<msimberg>
I think I'll open an issue :)
<heller>
msimberg: you can even fix it
<msimberg>
heller: yeah definitely, I'll try to do that
<heller>
msimberg: in src/runtime.cpp add a reset_itt_domain() call to reset the thread specific pointer. In runtime/threads/detail/scheduling_loop.hpp, call this reset function once the scheduling loop exits
<msimberg>
cleanup_terminated(delete_all=true) returns false as long as there is at least one hpx thread on any scheduler pu and hpx_main is still running
<msimberg>
why should all scheduler queues be cleaned up and not only the pu in which the scheduling loop is running?
<heller>
msimberg: correct. I tried to change this, this led to all kinds of problems
<heller>
msimberg: I think we should treat shutdown and suspending a core differently
<hkaiser>
msimberg: really? I thought cleanup_terminated was core-specific
<hkaiser>
anyways, gtg - will look later
hkaiser has quit [Quit: bye]
<msimberg>
hmm, okay
<msimberg>
heller: suspending is quite different from the current remove_processing_unit... should the remove_processing_unit ever work other than at shutdown?
<heller>
msimberg: currently, I don't think so
<heller>
the throttle test should really just suspend a core
<msimberg>
right, and it just happens to use remove_processing_unit because that already existed?
<msimberg>
heller: and suspending or removing pus dynamically must rely on work stealing, no? what is supposed to happen if you try to remove or suspend the thread you're running on?
<heller>
yes, I thought it might have been a good idea to use the already existing PU
<msimberg>
and what was the problem if you try to only do cleanup_terminated on the one pu rather than all? hangs in other places?
<heller>
what I already implemented is that suspending a thread you are currently running on should work
<heller>
also scheduling of new threads to the suspended queue is avoided
<heller>
yeah, the biggest problem was that I either had to relax shutdown conditions
<heller>
or that the background thread wasn't shut down correctly
<msimberg>
heller: ok
<msimberg>
but without work stealing it won't work?
<heller>
it will work without work stealing
<msimberg>
hmm, okay
<msimberg>
how?
<msimberg>
how does the suspending thread get moved to another pu?
<zao>
« Best MSVC flag no one knows about: /d2cgsummary. Tells you what was slow to compile! »
<github>
[hpx] hkaiser created local_new_fallback (+1 new commit): https://git.io/vdxxb
<github>
hpx/local_new_fallback 548a22e Hartmut Kaiser: Fall back to creating local components using local_new...
<github>
[hpx] hkaiser opened pull request #2971: Fall back to creating local components using local_new (master...local_new_fallback) https://git.io/vdxxx
<github>
[hpx] hkaiser closed pull request #2920: Adding test to trigger problem reported in #2916 (master...fixing_2916) https://git.io/vdZ8P
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 252 seconds]
msimberg has joined #ste||ar
EverYoung has joined #ste||ar
patg[[w]] has quit [Quit: Leaving]
msimberg has quit [Ping timeout: 258 seconds]
msimberg has joined #ste||ar
<jbjnr>
K-ballo: "I believe I misunderstood what your issue was" yes, probably - so the new question would be - do you know how I can get the args from the continuation in such a way that's compatible with the normal async (or hkaiser if you're there)
<jbjnr>
and sorry - fell asleep last night unexpectedly early.
<jbjnr>
(too much walking around castles and stuff.)
hkaiser has quit [Quit: bye]
<K-ballo>
jbjnr: no idea, as far as I can tell what you are seeing is by design
<jbjnr>
hmmm. ok - then I'll give up waiting for you to fix it for me and start looking for a solution myself!
<K-ballo>
yeah, or discuss it with hkaiser, he knows the design
<K-ballo>
as far as I can tell, it's not a bug, so I won't have solutions for you
<K-ballo>
we just so happen to be seeing similar failures caused by random bugs due to ... whatever it is
<jbjnr>
ok
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
<zao>
K-ballo: Nope.
<zao>
I'm still fiddling around with my soak testing box. Turns out that putting stuff in a database is hard :)
eschnett has quit [Quit: eschnett]
parsa has joined #ste||ar
<heller>
zao: tell me once you have a satisfying scheme ;)
<zao>
Very first thing I try with hoisting common data out of a table is "well, this is a long-standing problem people have with postgresql, this is the hack used nowadays":D
eschnett has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser has joined #ste||ar
<zao>
Also trying to figure out what information in Test.xml is actually interesting to keep over time.
<jbjnr>
hkaiser: since the example I was playing with is not "wrong" - can you point me to a method I can use to extract the correct args from the continuation case of async - that way I can forward them on to the numa hint function.
<jbjnr>
is there an 'unwrap' type anywhere that can give me what I want?
<hkaiser>
decltype(unwrapped(...))
<hkaiser>
unwrapping(), that is
<hkaiser>
or unwrap() ?
<hkaiser>
;)
<hkaiser>
the correct arguments for a continuation are just the future the continuation was attached to
<jbjnr>
I think we've established that those are not what are passed into async_execute though.
<jbjnr>
I suspect I need to have a look a bit deeper into deferred_call and related types
<hkaiser>
jbjnr: that is exactly the problem, I think
<hkaiser>
do you have a small use case now?
<hkaiser>
how about the small example I posted the other day
<hkaiser>
?
<jbjnr>
well the simple test from yesterday highlights what I see and all I want is simple Ts... args I can forward to the hint function
<hkaiser>
jbjnr: so the example you posted yesterday shows the problem you're having?
<hkaiser>
jbjnr: ahh, I start to understand what your problem is
<jbjnr>
hurrah!
<jbjnr>
both futures factory and async are able to cope with these args, so I guess I just need to poke around in that code and see if I can overload things to get what I want.
<hkaiser>
jbjnr: give me a sec, you might need to implement a then_execute on your executor, but let me play with this
<hkaiser>
jbjnr: the async_execute on your executor is not executing the original function, but a wrapper function which first calls your function and then does additional things (makes the future ready etc.)
<jbjnr>
ah yes. that's what I must be missing
<jbjnr>
^ yes
<jbjnr>
I will grep then_execute
<hkaiser>
and the original arguments are bound somewhere in there
<hkaiser>
jbjnr: I just looked we have no example for then_execute :/
<jbjnr>
now I know where to look though. it should help
<jbjnr>
packaged_continuation and then_execute. I will dig around.
<jbjnr>
Currently on vacation, so not making a lot of progress. Back on Thurs.
<hkaiser>
then_execute should receive your original arguments, but it also has to handle the actual 'then' aspect, i.e. delay things until the 'predecessor' future has become ready
<hkaiser>
otoh, this shouldn't be hard as you have to call dataflow anyways
<jbjnr>
thanks again. I will spend time playing with that tomorrow. should have most of the evening to look at it.
<hkaiser>
sorry for me not getting what your problem was
<jbjnr>
no worries. I've been away anyway, so it had to wait a few days in any case.
<jbjnr>
terrible conference btw
<jbjnr>
good excuse for a vacation though.
<zao>
"LOGGING LIB internal error - should NEVER happen."