aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
gedaj has joined #ste||ar
K-ballo has joined #ste||ar
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo has joined #ste||ar
gedaj has quit [Remote host closed the connection]
gedaj has joined #ste||ar
Guest49917 has joined #ste||ar
Guest49917 is now known as ct_
hkaiser has quit [Quit: bye]
EverYoung has joined #ste||ar
<github>
[hpx] K-ballo force-pushed fix-spreading from 1b2db5d to 9b049f8: https://git.io/vd9pG
gedaj_ has quit [Read error: Connection reset by peer]
gedaj_ has joined #ste||ar
<jbjnr>
msimberg: in answer to another question you asked me - but I don't remember where/when. Yes. In the ideal world, the timer_pool and io_pool would be set up by the RP at start time, and could be accessed by a pool_executor. but there is one subtle difference - they can run os threads instead of hpx threads and so they would be scheduled differently - also they are not necessarily bound to cores...
<jbjnr>
...in the same way that hpx thread pools usually are - they just ask for os threads and use anything available without binding and could move around if the OS needed/wanted to.
<jbjnr>
they are 'non critical'
<jbjnr>
and mostly sleeping
<hkaiser>
jbjnr: should work, yes
gedaj_ has quit [Remote host closed the connection]
gedaj_ has joined #ste||ar
<hkaiser>
jbjnr: I already created the io_service_thread_pool.hpp just for that - adapting those pools to the rp
gedaj_ has quit [Read error: Connection reset by peer]
gedaj_ has joined #ste||ar
<jbjnr>
hkaiser: cool - I will take a look
<jbjnr>
is it already merged to master or in a branch somewhere
<jbjnr>
hkaiser: 3pm swiss time tomorrow? (2pm portugal time), not sure what time that would be in LSU - err 8am?
<hkaiser>
it's on master for weeks now
<hkaiser>
jbjnr: have you moved off daylight saving time yet?
<jbjnr>
sun 29th is when the clocks change in europe, so no, not yet
<msimberg>
hkaiser, jbjnr: sorry went for lunch
<hkaiser>
jbjnr: ok, tomorrow morning at 8am
<msimberg>
also jbjnr that answers my question, thanks
<msimberg>
hkaiser: is it really 5:30 am for you now?
<hkaiser>
yes
<jbjnr>
msimberg: he's a workaholic
<msimberg>
do you two have each other's skype information? my username is mikaelsimberg...
<hkaiser>
no, I'm terribly sick and can't sleep
<msimberg>
early bird sounds nicer
<jbjnr>
sorry to hear that.
<jbjnr>
get well soon
<hkaiser>
using you guys as a distraction ;)
<msimberg>
indeed, get well soon
<jbjnr>
flu type sick, or something worse?
<hkaiser>
flu sick
<jbjnr>
good. (that it's not worse)
<jbjnr>
(too many people dropping dead around me at the moment)
<msimberg>
well if the sickness gets worse (hopefully not) we can postpone
<hkaiser>
msimberg: I hope it will get better soon - let's plan for it
<hkaiser>
jbjnr: yah, I'm going to drop dead soon anyways... - get ready ;)
<hkaiser>
50 years from today everything will be over for sure
<github>
[hpx] hkaiser created fixing_2959 (+1 new commit): https://git.io/vdHEK
<github>
hpx/fixing_2959 adcd1df Hartmut Kaiser: Fixing a couple of held locks during exception handling...
<github>
[hpx] hkaiser opened pull request #2966: Fixing a couple of held locks during exception handling (master...fixing_2959) https://git.io/vdHEX
<jbjnr>
hkaiser: I've got problems with my guided_pool_executor - some executors have async_execute marked as const others not.
<jbjnr>
is there a rule for them - should they all be const?
<jbjnr>
heller: AC2017 acceptance was 17% - unbelievable!
<heller>
jbjnr: nice!
<jbjnr>
still crap though
eschnett has joined #ste||ar
<jbjnr>
makes me feel slightly better
<hkaiser>
jbjnr: they can be anything, const, non-const, could even be static - it shouldn't matter for you
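A minimal sketch of why the qualifier should not matter to generic calling code, assuming the caller only ever writes `exec.async_execute(f)`. The three executor structs and `run_on` are hypothetical stand-ins in plain C++, not HPX types, and here `async_execute` simply runs the callable synchronously:

```cpp
#include <utility>

// Hypothetical stand-ins for executors; these are not HPX types.
// Each one just invokes the callable directly and returns its result.
struct const_exec
{
    template <typename F>
    auto async_execute(F&& f) const
    {
        return std::forward<F>(f)();
    }
};

struct mutable_exec
{
    int calls = 0;    // state that a non-const member may modify

    template <typename F>
    auto async_execute(F&& f)
    {
        ++calls;
        return std::forward<F>(f)();
    }
};

struct static_exec
{
    template <typename F>
    static auto async_execute(F&& f)
    {
        return std::forward<F>(f)();
    }
};

// Generic caller: the same member-call syntax compiles for a const,
// a non-const, and a static async_execute.
template <typename Executor, typename F>
auto run_on(Executor&& exec, F&& f)
{
    return std::forward<Executor>(exec).async_execute(std::forward<F>(f));
}
```

One caveat of this sketch: a non-const `async_execute` cannot be called through a const reference to the executor, so code that stores or passes the executor as const would only accept the const or static variants.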
<hkaiser>
do you directly access async_execute of your executor?
<hkaiser>
we should soon hear from ASPLOS
<jbjnr>
I call " auto new_future = gf2.then(guided_cont_exec, [](hpx::future<double> &&df) { lambda in here" - but I get
<msimberg>
(and possibly the other checks of the same kind in that file)
<msimberg>
why does it not wait for the thread_count to be 0?
<hkaiser>
because the thread executing this code counts as well
<hkaiser>
(that's the '1 +')
<hkaiser>
and the hpx::get_os_thread_count() accounts for the background threads each of the scheduler threads maintains
<hkaiser>
msimberg: ^^
<msimberg>
hkaiser: tm.get_thread_count() returns the number of hpx threads in each pool (scheduler), correct?
<hkaiser>
the overall count of active (non-terminated) hpx threads
<hkaiser>
across all pools
<hkaiser>
so using hpx::get_os_thread_count() is probably wrong, even
<msimberg>
right, across all pools
<hkaiser>
not sure if it returns the number of active scheduler (os-threads) across all pools
<hkaiser>
might need fixing
<hkaiser>
heller: could that explain your hangs during shutdown?
<msimberg>
looks like it's the number of os threads in the config
<hkaiser>
nod
<msimberg>
so exactly what you said
<hkaiser>
msimberg: first task for you: fix this ;)
ct_ has quit [Ping timeout: 240 seconds]
<msimberg>
hkaiser: if you give me enough time...
<heller>
hkaiser: yes, that is the reason. I tried to get rid of the background thread, but not successfully
<hkaiser>
sure, as much as you need - this would force you to get into things quickly
<hkaiser>
heller: wouldn't it be enough to always know how many scheduler threads are active, or even how many background threads are active?
<msimberg>
let's talk tomorrow :)
<hkaiser>
ok
<heller>
hkaiser: yes, there is another problem though, when you want to remove a PU, you have to kill its background thread, otherwise it won't get down
<hkaiser>
hmm, that has worked before
<hkaiser>
just exit the loop
<hkaiser>
we don't have to 'kill' threads during normal shutdown, do we?
<heller>
no
<heller>
but the dynamic removal/addition has impact on this, as we use the same mechanism there
<heller>
it's all mixed up in strange ways
<hkaiser>
ok
<heller>
I can't really tell what's going on ...
<hkaiser>
I don't see this yet, but you have looked much closer into this lately
<heller>
yeah, it's way too complicated ...
<heller>
I have no clear understanding of it either
<hkaiser>
ok
jaafar has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 252 seconds]
jaafar has quit [Ping timeout: 255 seconds]
<jbjnr>
"heller: I have no clear understanding of it either" - there's nothing like professional software development and standards!
<hkaiser>
jbjnr: if we knew what we were doing it would be called research
<hkaiser>
would not*
<jbjnr>
all the code around the new RP stuff needs to be cleaned up and looked at carefully.
<jbjnr>
I suggest msimberg does it :)
<heller>
I second that :P
<jbjnr>
carried then!
<jbjnr>
since he has to fix thread pool suspend/resume anyway, it all fits nicely.
<hkaiser>
msimberg: heh - that's called 'crossed the hallway at the wrong moment' ...
<hkaiser>
;)
<jbjnr>
I'd quite like to see the threadmanager class somehow merged or integrated a bit with RP. (just my own personal wish).
<hkaiser>
jbjnr: those stand for different things
<hkaiser>
let's keep them separate for now
<jbjnr>
yeah. just don't like quite a lot of the api/code duplication that I see. but just a cleanup will be enough
<hkaiser>
ok
<msimberg>
I feel the pressure rising...
* zao
pokes relief holes in msimberg
<msimberg>
much better!
<msimberg>
hkaiser: your fix for the ini thing works very nicely btw, thank you!
eschnett has quit [Quit: eschnett]
<jbjnr>
msimberg: the secret is to know how to ask the right questions - what you do is say "hkaiser - if I wanted to change this to that, would this other thing work?" then he says, no no, you want to do it like this, and then he does it for you.
<jbjnr>
(same rule applies to heller and to me if we are not too busy.)
aserio has joined #ste||ar
zbyerly__ has quit [Ping timeout: 255 seconds]
<zao>
Nerd sniping, eh?
eschnett has joined #ste||ar
<aserio>
+1
K-ballo has joined #ste||ar
<hkaiser>
msimberg: pls add a note to the ticket for the records
<github>
[hpx] K-ballo force-pushed fix-spreading from 9b049f8 to 21d08b6: https://git.io/vd9pG
<heller>
didn't we have hpx::lcos::local::static once?
<hkaiser>
sure we do
<heller>
where?
<heller>
I only see reinitializable_static
<K-ballo>
there's util::static
<heller>
yeah, I need hpx locks
<heller>
mutexes
<hkaiser>
heller: static_ nowadays uses thread_local anyways, so why bother?
<heller>
hkaiser: I get deadlocks with static because I need to block in the constructor ... :/
<hkaiser>
ahh no, scratch that
<aserio>
hkaiser: see pm
<K-ballo>
local::static would use hpx's call_once
<hkaiser>
right
<heller>
yeah, that's what I want ...
<heller>
if I use static foo f; and f's ctor blocks, I am screwed
<heller>
if the access is happening very frequently and it's not ready yet
<heller>
and from a lot of HPX threads
<hkaiser>
heller: what is it waiting for?
<heller>
basename registration
<hkaiser>
does that give you a future?
<hkaiser>
doesn't*
<heller>
sure
<hkaiser>
well, you know better - just create a lcos::local::static_, then
<heller>
I might be able to get rid of waiting in this ctor
<heller>
might be an idea eventually
<heller>
reinitializable_static is good enough though
<heller>
if it would work...
<heller>
I'll make the ctor nonblocking ...
<hkaiser>
store the future and append a continuation
<heller>
yeah, i'll try that
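A rough sketch of the "store the future" half of that suggestion, written in plain standard C++ rather than HPX. `registered_name` is a hypothetical type, and `std::shared_future` is used as a stand-in; standard futures have no `.then`, so attaching a continuation is not shown here:

```cpp
#include <chrono>
#include <future>
#include <string>

// Hypothetical type for illustration; not an HPX class.
class registered_name
{
public:
    // The constructor only stores the future; it never waits on it,
    // so constructing the object (e.g. as a static) cannot block.
    explicit registered_name(std::future<std::string> f)
      : name_(f.share())
    {
    }

    // Non-blocking check whether the value has arrived yet.
    bool ready() const
    {
        return name_.wait_for(std::chrono::seconds(0)) ==
            std::future_status::ready;
    }

    // Blocks only here, at the point of use, until the value is set.
    std::string const& get() const
    {
        return name_.get();
    }

private:
    std::shared_future<std::string> name_;
};
```

The design choice is to move the wait from construction to first use: callers that arrive before the value is ready can poll `ready()` or block in `get()`, but initialization of the object itself completes immediately.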
<parsa[w]>
how does one get the current type of the data in a std::variant?
<K-ballo>
type? it's a runtime property
<K-ballo>
there's .index() for the active alternative
<hkaiser>
parsa[w]: or get_if<type>() if you want to check for a particular type
<parsa[w]>
hkaiser: would it throw?
<hkaiser>
no, it returns pointers
<heller>
hkaiser: splendid. By far the better solution ;)
<hkaiser>
could be nullptr
* jbjnr
still can't get his async continuation test to compile. grrrrr.
<hkaiser>
I offered to help _ pfff
<parsa[w]>
just to check? so say a .get<double> == nullptr is true if there's an int inside
<hkaiser>
int* p = get_if<int>(v); if (p != nullptr) ...
<hkaiser>
int* p = get_if<int>(&v); if (p != nullptr) ...
<parsa[w]>
that worked. thanks
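For reference, the two standard queries mentioned above, `index()` and `std::get_if`, can be sketched as follows. The alias `value_type` and the helper names are made up for illustration; only `std::variant`, `index()` and `std::get_if` come from the standard library (C++17):

```cpp
#include <cstddef>
#include <string>
#include <variant>

// Hypothetical alias for illustration; not taken from the discussion.
using value_type = std::variant<int, double, std::string>;

// Returns the zero-based position of the active alternative
// (0 for int, 1 for double, 2 for std::string in value_type).
std::size_t active_index(value_type const& v)
{
    return v.index();
}

// Returns true if v currently holds a double. std::get_if takes a
// pointer to the variant and yields nullptr, rather than throwing,
// when the requested alternative is not the active one.
bool holds_double(value_type const& v)
{
    double const* p = std::get_if<double>(&v);
    return p != nullptr;
}

// Returns the stored int, or fallback if another alternative is active.
int int_or(value_type const& v, int fallback)
{
    if (int const* p = std::get_if<int>(&v))
        return *p;
    return fallback;
}
```

With an `int` stored, `holds_double` returns false and `int_or` returns the stored value; after assigning a `double`, `active_index` reports 1 and `int_or` returns the fallback.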
<diehlpk_work>
hkaiser, heller, zbyerly When do we plan to skype today?
<hkaiser>
don't know - have we scheduled it already?
aserio has quit [Remote host closed the connection]
aserio has joined #ste||ar
<diehlpk_work>
hkaiser, No, there was some confusion with time zones
zbyerly__ has joined #ste||ar
<diehlpk_work>
I just know that we have to submit the final version this Friday
<aserio>
wash[m]: Will you be joining us
<hkaiser>
nod, zach promised to make that happen
<diehlpk_work>
zbyerly is changing the template and has addressed some of the remarks
<diehlpk_work>
For one of them he would need hkaiser's help
<hkaiser>
he didn't say anything when I talked to him yesterday
<jbjnr>
hkaiser: the problem seems to be that future 1 returns a double and the continuation is not getting a double, but a boost::intrusive_ptr<hpx::lcos::detail::continuation<hpx::lcos::future<double>, hpx::util::detail::bound<hpx::util::detail::one_shot_wrapper<hpx_main(boost::program_options::variables_map&)::<lambda(hpx::lcos::future<double>&&)> >(const hpx::util::detail::placeholder<1>&)>, double> >
<jbjnr>
what's going on there? it should have been 'unwrapped' by my unwrapping call.
<hkaiser>
jbjnr: difficult to tell without seeing the code ;)
<hkaiser>
or a minimal self-contained test case
<diehlpk_work>
hkaiser, Great to see developers from across the world involved in this effort. More than the number of commits, a report or summary on what were the features requested by the users and if they were addressed or not by HPX - narrative along these lines would be more useful.
<diehlpk_work>
Can you say anything about this?
<jbjnr>
hkaiser: will try. bit complicated. gtg now doing talk in 20mins.
<hkaiser>
diehlpk_work: not sure what you're referring to
<diehlpk_work>
hkaiser, OpenSuCo paper
<diehlpk_work>
One of the reviewers asked this
<zbyerly__>
hkaiser, for example, "hey I need migration for my application"
<zbyerly__>
hkaiser, "okay, sure, we will put that into HPX"
<diehlpk_work>
I think they want to know if people request features or if they contribute to hpx
<diehlpk_work>
and how we handle these requests
<hkaiser>
asked what?
<hkaiser>
guys, please stop talking in riddles, I have no idea what's going on
<zbyerly__>
hkaiser, one of the reviewers of the OpenSuCo paper asked: "Great to see developers from across the world involved in this effort. More than the number of commits, a report or summary on what were the features requested by the users and if they were addressed or not by HPX - narrative along these lines would be more useful."
<hkaiser>
uhh
<hkaiser>
should we list the closed tickets?
<zbyerly__>
hkaiser, so diehlpk_work and I were wondering if you have any examples of features that users asked for
<hkaiser>
I see
<hkaiser>
let me think
<hkaiser>
zbyerly__: flexible but standards conforming parallel algorithms
<hkaiser>
IBVerbs and libfabrics parcelports
<hkaiser>
thread priorities were a feature many people asked for
<zbyerly__>
hkaiser, that's perfect
<zbyerly__>
hkaiser, I will write a paragraph about that, i'll ping you here if i have any questions
<jbjnr>
how about this for a feature request. "Make the code actually work and not print 'hello world' more times than I actually asked for when I run a simple test"
<jbjnr>
and custom thread pools!
<hkaiser>
jbjnr: 'make it the best library you know which actually works'
<heller>
hkaiser: zbyerly: the ratio of closed tickets coming from non core devs vs total
<hkaiser>
that's not too high
aserio has quit [Ping timeout: 246 seconds]
<wash[m]>
Hey aserio, at the llvm dev meeting today
<wash[m]>
aserio: I should be there next week :)
rod_t has joined #ste||ar
<parsa[w]>
hkaiser: should phylanx throw an exception if someone calls matrix on a node_data with a scalar value?
<hkaiser>
parsa[w]: don't think so - did the old code throw?
<parsa[w]>
everything was a matrix in the old code
<hkaiser>
right
<parsa[w]>
we decided not to convert everything to a matrix with blaze
<hkaiser>
parsa[w]: pls don't try to implement any new functionality
<hkaiser>
parsa[w]: did we?
<parsa[w]>
blaze vectors ops require a vector, not a matrix
aserio has joined #ste||ar
<parsa[w]>
so i thought we decided we preserve the type of whatever the user gave us
<hkaiser>
yah, but couldn't you represent a vector as a matrix 1xN and use matrix().column(0) or something?
<parsa[w]>
i can do that, but it'd look awful
<hkaiser>
is this a beauty contest? ;)
<parsa[w]>
fair enough
<parsa[w]>
hkaiser: in that case, what's a move constructor that gets a vector and is supposed to store a matrix supposed to do?
<hkaiser>
if everything is a matrix then this should work, no?
david_pfander has quit [Ping timeout: 260 seconds]
<parsa[w]>
yeah but the move won't be doing anything
<parsa[w]>
i'll be copying stuff anyway
<hkaiser>
well, I hope it would move the matrix
zbyerly__ has quit [Remote host closed the connection]
<hkaiser>
parsa[w]: don't worry about this, just make it work first
<parsa[w]>
okay, but i'll be copying vectors to new matrices
<hkaiser>
nod
<hkaiser>
make a note, we'll revisit this
EverYoung has joined #ste||ar
patg has joined #ste||ar
zbyerly_ has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<Bibek>
hkaiser: I am seeing the assertion failure that I was seeing for the while primitive on CircleCI in the for_loop pull request, although I have not touched the while loop code in any way. But if I try to pull the latest master from the phylanx repo, all the tests pass including the while operation test.
K-ballo has quit [Read error: Connection reset by peer]
<hkaiser>
Bibek: the while loop is failing on your branch?
<aserio>
hkaiser: after running HPX_TEST(!phylanx::execution_tree::valid(phylanx::execution_tree::extract_literal_value(p))); I get --primitive_argument_type does not hold a literal value type: HPX(bad_parameter)
<jbjnr>
hkaiser: did we decide "yes" on a skype call tomorrow? (8am your time?)
jaafar has joined #ste||ar
jaafar has quit [Read error: Connection reset by peer]
jaafar_ has joined #ste||ar
<hkaiser>
jbjnr: I think so, yes
<hkaiser>
aserio: yah
<hkaiser>
makes sense - your if() works as expected ;)
eschnett has quit [Quit: eschnett]
Aalice has joined #ste||ar
<Aalice>
@hkaiser, are you available for a phone call?
aserio has quit [Quit: aserio]
<hkaiser>
Aalice: sure
<hkaiser>
Aalice see pm, pls
<parsa[w]>
hkaiser: ping
jaafar_ has quit [Remote host closed the connection]
akheir has quit [Remote host closed the connection]
<hkaiser>
ok, should I just build it to see the problem?
<jbjnr>
if you checkout the branch, run cmake then make guided_pool_test_exe to see the problem
<jbjnr>
that'd be great thanks
<jbjnr>
I must sign off in a mo unfortunately. Don't worry about this today, but if you have time tomorrow etc .... I will spend another day poking around to see if I can work out what's wrong
jbjnr has quit [Quit: ChatZilla 0.9.93 [Firefox 56.0.1/20171002220106]]
rod_t has left #ste||ar [#ste||ar]
jbjnr_ has joined #ste||ar
<jbjnr_>
rats. I hit the wrong key and disconnected
jbjnr_ has quit [Client Quit]
jbjnr has joined #ste||ar
<jbjnr>
grrr..
<jbjnr>
the numa hint function is called with 2 args instead of 1, and it seems like the continuation args are munged somewhere
<jbjnr>
note that I may have left mistakes in the code as I was changing bits to experiment with different errors, but commenting out the numa_function call and using a const int instead compiles ok.