hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
K-ballo has quit [Quit: K-ballo]
diehlpk_work_ has quit [Remote host closed the connection]
hkaiser has quit [Quit: Bye!]
zao has quit [Ping timeout: 250 seconds]
zao has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo1 is now known as K-ballo
qiu has joined #ste||ar
<jedi18[m]> Our scan partitioner's implementation is slightly different from the one described here: https://en.wikipedia.org/wiki/Prefix_sum.
<jedi18[m]> In the wikipedia implementation, all the f1 tasks are done first, then the prefix sum of the offsets is done in one go (which in our implementation are the f2 tasks), and then the f3 tasks can be run in parallel.
<jedi18[m]> The advantage of our implementation is that the f3 tasks can be started for the earlier chunks without having to wait for all f1 tasks to finish. However, we have to wait for the previous chunk's f2 and f3 tasks to finish.
<jedi18[m]> Why didn't we go for the wikipedia implementation, since that way all f3 tasks can be run in parallel? Is it so that the scan partitioner can be used by other algorithms?
<jedi18[m]> hkaiser: Should I try adding a scan partitioner tag that uses this wikipedia implementation and use that for exclusive and inclusive scan and see if that speeds up those algorithms?
<hkaiser> jedi18[m]: I'd start with implementing something outside of HPX and see where that gets us
<hkaiser> jedi18[m]: I don't know why we didn't use the wikipedia algorithm, probably just ignorance ;-)
<jedi18[m]> <hkaiser> "jedi18: I'd start with implement..." <- Ok I'll do that, thanks!
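A minimal standalone sketch of the three-phase ("wikipedia") structure discussed above, outside of HPX as hkaiser suggests. The chunking, names, and loop structure are illustrative only, not HPX's scan partitioner; it assumes n_chunks > 0 and C++17 for std::exclusive_scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Three phases over chunks of the input:
//   f1: reduce each chunk independently        (fully parallel)
//   f2: exclusive scan of the per-chunk sums   (sequential, but only n_chunks values)
//   f3: scan each chunk seeded with its offset (fully parallel again)
std::vector<int> chunked_inclusive_scan(std::vector<int> const& in, std::size_t n_chunks)
{
    std::size_t const chunk = (in.size() + n_chunks - 1) / n_chunks;
    std::vector<int> sums(n_chunks, 0);
    std::vector<int> out(in.size());

    // f1: each iteration could be launched as an independent task
    for (std::size_t c = 0; c != n_chunks; ++c)
    {
        std::size_t const b = std::min(in.size(), c * chunk);
        std::size_t const e = std::min(in.size(), b + chunk);
        sums[c] = std::accumulate(in.begin() + b, in.begin() + e, 0);
    }

    // f2: one sequential pass turning the per-chunk sums into offsets
    std::exclusive_scan(sums.begin(), sums.end(), sums.begin(), 0);

    // f3: independent again, each chunk only needs its own offset
    for (std::size_t c = 0; c != n_chunks; ++c)
    {
        std::size_t const b = std::min(in.size(), c * chunk);
        std::size_t const e = std::min(in.size(), b + chunk);
        int acc = sums[c];
        for (std::size_t i = b; i != e; ++i)
            out[i] = (acc += in[i]);
    }
    return out;
}
```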
qiu has quit [Quit: Client closed]
diehlpk_work has joined #ste||ar
<ms[m]> hkaiser, gnikunj, gonidelis, do you think you'd manage without me in the gsod meeting tomorrow? I can't make it at that time tomorrow... I could make it earlier but I don't think I'm needed
<gonidelis[m]> i could make it earlier if needed
<gonidelis[m]> dkaratza: can you?
<gnikunj[m]> ms earlier as in before 9am cdt?
<ms[m]> gnikunj is that the regular hpx meeting time? If yes, I think that might just work, but as I said I think you can do the regular time without me as well
<gnikunj[m]> I'm fine with the regular hpx meeting time but not before. Otherwise, I'll be certainly available from 9-11am cdt
FunMiles has joined #ste||ar
<FunMiles> It seems that
<FunMiles> `shared_future<int> f{ promise.get_shared_future() };` does not behave the same as `shared_future<int> f{ promise.get_future() };`
<FunMiles> The latter works as expected while the former gives a warning message that the promise was in a state where the future was not retrieved.
<FunMiles> Am I missing something about `get_shared_future` vs `get_future` ?
<FunMiles> Actually, it is not a warning, it is an error.
<FunMiles> future has not been retrieved from this promise yet: HPX(invalid_status)
<K-ballo> there's a `get_shared_future`? that's suspicious..
<FunMiles> There is one :)
<FunMiles> I would have thought it'd be equivalent, but maybe it was there to save an allocation or to make it visibly explicit in the code that one intends to have several dependent tasks.
<K-ballo> it cannot be saving an allocation, and it's not the responsibility of the promise to handle sharing of the result.. adding a get_shared_future weakens the promise interface
<K-ballo> as for the actual warning you mention, I tried some possibly related phrases but got no hits
<FunMiles> promise_base.hpp line 246
<FunMiles> It's a throw
<FunMiles> And it could save an allocation in the same way that make_shared saves an allocation over calling the constructor of a shared pointer with an actual pointer as argument (the control block is allocated next to the object instead of in a separate allocation)
<FunMiles> I should have said the control block is located next to...
<K-ballo> the "control block" is allocated when the promise is created, not when the future is retrieved
<K-ballo> the shared state is the "control block" plus the actual result itself, and they are allocated in one go, next to each other
<FunMiles> I'm in distributed.
<FunMiles> I was talking about the shared_ptr control block. I was only speculating that shared_future has similar reference-counting issues to shared_ptr.
<K-ballo> it does
<K-ballo> the shared state does, not the shared future
<K-ballo> the shared state is shared between a provider (like promise) and a result object (like future and shared_future)
<K-ballo> shared_future can then further share it at no extra cost (other than the corresponding refcounts)
<FunMiles> So where are the refcounts kept? Is it the same refcount that also holds the promise, and is it always in the same block then? If so, there is indeed no allocation saving, unlike make_shared.
<K-ballo> the shared state has a control block (including ref-count), and a result (value or exception)
<K-ballo> the shared state is constructed when the promise is created
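A conceptual sketch of what is described above (not HPX's actual types): the ref-count ("control block") and the storage for the result live in one shared-state object, allocated once when the promise is constructed, so sharing it later only bumps a ref-count.

```cpp
#include <atomic>
#include <exception>
#include <variant>

// Illustrative only: ref-count and result bundled into a single allocation.
template <typename T>
struct shared_state
{
    std::atomic<long> ref_count{1};                              // bumped by each future/shared_future
    std::variant<std::monostate, T, std::exception_ptr> result;  // empty, value, or exception
};
```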
<K-ballo> it looks as if the distributed promise hadn't been updated accordingly when get_shared_future was added
<FunMiles> OK. So it's a bug.
<K-ballo> looks like it
<FunMiles> I can report it in an issue.
<K-ballo> the difference between promise.get_shared_future() and shared_future(promise.get_future()) is that you can call the former multiple times
<K-ballo> the promise gets burdened with the sharing responsibility of the result object
<FunMiles> Makes sense.
<K-ballo> which presumably means promise is doing extra synchronizing too, guaranteeing those multiple calls can be made concurrently, as the rest of its interface guarantees
<K-ballo> although looking at the code, it doesn't look that way, it's setting a plain bool
<K-ballo> i'd recommend you handle the result object sharing yourself, which you need to do anyway to work around the bug
<FunMiles> That's what I've been doing.
<hkaiser> FunMiles: the get_shared_future is supposed to be an internal interface, don't use that
<hkaiser> I thought that it was private, so it's a bug
<FunMiles> @hkaiser OK. Good to know.
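A hedged sketch of the workaround discussed above: retrieve the one-shot future from the promise and call .share() on it, then copy the resulting shared_future as needed. The exact promise spelling depends on the HPX version (hpx::promise is the local one in recent releases); the .share() pattern is the same either way, and the code is meant to run on an HPX thread (e.g. from hpx_main).

```cpp
#include <hpx/hpx.hpp>

void share_example()
{
    hpx::promise<int> p;

    // Retrieve the future once, then turn it into a shared_future
    hpx::shared_future<int> sf = p.get_future().share();
    hpx::shared_future<int> sf2 = sf;    // further copies only bump a ref-count

    p.set_value(42);

    // All copies observe the same shared state
    int a = sf.get();     // 42
    int b = sf2.get();    // 42
    (void) a; (void) b;
}
```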
<FunMiles> How does run_guarded work? Does it keep a task in a staged state if its guard is busy? What mechanism deals with multiple tasks with the same guard? Is that still lightweight in terms of cost? To give a concrete example: if I have an object X that is a combination of several data D_i that are all asynchronously obtained, is creating a guard shared by all the updates X <- f(X, D_i) a good way to approach the issue?
FunMiles has quit [Quit: Textual IRC Client: www.textualapp.com]
FunMiles has joined #ste||ar
<hkaiser> FunMiles: run_guarded is a weird one, I have never really understood what it's doing
<hkaiser> I can get you in contact with the person who invented it
<hkaiser> wouldn't dataflow be what you need: dataflow(f, X, D_i)?
<hkaiser> f would be invoked once all X and D_i have become ready
<FunMiles> f is pretty expensive, and it should be started any time any of the D_i is ready. But there's an additional twist which is that the set of D_i is dynamic.....
<hkaiser> when_any(D_i).then(f), perhaps?
<FunMiles> Though the pattern is known beforehand, so at least I can create promises and rely on the futures, at the cost of keeping track of which promise to resolve.
<hkaiser> for the dynamic part - all facilities (when_all, dataflow, etc.) accept vectors of futures
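A hedged sketch of that point: hpx::dataflow (like hpx::when_all) accepts a std::vector of futures, so the number of D_i does not have to be known statically. make_d and combine are illustrative placeholders, not HPX APIs, and the code assumes it runs on an HPX thread.

```cpp
#include <hpx/hpx.hpp>

#include <utility>
#include <vector>

double make_d(int i) { return 1.0 * i; }              // placeholder data source
double combine(double x, double d) { return x + d; }  // placeholder X <- f(X, D_i)

hpx::future<double> combine_all()
{
    std::vector<hpx::future<double>> ds;
    for (int i = 0; i != 4; ++i)
        ds.push_back(hpx::async(make_d, i));

    // The continuation runs once every future in the vector is ready and
    // receives the (now ready) vector of futures as its argument.
    return hpx::dataflow(
        [](std::vector<hpx::future<double>> ready) {
            double x = 0.0;
            for (auto& d : ready)
                x = combine(x, d.get());
            return x;
        },
        std::move(ds));
}
```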
<FunMiles> when_any(D_i) works using promises that get resolved as I describe, but if I create the D_i on the fly, I could have when_any(vec_of_futures) with vec_of_futures = { D_1, D_2} and then I would need to add D_3, but the when_any has already been queued.
<hkaiser> hmmm, not sure what you're trying to achieve
<hkaiser> you want to run f as soon as any of the currently available D_i is available
<hkaiser> what then
<FunMiles> Well, it's a tree problem. When X_k is done, it becomes a D_k for X_k+1
<FunMiles> Yes the operations all commute.
<FunMiles> So I want to run as soon as any of the D_i is available.
<FunMiles> When all the operations have been done, X becomes an input for another X....
<hkaiser> ok
<FunMiles> So the pattern of creating promises for all the X_i and then using the futures from those promises for all the dependencies in when_any can address my issue, as long as I requeue the when_any with the vector from which the resolved future is removed.
<hkaiser> when_any returns a future itself and an index of which future made it ready
<hkaiser> so you can requeue the rest
<FunMiles> Yes. That is quite doable. And I can remove the completed future from the vector in O(1)
<FunMiles> auto tmp = std::move(futures.back()); futures.pop_back(); futures[resolved] = std::move(tmp);
<FunMiles> Thanks for the input.
<FunMiles> Slightly more efficient: std::swap(futures[resolved], futures.back()); futures.pop_back();
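A hedged sketch of the full requeue loop sketched in the last few messages: wait for any future, consume the ready one, remove it in O(1) with swap-and-pop, and requeue when_any on the remainder. process() is an illustrative placeholder; hpx::when_any on a vector returns a when_any_result holding the index of the ready future plus the vector of futures.

```cpp
#include <hpx/hpx.hpp>

#include <utility>
#include <vector>

void process(double) {}    // placeholder for consuming a ready D_i

// Intended to run on an HPX thread (e.g. called from hpx_main)
void drain(std::vector<hpx::future<double>> futures)
{
    while (!futures.empty())
    {
        // when_any_result = { index of the ready future, the futures themselves }
        auto any = hpx::when_any(futures).get();
        futures = std::move(any.futures);

        process(futures[any.index].get());    // this one is ready, get() does not block

        // O(1) removal: swap the consumed slot with the last one and pop
        std::swap(futures[any.index], futures.back());
        futures.pop_back();
    }
}
```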
zao has quit [Read error: Connection reset by peer]
zao has joined #ste||ar
FunMiles has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
diehlpk_work_ has joined #ste||ar
diehlpk_work has quit [Ping timeout: 250 seconds]