hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<hkaiser> weilewei: yah, it's a nice way to learn how people code
<weilewei> hkaiser sure, I plan to watch more of these, and I've also started reading the book
<diehlpk_work> hkaiser, simbergm Should we add a license to https://github.com/STEllAR-GROUP/hpx-docs
<hkaiser> diehlpk_work: we probably should, this repo is automatically populated, however, so we might need to do some scripting
akheir has quit [Read error: Connection reset by peer]
akheir has joined #ste||ar
<hkaiser> diehlpk_work: done, let's see if it gets overwritten
hkaiser has quit [Quit: bye]
weilewei has quit [Remote host closed the connection]
bita has joined #ste||ar
bita has quit [Quit: Leaving]
nan1 has quit [Remote host closed the connection]
akheir has quit [Quit: Leaving]
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
<wash[m]> Sorry folks, what did you need my consent for?
<wash[m]> Ah for the JOSS submission? That's fine :)
<simbergm> hkaiser: thanks for adding the license
<simbergm> I think it should stay there, if I remember correctly what the scripts do
<heller1> so I just noticed one thing ...
<heller1> ... and I think hpx::init needs to go. Here is the reason: each test that requires HPX threads to run needs to go through hpx_init ... creating nice cycles...
<simbergm> heller: if anything the tests need to be rewritten not to use hpx::init
<simbergm> but it's only tests, the modules otherwise don't have that dependency
<simbergm> and kicking out hpx::init just for that feels user-hostile
<simbergm> but there might be good solutions
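The cycle heller1 describes comes from the conventional way HPX tests bootstrap the runtime: each test defines hpx_main and calls hpx::init from main, so every test that needs HPX threads pulls in the full init machinery. A minimal sketch of that pattern, with a made-up test body:

    // minimal sketch of the usual hpx::init/hpx_main test scaffolding;
    // the test body itself is hypothetical
    #include <hpx/hpx_init.hpp>
    #include <hpx/include/async.hpp>

    int hpx_main(int, char**)
    {
        // runs on an HPX thread, so hpx::async and friends are usable here
        hpx::future<int> f = hpx::async([] { return 42; });
        bool ok = (f.get() == 42);    // stand-in for the real test assertions
        (void) ok;
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // every test that needs HPX threads has to go through hpx::init
        return hpx::init(argc, argv);
    }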
<simbergm> do we have a way of making a future<void> from a future<tuple<future<void>, future<void>>> (returned from e.g. when_all) without spawning a task?
<simbergm> unwrap will actually wait for the future, but I'd like to just collapse it into a future<void>
<heller1> future<future<T>> -> works
<heller1> * future<future<T>> -> future<void> works
<heller1> but we don't have a way to further inspect that
<heller1> ms[m]: there's split_future, if that helps
<simbergm> yeah, I guess split_future would actually do the right thing, even if it's semantically a bit iffy
<simbergm> thanks!
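For reference, a rough sketch of the split_future route: when_all yields a future<tuple<future<void>, future<void>>>, hpx::split_future turns that into one future per tuple element, and hpx::future's unwrapping constructor collapses each future<future<void>> into a plain future<void> without spawning a task. Exact tuple accessors depend on the HPX version; this assumes structured bindings work on the returned tuple:

    // rough sketch, assuming structured bindings work with the tuple type
    // returned by hpx::split_future in the HPX version at hand
    #include <hpx/include/lcos.hpp>
    #include <utility>

    void collapse_when_all()
    {
        hpx::future<void> a = hpx::make_ready_future();
        hpx::future<void> b = hpx::make_ready_future();

        // when_all returns future<tuple<future<void>, future<void>>>
        auto all = hpx::when_all(std::move(a), std::move(b));

        // split_future gives one future per tuple element ...
        auto [fa, fb] = hpx::split_future(std::move(all));

        // ... and the unwrapping constructor collapses future<future<void>>
        // into future<void> without spawning a continuation task
        hpx::future<void> first(std::move(fa));
        hpx::future<void> second(std::move(fb));

        first.get();
        second.get();
    }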
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> simbergm: master is broken since the latest merges
<simbergm> hkaiser: right you are
<simbergm> sorry, entirely my fault
<simbergm> I'll fix it
<hkaiser> thanks a lot
<hkaiser> simbergm: thanks for your thorough review of #4540
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 260 seconds]
rtohid has joined #ste||ar
weilewei has joined #ste||ar
<simbergm> hkaiser: I think I caught all of them with https://github.com/STEllAR-GROUP/hpx/pull/4545
<simbergm> we'll wait and see...
<hkaiser_> simbergm: thanks!
<diehlpk_work> hkaiser_, I went through all ste||ar group repos and added a ticket where a license is missing
<hkaiser_> diehlpk_work: thanks
nan11 has joined #ste||ar
gonidelis has joined #ste||ar
akheir has joined #ste||ar
<hkaiser_> diehlpk_work: do we have a meeting now?
<diehlpk_work> Yes, we are already in
<diehlpk_work> hkaiser_, I sent the Zoom link to the operation bell list
karame_ has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
rtohid has joined #ste||ar
akheir has quit [Read error: Connection reset by peer]
akheir1 has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
bita has joined #ste||ar
bita_ has joined #ste||ar
bita_ has quit [Quit: Leaving]
gonidelis has quit [Ping timeout: 240 seconds]
rtohid has left #ste||ar [#ste||ar]
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
<weilewei> hkaiser_ how should I correctly insert timers for the communication phase of the ringG algorithm? For the computation part I can insert start and end timers around line 73, but the communication phase is an async operation and, more importantly, a loop (depending on how many ranks); I also don't want to count the memory copy phase
<hkaiser_> you can only measure the overall time reliably, I think
<weilewei> I see
<hkaiser_> or each timestep in the loop
<weilewei> I see, is it a wise choice to time each function inside the loop and also each step, and then compute communication_time = total_time_per_step - compute_time - copy_time?
Amy1 has quit [Ping timeout: 256 seconds]
Amy1 has joined #ste||ar
karame_ has quit [Quit: Ping timeout (120 seconds)]
<hkaiser_> weilewei: try it - you can't really measure communication time as it's overlapped
<weilewei> hkaiser_ ah, I see now, even when receiving data, the program is doing copy and update...
<hkaiser_> weilewei: the most you can do is to measure how long it sits in mpi_wait
<hkaiser_> to allow assessing how much cpu time is wasted ;-)
Rory89 has joined #ste||ar
<weilewei> hkaiser_ hmmm true... but still, if I measure mpi_wait, that doesn't reflect the complete picture of communication time. Maybe just measure the whole for-loop
<hkaiser_> right
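What this converges on, roughly: time the whole loop plus the compute and copy parts directly, and treat the remainder as the (overlapped) communication share. A hypothetical sketch; do_compute, do_ring_exchange and do_copy are placeholders, not ringG's real function names:

    // hypothetical sketch of the timing scheme discussed above; only the
    // whole loop and the compute/copy parts are timed directly, and the
    // communication share is derived by subtraction
    #include <chrono>

    void do_compute(int rank);          // placeholder: local computation
    void do_ring_exchange(int rank);    // placeholder: async send/recv, overlapped
    void do_copy(int rank);             // placeholder: copy of received data

    void timed_ring_loop(int num_ranks)
    {
        using clock_type = std::chrono::steady_clock;
        using seconds = std::chrono::duration<double>;

        double compute_time = 0.0;
        double copy_time = 0.0;

        auto loop_start = clock_type::now();
        for (int r = 0; r < num_ranks; ++r)
        {
            auto t0 = clock_type::now();
            do_compute(r);
            compute_time += seconds(clock_type::now() - t0).count();

            do_ring_exchange(r);

            auto t1 = clock_type::now();
            do_copy(r);
            copy_time += seconds(clock_type::now() - t1).count();
        }
        double total_time = seconds(clock_type::now() - loop_start).count();

        // overlapped communication cannot be measured directly; this is just
        // the loop time not spent computing or copying
        double communication_time = total_time - compute_time - copy_time;
        (void) communication_time;
    }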
<bita> hkaiser_, Rory89 and I were talking about parallel inverse. What kind of algorithm should be worked on? I was telling Rory that having a for loop parallelized with constraints is not what we do in Phylanx
<hkaiser_> Rory89 and Avah have discussed what algorithm to use, no?
<bita> I think Avah has an OpenMP approach in mind
<hkaiser_> sure, that's what we could start with, no?
<Rory89> Yeah, it's just Gauss Inverse with different localities owning different columns
<hkaiser_> in the first step our implementation will suck perf-wise anyways ;-)
<bita> I am not sure how it can be implemented. Rory can you explain more about its detail?
<hkaiser_> Rory89: several columns per locality?
<Rory89> Perhaps more than one column per locality. If you have an nxn matrix, it just splits those n columns up evenly, or approximately so, across the localities
<hkaiser_> nod
<hkaiser_> makes sense
<hkaiser_> so you need to do different operations: a) find pivot, b) find coefficients, and c) apply coefficients
<hkaiser_> is there more?
<bita> I was telling Rory that he needs to make a distributed matrix (for the one that starts as the identity)
<Rory89> The problem I was having was how to handle the result matrix.
<hkaiser_> yes
<bita> and he had questions about where we have annotations
<hkaiser_> ok, what's the problem?
<hkaiser_> the inverse returns a new (tiled) matrix with a corresponding annotation attached
<bita> Rory, can you implement the 3 functions that hkaiser_ mentioned?
<hkaiser_> it's very similar to what we have done in other places
<bita> In other places we didn't have iterations
<Rory89> The user sends off a matrix A to be inverted; all of the localities need access to another matrix, call it B. So that's the only problem, creating a new matrix in the code that isn't sent in the test
<Rory89> that all of the localities can read and write to their respective locations.
<bita> in Rory's Gauss inverse everything happens in a for loop
<hkaiser_> Rory89: same is done in dot product
<hkaiser_> it creates a new matrix and fills it with the result of the operation
<hkaiser_> the only difference is that you attach the distributed matrix to the result and not to the input
<hkaiser_> or possibly to both, not sure
<hkaiser_> you start off with an identity matrix, right?
<Rory89> Ah, so B in this case is essentially like "result_matrix" in dist_dot?
<Rory89> Yep, exactly
<hkaiser_> yah
<hkaiser_> we can even call inverse with both matrices, the one to invert and the identity matrix generated by Nan's identity_d()
<hkaiser_> inverse_d(A, __arg(B, identity_d(shape(A)))) or somesuch
<hkaiser_> if that helps, that is
<hkaiser_> so you don't have to duplicate Nan's code
<Rory89> Yep that makes sense, thanks!
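For reference, a plain serial sketch of the three steps hkaiser_ listed (find pivot, compute coefficients, apply coefficients), operating on A and an identity-initialized B. It only illustrates the math; the Phylanx primitive discussed here would additionally tile the columns of both matrices across localities:

    // serial Gauss-Jordan inversion sketch: the same row operations applied
    // to A are applied to B (initially the identity), so B ends up as A^-1
    #include <cmath>
    #include <cstddef>
    #include <utility>
    #include <vector>

    using matrix = std::vector<std::vector<double>>;

    bool gauss_jordan_inverse(matrix a, matrix& b)
    {
        std::size_t const n = a.size();

        // B starts as the identity matrix (identity_d in the distributed case)
        b.assign(n, std::vector<double>(n, 0.0));
        for (std::size_t i = 0; i != n; ++i)
            b[i][i] = 1.0;

        for (std::size_t col = 0; col != n; ++col)
        {
            // a) find pivot: row with the largest absolute value in this column
            std::size_t pivot = col;
            for (std::size_t row = col + 1; row != n; ++row)
                if (std::abs(a[row][col]) > std::abs(a[pivot][col]))
                    pivot = row;
            if (a[pivot][col] == 0.0)
                return false;    // singular matrix
            std::swap(a[col], a[pivot]);
            std::swap(b[col], b[pivot]);

            // b) find coefficients: normalize the pivot row
            double const inv_pivot = 1.0 / a[col][col];
            for (std::size_t j = 0; j != n; ++j)
            {
                a[col][j] *= inv_pivot;
                b[col][j] *= inv_pivot;
            }

            // c) apply coefficients: eliminate this column from all other rows
            for (std::size_t row = 0; row != n; ++row)
            {
                if (row == col)
                    continue;
                double const factor = a[row][col];
                for (std::size_t j = 0; j != n; ++j)
                {
                    a[row][j] -= factor * a[col][j];
                    b[row][j] -= factor * b[col][j];
                }
            }
        }
        return true;
    }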
nan11 has quit [Remote host closed the connection]
bita has quit [Quit: Leaving]
Rory89 has quit [Remote host closed the connection]