hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
Yorlik_ has joined #ste||ar
Yorlik has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: Bye!]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 276 seconds]
K-ballo1 is now known as K-ballo
HHN93 has joined #ste||ar
<HHN93> does the c++ std suggest against execution policies for inner product?
<HHN93> haven't found an overload which accepts execution policy for std and hpx versions
Yorlik_ is now known as Yorlik
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<HHN93> I have tried running copy and move algorithms on my machine for 1M and 1000M elements with seq, par execution policies and got the same execution time.
<HHN93> 16 core machine, other algorithms like for_each are being parallelised.
<HHN93> are the copy and move algorithms not parallelised?
HHN93 has quit [Quit: Client closed]
<pansysk75[m]> HHN93: You are referring to the algorithms under the hpx namespace (for example hpx::copy), correct? Those should run in parallel when passed a par execution policy, afaik
hkaiser has joined #ste||ar
<pansysk75[m]> In case you are referring to the std:: namespace, that will depend on the compiler, but implementations of parallel algorithms are generally lacking (passing std::execution::par and such will still compile, but it will nevertheless still run sequentially)
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 276 seconds]
K-ballo1 is now known as K-ballo
diehlpk_work has joined #ste||ar
HHN93 has joined #ste||ar
<HHN93> is hpx::copy parallelised by calling futures on chunks?
<hkaiser> HHN93: yes
<HHN93> ok
<HHN93> how do we plan to implement unseq on such algorithms
<hkaiser> HHN93: btw, the parallel version of inner_product is transform_reduce
<HHN93> such => those which use futures to be parallelised
<hkaiser> (similarly to reduce being the parallel version of accumulate)
<hkaiser> HHN93: using unseq(task) should do the trick
<HHN93> ok sounds simple, will try looking into it
<hkaiser> HHN93: well, sorry I might have misunderstood your question
<hkaiser> par_unseq does use tasks to run the loop functions on chunks
<HHN93> oh ok, loop body in the future call is going to be vectorized by pragmas?
<HHN93> cool, will look into it
<HHN93> thank you
<HHN93> hpx::generate is parallelised by calling std generate on chunks
<HHN93> is that the best way of doing it?
<HHN93> I am not sure how unseq can be implemented if we are calling std::generate
<HHN93> ok wait my bad, I think I confused something. it does not call std generate
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<hkaiser> HHN93: we'd have to reimplement sequential generate, I think it was already done for the simd policy
<HHN93> ok will have a look at it
<hkaiser> HHN93: not sure where you have seen the use of std::generate
<hkaiser> ok, that needs implementing by dispatching to the appropriate loop function
<srinivasyadav18[> yes, we could use util::loop_ind, the same way we used it here : https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/algorithms/include/hpx/parallel/algorithms/detail/generate.hpp#L24, which takes advantage of loop unrolling
<HHN93> can someone help me understand how the hpx copy algorithm works at this point https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/algorithms/include/hpx/parallel/algorithms/copy.hpp#L420
<HHN93> I am guessing we are trying to generate futures for each partition
<hkaiser> HHN93: it uses foreach_partitioner
<hkaiser> foreach_partitioner divides the input sequence into chunks and runs the lambda on each of those
<HHN93> so it generates a future to execute each partition, right?
<HHN93> ok got it
<HHN93> the get_in_out_result is responsible for synchronisation of the futures?
<hkaiser> whether it generates a future for each chunk (partition) or not depends on the used executor
<hkaiser> the default executor associated with par generates one HPX task per used core and synchronizes everything using a single future
<hkaiser> HHN93: no, it should stop here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/algorithms/include/hpx/parallel/util/foreach_partitioner.hpp#L40 (if used with par, par_simd, or par_unseq)
<HHN93> so one generates a future and the other one doesn't, depending on the executor
HHN93 has quit [Quit: Client closed]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 246 seconds]
K-ballo1 is now known as K-ballo
tufei has joined #ste||ar
<Isidoros[m]> Hello, I have a question about relocation semantics:
<Isidoros[m]> When an object that manages some heap memory (e.g. `unique_ptr` or `vector`) is relocated, a bitwise copy of it is created with `memcpy` or `memmove`, and we end up with two objects that manage the same memory buffer. When the original (e.g.) `unique_ptr` leaves the scope, will it not call free() on the pointer that both `unique_ptr`s hold?
<Isidoros[m]> Looking at facebook folly's fbvector I cannot see how this is handled in case T is a `unique_ptr`. (Note that a `unique_ptr` is a "trivially relocatable" type)
<Isidoros[m]> Here is a snippet from facebook's "folly" implementation of reallocation for fbvector: https://pastebin.com/iQFR040j
<hkaiser> Isidoros[m]: no, that's not what happens
sivoais has quit [Ping timeout: 256 seconds]
<hkaiser> unique_ptr and vector behave differently when being assigned/moved
<hkaiser> copy-assigning a vector to another one will copy the data it holds; moving a vector will hand over the buffer to the vector it is assigned to and set its own internal pointer to nullptr
<hkaiser> copy-assigning a unique_ptr is not allowed; moving it will hand over the internal pointer to the object it is assigned to
<Isidoros[m]> I was referring to relocation, which as far as I understand does not alter the source object in any way.
<hkaiser> I don't know what 'relocation' is in terms of C++
<hkaiser> you can either copy or move an object
<hkaiser> ahh
<hkaiser> makes sense now
<hkaiser> relocation is equivalent to a bitcopy assuming the source object's destructor will not be called
<hkaiser> i.e. it's semantically equivalent to moving and immediately calling the source's destructor
<Isidoros[m]> I see, so how can we stop the destructor from being called?
<hkaiser> delete [] (char*)ptr; instead of delete[] ptr; ?
<Isidoros[m]> facebook's library doesn't even bother with the deallocation
<hkaiser> right
<Isidoros[m]> but it should, right?
<Aarya[m]> Hi, so I started writing the proposal for "hpxMP: HPX threading system for LLVM OpenMP". I had a question: does the project consist only of adding the hpxc calls (and other symbols) in llvm/openmp in place of all the pthread calls?
<Aarya[m]> Or are there some pthread symbols not implemented in hpxc?
sivoais has joined #ste||ar
<hkaiser> Aarya[m]: most likely not all of the APIs needed are implemented in hpxc yet
<Aarya[m]> Ah okay
<hkaiser> Aarya[m]: for instance, thread attributes are not in place, iirc
<Aarya[m]> So the hpx thread calls are to be implemented using the hpx library
<hkaiser> Aarya[m]: yes
diehlpk_work has quit [Ping timeout: 248 seconds]
<sarkar_t[m]> Hi hkaiser gonidelis ! I am Tanmay Sarkar, a 2022 graduate of the Electrical Engineering Department of IIT Roorkee, and I have been working as a backend developer for around 8 months now. Since the final year of my bachelor's degree I have really wanted to get involved with the HPX project, but couldn't do so because I was engaged in other projects.
<sarkar_t[m]> But with those things out of the way I want to start by contributing to the GSoC project "(Re-)Implement executor API on top of sender/receiver infrastructure"
<sarkar_t[m]> So, I have a few initial questions about this project. https://github.com/STEllAR-GROUP/hpx/pull/5758 this PR is linked to the project description.
<sarkar_t[m]> So, this PR is about adding `completion_signature`. Is `completion_signature` similar to `completion_handler`, which is kind of what receivers are in the S/R proposal (saying from what I understand about the S/R architecture so far)?
<hkaiser> sarkar_t[m]: welcome
<hkaiser> completion_signatures are a means for receivers to figure out what types connected senders will provide
<hkaiser> also, for the project you mentioned, some work has been done in the meantime, but I think we can extend it to using s/r for implementing algorithms
<sarkar_t[m]> hkaiser: If this is the case, does "adding facilities to support completion_signatures" mean adding support for methods like `set_value`, `set_error` and `set_done` which are basically the functions that the operation state calls to notify the receiver?
<hkaiser> these three functions are being invoked by senders on their connected receiver
<hkaiser> completion signatures are exposed by a sender encoding what types it will send through set_value/set_error, and whether it exposes set_stopped
<sarkar_t[m]> hkaiser: Is the work that has been done, related to https://github.com/STEllAR-GROUP/hpx/pull/5758 this PR, or there is more work as well?
<hkaiser> but what I meant would involve changing the existing algorithms to support s/r based executors
<sarkar_t[m]> hkaiser: Can you please explain a bit what you mean by "what types it will send" or maybe link something related to this that I can refer to for understanding the implication of the quoted part of your statement?
<hkaiser> like done in the same commit for for_each, for instance:
<hkaiser> set_value(...) takes some arguments representing the result of a sender's operation
<hkaiser> those arguments can be of arbitrary type, the completion signatures of a sender expose those types
<sarkar_t[m]> hkaiser: Okay
<sarkar_t[m]> hkaiser: Thank you for this. I will look into it, understand this, and get back with further queries.
<hkaiser> sarkar_t[m]: the whole of this document is the main source of information about s/r
<sarkar_t[m]> Yes, I did look at it from the top. Now will look into it in detail
<sarkar_t[m]> <hkaiser> "but what I meant would involve..." <- Okay, so basically the underlying support for S/R architecture is there in HPX, and in this project I need to do changes in the existing parallel algorithms in HPX so that they make use of the S/R architecture, am I right in saying this hkaiser ?
<hkaiser> yes