hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
diehlpk_work has quit [Remote host closed the connection]
Yorlik__ has quit [Ping timeout: 265 seconds]
Yorlik has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
Yorlik has joined #ste||ar
<dkaratza[m]> hkaiser: I just sent you a first version of the GSoD proposal. If you have time to review it before our meeting, then we can discuss it tomorrow
<hkaiser> dkaratza[m]: thanks! I'll have a look!
Yorlik has quit [Ping timeout: 250 seconds]
Yorlik has joined #ste||ar
Yorlik has quit [Ping timeout: 240 seconds]
Yorlik has joined #ste||ar
hkaiser has quit [Quit: Bye!]
Yorlik_ has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
prakhar has joined #ste||ar
prakhar has quit [Quit: Client closed]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo1 is now known as K-ballo
<mdiers[m]> I have a problem, probably with the parcelport_mpi; it does not occur with parcelport_tcp. When doing distributed calculations with e.g. four nodes, the future does not always reliably reach the is_ready state, even though the call has been processed on the remote side. This occurs very rarely and is not reproducible, on average every 30 minutes.
<mdiers[m]> Are there special hpx logs for that area that could be enabled? Or other environment variables from MPI (OMPI_MCA_???_verbose=100)?
hkaiser has joined #ste||ar
prakhar has joined #ste||ar
HHN93 has joined #ste||ar
<HHN93> hkaiser how do I share performance analysis graphs?
<hkaiser> one way would be an email exchange; if you have concise graphs, you could add them to the github PR (if related)
<hkaiser> another way is a wiki page somewhere, a blog post is possible as well
<HHN93> <hkaiser> "another way is a wiki page somewhere, a blog post is possible as well" <- this sounds great
<HHN93> for now I will try to add it to github
<hkaiser> ok
HHN93 has quit [Ping timeout: 260 seconds]
prakhar has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<HHN93> hkaiser can you please review #6199 and #6200?
<hkaiser> HHN93: will do
<HHN93> the performance benefits seem to be minimal for par vs par_unseq. Any advice on how I can look into improving it?
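A minimal sketch of the comparison under discussion, assuming an HPX build where hpx::execution::par_unseq is available; the container size and the operation are illustrative only:

```cpp
// Sketch: comparing hpx::execution::par and par_unseq on an element-wise add.
// Any speedup from par_unseq depends on the compiler actually vectorizing
// the lambda body inside each chunk.
#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>
#include <hpx/hpx_init.hpp>
#include <vector>

int hpx_main(int, char**)
{
    std::vector<double> a(100'000'000, 1.0), b(100'000'000, 2.0);

    // par: work is split across HPX worker threads.
    hpx::transform(hpx::execution::par, a.begin(), a.end(), b.begin(),
        a.begin(), [](double x, double y) { return x + y; });

    // par_unseq: additionally permits vectorization within each chunk.
    hpx::transform(hpx::execution::par_unseq, a.begin(), a.end(), b.begin(),
        a.begin(), [](double x, double y) { return x + y; });

    return hpx::finalize();
}

int main(int argc, char* argv[]) { return hpx::init(argc, argv); }
```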
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
HHN93 has quit [Ping timeout: 260 seconds]
HHN93 has joined #ste||ar
<hkaiser> HHN93: adding the #pragmas instructs the compiler to try figuring out what can be vectorized
<HHN93> so if the compiler misinterprets it, performance benefits would be minimal?
<hkaiser> so if the code inside the loop is complex or involves function invocations the compiler can't see through, the effect will be minimal, as little or no vectorization happens
<HHN93> oh ok
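To illustrate the point with hypothetical code (not taken from the discussion): the first loop below is trivially vectorizable, while the second defeats the compiler because of the opaque call. Compile with -fopenmp-simd (GCC/Clang) for the pragma to take effect:

```cpp
#include <cstddef>

void ext(double&);  // defined elsewhere; the compiler can't see through it

void simple(double* a, double const* b, std::size_t n)
{
    // Dependence-free body: compilers typically vectorize this at -O2/-O3;
    // the pragma asserts that doing so is safe (e.g. no aliasing hazards).
    #pragma omp simd
    for (std::size_t i = 0; i != n; ++i)
        a[i] += b[i];
}

void opaque(double* a, std::size_t n)
{
    // The pragma can't help here: ext() may have arbitrary side effects,
    // so the loop stays scalar.
    #pragma omp simd
    for (std::size_t i = 0; i != n; ++i)
        ext(a[i]);
}
```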
<hkaiser> HHN93: so I'd start looking at the generated assembly to see what gets vectorized and what not
<HHN93> is this the reason why we wanted to add hpx support to godbolt?
<hkaiser> HHN93: sure, that would make it much easier to experiment with hpx, wouldn't it?
<hkaiser> HHN93: the assembly could be looked at outside of godbolt as well
<HHN93> YES, but until then do we have to take a snippet of code and edit it accordingly before using godbolt?
<HHN93> <hkaiser> "the assembly could be looked at outside of godbolt as well" <- hmmm, are you suggesting a decompiler?
<hkaiser> HHN93: just add the correct command line option to your compiler invocation and it will generate the assembly for you
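For reference, the usual invocations, shown here for GCC and Clang (other compilers use different flags):

```cpp
// example.cpp -- emit assembly instead of an object file:
//
//   g++     -O3 -fopenmp-simd -S -fverbose-asm example.cpp -o example.s
//   clang++ -O3 -fopenmp-simd -S example.cpp -o example.s
//
// -S stops after code generation and writes the assembly; -fverbose-asm
// (GCC) annotates the output with source-level comments.
#include <cstddef>

void axpy(double* a, double const* b, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        a[i] += 2.0 * b[i];
}
```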
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
diehlpk_work has joined #ste||ar
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
HHN93 has quit [Quit: Client closed]
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
HHN93 has joined #ste||ar
<dkaratza[m]> hkaiser: I just submitted the application (google form) for the GSoD, I did a PR to list our project at the organizations list (https://github.com/google/season-of-docs/pull/1029) and have also created the two pages in wiki (https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2023-Project-Proposal and https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2023-Project-Ideas).
<HHN93> hkaiser https://gist.github.com/Johan511/a6dce6693abe92f1474df8bcf02e468d gives me very similar assembly with/without the pragmas; any hint on what the mistake might be, or what difference I should expect in the assembly?
<hkaiser> dkaratza[m]: marvelous! many thanks!
<hkaiser> HHN93: as expected - so the compiler is not able to do any vectorization
<HHN93> I am accessing 2 vectors, each of 1B elements, vectorization seems like a good idea
<hkaiser> HHN93: I don't know if you can convince your compiler to tell you where and why it is (or isn't) vectorizing
<HHN93> and this is the most simple case of vectorization I can think of
<hkaiser> I know Intel's C++ compiler can do that
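GCC and Clang do have rough equivalents of Intel's vectorization report; the flags below are real, though the wording of the diagnostics varies by compiler version:

```cpp
// Asking the compiler why a loop was (or wasn't) vectorized:
//
//   g++     -O3 -fopt-info-vec-optimized example.cpp  # loops that WERE vectorized
//   g++     -O3 -fopt-info-vec-missed    example.cpp  # loops that were NOT, and why
//   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
//           -Rpass-analysis=loop-vectorize example.cpp
```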
<hkaiser> HHN93: look at the libstdc++/libc++ source code to see what they do, to get some ideas
<HHN93> ok will try it out with a different compiler in that case
<HHN93> <hkaiser> "look at the libstdc++/libc++ source code to see what they do, to get some ideas" <- what am I looking for?
<hkaiser> the unseq implementations
<HHN93> stdlib/libc++ of the compiler's repo?
<hkaiser> yes
<hkaiser> libstdc++ is gcc's C++ library, libc++ is clang's
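As a rough sketch of the pattern to look for in those libraries: the real implementations hide this behind PSTL-style macros, and all names below are made up for illustration:

```cpp
// Hypothetical sketch: an unseq backend ultimately boils down to a loop
// annotated with a simd pragma, selected when an unsequenced policy is used.
#define MY_PRAGMA_SIMD _Pragma("omp simd")

template <typename Iter, typename UnaryOp>
void unseq_for_each(Iter first, Iter last, UnaryOp op)
{
    MY_PRAGMA_SIMD
    for (; first != last; ++first)
        op(*first);
}
```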
K-ballo1 has joined #ste||ar
<HHN93> found stdlib.h in /usr/include but what exactly am I looking for?
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo1 is now known as K-ballo
<zao> HHN93: You want to look at `libstdc++` for the GCC implementation of the C++ standard library and `libc++` for Clang's dito.
<zao> `/usr/include/stdlib.h` is a C header from (probably the GNU) libc.
<zao> Quite unrelated :D
<HHN93> yes, I felt similarly
<HHN93> not sure where I can find the source code for libstdc++
<HHN93> tried cloning the clang repo and am going through it
<zao> libc++ similarly has it in some LLVM repo somewhere, probably a monorepo knowing google :D
<zao> Or on a machine where you've got an implementation, stepping into your headers and looking around.
<HHN93> llvm-project has a separate directory named openmp, isn't that what I should be looking at?
<zao> The header part for libstdc++ is typically down in `/usr/include/c++/11` or so.
<zao> I have no idea what your project is nor what you're supposed to investigate, but if it's about par/unseq stuff I would expect it would be more about the parallel algorithms part?
<zao> std::async? std::transform? Not sure what you're doing right now.
<HHN93> yes, I am trying to implement par_unseq for HPX algorithms
<HHN93> there were no performance gains when using vectorization; hkaiser suggested it might be because the compiler does not understand how to vectorize the code
<HHN93> so I am going through the assembly code of simple cases of #pragma omp simd to see how the generated assembly is different
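As a hint for reading that output: the telltale difference is packed versus scalar instructions. The snippets below are illustrative x86-64 (AT&T syntax) output for a simple add loop; the exact registers and scheduling will differ:

```cpp
#include <cstddef>

// The loop being inspected:
void add(double* a, double const* b, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        a[i] += b[i];
}

// Scalar (not vectorized) inner loop -- one double per iteration:
//   movsd   (%rsi,%rax,8), %xmm0
//   addsd   (%rdi,%rax,8), %xmm0
//   movsd   %xmm0, (%rdi,%rax,8)
//
// Vectorized (AVX) inner loop -- four doubles per iteration:
//   vmovupd (%rsi,%rax,8), %ymm0
//   vaddpd  (%rdi,%rax,8), %ymm0, %ymm0
//   vmovupd %ymm0, (%rdi,%rax,8)
```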
<Aarya[m]> Hi, can anyone explain the difference between what is done here (https://github.com/STEllAR-GROUP/hpxMP) and what needs to be done for the "hpxMP: HPX threading system for LLVM OpenMP" project? rtohid
<hkaiser> Aarya[m]: hpxMP is a new implementation of OpenMP independent of LLVM
<hkaiser> what we would like to do is to 'port' the LLVM openmp runtime implementation to HPX
<hkaiser> Aarya[m]: rtohid[m] will have all the details
<zao> Is there anywhere we _can't_ sneak in HPX? :P
<zao> HHN93: One trick I have when I don't quite know where something is in library code is that I write a small program that uses it and step through the execution into the library code or Go To Definition in an IDE.
<HHN93> I am trying to see how using pragma omp simd changes the generated executable; pragmas are just preprocessor directives, right? how can I step into it?
<hkaiser> zao: that's the point - taking over the world one project at a time ;-)
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<Aarya[m]> <hkaiser> "Aarya: hpxMP is a new implementa..." <- Is this implementation performing worse than the LLVM OpenMP implementation?
<hkaiser> Aarya[m]: much worse
<hkaiser> it also doesn't support many of the omp #pragmas properly
<hkaiser> moving to the LLVM runtime would relieve us from ever having to worry about new omp #pragmas they might introduce, as the runtime would take care of that
<Aarya[m]> So we just need to make all the pthread alternatives available, and LLVM will handle everything else?
<hkaiser> exactly
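A toy sketch of the idea; the shim names and signatures below are hypothetical, and the real port would target the runtime's internal threading layer (e.g. openmp/runtime/src/z_Linux_util.cpp in llvm-project):

```cpp
// Hypothetical sketch: back the thread-creation entry points the LLVM OpenMP
// runtime relies on with hpx::thread instead of pthreads, so OpenMP workers
// run as lightweight HPX tasks.
#include <hpx/thread.hpp>

using thread_fn = void* (*)(void*);

extern "C" int shim_thread_create(void** handle, thread_fn fn, void* arg)
{
    // Spawn the OpenMP worker as an HPX thread (scheduled by HPX's
    // user-level thread scheduler rather than by the kernel).
    auto* t = new hpx::thread([fn, arg] { fn(arg); });
    *handle = t;
    return 0;
}

extern "C" int shim_thread_join(void* handle)
{
    auto* t = static_cast<hpx::thread*>(handle);
    t->join();  // wait for the worker to finish
    delete t;
    return 0;
}
```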
<gonidelis[m]> <zao> "HHN93: One trick I have when I..." <- damn.... that's smart
HHN93 has quit [Quit: Client closed]
diehlpk_work has quit [Ping timeout: 252 seconds]
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar