hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
diehlpk_work has quit [Remote host closed the connection]
Yorlik__ has quit [Ping timeout: 265 seconds]
Yorlik has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
Yorlik has joined #ste||ar
<dkaratza[m]> hkaiser: I just sent you a first version of the GSoD proposal. If you have time to review it before our meeting, then we can discuss it tomorrow
<hkaiser> dkaratza[m]: thanks! I'll have a look!
Yorlik has quit [Ping timeout: 250 seconds]
Yorlik has joined #ste||ar
Yorlik has quit [Ping timeout: 240 seconds]
Yorlik has joined #ste||ar
hkaiser has quit [Quit: Bye!]
Yorlik_ has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
prakhar has joined #ste||ar
prakhar has quit [Quit: Client closed]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo1 is now known as K-ballo
<mdiers[m]> I have a problem, probably with the parcelport_mpi; it does not occur with parcelport_tcp. When doing distributed calculations with e.g. four nodes, the future does not always reliably reach the is_ready state, even though the call has been processed on the remote side. This occurs very rarely and is not reproducible, on average every 30 minutes.
<mdiers[m]> Are there special hpx logs for that area that could be enabled? Or other environment variables from MPI (OMPI_MCA_???_verbose=100)?
hkaiser has joined #ste||ar
prakhar has joined #ste||ar
HHN93 has joined #ste||ar
<HHN93> hkaiser how do I share performance analysis graphs?
<hkaiser> one way would be an email exchange; if you have concise graphs, you could add them to the github PR (if related)
<hkaiser> another way is a wiki page somewhere, a blog post is possible as well
<HHN93> <hkaiser> "another way is a wiki page somewhere, a blog post is possible as well" <- this sounds great
<HHN93> for now I will try to add it to github
<hkaiser> ok
HHN93 has quit [Ping timeout: 260 seconds]
prakhar has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<HHN93> hkaiser can you please review #6199 and #6200?
<hkaiser> HHN93: will do
<HHN93> the performance benefits seem to be minimal for par vs par_unseq. Any advice on how I can look into improving it?
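A minimal sketch of the comparison under discussion, assuming an HPX build where hpx::execution::par_unseq is available; the container size and the operation are illustrative only:

```cpp
// Sketch: comparing hpx::execution::par and par_unseq on an element-wise add.
// Any speedup from par_unseq depends on the compiler actually vectorizing
// the lambda body inside each chunk.
#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>
#include <hpx/hpx_init.hpp>
#include <vector>

int hpx_main(int, char**)
{
    std::vector<double> a(100'000'000, 1.0), b(100'000'000, 2.0);

    // par: work is split across HPX worker threads.
    hpx::transform(hpx::execution::par, a.begin(), a.end(), b.begin(),
        a.begin(), [](double x, double y) { return x + y; });

    // par_unseq: additionally permits vectorization within each chunk.
    hpx::transform(hpx::execution::par_unseq, a.begin(), a.end(), b.begin(),
        a.begin(), [](double x, double y) { return x + y; });

    return hpx::finalize();
}

int main(int argc, char* argv[]) { return hpx::init(argc, argv); }
```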
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
HHN93 has quit [Ping timeout: 260 seconds]
HHN93 has joined #ste||ar
<hkaiser> HHN93: adding the #pragmas instructs the compiler to try figuring out what can be vectorized
<HHN93> so if the compiler misinterprets it, performance benefits would be minimal?
<hkaiser> so if the code inside the loop is complex or involves function invocations the compiler can't see through, the effect will be minimal, as little or no vectorization happens
<HHN93> oh ok
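To illustrate the point with hypothetical code (not taken from the discussion): the first loop below is trivially vectorizable, while the second defeats the compiler because of the opaque call. Compile with -fopenmp-simd (GCC/Clang) for the pragma to take effect:

```cpp
#include <cstddef>

void ext(double&);  // defined elsewhere; the compiler can't see through it

void simple(double* a, double const* b, std::size_t n)
{
    // Dependence-free body: compilers typically vectorize this at -O2/-O3;
    // the pragma asserts that doing so is safe (e.g. no aliasing hazards).
    #pragma omp simd
    for (std::size_t i = 0; i != n; ++i)
        a[i] += b[i];
}

void opaque(double* a, std::size_t n)
{
    // The pragma can't help here: ext() may have arbitrary side effects,
    // so the loop stays scalar.
    #pragma omp simd
    for (std::size_t i = 0; i != n; ++i)
        ext(a[i]);
}
```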
<hkaiser> HHN93: so I'd start looking at the generated assembly to see what gets vectorized and what not
<HHN93> is this the reason why we wanted to add hpx support to godbolt?
<hkaiser> HHN93: sure, that would make it much easier to experiment with hpx, wouldn't it?
<hkaiser> HHN93: the assembly could be looked at outside of godbolt as well
<HHN93> YES, but until then do we have to take a snippet of code and edit it accordingly before using godbolt?
<HHN93> <hkaiser> "the assembly could be looked at outside of godbolt as well" <- hmmm, are you suggesting a decompiler?
<hkaiser> HHN93: just add the correct command line option to your compiler invocation and it will generate the assembly for you
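For reference, the usual invocations, shown here for GCC and Clang (other compilers use different flags):

```cpp
// example.cpp -- emit assembly instead of an object file:
//
//   g++     -O3 -fopenmp-simd -S -fverbose-asm example.cpp -o example.s
//   clang++ -O3 -fopenmp-simd -S example.cpp -o example.s
//
// -S stops after code generation and writes the assembly; -fverbose-asm
// (GCC) annotates the output with source-level comments.
#include <cstddef>

void axpy(double* a, double const* b, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        a[i] += 2.0 * b[i];
}
```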
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
diehlpk_work has joined #ste||ar
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
HHN93 has quit [Quit: Client closed]
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
HHN93 has joined #ste||ar
<dkaratza[m]> hkaiser: I just submitted the application (google form) for the GSoD, I did a PR to list our project at the organizations list (https://github.com/google/season-of-docs/pull/1029) and have also created the two pages in wiki (https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2023-Project-Proposal and https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2023-Project-Ideas).
<HHN93> hkaiser https://gist.github.com/Johan511/a6dce6693abe92f1474df8bcf02e468d gives me very similar assembly with/without the pragmas; any hint on what the mistake might be, or what difference I should expect in the assembly?
<hkaiser> dkaratza[m]: marvelous! many thanks!
<hkaiser> HHN93: as expected - so the compiler is not able to do any vectorization
<HHN93> I am accessing 2 vectors, each of 1B elements, vectorization seems like a good idea
<hkaiser> HHN93: I don't know if you can convince your compiler to tell you where and why it is (or isn't) vectorizing
<HHN93> and this is the most simple case of vectorization I can think of
<hkaiser> I know Intel's C++ compiler can do that
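GCC and Clang do have rough equivalents of Intel's vectorization report; the flags below are real, though the wording of the diagnostics varies by compiler version:

```cpp
// Asking the compiler why a loop was (or wasn't) vectorized:
//
//   g++     -O3 -fopt-info-vec-optimized example.cpp  # loops that WERE vectorized
//   g++     -O3 -fopt-info-vec-missed    example.cpp  # loops that were NOT, and why
//   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
//           -Rpass-analysis=loop-vectorize example.cpp
```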
<hkaiser> HHN93: look at the libstdc++/libc++ source code to see what they do, to get some ideas
<HHN93> ok will try it out with a different compiler in that case
<HHN93> <hkaiser> "look at the libstdc++/libc++ source code to see what they do, to get some ideas" <- what am I looking for?
<hkaiser> the unseq implementations
<HHN93> stdlib/libc++ of the compiler's repo?
<hkaiser> yes
<hkaiser> libstdc++ is gcc's C++ library, libc++ is clang's
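As a rough sketch of the pattern to look for in those libraries: the real implementations hide this behind PSTL-style macros, and all names below are made up for illustration:

```cpp
// Hypothetical sketch: an unseq backend ultimately boils down to a loop
// annotated with a simd pragma, selected when an unsequenced policy is used.
#define MY_PRAGMA_SIMD _Pragma("omp simd")

template <typename Iter, typename UnaryOp>
void unseq_for_each(Iter first, Iter last, UnaryOp op)
{
    MY_PRAGMA_SIMD
    for (; first != last; ++first)
        op(*first);
}
```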
K-ballo1 has joined #ste||ar
<HHN93> found stdlib.h in /usr/include but what exactly am I looking for?
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo1 is now known as K-ballo
<zao> HHN93: You want to look at `libstdc++` for the GCC implementation of the C++ standard library and `libc++` for Clang's dito.
<zao> `/usr/include/stdlib.h` is a C header from (probably the GNU) libc.
<zao> Quite unrelated :D
<HHN93> yes, I felt similarly
<HHN93> not sure where I can find the source code for libstdc++
<HHN93> tried cloning the clang repo and am going through it
<zao> libc++ similarly has it in some LLVM repo somewhere, probably a monorepo knowing google :D
<zao> Or on a machine where you've got an implementation, stepping into your headers and looking around.
<HHN93> llvm-project has a separate directory named openmp, isn't that what I should be looking at?
<zao> The header part for libstdc++ is typically down in `/usr/include/c++/11` or so.
<zao> I have no idea what your project is nor what you're supposed to investigate, but if it's about par/unseq stuff I would expect it would be more about the parallel algorithms part?
<zao> std::async? std::transform? Not sure what you're doing right now.
<HHN93> yes, I am trying to implement par_unseq for HPX algorithms
<HHN93> there were no performance gains when using vectorization; hkaiser suggested it might be because the compiler does not understand how to vectorize the code
<HHN93> so I am going through the assembly code of simple cases of #pragma omp simd to see how the generated assembly is different
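As a hint for reading that output: the telltale difference is packed versus scalar instructions. The snippets below are illustrative x86-64 (AT&T syntax) output for a simple add loop; the exact registers and scheduling will differ:

```cpp
#include <cstddef>

// The loop being inspected:
void add(double* a, double const* b, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        a[i] += b[i];
}

// Scalar (not vectorized) inner loop -- one double per iteration:
//   movsd   (%rsi,%rax,8), %xmm0
//   addsd   (%rdi,%rax,8), %xmm0
//   movsd   %xmm0, (%rdi,%rax,8)
//
// Vectorized (AVX) inner loop -- four doubles per iteration:
//   vmovupd (%rsi,%rax,8), %ymm0
//   vaddpd  (%rdi,%rax,8), %ymm0, %ymm0
//   vmovupd %ymm0, (%rdi,%rax,8)
```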
<Aarya[m]> Hi, can anyone explain the difference between what is done here (https://github.com/STEllAR-GROUP/hpxMP) and what needs to be done for the "hpxMP: HPX threading system for LLVM OpenMP" project? rtohid
<hkaiser> Aarya[m]: hpxMP is a new implementation of OpenMP independent of LLVM
<hkaiser> what we would like to do is to 'port' the LLVM openmp runtime implementation to HPX
<hkaiser> Aarya[m]: rtohid[m] will have all the details
<zao> Is there anywhere we _can't_ sneak in HPX? :P
<zao> HHN93: One trick I have when I don't quite know where something is in library code is that I write a small program that uses it and step through the execution into the library code or Go To Definition in an IDE.
<HHN93> I am trying to see how using pragma omp simd changes the generated executable; pragmas are just preprocessor directives, right? how can I step into it?
<hkaiser> zao: that's the point - taking over the world one project at a time ;-)
HHN93 has quit [Quit: Client closed]
HHN93 has joined #ste||ar
<Aarya[m]> <hkaiser> "Aarya: hpxMP is a new implementa..." <- Is this implementation performing worse than the LLVM OpenMP implementation?
<hkaiser> Aarya[m]: much worse
<hkaiser> it also doesn't support many of the omp #pragmas properly
<hkaiser> moving to the LLVM runtime would relieve us from ever having to worry about new omp #pragmas they might introduce, as the runtime would take care of that
<Aarya[m]> So we just need to make all the pthread alternatives available, and LLVM will handle everything else?
<hkaiser> exactly
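A toy sketch of the idea; the shim names and signatures below are hypothetical, and the real port would target the runtime's internal threading layer (e.g. openmp/runtime/src/z_Linux_util.cpp in llvm-project):

```cpp
// Hypothetical sketch: back the thread-creation entry points the LLVM OpenMP
// runtime relies on with hpx::thread instead of pthreads, so OpenMP workers
// run as lightweight HPX tasks.
#include <hpx/thread.hpp>

using thread_fn = void* (*)(void*);

extern "C" int shim_thread_create(void** handle, thread_fn fn, void* arg)
{
    // Spawn the OpenMP worker as an HPX thread (scheduled by HPX's
    // user-level thread scheduler rather than by the kernel).
    auto* t = new hpx::thread([fn, arg] { fn(arg); });
    *handle = t;
    return 0;
}

extern "C" int shim_thread_join(void* handle)
{
    auto* t = static_cast<hpx::thread*>(handle);
    t->join();  // wait for the worker to finish
    delete t;
    return 0;
}
```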
<gonidelis[m]> <zao> "HHN93: One trick I have when I..." <- damn.... that's smart
HHN93 has quit [Quit: Client closed]
diehlpk_work has quit [Ping timeout: 252 seconds]
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar