hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
ahmed_ has quit [Quit: Connection closed for inactivity]
<srinivasyadav227>
#2271 concentrates on implementing two new policies, hpx::unseq and hpx::par_unseq (similar to std::execution::unseq and std::execution::par_unseq), which internally should use auto-vectorization, pragma directives, etc.; we plan to implement the HPX algorithms using these
<srinivasyadav227>
`"Perhaps for more specialized functions we could write the vectorization with intrinsics and unroll? I have seen such code have better performance than compiler auto-vectorization, specifically with GCC and icc. It eliminates a few redundant instructions and helps the compiler exploit register renaming to the fullest. But I'm not sure if writing the intrinsics manually for specialized functions is considered good practice for
<srinivasyadav227>
all-platform performance. What else is expected to be done here?"` yes, you are right, using intrinsics gives more performance; that's why we use explicit vectorization libraries like Vc, std::experimental::simd, and EVE (in development), which map high-level abstractions to low-level, architecture-specific SIMD intrinsics. So we have two policies which do this job: hpx::execution::simd and hpx::execution::par_simd
<srinivasyadav227>
but again #2271 focuses on using compiler auto-vectorization to vectorize
<akcube[m]>
Hey! And thanks for replying. Right, I get what you mean. So this project mainly involves porting a bunch of already-implemented, vectorization-safe algorithms to work with the unseq policies, using no more than compiler-specific hints and minor developer assistance?
<akcube[m]>
<srinivasyadav227> "For more info regarding explicit..." <- Damn, looks really interesting :p
<K-ballo>
A isn't aware of ns1::op< nor ns2::op<, nor does it have to be
<K-ballo>
each place in which < is used does need to be aware, and it finds the right "overload" based on lookup rules
<zao>
There are kind of two distinct concepts at play here. Some operators can be defined as free functions, typically the relational ones, as you tend to want to define them for different (possibly non-editable) types (A, B) and (B, A). Then there's the actual overload resolution, which considers all eligible candidates, most notably ones found via ADL or normal lookup.
<gonidelis[m]>
K-ballo: 's code examples should be protected by UNESCO
<gonidelis[m]>
amazing! got it! thanks
<K-ballo>
some assignment operators must be members because the compiler decides to provide implicit ones or not based on the ones you declare
<gonidelis[m]>
zao: alright, so going back to the subject, an overloaded `operator=` cannot be defined as a free function
<K-ballo>
that implicit definition decision needs to be made at the closing brace of the class definition
<gonidelis[m]>
implicit ones?
<K-ballo>
"if you don't provide one, the compiler provides one for you"
<gonidelis[m]>
oh
<gonidelis[m]>
ohh
<gonidelis[m]>
dangerous
<gonidelis[m]>
trickyu
<gonidelis[m]>
tricky*
<gonidelis[m]>
"compilers"
<gonidelis[m]>
alright.... got the point
<zao>
I just remembered that Phoenix and operator-comma existed now, thanks :D
<K-ballo>
you could always pretend they don't
<gonidelis[m]>
what's phoenix
<gonidelis[m]>
operator-comma 😅😅😅 lol
<gonidelis[m]>
didn't even know it was an operator
<K-ballo>
there's both an operator and a separator
<zao>
Boost.Phoenix is a C++-like DSL for composing lazy expressions, kind of for making lambdas before lambdas were a thing. Popularly used with Boost.Spirit for parsers/generators. Some of the masterminds are in here ;)
<zao>
It notably used comma for statement composition as you can't overload semicolon ^_^
<K-ballo>
we recently looked at compilation times at work; phoenix was taking the largest share of the parsing time across all the preprocessed files
<zao>
On something completely different, I can now see why some people like TBB's allocators so much for parallel work... I ran some bulk parses (open+read file, decompress contents, parse data into structure) the other day and decided to fan out via TBB parallel_for_each across 16c32t. With MSVC's stock malloc/new I got around 3% meaningful work done; _everything_ else was blocked on allocation.
<zao>
Snuck in some tbb::scalable_allocator for the most common vectors and tbb::enumerable_thread_specific for caching some zstandard decompressors and am now hitting 40-50% meaningful work, all in a night of profiling :D
<K-ballo>
the second picture gives me a nice warm feeling
<zao>
Not brave enough to adopt HPX in this codebase, and I doubt that the profiler would understand it well. It does wonders with traditional sync dependencies, being able to tell what is blocked on what and for how long.
<gonidelis[m]>
K-ballo: where do you work?
<K-ballo>
i'm an independent contractor, but was referring to quasar.ai
jehelset has quit [Remote host closed the connection]
<hkaiser>
still need to find out what causes the '(unknown)' entry in the Module Levels view
aacirino has joined #ste||ar
ahmed_ has joined #ste||ar
<gonidelis[m]>
hkaiser: see pm
<gonidelis[m]>
plz
Yorlik has quit [Ping timeout: 240 seconds]
diehlpk_work has quit [Remote host closed the connection]
<gdaiss[m]>
gonidelis: at least the non-distributed version of octo-tiger runs without any problems using hpx 1.8.0-rc1: "100% tests passed, 0 tests failed out of 693"
<gdaiss[m]>
that's with clang 12. I'll test with gcc next
<gonidelis[m]>
gdaiss: wow! one would wonder who's behind this RC
<gdaiss[m]>
gonidelis: indeed! always a nice surprise if stuff works out-of-the-box! :P
<gonidelis[m]>
τηανκσ φορ ψηεψκινγ
<gonidelis[m]>
thanks for checking!!!!
<gonidelis[m]>
**
ahmed_ has quit [Quit: Connection closed for inactivity]
aacirino has quit [Remote host closed the connection]
<gdaiss[m]>
gonidelis: the build with gcc-10 also works: "100% tests passed, 0 tests failed out of 909"