hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
ahmed_ has quit [Quit: Connection closed for inactivity]
K-ballo has quit [Quit: K-ballo]
jehelset has joined #ste||ar
hkaiser has quit [Quit: Bye!]
Yorlik has joined #ste||ar
jehelset has quit [Ping timeout: 260 seconds]
<akcube[m]> Hey everyone! I'm Kishore. I enjoy working with systems / low-level code in general and love working on HPC problems. I came across HPX while looking for GSoC orgs in this field. I was looking through some of the issues, one in particular is (https://github.com/STEllAR-GROUP/hpx/issues/2271).... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/9b23c8074c5f57602c2508cab30f48c52271a3d6)
<srinivasyadav227> hi akcube , Welcome.
<srinivasyadav227> #2271 concentrates on implementing two new policies, hpx::execution::unseq and hpx::execution::par_unseq, similar to std::execution::unseq and std::execution::par_unseq, which internally should use auto-vectorization, pragma directives, etc., and with which we plan to implement the HPX algorithms
<srinivasyadav227> `"Perhaps for more specialized functions we could write the vectorization with intrinsics and unroll? I have seen such code have better performance than compiler auto-vectorization, specifically with GCC and icc. It eliminates a few redundant instructions and helps the compiler exploit register renaming to the fullest. But I'm not sure if writing the intrinsics manually for specialized functions is considered good practice for
<srinivasyadav227> all-platform performance. What else is expected to be done here?"` Yes, you are right, using intrinsics gives more performance. That's why we use explicit vectorization libraries like Vc, std::experimental::simd, and EVE (in development), which map high-level abstractions to low-level, architecture-specific SIMD intrinsics. So we have two policies that do this job: hpx::execution::simd and hpx::execution::par_simd
<srinivasyadav227> For more info regarding explicit vectorization you can refer `https://github.com/STEllAR-GROUP/hpx/issues/2333`
<srinivasyadav227> but again #2271 focuses on using compiler auto-vectorization to vectorize
<akcube[m]> Hey! And thanks for replying. Right, I get what you mean. So this project mainly involves porting a bunch of implemented algorithms which are vectorization-safe to work with the unseq policies by using no more than compiler specific hints and minor developer assistance?
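For readers of the log, the kind of loop that "compiler-specific hints and minor developer assistance" refers to can be sketched as below. This is only an illustration under stated assumptions: `saxpy_unseq_sketch` is a hypothetical helper, not an HPX function, and the pragmas are ordinary GCC and Clang loop hints rather than whatever #2271 ends up using. The loop iterations are independent, which is what makes the loop safe to vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HPX): computes y[i] = a * x[i] + y[i].
// Every iteration touches only element i, so the iterations are independent
// and the compiler is free to execute several of them per SIMD instruction.
inline void saxpy_unseq_sketch(
    float a, std::vector<float> const& x, std::vector<float>& y)
{
    assert(x.size() == y.size());

    float const* xp = x.data();
    float* yp = y.data();
    std::size_t const n = y.size();

    // Compiler-specific hints; other compilers simply see a plain loop.
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
    for (std::size_t i = 0; i < n; ++i)
    {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Whether the compiler actually emits SIMD instructions still depends on the optimization level and target flags; the hints only remove obstacles such as assumed loop-carried dependencies.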
<akcube[m]> <srinivasyadav227> "For more info regarding explicit..." <- Damn, looks really interesting :p
jehelset has joined #ste||ar
<srinivasyadav227> <akcube[m]> "Hey! And thanks for replying..." <- yess
<srinivasyadav227> <akcube[m]> "I believe this (#2333) is more..." <- no, that's a different project
<akcube[m]> Ah sorry. I should've read a bit more. (https://github.com/STEllAR-GROUP/hpx/pull/2330 cleared things up). Thanks :)
<srinivasyadav227> no prob :)
<satacker[m]> Hi,
<satacker[m]> just putting it here for updates https://github.com/SAtacker/hpx_parallel_matrix_multiplication
<satacker[m]> as mentioned in the wiki
jehelset has quit [Ping timeout: 248 seconds]
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
<K-ballo> > hkaiser starred STEllAR-GROUP/hpx 3 days ago
<K-ballo> it happened again
<hkaiser> yes
<hkaiser> :/
Yorlik has quit [Ping timeout: 250 seconds]
<gnikunj[m]> hkaiser: srinivasyadav227: I’ll be late by a few minutes. Setting up my laptop with zoom.
diehlpk_work has joined #ste||ar
<satacker[m]> Any last minute suggestions for proposal?
<gonidelis[m]> (final gsoc day is always fun)
<gonidelis[m]> when i use a free function i get "must be a non-static member function"
<gonidelis[m]> K-ballo: any ideas?
<gonidelis[m]> how do i overload the `=` operator for enum classes?
jehelset has joined #ste||ar
Yorlik has joined #ste||ar
<K-ballo> gonidelis[m]: which op=? the copy/move ones must be members
<gonidelis[m]> just realized i wanted to overload ==
<gonidelis[m]> K-ballo: but yet the question emerges: why do state change operators need to be members?
<K-ballo> wrong question, some (most) state change operators can be free functions
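To make the point above concrete for an enum class, here is a minimal sketch. The `perms` enum and all three operators are hypothetical names chosen for illustration; none of them come from the discussion. It shows a value-producing operator, a state-changing compound assignment, and a mixed-type `==`, all written as free functions. Only plain `operator=` would be rejected here with the "must be a non-static member function" error.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical scoped enum used only for this illustration.
enum class perms : unsigned
{
    none = 0,
    read = 1,
    write = 2
};

// Value-producing operator, defined outside any class.
constexpr perms operator|(perms a, perms b) noexcept
{
    using U = std::underlying_type_t<perms>;
    return static_cast<perms>(static_cast<U>(a) | static_cast<U>(b));
}

// State-changing operator (compound assignment), also a free function.
constexpr perms& operator|=(perms& a, perms b) noexcept
{
    a = a | b;
    return a;
}

// Mixed-type comparison; comparing two perms values already works through
// the built-in == and needs no overload.
constexpr bool operator==(perms a, unsigned b) noexcept
{
    return static_cast<unsigned>(a) == b;
}
```

With these in scope, `perms p = perms::read; p |= perms::write;` leaves `p == 3u` true.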
<gonidelis[m]> and to make it even more complex: how can a value producing operator be overloaded outside the class?
<gonidelis[m]> is it overloaded within the namespace? (that goes to the 2nd question)
<gonidelis[m]> within the namespace instead*
<K-ballo> too complex, I do not understand
<gonidelis[m]> even more the +15 comment on the answer
<K-ballo> they just can... what's the question?
<gonidelis[m]> alright
<gonidelis[m]> the question is how is the counter_type class aware of the overloading since it happens outside its body
<K-ballo> why would it need to be aware?
<gonidelis[m]> cause it's using it
<gonidelis[m]> ?
<gonidelis[m]> its objects are being used with that overloaded operator
<K-ballo> sometimes it does need to be aware, and those are the cases where you can't define things outside of class
<K-ballo> but most of the time it doesn't need to know at all
<K-ballo> you can define op< anywhere, everywhere, and even call it too; the class doesn't care
<gonidelis[m]> and the definition holds within the namespace only?
<K-ballo> what does that mean?
<K-ballo> to "hold"?
<gonidelis[m]> lol
<zao> *grabs popcorn*
<gonidelis[m]> the overloading is defined within the namespace only
<gonidelis[m]> which namespace includes the class of course
<gonidelis[m]> zao: 😅😅😅
<K-ballo> A isn't aware of ns1::op< nor ns2::op<, nor does it have to be
<K-ballo> each place in which < is used does need to be aware, and it finds the right "overload" based on lookup rules
<zao> There are kind of two distinct concepts at play here. First, some operators can be defined as free functions, typically the relational ones, as you tend to want to define them for different (possibly non-editable) types (A, B) and (B, A). Then there's the actual overload resolution, which considers all eligible candidates, most notably ones found via ADL or normal lookup.
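The lookup behaviour described above can be sketched as follows. The example K-ballo refers to is not part of this log, so everything here is an assumed reconstruction: `A`, `ns1`, `ns2`, `lib::B` and the helper functions are illustrative names only. `A` declares no operators; each use of `<` picks an overload based on where the expression is written (normal unqualified lookup) or on the namespace of the argument types (ADL).

```cpp
#include <cassert>

// A declares no operators and knows nothing about the overloads below.
struct A
{
    int id;
    int weight;
};

namespace ns1 {
    // Orders A by id.
    inline bool operator<(A const& l, A const& r) { return l.id < r.id; }

    // Unqualified lookup from inside ns1 finds ns1::operator<.
    inline bool less(A const& l, A const& r) { return l < r; }
}

namespace ns2 {
    // Orders A by weight; a different meaning for the same expression.
    inline bool operator<(A const& l, A const& r)
    {
        return l.weight < r.weight;
    }

    // Unqualified lookup from inside ns2 finds ns2::operator<.
    inline bool less(A const& l, A const& r) { return l < r; }
}

namespace lib {
    struct B
    {
        int v;
    };

    // Lives next to B, so argument-dependent lookup can find it.
    inline bool operator<(B const& l, B const& r) { return l.v < r.v; }
}

// Written outside lib: the call still resolves to lib::operator< via ADL.
inline bool b_less(lib::B const& l, lib::B const& r) { return l < r; }
```

For `A a{1, 9}` and `A b{2, 3}`, `ns1::less(a, b)` is true while `ns2::less(a, b)` is false, even though both functions contain the same expression `l < r`.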
<gonidelis[m]> K-ballo's code examples should be protected by UNESCO
<gonidelis[m]> amazing! got it! thanks
<K-ballo> some assignment operators must be members because the compiler decides to provide implicit ones or not based on the ones you declare
<gonidelis[m]> zao: alright so going back to the subject, an overloaded `operator=` cannot be defined as a free function
<K-ballo> that implicit definition decision needs to be made at the closing brace of the class definition
<gonidelis[m]> implicit ones?
<K-ballo> "if you don't provide one, the compiler provides one for you"
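The implicit-definition rule can be observed with standard type traits. This is a small sketch with hypothetical types (`plain`, `counted`, `pinned`): what is declared inside the class body determines whether the compiler supplies a copy assignment operator, which is why `operator=` has to be visible by the closing brace.

```cpp
#include <cassert>
#include <type_traits>

// No operator= declared: the compiler provides an implicit (trivial) one.
struct plain
{
    int value = 0;
};

// User-provided copy assignment: it replaces the implicit one and has to be
// a member so that it is known when the class definition is complete.
struct counted
{
    int value = 0;
    int assignments = 0;

    counted& operator=(counted const& other)
    {
        value = other.value;
        ++assignments;
        return *this;
    }
};

// Explicitly deleted: the compiler provides nothing, assignment is an error.
struct pinned
{
    int value = 0;
    pinned& operator=(pinned const&) = delete;
};

static_assert(std::is_copy_assignable<plain>::value, "implicit operator=");
static_assert(std::is_trivially_copy_assignable<plain>::value, "trivial");
static_assert(std::is_copy_assignable<counted>::value, "user-provided");
static_assert(!std::is_trivially_copy_assignable<counted>::value,
    "user-provided is never trivial");
static_assert(!std::is_copy_assignable<pinned>::value, "deleted");
```

Assigning one `counted` object to another runs the user-provided operator, so `assignments` becomes 1 on the target; assigning `plain` objects just copies `value`.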
<gonidelis[m]> oh
<gonidelis[m]> ohh
<gonidelis[m]> dangerous
<gonidelis[m]> tricky
<gonidelis[m]> "compilers"
<gonidelis[m]> alright.... got the point
<zao> I just remembered that Phoenix and operator-comma existed now, thanks :D
<K-ballo> you could always pretend they don't
<gonidelis[m]> what's phoenix
<gonidelis[m]> operator-comma 😅😅😅 lol
<gonidelis[m]> didn't even know it was an operator
<K-ballo> there's both an operator and a separator
<zao> Boost.Phoenix is a C++-like DSL for composing lazy expressions, kind of for making lambdas before lambdas were a thing. Popularly used with Boost.Spirit for parsers/generators. Some of the masterminds are in here ;)
<zao> It notably used comma for statement composition as you can't overload semicolon ^_^
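As a rough idea of what using comma for statement composition can look like, here is a toy sketch. It is not how Boost.Phoenix is implemented, and `step`, `add` and `times` are made-up names. An overloaded `operator,` combines two deferred actions into one that runs them in order.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// A deferred action on an int; hypothetical type for illustration only.
struct step
{
    std::function<void(int&)> run;
};

// Overloaded comma: builds a new step that runs `first`, then `second`.
// Nothing executes until the combined step's run() is called.
inline step operator,(step first, step second)
{
    return step{[first = std::move(first), second = std::move(second)](
                    int& state) {
        first.run(state);
        second.run(state);
    }};
}

// Small factories for lazy "statements".
inline step add(int n)
{
    return step{[n](int& s) { s += n; }};
}

inline step times(int n)
{
    return step{[n](int& s) { s *= n; }};
}
```

Writing `step program = (add(2), times(5), add(1));` and then calling `program.run(state)` with `state` equal to 1 applies the three steps left to right, giving 16. Note that an overloaded comma is an ordinary function call, so the operands themselves are only guaranteed to be evaluated left to right since C++17.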
<K-ballo> we recently looked at compilation times at work, phoenix was taking the largest share of the parsing time (all the preprocessed files)
<zao> On something completely different, I can now see why some people like TBB's allocators so much for parallel work... I ran some bulk parses (open+read file, decompress contents, parse data into structure) the other day and decided to fan out via TBB parallel_for_each across 16c32t. With MSVC's stock malloc/new I got around 3% meaningful work done, _everything_ was blocked on allocation.
<zao> Snuck in some tbb::scalable_allocator for the most common vectors and tbb::enumerable_thread_specific for caching some zstandard decompressors and am now hitting 40-50% meaningful work, all in a night of profiling :D
<K-ballo> the second picture gives me a nice warm feeling
<zao> Not brave enough to adopt HPX in this codebase, and I doubt that the profiler would understand it well. It does wonders with traditional sync dependencies, being able to tell what is blocked on what and for how long.
<gonidelis[m]> K-ballo: where do you work?
<K-ballo> i'm an independent contractor, but was referring to quasar.ai
jehelset has quit [Remote host closed the connection]
<gonidelis[m]> data engineering and AI ... huh
fdadsfadfasdf has joined #ste||ar
fdadsfadfasdf has quit [Quit: Client closed]
<hkaiser> K-ballo: I fixed the cyclic dependencies now and updated the report: https://hpx.stellar-group.org/files/report/
<hkaiser> thanks again
<hkaiser> still need to find out what causes the '(unknown)' entry in the Module Levels view
aacirino has joined #ste||ar
ahmed_ has joined #ste||ar
<gonidelis[m]> hkaiser: see pm
<gonidelis[m]> plz
Yorlik has quit [Ping timeout: 240 seconds]
diehlpk_work has quit [Remote host closed the connection]
<gdaiss[m]> gonidelis: at least the non-distributed version of octo-tiger runs without any problems using hpx 1.8.0-rc1: "100% tests passed, 0 tests failed out of 693"
<gdaiss[m]> that's with clang 12. I'll test with gcc next
<gonidelis[m]> gdaiss: wow! one would wonder who's behind this RC
<gdaiss[m]> gonidelis: indeed! always a nice surprise if stuff works out-of-the-box! :P
<gonidelis[m]> thanks for checking!!!!
ahmed_ has quit [Quit: Connection closed for inactivity]
aacirino has quit [Remote host closed the connection]
<gdaiss[m]> gonidelis: the build with gcc-10 also works: "100% tests passed, 0 tests failed out of 909"
<gonidelis[m]> gdaiss: awesome!