hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
ahmed_ has quit [Quit: Connection closed for inactivity]
<srinivasyadav227>
#2271 concentrates on implementing two new policies, hpx::unseq and hpx::par_unseq (similar to std::execution::unseq and std::execution::par_unseq), which internally should use auto-vectorization, pragma directives, etc.; we plan to implement the HPX algorithms using these
<srinivasyadav227>
`"Perhaps for more specialized functions we could write the vectorization with intrinsics and unroll? I have seen such code have better performance than compiler auto-vectorization, specifically with GCC and icc. It eliminates a few redundant instructions and helps the compiler exploit register renaming to the fullest. But I'm not sure if writing the intrinsics manually for specialized functions is considered good practice for
<srinivasyadav227>
all-platform performance. What else is expected to be done here?"` yes, you are right, using intrinsics gives more performance; that's why we use explicit vectorization libraries like Vc, std::experimental::simd, and EVE (in development), which map high-level abstractions to low-level, architecture-specific SIMD intrinsics. So we have two policies which do this job: hpx::execution::simd and hpx::execution::par_simd
<srinivasyadav227>
but again #2271 focuses on using compiler auto-vectorization to vectorize
<akcube[m]>
Hey! And thanks for replying. Right, I get what you mean. So this project mainly involves porting a bunch of already-implemented, vectorization-safe algorithms to work with the unseq policies, using no more than compiler-specific hints and minor developer assistance?
<akcube[m]>
<srinivasyadav227> "For more info regarding explicit..." <- Damn, looks really interesting :p
<K-ballo>
A isn't aware of ns1::op< nor ns2::op<, nor does it have to be
<K-ballo>
each place in which < is used does need to be aware, and it finds the right "overload" based on lookup rules
<zao>
There are kind of two distinct concepts at play here. Some operators can be defined as free functions, typically the relational ones, as you tend to want to define them for different (possibly non-editable) types (A, B) and (B, A). Then there's the actual overload resolution, which considers all eligible candidates, most notably ones found via ADL or normal lookup.
<gonidelis[m]>
K-ballo: 's code examples should be protected by UNESCO
<gonidelis[m]>
amazing! got it! thanks
<K-ballo>
some assignment operators must be members because the compiler decides to provide implicit ones or not based on the ones you declare
<gonidelis[m]>
zao: alright, so going back to the subject, an overloaded `operator=` cannot be defined as a free function
<K-ballo>
that implicit definition decision needs to be made at the closing brace of the class definition
<gonidelis[m]>
implicit ones?
<K-ballo>
"if you don't provide one, the compiler provides one for you"
<gonidelis[m]>
oh
<gonidelis[m]>
ohh
<gonidelis[m]>
dangerous
<gonidelis[m]>
trickyu
<gonidelis[m]>
tricky*
<gonidelis[m]>
"compilers"
<gonidelis[m]>
alright.... got the point
<zao>
I just remembered that Phoenix and operator-comma existed now, thanks :D
<K-ballo>
you could always pretend they don't
<gonidelis[m]>
what's phoenix
<gonidelis[m]>
operator-comma 😅😅😅 lol
<gonidelis[m]>
didn't even know it was an operator
<K-ballo>
there's both an operator and a separator
<zao>
Boost.Phoenix is a C++-like DSL for composing lazy expressions, kind of for making lambdas before lambdas were a thing. Popularly used with Boost.Spirit for parsers/generators. Some of the masterminds are in here ;)
<zao>
It notably used comma for statement composition as you can't overload semicolon ^_^
<K-ballo>
we recently looked at compilation times at work; phoenix was taking the largest share of the parsing time across all the preprocessed files
<zao>
On something completely different, I can now see why some people like TBB's allocators so much for parallel work... I ran some bulk parses (open+read file, decompress contents, parse data into structure) the other day and decided to fan out via TBB parallel_for_each across 16c32t. With MSVC's stock malloc/new I got around 3% meaningful work done; _everything_ else was blocked on allocation.
<zao>
Snuck in some tbb::scalable_allocator for the most common vectors and tbb::enumerable_thread_specific for caching some zstandard decompressors and am now hitting 40-50% meaningful work, all in a night of profiling :D
<K-ballo>
the second picture gives me a nice warm feeling
<zao>
Not brave enough to adopt HPX in this codebase, and I doubt that the profiler would understand it well. It does wonders with traditional sync dependencies, being able to tell what is blocked on what and for how long.
<gonidelis[m]>
K-ballo: where do you work?
<K-ballo>
i'm an independent contractor, but was referring to quasar.ai
jehelset has quit [Remote host closed the connection]
<hkaiser>
still need to find out what causes the '(unknown)' entry in the Module Levels view
aacirino has joined #ste||ar
ahmed_ has joined #ste||ar
<gonidelis[m]>
hkaiser: see pm
<gonidelis[m]>
plz
Yorlik has quit [Ping timeout: 240 seconds]
diehlpk_work has quit [Remote host closed the connection]
<gdaiss[m]>
gonidelis: at least the non-distributed version of octo-tiger runs without any problems using hpx 1.8.0-rc1: "100% tests passed, 0 tests failed out of 693"
<gdaiss[m]>
that's with clang 12. I'll test with gcc next
<gonidelis[m]>
gdaiss: wow! one would wonder who's behind this RC
<gdaiss[m]>
gonidelis: indeed! always a nice surprise if stuff works out-of-the-box! :P
<gonidelis[m]>
τηανκσ φορ ψηεψκινγ
<gonidelis[m]>
thanks for checking!!!!
<gonidelis[m]>
**
ahmed_ has quit [Quit: Connection closed for inactivity]
aacirino has quit [Remote host closed the connection]
<gdaiss[m]>
gonidelis: the build with gcc-10 also works: "100% tests passed, 0 tests failed out of 909"