hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
soulreaper has quit [Quit: Client closed]
Yorlik_ has joined #ste||ar
Yorlik has quit [Ping timeout: 248 seconds]
hkaiser has quit [Quit: Bye!]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 255 seconds]
K-ballo1 is now known as K-ballo
Yorlik_ is now known as Yorlik
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo1 is now known as K-ballo
hkaiser has joined #ste||ar
soulreaper has joined #ste||ar
HHN93 has joined #ste||ar
HHN93 has quit [Ping timeout: 260 seconds]
soulreaper has quit [Ping timeout: 260 seconds]
HHN93 has joined #ste||ar
<HHN93>
In most cases -O3 vectorizes loops already, but it does seem to make sense that we add explicit vectorization
<hkaiser>
HHN93: in cases where no user defined lambdas are involved, using experimental::simd is certainly an option
<HHN93>
Any advice on how I should approach this issue, as performance is often not improved significantly by adding explicit vectorization?
<hkaiser>
if the compiler can't apply vectorization because it isn't able to look through the loop, we have two options
<hkaiser>
a) simplify the loop such that the compiler can understand it better (i.e. integral boundaries instead of iterators, no function calls inside the loop, etc.)
<HHN93>
no the issue I am highlighting is that sometimes the loops are already vectorized
<hkaiser>
b) use truly explicit vectorization, i.e. experimental::simd
<hkaiser>
well, there is c) don't do anything, obviously
<hkaiser>
HHN93: sure, but that's not guaranteed, not even close
<hkaiser>
but if all compilers already apply vectorization without being asked to do it, then we certainly don't need to do anything special
<HHN93>
no, my issue is demonstrating that my changes improve performance, because it seems that the loops are sometimes already vectorized
<hkaiser>
most pragmas we use are not to force vectorization, though - they give certain assurances to the compiler so that it can actually consider vectorizing the code
<HHN93>
but it does make sense to still add vectorization pragmas
<hkaiser>
like #pragma ivdep, which tells the compiler that there is no aliasing going on
<hkaiser>
yes, that's what I'm trying to say - it is worth adding the pragmas if the execution policies request it
<hkaiser>
in general, execution policies are not meant to instruct the implementation to do things in certain ways (i.e. par doesn't mean 'do parallelize')
<hkaiser>
execution policies convey guarantees to the implementation that may enable certain optimizations
<HHN93>
`yes, that's what I'm trying to say - it is worth adding the pragmas if the execution policies request it`
<HHN93>
I agree with it
<HHN93>
But sometimes there are no performance/assembly instructions improvements on adding vectorization pragmas
<hkaiser>
so par means 'it's safe to execute the iterations in any order and potentially concurrently'
<HHN93>
because -O3 had already vectorized them
<hkaiser>
not always
<HHN93>
yes it is not always the case but rather sometimes
<hkaiser>
the compiler for instance can't assume that involved pointers are not aliasing the same data
<hkaiser>
#pragma ivdep tells the compiler that this can be assumed, etc.
<hkaiser>
I agree
<HHN93>
in the case of generate_n, par and par_unseq have very similar performance despite making the change; I observed that this is because std::generate_n has the same performance for seq and unseq
<hkaiser>
the compiler applies vectorization on its own if it can prove that this doesn't change semantics
<HHN93>
so in the github PR is there anything I can add to prove that my PR is an improvement
<hkaiser>
it could be that the implementation doesn't do anything special for unseq if std::generate_n has the same performance for seq and unseq
<hkaiser>
well, you showed that unseq is faster than seq
<HHN93>
`the compiler applies vectorization on its own if it can prove that this doesn't change semantics`
<HHN93>
I am not sure how generate_n actually proves this to be true. But as seen in the benchmarks, on enabling -O3 both have the same performance
<HHN93>
`well, you showed that unseq is faster than seq`
<HHN93>
when no optimisations are on
<HHN93>
on -O3 both are very close
<hkaiser>
ahh, that's not a criteria, then
<HHN93>
`ahh, that's not a criteria, then`
<HHN93>
can you please elaborate?
<hkaiser>
are you sure your implementation of std::generate_n is actually doing additional vectorization for unseq?
<hkaiser>
can you please elaborate? - doing perf measurements with anything but -O3 is pointless
<hkaiser>
what I may suggest is to implement unseq using experimental::simd and see if that improves the picture
<HHN93>
yes, I have checked they do add a #pragma simd.
<hkaiser>
for stdlibc++ or libc++? or for the msvc std library?
<HHN93>
`can you please elaborate? - doing perf measurements with anything but -O3 is pointless`
<HHN93>
ok so the fact that unseq is faster when no optimisations are enabled doesn't prove anything?
<hkaiser>
no it doesn't
<HHN93>
g++ compiler
<hkaiser>
it just forces vectorization even for not optimized code