hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
tufei__ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
hkaiser has quit [Quit: Bye!]
tufei_ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
tufei__ has quit [Ping timeout: 240 seconds]
ct-clmsn has joined #ste||ar
hkaiser has joined #ste||ar
ct-clmsn has quit [Quit: This computer has gone to sleep]
ct-clmsn has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
ct-clmsn has quit [Quit: This computer has gone to sleep]
<gonidelis[m]> it's as real as chat gpt is useful
<gonidelis[m]> answer: somewhat
<gnikunj[m]> Automatic parallelism is hard to do. Technically, a compiler can only parallelise the bits it can prove independent through basic dependency analysis. Extracting parallelism becomes harder for this reason. You should realize that the compiler has no understanding of the intent of the original code at this point, so it can't logically derive any useful parallelism beyond that :)
<gnikunj[m]> That’s what I meant by independence. Anti-dependence analysis over variables gets very tricky.
<gnikunj[m]> You can work with the dependence analysis you do. Now, to derive parallelism you will need to work on that dependence graph, which complicates it further. There are many publications on automatic parallelism, and they work well when data interleaving is low. That’s where OpenMP shines as well btw.
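A minimal sketch of the dependence distinction being described (made-up loops, not from any codebase): the first loop's iterations are provably independent, while the second carries a dependence from one iteration to the next, which is exactly what basic dependency analysis flags and refuses to parallelize.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: a[i] depends only on b[i], so dependence analysis
// can prove the loop safe to run in parallel (e.g. under an OpenMP
// "#pragma omp parallel for").
void scale(std::vector<double>& a, std::vector<double> const& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = 2.0 * b[i];
}

// Loop-carried dependence: iteration i reads the value iteration i-1 just
// wrote. The compiler sees only this dependence, not the higher-level intent
// (a prefix sum), so it cannot restructure it and leaves it sequential.
void running_sum(std::vector<double>& a)
{
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];
}
```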
<gonidelis[m]> pansysk75[m]: not only that. you have to reason about partitioning and reduction. Also, a sequential algorithm might look nothing like the parallel algorithm even when the latter runs on a single thread
<gonidelis[m]> Bryce Adelstein said it very nicely once: efficiency and performance are not equivalent
<gnikunj[m]> Yup, you trade one for the other. Always.
<gonidelis[m]> you might write a very efficient single-threaded algorithm but get no scaling when you parallelize it
<gonidelis[m]> now, make this algo dumber, and all of a sudden 2 or 4 threads can run faster than the efficient single-threaded one
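A sketch of that trade-off, using a made-up prefix-sum example: the serial loop is work-efficient but inherently sequential, while the "dumber" blocked version does roughly twice the additions yet lets 2 or 4 threads beat it on large inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Work-efficient serial inclusive scan: n-1 additions, but every iteration
// depends on the previous one, so it can never use more than one thread.
void scan_serial(std::vector<double>& a)
{
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];
}

// "Dumber" blocked scan: each thread scans its own block, then the block
// offsets are fixed up in a second pass. Roughly 2x the work, but it scales.
void scan_blocked(std::vector<double>& a, unsigned nthreads)
{
    std::size_t const n = a.size();
    std::size_t const block = (n + nthreads - 1) / nthreads;

    // Pass 1: independent per-block scans, one thread per block.
    std::vector<std::thread> workers;
    for (unsigned t = 0; t != nthreads; ++t)
        workers.emplace_back([&a, t, block, n] {
            std::size_t const lo = t * block, hi = std::min(n, lo + block);
            for (std::size_t i = lo + 1; i < hi; ++i)
                a[i] += a[i - 1];
        });
    for (auto& w : workers)
        w.join();

    // Pass 2: add the running total of all preceding blocks to each block.
    // After block t-1 has been fixed up, its last element holds the total of
    // everything before block t. (Kept serial here for brevity; it can be
    // parallelized the same way.)
    for (unsigned t = 1; t != nthreads; ++t)
    {
        std::size_t const prev_end = std::min(n, static_cast<std::size_t>(t) * block);
        if (prev_end == 0)
            break;
        double const offset = a[prev_end - 1];
        std::size_t const lo = t * block, hi = std::min(n, lo + block);
        for (std::size_t i = lo; i < hi; ++i)
            a[i] += offset;
    }
}
```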
<gonidelis[m]> now you tell me how Intel's smart parallel API bot can do that conversion
<gonidelis[m]> now give me a bot that points out the hotspots and a chatgpt-y suggestion on how to fix them. that's something I'd be willing to consider and can see being doable
<gonidelis[m]> but "auto-parallelization" is a different beast, if it's anything at all
<gonidelis[m]> yeah omp beat them to it
<gnikunj[m]> Right. It’s the same with auto vectorization. Although many research compilers these days are as good as or better than hand vectorization, production compilers are far from it.
<gnikunj[m]> A conditional statement in your loop can make it ineligible for auto vectorization, for example
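A made-up illustration (exact behaviour varies a lot by compiler and flags): a simple in-loop conditional can often still be if-converted into a masked select, but control flow like a data-dependent break genuinely defeats the vectorizer.

```cpp
#include <cstddef>
#include <vector>

// Simple conditional assignment: many production compilers if-convert this
// into a masked select and vectorize it anyway; the vectorization report
// (-Rpass=loop-vectorize for clang, -fopt-info-vec for gcc) tells you for sure.
void clamp(std::vector<float>& v, float limit)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] > limit)
            v[i] = limit;
}

// Data-dependent early exit: the trip count now depends on the values, so
// production compilers give up on vectorizing the loop.
float sum_until_negative(std::vector<float> const& v)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        if (v[i] < 0.0f)
            break;
        s += v[i];
    }
    return s;
}
```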
<gonidelis[m]> totally
<gonidelis[m]> thread divergence sounds like a beautiful thing you would wanna hear on a Friday afternoon