hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
hkaiser has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
tufei__ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
hkaiser has quit [Quit: Bye!]
tufei_ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
tufei__ has quit [Ping timeout: 240 seconds]
ct-clmsn has joined #ste||ar
hkaiser has joined #ste||ar
ct-clmsn has quit [Quit: This computer has gone to sleep]
ct-clmsn has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
ct-clmsn has quit [Quit: This computer has gone to sleep]
<gonidelis[m]> it's as real as chat gpt is useful
<gonidelis[m]> answer: somewhat
<gnikunj[m]> Automatic parallelism is hard to do. Technically, a compiler can only parallelise the bits it can prove independent through basic dependency analysis. Extracting parallelism becomes harder for this reason. You should realize that the compiler has no understanding of the intent of the original code at this point, so it can't logically derive any useful parallelism beyond that :)
<gnikunj[m]> That’s what I meant by independence. Anti-dependence analysis over variables gets very tricky.
<gnikunj[m]> You can work with the dependence analysis you do. Now, to derive parallelism you will need to work on that dependence graph, which complicates it further. There are many publications on automatic parallelism, and they work well when data interleaving is low. That’s where OpenMP shines as well btw.
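A minimal sketch of the dependence distinction being described (made-up loops, not from any codebase): the first loop's iterations are provably independent, while the second carries a dependence from one iteration to the next, which is exactly what basic dependency analysis flags and refuses to parallelize.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: a[i] depends only on b[i], so dependence analysis
// can prove the loop safe to run in parallel (e.g. under an OpenMP
// "#pragma omp parallel for").
void scale(std::vector<double>& a, std::vector<double> const& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = 2.0 * b[i];
}

// Loop-carried dependence: iteration i reads the value iteration i-1 just
// wrote. The compiler sees only this dependence, not the higher-level intent
// (a prefix sum), so it cannot restructure it and leaves it sequential.
void running_sum(std::vector<double>& a)
{
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];
}
```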
<gonidelis[m]> pansysk75[m]: not only that. you have to reason about partitioning and reduction. Also, a sequential algorithm might look nothing like the parallel algorithm even when the latter runs on a single thread
<gonidelis[m]> Bryce Adelstein said it very nicely once: efficiency and performance are not equivalent
<gnikunj[m]> Yup, you trade one for the other. Always.
<gonidelis[m]> you might write a very efficient single-threaded algorithm but get no scaling when you parallelize it
<gonidelis[m]> now, make this algo dumber, and all of a sudden 2 or 4 threads can run faster than the efficient single-threaded one
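A sketch of that trade-off, using a made-up prefix-sum example: the serial loop is work-efficient but inherently sequential, while the "dumber" blocked version does roughly twice the additions yet lets 2 or 4 threads beat it on large inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Work-efficient serial inclusive scan: n-1 additions, but every iteration
// depends on the previous one, so it can never use more than one thread.
void scan_serial(std::vector<double>& a)
{
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];
}

// "Dumber" blocked scan: each thread scans its own block, then the block
// offsets are fixed up in a second pass. Roughly 2x the work, but it scales.
void scan_blocked(std::vector<double>& a, unsigned nthreads)
{
    std::size_t const n = a.size();
    std::size_t const block = (n + nthreads - 1) / nthreads;

    // Pass 1: independent per-block scans, one thread per block.
    std::vector<std::thread> workers;
    for (unsigned t = 0; t != nthreads; ++t)
        workers.emplace_back([&a, t, block, n] {
            std::size_t const lo = t * block, hi = std::min(n, lo + block);
            for (std::size_t i = lo + 1; i < hi; ++i)
                a[i] += a[i - 1];
        });
    for (auto& w : workers)
        w.join();

    // Pass 2: add the running total of all preceding blocks to each block.
    // After block t-1 has been fixed up, its last element holds the total of
    // everything before block t. (Kept serial here for brevity; it can be
    // parallelized the same way.)
    for (unsigned t = 1; t != nthreads; ++t)
    {
        std::size_t const prev_end = std::min(n, static_cast<std::size_t>(t) * block);
        if (prev_end == 0)
            break;
        double const offset = a[prev_end - 1];
        std::size_t const lo = t * block, hi = std::min(n, lo + block);
        for (std::size_t i = lo; i < hi; ++i)
            a[i] += offset;
    }
}
```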
<gonidelis[m]> now you tell me how Intel's smart parallel API bot can do that conversion
<gonidelis[m]> now give me a bot that points out the hotspots and a chatgpt-y suggestion on how to fix them. that's something I'd be willing to consider and can see being doable
<gonidelis[m]> but "auto-parallelization" is a different beast, if it's anything at all
<gonidelis[m]> yeah omp beat them to it
<gnikunj[m]> Right. It’s the same with auto vectorization. Although many research compilers these days are as good as or better than hand vectorization, production compilers are far from it.
<gnikunj[m]> A conditional statement in your loop can make it ineligible for auto vectorization, for example
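A made-up illustration (exact behaviour varies a lot by compiler and flags): a simple in-loop conditional can often still be if-converted into a masked select, but control flow like a data-dependent break genuinely defeats the vectorizer.

```cpp
#include <cstddef>
#include <vector>

// Simple conditional assignment: many production compilers if-convert this
// into a masked select and vectorize it anyway; the vectorization report
// (-Rpass=loop-vectorize for clang, -fopt-info-vec for gcc) tells you for sure.
void clamp(std::vector<float>& v, float limit)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] > limit)
            v[i] = limit;
}

// Data-dependent early exit: the trip count now depends on the values, so
// production compilers give up on vectorizing the loop.
float sum_until_negative(std::vector<float> const& v)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        if (v[i] < 0.0f)
            break;
        s += v[i];
    }
    return s;
}
```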
<gonidelis[m]> totally
<gonidelis[m]> thread divergence sounds like a beautiful thing you would wanna hear on a Friday afternoon