hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<gonidelis[m]> hkaiser: i've been studying the roofline model, and the authors propose certain optimizations to improve perf.
<gonidelis[m]> at one point they mention "a significant fraction of the instruction mix be floating-point operations".
<hkaiser> ok?
<gonidelis[m]> what exactly does that mean on the implementation side? how do i enforce an op to be floating point?
<hkaiser> the roofline model assesses the absolute efficiency of a particular code compared to the theoretical capabilities of the hardware
<gonidelis[m]> yes
<hkaiser> for scientific software, where floating point operations are where the action happens, you want to do as many floating point operations as possible compared to all the other types of operations (indexing, execution flow, etc.)
<hkaiser> that's what this means - increase the relative amount of FP ops in the instruction mix, IOW reduce overheads
<gonidelis[m]> that means: do as many multiplications and additions as you can compared to branching or searching or initialization?
<gonidelis[m]> maximize the amount of math in your problem in other words
<gonidelis[m]> cause a scientific algorithm is modeled in a very particular way and it seems weird to me to say "try to compress it to do as many FP ops as possible". though we've seen some (sort of) related examples in Elements of Programming, like reducing `if` checks or simplifying recursion calls for example
<hkaiser> yes
<gonidelis[m]> Elements of Programming contributing towards performance (besides readability, modularity and safety) is something i wouldn't expect
<hkaiser> :D
diehlpk_work has joined #ste||ar
<gonidelis[m]> what is "register blocking optimization"?
<hkaiser> gonidelis[m]: not sure, it could be a compiler optimization technique making sure that variables that are used often end up in processor registers
<gonidelis[m]> blocking
<gonidelis[m]> huh
<gonidelis[m]> cool thanks
<gonidelis[m]> hkaiser: do (should) we take care of memory alignment on TaskBench? i reckon this would be a feature of the hpx for_loop used.
<gonidelis[m]> (for_loop probably already takes care of data alignment, doesn't it?)
<hkaiser> gonidelis[m]: alignment is important only when vectorization gets into the picture
<gonidelis[m]> 0.0
<gonidelis[m]> really?
<gonidelis[m]> ok
<gonidelis[m]> doesn't it affect caching?
<hkaiser> well, it could affect caching, yes - but only if a lot of padding gets introduced because of alignment
<gonidelis[m]> padding is introduced manually or automatically?
<hkaiser> I don't think this is important to look into ATM
<hkaiser> if you align variables the compiler may have to insert padding
<hkaiser> e.g.: struct A {char a; double d; };
<hkaiser> here, if doubles are aligned, there might be padding between 'a' and 'd'
<gonidelis[m]> yes
<gonidelis[m]> got it
<gnikunj[m]> gonidelis: register blocking algorithms are those with tile sizes chosen such that the data always resides either in registers or in the L1 cache. (Essentially no cache misses)
<gonidelis[m]> how do you even accomplish that?
<gnikunj[m]> By designing algorithms such that they work on tiles. Try searching for blocked matrix matrix multiplication.
<gonidelis[m]> so you adjust the tile according to the given architecture?
<gonidelis[m]> yeah i know block mm mult
<hkaiser> gnikunj[m]: yah, makes sense
<gnikunj[m]> Also for the alignment part, compilers add padding to meet the architecture requirements. So, most likely you don't need to add any alignment yourself. In most situations, you'll end up with worse performance unless you calculate the padding for each and every data structure and change alignment accordingly.
<hkaiser> thanks for the explanation
<gnikunj[m]> gonidelis[m]: Yes, based on the size of L1 caches
<hkaiser> gnikunj[m]: on x64 there is no penalty for unaligned memory access (except for vector operations), however
<hkaiser> it loads the whole cache line (which is aligned) anyways
<gnikunj[m]> Compilers make it 8-byte aligned so the CPU doesn't have to do multiple loads to fetch data
<gnikunj[m]> IIRC
<gonidelis[m]> hkaiser: is the cache line always equal to the word size of the architecture? 64 bits?
<hkaiser> no
<hkaiser> cache line sizes are different on different architectures, usually however it's 64 bytes
<gonidelis[m]> ok
<diehlpk_work> ms[m], Hi
<diehlpk_work> Am I correct that you build hpx with spack on Piz Daint?
<diehlpk_work> Someone asked, and I'd be curious to see if you have a gitlab pipeline on software.nersc.gov building HPX, and with what compiler?
<ms[m]> diehlpk_work: correct on the first, and I don't have any (hpx or otherwise) ci running at nersc (I don't know if someone else does)
<diehlpk_work> ms[m], They want help to get HPX compiled
<diehlpk_work> So I will let them know that you can show them your spack file
<hkaiser> our spack recipe is in the spack repo, I believe
<ms[m]> diehlpk_work: what hkaiser said ^
<ms[m]> the only thing that's specific to daint are the compilers because cray already provide them, but you can do without them as well
<ms[m]> an issue on the spack or hpx repo would be best (I'm guessing they have compilation errors?)
<diehlpk_work> ms[m], I have no idea what kind of issues they have.
<diehlpk_work> I just wanted to be nice and get them in touch with you, since I know you build on Cray as well.
<ms[m]> diehlpk_work: ok, no problem :P feel free to put them in touch with me
<diehlpk_work> akheir, yet?
<hkaiser> diehlpk_work: Alireza is on vacation all week
<gonidelis[m]> K-ballo: what is the reasoning for not being able to split an rvalue string in cpp?
<K-ballo> do you have a specific split interface in mind?
<gonidelis[m]> no
<gonidelis[m]> i find it hard to believe that this is the reason though, "we just couldn't come up with an interface"
<K-ballo> the reason depends on the interface
<K-ballo> for some interfaces it will make sense, for others it wont
<gonidelis[m]> what do you mean?
<K-ballo> come up with an interface
<gonidelis[m]> i don't exactly get what you mean by interface but `auto parts = "abc" | std::views::split('b');` seems like it should work (?)
<K-ballo> if you have a view then you don't own the parts
<K-ballo> you return some kind of string view that points into the split string
<K-ballo> and if the input is an rvalue, it has expired and now you have dangling references
<K-ballo> if you have an action that returns new strings, they own the storage, the fact that the input is an rvalue doesn't matter
<gonidelis[m]> wow
<gonidelis[m]> wow!
<gonidelis[m]> seems cogent
<gonidelis[m]> it's that views don't own
<gonidelis[m]> 88 percent of my questions are answered with "the views don't own"
<gonidelis[m]> thanks
<K-ballo> most string split interfaces I know return strings, not views
<K-ballo> which is not very efficient, considering this is C++
<gonidelis[m]> you mean C++ wants to be performant
<gonidelis[m]> and returning strings is not performant
<diehlpk_work> hkaiser, Thanks for the update
<gonidelis[m]> doesn't the original hpx include hpx-local?
<hkaiser> yes, it does fetch_content() hpx-local, so it's probably under <build>/_deps/hpxlocal-src
<gonidelis[m]> ...
<gonidelis[m]> nice