hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<gonidelis[m]> hkaiser: i've been studying the roofline model, and the authors propose certain optimizations to improve perf.
<gonidelis[m]> at one point they mention "a significant fraction of the instruction mix be floating-point operations".
<hkaiser> ok?
<gonidelis[m]> what exactly does that mean on the implementation side? how do i enforce an op to be floating point?
<hkaiser> the roofline model assesses the absolute efficiency of a particular code compared to the theoretical capabilities of the hardware
<gonidelis[m]> yes
<hkaiser> for scientific software, where floating point operations are where the action happens, you want to do as many floating point operations as possible compared to all the other types of operations (indexing, execution flow, etc.)
<hkaiser> that's what this means - increase the relative amount of FP ops in the instruction mix, IOW reduce overheads
<gonidelis[m]> that means: do as many multiplications and additions as you can compared to branching or searching or initialization?
<gonidelis[m]> maximize the amount of math in your problem in other words
<gonidelis[m]> cause a scientific algorithm is modeled in a very particular way and it seems weird to me to say "try to compress it to do as many FP ops as possible". though we've seen some (sort of) related examples in Elements of Programming, like reducing `if` checks or simplifying recursion calls for example
<hkaiser> yes
<gonidelis[m]> Elements of Programming contributing towards performance (besides readability, modularity and safety) is something i wouldn't expect
<hkaiser> :D
diehlpk_work has joined #ste||ar
<gonidelis[m]> what is "register blocking optimization"?
<hkaiser> gonidelis[m]: not sure, it could be a compiler optimization technique making sure that variables that are used often end up in processor registers
<gonidelis[m]> blocking
<gonidelis[m]> huh
<gonidelis[m]> cool thanks
<gonidelis[m]> hkaiser: do (should) we take care of memory alignment on TaskBench? i reckon this would be a feature of the hpx for_loop used.
<gonidelis[m]> (for_loop probably already takes care of data alignment, doesn't it?)
<hkaiser> gonidelis[m]: alignment is important only when vectorization gets into the picture
<gonidelis[m]> 0.0
<gonidelis[m]> really?
<gonidelis[m]> ok
<gonidelis[m]> doesn't it affect caching?
<hkaiser> well, it could affect caching, yes - but only if a lot of padding gets introduced because of alignment
<gonidelis[m]> padding is introduced manually or automatically?
<hkaiser> I don't think this is important to look into ATM
<hkaiser> if you align variables the compiler may have to insert padding
<hkaiser> e.g.: struct A {char a; double d; };
<hkaiser> here, if doubles are aligned, there might be padding between 'a' and 'd'
<gonidelis[m]> yes
<gonidelis[m]> got it
<gnikunj[m]> gonidelis: register blocking algorithms are those with tile sizes chosen such that the data always resides either in registers or in the L1 cache. (Essentially no cache misses)
<gonidelis[m]> how do you even accomplish that?
<gnikunj[m]> By designing algorithms such that they work on tiles. Try searching for blocked matrix matrix multiplication.
<gonidelis[m]> so you adjust the tile according to the given architecture?
<gonidelis[m]> yeah i know block mm mult
<hkaiser> gnikunj[m]: yah, makes sense
<gnikunj[m]> Also for the alignment part, compilers add padding to meet the architecture requirements. So, most likely you don't need to add any alignment yourself. In most situations, you'll end up with worse performance unless you calculate the padding for each and every data structure and change alignment accordingly.
<hkaiser> thanks for the explanation
<gnikunj[m]> gonidelis[m]: Yes, based on the size of L1 caches
<hkaiser> gnikunj[m]: on x64 there is no penalty for unaligned memory access (except for vector operations), however
<hkaiser> it loads the whole cache line (which is aligned) anyways
<gnikunj[m]> Compilers make it 8-byte aligned so the CPU doesn't have to do multiple loads to fetch data
<gnikunj[m]> IIRC
<gonidelis[m]> hkaiser: is the cache line always equal to the word size of the architecture? 64 bits?
<hkaiser> no
<hkaiser> cache line sizes are different on different architectures, usually however it's 64 bytes
<gonidelis[m]> ok
<diehlpk_work> ms[m], Hi
<diehlpk_work> Am I correct that you build hpx with spack on Piz Daint?
<diehlpk_work> Someone asked, and I'd be curious to see if you have a gitlab pipeline on software.nersc.gov building HPX, and with what compiler?
<ms[m]> diehlpk_work: correct on the first, and I don't have any (hpx or otherwise) ci running at nersc (I don't know if someone else does)
<diehlpk_work> ms[m], They want help to get HPX compiled
<diehlpk_work> So I will let them know that you can show them your spack file
<hkaiser> our spack recipe is in the spack repo, I believe
<ms[m]> diehlpk_work: what hkaiser said ^
<ms[m]> the only thing that's specific to daint are the compilers because cray already provide them, but you can do without them as well
<ms[m]> an issue on the spack or hpx repo would be best (I'm guessing they have compilation errors?)
<diehlpk_work> ms[m], I have no idea what kind of issues they have.
<diehlpk_work> I just wanted to be nice and get them in touch with you, since I know you build on Cray as well.
<ms[m]> diehlpk_work: ok, no problem :P feel free to put them in touch with me
<diehlpk_work> akheir, yet?
<hkaiser> diehlpk_work: Alireza is on vacation all week
<gonidelis[m]> K-ballo: what is the reasoning for not being able to split an rvalue string in cpp?
<K-ballo> do you have a specific split interface in mind?
<gonidelis[m]> no
<gonidelis[m]> i find it hard to believe that this is the reason though, "we just couldn't come up with an interface"
<K-ballo> the reason depends on the interface
<K-ballo> for some interfaces it will make sense, for others it wont
<gonidelis[m]> what do you mean?
<K-ballo> come up with an interface
<gonidelis[m]> i don't exactly get what you mean by interface but `auto parts = "abc" | std::views::split('b');` seems like it should work (?)
<K-ballo> if you have a view then you don't own the parts
<K-ballo> you return some kind of string view that points into the split string
<K-ballo> and if the input is an rvalue, it has expired and now you have dangling references
<K-ballo> if you have an action that returns new strings, they own the storage, the fact that the input is an rvalue doesn't matter
<gonidelis[m]> wow
<gonidelis[m]> wow!
<gonidelis[m]> seems cogent
<gonidelis[m]> it's that views don't own
<gonidelis[m]> 88 percent of my questions are answered with "the views don't own"
<gonidelis[m]> thanks
<K-ballo> most string split interfaces I know return strings, not views
<K-ballo> which is not very efficient, considering this is C++
<gonidelis[m]> you mean C++ wants to be performant
<gonidelis[m]> and returning strings is not performant
<diehlpk_work> hkaiser, Thanks for the update
<gonidelis[m]> doesn't the original hpx include hpx-local?
<hkaiser> yes, it does fetch_content() hpx-local, so it's probably under <build>/_deps/hpxlocal-src
<gonidelis[m]> ...
<gonidelis[m]> nice