hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
diehlpk_work_ has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
Nikunj__ has quit [Read error: Connection reset by peer]
Nikunj__ has joined #ste||ar
Nikunj__ is now known as nikunj97
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
<nikunj97> ms[m], is there a bibtex entry to cite the hpx github repo?
<nikunj97> I am not able to find it :/
<ms[m]> nikunj97: like this: https://zenodo.org/record/3675272/export/hx?
<ms[m]> I don't think there's one just for the github repo as a whole
<ms[m]> diehlpk_work would know what exactly to use for citing
<ms[m]> but we'll be adding something to the documentation for 1.5.0: https://github.com/STEllAR-GROUP/hpx/issues/4698
<nikunj97> ms[m], not exactly
<nikunj97> ms[m], #4698 is what I'm looking for
<nikunj97> ms[m], btw I wanted to ask if the HPX backend for kokkos is complete (for on-node use)
<ms[m]> nikunj97: ok, but what is the bibtex you're looking for? https://zenodo.org/record/3675272/export/hx and the same for the joss paper is essentially what we want to add to the docs
<ms[m]> or are you actually looking for a link to the hpx docs to give to someone else?
<nikunj97> kokkos has a port of miniFE and I wanted to benchmark it against hpx. Will the performance be comparable to having an hpx port itself?
<nikunj97> ms[m], I wanted to cite the repo in my paper. I already have other HPX related citations in place
<ms[m]> yeah, the kokkos backend is feature complete
<nikunj97> ok great!
<ms[m]> performance will be worse than the openmp backend naturally, but compared to vanilla hpx it may even be faster (it uses a slightly different executor)
<ms[m]> it essentially uses what is now the thread_pool_executor in hpx itself
<ms[m]> it's just not called that in the kokkos backend, because it came first
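(For reference, a minimal sketch of a kernel that would run on the Kokkos HPX backend, assuming Kokkos was configured with -DKokkos_ENABLE_HPX=ON and exposes the execution space as Kokkos::Experimental::HPX; labels and sizes are made up for illustration.)

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[])
    {
        Kokkos::initialize(argc, argv);    // initializes the enabled backends (HPX here)
        {
            using exec_space = Kokkos::Experimental::HPX;

            const int n = 1 << 20;
            Kokkos::View<double*> a("a", n);

            // fill and reduce on the HPX backend
            Kokkos::parallel_for("fill",
                Kokkos::RangePolicy<exec_space>(0, n),
                KOKKOS_LAMBDA(const int i) { a(i) = 1.0 * i; });

            double sum = 0.0;
            Kokkos::parallel_reduce("sum",
                Kokkos::RangePolicy<exec_space>(0, n),
                KOKKOS_LAMBDA(const int i, double& local) { local += a(i); },
                sum);

            std::printf("sum = %f\n", sum);
        }
        Kokkos::finalize();
        return 0;
    }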
<ms[m]> for citing the repo zenodo is the right thing
<nikunj97> is thread_pool_executor better than the block_executor we have?
<nikunj97> alright, I'll cite from zenodo
<ms[m]> better is subjective; block_executor uses the thread_pool_executor
<ms[m]> the thread_pool_executor has a more limited interface, block_executor lets you choose to run work on an arbitrary set of pus/cores/numa nodes
<nikunj97> that's why the block_executor improved significantly recently
<ms[m]> thread_pool_executor only allows a contiguous range of worker thread ids (actually it's restricted_thread_pool_executor)
<ms[m]> yeah, it improved after me having made it worse ;)
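(A rough sketch of the block_executor usage discussed above, based on the HPX 1.4/1.5-era headers and namespaces; names may have moved in later releases. One target per NUMA domain is assumed; restricted_thread_pool_executor would instead pin work to a contiguous range of worker thread ids.)

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/compute.hpp>
    #include <hpx/include/parallel_for_each.hpp>

    #include <vector>

    int main()
    {
        // block_executor spreads work over an explicit set of targets,
        // here one target per NUMA domain of the machine
        auto targets = hpx::compute::host::numa_domains();
        hpx::compute::host::block_executor<> exec(targets);

        std::vector<double> v(1000000, 1.0);
        hpx::parallel::for_each(
            hpx::parallel::execution::par.on(exec),
            v.begin(), v.end(),
            [](double& x) { x *= 2.0; });

        return 0;
    }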
<nikunj97> why do we not have papers on executor performance improvements btw?
<ms[m]> nikunj97: this is what you get with the zenodo doi doi.org/10.5281/zenodo.598202
<ms[m]> no, it's all too recent
<nikunj97> ohh yea, this is what I wanted. This will do!
<ms[m]> K-ballo: this should still work: https://github.com/msimberg/hpx/tree/core-shared-lib-archive
<ms[m]> it's the state of the branch before I removed the change to object libraries
<ms[m]> it does contain quite a few other changes as well (the libs have been split into two parts) which I don't think affect cmake generation time, but I don't know for sure
Nikunj__ has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
Nikunj__ has quit [Read error: Connection reset by peer]
nikunj97 has quit [Ping timeout: 246 seconds]
Amy2 has joined #ste||ar
Amy1 has quit [Ping timeout: 240 seconds]
Amy2 has quit [Ping timeout: 264 seconds]
Amy2 has joined #ste||ar
nikunj97 has joined #ste||ar
Amy2 has quit [Ping timeout: 265 seconds]
Amy2 has joined #ste||ar
kale[m] has joined #ste||ar
kale[m] has quit [Client Quit]
kale[m] has joined #ste||ar
nikunj97 has quit [Remote host closed the connection]
Amy2 has quit [Ping timeout: 264 seconds]
Amy2 has joined #ste||ar
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
hkaiser has joined #ste||ar
<weilewei> ms[m] meeting now
<weilewei> gsoc
<ms[m]> weilewei: thanks!
<ms[m]> weilewei: please start without me, I'll join as soon as I can
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
nan111 has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
diehlpk_work has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
karame_ has joined #ste||ar
<K-ballo> ms[m]: there are cyclic dependencies between modules, is that known?
<hkaiser> on master?
<K-ballo> yes
<hkaiser> uhh
<hkaiser> why does circleci pass, then?
<ms[m]> K-ballo: there's one I know of which isn't caught by cpp-dependencies
<ms[m]> does the circleci check complain?
<K-ballo> there's a circle ci check for dependencies?
<hkaiser> yes
<ms[m]> in any case, what is the cyclic dependency?
<ms[m]> cpp-dependencies doesn't know about our generated headers
bita_ has joined #ste||ar
<K-ballo> if I'm reading this correctly, a bunch of modules depend on hpx_timed_execution, which in turn depends on those modules
<K-ballo> I'm looking only at the cmake target level
<K-ballo> I'd like to generate the graph but it seems to be too big for online tools, will try offline when I have some time
rtohid has joined #ste||ar
<ms[m]> K-ballo: it's possible, if you do get a graph (or the actual header dependencies) let me know and we'll try to fix it
<ms[m]> latest master I guess?
<K-ballo> I'm not looking at headers, just cmake targets
<ms[m]> ok, in that case there might be some dependencies that aren't needed anymore (and cmake allows cyclic dependencies with static libs)
<ms[m]> I've tried to clean up some of them but there are probably some left
<ms[m]> do you have a list of the modules that depend on timed_execution?
<K-ballo> i'll look into it in detail once I have time, I just wanted to know whether it was expected or not, I found it surprising
<ms[m]> yep, no worries and thanks for letting me know
<hkaiser> ms[m]: is there a way to convince cmake to print the dependency tree?
<hkaiser> make can do that, iirc
<ms[m]> hkaiser: cmake targets or actual build targets? no convincing required: https://cmake.org/cmake/help/latest/module/CMakeGraphVizOptions.html
<hkaiser> yah
<hkaiser> that should give us what we need
<ms[m]> yeah, it works quite well
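(For reference, generating the target graph amounts to the following, assuming Graphviz is installed and deps.dot is an arbitrary output name; a CMakeGraphVizOptions.cmake file next to the top-level CMakeLists.txt can set variables such as GRAPHVIZ_IGNORE_TARGETS to trim the graph.)

    # run from a configured build directory
    cmake --graphviz=deps.dot .        # also writes per-target .dot files
    dot -Tpng deps.dot -o deps.png     # render with Graphviz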
<hkaiser> ms[m]: this might be useful: https://github.com/jeremylong/DependencyCheck
akheir has joined #ste||ar
<ms[m]> hkaiser: isn't that for vulnerabilities in external dependencies?
<hkaiser> ms[m]: it can scan cmake dependencies, I think
<hkaiser> haven't looked too closely, though
<ms[m]> K-ballo: iirc I had to fix some cyclic cmake target dependencies on the object libraries branch, it wouldn't have compiled otherwise
<ms[m]> hkaiser: I may be misunderstanding it as well
LiliumAtratum has joined #ste||ar
LiliumAtratum has quit [Remote host closed the connection]
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
karame_ has quit [Quit: Ping timeout (120 seconds)]
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
karame_ has joined #ste||ar
akheir has quit [Ping timeout: 246 seconds]
kale[m] has quit [Ping timeout: 272 seconds]
kale[m] has joined #ste||ar
<nikunj> heller1: is it possible to have a decreasing memory bandwidth for increasing core counts? I'm seeing this behavior on raspberry pi and am unable to explain it.
sayefsakin has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
kale[m] has quit [Ping timeout: 265 seconds]
kale[m] has joined #ste||ar
<heller1> Yes, that's possible
<heller1> If the bus and/or memory controller can't deal with the concurrency
kale[m] has quit [Ping timeout: 260 seconds]
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
<nikunj> Why would anyone want more processing units when the bus can't handle concurrency?
<hkaiser> nikunj: more PUs doesn't necessarily mean more memory bus pressure
<nikunj> Ohh, so you mean the bus can handle only a certain amount of memory bandwidth and concurrency at the same time?
<hkaiser> yes
<nikunj> That answers why my results were distorted. Thanks heller1 and hkaiser
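(For context, a crude way to see this effect is a STREAM-style copy loop timed at increasing thread counts; a minimal sketch, not the benchmark nikunj ran, with array sizes picked arbitrarily. On a machine with a weak memory subsystem the aggregate GB/s figure flattens or even drops as threads are added.)

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main()
    {
        constexpr std::size_t n = 1 << 24;    // two ~128 MiB arrays of doubles
        std::vector<double> a(n, 1.0), b(n, 2.0);

        unsigned const max_threads = std::thread::hardware_concurrency();
        for (unsigned threads = 1; threads <= max_threads; ++threads)
        {
            auto const start = std::chrono::steady_clock::now();

            std::vector<std::thread> workers;
            std::size_t const chunk = n / threads;
            for (unsigned t = 0; t != threads; ++t)
            {
                workers.emplace_back([&, t] {
                    std::size_t const begin = t * chunk;
                    std::size_t const end = (t + 1 == threads) ? n : begin + chunk;
                    for (std::size_t i = begin; i != end; ++i)
                        a[i] = b[i];          // one read + one write per element
                });
            }
            for (auto& w : workers)
                w.join();

            double const secs = std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();

            // bytes moved: read b + write a
            std::printf("%u thread(s): %.2f GB/s\n", threads,
                2.0 * n * sizeof(double) / secs / 1e9);
        }
        return 0;
    }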
<nikunj> hkaiser: see pm pls
joe[m]1 has joined #ste||ar
<Yorlik> How could HPX help me improve L3 cache locality in this topology? Is there a way to keep tasks local to the group of cores they were started on? Is HPX already trying this? link: https://i.imgur.com/Z83PFaz.png
sayef_ has joined #ste||ar
<hkaiser> Yorlik: you can create separate thread pools and keep those confined to a numa domain
sayefsakin has quit [Ping timeout: 256 seconds]
<hkaiser> however, that would prevent stealing across the pools
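(A rough sketch of what hkaiser describes, using the HPX 1.4/1.5-era resource partitioner API; the setup has since moved to an init-params callback in newer releases, and the pool name here is made up.)

    #include <hpx/hpx_init.hpp>
    #include <hpx/include/resource_partitioner.hpp>

    int hpx_main(int argc, char* argv[])
    {
        // schedule work on "numa-0" here, e.g. through a pool-aware executor
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        hpx::resource::partitioner rp(argc, argv);

        // a dedicated pool confined to the first NUMA domain; the remaining
        // PUs stay in the default pool (no stealing across the two pools)
        rp.create_thread_pool("numa-0");
        rp.add_resource(rp.numa_domains()[0], "numa-0");

        return hpx::init();
    }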
<Yorlik> Would a parallel loop distribute work to only one pool, or could I spread it across pools?
<Yorlik> Like round-robining over the pools
<Yorlik> At the moment I'm trying to take pressure off the cache by using small object pools. The pool I wrote works nicely in a test and is also faster; unfortunately there's a Lua interop issue I need to solve first.
<Yorlik> The speedup from just a naive vector-based pool is already like 3-10x.
<Yorlik> It's crazy.
<Yorlik> But the memory bandwidth I calculated from the loss in performance can go down to an abysmal 25-50 MB per second, so I really need to do something. Still not fully understanding the problem ...
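(A minimal sketch of the kind of vector-backed pool Yorlik mentions; not his code, not thread safe, and a real pool would grow or recycle in blocks.)

    #include <cstddef>
    #include <vector>

    // fixed-capacity pool: objects live contiguously (cache friendly), a free
    // list of indices hands out slots without hitting the global allocator
    template <typename T>
    class object_pool
    {
        std::vector<T> storage_;           // slots, default-constructed up front
        std::vector<std::size_t> free_;    // indices of unused slots

    public:
        explicit object_pool(std::size_t capacity)
          : storage_(capacity)
        {
            free_.reserve(capacity);
            for (std::size_t i = capacity; i != 0; --i)
                free_.push_back(i - 1);
        }

        T* acquire()    // returns nullptr when the pool is exhausted
        {
            if (free_.empty())
                return nullptr;
            std::size_t const i = free_.back();
            free_.pop_back();
            return &storage_[i];
        }

        void release(T* p)
        {
            free_.push_back(static_cast<std::size_t>(p - storage_.data()));
        }
    };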
K-ballo has quit [Quit: K-ballo]