<nikunj97>
ms[m], btw I wanted to ask if the HPX backend for Kokkos is complete (for on-node use)
<ms[m]>
nikunj97: ok, but what is the BibTeX you're looking for? https://zenodo.org/record/3675272/export/hx and the same for the JOSS paper is essentially what we want to add to the docs
<ms[m]>
or are you actually looking for a link to the hpx docs to give to someone else?
<nikunj97>
Kokkos has a port of miniFE and I wanted to benchmark it wrt HPX. Will the performance be comparable to having an HPX port itself?
<nikunj97>
ms[m], I wanted to cite the repo in my paper. I already have other HPX related citations in place
<ms[m]>
yeah, the kokkos backend is feature complete
<nikunj97>
ok great!
<ms[m]>
performance will be worse than the openmp backend naturally, but compared to vanilla hpx it may even be faster (it uses a slightly different executor)
<ms[m]>
it essentially uses what is now the thread_pool_executor in hpx itself
<ms[m]>
it's just not called that in the kokkos backend, because it came first
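(For context, running an existing Kokkos code such as miniFE on the HPX backend needs no source changes; below is a minimal sketch, assuming Kokkos was built with -DKokkos_ENABLE_HPX=ON so that the HPX execution space, Kokkos::Experimental::HPX, serves as the host backend. The kernel itself is plain Kokkos.)

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[])
{
    // With -DKokkos_ENABLE_HPX=ON, Kokkos starts the HPX runtime under the
    // hood and schedules parallel regions onto HPX tasks.
    Kokkos::initialize(argc, argv);
    {
        // Nothing HPX-specific here: the backend decides how iterations
        // are mapped to worker threads.
        double sum = 0.0;
        Kokkos::parallel_reduce(
            "sum", 1000,
            KOKKOS_LAMBDA(int i, double& local) { local += i; }, sum);
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```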
<ms[m]>
for citing the repo zenodo is the right thing
<nikunj97>
is thread_pool_executor better than the block_executor we have?
<nikunj97>
alright, I'll cite from zenodo
<ms[m]>
better is subjective; block_executor uses the thread_pool_executor
<ms[m]>
the thread_pool_executor has a more limited interface; block_executor lets you choose to run work on an arbitrary set of PUs/cores/NUMA nodes
<nikunj97>
that's why the block_executor improved significantly recently
<ms[m]>
thread_pool_executor only allows a contiguous range of worker thread ids (actually it's restricted_thread_pool_executor)
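(A rough illustration of the difference: block_executor is handed an explicit set of targets, e.g. one per NUMA domain, and distributes work across them in blocks. This is a sketch only; header and namespace spellings have moved around between HPX versions, so treat the exact names as approximate.)

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/compute.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <vector>

int main()
{
    // One target per NUMA domain of the machine; block_executor splits the
    // iteration range across this arbitrary set of targets.
    std::vector<hpx::compute::host::target> targets =
        hpx::compute::host::numa_domains();
    hpx::compute::host::block_executor<> exec(targets);

    std::vector<int> v(1000000, 1);
    hpx::parallel::for_each(hpx::parallel::execution::par.on(exec),
        v.begin(), v.end(), [](int& x) { x *= 2; });

    return 0;
}
```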
<ms[m]>
yeah, it improved after me having made it worse ;)
<nikunj97>
why do we not have papers on executor performance improvements btw?
<ms[m]>
nikunj97: this is what you get with the zenodo doi doi.org/10.5281/zenodo.598202
<ms[m]>
no, it's all too recent
<nikunj97>
ohh yea, this is what I wanted. This will do!
<ms[m]>
it's the state of the branch before I removed the change to object libraries
<ms[m]>
it does contain quite a few other changes as well (the libs have been split into two parts) which I don't think affect cmake generation time, but I don't know for sure
Nikunj__ has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
Nikunj__ has quit [Read error: Connection reset by peer]
nikunj97 has quit [Ping timeout: 246 seconds]
Amy2 has joined #ste||ar
Amy1 has quit [Ping timeout: 240 seconds]
Amy2 has quit [Ping timeout: 264 seconds]
Amy2 has joined #ste||ar
nikunj97 has joined #ste||ar
Amy2 has quit [Ping timeout: 265 seconds]
Amy2 has joined #ste||ar
kale[m] has joined #ste||ar
kale[m] has quit [Client Quit]
kale[m] has joined #ste||ar
nikunj97 has quit [Remote host closed the connection]
Amy2 has quit [Ping timeout: 264 seconds]
Amy2 has joined #ste||ar
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
hkaiser has joined #ste||ar
<weilewei>
ms[m] meeting now
<weilewei>
gsoc
<ms[m]>
weilewei: thanks!
<ms[m]>
weilewei: please start without me, I'll join as soon as I can
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
nan111 has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
diehlpk_work has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
karame_ has joined #ste||ar
<K-ballo>
ms[m]: there are cyclic dependencies between modules, is that
<K-ballo>
known?
<hkaiser>
on master?
<K-ballo>
yes
<hkaiser>
uhh
<hkaiser>
why does circleci pass, then?
<ms[m]>
K-ballo: there's one I know of which isn't caught by cpp-dependencies
<ms[m]>
does the circleci check complain?
<K-ballo>
there's a circle ci check for dependencies?
<hkaiser>
yes
<ms[m]>
in any case, what is the cyclic dependency?
<ms[m]>
cpp-dependencies doesn't know about our generated headers
<ms[m]>
hkaiser: isn't that for vulnerabilities in external dependencies?
<hkaiser>
ms[m]: it can scan cmake dependencies, I think
<hkaiser>
haven't looked too closely, though
<ms[m]>
K-ballo: iirc I had to fix some cyclic cmake target dependencies on the object libraries branch, it wouldn't have compiled otherwise
<ms[m]>
hkaiser: I may be misunderstanding it as well
LiliumAtratum has joined #ste||ar
LiliumAtratum has quit [Remote host closed the connection]
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
karame_ has quit [Quit: Ping timeout (120 seconds)]
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
karame_ has joined #ste||ar
akheir has quit [Ping timeout: 246 seconds]
kale[m] has quit [Ping timeout: 272 seconds]
kale[m] has joined #ste||ar
<nikunj>
heller1: is it possible to have decreasing memory bandwidth with increasing core counts? I'm seeing this behavior on a Raspberry Pi and am unable to explain it.
sayefsakin has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
kale[m] has quit [Ping timeout: 265 seconds]
kale[m] has joined #ste||ar
<heller1>
Yes, that's possible
<heller1>
If the bus and/or memory controller can't deal with the concurrency
kale[m] has quit [Ping timeout: 260 seconds]
Amy2 has quit [Ping timeout: 256 seconds]
Amy2 has joined #ste||ar
<nikunj>
Why would anyone want more processing units when the bus can't handle concurrency?
<hkaiser>
nikunj: more PUs doesn't necessarily mean more memory bus pressure
<nikunj>
Ohh, so you mean the bus can handle only a certain amount of memory bandwidth and concurrency at the same time?
<hkaiser>
yes
<nikunj>
That answers why my results were distorted. Thanks heller1 and hkaiser
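(The saturation effect is easy to reproduce without HPX: run the same streaming copy on 1..N threads and watch the aggregate GB/s stop scaling, or even drop, once the memory controller is saturated. A minimal standard-C++ sketch:)

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    constexpr std::size_t n = std::size_t(1) << 22;   // 4M doubles (~32 MB), well past the caches
    unsigned const max_threads = std::thread::hardware_concurrency();
    double sink = 0.0;                                // keeps the copies from being optimized away

    for (unsigned t = 1; t <= max_threads; ++t)
    {
        // private source/destination arrays per thread
        std::vector<std::vector<double>> src(t, std::vector<double>(n, 1.0));
        std::vector<std::vector<double>> dst(t, std::vector<double>(n, 0.0));

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < t; ++i)
            workers.emplace_back([&, i] {
                for (std::size_t j = 0; j < n; ++j)   // one read + one write stream
                    dst[i][j] = src[i][j];
            });
        for (auto& w : workers)
            w.join();
        double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();

        sink += dst[0][n / 2];
        // bytes moved: n doubles read + n doubles written, per thread
        double gb = 2.0 * n * sizeof(double) * t / 1e9;
        std::printf("%u thread(s): %.2f GB/s\n", t, gb / secs);
    }
    std::printf("(checksum %.1f)\n", sink);
    return 0;
}
```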
<nikunj>
hkaiser: see pm pls
joe[m]1 has joined #ste||ar
<Yorlik>
How could HPX help me improve L3 (level 3) cache locality in this topology? Is there a way to keep tasks local to the group of cores they were put on? Is HPX already trying this? link: https://i.imgur.com/Z83PFaz.png
sayef_ has joined #ste||ar
<hkaiser>
Yorlik: you can create separate thread pools and keep those confined to a numa domain
sayefsakin has quit [Ping timeout: 256 seconds]
<hkaiser>
however, that would prevent stealing across the pools
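(A sketch of the setup hkaiser describes, following the resource-partitioner examples: one extra thread pool per NUMA domain, with the first domain left in the default pool. Exact constructor and function signatures differ between HPX versions, and the "numa-N" pool names are made up here.)

```cpp
#include <hpx/hpx_init.hpp>
#include <hpx/include/resource_partitioner.hpp>

#include <cstddef>
#include <string>

int hpx_main()
{
    // Work submitted through an executor bound to one of the "numa-N" pools
    // stays on that domain's cores (and therefore its shared L3 cache), but
    // is not stolen by the other pools.
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // The partitioner has to be set up before hpx::init() starts the runtime.
    hpx::resource::partitioner rp(argc, argv);

    auto const& domains = rp.numa_domains();
    // Keep the first domain in the "default" pool; give every other domain
    // its own pool ("numa-1", "numa-2", ... are hypothetical names).
    for (std::size_t i = 1; i < domains.size(); ++i)
    {
        std::string pool = "numa-" + std::to_string(i);
        rp.create_thread_pool(pool);
        rp.add_resource(domains[i], pool);
    }

    return hpx::init();
}
```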
<Yorlik>
A parallel loop would probably distribute only to one pool, or could I spread it across pools?
<Yorlik>
Like round robin the pools
<Yorlik>
At the moment I'm trying to take stress off the cache by using small object pools. The pool I wrote works nicely in a test and is also faster; unfortunately there's some Lua interop issue I need to solve first.
<Yorlik>
The speedup from just a naive vector-based pool is already like 3-10x.
<Yorlik>
It's crazy.
<Yorlik>
But the memory bandwidth I calculated from the loss in performance can go down to an abysmal 25-50 MB per second - so I really need to do something. Still not fully understanding the problem ...
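(Not HPX-specific, but a "naive vector based pool" in the spirit Yorlik describes could look like the following minimal, single-threaded sketch, assuming a default-constructible T; keeping all objects in one contiguous block is what buys the cache locality.)

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity pool for a single type T: all slots live in one contiguous
// vector, and freed slots are recycled through an index free list, so
// repeated acquire/release keeps touching the same cache-friendly block.
template <typename T>
class simple_pool
{
    std::vector<T> storage_;         // contiguous slots
    std::vector<std::size_t> free_;  // indices of currently unused slots

public:
    explicit simple_pool(std::size_t capacity)
      : storage_(capacity)
    {
        free_.reserve(capacity);
        for (std::size_t i = capacity; i-- > 0;)
            free_.push_back(i);
    }

    // hands out a pointer into the contiguous block; nullptr when exhausted
    T* acquire()
    {
        if (free_.empty())
            return nullptr;
        std::size_t i = free_.back();
        free_.pop_back();
        return &storage_[i];
    }

    // p must have been obtained from this pool
    void release(T* p)
    {
        free_.push_back(static_cast<std::size_t>(p - storage_.data()));
    }
};
```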