hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<zao> Gah, my HPX code that works on VS2019 doesn't compile on GCC.
<zao> Let's see if I can get it into a shape I can share tomorrow, can't provide the output as-is.
<zao> Is there some inherent limit to the arity of plain functions you want to invoke with hpx::async?
<zao> When you want to pass a non-const lvalue reference in an hpx::async call, is the proper solution to wrap it in a std::reference_wrapper?
<zao> Code that works on MSVC but not Linux GCC - https://gist.github.com/zao/f0cc8cae2ac51ffbc19ce42a0207e914
<zao> I guess that this is the good old "we can bind temporaries to rvalue reference huhuhu" extension of VC++ kicking in somehow?
<hkaiser> zao: yah
<hkaiser> msvc is notorious for allowing you to bind temporaries to non-const lvalue refs, that's illegal
<hkaiser> zao use std::ref(pdn)
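A minimal sketch of the suggested fix (the function and variable names here are made up for illustration; assumes a standard HPX build with hpx_main):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <functional>  // std::ref
    #include <vector>

    // Hypothetical worker taking a non-const lvalue reference.
    void fill(std::vector<float>& out) { out.assign(4, 1.0f); }

    int main()
    {
        std::vector<float> pdn;
        // hpx::async decay-copies its arguments (and may move them, since the
        // function is only invoked once), so wrap references in std::ref to
        // make the function mutate the caller's object instead of a copy.
        auto f = hpx::async(fill, std::ref(pdn));
        f.get();
        // pdn now holds the values written by fill().
        return 0;
    }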
<zao> Odd, now I can't get the standalone test to compile with MSVC. Oh well, I know the solution at least.
<zao> Curious. It works for something like std::vector<float>&, but fails for float&
<zao> (compiles on MSVC, that is)
<zao> Oh gods, it's doing The Wrong Thing silently.
<zao> Mutating a temporary, good times! :D
<hkaiser> zao: async knows that the function is called once, so it tries to move the arguments to the function
<zao> I'm gonna hit the sack, but if I have several functions needing the result of an async call, can I pass a shared_future to them all, or should I try to get the data out at some point and hand them shared_ptr:s?
<zao> Also still not sure how to handle async functions returning several different things, is a tuple of futures my best bet, or should I bake some custom return struct for each one?
<zao> (this codebase has lots of nice functions of arity >20 with a lot of big input and output arrays, scientific code at its best)
<hkaiser> async returns a future which you can turn into a shared_future
<hkaiser> f.share()
<hkaiser> let the function return a tuple<> (i.e. async will give you a future<tuple<>>)
<hkaiser> or use the returned future just as a flag that the ref'ed args are ready
<hkaiser> or create a struct and return an instance of it which gives you a future<foo>, might require move operators and somesuch, though
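A rough sketch of the shapes described above (the function names are hypothetical, only meant to show the types involved):

    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <tuple>
    #include <vector>

    // Hypothetical producer returning several results at once.
    std::tuple<std::vector<float>, double> compute()
    {
        return {std::vector<float>(100, 1.0f), 3.14};
    }

    // Hypothetical consumer; each consumer can hold its own copy of the shared_future.
    void consume(hpx::shared_future<std::tuple<std::vector<float>, double>> f)
    {
        double scalar = std::get<1>(f.get());
        (void) scalar;
    }

    void example()
    {
        // async gives a future<tuple<...>>; share() turns it into a
        // shared_future that can be passed to several dependent functions.
        hpx::shared_future<std::tuple<std::vector<float>, double>> sf =
            hpx::async(compute).share();

        auto a = hpx::async(consume, sf);
        auto b = hpx::async(consume, sf);
        a.get();
        b.get();
    }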
<weilewei> hkaiser and @everyone, if I want to understand more about operating systems and computer architecture, what books would you recommend?
<hkaiser> the book with the dinosaurs on the cover
<weilewei> feel like I need to understand more about the underlying hardware
<weilewei> Operating System Concepts by Avi Silberschatz
<hkaiser> I have a copy in my office ;-)
<weilewei> oh!! I wish I could borrow it to read
<weilewei> But we cannot enter the CCT building now
<hkaiser> but you're not supposed to go there and my office is locked
<hkaiser> weilewei: but you're lucky that I'm not supposed to go there anyways and will not get a book that I'm not supposed to give to Karame this week
<hkaiser> ;-)
<nan11> lol
<weilewei> lol, maybe I can find an e-book of it online
<hkaiser> I'll let you know once I have not been there and where you can't find the book
<weilewei> hkaiser thanks!! Let me know then
<hkaiser> I won't
<nan11> I hear nothing xD
<weilewei> lol
<weilewei> If I have a plain kernel thread, once it finishes its work it gets destroyed (right?). Now, if I have an hpx user-level thread running on top of a kernel thread, and that user-level thread finishes its work while other user-level threads arrive right after it, does the kernel thread get destroyed as well?
<weilewei> Will a new kernel thread get created to run the newly arriving user-level threads?
<weilewei> Or will the first kernel thread just be reused?
<zao> HPX seems to have a whole lot of persistent OS threads to serve as workers and IO runners.
<zao> There’s not much point in scaling them up and down if you can keep them around for cheap.
<weilewei> IC, that's how we save overhead
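A tiny sketch of that model: HPX tasks are lightweight user-level threads multiplexed onto a fixed set of worker OS threads (chosen e.g. via --hpx:threads), so no OS threads are created or destroyed per task:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <vector>

    int main()
    {
        // Thousands of lightweight HPX tasks...
        std::vector<hpx::future<void>> fs;
        for (int i = 0; i != 10000; ++i)
            fs.push_back(hpx::async([] { /* some small piece of work */ }));

        // ...all executed by the same, persistent pool of worker OS threads.
        hpx::wait_all(fs);
        return 0;
    }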
<weilewei> btw, does this explanation exist somewhere? I can barely find any online pages that explain this concept well. Most of them just say kernel threads are expensive to create and manage, period
<zao> If you break an HPX program in a debugger you can see nicely named threads (if the OS supports names) and get a feeling for what their stacks look like.
<zao> The concept of thread pools is reasonably common out there, the ways work gets onto them tends to vary a bit.
<weilewei> True, I will take a look
<weilewei> Right, that's down to implementation level
<zao> One of the niftier things of HPX is how work can yield for other work as needed, something that’s otherwise hard.
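A small illustration of that, assuming the code runs on an HPX worker thread (a sketch, not the scheduler internals):

    #include <hpx/include/async.hpp>
    #include <hpx/include/threads.hpp>

    void illustrate_yielding()
    {
        hpx::future<int> f = hpx::async([] { return 42; });

        // get() suspends only this HPX (user-level) task; the underlying OS
        // worker thread is free to run other ready HPX tasks meanwhile.
        int result = f.get();
        (void) result;

        // Explicit cooperative yielding is also available.
        hpx::this_thread::yield();
    }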
<weilewei> Can't the OS scheduler do a similar thing?
<hkaiser> zao: those threads are mostly dormant
<weilewei> hkaiser what are "those threads"? Are those the ones that keep the OS busy?
<hkaiser> no, they simply sit and wait in the kernel, doing nothing - mostly
<hkaiser> weilewei: we have 6 additional threads in HPX, 2 for IO, 2 for timers, and 2 for networking
<weilewei> hkaiser these 6 additional threads are waiting to respond to tasks and will be used immediately when needed, right?
<hkaiser> yes
<weilewei> If there are not many tasks in user space, will hpx keep those free kernel threads from being destroyed?
<hkaiser> those 6 threads are not the ones doing the hpx tasks
<weilewei> I see, I guess my question is what happens to kernel threads that have hpx tasks to work on
<weilewei> that have *no hpx tasks
<hkaiser> they keep running the scheduling loop
<weilewei> oh, I understand now, thanks. hpx scheduler makes them busy
<hkaiser> the threads that don't run hpx threads sleep when there is no work
<zao> If you’d write your own thread pool, you’d typically have some control function running on each OS thread waiting for work to appear.
<hkaiser> zao: yah, we keep them running to be able to react faster
<hkaiser> they however do an exponential backoff if there is no work for a longish time
<zao> You _could_ make that self-terminate and have the issuer start new threads when need arises again, but in the HPC world you don’t have much need to.
<hkaiser> right, you want to keep overheads down
nan11 has quit [Remote host closed the connection]
<zao> So there are tiers of overhead. You can actively spin for work, consuming CPU resources and hoping work appears shortly; you can back off and wait on a heavier primitive; and you could, in theory, shut down the thread and have someone wind up a new one later at great cost.
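A hand-wavy sketch of such a control loop (generic thread-pool code, not HPX's actual scheduler; HPX additionally applies an exponential backoff rather than a plain wait):

    #include <condition_variable>
    #include <deque>
    #include <functional>
    #include <mutex>

    struct work_queue
    {
        std::deque<std::function<void()>> tasks;
        std::mutex mtx;
        std::condition_variable cv;
        bool stopping = false;

        bool try_pop(std::function<void()>& task)
        {
            std::lock_guard<std::mutex> lk(mtx);
            if (tasks.empty())
                return false;
            task = std::move(tasks.front());
            tasks.pop_front();
            return true;
        }

        // Runs on each worker (OS) thread of the pool.
        void worker_loop()
        {
            for (;;)
            {
                std::function<void()> task;

                // Tier 1: spin briefly, hoping new work shows up soon.
                for (int i = 0; i != 1000 && !try_pop(task); ++i) {}

                if (!task)
                {
                    // Tier 2: back off and sleep on a heavier primitive.
                    std::unique_lock<std::mutex> lk(mtx);
                    cv.wait(lk, [&] { return stopping || !tasks.empty(); });
                    if (stopping && tasks.empty())
                        return;   // Tier 3 (terminating the thread) is rarely worth it in HPC
                    task = std::move(tasks.front());
                    tasks.pop_front();
                }
                task();
            }
        }
    };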
<weilewei> I see
<zao> Yielding out to the OS puts you at the mercy of its scheduler, which tends to be rather coarse-grained.
<zao> (please correct me if I’m off on something)
<weilewei> hkaiser actually I'm still confused about the wording, you said "the threads that don't run hpx threads sleep when there is no work", and then you said "zao: yah, we keep them running to be able to react faster". In the latter sentence, what does "them" refer to?
<weilewei> in my mind, I am thinking of "kernel threads that don't have any hpx tasks to do" as "them"
<hkaiser> the threads that do hpx work run all the time, the others sleep
<hkaiser> zao: exactly
<hkaiser> except that we don't stop and restart the threads, but we do let them back off if there is no work
<weilewei> hkaiser ah, I see now
<hkaiser> putting them to sleep
akheir1 has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
bita has quit [Quit: Leaving]
nikunj_ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
nikunj_ has quit [Ping timeout: 256 seconds]
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
nikunj_ has joined #ste||ar
nikunj_ has quit [Ping timeout: 240 seconds]
Amy1 has quit [Ping timeout: 246 seconds]
Amy1 has joined #ste||ar
<Yorlik> Sweet. Let's hope we will profit from this statement from the paper: "ORNL and Cray will partner with AMD to co-design and develop enhanced GPU programming tools designed for performance, productivity and portability,"
kale_ has joined #ste||ar
kale_ has quit [Ping timeout: 258 seconds]
kale_ has joined #ste||ar
<heller1> nod
kale_ has quit [Read error: No route to host]
kale_ has joined #ste||ar
kale_ has quit [Remote host closed the connection]
nikunj_ has joined #ste||ar
hkaiser has joined #ste||ar
<Yorlik> Did any of you have this problem? :
<Yorlik> Could not find a configuration file for package "boost_system" that exactly matches requested version "1.72.0".
<Yorlik> It only comes up in debug build
<Yorlik> The file is there, but it doesn't accept it
<Yorlik> Like this: The following configuration files were considered but not accepted:
<Yorlik> .../Debug/lib/cmake/boost_system-1.72.0/boost_system-config.cmake, version: unknown
<Yorlik> Building HPX was not a problem.
<Yorlik> It seems my normally working find_boost is kinda messed up.
<zao> Yorlik: Had something similar the other day, I think FindBoost doesn't know about 1.72 yet?
<Yorlik> It seems so. The odd thing is it used to work all the time even with 1.72.
<Yorlik> Now I deleted and rebuilt everything after a VS update and now this is broken.
<Yorlik> I gave find_package the correct directory and everything ...
<Yorlik> seems I might have to use the variables find_package produces manually
<Yorlik> zao: How did you fix it?
<zao> I can't quite reproduce it now, just know I saw it the other night on _some_ system.
<simbergm> don't know if that helps? (and we're obviously still missing 1.72...)
<Yorlik> I just added it and 1.72 and .0, but the error persists.
<Yorlik> Funnily enough HPX compiled without issues.
<Yorlik> The debug version was missing boost_system-config-version.cmake
<Yorlik> I just copied it over
<simbergm> zao: since you spend all your days fighting bad build systems... are we being bad citizens by having SOVERSION set to the release version (as opposed to a counter that we increment every time we break the abi, i.e. all the time) when we don't guarantee abi compatibility between minor releases? or does no one care?
<zao> In my particular EasyBuild world we don't care, as we have exact matches of versions. For packaging in general there might be some assumptions of what works with what, but that I don't really know about.
<simbergm> thanks, I suppose it could be a problem in distros
<simbergm> but maybe we can worry about that later
<zao> Does HPX declare somewhere what kind of versioning scheme is in use, semver and other guarantees?
<hkaiser> nope
<hkaiser> nothing formal
<K-ballo> we should formalize the no guarantees
<hkaiser> K-ballo: I'll bring it up tomorrow during the PMC meeting
<simbergm> I was going to write up a draft for that today or tomorrow... (but for 2.0, where I hope we'll start using semver; good idea to write down the no-guarantees given at the moment)
<hkaiser> simbergm: we could start collecting something along the lines of Python PEP documents
<zao> Adopt the motto of our computer club - "no-one ever promised that anything would work"
<hkaiser> isn't that self-evident?
<zao> hkaiser: I'm curious, who's this "aurianer" person working on the windows path issue I had?
<K-ballo> 2.what?
<hkaiser> zao: she is working with Mikael in Zuerich
<hkaiser> K-ballo: once the modularization is done, we plan to release it as HPX V2
nan11 has joined #ste||ar
<simbergm> K-ballo, hkaiser: it could be more than just modularization but let's see how much energy we have for that
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<zao> Hitting 2.0 before Boost :P
<simbergm> hkaiser: I would just add PR/issue tag HEP :)
<hkaiser> sure, whatever ;-)
<simbergm> while I like the idea of a PEP in general, I find it funny that enhancement proposals stay as enhancement proposals once they've been accepted
<hkaiser> no need to overdo things
<hkaiser> zao: Boost will never hit 2.0
<simbergm> the api guarantees belong in our documentation anyway
<zao> Yeah :)
weilewei has joined #ste||ar
<hkaiser> nikunj_: yt?
<nikunj_> hkaiser, here
<hkaiser> hey
<nikunj_> hkaiser, hey! hope you're safe and well
<hkaiser> where did we host the resiliency paper?
<hkaiser> thanks, all is well
<nikunj_> you mean the one we submitted at SC?
<hkaiser> yah
<nikunj_> it was FTXS
<hkaiser> can't find it - we've got permission to publish it now
<hkaiser> yah, but where is it?
<hkaiser> ahh ok, thanks
<nikunj_> we're publishing at arxiv?
<hkaiser> yes
<nikunj_> nice!
karame_ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
<nikunj_> heller1, want to see some great plots?
<nikunj_> hkaiser, heller1: check out the new plots. Great results finally
<hkaiser> great!
<nikunj_> yes, all thanks to the new and improved block executor
<hkaiser> did you create that?
<nikunj_> I think we are seeing some cache effects in there, so the assumed arithmetic intensity isn't quite right
<nikunj_> hkaiser, no
<nikunj_> it was recently improved in HPX
<hkaiser> ahh, cool
<nikunj_> simbergm asked me to update HPX for better results
<nikunj_> but I'd like to work on executor stuff as well :D
<hkaiser> nikunj_: send him a pizza ;-)
<nikunj_> I was thinking more in like of a drink
<nikunj_> *in line
<nikunj_> ;-)
<zao> Kebab or pineapple+ham? :P
weilewei has joined #ste||ar
<nikunj_> zao, you like pineapples on pizza?
<zao> Yeah, they're nice.
<zao> One of my default choices is the Hawaii Special, which has ham, pineapple, sliced banana and curry spice.
<nikunj_> yeah! Didn't know many people liked it. The idea of pineapples on pizza makes my friends hate pizza :/
<nikunj_> ohh that sounds delicious
Yorlik has quit [Read error: Connection reset by peer]
<simbergm> we can discuss this tomorrow if there's time: https://github.com/STEllAR-GROUP/hpx/pull/4524
<simbergm> (I think it's a bit too early to finalize anything since we're not even close to thinking about 2.0.0, but we can start thinking about this)
Yorlik has joined #ste||ar
<simbergm> zao: you monster (jk, I used to love pizza hawaii as a kid, now I just accept it)
<simbergm> nikunj_: I'm glad things are faster :D I probably broke the block_executor in the first place though, so I shouldn't get much credit for making it faster again
<simbergm> also, just for the record, I don't like meaty drinks
<Yorlik> No meatonade?
<nikunj_> simbergm, my code is running faster due to your efforts ;)
bita has joined #ste||ar
kale_ has joined #ste||ar
kale_ has quit [Client Quit]
<nikunj_> hkaiser, is there any book you'd recommend on metaprogramming other than C++ Template Metaprogramming: Concepts that you gave me last summer?
<nikunj_> something more along the lines of modern C++
<simbergm> hkaiser: heller rori another topic for tomorrow: http://hpx.stellar-group.org/roadmap/ (one can't browse to that page yet)
<simbergm> I think that hpx 2 issue is pretty much our high level goal for the next year, but we can put something more formal over there as well
<hkaiser> simbergm: great, thanks!
<heller1> Very nice! We should maybe add a kanban board (project) to better track what's going on
gonidelis has joined #ste||ar
<heller1> nikunj: c++ templates the complete guide
<nikunj_> heller1, ok will take a look. Did you look at the plots?
<heller1> The plots look cool
<nikunj_> right, I can say that we're seeing good amounts of cache benefits. So arithmetic intensity isn't 1/8 imo
weilewei99 has joined #ste||ar
weilewei99 has quit [Remote host closed the connection]
<heller1> Yes, which is nice
<nikunj_> absolutely!
<heller1> Did you implement any blocking?
<nikunj_> heller1, blocking as in block iterators?
<heller1> As in cache blocking, such that you iterate over your domain in a tiling fashion
<nikunj_> I have not, how do I do that?
<nikunj_> current results are pure HPX results. Only thing I tried was to limit any complexities in the code so that the compiler can optimize the code better.
<zao> Yorlik: Ah, the thing I was thinking of was:
<zao> 1> [CMake] CMake Warning at C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.16/Modules/FindBoost.cmake:1147 (message):
<zao> 1> [CMake] New Boost version may have incorrect or missing dependencies and imported
<zao> 1> [CMake] targets
<Yorlik> Yes - that warning is common - but it doesn't break builds.
<nikunj_> heller1, cache blocking is present inherently
<nikunj_> since the stencil dimension is 8192*131072 and we carry out line-wise updates, we're already using cache blocking
<nikunj_> heller1, currently my stencil fits in L2 cache perfectly so we're seeing cache benefits already
<heller1> I see, there you go
<nikunj_> when I say stencil, I mean the line updates fit into L2 cache nicely
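For reference, a generic sketch of what explicit cache blocking / tiling could look like for a 5-point stencil sweep (plain C++; the tile sizes and the Jacobi-style update are made-up placeholders, not the code discussed here):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Sweep the domain tile by tile so each tile of `in` stays resident in
    // cache while its values are reused by neighboring updates.
    void blocked_sweep(std::vector<double> const& in, std::vector<double>& out,
                       std::size_t ni, std::size_t nj)
    {
        constexpr std::size_t TI = 64, TJ = 1024;   // placeholder tile extents
        for (std::size_t ii = 1; ii + 1 < ni; ii += TI)
            for (std::size_t jj = 1; jj + 1 < nj; jj += TJ)
                for (std::size_t i = ii; i < std::min(ii + TI, ni - 1); ++i)
                    for (std::size_t j = jj; j < std::min(jj + TJ, nj - 1); ++j)
                        out[i * nj + j] = 0.25 * (in[(i - 1) * nj + j] + in[(i + 1) * nj + j] +
                                                  in[i * nj + j - 1] + in[i * nj + j + 1]);
    }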
<nikunj_> heller1, with these numbers, I can write my results for the lab based project and gain university credits. Now I want to extend this work so that I can get a paper out of it. What would you suggest?
<heller1> Read the prior art about the topic
<heller1> And think about what you're making differently, what are your benefits, where are the drawbacks, etc
<nikunj_> I have done some literature review and most of it usually ends with either a new tiling solution or with making tiling easier to use for the application user
gonidelis has quit [Ping timeout: 240 seconds]
<heller1> Where's your approach going in a different direction?
<nikunj_> right now, my application is a very basic one. It's nothing different from what people have been using. It's just an optimized version making use of cache effectively
<heller1> What architectures did you investigate?
<nikunj_> aah, none of them were really done on ARM
<nikunj_> they were all based on x86 architectures
<heller1> Is it performing equally well everywhere?
<nikunj_> well you have the results. It's performing nicely.
<nikunj_> What I couldn't explain was simd floats on thunderX2
<nikunj_> it was performing way better than the available bandwidth should allow
<heller1> I'm asking the questions for you to answer.
<nikunj_> I investigated x86 and arm and I saw that arm had irregular scaling while x86 had a regular overall scaling
<heller1> If you find an answer to those questions which have not been answered by previous papers, you have yours
<nikunj_> so essentially start with a literature review on stencils and their performance
RostamLog has joined #ste||ar
akheir has joined #ste||ar
gonidelis has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
nikunj_ has quit [Ping timeout: 240 seconds]
Amy1 has quit [Killed (Sigyn (Stay safe off irc))]
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
diehlpk_work has joined #ste||ar
mreese3 has quit [Read error: Connection reset by peer]
bita has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
wate123_Jun has quit []
gonidelis has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
nikunj97 has quit [Quit: Leaving]