hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<zao> Gah, my HPX code that works on VS2019 doesn't compile on GCC.
<zao> Let's see if I can get it into a shape I can share tomorrow, can't provide the output as-is.
<zao> Is there some inherent limit to the arity of plain functions you want to invoke with hpx::async?
<zao> When you want to pass a non-const lvalue reference in an hpx::async call, is the proper solution to wrap it in a std::reference_wrapper?
<zao> Code that works on MSVC but not Linux GCC - https://gist.github.com/zao/f0cc8cae2ac51ffbc19ce42a0207e914
<zao> I guess that this is the good old "we can bind temporaries to rvalue reference huhuhu" extension of VC++ kicking in somehow?
<hkaiser> zao: yah
<hkaiser> msvc is notorious for allowing you to bind temporaries to non-const lvalue refs, that's illegal
<hkaiser> zao use std::ref(pdn)
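A minimal sketch of the suggested fix (the function and variable names here are made up for illustration; assumes a standard HPX build with hpx_main):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <functional>  // std::ref
    #include <vector>

    // Hypothetical worker taking a non-const lvalue reference.
    void fill(std::vector<float>& out) { out.assign(4, 1.0f); }

    int main()
    {
        std::vector<float> pdn;
        // hpx::async decay-copies its arguments (and may move them, since the
        // function is only invoked once), so wrap references in std::ref to
        // make the function mutate the caller's object instead of a copy.
        auto f = hpx::async(fill, std::ref(pdn));
        f.get();
        // pdn now holds the values written by fill().
        return 0;
    }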
<zao> Odd, now I can't get the standalone test to compile with MSVC. Oh well, I know the solution at least.
<zao> Curious. It works for something like std::vector<float>&, but fails for float&
<zao> (compiles on MSVC, that is)
<zao> Oh gods, it's doing The Wrong Thing silently.
<zao> Mutating a temporary, good times! :D
<hkaiser> zao: async knows that the function is called once, so it tries to move the arguments to the function
<zao> I'm gonna hit the sack, but if I have several functions needing the result of an async call, can I pass a shared_future to them all, or should I try to get the data out at some point and hand them shared_ptr:s?
<zao> Also still not sure how to handle async functions returning several different things, is a tuple of futures my best bet, or should I bake some custom return struct for each one?
<zao> (this codebase has lots of nice functions of arity >20 with a lot of big input and output arrays, scientific code at its best)
<hkaiser> async returns a future which you can turn into a shared_future
<hkaiser> f.share()
<hkaiser> let the function return a tuple<> (i.e. async will give you a future<tuple<>>)
<hkaiser> or use the returned future just as a flag that the ref'ed args are ready
<hkaiser> or create a struct and return an instance of it which gives you a future<foo>, might require move operators and somesuch, though
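A rough sketch of the shapes described above (the function names are hypothetical, only meant to show the types involved):

    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <tuple>
    #include <vector>

    // Hypothetical producer returning several results at once.
    std::tuple<std::vector<float>, double> compute()
    {
        return {std::vector<float>(100, 1.0f), 3.14};
    }

    // Hypothetical consumer; each consumer can hold its own copy of the shared_future.
    void consume(hpx::shared_future<std::tuple<std::vector<float>, double>> f)
    {
        double scalar = std::get<1>(f.get());
        (void) scalar;
    }

    void example()
    {
        // async gives a future<tuple<...>>; share() turns it into a
        // shared_future that can be passed to several dependent functions.
        hpx::shared_future<std::tuple<std::vector<float>, double>> sf =
            hpx::async(compute).share();

        auto a = hpx::async(consume, sf);
        auto b = hpx::async(consume, sf);
        a.get();
        b.get();
    }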
<weilewei> hkaiser and @everyone, if I want to understand more about operating systems and computer architecture, what books would you recommend?
<hkaiser> the book with the dinosaurs on the cover
<weilewei> feel like I need to understand more about the underlying hardware
<weilewei> Operating System Concepts by Avi Silberschatz
<hkaiser> I have a copy in my office ;-)
<weilewei> oh!! I wish I could borrow it to read
<weilewei> But we cannot enter the CCT building now
<hkaiser> but you're not supposed to go there and my office is locked
<hkaiser> weilewei: but you're lucky that I'm not supposed to go there anyways and will not get a book that I'm not supposed to give to Karame this week
<hkaiser> ;-)
<nan11> lol
<weilewei> lol, maybe I can find an e-book of it online
<hkaiser> I'll let you know once I have not been there and where you can't find the book
<weilewei> hkaiser thanks!! Let me know then
<hkaiser> I won't
<nan11> I hear nothing xD
<weilewei> lol
<weilewei> If I have a plain kernel thread, once it finishes its work it gets destroyed (right?). Now, if I have an hpx user-level thread running on top of a kernel thread, and that user-level thread finishes its work while other user-level threads arrive right after it, does the kernel thread get destroyed as well?
<weilewei> Will a new kernel thread get created to run the newly arriving user-level threads?
<weilewei> Or will the first kernel thread just be reused?
<zao> HPX seems to have a whole lot of persistent OS threads to serve as workers and IO runners.
<zao> There’s not much point in scaling them up and down if you can keep them around for cheap.
<weilewei> IC, that's how we save overhead
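A tiny sketch of that model: HPX tasks are lightweight user-level threads multiplexed onto a fixed set of worker OS threads (chosen e.g. via --hpx:threads), so no OS threads are created or destroyed per task:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <vector>

    int main()
    {
        // Thousands of lightweight HPX tasks...
        std::vector<hpx::future<void>> fs;
        for (int i = 0; i != 10000; ++i)
            fs.push_back(hpx::async([] { /* some small piece of work */ }));

        // ...all executed by the same, persistent pool of worker OS threads.
        hpx::wait_all(fs);
        return 0;
    }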
<weilewei> btw, does this explanation exist somewhere? I can barely find any online pages that explain this concept well. Most of them just say kernel threads are expensive to create and manage, period
<zao> If you break an HPX program in a debugger you can see nicely named threads (if the OS supports names) and get a feeling for what their stacks look like.
<zao> The concept of thread pools is reasonably common out there, the ways work gets onto them tends to vary a bit.
<weilewei> True, I will take a look
<weilewei> Right, that's down to implementation level
<zao> One of the niftier things of HPX is how work can yield for other work as needed, something that’s otherwise hard.
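A small illustration of that, assuming the code runs on an HPX worker thread (a sketch, not the scheduler internals):

    #include <hpx/include/async.hpp>
    #include <hpx/include/threads.hpp>

    void illustrate_yielding()
    {
        hpx::future<int> f = hpx::async([] { return 42; });

        // get() suspends only this HPX (user-level) task; the underlying OS
        // worker thread is free to run other ready HPX tasks meanwhile.
        int result = f.get();
        (void) result;

        // Explicit cooperative yielding is also available.
        hpx::this_thread::yield();
    }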
<weilewei> Can't the OS scheduler do a similar thing?
<hkaiser> zao: those threads are mostly dormant
<weilewei> hkaiser what are "those threads"? Are those the ones that keep the OS busy?
<hkaiser> no, they simply sit and wait in the kernel, doing nothing - mostly
<hkaiser> weilewei: we have 6 additional threads in HPX, 2 for IO, 2 for timers, and 2 for networking
<weilewei> hkaiser these 6 additional threads are waiting to respond to tasks and will be used immediately when needed, right?
<hkaiser> yes
<weilewei> If there are not many tasks in user space, will hpx keep those free kernel threads from being destroyed?
<hkaiser> those 6 threads are not the ones doing the hpx tasks
<weilewei> I see, I guess my question is what happens to kernel threads that have hpx tasks to work on
<weilewei> that have *no hpx tasks
<hkaiser> they keep running the scheduling loop
<weilewei> oh, I understand now, thanks. hpx scheduler makes them busy
<hkaiser> the threads that don't run hpx threads sleep when there is no work
<zao> If you’d write your own thread pool, you’d typically have some control function running on each OS thread waiting for work to appear.
<hkaiser> zao: yah, we keep them running to be able to react faster
<hkaiser> they however do an exponential backoff if there is no work for a longish time
<zao> You _could_ make that self-terminate and have the issuer start new threads when need arises again, but in the HPC world you don’t have much need to.
<hkaiser> right, you want to keep overheads down
nan11 has quit [Remote host closed the connection]
<zao> So there are tiers of overhead. You can actively spin for work, consuming CPU resources and hoping work appears shortly; you can back off and wait on a heavier primitive; and you could, in theory, shut down the thread and have someone wind up a new one later at great cost.
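A hand-wavy sketch of such a control loop (generic thread-pool code, not HPX's actual scheduler; HPX additionally applies an exponential backoff rather than a plain wait):

    #include <condition_variable>
    #include <deque>
    #include <functional>
    #include <mutex>

    struct work_queue
    {
        std::deque<std::function<void()>> tasks;
        std::mutex mtx;
        std::condition_variable cv;
        bool stopping = false;

        bool try_pop(std::function<void()>& task)
        {
            std::lock_guard<std::mutex> lk(mtx);
            if (tasks.empty())
                return false;
            task = std::move(tasks.front());
            tasks.pop_front();
            return true;
        }

        // Runs on each worker (OS) thread of the pool.
        void worker_loop()
        {
            for (;;)
            {
                std::function<void()> task;

                // Tier 1: spin briefly, hoping new work shows up soon.
                for (int i = 0; i != 1000 && !try_pop(task); ++i) {}

                if (!task)
                {
                    // Tier 2: back off and sleep on a heavier primitive.
                    std::unique_lock<std::mutex> lk(mtx);
                    cv.wait(lk, [&] { return stopping || !tasks.empty(); });
                    if (stopping && tasks.empty())
                        return;   // Tier 3 (terminating the thread) is rarely worth it in HPC
                    task = std::move(tasks.front());
                    tasks.pop_front();
                }
                task();
            }
        }
    };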
<weilewei> I see
<zao> Yielding out to the OS puts you at the mercy of its scheduler, which tends to be rather coarse-grained.
<zao> (please correct me if I’m off on something)
<weilewei> hkaiser actually I'm still confused about the wording, you said "the threads that don't run hpx threads sleep when there is no work", and then you said "zao: yah, we keep them running to be able to react faster". In the latter sentence, what does "them" refer to?
<weilewei> in my mind, I am thinking of "kernel threads that don't have any hpx tasks to do" as "them"
<hkaiser> the threads that do hpx work run all the time, the others sleep
<hkaiser> zao: exactly
<hkaiser> except that we don't stop and restart the threads, but we do let them back off if there is no work
<weilewei> hkaiser ah, I see now
<hkaiser> putting them to sleep
akheir1 has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
bita has quit [Quit: Leaving]
nikunj_ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
nikunj_ has quit [Ping timeout: 256 seconds]
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
nikunj_ has joined #ste||ar
nikunj_ has quit [Ping timeout: 240 seconds]
Amy1 has quit [Ping timeout: 246 seconds]
Amy1 has joined #ste||ar
<Yorlik> Sweet. Let's hope we will profit from this statement from the paper: "ORNL and Cray will partner with AMD to co-design and develop enhanced GPU programming tools designed for performance, productivity and portability,"
kale_ has joined #ste||ar
kale_ has quit [Ping timeout: 258 seconds]
kale_ has joined #ste||ar
<heller1> nod
kale_ has quit [Read error: No route to host]
kale_ has joined #ste||ar
kale_ has quit [Remote host closed the connection]
nikunj_ has joined #ste||ar
hkaiser has joined #ste||ar
<Yorlik> Did any of you have this problem? :
<Yorlik> Could not find a configuration file for package "boost_system" that exactly matches requested version "1.72.0".
<Yorlik> It only comes up in debug build
<Yorlik> The file is there, but it doesn't accept it
<Yorlik> Like this: The following configuration files were considered but not accepted:
<Yorlik> .../Debug/lib/cmake/boost_system-1.72.0/boost_system-config.cmake, version: unknown
<Yorlik> Building HPX was not a problem.
<Yorlik> It seems my normally working find_boost is kinda messed up.
<zao> Yorlik: Had something similar the other day, I think FindBoost doesn't know about 1.72 yet?
<Yorlik> It seems so. The odd thing is it used to work all the time even with 1.72.
<Yorlik> Now I deleted and rebuilt everything after a VS update and now this is broken.
<Yorlik> I gave find_package the correct directory and everything ...
<Yorlik> seems I might have to use the variables find_package produces manually
<Yorlik> zao: How did you fix it?
<zao> I can't quite reproduce it now, just know I saw it the other night on _some_ system.
<simbergm> don't know if that helps? (and we're obviously still missing 1.72...)
<Yorlik> I just added it and 1.72 and .0, but the error persists.
<Yorlik> Funnily enough HPX compiled without issues.
<Yorlik> The debug version was missing boost_system-config-version.cmake
<Yorlik> I just copied it over
<simbergm> zao: since you spend all your days fighting bad build systems... are we being bad citizens by having SOVERSION set to the release version (as opposed to a counter that we increment every time we break the abi, i.e. all the time) when we don't guarantee abi compatibility between minor releases? or does no one care?
<zao> In my particular EasyBuild world we don't care, as we have exact matches of versions. For packaging in general there might be some assumptions of what works with what, but that I don't really know about.
<simbergm> thanks, I suppose it could be a problem in distros
<simbergm> but maybe we can worry about that later
<zao> Does HPX declare somewhere what kind of versioning scheme is in use, semver and other guarantees?
<hkaiser> nope
<hkaiser> nothing formal
<K-ballo> we should formalize the no guarantees
<hkaiser> K-ballo: I'll bring it up tomorrow during the PMC meeting
<simbergm> I was going to write up a draft for that today or tomorrow... (but for 2.0, where I hope we'll start using semver; good idea to write down the no-guarantees given at the moment)
<hkaiser> simbergm: we could start collecting something along the lines of Python PEP documents
<zao> Adopt the motto of our computer club - "no-one ever promised that anything would work"
<hkaiser> isn't that self-evident?
<zao> hkaiser: I'm curious, who's this "aurianer" person working on the windows path issue I had?
<K-ballo> 2.what?
<hkaiser> zao: she is working with Mikael in Zuerich
<hkaiser> K-ballo: once the modularization is done, we plan to release it as HPX V2
nan11 has joined #ste||ar
<simbergm> K-ballo, hkaiser: it could be more than just modularization but let's see how much energy we have for that
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<zao> Hitting 2.0 before Boost :P
<simbergm> hkaiser: I would just add PR/issue tag HEP :)
<hkaiser> sure, whatever ;-)
<simbergm> while I like the idea of a PEP in general, I find it funny that enhancement proposals stay as enhancement proposals once they've been accepted
<hkaiser> no need to overdo things
<hkaiser> zao: Boost will never hit 2.0
<simbergm> the api guarantees belong in our documentation anyway
<zao> Yeah :)
weilewei has joined #ste||ar
<hkaiser> nikunj_: yt?
<nikunj_> hkaiser, here
<hkaiser> hey
<nikunj_> hkaiser, hey! hope you're safe and well
<hkaiser> where did we host the resiliency paper?
<hkaiser> thanks, all is well
<nikunj_> you mean the one we submitted at SC?
<hkaiser> yah
<nikunj_> it was FTXS
<hkaiser> can't find it - we've got permission to publish it now
<hkaiser> yah, but where is it?
<hkaiser> ahh ok, thanks
<nikunj_> we're publishing at arxiv?
<hkaiser> yes
<nikunj_> nice!
karame_ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
<nikunj_> heller1, want to see some great plots?
<nikunj_> hkaiser, heller1: check out the new plots. Great results finally
<hkaiser> great!
<nikunj_> yes, all thanks to the new and improved block executor
<hkaiser> did you create that?
<nikunj_> I think we are seeing some cache effects in there, so the assumed arithmetic intensity isn't quite right
<nikunj_> hkaiser, no
<nikunj_> it was recently improved in HPX
<hkaiser> ahh, cool
<nikunj_> simbergm asked me to update HPX for better results
<nikunj_> but I'd like to work on executor stuff as well :D
<hkaiser> nikunj_: send him a pizza ;-)
<nikunj_> I was thinking more in like of a drink
<nikunj_> *in line
<nikunj_> ;-)
<zao> Kebab or pineapple+ham? :P
weilewei has joined #ste||ar
<nikunj_> zao, you like pineapples on pizza?
<zao> Yeah, they're nice.
<zao> One of my default choices is the Hawaii Special, which has ham, pineapple, sliced banana and curry spice.
<nikunj_> yeah! Didn't know many people liked it. The idea of pineapples on pizza makes my friends hate pizza :/
<nikunj_> ohh that sounds delicious
Yorlik has quit [Read error: Connection reset by peer]
<simbergm> we can discuss this tomorrow if there's time: https://github.com/STEllAR-GROUP/hpx/pull/4524
<simbergm> (I think it's a bit too early to finalize anything since we're not even close to thinking about 2.0.0, but we can start thinking about this)
Yorlik has joined #ste||ar
<simbergm> zao: you monster (jk, I used to love pizza hawaii as a kid, now I just accept it)
<simbergm> nikunj_: I'm glad things are faster :D I probably broke the block_executor in the first place though, so I shouldn't get much credit for making it faster again
<simbergm> also, just for the record, I don't like meaty drinks
<Yorlik> No meatonade?
<nikunj_> simbergm, my code is running faster due to your efforts ;)
bita has joined #ste||ar
kale_ has joined #ste||ar
kale_ has quit [Client Quit]
<nikunj_> hkaiser, is there any book you'd recommend on metaprogramming other than C++ Template Metaprogramming: Concepts that you gave me last summer?
<nikunj_> something more along the lines of modern C++
<simbergm> hkaiser: heller rori another topic for tomorrow: http://hpx.stellar-group.org/roadmap/ (one can't browse to that page yet)
<simbergm> I think that hpx 2 issue is pretty much our high level goal for the next year, but we can put something more formal over there as well
<hkaiser> simbergm: great, thanks!
<heller1> Very nice! We should maybe add a kanban board (project) to better track what's going on
gonidelis has joined #ste||ar
<heller1> nikunj: c++ templates the complete guide
<nikunj_> heller1, ok will take a look. Did you look at the plots?
<heller1> The plots look cool
<nikunj_> right, I can say that we're seeing good amounts of cache benefits. So arithmetic intensity isn't 1/8 imo
weilewei99 has joined #ste||ar
weilewei99 has quit [Remote host closed the connection]
<heller1> Yes, which is nice
<nikunj_> absolutely!
<heller1> Did you implement any blocking?
<nikunj_> heller1, blocking as in block iterators?
<heller1> As in cache blocking, such that you iterate over your domain in a tiling fashion
<nikunj_> I have not, how do I do that?
<nikunj_> current results are pure HPX results. Only thing I tried was to limit any complexities in the code so that the compiler can optimize the code better.
<zao> Yorlik: Ah, the thing I was thinking of was:
<zao> 1> [CMake] CMake Warning at C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.16/Modules/FindBoost.cmake:1147 (message):
<zao> 1> [CMake] New Boost version may have incorrect or missing dependencies and imported
<zao> 1> [CMake] targets
<Yorlik> Yes - that warning is common - but it doesn't break builds.
<nikunj_> heller1, cache blocking is present inherently
<nikunj_> since the stencil dimension is 8192*131072 and we carry out line-wise updates, we're already using cache blocking
<nikunj_> heller1, currently my stencil fits in L2 cache perfectly so we're seeing cache benefits already
<heller1> I see, there you go
<nikunj_> when I say stencil, I mean the line updates fit into L2 cache nicely
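For reference, a generic sketch of what explicit cache blocking / tiling could look like for a 5-point stencil sweep (plain C++; the tile sizes and the Jacobi-style update are made-up placeholders, not the code discussed here):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Sweep the domain tile by tile so each tile of `in` stays resident in
    // cache while its values are reused by neighboring updates.
    void blocked_sweep(std::vector<double> const& in, std::vector<double>& out,
                       std::size_t ni, std::size_t nj)
    {
        constexpr std::size_t TI = 64, TJ = 1024;   // placeholder tile extents
        for (std::size_t ii = 1; ii + 1 < ni; ii += TI)
            for (std::size_t jj = 1; jj + 1 < nj; jj += TJ)
                for (std::size_t i = ii; i < std::min(ii + TI, ni - 1); ++i)
                    for (std::size_t j = jj; j < std::min(jj + TJ, nj - 1); ++j)
                        out[i * nj + j] = 0.25 * (in[(i - 1) * nj + j] + in[(i + 1) * nj + j] +
                                                  in[i * nj + j - 1] + in[i * nj + j + 1]);
    }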
<nikunj_> heller1, with these numbers, I can write my results for the lab based project and gain university credits. Now I want to extend this work so that I can get a paper out of it. What would you suggest?
<heller1> Read the prior art about the topic
<heller1> And think about what you're making differently, what are your benefits, where are the drawbacks, etc
<nikunj_> I have done some literature review and most of it usually ends with either a new tiling solution or with making tiling easier to use for the application user
gonidelis has quit [Ping timeout: 240 seconds]
<heller1> Where's your approach going in a different direction?
<nikunj_> right now, my application is a very basic one. It's nothing different from what people have been using. It's just an optimized version making use of cache effectively
<heller1> What architectures did you investigate?
<nikunj_> aah, none of them were really done on ARM
<nikunj_> they were all based on x86 architectures
<heller1> Is it performing equally well everywhere?
<nikunj_> well you have the results. It's performing nicely.
<nikunj_> What I couldn't explain was simd floats on thunderX2
<nikunj_> it was performing way better than the available bandwidth should allow
<heller1> I'm asking the questions for you to answer.
<nikunj_> I investigated x86 and arm and I saw that arm had irregular scaling while x86 had a regular overall scaling
<heller1> If you find an answer to those questions which have not been answered by previous papers, you have yours
<nikunj_> so essentially start with a literature review on stencils and their performance
RostamLog has joined #ste||ar
akheir has joined #ste||ar
gonidelis has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
nikunj_ has quit [Ping timeout: 240 seconds]
Amy1 has quit [Killed (Sigyn (Stay safe off irc))]
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
diehlpk_work has joined #ste||ar
mreese3 has quit [Read error: Connection reset by peer]
bita has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
wate123_Jun has quit []
gonidelis has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
nikunj97 has quit [Quit: Leaving]