hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
Yorlik has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
weilewei has quit [Remote host closed the connection]
Yorlik has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
nan11 has quit [Remote host closed the connection]
bita_ has quit [Ping timeout: 240 seconds]
nikunj has quit [Ping timeout: 252 seconds]
nikunj has joined #ste||ar
sayef_ has joined #ste||ar
sayefsakin has quit [Ping timeout: 256 seconds]
carola[m] has joined #ste||ar
sayef_ has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
LiliumAtratum has joined #ste||ar
kale[m] has joined #ste||ar
K-ballo has quit [Ping timeout: 256 seconds]
K-ballo has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
LiliumAtratum has quit [Remote host closed the connection]
kale[m] has quit [Ping timeout: 272 seconds]
kale[m] has joined #ste||ar
<ms[m]> heller: hkaiser jbjnr apologies for merging https://github.com/STEllAR-GROUP/hpx/pull/4689 and soon https://github.com/STEllAR-GROUP/hpx/pull/4696, they'll mean at least some conflicts for most PRs; but merging other PRs first would've meant eternal catchup for these two PRs
<heller1> gotcha
<hkaiser> ms[m]: np
jaafar has quit [Remote host closed the connection]
jaafar has joined #ste||ar
diehlpk_work has joined #ste||ar
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
weilewei has joined #ste||ar
<Yorlik> heller1: YT?
<heller1> Yorlik: what up?
<Yorlik> From the tests I did yesterday it seems there is a big increase in frametime when the level-2 cache gets thrashed; however, this happens at relatively low object numbers already and was to be expected.
<Yorlik> What concerns me more is: when I do tests with more cores, calculate the times and multiply them by the number of cores afterwards to get the time per object and core used, this brings another serious increase in times, and I am not yet sure if this is due to shared caches or because there is something I need to improve in my application.
<Yorlik> I calculated objects/second and memory bandwidth from these times, and that is indeed abysmal compared to what is theoretically possible, which it shouldn't be. But that might also be a result of wasting time in Lua, of course, which increases frametime.
<Yorlik> The numbers per object single-threaded are 0.020 microseconds at 1000 objects and 0.013 at 5000 objects. I think HPX can work more efficiently here, which explains the improvement in times.
<Yorlik> However at ~20000-100000 objects the time goes up to 6.00, and up to 60 microseconds at large object counts, which is most likely a cache effect.
<Yorlik> However with 12 cores the times per object AND core [ (obj*corecount) / sec ] are pretty high (~150.00 - 336.00 microseconds), so there's some inefficiency which is probably related to the entire setup with Lua. It might be necessary that I aggregate several object updates into one single call into Lua, instead of calling into Lua for every single object separately, because right now I need a LuaEngine per object
<Yorlik> update, which is probably thrashing the cache even more.
<Yorlik> I think I need to understand the situation better first and design some automated experiments/measurements to understand what's really going on.
<diehlpk_work> hkaiser, Do we have your meeting?
<hkaiser> yes
nikunj has quit [Ping timeout: 260 seconds]
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
nikunj has joined #ste||ar
LiliumAtratum has joined #ste||ar
<LiliumAtratum> Hello, I stumbled upon a problem integrating hpx into our project. We use Qt on windows and a bit of boost. It is hard to track what is included where and in which order (the order never matter so far). However, it seems hpx needs to be included before windows (thus Qt) - otherwise I get errors. Any tips for that problem?
<Yorlik> LiliumAtratum: IIRC you need to include HPX specifically before Boost.
<heller1> Yorlik: ok. you should be able to see cache misses etc directly when using something like vtune
<heller1> Yorlik: no, windows.h
<Yorlik> heller1: So it's Boost and windows.h which must come after HPX?
<heller1> yes IIRC
<heller1> but I am not a windows person
<Yorlik> Double Bubble then ;)
<heller1> i think hpx/config.hpp alone should be good
<K-ballo> it's the old winsock story, if windows is included first it defaults to winsock1.0
<Yorlik> heller1: Does vtune work nicely with AMD machines?
<heller1> yes
<Yorlik> I know Intel has a dubious history with how they treated AMD machines in their libraries.
<heller1> sure
<Yorlik> Probably vtune isn't affected then.
<heller1> vtune might tell you about missing hardware performance counters etc
<zao> Yorlik: VTune is quite limited if you're not on an Intel chip.
<heller1> but profiling in general will work
<Yorlik> OK
<zao> Falls back to rather coarse sampling techniques and a fair number of the interesting metrics are disabled.
<Yorlik> The AMD stuff looks horribly outdated and some of it didn't even install on my machine
<Yorlik> I don't understand how they cannot support their developers better.
<heller1> alrighty what zao says then ;)
<heller1> AMD?
<Yorlik> Independently of what Intel was doing in the past, concerning developer tools I think AMD really needs to get up to speed.
<heller1> well ... who knows?
<Yorlik> I tried installing their profiler and it miserably failed
<Yorlik> Thanks ! I'll check that out.
nan111 has joined #ste||ar
<heller1> zao: does PAPI work on windows?
<zao> I have no idea.
<heller1> probably not though
<LiliumAtratum> heller1, do I understand correctly - you say that including `hpx/config.hpp` before everything should be enough?
<K-ballo> that would be enough, yes
<LiliumAtratum> thank you!
<K-ballo> it will include winsock2.h, so then windows.h wont conflict
<K-ballo> another option should be win32's lean and mean macros
<heller1> LiliumAtratum: correct
<LiliumAtratum> lean-and-mean disables winsock? I am currently experimenting with defining `_WINSOCKAPI_`, because I found that's what you do in `hpx/config.hpp` ;)
<zao> There's a special circle of hell reserved for people who define LEAN_AND_MEAN for others.
<LiliumAtratum> hahaha
<LiliumAtratum> those people are simply MEAN ;)
<K-ballo> lean-and-mean disables implicit winsock1.0 on windows.h
<K-ballo> as long as nobody else includes winsock1 (and why would one ever??) then it'd be fine
<zao> It also guts a lot of other things that tend to be actually needed if you're going to use UI toolkits.
<LiliumAtratum> `/D_WINSOCKAPI_` solved the problem for me :)
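To summarize the include-order fix discussed above, here is a sketch (Windows-only and illustrative; the exact set of conflicting headers depends on the project):

```cpp
// hpx/config.hpp pulls in winsock2.h, so it must come before any
// header that transitively includes windows.h (Qt, parts of Boost, ...).
// Otherwise windows.h defaults to the old winsock 1.0 declarations
// and the two clash.

#include <hpx/config.hpp>    // first: brings in winsock2.h
// #include <QWidget>        // Qt and windows.h are now safe to include
// #include <windows.h>

// Alternative workarounds mentioned in the channel: define _WINSOCKAPI_
// project-wide (e.g. /D_WINSOCKAPI_ on the MSVC command line) to suppress
// the winsock 1.0 declarations, or define WIN32_LEAN_AND_MEAN -- with the
// caveat that lean-and-mean also strips parts of windows.h that UI
// toolkits may need.
```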
<K-ballo> some day people will start #including the stuff they use
<LiliumAtratum> I think #includes will become obsolete before that happens :)
<K-ballo> that's fine, modules will force people to do the right thing
<K-ballo> modules don't leak
karame_ has joined #ste||ar
rtohid has joined #ste||ar
<LiliumAtratum> but... how much time till windows.h is replaced by a module? ;)
* zao quietly chants "Rust, rust, rust" in his corner of the channel
<zao> MS would rather have you use the newfangled WinRT or other replacement techs and have the old stuff forgotten.
<zao> The new stuff with projections is kind of nice, btw.
<LiliumAtratum> I touched rust a bit and it is interesting indeed. But I am just too deep in C++ to jump ship ;)
<LiliumAtratum> if I could I would probably change a few things in the Rust syntax though
<zao> I've got to relearn game-style C++ soon, itching for a job change.
<Yorlik> zao: We could always need an intern ;)
<weilewei> zao what's the difference in C++ programming between HPC and game development?
<Yorlik> Game Dev is HPC, sortof. Always was, I think.
<zao> Interactivity.
<zao> In games, at least on the runtime side of things, you've got hard(ish) deadlines of a frame to do things.
<nan111> hkaiser I just updated hpx and tried to compile phylanx, but got an error that says " fatal error: hpx/format.hpp: No such file or directory #include <hpx/format.hpp>"
<Yorlik> You have frames, which give you a tight time budget and you need to cram all sorts of stuff in it and then you add client networking on top of that ...
<zao> Any concurrency you do tends to need to fan back in preferably during the same 6-16ms frame, the longer it takes the more latency a player will observe.
<nan111> hkaiser do you have any suggestions for it?
<zao> In HPC, you can run extremely heavy computations and aim to fill all your cores with long pipes of work such that nothing ever idles.
<ms[m]> nan111: we just merged a big pr that renames headers
<ms[m]> you'll need hpx/modules/format.hpp, but let me check first if we need to put back a compatibility header for hpx/format.hpp
<zao> In games you often also have quite a wide variety of hardware you're running on, so your game needs to deal with less capable GPUs or slow storage, etc.
<weilewei> good point, it does sound like different ideologies
<zao> While in HPC, you tend to have a typical style of cluster you target, and quite often write bespoke code tuned for a particular µarch.
<nan111> ms[m] got it. Thanks.
<zao> Both fields care about performance and bloat, just in different ways ^^
<weilewei> zao nice! Hope you get a new job in game dev soon : ). Maybe when I have time, I should checkout some game dev C++ projects to educate myself..
<zao> I'm a bit tired of always installing new software, always something new that breaks in the OS, always some new broken code.
<zao> It would be nice to work on a codebase that you actually own, which can be improved and fixed.
<hkaiser> nan111: I'll do a sweep over the Phylanx code to fix this asap
<ms[m]> hkaiser: keep in mind we might want to put back some of these
<nan111> hkaiser thank you
<ms[m]> I just checked and hpx/format.hpp was already there in 1.4.1
Yorlik has quit [Read error: Connection reset by peer]
<ms[m]> would be nice if we didn't break phylanx completely with this
<hkaiser> well, currently it's completely broken ;-)
<ms[m]> yeah, I get that, but before the next release...
<ms[m]> you won't be the only ones including stuff like that
<ms[m]> I'll try to add back the ones that we already had in 1.4.1
<hkaiser> ok, thanks
<hkaiser> ms[m]: I'll create a list of headers needed by phylanx that are now missing
<heller1> zao: you know ... there's this great open source project, which is always open for creative people to take ownership of some modules ;)
<zao> heller1: It's unfortunately written in a weird C++ accent that no-one understands :P
<ms[m]> hkaiser: don't worry, I have a list of all the headers in phylanx, and I think it might not cover all the headers that changed
<heller1> zao: some also call it magic
<hkaiser> NO!
Yorlik has joined #ste||ar
<ms[m]> :P
<heller1> hkaiser: I said that just for you ;)
<hkaiser> I know ;-)
<LiliumAtratum> @zao You point out interesting differences between HPC and gaming. I am working on a CAD program, which I find is somewhere in between: there is an interactivity part (we don't want to drop below 30fps), and there are some heavy algorithms that can run - in the most extreme cases - for a few hours on a PC.
<zao> Meanwhile asset builds quite resemble the kind of pipelines that bioinformaticians do, with lots of tools glued together to transform resources, often in farms.
<heller1> zao: are most jobs offered as remote nowadays?
<zao> Haven't looked lately, normally everything is on-site or consultancy stuff.
<zao> You've got a few remote places like Mozilla, that are innately distributed.
<heller1> Right, just wondering since everyone is sitting at home these days
<heller1> If companies have already changed their hiring criteria
<zao> Game studios have largely been forced to go into WfH, which somewhat works out with infinite Zoom calls.
<heller1> Almost all software shops, I'd assume (us as well)
<hkaiser> ms[m]: yt?
<ms[m]> hkaiser: yep
<hkaiser> currently all header compatibility options need to be defined at HPX configure time, there is no way to disable those for a certain dependent project only
<hkaiser> may I suggest we change all #if defined(HPX_<module>_HAVE_DEPRECATION_WARNINGS) to #if HPX_<module>_HAVE_DEPRECATION_WARNINGS ?
<hkaiser> that would allow setting things to zero independently of how HPX is preconfigured
<hkaiser> that would also require changing the #define HPX_<module>_HAVE_DEPRECATION_WARNINGS to something guarded with an #ifndef
<hkaiser> ms[m]: concretely, we use hpx/runtime/get_os_thread_count.hpp in blaze and I wouldn't like to change that as it may break use of blaze with older hpx versions
<ms[m]> like hpx_add_config_cond_define?
<hkaiser> but having this warning 100 times while compiling phyalnx is just annoying
<hkaiser> mdiers[m]: yes
<hkaiser> ms[m]: yes
<hkaiser> ms[m]: this would also require touching all compatibility headers
<ms[m]> and you'd go and add in phylanx target_compile_definition(... HPX_RUNTIME_LOCAL_HAVE_DEPRECATION_WARNINGS=0)?
<hkaiser> yes
<ms[m]> should be fairly easily doable with some scripting, don't have anything against it
<hkaiser> thanks
<ms[m]> I'll open an issue, I bet rori_[m] could do it in no time
<hkaiser> perfect!
<rori> 👍️
<hkaiser> ms[m]: also hpx/async.hpp is completely gone (not even as hpx/modules/async.hpp) should this be hpx/modules/hpx/async_local.hpp?
<hkaiser> hpx/modules/async_local.hpp?
LiliumAtratum has quit [Remote host closed the connection]
<hkaiser> hrmmm
<ms[m]> good point
<ms[m]> it should not be the local one
kale[m] has quit [Ping timeout: 260 seconds]
<ms[m]> it might be missing a compatibility header
<ms[m]> I'll add that back as well
<ms[m]> it'd be in async_distributed if anywhere
<ms[m]> not even sure it should be a compatibility header yet (i.e. it should just be a plain header)
<hkaiser> nod
<ms[m]> related to that, you shouldn't need to include hpx/runtime/get_os_thread_count.hpp
<ms[m]> I think it should be hpx/runtime.hpp, but that's temporarily gone now, so hpx/include/runtime.hpp should do it
<hkaiser> what should I include instead (and be backwards compatible)?
<hkaiser> but I just need that one function...
<ms[m]> hpx/include/runtime.hpp is meant to work in older versions and master
<hkaiser> well, fair point
<ms[m]> I mean, hpx/runtime/get_os_thread_count.hpp will also work, but with the warning as you said
<ms[m]> there's a tradeoff between having a header for every tiny function and having stable headers (at the moment...)
<hkaiser> yes, I understand - this #include was added 3 years ago...
<ms[m]> you can keep using that one, I just wanted to remind you (myself, really) that we should have a hpx/runtime.hpp header
<ms[m]> as a "public header"
<hkaiser> nod
<hkaiser> another one: async_mpi/src/force_linking.cpp refers to hpx/mpi/force_linking.hpp now, which doesn't exist anymore
<ms[m]> good catch, will change that as well
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 252 seconds]
<hkaiser> ms[m]: this is not a real problem, looks like a stale file on Kevins mashine
<hkaiser> machine, even
<ms[m]> hkaiser: hmm, looks real to me
<ms[m]> at least I just changed it on a branch based off latest master :P
<ms[m]> it's not a generated file
kale[m] has joined #ste||ar
<hkaiser> it is for me
<hkaiser> ah darn, I was looking into mpi_base
<ms[m]> that should be it but I haven't yet managed a full build (just started one)
<ms[m]> although hpx itself should be fine...
<hkaiser> let me try it, I might be able to discard my #include changes now ;-)
<ms[m]> we could change more of these to non-compatibility headers if we want to have them public in 2.0.0 anyway (otherwise users will go back and forth between hpx/x.hpp and hpx/modules/x.hpp, and the latter should be for internal use only)
<ms[m]> e.g. hpx/format.hpp
<ms[m]> there might be others
<ms[m]> plus we also still have this before 1.5.0: https://github.com/STEllAR-GROUP/hpx/issues/4639
bita_ has joined #ste||ar
<ms[m]> so what we're forwarding headers to right now might not be the same in 1.5.0
<hkaiser> right
<ms[m]> gtg now, but I'll check back in later on the ci (fingers crossed I didn't miss any formatting...)
<hkaiser> I had to change a handful of headers, format, async, error, testing, plugin
<hkaiser> program_options.hpp
kale[m] has quit [Ping timeout: 260 seconds]
<hkaiser> datastructures
<ms[m]> damn, I missed the non-generated ones (like format)
kale[m] has joined #ste||ar
<ms[m]> if you have time feel free to push, otherwise I'll add it later
<ms[m]> and I just force pushed because I did forget to format everything...
<hkaiser> ok
<gonidelis[m]> Why do we take std::enable_if<>::type here and not std::enable_if<>::value ? https://github.com/STEllAR-GROUP/hpx/blob/master/libs/iterator_support/include/hpx/iterator_support/traits/is_range.hpp#L25-L26
<hkaiser> because this expects a type and not a value
<hkaiser> gonidelis[m]: if the enable_if fails (SFINAE's), then the default template is used, i.e. std::false
kale[m] has quit [Ping timeout: 240 seconds]
<hkaiser> if the enable_if succeeds the enable_if<>::type evaluates to void which takes precendence over the default template (it's more specialized) and the specialization is chosen, i.e. std::true_type
kale[m] has joined #ste||ar
<gonidelis[m]> So the `enable_if` succeeds only when `T` can act either as an iterator or as a sentinel?
<hkaiser> the enable_if currently succeeds only if both the iterator derived from T and the sentinel derived from T are the same type
kale[m] has quit [Ping timeout: 246 seconds]
kale[m] has joined #ste||ar
<K-ballo> gonidelis[m]: enable_if<> has no ::value
<hkaiser> IOW, if decltype(begin(declval<T>())) == decltype(end(declval<T>()))
<gonidelis[m]> So when we call `is_range<std::vector<int>>` is there a sentinel derived from the `vector<int>` ?
<K-ballo> yes, end()
<hkaiser> our is_range is too restrictive, it should be changed to use your new is_sentinel_for, I think
<K-ballo> what would it use for is_sentinel_for ?
<K-ballo> it's always end() for a range, it won't be less restrictive
<gonidelis[m]> I haven't completed `is_sentinel_for` yet :/ . I am trying to figure out what specialization to use in order to take advantage of the `equality_result` and `inequality_result` types
<gonidelis[m]> K-ballo: But end() is not a sentinel... is it?
<K-ballo> yes, all end iterators are sentinels
<K-ballo> that was part of the design goal for sentinel, remember?
<gonidelis[m]> Ahhh ok... got it
kale[m] has quit [Read error: Connection reset by peer]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
rtohid has left #ste||ar [#ste||ar]
<hkaiser> K-ballo: currently it compares for equality of begin() and end(), not whether they are equality comparable
<hkaiser> bita_: could we meet 5-10 minutes later, please?
<K-ballo> that sounds more restrictive, not less..?
<hkaiser> currently it requires that the types returned from begin and end are the same
<hkaiser> that's more restrictive than allowing those types to differ as long as they are equality comparable
<bita_> hkaiser, of course
rtohid has joined #ste||ar
<K-ballo> ah, yes, got it now.. indeed
<bita_> I get 2 linking errors on Windows that don't exist when I run the program on Linux
<bita_> 3>dist_argmax.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl phylanx::ir::node_data<unsigned __int64>::node_data<unsigned __int64>(void)" (__imp_??0?$node_data@_K@ir@phylanx@@QEAA@XZ) referenced in function "private: struct phylanx::execution_tree::primitive_argument_type __cdecl phylanx::dist_matrixops::primitives::dist_argminmax<struct phylanx::common::argmax_op,class phylanx::dist_matrixops
<bita_> ::primitives::dist_argmax>::argminmax2d(class std::vector<struct phylanx::execution_tree::primitive_argument_type,class std::allocator<struct phylanx::execution_tree::primitive_argument_type> > &&)const " (?argminmax2d@?$dist_argminmax@Uargmax_op@common@phylanx@@Vdist_argmax@primitives@dist_matrixops@3@@primitives@dist_matrixops@phylanx@@AEBA?AUprimitive_argument_type@execution_tree@4@$$QEAV?$vector@Uprimitive_argument_type@executio
<bita_> n_tree@phylanx@@V?$allocator@Uprimitive_argument_type@execution_tree@phylanx@@@std@@@std@@@Z)
<bita_> 3>dist_argmin.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl phylanx::ir::node_data<unsigned __int64>::node_data<unsigned __int64>(void)" (__imp_??0?$node_data@_K@ir@phylanx@@QEAA@XZ)
<bita_> 3>dist_argmax.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl phylanx::ir::node_data<unsigned __int64>::~node_data<unsigned __int64>(void)" (__imp_??1?$node_data@_K@ir@phylanx@@QEAA@XZ) referenced in function "private: struct phylanx::execution_tree::primitive_argument_type __cdecl phylanx::dist_matrixops::primitives::dist_argminmax<struct phylanx::common::argmax_op,class phylanx::dist_matrixop
<bita_> s::primitives::dist_argmax>::argminmax2d(class std::vector<struct phylanx::execution_tree::primitive_argument_type,class std::allocator<struct phylanx::execution_tree::primitive_argument_type> > &&)const " (?argminmax2d@?$dist_argminmax@Uargmax_op@common@phylanx@@Vdist_argmax@primitives@dist_matrixops@3@@primitives@dist_matrixops@phylanx@@AEBA?AUprimitive_argument_type@execution_tree@4@$$QEAV?$vector@Uprimitive_argument_type@executi
<bita_> on_tree@phylanx@@V?$allocator@Uprimitive_argument_type@execution_tree@phylanx@@@std@@@std@@@Z)
<bita_> 3>dist_argmin.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl phylanx::ir::node_data<unsigned __int64>::~node_data<unsigned __int64>(void)" (__imp_??1?$node_data@_K@ir@phylanx@@QEAA@XZ)
<bita_> sorry for copying these long errors. Did I forget any HPX_REGISTER?
<bita_> hkaiser, would you please take a look at this^?
<hkaiser> bita_: could you paste that on gist or similar, please?
<bita_> of course
<bita_> I commented out everything and I think this line is responsible: ir::node_data<std::size_t> indices;
<bita_> Is there any problem having ir::node_data<std::size_t>?
<hkaiser> bita_: you're trying to construct a node_data<> from an unsigned 64 bit it (size_t)?
<bita_> yes
<hkaiser> we support node_data<> for double, int64_t and uint8_t only
<bita_> Okay, now I think I have had this problem before
<bita_> sorry to disturb you
<hkaiser> no worries
<bita_> eventually, I ran out of linking errors, thanks for your kindness
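For context, a hedged sketch of the likely mechanism behind the LNK2019/LNK2001 errors above (the instantiation list is an assumption based on hkaiser's statement about supported types, not taken from the phylanx sources):

```cpp
// If node_data<T>'s members are defined in a .cpp file and explicitly
// instantiated only for the supported element types, roughly:
//
//     // node_data.cpp (library side, hypothetical)
//     template class node_data<double>;
//     template class node_data<std::int64_t>;
//     template class node_data<std::uint8_t>;
//
// then any other instantiation, e.g.
//
//     ir::node_data<std::size_t> indices;   // std::size_t is unsigned __int64 on win64
//
// compiles fine (the class declaration is visible) but its constructor
// and destructor symbols are never emitted by the library, which MSVC
// reports as unresolved externals (LNK2019/LNK2001) at link time.
```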
hkaiser has quit [Quit: bye]
zao has quit [Ping timeout: 272 seconds]
wash[m] has quit [Ping timeout: 272 seconds]
zao has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
wash[m] has joined #ste||ar