hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<Yorlik> Which of the boost libraries does HPX need?
<K-ballo> from the ones that need building/linking?
<Yorlik> Yes
<K-ballo> those are listed somewhere in the prerequisites, the header-only ones are not
<hkaiser> Yorlik: linking: system, regex, filesystem, program_options
<Yorlik> Thanks !
<K-ballo> regex should only be needed for inspect
<hkaiser> building: almost all of them as spirit depends on half of boost
<Yorlik> I just had issues building serialization, that's why I asked
<hkaiser> K-ballo: right
<hkaiser> Yorlik: HPX has its own serialization library
<hkaiser> no boost dep there
<Yorlik> :)
* Yorlik is scared of any serialization
<K-ballo> funnily enough we do have the means to serialize `any`
<K-ballo> ...it is somewhat scary
<Yorlik> memcpy anything as base64?
* Yorlik hides
<Yorlik> Using email for messages would surely be ~special ;)
<hkaiser> K-ballo: async() uses a promise to set the value, while dataflow is the 'promise' itself
<hkaiser> I'll have a look at how it's done in promise, that should resolve the issue
<hkaiser> also, other async providers might have the same issue (when_all and friends)
<K-ballo> sounds like something that should be handled by the shared state instead, and definitely not for remote only
<hkaiser> right
<hkaiser> the remote stuff doesn't belong in the shared state, though
<hkaiser> it belongs in the remote promise/lco
<K-ballo> indeed
jaafar has quit [Ping timeout: 250 seconds]
<Yorlik> It seems the links on this page are outdated: http://stellar.cct.lsu.edu/docs/
<hkaiser> Yorlik: right, thanks
hkaiser has quit [Quit: bye]
nikunj has quit [Ping timeout: 268 seconds]
nikunj has joined #ste||ar
jaafar has joined #ste||ar
mbremer has quit [Quit: Leaving.]
Yorlik has quit [Ping timeout: 250 seconds]
jaafar has quit [Quit: Konversation terminated!]
Yorlik has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
<Yorlik> Seems treating warnings as errors is not really a good idea when building boost ...
mdiers_ has joined #ste||ar
<simbergm> Yorlik: thanks for noticing, will update
<zao> Yorlik: Heh, definitely not :)
<zao> Spoke to the Spack guy last night, "yeah, some recipes are not quite-as-maintained" :)
<Yorlik> I have to overcome a ton of newbie hurdles at the moment.
<Yorlik> I've been doing solely Lua scripting the last 4 years
<zao> I wonder if I should contribute this back upstream... do we have any release yet where we've fixed that exception snafu, or should one clamp at 1.68 onward?
<Yorlik> And a wee bit of C#
<Yorlik> The first big wall for me at the moment is learning modern CMake and adopting a superbuild pattern
david_pfander has joined #ste||ar
david_pfander1 has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
david_pfander1 has quit [Ping timeout: 240 seconds]
mdiers_ has joined #ste||ar
<Yorlik> Does HPX expect boost to be built as shared libs?
<Yorlik> Because my libs are called libboost_xyz and HPX is looking for boost_xyz
<zao> Guessing you're on Windows?
<zao> I forget whether we disable autolinking or not.
<Yorlik> Yes
<Yorlik> I'm using a CMake project in VS
<Yorlik> Trying to integrate boost libs and hpx into my project superbuild
<Yorlik> Since our client will be on Windows and we will have shared code I can't really get away from that cross-platform thing
<zao> Docs suggests --build-type=complete on Windows, which builds all variations.
<Yorlik> All right - I have minimal
<Yorlik> Should've read more / better ;)
<zao> In general, you kind of want a shared Boost/CRT if you've got multiple DLL modules, which HPX kind of has if you go for components.
<Yorlik> The good thing is, since it's all in the superbuild I just have to change 2 lines and just restart :)
<Yorlik> Automation is fun :)
<Yorlik> zao: you're using tagged b2 layout?
<zao> Haven't built on Windows for a long while, but either tagged or versioned, IIRC.
<Yorlik> "system" doesn't work together with "complete" buildtype
nikunj has quit [Remote host closed the connection]
jan_ has joined #ste||ar
<Yorlik> I shouldn't compile boost with -j8 - now my hpc tutorial videos freeze and I can't watch them anymore .... :o
jan_ has quit [Quit: Leaving]
heller_ has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller_ has joined #ste||ar
<zao> Fair warning - HPX is a bit memory-hungry when compiling/linking some bits, so be careful :)
<zao> Mostly for the tests/examples, tho.
<Yorlik> My inexperience with CMake is the worst - I'm going through some harsh lessons.
<zao> Bonus points for using VS's built-in interpretation of CMake :)
<Yorlik> Example: Creating a CMake variable with SET(MYVAR SomeValue PARENT_SCOPE) does not also add the variable to your local scope
<Yorlik> Actually VS is not the issue
<zao> Heh.
<Yorlik> It's CMake itself - VS is actually an improvement, since the JSON config takes the burden of writing command lines off me.
<Yorlik> I just have two configuration files in my superbuild: One for VS and one for Sublime
<Yorlik> Sublime CMake Builder is really nice for the Linux side of things
<Yorlik> And adding stuff from my CMake build to intellisense is easy - so - I actually like it.
<Yorlik> However - cross compiling is a harsh mistress at times.
<zao> I've long said that "CMake is the least horrible build generator out there", still holds.
<Yorlik> Yup
<Yorlik> Unreal engine uses a custom C# written build system.
<Yorlik> I'll have to deal with that too once we integrate the client side of things
<Yorlik> I'm building boost with multithreading enabled and all libraries are named boost_xyz-mt.dll - I hope that won't be an issue ß
<Yorlik> Can't wait to have my superbuild stable - it still likes to explode every now and then.
<zao> ißue :)
<K-ballo> Unreal engine and HPX?
<Yorlik> The idea is to make a custom server with hpx which uses a client dll which communicates with Unreal Engine
<Yorlik> So Unreal is just a wrapping around the real client
<Yorlik> But the client dll and the server will have some shared code
<Yorlik> Since it affects the simulation there might even be hpx in it
<Yorlik> But we'll have an internal API between UE and the client
<Yorlik> Essentially the idea is to make the client an Unreal plugin
<Yorlik> Main application for hpx is the server
<Yorlik> And the task based Lua engine we want to create
<Yorlik> A Lua State will only have immutable global state while the game objects are just reacting on events with Lua tasks which run on the Lua states
<Yorlik> All dynamic game object data is meant to live in the game objects, probably in some sort of multithreaded buffered ECS
<Yorlik> At least that's the plan
<K-ballo> who is "we" above?
<Yorlik> A bunch of people - advanced hobbyists, some of them programmers
<Yorlik> No company.
<K-ballo> sounds pretty cool
<Yorlik> We've been working together for 4 years already until we ultimately decided to abandon the crappy platform we were on.
<Yorlik> So we had a serious setback, but gained a lot of freedom now
<K-ballo> ooh, which platform was that?
<Yorlik> It's a game created by a small indie studio in Virginia
<Yorlik> They make it moddable, but it's horrible.
<Yorlik> After 4 years we had enough
<Yorlik> Name is "Legends of Aria"
<Yorlik> I'm happy we quit it
<Yorlik> Now the plan is to make our own game fully from scratch
<Yorlik> It's scary, but we see it as a doable challenge.
<Yorlik> Main constraint is to keep it fun all the time, no matter how far we go. :)
<Yorlik> Does it matter for hpc if I build boost with or without multithreading support?
<Yorlik> hpx
* Yorlik messes it up all the time
<K-ballo> I thought boost had retired the single-threaded builds
<K-ballo> they no longer show up with --build-type=complete at least
<Yorlik> There still is an option for b2
<Yorlik> I had it on and my tagged builds now are all named boost_xyz-mt.dll
<Yorlik> complete buildtype
<Yorlik> b2 args = install --build-dir=E:/__A/Arc_Sb/_MSVC/BUILD/Windows-X64-Release/boost/boost-1.68.0 address-model=64 architecture=x86 toolset=msvc --prefix=E:/__A/Arc_Sb/INSTALL/Release/boost/boost-1.68.0 --exec-prefix=E:/__A/Arc_Sb/INSTALL/Release/boost/boost-1.68.0 --libdir=E:/__A/Arc_Sb/INSTALL/Release/boost/boost-1.68.0/lib --includedir=E:/__A/Arc_Sb/INSTALL/Release/boost/boost-1.68.0/include --build-type=complete
<Yorlik> --layout=tagged warnings=on warnings-as-errors=off -q debug-symbols=off variant=release link=shared runtime-link=shared threading=multi -j8
<Yorlik> Those are my b2 parameters
<Yorlik> Just swapped threading out for single this time
<K-ballo> aren't all those params conflicting? a complete build type together with all those build flags?
<K-ballo> my b2 command line on windows is just `b2 --build-type=complete address-model=32,64 -j8` because I still build 32bit for some reason (you'll need 64bit for hpx)
<Yorlik> I had problems when I used system layout - but I might be doing something wrong. I'm still very new to this
<K-ballo> building boost is painful... we are trying to phase that out, but it will take forever
<Yorlik> I saw a convo on git which showed that.
<Yorlik> After all, it's a big and complex system
<Yorlik> Phasing out stuff means you have to do it yourself or find a good replacement ...
<Yorlik> That's not trivial.
<zao> Blargh... just installed singularity-3.0.3 on my machine. Tarbombed my build tree and then installed into /usr/local contrary to all existing Go resources. \o/
<zao> Move the whole project to Golang, don't document shit, don't make security or feature fixes to 2.x.... *angry rant*
<zao> And oh, break the interface.
<Yorlik> Sounds very .... modern ;)
<zao> Straying into off-topicness, but we have to use it :(
<heller_> is singularity getting mature? now why would I want to use it instead of plain old docker?
<zao> heller_: They've recently shed their old C codebase for Go, currently at 3.0.3.
<zao> The use case is more HPC-friendly, tagline being "bring-your-own-environment".
<zao> Bring a container with the OS, deps and program you need to run, invoke `singularity exec container.simg /awesome-hpx-app` in your batch job.
<zao> No persistent daemon with weird rights, in many configurations not even needing suid binaries for Singularity.
<zao> We use it at HPC2N in production to host the horrible CentOS bullshit that ATLAS LHC jobs require.
<zao> They've got functionality to expose NV GPUs, some MPI impls, and other aspects of the surrounding world into the container.
<zao> Also sharing the networking or running with full network isolation, also things like binding paths into the container, with or without writable overlays.
<zao> Nice in theory, great when it works :)
<Yorlik> There is no scarcity of great concepts ...
hkaiser has joined #ste||ar
aserio has joined #ste||ar
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell pushed 1 new commit to stackptr_powerpc: https://github.com/STEllAR-GROUP/hpx/commit/cf4ba771b654b72a7b14ca06035984ab727971cc
<ste||ar-github> hpx/stackptr_powerpc cf4ba77 Thomas Heller: Merge branch 'master' into stackptr_powerpc
ste||ar-github has left #ste||ar [#ste||ar]
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
<simbergm> does anyone know how to get nvcc to give longer include traces on error messages?
<zao> Actual paths, or template instantiations?
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] K-ballo force-pushed put_parcel-light from 09bdc6a to b95e122: https://github.com/STEllAR-GROUP/hpx/commits/put_parcel-light
<ste||ar-github> hpx/put_parcel-light 159faae Agustin K-ballo Berge: Simplify parcel creation
<ste||ar-github> hpx/put_parcel-light b95e122 Agustin K-ballo Berge: De-bind put_parcel
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> zao: sounds nice ;)
<simbergm> either
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell closed pull request #3637: Merging the executor-enabled overloads of shared_future<>::then (master...fixing_3634) https://github.com/STEllAR-GROUP/hpx/pull/3637
ste||ar-github has left #ste||ar [#ste||ar]
<hkaiser> Yorlik: I'm using vcpkg on Windows, takes out most of the pain of building things
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell closed pull request #3543: added Ubuntu dependency list to readme (master...readme_add_ubuntu_dependenc_list) https://github.com/STEllAR-GROUP/hpx/pull/3543
ste||ar-github has left #ste||ar [#ste||ar]
<Yorlik> It's also a learning exercise for me.
<Yorlik> I'm so new to CMake I'm hammering it with weird jobs to learn it
<hkaiser> Yorlik: if you stick to V1.2, even that comes with vcpkg
<Yorlik> Like controlling boost from a CMake file using Git download and custom commands
<Yorlik> I might give it a try - but I really want to integrate every tool/library we use into our master CMake structure
<Yorlik> At least until I feel familiar enough with it
<zao> For templates they supposedly have --ftemplate-backtrace-limit=
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser created fixing_3639 (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/cc44301bc1b4
<ste||ar-github> hpx/fixing_3639 cc44301 Hartmut Kaiser: Fixing ticket 3639, dataflow now works with functions that return a reference.
ste||ar-github has left #ste||ar [#ste||ar]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser opened pull request #3640: Dataflow now works with functions that return a reference (master...fixing_3639) https://github.com/STEllAR-GROUP/hpx/pull/3640
ste||ar-github has left #ste||ar [#ste||ar]
daissgr has joined #ste||ar
<Yorlik> hkaiser: Just checked out vcpkg - really a nice tool - thanks for the tip. Still I have to figure out how to get my CMake thing right. Gotta get through my self-taught crash course in build engineering. ;)
daissgr has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 264 seconds]
jaafar has joined #ste||ar
diehlpk_work has joined #ste||ar
aserio has joined #ste||ar
hkaiser has joined #ste||ar
<K-ballo> can't build release anymore :| vs gave up
<K-ballo> 1>Error: The operation could not be completed. Unspecified error
parsa[[[w]]] has joined #ste||ar
<K-ballo> hkaiser: does it work for you? either release or relwithdebuginfo
aserio1 has joined #ste||ar
<K-ballo> wiping the .vs folder solves it, some sort of experimental caching/fast load gone wrong
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
hello has joined #ste||ar
<hello> hello all.
<hello> how to debug long time met bug???
<hello> how to debug a bug that occurs only after a long time?
<aserio> hello: It depends on the error. In general, you try to reduce your problem down to a minimal reproducible case
<hello> but it takes too long to occur again...
<aserio> What error are you getting
<hello> segmentation fault
<hello> and I got a core dump, but found nothing in it.
<hello> and valgrind also found nothing.
<aserio> can you send the error in a link
<hello> ./run.sh: line 11: 577 Segmentation fault (core dumped)
<aserio> What are you doing at line 11
<hello> run my c++ program
<hello> the program use 256G memory
<aserio> Well then you have a memory leak
akheir has quit [Remote host closed the connection]
<aserio> are you using pointers?
<hello> my machine is 378G
<hello> yes, but valgrind found nothing.
<aserio> Don't
<hello> My program run well in scale 2-29, but failed at 30.
<aserio> There is a 99% p
<hello> ???
<aserio> chance that you missed an index or something
<hello> yes
<aserio> I would advise rewriting your code to get rid of the raw pointers
<hello> no , I am using c
<hello> not using c++, I want to get speed.
<hello> and optimize
<aserio> lol, then you must suffer the consequences :p
akheir has joined #ste||ar
david_pfander has quit [Read error: Connection reset by peer]
david_pfander has joined #ste||ar
<hello> new progress
<hello> I found that the program segfaults at outEdgeArray[pos] = dst;
<hello> and outEdgeArray is 0x0
<hello> but I don't know who changed it.
<aserio> I assume you did because it is your code...
david_pfander has quit [Ping timeout: 250 seconds]
<hello> thank
<hkaiser> K-ballo: I fixed that future<R&> issue we talked about yesterday
<K-ballo> hkaiser: I saw it, looks ok
<K-ballo> shared state should not need to know about remote (in set_remote_state), but I can look into that in the future
<hkaiser> K-ballo: I know
<hkaiser> was the simplest solution...
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
aserio1 is now known as aserio
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/2d98f4ecee47f9fc1e1ea27a6d7ab2351eafbcc3
<ste||ar-github> hpx/gh-pages 2d98f4e StellarBot: Updating Sphinx docs
ste||ar-github has left #ste||ar [#ste||ar]
<K-ballo> how are partitioner tags (static, auto, default) used?
daissgr has joined #ste||ar
hello has quit [Quit: Going offline, see ya! (www.adiirc.com)]
aserio has quit [Ping timeout: 240 seconds]
daissgr has quit [Ping timeout: 250 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed various_optimizations from 5af6979 to 7c85c88: https://github.com/STEllAR-GROUP/hpx/commits/various_optimizations
<ste||ar-github> hpx/various_optimizations 702ffb6 Hartmut Kaiser: Applying various (mostly minor) optimizations...
<ste||ar-github> hpx/various_optimizations f459ae2 Hartmut Kaiser: Improving error message...
<ste||ar-github> hpx/various_optimizations e2000b5 Hartmut Kaiser: Partially revert previous changes, fix inspect errors...
ste||ar-github has left #ste||ar [#ste||ar]
<heller_> we really need to hunt those timeouts down ...
<hkaiser> indeed
<heller_> those are super annoying
<hkaiser> happens during startup
<heller_> ok, do you have more hints?
<hkaiser> heller_: simbergm has posted a log file recently
<heller_> hmmm
<heller_> hkaiser: are those timeouts also showing up in the phylanx unit tests?
<hkaiser> yes
<hkaiser> occasionally
<heller_> i see, livelock during load_components
<hkaiser> it's a deadlock, actually
<hkaiser> I think
<heller_> yeah, sure
aserio has joined #ste||ar
<heller_> livelock only because of our spinlock ...
<K-ballo> those partitioner tags seem to be future work
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell created fix_timeout (+1 new commit): https://github.com/STEllAR-GROUP/hpx/commit/58b38af69634
<ste||ar-github> hpx/fix_timeout 58b38af Thomas Heller: Unlocking locks before throwing exceptions
ste||ar-github has left #ste||ar [#ste||ar]
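The commit title above ("Unlocking locks before throwing exceptions") names a general pattern that can be sketched in plain C++. This is only an illustration under assumptions: `component_registry`, `add` and `lookup` are hypothetical names, not HPX types, and `std::mutex` stands in for whatever lock the real code uses. In standard C++ the `std::unique_lock` destructor would also release the mutex during unwinding; the explicit `unlock()` just guarantees the lock is no longer held while the exception object is built and propagated.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical registry (not an HPX type); it only illustrates the pattern.
class component_registry
{
public:
    void add(std::string name, int id)
    {
        std::lock_guard<std::mutex> l(mtx_);
        entries_[std::move(name)] = id;
    }

    int lookup(std::string const& name)
    {
        std::unique_lock<std::mutex> l(mtx_);
        auto it = entries_.find(name);
        if (it == entries_.end())
        {
            // Release the lock first, so nothing that runs while the
            // exception is constructed or handled can block on mtx_.
            l.unlock();
            throw std::runtime_error("unknown component: " + name);
        }
        return it->second;    // lock released by the destructor of l
    }

private:
    std::mutex mtx_;
    std::map<std::string, int> entries_;
};
```

A caller can catch the exception from `lookup` and immediately call `lookup` or `add` again, since the mutex is already free at the point of the throw.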
<heller_> K-ballo: for the resource partitioner?
<heller_> K-ballo: or for the parallel algorithms?
<K-ballo> for the parallel algorithms
<K-ballo> those in parallel/traits/extract_partitioner.hpp
<heller_> indeed
<heller_> you need an active project there though
<heller_> hkaiser: this might be something for phylanx...
<hkaiser> nod
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed fixing_3639 from cc44301 to cd4481c: https://github.com/STEllAR-GROUP/hpx/commits/fixing_3639
ste||ar-github has left #ste||ar [#ste||ar]
<ste||ar-github> hpx/fixing_3639 cd4481c Hartmut Kaiser: Fixing ticket 3639, dataflow now works with functions that return a reference.
<hkaiser> heller_, K-ballo: I have added another overload to future_data_result<Result&> and the compilation error goes away - however I'm not sure if this is a correct solution
<heller_> unit test it ;)
<hkaiser> heller_: smartass
<heller_> :P
<K-ballo> oh, I remember that trait... we didn't have a T& specialization before? odd
<hkaiser> we did
<hkaiser> I added the set(Result&&)
<K-ballo> mmmmh
<hkaiser> it feels wrong
<heller_> taking an address of a rvalue, hmmm
<hkaiser> the 'rvalue' is whatever is returned from a function Result& f()
<K-ballo> yeah, but that doesn't make sense
<K-ballo> are we moving from an lvalue?
<K-ballo> do we have a std::move that should be an std::forward?
<hkaiser> it is the (stable) reference that was returned from the function
<hkaiser> no idea why clang decided not to bind that to the set(Result&) overload
<heller_> why doesn't it match the overload on line 156?
<K-ballo> we are wrongly turning Result& into Result&& somehow
<K-ballo> or into Result even
<hkaiser> it's all forward<Result>'s
<heller_> is there a missing remove_reference somewhere which we need to pass to future_data_result?
<hkaiser> look at the error stack heller_ posted in the ticket
<hkaiser> no
<hkaiser> it has to be a reference
<heller_> hmmm
<hkaiser> essentially this code is invoked from future<T&> f = dataflow([]()->T&{...});
<K-ballo> I'll reproduce locally later when I have the chance
<K-ballo> ...something seems off...
<hkaiser> hold on, I think I see where it happens
<hkaiser> that should be a forward
<hkaiser> yep, that does the trick
<heller_> what do you forward?
<heller_> why do we have the res temporary again?
<hkaiser> forward<result_type>(res)
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed fixing_3639 from cd4481c to fe314b6: https://github.com/STEllAR-GROUP/hpx/commits/fixing_3639
<ste||ar-github> hpx/fixing_3639 fe314b6 Hartmut Kaiser: Fixing ticket 3639, dataflow now works with functions that return a reference.
ste||ar-github has left #ste||ar [#ste||ar]
<hkaiser> no reason, we could directly pass it on
<heller_> yeah, I'd be in favor of doing that instead
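The move-versus-forward point discussed above can be reproduced in a few lines of standard C++. This is a sketch under assumptions, not the HPX code: `ref_state`, `get_ref` and `run_and_set` are hypothetical names, and `ref_state::set(int&)` merely mirrors a shared state that only offers a `set(Result&)` overload for reference results. With `Result = int&`, `std::move(res)` produces `int&&`, which cannot bind to `set(int&)`, while `std::forward<Result>(res)` collapses back to `int&`.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a shared state specialised for references:
// it only provides an lvalue overload, mirroring set(Result&).
struct ref_state
{
    int* stored = nullptr;
    void set(int& r) { stored = &r; }
};

// std::move always casts to an rvalue reference ...
static_assert(
    std::is_same<decltype(std::move(std::declval<int&>())), int&&>::value,
    "std::move turns int& into int&&");
// ... while std::forward<int&> keeps the lvalue reference.
static_assert(
    std::is_same<decltype(std::forward<int&>(std::declval<int&>())), int&>::value,
    "std::forward<int&> yields int&");

int global_value = 42;
int& get_ref() { return global_value; }

template <typename Result, typename F>
void run_and_set(ref_state& s, F&& f)
{
    Result res = f();                    // Result is int& in this sketch
    s.set(std::forward<Result>(res));    // still an lvalue reference
    // s.set(std::move(res));            // would not compile: int&& -> int&
}
```

Calling `run_and_set<int&>(state, get_ref)` stores the address of `global_value`. Passing `f()` straight to `set` without the `res` temporary, as suggested above, would avoid the cast altogether.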
aserio has quit [Ping timeout: 250 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed fixing_3639 from fe314b6 to 0c26825: https://github.com/STEllAR-GROUP/hpx/commits/fixing_3639
<ste||ar-github> hpx/fixing_3639 0c26825 Hartmut Kaiser: Fixing ticket 3639, dataflow now works with functions that return a reference.
ste||ar-github has left #ste||ar [#ste||ar]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
<K-ballo> ah, yeah, that looks more like it
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] K-ballo created partitioner-cleanup (+3 new commits): https://github.com/STEllAR-GROUP/hpx/compare/63fb400e23e0^...a9fde136b72e
<ste||ar-github> hpx/partitioner-cleanup 63fb400 Agustin K-ballo Berge: Revert "Perfect forward for_loop argumewnts"...
<ste||ar-github> hpx/partitioner-cleanup 7b67745 Agustin K-ballo Berge: Cleanup parallel partitioners
<ste||ar-github> hpx/partitioner-cleanup a9fde13 Agustin K-ballo Berge: Drop unused partitioner tags
ste||ar-github has left #ste||ar [#ste||ar]
hkaiser has joined #ste||ar
<K-ballo> hkaiser: the move(get_remote_result_type(...)) looks wrong too, could it be a remnant of the pre-move days?
<hkaiser> K-ballo: yah, I was thinking about this very same thing an hour ago ;-)
akheir has quit [Remote host closed the connection]
akheir has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] sithhell pushed 1 new commit to fix_timeout: https://github.com/STEllAR-GROUP/hpx/commit/a2258ddc3ac9b05d475a228ab54fa65c9725b495
<ste||ar-github> hpx/fix_timeout a2258dd Thomas Heller: Enable deadlock detection for the CI runs
ste||ar-github has left #ste||ar [#ste||ar]
aserio has joined #ste||ar
<hkaiser> aserio: anything important during the meeting?
<aserio> hkaiser: We discussed Seg. Faults during shut down on 64 cores
<hkaiser> uggh
<hkaiser> Steve?
<aserio> heller_ created/mentioned this issue
<aserio> Alex actually
<hkaiser> is there a ticket?
<aserio> But Steve sees it with the sanitizer
<heller_> I think those are distinct
<heller_> I answered the sanitizer ticket
khuck has joined #ste||ar
<hkaiser> heller_: so, do we have a ticket?
<heller_> yes and no
<heller_> yes for the sanitizer, no for alex' problem
<hkaiser> is it Phylanx specific?
<khuck> can someone remind me what this error means: terminating with uncaught exception of type hpx::detail::exception_with_info<hpx::exception>: <unknown> (while trying to resolve: =3:7910): HPX(network_error)
<heller_> yes and no ;)
<hkaiser> khuck: could be zombies lurking
<heller_> K-ballo: another HPX process is running on the same machine
<khuck> impossible
<K-ballo> ?
<khuck> K-ballo: mistaken identity - he meant me
<hkaiser> wasn't me this time ;-)
<khuck> yeah, usually hartmut does it
<khuck> heller_: I am the only user on the machine, it's only been up 2 days, and nothing else is running
<hkaiser> khuck: =3:7910 is a strange IP address
<khuck> hkaiser: agreed
<heller_> khuck: ss might help
<khuck> heller_: ss?
<hkaiser> does it use ipv4 or ipv6?
<K-ballo> ...
<heller_> khuck: the new netstat
<heller_> anyway, looks like it can't resolve =3 to an IP
<heller_> wherever that comes from
<khuck> I am on our power8 node
<khuck> I was getting that error on another node, and I thought it was because Monil was also running.
<heller_> khuck: what do 'hostname' and 'getent hostname' return?
<khuck> heller_: hostname -> centaur
<khuck> getent doesn't work
<heller_> can you show me the commandline?
<heller_> yeah, figured
<khuck> /home/users/khuck/buildbot/slaves/phylanx/ppc64le-clang5-release/build/tools/buildbot/build-centaur-ppc64le-Linux-clang/phylanx-Release/bin/als_csv_instrumented -t36 --data_csv=/home/users/khuck/buildbot/data/MovieLens_20m.csv --hpx:bind=balanced --hpx:numa-sensitive --iterations=1 -r=0.25 -a=3 --f=3 --row_stop=700 --col_stop=20000
<heller_> aha
<heller_> -a=3
<khuck> folks on the phylanx call told me that was "alpha"...
<heller_> i think -a is, or rather used to be a shorthand for --hpx:agas
<khuck> so did the help message from the application :/
<khuck> yup, works now
<hkaiser> uhh, I thought we disabled the shortcuts in Phylanx :/
<heller_> so you are setting the agas hostname to =3
<khuck> groovy, thanks.
<hkaiser> heller_: so again, what has Alex reported?
<hkaiser> segfault when running with 64 cores?
<heller_> hkaiser: something going wrong at shutdown. I had audio problems so only heard the end of the conversation. I told him to open a ticket
<hkaiser> k, thanks
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] hkaiser force-pushed fixing_3639 from 0c26825 to a3aec35: https://github.com/STEllAR-GROUP/hpx/commits/fixing_3639
<ste||ar-github> hpx/fixing_3639 a3aec35 Hartmut Kaiser: Fixing ticket 3639, dataflow now works with functions that return a reference.
ste||ar-github has left #ste||ar [#ste||ar]
hkaiser has quit [Read error: Connection reset by peer]
<khuck> heller_: how should I interpret an average idle rate of 42%? not good, I assume
aserio has quit [Quit: aserio]
<heller_> khuck: yeah, not optimal
<khuck> heller_: I think I might be on the trail of the poor scaling in ALS.
<khuck> the number of HPX tasks increases with the number of threads
<khuck> but the problem is constant?
<heller_> well, it has to increase, otherwise, you might not get enough parallelism
<heller_> it might increase too much though
<khuck> going from 8 to 16 threads, the idle rate goes from 42 to 55
<khuck> then again, this is the power system, and we disabled direct actions (stack overflow)
<heller_> ok, that explains it
<heller_> are you sure it's a stack overflow though?
bibek has quit [Quit: Konversation terminated!]
<heller_> the direct action problem, that is
<khuck> two of the add primitives and two of the transpose primitives double in count.
<khuck> heller_: well, hartmut suggested that change. it was a while ago.
<heller_> yeah, I remember that discussion
<heller_> this was a possible move to fix it
<heller_> do you have clang available there?
<heller_> and can you point me to the code which would reenable the direct actions?
<heller_> I can directly test it on centaur
<khuck> I am testing it there, I am looking at the code
<khuck> src/execution_tree/primitives/primitive_component_base.cpp - line 290
<khuck> I am using a spack install of clang 7, so it might be worth re-enabling it
bibek has joined #ste||ar
<heller_> i'll run it with asan
<khuck> hmmm
<khuck> re-enabling direct actions helped, but not much.
<khuck> heller_: how do I change that value on the command line? the code suggests I can do that.
<heller_> which value?
<khuck> hpx::get_config_entry("phylanx.exec_time_upper_threshold")
<heller_> ahh..
<heller_> --hpx:ini="phylanx.exec_time_upper_threshold!=..."
<khuck> with the ! ?
<heller_> yes
<khuck> huh.
<heller_> the ! is for forcing the value into existence during command line parsing if the entry doesn't exist
<khuck> as opposed to "not equal to" :)
<heller_> yeah ;)
<heller_> it's an ugly workaround to a strange detail in the runtime config parsing...
<khuck> heller_: so...I am running with 16 OS threads, but only seeing a load of about ~8.
<khuck> ok, it's up to 11
<khuck> 12
<heller_> hmmm
<khuck> heller_: does high idle rate mean not enough concurrency?
<khuck> or not enough?
<heller_> too much IO of some sort?
<heller_> depends...
<khuck> there's no I/O
<khuck> other than reading in the file at the beginning
<heller_> if you say your CPU utilization is very low, I'd assume too little
<khuck> cores in htop look pegged
<heller_> khuck: how many HPX tasks are executed per second?
<khuck> ~30000
<heller_> that's roughly 1875 per thread
<heller_> that should be enough
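The per-thread figure above is simply 30000 tasks per second divided by 16 threads. The sketch below spells that out and adds one further back-of-the-envelope estimate. The function names are hypothetical, and the second function rests on an assumption not stated in the log: that the idle rate is the fraction of worker-thread time not spent running tasks, and that the 55% idle rate quoted earlier for 16 threads applies to the same run.

```cpp
#include <cassert>

// Tasks executed per second by each worker thread, assuming an even spread.
double tasks_per_thread(double tasks_per_second, unsigned threads)
{
    return tasks_per_second / threads;
}

// Rough average busy time per task in microseconds.
// Assumption: idle_rate (0..1) is the share of thread time not spent in tasks.
double busy_microseconds_per_task(
    double tasks_per_second, unsigned threads, double idle_rate)
{
    // Thread-seconds actually spent in tasks per wall-clock second.
    double busy_thread_seconds = (1.0 - idle_rate) * threads;
    return busy_thread_seconds / tasks_per_second * 1e6;
}
```

`tasks_per_thread(30000.0, 16)` gives the 1875 quoted above. Under the stated assumption, `busy_microseconds_per_task(30000.0, 16, 0.55)` comes out at roughly 240 microseconds per task, which is only an order-of-magnitude estimate of task granularity.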