aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
bikineev has quit [Remote host closed the connection]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
mcopik has quit [Ping timeout: 246 seconds]
EverYoung has quit [Ping timeout: 255 seconds]
akheir has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser_ has quit [Quit: bye]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
vamatya has joined #ste||ar
vamatya has quit [Ping timeout: 248 seconds]
vamatya has joined #ste||ar
Matombo has joined #ste||ar
akheir has quit [Remote host closed the connection]
jaafar has quit [Ping timeout: 246 seconds]
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v54bv
<github> hpx/gh-pages 797af29 StellarBot: Updating docs
bikineev has joined #ste||ar
bikineev_ has joined #ste||ar
bikineev has quit [Ping timeout: 255 seconds]
Matombo has joined #ste||ar
mcopik has joined #ste||ar
bikineev_ has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 252 seconds]
vamatya has quit [Ping timeout: 248 seconds]
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
zbyerly_ has joined #ste||ar
<heller> jbjnr: grrr. can't get clang to build on daint :/
* heller is a massive failure
<jbjnr> oh dear - what's the problem
<heller> doesn't seem to link correctly
<jbjnr> want me to try?
<jbjnr> (he says, knowing the answer)
<heller> sure, if you like
<heller> in the meantime, I try to get nvcc running
<jbjnr> wrong answer - you were supposed to say no. There is nothing I can do/try that you cannot
<github> [hpx] hkaiser pushed 1 new commit to inspect_assert: https://git.io/v5B3X
<github> hpx/inspect_assert d3f4c98 Hartmut Kaiser: Fixing more inspect problems
<jbjnr> heller: I'd forgotten how much baggage clang comes with - did you scriptify your download and build so that I can use it too :)
<github> [hpx] hkaiser force-pushed serialize_boost_variant from 51f1be9 to 5be49af: https://git.io/v5W7o
<github> hpx/serialize_boost_variant 5be49af Hartmut Kaiser: Changed serialization of boost.variant to use variadic templates
<hkaiser> jbjnr: how can I test whether your RP fix really solved the problem?
<jbjnr> <trust me>
<heller> jbjnr: no, did everything by hand
<jbjnr> it only fixed the mask setting, but to test it would depend somewhat on the user's binding options
<heller> jbjnr: I am on to something with nvcc though
<jbjnr> one easy test (that I will add) is to simply check that each pool has a mask that is non-zero, but that doesn't check the correctness of the mask
<hkaiser> jbjnr: well, I remember in your initial problem report the mask was supposed to be the or'ed result of other masks
<jbjnr> ok, true, but to verify that, we'd have to do the same thing that the mask setting does
<hkaiser> we need to somehow get a handle on testing the RP - we'll never see the end of problems otherwise
<jbjnr> iterate over each pu in the pool and set the flag in the mask - but since that's what the code does anyway - testing it would just be doing the same thing again.
<hkaiser> jbjnr: at least it would verify that we don't break anything while changing stuff later on
<jbjnr> ok, easy test: just create one pool and check that the pu mask has all pus in it - same as the mask that is created for the node
<jbjnr> that'll catch most problems if anything breaks
<jbjnr> no real need to iterate over every pool. if one breaks, the others will too (we suppose)
<hkaiser> ok
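A minimal sketch of the single-pool check jbjnr describes: one pool must own every PU, so its mask should equal the machine mask. The mask type and the two accessors below are hypothetical stand-ins for whatever the resource partitioner actually exposes, stubbed with a 4-PU machine so the sketch runs standalone:

    #include <bitset>
    #include <cassert>

    using cpu_mask = std::bitset<64>;   // stand-in for the real mask type

    // hypothetical accessors, stubbed for illustration only
    cpu_mask get_machine_mask()                      { return cpu_mask{0b1111}; }
    cpu_mask get_pool_mask(char const* /*pool_name*/) { return cpu_mask{0b1111}; }

    int main()
    {
        cpu_mask pool = get_pool_mask("default");
        // cheap check: a pool mask must never be all-zero
        assert(pool.any());
        // stronger check: with a single pool, it must cover the whole node
        assert(pool == get_machine_mask());
        return 0;
    }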
<jbjnr> heller: did you keep a log of all your by-hand checkouts and cmake invocations? I've got an error on the first llvm cmake run about python not being able to find a directory ....
<heller> oh
<heller> i didn't encounter this
<heller> I am using clang 4.0.1
<jbjnr> oh. I see. you have to build in a source-tree subdir.
<jbjnr> blimey
<heller> hmm, I don't do that
hkaiser has quit [Quit: bye]
<jbjnr> gosh I cannot find any build instructions for llvm itself
aserio has joined #ste||ar
<jbjnr> zao: thanks.
<zao> Also mentions the dependencies you may need.
Matombo has quit [Ping timeout: 264 seconds]
zbyerly_ has quit [Ping timeout: 246 seconds]
akheir has joined #ste||ar
hkaiser has joined #ste||ar
Matombo has joined #ste||ar
<heller> hkaiser: I am a bit stuck on fixing the new unwrap code
<hkaiser> aserio: could you give me the link for the storm meeting now, pls
<hkaiser> heller: what's up?
<heller> it looks like edg is stumbling across lvalue vs. rvalue vs. forwarding references etc.
<hkaiser> aserio: nvm, found it
<heller> leading to SFINAE failures which are really hard to track down
<hkaiser> heller: I previously suggested to use the old code for cuda
<hkaiser> at least for now
<heller> hmm
<heller> next on my list is trying cuda9
<heller> but that doesn't help for next week :/
<hkaiser> you need to patch boost for that
<heller> or use trunk, yeah
<heller> I'd really like to know what's going on there
jaafar has joined #ste||ar
<jbjnr> heller: Disk quota exceeded - my daint usage is doomed
<hkaiser> heller: brain-dead compiler - that's what's going on
<heller> jbjnr: $SCRATCH
<heller> hkaiser: it's the EDG frontend they use
<hkaiser> for next week, simply use the old code
<heller> major endeavour to rebase the old code onto current master :/
<hkaiser> nah, it's just one file
<heller> and all usage
<hkaiser> and add dummy unwrapping and unwrap functions which forward to the old unwrapped()
<heller> that's not a solution we want to merge to master though :/
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
<heller> one culprit is coming from the binpacking distribution policy
<hkaiser> remove it
mcopik has quit [Ping timeout: 248 seconds]
bikineev has quit [Ping timeout: 240 seconds]
bikineev has joined #ste||ar
<heller> hkaiser: MSVC doesn't have any problems with that new code, right?
<heller> my thinking is that nvcc is just too eager to instantiate templates...
<heller> getting there now ... disabling code when you are in device compilation mode really helps a lot
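Roughly the trick heller is alluding to: use the __CUDA_ARCH__ macro to keep host-only template machinery out of the device compilation pass, so nvcc never tries to instantiate it there. A minimal sketch (it also compiles as plain C++, where the macro is simply undefined):

    #include <cstdio>

    template <typename F>
    void run_on_host(F&& f)
    {
    #if !defined(__CUDA_ARCH__)
        // host pass only: heavy SFINAE/unwrap machinery would live here,
        // invisible to the device pass that chokes on it
        f();
    #endif
    }

    int main()
    {
        run_on_host([] { std::puts("host-only code path"); });
        return 0;
    }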
akheir has quit [Remote host closed the connection]
<hkaiser> no problems there
<jbjnr> heller: there is an easybuild config for clang 3.7.1 on our system - should I try it, or would it be too old for us?
<heller> jbjnr: TBH, I have no idea
<heller> but sure, try it!
<heller> jbjnr: there is also a llvm PrgEnv on newer crays
<heller> that one here
<jbjnr> no llvm on our machine
<heller> yeah
<heller> what are the odds it will get installed by monday?
<jbjnr> about the same as my PhD being accepted for examination. = 0. it's being printed as we speak
<heller> yay!
<heller> congrats!
<jbjnr> not yet. if they accept it, then I'll be happy.
<heller> ;)
<heller> yes yes!
akheir has joined #ste||ar
<diehlpk_work> heller, hkaiser Cannot do a skype this Monday because it's Labor Day here
bikineev has quit [Ping timeout: 252 seconds]
<jbjnr> heller: am I missing any components we might need here
<jbjnr> cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/apps/daint/llvm -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi" -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" $TOP_LEVEL_DIR/llvm-project/llvm
<heller> jbjnr: looks good
<jbjnr> that's the superbuild setup thingy for clang that I hope will build what we want in one go
<jbjnr> mostly I wonder - do we need any more from this list: AArch64, AMDGPU, ARM, BPF, Hexagon, Mips, MSP430, NVPTX, PowerPC, Sparc, SystemZ, X86, XCore.
<jbjnr> the amdgpu is interesting. didn't know they had that
aserio has quit [Ping timeout: 246 seconds]
<heller> don't think we need any other
<jbjnr> good. I don't know what several of them are ...
<heller> great. the stream benchmark builds now, but fails at runtime
<heller> hooray for the resource partitioner :P
<zao> I got started on setting up buildbot for my own projects the other day. How 0.8-bound is the stellar buildbot repo?
<zao> I see there's a lot of magic to generate the workers :)
<heller> yes
<jbjnr> heller: why hooray for RP?
<heller> zao: one of the biggest limitations of buildbot, IMHO, is the connection to the slaves
<heller> over an insecure socket
david_pfander has quit [Remote host closed the connection]
<heller> jbjnr: because it's the reason why it fails at startup
<heller> give me a second
<jbjnr> oh dear
<diehlpk_work> heller, jbjnr, hkaiser, zbyerly Any changes to the paper you'd like to commit before I start working on it today?
aserio has joined #ste||ar
<zao> Do they expect you to set up your own tunnels somehow?
<zao> Or just don't care?
<heller> diehlpk_work: no, don't have one
<heller> zao: I think they don't care and assume everything is on a secure private network
<heller> jbjnr: what(): hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<zao> What's the worst that could happen? Fake slave submits bogus results with a stolen password I guess.
<zao> Anyway, looks like some quality python wrangling to get a matrix up.
<zao> 0.9's web UI looks "nifty".
<heller> it does
<jbjnr> heller: look at examples/resource_partitioner/ for correct use
<heller> zao: the worst that could happen is that you find an exploit in buildbot ;)
<jbjnr> first call to rp, pass in argc argv etc
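The ordering the example enforces, sketched from memory of the then-current API (the header path and constructor signature are assumptions, not verified against master):

    #include <hpx/hpx_init.hpp>
    // header path is an assumption:
    #include <hpx/runtime/resource_partitioner.hpp>

    int hpx_main(int argc, char** argv)
    {
        return hpx::finalize();
    }

    int main(int argc, char** argv)
    {
        // 1. construct the partitioner first so it can parse the command
        //    line; calling hpx::resource::get_partitioner() before this
        //    throws the error quoted above
        hpx::resource::partitioner rp(argc, argv);

        // 2. carve out pools/resources here, before the runtime starts
        //    (rp.create_thread_pool(...); rp.add_resource(...); ...)

        // 3. only now hand control to the runtime
        return hpx::init(argc, argv);
    }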
<heller> jbjnr: TBH, it just sucks that previously working code is now properly broken
<jbjnr> sorry
<heller> without a notice, even ;)
<jbjnr> I think to be fair, the code was merged a little too soon, but at least now, you're fixing things too. :) Silver lining!
<heller> jbjnr: and if I do it like the example does, I get a segfault
<jbjnr> if you have a test that fails, feel free to let me have a go.
<jbjnr> leaving office in a mo, but will try later and over the weekend
<jbjnr> clang 70%
<heller> jbjnr: sure, hartmut just says: "na, there is a reason why I left it broken. It needs to be ported!"
<jbjnr> looking good so far
<heller> signing off too
<heller> ttyl
<jbjnr> PS. I have no idea how to use cuda+clang, so I'll be asking about that later once my build completes (80%)
<heller> Great
<heller> As long as we have a working compiler...
bikineev has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
<heller> hmm, without nvcc it works
<heller> of course...
<diehlpk_work> Do we have a citation for the integration of xeon phi?
<jbjnr> heller daint104:/scratch/snx1600/biddisco/build/llvm-build$ /users/biddisco/apps/daint/llvm/bin/clang++ --version
<jbjnr> clang version 6.0.0
<jbjnr> Target: x86_64-unknown-linux-gnu
<jbjnr> Thread model: posix
<jbjnr> InstalledDir: /users/biddisco/apps/daint/llvm/bin
<jbjnr> \o/
<heller> jbjnr: woohooo!
<jbjnr> going home now. will test later
<heller> jbjnr: HPX_WITH_CUDA=On HPX_WITH_CUDA_CLANG=On
<K-ballo> jbjnr: built clang yourself? consider specifying build type release, otherwise it crawls
aserio has joined #ste||ar
<heller> jbjnr: /users/biddisco/apps/daint/llvm/bin/clang++ --cuda-path=$CUDATOOLKIT_HOME hello.cu -L/opt/nvidia/cudatoolkit8.0/8.0.54_2.2.8_ga620558-2.1/lib64 -L/opt/nvidia/cudatoolkit8.0/8.0.54_2.2.8_ga620558-2.1/extras/CUPTI/lib64 -Wl,--as-needed -Wl,-lcupti -Wl,-lcudart -Wl,--no-as-needed -L/opt/cray/nvidia/default/lib64 -L/opt/cray/nvidia/default/lib64 -lcuda
<heller> jbjnr: that's the command that works!
<heller> jbjnr: those flags are from "module show cudatoolkit"
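For reference, a minimal hello.cu of the sort that command compiles; the kernel body is illustrative, anything that touches the device will do:

    #include <cstdio>

    __global__ void hello()
    {
        printf("hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
    }

    int main()
    {
        hello<<<2, 4>>>();
        cudaDeviceSynchronize();   // wait for the kernel, flush device printf
        return 0;
    }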
Matombo has quit [Remote host closed the connection]
bibek_desktop has quit [Remote host closed the connection]
bibek_desktop has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
zbyerly_ has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
zbyerly_ has quit [Ping timeout: 240 seconds]
mcopik has joined #ste||ar
<diehlpk_work> hkaiser, mcopik Could you do the final evaluation before I leave for the long weekend? I will arrive home after the deadline
akheir has quit [Remote host closed the connection]
<hkaiser> diehlpk_work: will definitely do today
<diehlpk_work> Thanks, we now have two pages for the paper
<diehlpk_work> Will write more later
<heller> Got a seminar today, will get to the paper once cuda is operational again
<heller> hkaiser: so, nvcc did it again.
zbyerly_ has joined #ste||ar
<heller> Looks like it's close to unfixable. I'm hoping for clang now...
<heller> First test on daint succeeded...
<heller> The problem seems to be with the hpx_main function pointer passed into the RP. Some invalid memory accesses going on in and around argument parsing
<heller> The only thing that really changed here is that the init stuff was in libhpx_init.a and it's in libhpx.so now. Could that really lead to problems?
<heller> Or asking differently: what would break if I move that part of the RP into init.a?
<hkaiser> heller: try it
aserio has joined #ste||ar
<heller> hkaiser: I'll commit what I have so far in a few minutes. Would be interesting what the situation on Windows is
zbyerly_ has quit [Ping timeout: 260 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
bikineev has joined #ste||ar
<jbjnr> K-ballo: yes. I used CMAKE_BUILD_TYPE=Release when I compiled clang
<jbjnr> heller: awesome. it works
<jbjnr> (hello world at least)
<heller> jbjnr: yes. I am trying the hpx examples now
<heller> fingers pressed
<jbjnr> ^crossed
<heller> false friends
<jbjnr> ?
<heller> jbjnr: in english classes, we have a category named "false friends", which are words that sound the same as in german but translate totally differently
<heller> or phrases you translate literally which then don't make sense anymore
<jbjnr> I'm wiser now than I was before
<github> [hpx] sithhell created fix_cuda (+1 new commit): https://git.io/v5Rf6
<github> hpx/fix_cuda 3a72e69 Thomas Heller: Attempting to fix nvcc
<heller> hkaiser: ^^
<heller> jbjnr: do we have a jemalloc install somewhere?
<jbjnr> I have not recompiled stuff with clang yet, but I was planning on doing that. jemalloc, hwloc, apex, otf, etc., then hpx
<jbjnr> all my gcc libs are in /project/csviz/biddisco/apps/daint or a similar location as the clang install (same root, can't remember it off the top of my head)
<heller> c libraries should be fine
<jbjnr> JEMALLOC_INCLUDE_DIR:PATH=/users/biddisco/apps/daint/jemalloc/4.5.0/include
<jbjnr> JEMALLOC_LIBRARY:FILEPATH=/users/biddisco/apps/daint/jemalloc/4.5.0/lib/libjemalloc.a
<jbjnr> JEMALLOC_ROOT:PATH=/users/biddisco/apps/daint/jemalloc/4.5.0
aserio has quit [Ping timeout: 246 seconds]
<jbjnr> that's what I use for my current hpx build
aserio has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<heller> jbjnr: thanks
<jbjnr> # /users/biddisco/apps/daint contains my previous installs of tools, let me know if permissions are wrong for other users
<heller> no, they are fine
<heller> boost building
<jbjnr> lovely
<heller> forgot to set BOOST_ROOT
<heller> configuring was a success
<heller> awesome ;)
<heller> lots of warnings now ...
<heller> jbjnr: did you compile with libcxx?
<heller> thank god you did
<jbjnr> if you mean -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi" then yes
<jbjnr> if you mean something else, then I dunno
<heller> na
<heller> that's fine
<heller> wow. clang trunk removed std::auto_ptr
<heller> jbjnr: you need boost 1.65.0
<jbjnr> ok. I will upgrade when I build. still playing with excel and not working. sorry :)
<heller> jbjnr: ./b2 -j8 toolset=clang variant=release link=shared cxxflags="-std=c++17 -stdlib=libc++" linkflags="-std=c++17 -stdlib=libc++"
<heller> that's my b2 command
<heller> ;)
<heller> NP
<jbjnr> very good. we'll put all this into a gist/wiki and use it all next week if everything works
<heller> IF
<heller> I have one more idea regarding the other failure
<heller> made good progress already ... at least figured out where the ICE was coming from
<heller> libhpx.so built
david_pfander has joined #ste||ar
david_pfander has quit [Client Quit]
david_pfander has joined #ste||ar
hkaiser has quit [Quit: bye]
bikineev has joined #ste||ar
<mcopik> diehlpk_work: Patrick, my evaluation is done
<diehlpk_work> Thanks
patg has quit [Quit: See you later]
<heller> jbjnr: clang compiler errors. yay
<heller> but fixable this time
<heller> need a few more hours, but also a break. TTYl
<mbremer> Hi, I'm getting the following compiler error when trying to build hpx with_ittnotify=On
<mbremer> "/work/02578/bremer/stampede2/hpx/hpx/util/bind.hpp:345:31: error: cannot convert ‘hpx::util::itt::string_handle’ to const char*’ in return >::call(_f);"
<mbremer> Does anyone have any suggestions on what might be going wrong?
<K-ballo> mbremer: could you paste the full context on some pastesite and link it?
bibek_desktop has quit [Quit: Leaving]
<mbremer> K-ballo: sure thing, just a sec
<mbremer> @K-ballo: should I open an issue?
<K-ballo> mbremer: I think so, but I'm not 100% sure... you could wait a few minutes for hkaiser to confirm
<mbremer> Sure, I'm in no rush
<K-ballo> that commit introduces an `itt::string_handle` which `hpx::function` is not aware of
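The shape of the failure, sketched with a stand-in type (not the real hpx::util::itt::string_handle): a wrapper replaced a plain char const* annotation, and code that still returns it as char const* no longer compiles without an explicit accessor:

    #include <cstdio>

    struct string_handle           // stand-in, not the real ITT type
    {
        char const* str;
        char const* get() const { return str; }
    };

    char const* describe(string_handle const& sh)
    {
        // return sh;              // the error above: no conversion exists
        return sh.get();           // explicit accessor compiles
    }

    int main()
    {
        string_handle sh{"hpx::util::bind"};
        std::puts(describe(sh));
        return 0;
    }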
mcopik has quit [Ping timeout: 248 seconds]
<mbremer> @K-ballo: I'm going to head out. I'll check in later when hkaiser is here. Thanks for your help!
hkaiser has joined #ste||ar
<K-ballo> hkaiser: ^
eschnett has quit [Quit: eschnett]
<github> [hpx] K-ballo deleted std-atomic at d0613ef: https://git.io/v5RZn
diehlpk_work has quit [Quit: Leaving]
aserio has quit [Quit: aserio]
david_pfander has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
akheir has joined #ste||ar
mbremer has quit [Quit: Page closed]
david_pfander has quit [Ping timeout: 255 seconds]
eschnett has joined #ste||ar
bikineev has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 255 seconds]