aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
bikineev has quit [Remote host closed the connection]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
mcopik has quit [Ping timeout: 246 seconds]
EverYoung has quit [Ping timeout: 255 seconds]
akheir has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
parsa has quit [Quit: Zzzzzzzzzzzz]
hkaiser_ has quit [Quit: bye]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
vamatya has joined #ste||ar
vamatya has quit [Ping timeout: 248 seconds]
vamatya has joined #ste||ar
Matombo has joined #ste||ar
akheir has quit [Remote host closed the connection]
jaafar has quit [Ping timeout: 246 seconds]
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
<heller>
jbjnr: grrr. can't get clang to build on daint :/
* heller
is a massive failure
<jbjnr>
oh dear - what's the problem
<heller>
doesn't seem to link correctly
<jbjnr>
want me to try?
<jbjnr>
(he says knowing the answer)
<heller>
sure, if you like
<heller>
in the meantime, I try to get nvcc running
<jbjnr>
wrong answer - you were supposed to say no. There is nothing I can do/try that you cannot
<github>
[hpx] hkaiser pushed 1 new commit to inspect_assert: https://git.io/v5B3X
<github>
hpx/inspect_assert d3f4c98 Hartmut Kaiser: Fixing more inspect problems
<jbjnr>
heller: I'd forgotten how much baggae clang comes with - did you scriptify your download and build so that I can use it too :)
<jbjnr>
^baggage
<github>
[hpx] hkaiser force-pushed serialize_boost_variant from 51f1be9 to 5be49af: https://git.io/v5W7o
<github>
hpx/serialize_boost_variant 5be49af Hartmut Kaiser: Changed serialization of boost.variant to use variadic templates
<hkaiser>
jbjnr: how can I test whether your RP fix really solved the problem?
<jbjnr>
<trust me>
<heller>
jbjnr: no, did everything by hand
<jbjnr>
it only fixed the mask setting, but to test it would depend somewhat on the user's binding options
<heller>
jbjnr: I am on to something with nvcc though
<jbjnr>
one easy test (that I will add) is to simply check that each pool has a mask that is non zero, but that doesn't check the correctness of the mask
<hkaiser>
jbjnr: well, I remember in your initial problem report the mask was supposed to be the or'ed result of other masks
<jbjnr>
ok, true, but to verify that, we'd have to do the same thing that the mask setting does
<hkaiser>
we need to somehow get a handle on testing the RP - we'll never see the end of problems otherwise
<jbjnr>
iterate over each pu in the pool and set the flag in the mask - but since that's what the code does anyway - testing it would just be doing the same thing again.
<hkaiser>
jbjnr: at least it would verify that we don't break anything while changing stuff later on
<jbjnr>
ok, easy test: just create one pool and check that the pu mask has all pus in it - same as the mask that is created for the node
<jbjnr>
that'll catch most problems if anything breaks
<jbjnr>
no real need to iterate over every pool. if one breaks, the others will too (we suppose)
<hkaiser>
ok
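The "easy test" agreed on above could look roughly like the following. This is a minimal sketch under stated assumptions, not HPX code: `mask_type`, `pool_mask`, `node_mask` and `single_pool_covers_node` are hypothetical stand-ins, and a mask is assumed to fit in one 64-bit word with one bit per PU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, not an HPX type: one bit per processing unit (PU).
using mask_type = std::uint64_t;

// Build a pool mask as described in the chat: visit each PU in the pool
// and set its flag in the mask (an OR of the per-PU masks).
mask_type pool_mask(std::vector<unsigned> const& pu_indices)
{
    mask_type mask = 0;
    for (unsigned pu : pu_indices)
    {
        assert(pu < 64);    // this sketch only models up to 64 PUs
        mask |= mask_type(1) << pu;
    }
    return mask;
}

// Mask for a whole node with num_pus PUs: bits 0 .. num_pus-1 are set.
mask_type node_mask(unsigned num_pus)
{
    assert(num_pus >= 1 && num_pus <= 64);
    return num_pus == 64 ? ~mask_type(0) : (mask_type(1) << num_pus) - 1;
}

// The regression check: a single pool owning every PU must have a
// non-zero mask that is identical to the node mask.
bool single_pool_covers_node(unsigned num_pus)
{
    std::vector<unsigned> all_pus;
    for (unsigned pu = 0; pu != num_pus; ++pu)
        all_pus.push_back(pu);

    mask_type const m = pool_mask(all_pus);
    return m != 0 && m == node_mask(num_pus);
}
```

As noted in the discussion, a check like this largely repeats what the mask-setting code does, so it guards against later regressions rather than proving the mask is correct for a given binding.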
<jbjnr>
heller: did you keep a log of all your by-hand checkouts and cmake invocations? I've got an error on the first llvm cmake run about python can't find directory ....
<heller>
oh
<heller>
i didn't encounter this
<heller>
I am using clang 4.0.1
<jbjnr>
oh. I see. you have to build in source tree subdir.
<jbjnr>
that's the superbuild setup thingy for clang that I hope will build what we want in one go
<jbjnr>
mostly I wonder - do we need any more from this list: AArch64, AMDGPU, ARM, BPF, Hexagon, Mips, MSP430, NVPTX, PowerPC, Sparc, SystemZ, X86, XCore?
<jbjnr>
the amdgpu is interesting. didn't know they had that
aserio has quit [Ping timeout: 246 seconds]
<heller>
don't think we need any other
<jbjnr>
good. I don't know what several of them are ...
<heller>
great. the stream benchmark builds now, but fails at runtime
<heller>
hooray for the resource partitioner :P
<zao>
I got started on setting up buildbot for my own projects the other day. How 0.8-bound is the stellar buildbot repo?
<zao>
I see there's a lot of magic to generate the slaves :)
<zao>
*workers
<heller>
yes
<jbjnr>
heller: why hooray for RP?
<heller>
zao: one of the biggest limitations of buildbot, IMHO, is the connection to the slaves
<heller>
over an insecure socket
david_pfander has quit [Remote host closed the connection]
<heller>
jbjnr: because it's the reason why it fails at startup
<heller>
give me a second
<jbjnr>
oh dear
<diehlpk_work>
heller, jbjnr, hkaiser, zbyerly: Any changes to the paper you'd like to commit before I start working on it today?
aserio has joined #ste||ar
<zao>
Do they expect you to set up your own tunnels somehow?
<zao>
Or just don't care?
<heller>
diehlpk_work: no, don't have one
<heller>
zao: I think they don't care and assume everything is on a secure private network
<heller>
jbjnr: what(): hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<zao>
What's the worst that could happen? Fake slave submits bogus results with a stolen password I guess.
<zao>
Anyway, looks like some quality python wrangling to get a matrix up.
<zao>
0.9's web UI looks "nifty".
<heller>
it does
<jbjnr>
heller: look at examples/resource_partitioner/ for correct use
<heller>
zao: the worst that could happen is that you find an exploit in buildbot ;)
<jbjnr>
first call to rp, pass in argc argv etc
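The ordering rule behind that error message can be illustrated with a toy model. This is only a sketch of the "construct with argc/argv first, query afterwards" pattern; `toy_partitioner`, `create_partitioner` and this `get_partitioner` are hypothetical names and do not reproduce the real `hpx::resource` interface (see examples/resource_partitioner/ for that).

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy class, not hpx::resource::partitioner.
class toy_partitioner
{
public:
    // Takes the command line up front, before anything else may query it.
    toy_partitioner(int argc, char const* const* argv)
      : args_(argv, argv + argc)
    {
    }

    std::size_t num_args() const { return args_.size(); }

private:
    std::vector<std::string> args_;
};

// Single process-wide instance; empty until create_partitioner() has run.
std::unique_ptr<toy_partitioner>& instance()
{
    static std::unique_ptr<toy_partitioner> p;
    return p;
}

// First call: hand over argc/argv so the command line can be parsed.
toy_partitioner& create_partitioner(int argc, char const* const* argv)
{
    instance().reset(new toy_partitioner(argc, argv));
    return *instance();
}

// Later calls: only valid once the instance exists, otherwise throw.
toy_partitioner& get_partitioner()
{
    if (!instance())
    {
        throw std::logic_error(
            "get_partitioner() called before the command line was parsed");
    }
    return *instance();
}
```

Code that used to call an accessor like `get_partitioner()` without first constructing the partitioner would hit the throwing branch in this model, which matches the kind of startup failure quoted above.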
<heller>
jbjnr: TBH, it just sucks that previously working code is not properly broken
<heller>
now
<jbjnr>
sorry
<heller>
without a notice, even ;)
<jbjnr>
I think to be fair, the code was merged a little too soon, but at least now, you're fixing things too. :) Silver lining!
<heller>
jbjnr: and if I do like the example does, I get a segfault
<jbjnr>
if you have a test that fails, feel free to let me have a go.
<jbjnr>
leaving office in a mo, but will try later and over the weekend
<jbjnr>
clang 70%
<heller>
jbjnr: sure, hartmut just says: "na, there is a reason why I left it broken. It needs to be ported!"
<jbjnr>
looking good so far
<heller>
signing off too
<heller>
ttyl
<jbjnr>
PS. I have no idea how to use cuda+clang, so I'll be asking about that later once my build completes (80%)
<heller>
Great
<heller>
As long as we have a working compiler...
bikineev has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
<heller>
hmm, without nvcc it works
<heller>
of course...
<diehlpk_work>
Do we have a citation for the integration of xeon phi?
Matombo has quit [Remote host closed the connection]
bibek_desktop has quit [Remote host closed the connection]
bibek_desktop has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
zbyerly_ has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
zbyerly_ has quit [Ping timeout: 240 seconds]
mcopik has joined #ste||ar
<diehlpk_work>
hkaiser, mcopik Could you do the final evaluation before I leave for the long weekend? I will arrive home after the deadline
akheir has quit [Remote host closed the connection]
<hkaiser>
diehlpk_work: will definitely do today
<diehlpk_work>
Thanks, we now have two pages for the paper
<diehlpk_work>
Will write more later
<heller>
Got a seminar today, will get to the paper once cuda is operational again
<heller>
hkaiser: so, nvcc did it again.
zbyerly_ has joined #ste||ar
<heller>
Looks like it's close to unfixable. I'm hoping for clang now...
<heller>
First test on daint succeeded...
<heller>
The problem seems to be with the hpx_main function pointer passed into the RP. Some invalid memory accesses going on in and around argument parsing
<heller>
The only thing that really changed here is that the init stuff was in libhpx_init.a and it's in libhpx.so now. Could that really lead to problems?
<heller>
Or asking differently: what would break if I move that part of the RP into init.a?
<hkaiser>
heller: try it
aserio has joined #ste||ar
<heller>
hkaiser: I'll commit what I have so far in a few minutes. It would be interesting to see what the situation on Windows is
zbyerly_ has quit [Ping timeout: 260 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
bikineev has joined #ste||ar
<jbjnr>
K-ballo: yes. I used CMAKE_BUILD_TYPE=Release when I compiled clang
<jbjnr>
heller: awesome. it works
<jbjnr>
(hello world at least)
<heller>
jbjnr: yes. I am trying the hpx examples now
<heller>
fingers pressed
<jbjnr>
^crossed
<heller>
false friends
<jbjnr>
?
<heller>
jbjnr: in English classes, we have a category named "false friends", which are words that sound the same as in German, but translate to something totally different
<heller>
or phrases you translate literally, which then don't make sense anymore