aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
denis_blank has quit [Quit: denis_blank]
patg has joined #ste||ar
hkaiser_ has quit [Quit: bye]
eschnett has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
K-ballo has quit [Quit: K-ballo]
patg has quit [Quit: This computer has gone to sleep]
patg has joined #ste||ar
vamatya has joined #ste||ar
eschnett has quit [Quit: eschnett]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
vamatya has quit [Ping timeout: 260 seconds]
vamatya has joined #ste||ar
patg has quit [Quit: This computer has gone to sleep]
pree has joined #ste||ar
pree has quit [Ping timeout: 276 seconds]
Matombo has joined #ste||ar
pree has joined #ste||ar
david_pfander has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
<david_pfander>
Vir: yt?
<Vir>
david_pfander: yes, so far I have not succeeded in compiling octotiger
<david_pfander>
Vir: oh, that's bad
<david_pfander>
Vir: what's your current error?
<david_pfander>
Vir: the scripts I provided should run on a somewhat recent ubuntu machine (or similar)
<Vir>
the last issue is an undefined reference when linking libhpx.so
* Vir
uses Xenial (plus KDE Neon on top)
<david_pfander>
Vir: can you post the error somewhere?
<david_pfander>
Vir: and I have a conversion question: I have this type:
<david_pfander>
using simd_vector = typename Vc::datapar<double, Vc::datapar_abi::fixed_size<8>>;
<david_pfander>
and this type:
<david_pfander>
using int_simd_vector = typename Vc::datapar<int32_t, Vc::datapar_abi::fixed_size<8>>;
<david_pfander>
how can I convert from int_simd_vector to simd_vector with Vc2?
<david_pfander>
with Vc one, I think the conversion was automatic
<Vir>
use static_datapar_cast<double>(...)
vamatya has quit [Ping timeout: 246 seconds]
<Vir>
david_pfander: it would be very helpful if you can document your issues (what you like or dislike) with implicit and explicit casting. That's going to be important feedback to the standardization process.
<david_pfander>
Vir: I can try, as a very first feedback: I read about the static_datapar_cast in the datapar.pdf, but intuitively I thought that I would need to provide the vector types
<david_pfander>
Vir: In principle, I like the more explicit casting, mainly because I care a lot about performance, and conversions aren't free
<heller>
Vir: boost was compiled with the wrong -std= flag
<heller>
Vir: stupid gcc for introducing ABI breakage there ;)
<Vir>
david_pfander: It's actually not too hard to make static_datapar_cast accept both an element type and a vector type. The latter allows querying a specific ABI.
<david_pfander>
heller: thx
<Vir>
heller, david_pfander: Oh, where do I add the flag?
<heller>
Vir: at the b2 invocation
<heller>
Vir: b2 cxxflags=-std=c++11 should do the trick
<Vir>
david_pfander: that looks like a bug in the SFINAE logic of static_datapar_cast
<david_pfander>
Vir: Ok, I'm opening an issue. Is there a workaround? :)
<Vir>
david_pfander: yes, because currently the cast implementation is just a stub that calls the generator ctor of the resulting datapar cast to do element by element static_casting
<Vir>
you can easily do that yourself
<Vir>
whether it performs well, I don't know. Depends on the compiler...
<david_pfander>
Vir: Yeah, performance was kind of my concern :)
<david_pfander>
Vir: Ok, will do it with scalar operations for now :)
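A minimal sketch of the scalar workaround mentioned above, assuming plain std::array stand-ins instead of the Vc datapar types (int_lanes, double_lanes and convert_lanes are hypothetical names, not part of Vc). It converts lane by lane with static_cast, which is what the stub cast is described as doing; whether a compiler vectorizes this loop is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-ins for int_simd_vector / simd_vector from the chat.
// These are plain arrays, not Vc types, so the sketch builds without Vc.
using int_lanes = std::array<std::int32_t, 8>;
using double_lanes = std::array<double, 8>;

// Convert each 32-bit integer lane to double, one element at a time.
inline double_lanes convert_lanes(const int_lanes& in) {
    double_lanes out{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<double>(in[i]);
    }
    return out;
}
```

With real datapar objects the same loop would read and write individual lanes instead of array elements.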
<Vir>
david_pfander: there's another thing you should look at. Clang and GCC say they can optimize better if you work with vector builtins and operations on them - instead of intrinsics - because they have more information (esp. with integer types).
<Vir>
david_pfander: that's the VectorBuiltin aliasing strategy. I also implemented everything without vector builtins - just Intel intrinsics. You get that by choosing the MayAlias strategy.
<david_pfander>
Vir: where's the connection to Vc? is that a configuration option?
<Vir>
It's a macro to override the default decision, yes
<Vir>
my experience is that the promise of Clang and GCC is only half-true
<david_pfander>
Vir: Should I worry if I'm cross-compiling for KNL and the detected host CPU is reported as haswell? (-march=knl is set, AVX512 checks succeed)
<Vir>
david_pfander: you mean in octotiger/hpx? I recommend you set the TARGET_ARCHITECTURE cmake variable to knl. Set it to none if you supply the architecture flags another way.
<david_pfander>
Vir: thx, I'll use TARGET_ARCHITECTURE
bikineev has joined #ste||ar
shoshijak has joined #ste||ar
shoshijak has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
bikineev has quit [Ping timeout: 246 seconds]
<Vir>
david_pfander: octotiger is running on my machine now. The code-gen looks really bad :-( lots of functions that should have been inlined...
bikineev has joined #ste||ar
eschnett has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
pree has quit [Ping timeout: 246 seconds]
bikineev has quit [Ping timeout: 255 seconds]
pree has joined #ste||ar
bikineev has joined #ste||ar
<heller>
Vir: why is there such a huge regression over Vc1?
<Vir>
david_pfander: I just made it 2x faster
<david_pfander>
Vir: excellent!
<Vir>
heller: the GCC inliner...
<heller>
wow
<heller>
does clang have the problem?
<heller>
Vir: we discovered a similar regression with the Intel compiler (in general)
<Vir>
it determines that a function doing two instructions is better not inlined and the input and outputs better passed via the stack instead of reusing the registers...
<Vir>
that's just crazy function call overhead
<Vir>
dunno
<Vir>
there's also a sqr function somewhere in octotiger (I guess). It's also not getting inlined and causing huge optimization issues
<heller>
guess those should be force inlined as well
<heller>
Vir: ok, that fits the picture perfectly
<heller>
thanks for investigating!
<heller>
as a general rule of thumb, it seems that every function containing inline assembly/intrinsics is considered unfavorable for inlining
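A minimal sketch of what force inlining a small helper could look like. SKETCH_FORCE_INLINE is a hypothetical macro name, and sqr here is only a stand-in for the function mentioned in the chat, not the octotiger code. The always_inline attribute (GCC/Clang) and __forceinline (MSVC) are existing compiler extensions.

```cpp
// Pick the compiler-specific spelling for "always inline this function".
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#else
#  define SKETCH_FORCE_INLINE inline
#endif

// Hypothetical small helper; the attribute overrides the inliner's
// cost heuristic for this function.
template <typename T>
SKETCH_FORCE_INLINE T sqr(const T& x) {
    return x * x;
}
```

Whether this fixes the code generation described above would still need to be checked in the generated assembly.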
<david_pfander>
Vir: any idea why there are those vzeroupper? IIRC they aren't free and should never be needed in our codebase
<Vir>
I also learned something important. If you use the data member of the fixed_size datapar as function argument (by value), GCC inlines. If you pass the fixed_size datapar itself by value it does not inline
<Vir>
david_pfander: it's a compatibility thing. The function could be called from a different shared library that was compiled without AVX support (i.e. no VEX coded SSE)
<david_pfander>
Vir: Yes, but can we somehow get rid of them in our hot loops
<Vir>
david_pfander: if you know that scenario is never going to happen for your code then tell the compiler to stop it
<Vir>
yes, just turn it off completely
<david_pfander>
Vir: And how would I do that without writing everything with intrinsics?
<Vir>
-mno-vzeroupper
<david_pfander>
Vir: Ahh
<david_pfander>
Vir: Did you already push the inline fixes?
<Vir>
do you need a quick solution? Because I have a quick fix. Not something I'm certain I want to ship, though.
<Vir>
I need to understand the issue a little bit better
<david_pfander>
Vir: no, take your time. It's just that I'm testing on a KNL right now, and KNL performance is still utterly horrible, so I was somewhat excited :)
<Vir>
:-)
<heller>
david_pfander: Vir: would be good if we had the improvements ASAP so we can move on
bikineev has quit [Read error: Connection reset by peer]
<heller>
and see if all the kernel optimizations *really* pay off
<heller>
time is ticking ;)
<Vir>
I think you can have it ready by tomorrow
<david_pfander>
heller: Yesterday was the first time in the whole project timeline that we could actually run AVX512 code, there is still a huge amount of work to do
<heller>
david_pfander: that's why I am saying we need it *now*
<david_pfander>
heller: yeah, but I don't have a magic wand...
<heller>
david_pfander: let me show you :P
<heller>
Vir: mind pushing the hot fix to branch so david can have a whirl and is able to move on?
<heller>
;)
<Vir>
heller: well, I can do that. Just don't expect a fast-forward on that branch :-)
bikineev has joined #ste||ar
<heller>
well, switching branches should be easy enough ;)
hkaiser has joined #ste||ar
<heller>
anyway, just random thoughts...
bikineev has quit [Read error: No route to host]
shoshijak has joined #ste||ar
bikineev has joined #ste||ar
<github>
[hpx] K-ballo force-pushed bump-gcc from e299005 to 211eda3: https://git.io/vQmYu
<github>
hpx/bump-gcc 211eda3 Agustin K-ballo Berge: Adjust code for minimal supported GCC having being bumped to 4.9
<david_pfander>
also produces very problematic code
<david_pfander>
should be something like a single AVX512 masked store, but is a huge list of scalar instruction plus stores with xmm registers
<Vir>
david_pfander: yes, masked stores are only implemented as a loop fallback. If the compiler isn't smart about it...
<heller>
david_pfander: on tave, loading CMake/3.8.1 and doing a "export CRAYPE_LINK_TYPE=dynamic" before running cmake (start with a fresh build dir), should get you going. Don't forget to set the CMAKE_TOOLCHAIN_FILE to CrayKNL.cmake for HPX _and_ octotiger
<Vir>
I just implemented masked loads recently
<david_pfander>
Vir: If you need a line: octotiger/src/kernels/m2m_kernel_blocked_interaction.cpp:244
<Vir>
david_pfander: feel free to give an implementation a shot. Look at Vc/detail/avx512.h; the masked_store function needs to be overloaded as needed.
Matombo has quit [Remote host closed the connection]
<aserio>
This is what I think I am screwing up...
Matombo has joined #ste||ar
<aserio>
K-ballo: one final question: do you have to assign when_all to a future, or can you write it just like you wrote it?
eschnett has quit [Quit: eschnett]
<K-ballo>
you can write when_all(vec-futs).then(...) as I wrote
<aserio>
cool
<aserio>
Thanks!
<K-ballo>
np
<heller>
or dataflow!
<heller>
that gets rid of the extra future
<aserio>
lol yea I thought of that. But it was annoying me that I couldn't get it to work...when it should work
<K-ballo>
I was thinking dataflow, does it support ranges too?
akheir has joined #ste||ar
<hkaiser>
K-ballo: yes
eschnett has joined #ste||ar
<aserio>
hkaiser: when you declare wait_all and don't assign it to a future what happens to the future wait_all returns?
<hkaiser>
you mean when_all?
<aserio>
sorry yes
<hkaiser>
nothing will happen to the future if you don't use it
<hkaiser>
but you will not have any way of knowing when it becomes ready
<aserio>
Will HPX ensure that it is executed before shutdown?
<hkaiser>
yes
<aserio>
or before it goes out of scope?
<hkaiser>
before shutdown
<hkaiser>
aserio: a future does not block on destruction
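A rough standard-library analogue of the "wait for all futures, then continue" pattern discussed above. This is not HPX code: wait_all_then and sum_of_squares are hypothetical helpers, and unlike when_all(...).then(...) this sketch blocks the calling thread until every future is ready. Note also that a std::future obtained from std::async with std::launch::async does block in its destructor, unlike the HPX futures described here.

```cpp
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: collect the result of every future, then hand the
// results to a continuation. Blocking, unlike a real .then() continuation.
template <typename T, typename F>
auto wait_all_then(std::vector<std::future<T>>& futs, F&& continuation) {
    std::vector<T> results;
    results.reserve(futs.size());
    for (auto& f : futs) {
        results.push_back(f.get());  // waits until this future is ready
    }
    return std::forward<F>(continuation)(std::move(results));
}

// Launch n independent tasks and combine their results once all are done.
inline int sum_of_squares(int n) {
    std::vector<std::future<int>> futs;
    for (int i = 1; i <= n; ++i) {
        futs.push_back(std::async(std::launch::async, [i] { return i * i; }));
    }
    return wait_all_then(futs, [](std::vector<int> r) {
        return std::accumulate(r.begin(), r.end(), 0);
    });
}
```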
<aserio>
ah... I see
hkaiser has quit [Read error: Connection reset by peer]
<david_pfander>
heller: mhh, the only thing I'm doing differently is that export. Let's see whether it helps. Thx
<david_pfander>
Vir: I'll look into implementing it on Sunday. This is really critical for us.
shoshijak has joined #ste||ar
hkaiser has joined #ste||ar
<heller>
david_pfander: it does. That's the cause of the problems.
denis_blank has quit [Quit: denis_blank]
shoshijak has quit [Ping timeout: 240 seconds]
<aserio>
hkaiser: see pm
david_pfander has quit [Quit: david_pfander]
bikineev has quit [Ping timeout: 255 seconds]
<heller>
FAU won the HPL challenge, weeh
<hkaiser>
I voted for them!
<heller>
What I found most interesting is that they had the same performance (energy and flop wise) as the Earth simulator
<heller>
With only 2 CPUs and 8 GPUs
<heller>
But also using a full crossbar "switch" ;)
<hkaiser>
heller: so what's the plan for the EuroHack?
<hkaiser>
octotiger?
<heller>
Yes
mcopik has quit [Ping timeout: 268 seconds]
<heller>
Full system piz daint as the goal ;)
<hkaiser>
:D
<hkaiser>
cool
<hkaiser>
yet another nice paper, then
pree has quit [Read error: Connection reset by peer]
pree_ has joined #ste||ar
<heller>
hkaiser: I don't want to think about that right now ;)
<hkaiser>
lol
<heller>
Enough writing ahead of me in the next 31 days
<hkaiser>
stop complaining ;)
<heller>
Isn't complaining the purpose of life?
<heller>
Where would we be if we didn't complain about mpi all the time
david_pfander has joined #ste||ar
jakemp has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
<jbjnr_>
heller: david_pfander Great! It's not that I'm pessimistic - it's that the office next to me has the hackathon organizer and the one below me the reviewer and both told me that this year it was not a 'given' that we'd get in as we had many more applicants.
<jbjnr_>
Not sure I'm happy about being the mentor - we need a mentor who actually knows a lot about cuda etc!
<jbjnr_>
aserio: you keep asking questions about futures etc. does that mean you have become a programmer now? (I thought you only got involved in admin previously)
<jbjnr_>
heller: lugano hackathon - will you still be coming to lugano again for the tutorial as well? I hope you don't get red carded and have to abandon me then!
<aserio>
jbjnr_: I have decided to muddle the already unclear role that I play :p
<jbjnr_>
good for you
<heller>
jbjnr_: would be perfect if we could arrange it in a way that the two events are either very close or sufficiently far apart
<heller>
Do we have a date for the tutorial yet?
bikineev has joined #ste||ar
<heller>
jbjnr_: biggest issue: I'm in Stockholm in the last September week
<heller>
Hartmut was invited there as well. Having the tutorial right after or before that would be convenient
<heller>
hkaiser: do you have any idea whether you can make Stockholm?
<hkaiser>
heller: not decided yet :/
<hkaiser>
I may have a talk at cppcon Sep24-29
<hkaiser>
if the reject my talk I could come, I guess
<hkaiser>
they*
<heller>
Ok
<heller>
Let me talk to the program committee :p
<heller>
What will you talk about?
<heller>
The usual?
<hkaiser>
The C++ Async Programming Model
<hkaiser>
Standard C++* even
EverYoung has joined #ste||ar
<heller>
Ahh
<heller>
Did you have a peek of what I sent you yet?
<hkaiser>
heller: cursory look only - still trying to be on vacation ;)
<heller>
Right
<heller>
Reading is always relaxing, I hear ;)
<hkaiser>
heh
<jbjnr_>
heller: hpx course is (as you knew months ago!) 5-6 october - I forgot the dates, a month after hackathon. sorry
EverYoung has quit [Ping timeout: 240 seconds]
<jbjnr_>
hkaiser: you are welcome to the hpx tutorial at cscs if you like.
<hkaiser>
jbjnr_: what dates?
<hkaiser>
ahh, ok - nvm
aserio has quit [Ping timeout: 255 seconds]
vamatya has joined #ste||ar
mcopik has joined #ste||ar
ajaivgeorge has joined #ste||ar
<github>
[hpx] hkaiser created fixing_destroy_tests (+1 new commit): https://git.io/vQmxP
<github>
hpx/fixing_destroy_tests 3f9a9db Hartmut Kaiser: Fixing UB in destroy tests
bikineev has quit [Ping timeout: 255 seconds]
<github>
[hpx] hkaiser opened pull request #2709: Fixing UB in destroy tests (master...fixing_destroy_tests) https://git.io/vQmxD
<github>
[hpx] hkaiser created code_of_conduct (+1 new commit): https://git.io/vQmp9
<github>
hpx/code_of_conduct 3acd1d7 Hartmut Kaiser: Adding code of conduct
<github>
[hpx] hkaiser opened pull request #2710: Adding code of conduct (master...code_of_conduct) https://git.io/vQmpp
<ABresting>
hkaiser: any word about the email thread we shared earlier?
<ABresting>
hkaiser: I am stuck, wash is on vacation :/
<hkaiser>
ABresting: where are you stuck?
<hkaiser>
also, I have not seen any email thread - sorry
<hkaiser>
ABresting: here is what wash wrote (to me): "I think Abhi should focus on getting the stack overflow detection mechanism designed and functioning first, and then we can address the stack growing mechanism. He has been asking design question about both (which is good), but the stack growing mechanism depends on getting a working stack overflow detection mechanism, so we should focus on that first."
<ABresting>
first of all, there is a wrapper which depends on libsigsegv on the user side (so no licensing issue)
<hkaiser>
k
<ABresting>
I believe the wrapper is taking care of it
<ABresting>
but growing part is where the problem lies
<ABresting>
otherwise there will be no point
<ABresting>
hkaiser: the irony now is that everyone thinks libsigsegv is great, but there is a grey area
<ABresting>
and I am facing the grey area now
<hkaiser>
ABresting: do you have the segmentation fault detection implemented?
<ABresting>
yes, it's a fallback: if the SIGSEGV is not a stack overflow, then it's a segmentation fault
<hkaiser>
ABresting: where is the code, does it work in HPX?
<hkaiser>
sorry, I meant, stack overflow detection
<ABresting>
in a threaded environment there is a problem
<ABresting>
I thought you had it from email where wash cc'ed you
<hkaiser>
does it work?
<ABresting>
yes
<hkaiser>
do we have a test?
<hkaiser>
well, the code you linked is not integrated with HPX
<ABresting>
it's failing in my multi-threaded environment in the first place
<ABresting>
heller told me how to go about it, but it has to work in local dev tests first
<hkaiser>
ABresting: so let's do what wash suggested: integrate the stack overflow detection with hpx and, instead of just printing segmentation fault, report it as what it is - a stack overflow
<ABresting>
we will put it in the hpx::init()
<hkaiser>
ABresting: please do that
<hkaiser>
have it integrated with hpx, add a test verifying its functionality
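A minimal sketch of the classification step behind "report it as what it is", assuming each thread stack grows downward and has a guard region directly below it. stack_region, fault_kind and classify_fault are hypothetical names, not HPX or libsigsegv API. In a real handler the faulting address would typically come from siginfo_t::si_addr in a SIGSEGV handler installed with SA_SIGINFO and run on an alternate stack (sigaltstack); that part is left out here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of one thread stack that grows downward:
// usable bytes are [low, high), and a guard region of guard_size bytes
// sits directly below low.
struct stack_region {
    std::uintptr_t low;
    std::uintptr_t high;
    std::size_t guard_size;
};

enum class fault_kind { stack_overflow, segmentation_fault };

// A fault inside the guard region is reported as a stack overflow;
// any other faulting address is treated as an ordinary segmentation fault.
inline fault_kind classify_fault(std::uintptr_t fault_addr,
                                 const stack_region& s) {
    // Clamp at zero so a guard larger than low cannot wrap around.
    const std::uintptr_t guard_low =
        s.low >= s.guard_size ? s.low - s.guard_size : 0;
    if (fault_addr >= guard_low && fault_addr < s.low) {
        return fault_kind::stack_overflow;
    }
    return fault_kind::segmentation_fault;
}
```

A test for the integrated version could deliberately recurse past the stack limit and check that the reported kind is stack_overflow rather than a plain segmentation fault.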
<ABresting>
hkaiser: sounds good
<ABresting>
but what about the problem at hand?
<hkaiser>
that should take a while, at least until wash returns
<hkaiser>
ABresting: that _is_ the problem at hand
<ABresting>
when is he coming back?
<hkaiser>
next week or so, perhaps the week after next
<ABresting>
damn!
<hkaiser>
ABresting: what 'damn'?
<hkaiser>
you have a lot of things to do without wash
<ABresting>
what about my evaluation :/
<hkaiser>
so where is the problem?
<hkaiser>
we will take care of the evaluation, no worries
<ABresting>
I thought we could discuss a few things but yes integration first and then testing
<hkaiser>
yes
<ABresting>
ok on it !
<hkaiser>
let's do small steps and finish those before attempting to do the next step
<ABresting>
sure
aserio has joined #ste||ar
<heller>
jbjnr_: you should know me by now. Being organized is not one of my attributes
<github>
[hpx] hkaiser pushed 2 new commits to fixing_2699: https://git.io/vQYT8
<github>
hpx/fixing_2699 dc23b2e Hartmut Kaiser: Fixing all_of, any_of, and none_of
<K-ballo>
lol, how did the code of conduct break the build?
<github>
[hpx] K-ballo opened pull request #2711: Adjust code for minimal supported GCC having being bumped to 4.9 (master...bump-gcc) https://git.io/vQYGV
bikineev has quit [Remote host closed the connection]