aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 260 seconds]
eschnett has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 268 seconds]
shoshijak has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has quit [Ping timeout: 260 seconds]
ajaivgeorge has quit [Ping timeout: 276 seconds]
hkaiser_ has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
patg has joined #ste||ar
eschnett has quit [Quit: eschnett]
patg has quit [Quit: This computer has gone to sleep]
patg has joined #ste||ar
patg is now known as Guest70047
Guest70047 is now known as patg
patg is now known as patg_away
patg_away has quit [Quit: See you later]
patg_away has joined #ste||ar
Matombo has joined #ste||ar
shoshijak has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
shoshijak has quit [Ping timeout: 240 seconds]
shoshijak has joined #ste||ar
Matombo has quit [Remote host closed the connection]
<jbjnr_>
"hkaiser_heller: a thread 'remembers' the scheduler it was running on" - yes, but normally in hpx there is only one thread pool, one scheduler, one everything, so ther's really nothing to remember or do specialy here.
Matombo has joined #ste||ar
<heller>
jbjnr_: not true, executors can have different schedulers
<jbjnr_>
only the os thread pool executor
<heller>
Not saying that it might not be buggy
<jbjnr_>
other executors share the same pool and the same scheduler
<heller>
Schedulers don't necessarily have to have their own underlying thread pool
<jbjnr_>
-and- the scheduler is set at startup and doesn't change
<heller>
Ok, so I assume the thread pool executor is still working?
david_pfander has joined #ste||ar
<jbjnr_>
yes, but we are hoping to make it obsolete
<heller>
There should be a test for that
<heller>
Of course
<heller>
But it is essentially doing what the resource manager does, right?
<jbjnr_>
yes
<jbjnr_>
except that it works and our stuff doesn't
pree has joined #ste||ar
<heller>
Right
pree has quit [Read error: Connection reset by peer]
<heller>
So where is the difference?
<jbjnr_>
well, it mostly works, but random segfaults on shutdown indicate something is wrong
pree has joined #ste||ar
<heller>
Could it be that at some places, the wrong scheduler is being called?
<jbjnr_>
could be.
<jbjnr_>
tasks run ok, complete ok, but we have errors on shutdown
<heller>
That assert is leading in the right direction, I think
<jbjnr_>
suspect that tasks are migrating from one pool to another, but I do not know how
<heller>
Try to put in more diagnostics
<jbjnr_>
I am adding my logging stuff locally
<heller>
Assert in all scheduling functions that the scheduler is the same
<jbjnr_>
shoshijak: is doing her final presentation in half an hour or so
<jbjnr_>
^ the same as what though!
<heller>
That is, compare this with task->scheduler, or whatever it's called
<jbjnr_>
k
<heller>
jbjnr_: task->get_scheduler_base()
<jbjnr_>
yup, found that, thanks
<heller>
jbjnr_: create_thread should look at thread_init_data
<heller>
jbjnr_: almost no old-style thread creation function seems to set any scheduler
<heller>
jbjnr_: mostly the ones defined in src/runtime/applier/applier.cpp
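A minimal sketch of the assert heller is suggesting, assuming a scheduling entry point roughly like schedule_thread on scheduler_base (the function name and signature here are assumptions for illustration; only get_scheduler_base() is taken from the conversation above):

    // sketch only: in each scheduling function, verify the task is being handed
    // to the scheduler it was created on
    void scheduler_base::schedule_thread(threads::thread_data* thrd)
    {
        HPX_ASSERT(thrd->get_scheduler_base() == this);
        // ... enqueue thrd as before ...
    }

If tasks really are migrating between pools, such an assert should fire at the first wrong hand-off instead of surfacing as a random segfault at shutdown.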
shoshijak has quit [Read error: Connection reset by peer]
shoshijak has joined #ste||ar
shoshijak has quit [Client Quit]
shoshijak has joined #ste||ar
shoshijak has quit [Ping timeout: 240 seconds]
<Vir>
david_pfander: can you tell me how to build the binary myself so that I can test a solution?
<david_pfander>
Vir: In principle, yes. Problem is that building all the dependencies is kind of cumbersome, let me think a few minutes how I can make this easy for you.
<Vir>
david_pfander: right, if we could somehow extract this one function using some mock types or so?
<Vir>
extracting this function could be very useful to add a performance regression test to Vc
<heller>
yeah
<heller>
I am just afraid that this behavior only shows up due to some optimiser hiccups
<heller>
different passes running in the "wrong" order and such
<heller>
that is, the bug only shows up in very specific situations (combination of force inlining etc.)
<heller>
strange aliasing and what not
* heller
is waiting for the day when those optimization passes can be inspected transparently, giving proper diagnostics on why a certain one could not be applied
<Vir>
you may be right. But I'm almost certain that a change to Vc can fix it (hopefully without adding always_inline attributes). If that's the case then it's likely reproducible in isolation.
<heller>
guess that's right
<heller>
especially since it doesn't show up in VC1
* Vir
is currently trying to get the CI times down again. 10h Travis per commit is just too crazy
<heller>
yeah ...
<heller>
you might get some speedup for the cmake step when using the new cxx_feature tests
<Vir>
the cmake step is my smallest problem. My unit tests compile for >2 minutes and require gigabytes of memory to compile
<heller>
split them up?
<Vir>
the resulting binary (without -g) is ~MB
<Vir>
yes, that's how I got to 10h of Travis. It kills a job after ~45 minutes
<heller>
we use circle-ci for exactly that reason
<heller>
allows you to run forever
<Vir>
But it's bad in general. I run nightlies on my office machine and it runs for half of the day.
<Vir>
I'm now randomizing the things that are compiled and tested. gcc-help also didn't have helpful ideas to improve compile times (other than to use -O0)
<Vir>
that wasn't really helpful though
<heller>
hmm
<heller>
are there so many templates being instantiated?
<Vir>
according to GCC -ftime-report all the time is lost in the optimizer and code-gen
<heller>
you might have some luck with profiling your unit tests with templight
<heller>
ah, ok
<heller>
interesting
<heller>
which could still hint at too many instantiated functions
<heller>
which might not even be used in the specific tests
<Vir>
as well as all the memory allocations. Well true, I have lots of wrapper functions and simple one-liner functions that all get inlined down to nothing.
<Vir>
most of my functions are templates though, so it should just skip 'em as they're not instantiated when unused
<Vir>
I'm waiting for the day when this becomes part of the C++ standard and the compiler teams start writing and compiling those tests :-)
<heller>
;)
<heller>
do you have lots of statics or similar?
<Vir>
no, just testing all operations on all type combinations
<heller>
I meant in the headers
<Vir>
ah, no
<heller>
which might lead to extensive instantiation
<Vir>
well static member functions, yes. But almost no static variables
<heller>
all operations on all type combinations, that sounds extensive ;)
<heller>
constexpr?
<Vir>
it's bad. I have an outer product of all arithmetic types (i.e. 13x13), which produces the typelist my test function is instantiated with
<Vir>
that just blows up. So I split one of the typelists up in cmake and build multiple binaries
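A rough sketch of the cross-product instantiation pattern Vir describes (the test body and the exact type set are placeholders, not Vc's actual test harness): with 13 arithmetic types, one test template per ordered pair already means 169 instantiations per tested operation, which is where the code-gen and optimizer time goes.

    // sketch: instantiate a per-pair test for the full outer product of types
    template <class From, class To>
    void test_pair()                 // placeholder for a real per-pair test body
    {
        To x = static_cast<To>(From(1));
        (void)x;
    }

    template <class From, class... Tos>
    void test_row() { (test_pair<From, Tos>(), ...); }       // C++17 fold

    template <class... Types>
    void test_all() { (test_row<Types, Types...>(), ...); }  // 13 x 13 pairs

    int main()
    {
        test_all<signed char, unsigned char, char, short, unsigned short,
                 int, unsigned int, long, unsigned long, long long,
                 unsigned long long, float, double>();
    }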
<heller>
that sounds like a sensible approach
<Vir>
there's also not too much constexpr going on. After all Intel intrinsics aren't constexpr. So all my constexpr stuff is just some type generation magic
<Vir>
especially fixed_size has a magic implementation that builds a tuple of native datapars to cover any possible number of elements the user asks for
<david_pfander>
Vir: I'll get you a complete bash script to build octotiger after lunch, preconfigured with the settings that should exactly reproduce my binary
<Vir>
david_pfander: cool. thanks
<heller>
maybe switching from template <typename T> class C<T, enable_if<some_trait<T>::value>> to template <typename T, bool Enable> class C; template <typename T> class C<T, true>; might help?
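For reference, a self-contained sketch of the two shapes heller is contrasting; some_trait here is a stand-in trait, not anything from Vc:

    #include <type_traits>

    // placeholder trait standing in for "some_trait" above
    template <typename T> struct some_trait : std::is_integral<T> {};

    // SFINAE form: partial specialization selected via enable_if
    template <typename T, typename Enable = void> class C;
    template <typename T>
    class C<T, std::enable_if_t<some_trait<T>::value>> { /* enabled case */ };

    // bool-parameter form: dispatch on a plain bool computed once
    template <typename T, bool Enable = some_trait<T>::value> class D;
    template <typename T> class D<T, true> { /* enabled case */ };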
<Vir>
heller: I played days with all kinds of SFINAE variants vs. specialization. Nothing helped. Which makes sense if I interpret the -ftime-report output correctly.
<heller>
right
<heller>
of course ... forgot about that ;)
<heller>
well, enable_if certainly forces instantiation of various stuff
<heller>
but then, all other techniques most likely would do the same
<Vir>
how many unit tests do you know that produce 4MB of .text section?
<Vir>
or actually up to 9MB is what I have in the worst case
<Vir>
optimizing and generating all that code from lots of functions that each contribute a single instruction ...
<heller>
hmmmm
<heller>
would force_inline maybe do the trick?
<heller>
macros certainly would :P
<Vir>
heller: most of my functions are [[always_inline, artificial]]. It seems that doesn't help. Macros would help because the compiler wouldn't see any functions at all. But macros can't do what I need. :-)
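In GCC spelling the attribute combination Vir mentions looks roughly like the wrapper below (a made-up example, not a Vc function): always_inline forces inlining even without optimization, and artificial makes debuggers treat the wrapper like a built-in.

    #include <immintrin.h>

    // sketch of a one-instruction wrapper of the kind Vir describes
    [[gnu::always_inline, gnu::artificial]] inline __m128
    add(__m128 a, __m128 b)
    {
        return _mm_add_ps(a, b);   // contributes a single instruction after inlining
    }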
<heller>
I can imagine that testtypes.hpp is responsible for most ;)
<heller>
Vir: rumors, isn't C++ just C with classes? nothing a preprocessor can't do ;)
<Vir>
:-) again, according to -ftime-report testtypes.h doesn't really cause a problem. It's the many test function instantiations that blow it up.
<Vir>
lol
<heller>
and then splitting up the datapar.cpp unit test, for example into one file for each operation
<heller>
datapar_reductions.cpp, datapar_algorithms.cpp and so on
<heller>
Vir: well, I imagine that since you instantiate so many different functions inside of one TU, that's what generates the load
<heller>
might be that there is some strange exponential algorithm inside of gcc that leads to that behavior once there are so many functions inside one TU
<heller>
profiling the compiler should give you a hint there ;)
<Vir>
the question is, do the compile times scale O(N)? or worse? if worse, then splitting into more sources/binaries helps
<Vir>
right
<Vir>
there's definitely worse than O(N) going on when the memory usage explodes
<heller>
even if it is just O(N), splitting it would help since it enables parallel compilations
<Vir>
well, I have enough of that already
<heller>
your text section is 9 MB, that's 9 million characters, assuming an average length of maybe 128 bytes per name, you get 70k symbols
<Vir>
nah, the .text section contains the machine code
<heller>
oh ... ok
<heller>
so too much inlining going on in that case?
<Vir>
consider maybe an average of 3-4 Bytes per instruction and you can see that it generates ~million instructions
<Vir>
if my unit tests are good (I hope) then every function is instantiated only a few times for a few important test values
* Vir
-> out for lunch
shoshijak has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<Vir>
david_pfander: ah, just realized an important data point: In Vc 1.x the SimdArray class is ABI unstable, i.e. uses a defaulted copy ctor to copy its native Vector<T> members. In wg21.link/p0214 the fixed_size ABI provides an additional (though QoI) "feature" of ABI stability. I.e. the copy ctor is supposed to ensure passing via the stack (with proper alignment). Now my implementation has no explicit code to ensure this behavior yet, but that's at least another thing for me to investigate.
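A sketch of the mechanism Vir alludes to (the type name and storage layout are invented for illustration): a user-provided copy constructor makes the type non-trivially copyable, so the psABI passes it by reference in memory with its declared alignment instead of decomposing it into vector registers, which is what gives fixed_size its ABI stability.

    // sketch only: the real fixed_size implementation builds a tuple of native
    // datapar objects rather than a plain array
    template <class T, int N>
    struct fixed_size_sketch
    {
        alignas(16) T data[N];

        fixed_size_sketch() = default;

        // user-provided copy ctor: the type is no longer trivially copyable,
        // so it is passed via memory rather than in registers
        fixed_size_sketch(const fixed_size_sketch& other)
        {
            for (int i = 0; i < N; ++i)
                data[i] = other.data[i];
        }
    };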
hkaiser_ has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
<jbjnr_>
heller: david_pfander bad news for the hackathon. we might not get in. 3 teams with CSCS staff on board have applied and politics makes it preferable to have 'users' selected. (Aside from the fact that our proposal might not be selected on merit anyway).
<heller>
jbjnr_: we have staff on our team as well ;)
<heller>
so it hurts that we are not official users?
pree has quit [Ping timeout: 260 seconds]
<jbjnr_>
well, it's better if (swiss?) projects with students who really need help are selected rather than just giving 3 slots to teams already including cscs staff (of which we are one)
<jbjnr_>
it's not restricted to swiss groups, but the hope is that users of cscs resources will participate
<jbjnr_>
(proper users that is:) )
K-ballo has joined #ste||ar
<heller>
jbjnr_: so we need to become proper users ;)
<heller>
I was planning on smuggling AllScale into the proposal ... but no one from the compiler group wanted to participate
<david_pfander>
Vir: I just added build scripts, a command to start a very small run, and verification of the output to the github issue
<david_pfander>
Vir: thanks for the information about the other potential issue
<david_pfander>
Vir: Unfortunately, Vc1 is not an option on KNL. Until we can fully switch to Vc2 with AVX512 support, our performance will remain very bad on KNL
<david_pfander>
jbjnr_: Sounds bad... is there at least a small chance left?
pree has joined #ste||ar
<jbjnr_>
yes, there's still a chance. It's just that the reviewer told me that smaller projects with easy-to-cherry-pick pieces of GPUified code would be favoured, and huge projects requiring a lot of work less so. I don't think we stressed the simplicity of interfacing GPU stuff into our code using futures to attach to cuda stream events etc etc. But we were not expecting so many applicants ....
<jbjnr_>
"later, I asked another young professor whether one could use "I" and she said "Only if you want to sound like an arrogant bastard", and observed that only old people with established reputations can get away with it."
eschnett has joined #ste||ar
<jbjnr_>
so heller you'll be fine :)
<hkaiser_>
lol
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
david_pfander has quit [Client Quit]
david_pfander has joined #ste||ar
<heller>
jbjnr_: great
<jbjnr_>
"In this thesis, a new approach to computation is presented." passivle voice - avoiding I/We for 100 pages is going to be hard work :(
bikineev has quit [Remote host closed the connection]
<jbjnr_>
I'm going to side with the majority on that link I posted and say that in a single-author thesis, "I" is acceptable
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
bikineev has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
bikineev has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
patg_away is now known as patg
<Vir>
heller: I tried to use the passive voice as long as it was easy to read. I wrote 'I' whenever I wanted to make clear what my contribution to the scientific community is/was. As much as it stands out, it's the only correct thing. And I actually stumble over 'we' when I can't determine a second (or more) person behind that statement/development/choice/...
<Vir>
AFAIK there's no 100% definitive rule on this. The only rule that's really important is to be consistent.
<Vir>
IMHO
<Vir>
david_pfander: thanks, building it now
<patg>
I used 'we', but I believe my advisor also did that; even though it was my own work, I had help
<Vir>
right, in my case my advisor did not advise me on any of the technical or design decisions. Even the choices for direction were close to 100% my own. Maybe for others it's not that clear cut.
<patg>
mostly the same here
<patg>
It hasn't been that long and I still can't really remember clearly
<zao>
Is there a flavor of y'all all y'all can use?
ajaivgeorge has quit [Read error: Connection reset by peer]