aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
vamatya has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
vamatya has quit [Ping timeout: 240 seconds]
eschnett_ has joined #ste||ar
eschnett has quit [Ping timeout: 240 seconds]
eschnett_ is now known as eschnett
K-ballo has quit [Quit: K-ballo]
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
patg has joined #ste||ar
patg is now known as Guest15479
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
Guest15479 has quit [Read error: Connection reset by peer]
patg has joined #ste||ar
patg is now known as Guest25792
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
taeguk has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller>
wash: sure
<heller>
That's the plan ;)
taeguk has quit [Quit: Page closed]
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
david_pfander has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
<heller>
david_pfander: see here, all marked green are good.
<heller>
last commit finished retesting is "6aec8ce"
<david_pfander>
heller: saw your mail, I guess we should discuss that after it is more clear what the future of octotiger will be
<heller>
yes
<heller>
david_pfander: well, it doesn't hurt to take responsibility, this will help in future projects as well ;)
<david_pfander>
heller: what do you mean with that?
<heller>
it's very frustrating to dump hours and hours into debugging problems which should have been seen months ago...
<david_pfander>
heller: tell me about it...
<heller>
well, the unit tests have been failing for months now
<heller>
"unit tests"
<heller>
without a real attempt to fix them
<heller>
the general opinion was: "This has to be the fault of HPX, FIX IT!"
hkaiser has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller>
hkaiser: good morning
<heller>
see PM please
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
mcopik has quit [Ping timeout: 240 seconds]
jbjnr has joined #ste||ar
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<heller>
jbjnr: hey! how did your presentation go?
<jbjnr>
fine. I gave an internal presentation of the zero-copy serialization work to our group
<jbjnr>
I have some questions:
<jbjnr>
if I disable the timer pool and the io-pool - what will stop working?
<hkaiser>
timed suspension
<hkaiser>
and all code relying on the io-pool, like octotiger
<hkaiser>
jbjnr: why do you want to disable that? those threads are most of the time suspended
<jbjnr>
Raffaele collected the task timings for all the tasks that do a dgemm on a single tile of the matrix and got a plot that looks like a two-humped camel. One hump corresponds to the time it takes to do a dgemm on a tile of the matrix. The second hump is what we would get if a thread was being suspended by the OS for a timeslice. Running the matrix code on fewer threads improves the situation and improves the runtime, so I wanted to try disabling the timer pool and io-pool to see if it made any difference.
<jbjnr>
are there any other threads that can run?
<jbjnr>
(other codes that do not use hpx do not show this multi-hump timing)
<jbjnr>
I've checked the affinity binding and can confirm that worker threads are pinned to OS cores uniquely (which I suspected might have been a possibility, but it turns out not)
<jbjnr>
(I mean if two worker threads were bound to the same core, that would hose things in the manner we see)
ajaivgeorge has joined #ste||ar
<heller>
jbjnr: interesting
eschnett has quit [Quit: eschnett]
<heller>
jbjnr: that would mean that the timer and/or IO threads are periodically scheduled for some reason
<heller>
jbjnr: I don't see a reason why you shouldn't be able to (temporarily) disable those two helper threads to see if it makes a difference
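A minimal sketch of how that experiment might be wired up, assuming the helper-pool sizes can be set through runtime configuration; the hpx.threadpools.* key names below are assumptions from memory and should be checked against the HPX configuration reference before relying on them.

    // Sketch only: shrink the helper pools via runtime configuration entries
    // passed to hpx::init. The "hpx.threadpools.*" key names are assumed, not
    // verified against this HPX version.
    #include <hpx/hpx_init.hpp>

    #include <string>
    #include <vector>

    int hpx_main()
    {
        // ... run the dgemm benchmark here ...
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        std::vector<std::string> const cfg = {
            "hpx.threadpools.io_pool_size=0",      // assumed key for the IO pool
            "hpx.threadpools.timer_pool_size=0"    // assumed key for the timer pool
        };
        return hpx::init(argc, argv, cfg);
    }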
<jbjnr>
I have removed io pool and timer pool, but there are still extra threads running
david_pfander has quit [Ping timeout: 268 seconds]
<heller>
jbjnr: there is the main thread, which is just waiting on the runtime to shut down again
<jbjnr>
and 4 more
<heller>
jbjnr: there is currently no way to disable that
<heller>
that would be the TCP parcelport threads?
<jbjnr>
HPX_HAVE_NETWORKING=OFF
<heller>
hmmm
<jbjnr>
but tcp might still be active.
<jbjnr>
tracking that down now
<heller>
ok
<heller>
maybe some CUDA related threads?
<jbjnr>
does agas have any dedicated threads at all?
<heller>
no, AGAS runs purely as HPX threads
<hkaiser>
jbjnr: kill the main thread (the one running main())!
<hkaiser>
jbjnr: FWIW hpx launches the following threads: main() (launched by the OS), 2 threads each per IO, timer, and TCP, and one thread per --hpx:threads=N
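As a quick sanity check of that inventory, a small Linux-only helper (purely illustrative, not part of HPX) can read the kernel's view of the process's thread count and compare it against 1 + 2 + 2 + 2 + N:

    // Illustrative Linux-only helper: report how many OS threads this process
    // currently has, by parsing the "Threads:" field of /proc/self/status.
    #include <fstream>
    #include <string>

    int count_os_threads()
    {
        std::ifstream status("/proc/self/status");
        std::string line;
        while (std::getline(status, line))
        {
            if (line.rfind("Threads:", 0) == 0)
                return std::stoi(line.substr(8));
        }
        return -1;    // field not found
    }

    // e.g. call from hpx_main(): expect 1 (main) + 2 (IO) + 2 (timer)
    // + 2 (TCP) + N (--hpx:threads=N) on a default build.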
<hkaiser>
so either kill main or the ones run by the scheduler - voilà, look ma, no threads
<hkaiser>
jbjnr: may I see those humps, pls?
denis_blank has joined #ste||ar
<jbjnr>
The red line is the mean. It tells us that we're only getting 60%-ish of peak. The matrix should be solved in 8 ms if we ran at peak constantly
<hkaiser>
jbjnr: does this show results from many runs?
<jbjnr>
this is one run
<hkaiser>
ok, what do you measure then? many dgemm's?
<jbjnr>
there are thousands of matrix multiplies in each run and each one is timed
<hkaiser>
ok
<hkaiser>
I don't think this is caused by stray threads being scheduled
<hkaiser>
this is most likely caused by bad NUMA placement
<hkaiser>
jbjnr: have you looked at cross NUMA memory traffic?
<jbjnr>
I'd like to test the numa theory. I am worried that we do not schedule child tasks on the right threads
<jbjnr>
^no
<hkaiser>
hpx does not do anything by default to run tasks on the numa domain where the data is located
<jbjnr>
I know
<hkaiser>
use executors and hpx::compute allocators
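Roughly what that suggestion looks like in code, sketched from the hpx::compute host examples; exact header and type names may differ between HPX versions.

    // Sketch based on the hpx::compute host examples; names/headers may vary
    // by HPX version. Data is distributed over the NUMA domains by the
    // allocator, and the executor keeps the parallel work on the domain that
    // owns each chunk of the data.
    #include <hpx/include/compute.hpp>
    #include <hpx/include/parallel_for_each.hpp>

    #include <cstddef>

    void numa_aware_fill(std::size_t size)
    {
        auto domains = hpx::compute::host::numa_domains();

        using allocator_type = hpx::compute::host::block_allocator<double>;
        using executor_type  = hpx::compute::host::block_executor<>;

        allocator_type alloc(domains);
        hpx::compute::vector<double, allocator_type> data(size, 0.0, alloc);

        executor_type exec(domains);
        hpx::parallel::for_each(
            hpx::parallel::execution::par.on(exec),
            data.begin(), data.end(),
            [](double& x) { x = 1.0; });    // first touch on the owning domain
    }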
<jbjnr>
I need to try fixing that, but I have to eliminate other options first
<hkaiser>
NUMA is the most obvious and hurtful option
<hkaiser>
I doubt the thread scheduling hurts you that much
<hkaiser>
even more as those threads sleep almost all the time
<jbjnr>
the two peaks are too far apart in timing to be just a NUMA problem. The tile size is 512x512 doubles and fits into cache
<jbjnr>
so there should only be a single block fetch overhead
<hkaiser>
how do you know?
<hkaiser>
well, reading stuff from the other NUMA domain is almost twice as slow as reading from your own memory
<jbjnr>
well, I may be wrong. If many blocks are being fetched, then the memory BW will be a bottleneck and could cause this.
<hkaiser>
the next (non-obvious) problem may be caused by the tasking nature of HPX
K-ballo has joined #ste||ar
<hkaiser>
this may cause excessive TLB reloading which can hurt you massively
<jbjnr>
but 2MB at 100GB/s should be only 1 or 2 ms delay
<jbjnr>
we've got 10ms delay here
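A back-of-envelope check of those figures (assuming the 512x512-double tile and ~100 GB/s bandwidth numbers quoted above):

    // Rough arithmetic only, using the figures quoted in the discussion:
    // a single 512x512 tile of doubles moved once over ~100 GB/s of memory
    // bandwidth costs on the order of tens of microseconds, far below the
    // ~10 ms gap between the two humps; sustained contention across many
    // concurrent dgemms would be needed to reach that scale.
    #include <cstdio>

    int main()
    {
        constexpr double tile_bytes = 512.0 * 512.0 * 8.0;    // ~2 MiB per tile
        constexpr double bandwidth  = 100e9;                  // ~100 GB/s
        constexpr double seconds    = tile_bytes / bandwidth; // ~2.1e-5 s
        std::printf("one tile transfer ~ %.1f microseconds\n", seconds * 1e6);
        return 0;
    }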
<heller>
jbjnr: so this is a histogram of the timing of the different dgemm kernels?
bikineev has quit [Ping timeout: 260 seconds]
<jbjnr>
yes essentially
hkaiser has quit [Read error: Connection reset by peer]
hkaiser_ has joined #ste||ar
<hkaiser_>
jbjnr: all I'm saying is that I doubt the threads are the culprit
<heller>
are they potentially suspending somehow?
<heller>
jbjnr: I am taking sides with hkaiser_ ;)
zbyerly_ has quit [Remote host closed the connection]
zbyerly_ has joined #ste||ar
<jbjnr>
question: when a task completes - and triggers a continuation - do we know which thread number the completing task is on? It would be trivial in my scheduler to use the same queue for the continuation as the first choice to run on
<jbjnr>
instead of the round robin we use currently
<heller>
yes
<jbjnr>
this would help numa placement
<hkaiser_>
on the thread which makes the future ready
<heller>
jbjnr: good idea!
<heller>
not when it has async launch policy
<hkaiser_>
well, then it's scheduled on the current thread, I believe
<heller>
not sure about that right now
<hkaiser_>
let's have a look
<jbjnr>
I'm talking about the queues in the thread pool. When a task is added to the queue, we use curr_queue++ and the tasks are added round robin, but if I know where the current one is ....
<hkaiser_>
not always
<jbjnr>
correct
<heller>
as far as I can see, there is no thread specified
<jbjnr>
I can just get the current OS thread from thread_num_tss and use that, convert it to a pool index and then make sure it goes into that queue first.
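A rough sketch of that queue-selection change; the queue layout and member names are placeholders, only hpx::get_worker_thread_num() is an actual HPX call, and the sentinel value for non-worker threads is an assumption.

    // Sketch of the proposed scheduling tweak; not the real scheduler internals.
    #include <hpx/hpx.hpp>

    #include <atomic>
    #include <cstddef>

    std::size_t pick_queue(std::size_t num_queues,
        std::atomic<std::size_t>& curr_queue)
    {
        // Proposed: prefer the queue of the worker thread that is completing
        // the task (and hence triggering the continuation), so the child task
        // tends to run near its parent's data.
        std::size_t const worker = hpx::get_worker_thread_num();
        if (worker != std::size_t(-1))    // assumed sentinel: not a worker thread
            return worker % num_queues;

        // Fallback: the current round-robin behaviour.
        return curr_queue++ % num_queues;
    }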
<heller>
yes
<jbjnr>
but is that apply function only called for a continuation, or ...
<heller>
FWIW, the round robin scheduling is the most plausible explanation
<heller>
you potentially have cross-domain atomic instructions to schedule those new threads
<heller>
which leads to cache line invalidations etc.
<heller>
which would very well explain this intense delay and the two humps
<jbjnr>
yes, but we're 10ms slow. A cache line fail is tiny compared to that
<heller>
not if it accumulates
<jbjnr>
there is not that much contention
<jbjnr>
these tasks take 8ms each
<jbjnr>
and are running 10ms slow
<jbjnr>
milli, not micro
<hkaiser_>
jbjnr: look at cross numa-domain memory traffic at least once - that will answer your question - no need to second-guess
<heller>
the contention happens when all want to schedule a new task, no?
<jbjnr>
no
<heller>
sure
<heller>
curr_queue_ is a shared atomic, for example, the queues are implemented using atomics
<jbjnr>
we see a lovely task trace, with all the tasks being scheduled beautifully, but a significant number of them just take twice as long as they should
<heller>
which explains it
<heller>
you don't see this delay in the task trace
<jbjnr>
yes we do
<jbjnr>
the contention appears as a staggering of the tasks
<heller>
you see it because some tasks take longer, sure
<jbjnr>
this is just tasks taking longer
<heller>
but you don't see this effect as holes in the task trace
<heller>
because scheduling a new task inside of a task still counts toward that task's time
<heller>
there is no suspension or anything going on. The instructions just take longer
<jbjnr>
you see them as holes because Raffaele's task profiling doesn't get created until the task is running
<heller>
so the task traces are purely within Raffaele's user code, and they don't include, for example, the attaching of continuations?
<jbjnr>
essentially yes
<heller>
ok
<heller>
then this theory is bullshit
<jbjnr>
we can tell the difference between tasks starting late and tasks taking longer.
<heller>
did you figure out what those remaining 4 OS threads are?
<jbjnr>
there's a boost::asio::service thread that I thought I had disabled, and possibly an APEX thread
<jbjnr>
I got distracted.
<jbjnr>
if HPX_HAVE_NETWORKING is OFF, should the tcp threads still be created, or skipped altogether?
<zao>
From Raymond Chen himself - "There's also std::future, but its lack of composability makes it a poor choice for asynchronous programming."
<zao>
(about the PPL, but still amusing to see)
<heller>
jbjnr: they shouldn't run if you disabled networking
<diehlpk_work>
Next week I will be at a conference trying to advertise HPX in the engineering community, so I will not be available via IRC
mcopik has joined #ste||ar
Guest25792 has quit [Quit: This computer has gone to sleep]
zbyerly_ has quit [Remote host closed the connection]
vamatya has joined #ste||ar
<jbjnr>
hkaiser_: do you know if APEX allows collection of PAPI-type data yet?
<jbjnr>
(HPX5 git repo is not getting much activity these days)
hkaiser_ has quit [Quit: bye]
<K-ballo>
I love the new state of the branches btw.. can we drop 5 or 6 more? or is it asking for too much?
vamatya has quit [Ping timeout: 260 seconds]
<jbjnr>
hmm. I thought I deleted all mine, but some are still there
<diehlpk_work>
O'Reilly is not interested in publishing the HPX book
<K-ballo>
lol, I was just kidding, didn't mean yours specifically jbjnr
<jbjnr>
K-ballo: well if I do a count of branches in my local tree, including all remote clone branches, it's 647, which frankly is a trifle large, so a bit of a cleanup is not a bad thing
<K-ballo>
wow :|
<K-ballo>
I assume you've never done a prune on sync?
<jbjnr>
and HPX is just one of several projects that I have too many branches in :(
<jbjnr>
I don't prune often. Too afraid of losing something that "might be useful one day"
<K-ballo>
then I can see getting to 647 for HPX rather quickly
<jbjnr>
indeed
<K-ballo>
I have like 39.. counting by hand, not sure how to measure properly
<jbjnr>
I have a similar number for paraview/vtk/vtkm etc
<K-ballo>
I prune pretty much every time I fetch from upstream
<jbjnr>
39 sounds about right for local branches. I also counted the remote copies etc etc. Say 100 each for 5 or 6 repos in all.
<jbjnr>
so most are copies
<K-ballo>
I count 9 local refs, 9 in origin, 21 in upstream
<K-ballo>
local + origin = 15 unique
aserio has joined #ste||ar
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
zbyerly_ has joined #ste||ar
<K-ballo>
why is StellarBot blocked on github?
<heller>
what do you mean?
bikineev has quit [Ping timeout: 258 seconds]
hkaiser has quit [Quit: bye]
ajaivgeorge has quit [Quit: ajaivgeorge]
<wash>
aserio: ping
hkaiser has joined #ste||ar
<aserio>
wash: here (but in a meeting)
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
aserio1 is now known as aserio
<diehlpk_work>
In 9 days the second evaluation starts: denis_blank, ABresting, thundergroudon[m], taeguk[m]
bikineev has joined #ste||ar
<denis_blank>
diehlpk_work: Ok, thanks for noticing
aserio has quit [Ping timeout: 246 seconds]
bikineev has quit [Ping timeout: 260 seconds]
mars0000 has joined #ste||ar
aserio has joined #ste||ar
bikineev has joined #ste||ar
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
denis_blank has quit [Quit: denis_blank]
zbyerly_ has quit [Ping timeout: 276 seconds]
bikineev has quit [Ping timeout: 240 seconds]
mars0000 has quit [Quit: mars0000]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<ABresting>
diehlpk_work: +1
vamatya has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
<wash>
aserio: We should touch base re the paper sometime next week
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
<github>
[hpx] hkaiser force-pushed simple_base_lco from 6430975 to e4a4c2c: https://git.io/vQ5mv
<github>
hpx/simple_base_lco e4a4c2c Hartmut Kaiser: Leave managed base_lco_with_value the default...
aserio has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Read error: Connection reset by peer]
ajaivgeorge has joined #ste||ar
antoinet has joined #ste||ar
antoinet has quit [Remote host closed the connection]
K-ballo has quit [Read error: Connection reset by peer]
K-ballo has joined #ste||ar
ajaivgeorge_ has joined #ste||ar
ajaivgeorge has quit [Read error: Connection reset by peer]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has quit [Client Quit]
mars0000 has joined #ste||ar
<diehlpk_work>
Does anyone know why my code compiles and links with gcc, but for clang I got this
<diehlpk_work>
In function `void output<problem::Quasistatic<material::Elastic> >(IO::deck::PD*, problem::Quasistatic<material::Elastic>*)':
ajaivgeorge has quit [Client Quit]
ajaivgeorge has joined #ste||ar
<diehlpk_work>
undefined reference
mars0000 has quit [Quit: mars0000]
<heller>
diehlpk_work: I don't see an undefined reference.
<heller>
And please, use a paste site
<heller>
Looking at my crystal ball: you probably link against a library which uses an ABI-incompatible stdlib
<heller>
That is, at least one of your dependencies was compiled with gcc