aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 252 seconds]
EverYoun_ has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
kisaacs has quit [Ping timeout: 248 seconds]
EverYoung has quit [Remote host closed the connection]
<github>
[hpx] biddisco closed pull request #2377: Add a customization point for put_parcel so we can override actions (… (master...action_customization) https://git.io/vXUSN
<github>
[hpx] biddisco deleted action_customization at 29aceaa: https://git.io/vNX4f
Vir has joined #ste||ar
<heller_>
jbjnr: I'm home literally. Needed a break
<jbjnr>
enjoy your rest.
<jbjnr>
I'm the only person in the building today. very quiet
<heller_>
Nice as well
<heller_>
120 people went to my talk yesterday
<jbjnr>
wow. What did you talk about?
<jbjnr>
(HPX, of course, but allscale, or something else?)
<hkaiser>
heller_: nice!
<hkaiser>
how's Klaus?
<heller_>
Sick
<hkaiser>
uhh
<heller_>
Didn't meet him
<hkaiser>
k
<heller_>
jbjnr: essentially an introduction. From std::thread to the future and why OS threads are bad
<jbjnr>
k
<heller_>
It was a non academic venue. Very refreshing
<jbjnr>
where then?
<heller_>
Munich c++ user group
<jbjnr>
lovely
<jbjnr>
pycicle just went crazy. somebody must have pushed something
<hkaiser>
jaafar: Kevin will probably contact you to use pycicle for Phylanx testing
<hkaiser>
jbjnr: ^^
<jbjnr>
poor jaafar
<jbjnr>
so much spam from you
<hkaiser>
sorry jaafar
<jbjnr>
no problem. I'd be glad to help
<hkaiser>
thanks
<jbjnr>
need to get heller_ running it at fau so we can see if any wrinkles need to be ironed out too
<jbjnr>
(running it regularly and full time I mean)
<hkaiser>
jbjnr: he might need some help in making pycicle independent of hpx
<jbjnr>
hkaiser: yup. No prob
<jbjnr>
that is not going to be a lot of work fortunately. Just a few places where we need to add config options instead of hard-coded stuff
<hkaiser>
nice
<jbjnr>
I'm desperate to get my hands on heller_ 's latest thread cleanup. Hoping he doesn't rest too much today :)
<heller_>
He
<heller_>
Later I might get back to work
<heller_>
jbjnr: could you do me a favor maybe?
<jbjnr>
ask
<heller_>
And run your application with the papi L1 data cache counters
<heller_>
I'm especially interested in misses and evictions
<heller_>
Sampling over time would be great
<jbjnr>
do you have a command line set of params I can copy to get the right syntax?
<heller_>
Not off the top of my head
<jbjnr>
just papi - not apex yes?
<heller_>
Yes
<heller_>
jbjnr: another thing, how many continuations do you attach to a single future?
<jbjnr>
max 2 with a shared future (at the moment), but the DAG may be thousands of continuations long in total
<heller_>
I think that one of your performance problems might come from too much dynamic memory allocation happening
<jbjnr>
sounds plausible
<heller_>
Ok, that means that you have around 1000 allocations
<heller_>
Not good
<jbjnr>
where does 1000 come from?
kisaacs has joined #ste||ar
<heller_>
You have a chain of 1000 continuations
<jbjnr>
it's actually a bit less
<heller_>
When you attach more than one, the small function optimization doesn't apply anymore
<jbjnr>
with a 40960 matrix using 256 block size, then there are 160 blocks, so the dag will have some multiple of that
<heller_>
And if your completion handler is larger than 3*sizeof(void*) it doesn't either
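The size limit heller_ describes can be checked at compile time. The following is a minimal sketch, assuming an inline buffer of three pointers as mentioned in the chat; `inline_buffer_size`, `fits_inline`, `small_handler` and `large_handler` are hypothetical names for illustration, not HPX identifiers.

```cpp
#include <cassert>
#include <cstddef>

// Assumed inline buffer size: the "three pointers" figure from the chat.
constexpr std::size_t inline_buffer_size = 3 * sizeof(void*);

// True if a callable of type F would fit into the assumed inline buffer,
// i.e. storing it would not require a dynamic allocation.
template <typename F>
constexpr bool fits_inline()
{
    return sizeof(F) <= inline_buffer_size && alignof(F) <= alignof(void*);
}

// A handler carrying two pointers of state fits.
struct small_handler
{
    void* a;
    void* b;
    void operator()() const {}
};

// A handler carrying four pointers of state does not fit.
struct large_handler
{
    void* data[4];
    void operator()() const {}
};

static_assert(fits_inline<small_handler>(), "two pointers fit inline");
static_assert(!fits_inline<large_handler>(), "four pointers spill to the heap");
```

A check like this could be placed next to a completion handler to flag, at compile time, when added captures push it past the inline limit.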
<jbjnr>
but for 20480 using 512 it's 40 blocks wide and tall
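The block counts jbjnr quotes follow from dividing the matrix size by the block size. A short sketch of that arithmetic, assuming square matrices whose size is an exact multiple of the block size (`blocks_per_dim` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks along one dimension of a square matrix tiled into
// square blocks. Assumes block_size divides matrix_size evenly.
constexpr std::size_t blocks_per_dim(std::size_t matrix_size,
                                     std::size_t block_size)
{
    return matrix_size / block_size;
}

// The two configurations mentioned in the chat.
static_assert(blocks_per_dim(40960, 256) == 160, "40960 / 256");
static_assert(blocks_per_dim(20480, 512) == 40, "20480 / 512");
```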
<heller_>
I'll have to think how to fix this... Not easy
<hkaiser>
heller_: don't create the compound continuation function object, rather use a small_vector to store them
<jbjnr>
(NB. there might be a few places where we use 3 continuations, so a small vector with size~4 would probably work 99% of the time)
<jbjnr>
(when we have to send a matrix block there's an extra continuation I think)
<hkaiser>
jbjnr: right
<hkaiser>
even a larger small_vector wouldn't be a problem as the shared state is not copied anyways
<jbjnr>
ok
<jbjnr>
just one more HPX_CONTINUATION_SMALL_VECTOR_SIZE to add to the defines/config :)
<hkaiser>
right ;)
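The small_vector idea can be sketched without Boost as a container that keeps the first N continuations inside the object and spills to the heap only beyond that. This is an illustrative stand-in, not HPX code: `continuation_list` and its members are hypothetical names, and `std::function` is used for brevity even though it may itself allocate for large callables.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a small_vector of continuations: the first N
// callbacks are stored inside the object, later ones go to a heap vector.
template <std::size_t N>
class continuation_list
{
public:
    using callback = std::function<void()>;

    void push_back(callback cb)
    {
        if (size_ < N)
            inline_[size_] = std::move(cb);       // no container allocation
        else
            overflow_.push_back(std::move(cb));   // spills past N entries
        ++size_;
    }

    // Run every continuation in the order it was attached.
    void invoke_all()
    {
        for (std::size_t i = 0; i < size_ && i < N; ++i)
            inline_[i]();
        for (auto& cb : overflow_)
            cb();
    }

    std::size_t size() const { return size_; }
    bool spilled() const { return !overflow_.empty(); }

private:
    std::array<callback, N> inline_{};
    std::vector<callback> overflow_;
    std::size_t size_ = 0;
};
```

With N = 4, the two or three continuations per future mentioned above would stay in the inline storage, and only unusual cases would touch the overflow vector.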
<K-ballo>
we've gained us one small_vector?
<jbjnr>
the libfabric PP uses one. boost::small_vector as I recall
<heller_>
hkaiser: yeah, that's what I'm thinking
<heller_>
In the end, future_data should be the size of one cache line
<jbjnr>
am I right in thinking that coroutines can't be used with executors - or can they be coerced into working together using the generator template stuff that goes with the coroutines?
kisaacs has quit [Ping timeout: 240 seconds]
<heller_>
jbjnr: they can coexist
<heller_>
There is some very interesting work under the name cppcoro
<hkaiser>
K-ballo: do you know of a freestanding BSL small_vector?
<hkaiser>
jbjnr: I misread things, nvm
<K-ballo>
I don't know of any
hkaiser has quit [Quit: bye]
kisaacs has joined #ste||ar
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has joined #ste||ar
aserio has quit [Ping timeout: 256 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
daissgr has joined #ste||ar
<heller_>
Putting a small_vector together shouldn't be hard
<heller_>
"what can go wrong"
<heller_>
Depends on whether we want fixed capacity or not though
<hkaiser>
heller_: try first with boost::container::small_vector
akheir has joined #ste||ar
aserio has joined #ste||ar
RostamLog has joined #ste||ar
<github>
[hpx] hkaiser force-pushed disable_executor_compatibility from f8fbdc6 to 2651fc2: https://git.io/vN2lz
<github>
hpx/disable_executor_compatibility 2651fc2 Hartmut Kaiser: This patch disables default executor compatibility with V1 executors...
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
Smasher has quit [Quit: Connection reset by beer]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
kisaacs has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
twwright_ has joined #ste||ar
twwright has quit [Read error: Connection reset by peer]
twwright_ is now known as twwright
kisaacs has joined #ste||ar
daissgr has quit [Ping timeout: 240 seconds]
EverYoung has quit [Read error: Connection reset by peer]
aserio has quit [Ping timeout: 256 seconds]
EverYoung has joined #ste||ar
<heller_>
hkaiser: yeah, the question for me though is if we really need the ability to grow the capacity, or if we just do a trick similar to what we have now with compose_cb
<hkaiser>
either way works - I'd start with the simplest (and quickest) solution, i.e. use some small_vector that already exists
<hkaiser>
otoh, combining some finite container with the composition technique would copy the whole thing, might be too much
<heller_>
yup...
<heller_>
a resize is essentially the same though
<hkaiser>
depends on the reallocation strategy
<heller_>
sure, but once you grow over the capacity
<hkaiser>
sure sure
<heller_>
but yes, boost small_vector is what I'll aim for
jaafar has quit [Ping timeout: 268 seconds]
mcopik has quit [Ping timeout: 240 seconds]
vamatya has joined #ste||ar
aserio has joined #ste||ar
kisaacs has quit [Ping timeout: 268 seconds]
bibek has quit [Quit: Konversation terminated!]
<heller_>
what is the shared_state_allocator used for?
bibek has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio has joined #ste||ar
<jbjnr>
what is "trick ... we have now with compose_cb" ?
aserio has quit [Ping timeout: 252 seconds]
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
mcopik has joined #ste||ar
daissgr has joined #ste||ar
<K-ballo>
compose_cb combines two type erased callbacks into a third type erased callback that calls the first two
aserio has joined #ste||ar
<K-ballo>
cb = compose_cb(cb, new_cb)
<K-ballo>
results in a singly linked list of callbacks of sorts (with SBO)
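The composition K-ballo describes can be sketched as follows. This is a simplified illustration using `std::function` as the type-erased callback, not the HPX implementation; the `callback` alias and this `compose_cb` body are assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Type-erased callback; std::function stands in for the real callback type.
using callback = std::function<void()>;

// Wrap two type-erased callbacks in a third that invokes them in order.
// The wrapper owns both, so repeated composition builds a chain that
// behaves like a singly linked list of callbacks.
callback compose_cb(callback first, callback second)
{
    return [first = std::move(first), second = std::move(second)]() {
        first();
        second();
    };
}
```

Because the wrapper holds two callbacks, it is larger than a single one and typically no longer fits a small-buffer optimization, so each additional composition tends to cost a heap allocation.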
<jbjnr>
thanks
RostamLog has joined #ste||ar
<heller_>
I am a very naive person ;)
<K-ballo>
you had to know it was allocating
<heller_>
and assumed we always do SBO :P
<K-ballo>
right
<heller_>
the problem is, it gets tricky to find those places, where you compose and bind callables etc.
<heller_>
and soon end up with a type that's too large
<heller_>
so yes, I knew that it is potentially allocating
<heller_>
I never thought it would be a big problem though
<heller_>
until I discovered the thing with register_thread_nullary and friends...
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 264 seconds]
daissgr has quit [Ping timeout: 240 seconds]
Smasherr has joined #ste||ar
daissgr has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
Smasher has quit [Remote host closed the connection]
Smasherr is now known as Smasher
Smasher has quit [Changing host]
Smasher has joined #ste||ar
aserio has joined #ste||ar
kisaacs has joined #ste||ar
bibek has quit [Quit: Konversation terminated!]
bibek has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
diehlpk has joined #ste||ar
eschnett has quit [Quit: eschnett]
mcopik has quit [Ping timeout: 268 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 246 seconds]
diehlpk has quit [Remote host closed the connection]
jaafar has joined #ste||ar
aserio has joined #ste||ar
akheir has quit [Remote host closed the connection]
jaafar has quit [Remote host closed the connection]
jaafar has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
jaafar has quit [Remote host closed the connection]