hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Read error: Connection reset by peer]
EverYoun_ has quit [Ping timeout: 265 seconds]
eschnett has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to local_agas: https://git.io/vpCEu
<github> hpx/local_agas 9c45a9a Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region
<github> [hpx] hkaiser opened pull request #3299: Performance improvements (master...local_agas) https://git.io/vpCEK
<github> [hpx] hkaiser force-pushed local_agas from 9c45a9a to d44bf3e: https://git.io/vpCEX
<github> hpx/local_agas d44bf3e Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region
<K-ballo> anyone interested in a C++ dev position in Berlin?
<K-ballo> ..that's all the info I got
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
<github> [hpx] hkaiser force-pushed local_agas from d44bf3e to 02fc71d: https://git.io/vpCEX
<github> hpx/local_agas 02fc71d Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region
<K-ballo> hah, atomic state, it finally happened!
<K-ballo> hkaiser: something about not needing to lock the future state before constructing the result in place..
<hkaiser> K-ballo: what's wrong there?
<K-ballo> I don't recall all the details, but my atomic futures implementation moved that lock until after constructing the value, just before setting the new state
<K-ballo> something about racing set_result calls already being UB
<hkaiser> K-ballo: the lock is needed for accessing on_completed_
<K-ballo> that moved down too
<hkaiser> ok, that sounds reasonable
<K-ballo> at some point I even had that whole lock down to spinlocking on the state atomic, but I ran into trouble building the callback function
<K-ballo> just added that chunk to the gist too
<github> [hpx] hkaiser force-pushed local_agas from 02fc71d to 8fbd863: https://git.io/vpCEX
<github> hpx/local_agas 8fbd863 Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region
<hkaiser> K-ballo: that's a nice touch to invoke the continuation right away if the future is ready
<K-ballo> don't we already do that?
<hkaiser> don't think so
<hkaiser> yah, we do it, I'm an idiot
<hkaiser> K-ballo: why do you check for is_ready twice (in set_on_completed)?
<K-ballo> just in case it got ready while grabbing the lock
<K-ballo> ah, and copying the data, yeah
<K-ballo> if we are going to enqueue the function we need to copy, and in that time the future could get ready
<hkaiser> ok
<K-ballo> the copy does type erasure, so that's a potential allocation, etc
<K-ballo> checking the atomic again, this time under a lock, should be cheap
<hkaiser> right
<hkaiser> the second is_ready could be relaxed
<K-ballo> yeap
<K-ballo> IIRC with those changes I measured something like an 18ns speedup or something :P
<hkaiser> yah, figures ;)
<hkaiser> nice to have anyways, will reduce contention
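[editor's note] The double check discussed above (test is_ready once without the lock, build the type-erased callback, then test again under the lock) can be sketched roughly as follows. This is only an illustration of the pattern, not HPX's implementation: the class name shared_state_sketch, the use of std::mutex and std::function, and the choice to store the ready flag inside the locked region are all assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a future's shared state; not HPX's real class.
class shared_state_sketch
{
public:
    bool is_ready(std::memory_order order = std::memory_order_acquire) const
    {
        return ready_.load(order);
    }

    // Attach a continuation; run it right away if the state is already ready.
    template <typename F>
    void set_on_completed(F&& f)
    {
        // First check, no lock taken: fast path for an already-ready state.
        if (is_ready())
        {
            std::forward<F>(f)();
            return;
        }

        // Type erasure may allocate, so do it before taking the lock.
        std::function<void()> callback(std::forward<F>(f));

        std::unique_lock<std::mutex> l(mtx_);
        // Second check: the state may have become ready in the meantime.
        // In this sketch set_ready() stores the flag under the same mutex,
        // so the lock provides the ordering and a relaxed load suffices.
        if (is_ready(std::memory_order_relaxed))
        {
            l.unlock();
            callback();
            return;
        }
        on_completed_ = std::move(callback);
    }

    // Mark the state ready and run a pending continuation, if any.
    void set_ready()
    {
        std::function<void()> callback;
        {
            std::lock_guard<std::mutex> l(mtx_);
            ready_.store(true, std::memory_order_release);
            callback = std::move(on_completed_);
            on_completed_ = nullptr;
        }
        if (callback)
            callback();    // invoked outside the lock
    }

private:
    std::atomic<bool> ready_{false};
    std::mutex mtx_;
    std::function<void()> on_completed_;
};
```

A continuation attached before set_ready() is queued and runs when the state becomes ready; one attached afterwards runs immediately on the calling thread without touching the mutex.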
<K-ballo> there was something to be said about calling f.get() inside .then, there was some sort of effect there
<K-ballo> though since then I figured the future given to .then can use a completely different shared state than the one .then was called on
<K-ballo> one that does absolutely no locking whatsoever
<hkaiser> the shared state for the continuation?
<hkaiser> yah, we reuse the same base class everywhere
<hkaiser> but the main hit is the allocation of the shared state anyways
<K-ballo> fut.then([](auto fut) { ... }); the inner fut does not have to be the same shared state as the outer fut
<K-ballo> one could even have a single shared state with two personalities, implementing the shared state interface twice
<github> [hpx] hkaiser force-pushed local_agas from 8fbd863 to 603fdc9: https://git.io/vpCEX
<github> hpx/local_agas 603fdc9 Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region
<hkaiser> K-ballo: that would save an allocation
<K-ballo> the fut.get() call inside .then doesn't need any form of synchronization, it knows it's ready by construction
<hkaiser> yes
<hkaiser> K-ballo: they are redoing future currently...
<hkaiser> I'll wait for that to happen
<K-ballo> nod, I'm reading bits here and there
<hkaiser> K-ballo: future itself will have no get anymore, there will be std::this_thread::get() instead
<K-ballo> can't say that surprises me
<hkaiser> nod
<K-ballo> my last attempt at implementing future did .get on top of .then
<hkaiser> K-ballo: yah, that's essentially it
<K-ballo> I had trouble with the timed waits though
<hkaiser> K-ballo: ok
<K-ballo> because you have to potentially back out and nullify a continuation somehow
<hkaiser> ahh, yes
<hkaiser> interesting
Antrix[m] has quit [Ping timeout: 256 seconds]
M-ms has quit [Ping timeout: 276 seconds]
FjordPrefect has quit [Ping timeout: 276 seconds]
itachi_uchiha_ has quit [Ping timeout: 276 seconds]
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 264 seconds]
nanashi55 has quit [Ping timeout: 248 seconds]
nanashi55 has joined #ste||ar
khuck has quit [Remote host closed the connection]
parsa has quit [Quit: Zzzzzzzzzzzz]
khuck has joined #ste||ar
anushi has quit [Ping timeout: 265 seconds]
Anushi1998 has joined #ste||ar
jaafar has quit [Ping timeout: 248 seconds]
khuck has quit [Remote host closed the connection]
anushi has joined #ste||ar
khuck has joined #ste||ar
khuck has quit [Ping timeout: 248 seconds]
Anushi1998 has quit [Remote host closed the connection]
khuck has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
quaz0r has quit [Ping timeout: 240 seconds]
khuck has quit [Ping timeout: 256 seconds]
simbergm has quit [Ping timeout: 263 seconds]
quaz0r has joined #ste||ar
Antrix[m] has joined #ste||ar
itachi_uchiha_ has joined #ste||ar
FjordPrefect has joined #ste||ar
M-ms has joined #ste||ar
Anushi1998 has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vpCbM
<github> hpx/gh-pages 0e35659 StellarBot: Updating docs
simbergm has joined #ste||ar
_anushi has joined #ste||ar
Anushi1998 has quit [Read error: Connection reset by peer]
FjordPrefect has quit [Ping timeout: 260 seconds]
itachi_uchiha_ has quit [Ping timeout: 256 seconds]
M-ms has quit [Ping timeout: 255 seconds]
Antrix[m] has quit [Ping timeout: 255 seconds]
khuck has joined #ste||ar
khuck has quit [Ping timeout: 255 seconds]
parsa has joined #ste||ar
parsa has quit [Client Quit]
Antrix[m] has joined #ste||ar
anushi has quit [Ping timeout: 276 seconds]
anushi has joined #ste||ar
FjordPrefect has joined #ste||ar
itachi_uchiha_ has joined #ste||ar
M-ms has joined #ste||ar
jakub_golinowski has joined #ste||ar
K-ballo has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 240 seconds]
jakub_golinowski has joined #ste||ar
<simbergm> ./bin/hello_world --hpx:pu-step=2
<simbergm> terminate called after throwing an instance of 'std::runtime_error'
<simbergm> what(): partitioner::setup_pools: Default pool default has no threads assigned. Please rerun with --hpx:threads=X and check the pool thread assignment
<hkaiser> uhh ohh
<simbergm> am I missing some other required option or is that broken?
<hkaiser> looks to be broken
<simbergm> hrm, this is with hwloc 1.11
<simbergm> will test with 2.0 as well
<hkaiser> simbergm: shouldn't be a hwloc issue
<simbergm> ok
<simbergm> any guesses on what could be wrong? I can look into it
<hkaiser> simbergm: the assignment of cores to threads might be broken in this case
<hkaiser> simbergm: what are you up to these days, btw?
<simbergm> :)
<hkaiser> still doing hpx work?
<simbergm> I'm wrapping the HPX cholesky implementation into a distributed linear algebra interface
<hkaiser> ahh, nice
<simbergm> one guy at cscs is working on the interface
<hkaiser> mauro?
<simbergm> raffaele
<hkaiser> k
<simbergm> I'm also preparing a small hpx workshop internally in the group
<github> [hpx] hkaiser closed pull request #3293: Adding emplace support to promise and make_ready_future (master...support_p0319) https://git.io/vpTRU
<simbergm> I've not abandoned hpx!
<hkaiser> :D
<hkaiser> simbergm: heller was mumbling something about the need to create an hpx backend for kokkos, what do you think about such a thing?
<hkaiser> would it make sense?
<simbergm> hrm, not sure but it feels like overkill
<simbergm> kokkos is much smaller in scope compared to hpx
<hkaiser> nod
<hkaiser> I'm not a fan of this idea either
<simbergm> my boss has been asking if we could add other backends to hpx instead
<hkaiser> for instance?
jakub_golinowski has quit [Ping timeout: 260 seconds]
<simbergm> for example
<hkaiser> k
<simbergm> have you seen that? any experience?
<hkaiser> none
jakub_golinowski has joined #ste||ar
<simbergm> I think an hpx backend for kokkos seems doable but unnecessary, but maybe heller has thought more about it
nanashi64 has joined #ste||ar
nanashi55 has quit [Ping timeout: 256 seconds]
nanashi55 has joined #ste||ar
nanashi64 has quit [Ping timeout: 268 seconds]
<hkaiser> simbergm: I had the impression that he was coming more from a political end
<jakub_golinowski> Hey, I am preparing my laptop for the gsoc and have a question about HPX dependencies. During the application process and my first HPX installation I only used the obligatory dependencies, i.e. boost and hwloc. Now I want to ask about the "Highly Recommended Optional Software Prerequisites for HPX on Linux systems"
<jakub_golinowski> gperf, libunwind and openMPI
<jakub_golinowski> Should I use them?
<hkaiser> no need
<hkaiser> you might want to consider using jemalloc or tcmalloc, though
nanashi64 has joined #ste||ar
<jakub_golinowski> as I understand tcmalloc is a part of gperf?
<hkaiser> well, mpi is an option, possibly - if you want to try distributed things
<hkaiser> tcmalloc is part of google perftools
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi64 is now known as nanashi55
<parsa[w]> hkaiser: is #367 done?
_anushi has quit [Quit: Leaving]
<hkaiser> parsa[w]: I added a couple comments just now
<hkaiser> parsa[w]: it's done in principle but could be improved, I think
<parsa[w]> i think it needs to be a state machine
<hkaiser> why's that?
<parsa[w]> it's got too many states
<parsa[w]> doing simple tasks
<hkaiser> well, sure - if you think this will help
<hkaiser> I'd rather think that it's a simple app - 1) read input AST (either from physl or ast), 2) optionally transform AST, 3) compile and run, 4) optionally write AST
<hkaiser> no big deal
<hkaiser> well, and 5) write instrumentation (optionally)
<hkaiser> so why a state machine, there are no complex state changes in this scheme
<parsa[w]> dealing with the options seems to be getting complex
<hkaiser> shrug
<hkaiser> 2), 4), and 5) are optional - based on command line
<hkaiser> 1) makes its decision based on command line as well
<hkaiser> try to solve each of the steps separately, then invoke the step based on the options
<hkaiser> easy
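[editor's note] The structure described above (one function per step, with the optional steps gated by command-line options) might look roughly like the sketch below. It is not the code from the gist mentioned later in the log: the options struct, the step functions, and the use of a plain string as a placeholder AST are all hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option set; the real tool's command line is not shown here.
struct options
{
    bool input_is_ast = false;             // 1) read a stored AST, not source
    bool transform = false;                // 2) optionally transform the AST
    bool write_ast = false;                // 4) optionally write the AST
    bool write_instrumentation = false;    // 5) optionally write instrumentation
};

using ast = std::string;    // placeholder for a real AST type

// Each step is solved on its own; the bodies are stubs for illustration.
ast read_input(options const& o, std::string const& in)
{
    return (o.input_is_ast ? "ast:" : "src:") + in;
}
ast transform_ast(ast const& a) { return "transformed(" + a + ")"; }
void compile_and_run(ast const&) {}
void write_ast(ast const&) {}
void write_instrumentation(ast const&) {}

// The driver only decides which steps to invoke; no state machine needed.
// It returns the names of the executed steps so the flow can be inspected.
std::vector<std::string> run(options const& o, std::string const& input)
{
    std::vector<std::string> steps;

    ast tree = read_input(o, input);
    steps.push_back("read");

    if (o.transform)
    {
        tree = transform_ast(tree);
        steps.push_back("transform");
    }

    compile_and_run(tree);
    steps.push_back("run");

    if (o.write_ast)
    {
        write_ast(tree);
        steps.push_back("write_ast");
    }
    if (o.write_instrumentation)
    {
        write_instrumentation(tree);
        steps.push_back("write_instrumentation");
    }
    return steps;
}
```

With only the transform option set, run() executes "read", "transform" and "run" in that order. The option handling stays in one linear function.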
jakub_golinowski has quit [Ping timeout: 240 seconds]
Anushi1998 has joined #ste||ar
mcopik has joined #ste||ar
khuck has joined #ste||ar
aserio has joined #ste||ar
khuck has quit [Ping timeout: 265 seconds]
<hkaiser> parsa[w]: here is the main function (approximate): https://gist.github.com/hkaiser/750001bcc9d7edd75789a627d3f363dd
<hkaiser> the rest should be rather straightforward
hkaiser has quit [Read error: Connection reset by peer]
eschnett has quit [Quit: eschnett]
Anushi1998 has quit [Ping timeout: 240 seconds]
khuck has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Client Quit]
diehlpk has joined #ste||ar
anushi has quit [Ping timeout: 255 seconds]
eschnett has joined #ste||ar
jakub_golinowski has joined #ste||ar
anushi has joined #ste||ar
galabc has joined #ste||ar
hkaiser has joined #ste||ar
<heller> simbergm: hkaiser: kokkos is indeed smaller in scope than HPX. However, wouldn't it be nice to be able to use the algorithms etc provided in kokkos within an HPX application?
<heller> simbergm: regarding argobots. I hear it's mostly a dead end.
<heller> might be wrong though
<diehlpk> When I talked to the "HPC" session speakers at computational engineering conferences, many of them use kokkos
<heller> simbergm: if it makes sense (politically) to use argobots as a backend to our scheduling, why not
<simbergm> heller: my impression of kokkos is that they basically have a parallel for loop, scan and reduce... is there something else?
<simbergm> and re argobots: interesting, what have you heard about it being a dead end
<diehlpk> simbergm, This is a good summary
<simbergm> it looks fast, but still very young
<heller> simbergm: correct. the big thing about kokkos is its views etc
nikunj has joined #ste||ar
<heller> simbergm: re argobots, not sure, I guess that guy saying that was more coming from the "it's cooking its own soup again, HPX is the way to go, standards compliance etc"
anushi has quit [Ping timeout: 240 seconds]
<heller> but if you guys think it's worth a shot... why not?
<heller> in the end, it would "just" be yet another context switching implementation and scheduler, right?
<hkaiser> Mike Heroux said to me in Tokyo (direct quote): "you guys are the only ones doing it right - I don't understand why everybody else is trying to do their own stuff"
<simbergm> oh it's more of a general idea to have different backends for HPX, argobots just happens to have futures
<simbergm> but I'm not yet convinced it would be worth the effort
<simbergm> (it might be though)
<hkaiser> simbergm: everybody has futures now (as everybody has stolen the idea from us) ;-)
<simbergm> heller: yes, exactly that
<heller> the futures in argobots don't have continuations ;)
<K-ballo> argobots sounds like a cartoon about avenger robots
<heller> the problem about argobots is that it comes out of argonne. So it's good by definition
<heller> simbergm: NB: the argobots paper doesn't even mention HPX
<diehlpk> hkaiser, Could you maybe add Marcin and me to the HPXML repo?
<hkaiser> diehlpk: sure
anushi has joined #ste||ar
<hkaiser> diehlpk: what's marcin's github id?
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
nikunj has quit [Ping timeout: 260 seconds]
<diehlpk> hkaiser, mcopik
<mcopik> hkaiser: surprisingly it's mcopik
jakub_golinowski has quit [Ping timeout: 240 seconds]
<heller> hkaiser: regarding operator new: future allocation might not be the only source here. Could be allocations in function as well. You might want to check that
<github> [hpx] khuck pushed 1 new commit to apex_task_wrapper_memory_bug: https://git.io/vpWVg
<github> hpx/apex_task_wrapper_memory_bug c2a3be6 Kevin Huck: Don't record an APEX task when creating default object...
jakub_golinowski has joined #ste||ar
galabc has quit [Quit: Leaving]
david_pfander has quit [Ping timeout: 268 seconds]
nikunj has joined #ste||ar
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
jakub_golinowski has quit [Client Quit]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 256 seconds]
<hkaiser> mcopik: figures
<hkaiser> diehlpk, mcopik: done
<hkaiser> heller: you there?
EverYoung has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
jaafar has joined #ste||ar
<heller> hkaiser: what's up?
<hkaiser> I forgot
<heller> ;)
<heller> Maybe about allocation in hpx::util::function?
<hkaiser> no, but almost all my allocations are from shared states now
<hkaiser> heller: using tcmalloc improves things significantly, however
<K-ballo> what's up with allocation in hpx::util::function?
diehlpk has quit [Ping timeout: 260 seconds]
<heller> Just a guess what might be a performance problem
<heller> hkaiser: great!
<heller> hkaiser: so, how about this: shared states only get their memory from a thread local pool
<hkaiser> heller: how about releasing that memory back to the pool, that could happen from a different thread
<heller> So what? Just put it to your local pool
<heller> And reuse it there
<hkaiser> heller: well, depends on what you mean by pool
<hkaiser> if it's just a storage of reusable memory blocks, then yah, if it's some arena based allocator, then no
<heller> Yeah, essentially just a stack of already allocated blocks of memory
aserio has joined #ste||ar
nikunj has quit [Ping timeout: 264 seconds]
<heller> Very simple, no locking and we don't have locality for shared states anyway
<hkaiser> could work
nikunj has joined #ste||ar
<heller> Having those blocks cache aligned might be a good call there as well (avoiding cache thrashing for the synchronization)
<hkaiser> shared states have different sizes, though
<hkaiser> one for each type? sounds like overkill
<hkaiser> one pool*
<heller> One with a map from size to stack
<heller> ?
<heller> Returning cache aligned memory is always correctly aligned
<hkaiser> sure
<heller> Just a thought...
<hkaiser> heller: isn't that what jemalloc etc are already doing?
<heller> They bail out eventually
<heller> They maintain a thread local cache combined with an arena allocator
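[editor's note] The idea floated above (a per-thread stack of already allocated blocks, keyed by size, with cache-line-aligned memory, where a block freed on another thread simply joins that thread's pool) could be sketched as below. This is a hypothetical illustration, not something that exists in HPX: block_pool, local_pool, and the 64-byte cache line size are assumptions, and the aligned operator new used here requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of HPX: one free list ("stack") per block size.
class block_pool
{
public:
    static constexpr std::size_t cache_line = 64;    // assumed line size

    void* allocate(std::size_t size)
    {
        std::size_t const n = round_up(size);
        auto& stack = free_[n];
        if (!stack.empty())
        {
            // Reuse a previously released block of the same size class.
            void* p = stack.back();
            stack.pop_back();
            return p;
        }
        // Nothing cached: fall back to the global (aligned) allocator.
        return ::operator new(n, std::align_val_t(cache_line));
    }

    // May be called on a different thread than allocate(); the block then
    // simply becomes part of the releasing thread's pool.
    void deallocate(void* p, std::size_t size)
    {
        free_[round_up(size)].push_back(p);
    }

    ~block_pool()
    {
        // Return all cached blocks to the global allocator.
        for (auto& entry : free_)
            for (void* p : entry.second)
                ::operator delete(p, std::align_val_t(cache_line));
    }

private:
    // Round sizes up to whole cache lines so nearby sizes share a stack.
    static std::size_t round_up(std::size_t n)
    {
        return (n + cache_line - 1) / cache_line * cache_line;
    }

    std::unordered_map<std::size_t, std::vector<void*>> free_;
};

// One pool per thread, so no locking is needed inside block_pool.
inline block_pool& local_pool()
{
    thread_local block_pool pool;
    return pool;
}
```

Rounding sizes up to a multiple of the cache line keeps the number of distinct stacks small, which is one possible answer to the concern about shared states having different sizes. Unlike jemalloc or tcmalloc, this sketch never hands cached memory back until the thread exits.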
mcopik has quit [Ping timeout: 248 seconds]
Anushi1998 has joined #ste||ar
Anushi1998 has quit [Ping timeout: 240 seconds]
aserio has quit [Ping timeout: 260 seconds]
Anushi1998 has joined #ste||ar
Anushi1998 has quit [Ping timeout: 256 seconds]
aserio has joined #ste||ar
anushi has quit [Ping timeout: 240 seconds]
anushi has joined #ste||ar
parsa has joined #ste||ar
diehlpk has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
aserio has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
aserio has joined #ste||ar
Anushi1998 has joined #ste||ar
Anushi1998 has quit [Ping timeout: 240 seconds]
anushi has quit [Ping timeout: 255 seconds]
anushi has joined #ste||ar
Anushi1998 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
wash has quit [Ping timeout: 246 seconds]
wash has joined #ste||ar
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
nikunj has quit [Quit: Leaving]
Anushi1998 has quit [Quit: Leaving]
khuck has quit [Remote host closed the connection]
diehlpk has quit [Ping timeout: 260 seconds]
khuck has joined #ste||ar
eschnett has quit [Quit: eschnett]
<github> [hpx] hkaiser force-pushed local_agas from b9dfe4e to c2357f0: https://git.io/vpCEX
<github> hpx/local_agas fd7e7b6 Hartmut Kaiser: Marking migratable objects in their gid to allow not handling migration in AGAS...
<github> hpx/local_agas 353db49 Hartmut Kaiser: Handle future state as an atomic instead of inside a locked region...
<github> hpx/local_agas c2357f0 Hartmut Kaiser: Executing remote direct actions directly, if possible
EverYoung has joined #ste||ar