aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 248 seconds]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 260 seconds]
StefanLSU has joined #ste||ar
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
Matombo has quit [Ping timeout: 248 seconds]
StefanLSU has quit [Quit: StefanLSU]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 255 seconds]
eschnett has joined #ste||ar
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
StefanLSU has joined #ste||ar
StefanLSU has quit [Quit: StefanLSU]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
eschnett has quit [Quit: eschnett]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
diehlpk has joined #ste||ar
zbyerly_ has quit [Ping timeout: 240 seconds]
diehlpk has quit [Remote host closed the connection]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 248 seconds]
AnujSharma has joined #ste||ar
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 248 seconds]
Rodario1 has quit [Quit: Leaving.]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
<jbjnr>
heller: yt? when you are available, can I have a quick Skype chat about the stream benchmark and making it work with the RP? I want to kill off some of the block_executor stuff and replace it with pool executors etc etc
<jbjnr>
it is an executor, but it uses a CPU mask to set itself up
<jbjnr>
and I'm not sure how it interacts with the schedulers etc and the thread pool
<jbjnr>
I want to get rid of it.
<hkaiser>
that's a left-over, most probably - I have not touched the executors at all after working on the RP stuff
<jbjnr>
it seems to be a 'special case' executor though
<hkaiser>
by all means if we can subsume its functionality.. go ahead
<hkaiser>
pls update all the code using it, though
<jbjnr>
ok. I'm puzzled by its operation though. I wanted to ask about its internals
<hkaiser>
ok
Matombo has quit [Ping timeout: 246 seconds]
<hkaiser>
it hosts several executors, one for each numa domain - that's an artifact
<hkaiser>
it was the only way before the rp to handle things
<jbjnr>
yeah. I want to remove these executors, but I'm not sure how they work internally - how they interact with the thread pool.
<jbjnr>
sorry. people coming in and out and distracting me ...
<hkaiser>
jbjnr: the executors are 'attached' to the pools
<hkaiser>
executors are very thin objects by design, they don't own anything, just provide access to an underlying pool
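A minimal sketch of the "thin executor" design hkaiser describes, assuming a hypothetical Pool type with a submit() member; this is illustrative, not the actual HPX executor class:

    #include <utility>

    // Illustrative only: a "thin" executor that owns nothing and merely
    // forwards work to the pool it was created against.
    template <typename Pool>
    struct thin_pool_executor
    {
        Pool* pool_;  // non-owning; the pool must outlive the executor

        explicit thin_pool_executor(Pool& pool) : pool_(&pool) {}

        template <typename F, typename... Ts>
        void post(F&& f, Ts&&... ts)
        {
            // all scheduling decisions live in the pool, not here
            pool_->submit(std::forward<F>(f), std::forward<Ts>(ts)...);
        }
    };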
taeguk has joined #ste||ar
<jbjnr>
hkaiser: I understand about the executors - what is troubling me is that the target is defined solely by a bitmap mask for PUs - the executor is then bound to the target and when it launches a task, it uses hpx::parallel::execution::blah-blah - but then I am not sure how it interacts with the schedulers and the normal executors etc etc
<jbjnr>
it seems like it could hijack a thread pool by putting tasks on it, bypassing the usual task creation mechanism
<hkaiser>
the pool is creating the tasks, no?
<jbjnr>
the executor just calls parallel::execution::async_execute(executors_[current], std::forward<F>(f), std::forward<Ts>(ts)...);
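For context, a hedged reconstruction of the block_executor behaviour under discussion: one inner executor per NUMA domain, with tasks handed out round-robin via the call quoted above. The names and the member-function form are assumptions, not HPX's exact interface:

    #include <cstddef>
    #include <utility>
    #include <vector>

    template <typename InnerExecutor>
    struct numa_block_executor
    {
        std::vector<InnerExecutor> executors_;  // one per NUMA domain
        std::size_t current_ = 0;

        template <typename F, typename... Ts>
        auto async_execute(F&& f, Ts&&... ts)
        {
            // rotate over the per-domain executors, as in the quoted line
            current_ = (current_ + 1) % executors_.size();
            return executors_[current_].async_execute(
                std::forward<F>(f), std::forward<Ts>(ts)...);
        }
    };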
<hkaiser>
jbjnr: I think all the executors need to be adapted to the rp one way or another - that's completely missing...
hkaiser has quit [Read error: Connection reset by peer]
mbremer has quit [Quit: Page closed]
hkaiser has joined #ste||ar
rod_t has joined #ste||ar
hkaiser has quit [Quit: bye]
pree has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
david_pfander has quit [Ping timeout: 252 seconds]
pree has quit [Read error: Connection reset by peer]
pree has joined #ste||ar
pree has quit [Ping timeout: 252 seconds]
AnujSharma has joined #ste||ar
EverYoung has joined #ste||ar
AnujSharma has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
pree has joined #ste||ar
mcopik has quit [Ping timeout: 252 seconds]
pree has quit [Ping timeout: 246 seconds]
mbremer has joined #ste||ar
<mbremer>
@heller: That direct channel implementation seems to be working pretty well. There is roughly a 6% decrease in idle rate (in absolute idle-rate units) and a 10% speed-up in performance across various oversubscription configurations.
<mbremer>
^should have read the irc log first
<heller>
mbremer: great
<heller>
mbremer: does it also improve distributed performance?
<mbremer>
Well those numbers are for a problem across 8 nodes.
pree has joined #ste||ar
<mbremer>
So right now the observed idle rate is 15% for 2X oversubscription.
<mbremer>
I would like to figure out whether I'm actually using the Intel Omni-Path network correctly. I saw a talk by some charm++/NAMD people saying that they need more MPI processes per node to saturate the network than on whatever Cray network was on Cori
<mbremer>
Are there counters for measuring network bandwidth in HPX? Or should I rely on something like I_MPI_STATS?
<heller>
there are ways to read the number of bytes
<heller>
and the time it took
<heller>
from which you could then compute the bandwidth
<mbremer>
Cool. I'll do that. Thanks @heller
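For reference, the counters heller refers to can be printed at a fixed interval and combined into bandwidth (bytes sent divided by time spent sending). A sketch of the invocation, assuming a hypothetical application my_app; the counter names are from memory, so verify them with --hpx:list-counters on your build:

    ./my_app --hpx:print-counter=/data/count/sent \
             --hpx:print-counter=/data/time/sent \
             --hpx:print-counter-interval=1000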
mcopik has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
pree has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
aserio1 is now known as aserio
EverYoung has joined #ste||ar
jaafar has joined #ste||ar
EverYoun_ has quit [Ping timeout: 255 seconds]
pree has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
pree has quit [Ping timeout: 240 seconds]
aserio has quit [Ping timeout: 255 seconds]
jaafar has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
Matombo has joined #ste||ar
pree has quit [Ping timeout: 264 seconds]
jaafar has joined #ste||ar
<github>
[hpx] hkaiser created fixing_2439_3 (+2 new commits): https://git.io/vdIJc
<github>
hpx/fixing_2439_3 485f64e Hartmut Kaiser: Replace executor_parameter_traits with separate customization points
<github>
hpx/fixing_2439_3 7c283f8 Hartmut Kaiser: Merge branch 'master' into fixing_2439_3...
<heller>
mbremer: another thing worth trying is to reduce the number of background threads
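A hedged example of that tuning, assuming the hpx.max_background_threads configuration key (verify the key name against your HPX version):

    ./my_app --hpx:ini=hpx.max_background_threads=1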
<github>
hpx/master f032046 Hartmut Kaiser: Adding parallel::unique to docs
<hkaiser>
jbjnr: NO
<hkaiser>
we have a bitmap type: mask_type and mask_cref_type
<hkaiser>
jbjnr: don't wrap HWLOC, wrap functionality we need in sensible ways, similar to what the topology class is doing
<hkaiser>
nobody wants to deal with bitmasks and similar low-level nonsense
<hkaiser>
jaafar: you introduced a nice abstraction of all of this in the RP, why not stick to it?
<hkaiser>
jbjnr: ^^
<jbjnr>
that's my point. the hwloc api does not accept our bitset types
<jbjnr>
so I have to duplicate everything
<jbjnr>
it's pathetic
<hkaiser>
you don't
<hkaiser>
what functionality do you miss from the topology class?
<jbjnr>
passing bitmaps around to use in our memory binding calls
<hkaiser>
either pass a mask_type or your class hierarchy
<jbjnr>
copying to and from mask_cref_type etc every time is pointless
<hkaiser>
why's that pointless?
<hkaiser>
hide it in a wrapper and forget about it
<jbjnr>
you have to do an hwloc_bitmap_alloc and free every time you convert from bitset<> to hwloc_bitmap_t
<hkaiser>
so what?
<jbjnr>
this is not what I joined hpx for
<hkaiser>
lol
<jbjnr>
I don't want shit code
<jbjnr>
if hwloc_bitmap_t is valid, then I want to use it natively
<hkaiser>
what did you join it for? to duplicate the hwloc nonsense on the user-api level?
<jbjnr>
no
<jbjnr>
you are missing the point
<hkaiser>
if you pass around hwloc_bitmap_t's then you have to make sure it gets deallocated properly...
<jbjnr>
I do not want to copy bitmaps from hwloc to hpx constantly - I want to just use the hwloc types directly
<hkaiser>
so you HAVE to wrap it
<jbjnr>
we have hpx::resource::numa_domain
<hkaiser>
yes
<jbjnr>
I've put it in there for now
<hkaiser>
shrug
<jbjnr>
but it means we have hwloc_ types exposed (typedef void* hpx_hwloc_bitmap_t)
<jbjnr>
<sigh>
<hkaiser>
do whatever you want as long as a) the hwloc resources are properly allocated, copied, moved, and deallocated, and b) the user does not see anything of this nonsense
<jbjnr>
fine
<jbjnr>
thank you
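One hedged way to satisfy hkaiser's two constraints above is an RAII owner for hwloc_bitmap_t that handles allocation, copy, move, and deallocation in one place and keeps the raw hwloc type out of user-facing interfaces. A sketch under those assumptions, not HPX code:

    #include <hwloc.h>
    #include <utility>

    class bitmap
    {
        hwloc_bitmap_t bmp_;

    public:
        bitmap() : bmp_(hwloc_bitmap_alloc()) {}
        bitmap(bitmap const& other)
          : bmp_(other.bmp_ ? hwloc_bitmap_dup(other.bmp_) : nullptr) {}
        bitmap(bitmap&& other) noexcept : bmp_(other.bmp_)
        {
            other.bmp_ = nullptr;
        }

        // copy-and-swap covers both copy- and move-assignment
        bitmap& operator=(bitmap other) noexcept
        {
            std::swap(bmp_, other.bmp_);
            return *this;
        }

        ~bitmap()
        {
            if (bmp_)
                hwloc_bitmap_free(bmp_);
        }

        // escape hatch for the few hwloc calls that need the native handle;
        // user code never touches this
        hwloc_bitmap_t get() const { return bmp_; }
    };

With this, the alloc/free pair jbjnr objects to happens once per conversion, in one place, which is the "hide it deeply and forget about it" approach hkaiser suggests below.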
<hkaiser>
jaafar: also, if you make hwloc mandatory now, please change all the related code, docs, etc.
<hkaiser>
jbjnr: darn, sorry
* jbjnr
still wants hwloc++
<hkaiser>
^^
<jbjnr>
yup
<hkaiser>
that means you can remove the none_topology and make all functions from the topology base class non-virtual, etc.
<jbjnr>
this is why I am unhappy
<jbjnr>
I have opened a huge can of worms
<jbjnr>
and you and heller will hate me
<hkaiser>
jbjnr: so close it before it gets out
<hkaiser>
jbjnr: I still think we shouldn't use hwloc types directly
<jbjnr>
this is where my conversation started.
<hkaiser>
that bit of back-and-forth between mask_type and the hwloc counterpart can be done in one place, then we can forget about it
<jbjnr>
I'm unhappy because I know we shouldn't, but wrapping is not good either
<hkaiser>
why's that so bad?
<jbjnr>
hwloc_bitmap_alloc and free
<jbjnr>
total waste of resources
<hkaiser>
blame hwloc
<jbjnr>
making me anxious and sweaty
<hkaiser>
so hide it deeply and forget about it
<jbjnr>
the god of programming will never let me into heaven
<jbjnr>
I'll be condemned to an eternity of fortran hell
<hkaiser>
why? because you make the bad stuff disappear?
<hkaiser>
you're trying to prematurely optimize
<hkaiser>
let's do it right first, let's make it fast later (if needed)