aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Matombo has joined #ste||ar
pree has joined #ste||ar
vamatya has quit [Ping timeout: 252 seconds]
eschnett has joined #ste||ar
Matombo444 has joined #ste||ar
Matombo has quit [Ping timeout: 252 seconds]
K-ballo has quit [Quit: K-ballo]
Matombo444 has quit [Ping timeout: 248 seconds]
diehlpk has quit [Remote host closed the connection]
Matombo444 has joined #ste||ar
hkaiser has quit [Quit: bye]
pree has quit [Read error: Connection reset by peer]
vamatya has joined #ste||ar
pree has joined #ste||ar
Matombo444 has quit [Remote host closed the connection]
pree has quit [Quit: AaBbCc]
vamatya has quit [Ping timeout: 248 seconds]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree has quit [Read error: Connection reset by peer]
pree has joined #ste||ar
david_pfander has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
pree has joined #ste||ar
eschnett_ has joined #ste||ar
Vir has quit [Ping timeout: 240 seconds]
eschnett has quit [Ping timeout: 240 seconds]
eschnett_ is now known as eschnett
pree has quit [Read error: Connection reset by peer]
Vir has joined #ste||ar
pree has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
Matombo has joined #ste||ar
pree has joined #ste||ar
bikineev has quit [Ping timeout: 252 seconds]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree has quit [Quit: AaBbCc]
bikineev has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev_ has quit [Client Quit]
K-ballo has joined #ste||ar
bikineev has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
bikineev_ has joined #ste||ar
bikineev has quit [Read error: Connection reset by peer]
bikineev_ has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
eschnett has quit [Quit: eschnett]
Matombo has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has joined #ste||ar
StefanLSU has joined #ste||ar
StefanLSU has quit [Quit: StefanLSU]
eschnett has quit [Quit: eschnett]
Rodario has joined #ste||ar
<Rodario>
@zao, hi, I'm the guy from weeks ago who was having the unsolvable problem building HPX. I wanted to give an update: I'm using HPX for demo purposes on my local VM with an identical OS (Fedora 26), where it is working just fine.
<github>
[hpx] hkaiser force-pushed process_error_reporting from 8698202 to c2ccf42: https://git.io/v5hg5
<github>
hpx/process_error_reporting c2ccf42 Hartmut Kaiser: Improve error reporting for process component on POSIX systems
bikineev has quit [Remote host closed the connection]
<github>
[hpx] hkaiser opened pull request #2913: Fix rp hang (master...fix_rp_hang) https://git.io/vdJt8
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
patg[[w]] has joined #ste||ar
rod_t_ has joined #ste||ar
rod_t_ has left #ste||ar [#ste||ar]
hkaiser has joined #ste||ar
<aserio>
hkaiser: did you head home?
<hkaiser>
ABresting: yes
<hkaiser>
aserio: ^^
<hkaiser>
aserio: you need me for anything?
<aserio>
hkaiser: no :) just wondering
<hkaiser>
;)
<patg[[w]]>
While the cat's away, the mice will play
Rodario has joined #ste||ar
Rodario1 has joined #ste||ar
rod_t has left #ste||ar [#ste||ar]
Rodario has quit [Ping timeout: 240 seconds]
zbyerly_ has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
patg[[w]] has quit [Quit: Leaving]
Rodario1 has quit [Ping timeout: 255 seconds]
<jbjnr>
anyone home?
<hkaiser>
jbjnr: wazzup?
<jbjnr>
hi- quick question ...
<jbjnr>
I need to add a NUMA allocator that does more than just first touch, and hwloc has HWLOC_MEMBIND_XXX options for doing that
<jbjnr>
so should I simply allow the user to use HWLOC_MEMBIND_XXX flags in their code, or should we provide HPX_MEMBIND_XXX flags that mirror the hwloc ones?
<hkaiser>
uhh
<jbjnr>
I don't like the idea of allowing HWLOC flags directly, but I don't want to duplicate the entire hwloc API either
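One possible middle ground between exposing raw HWLOC_MEMBIND_XXX flags and duplicating the whole hwloc API is sketched below: a small HPX-side enum that mirrors only the memory-binding policies actually needed, so user code never has to include hwloc.h. The names `membind_policy` and `hwloc_flag_name` are hypothetical, not existing HPX API. So that the sketch compiles without hwloc installed, the mapping returns the *name* of the corresponding hwloc constant rather than the constant itself.

```cpp
#include <string_view>

// Hypothetical HPX-side policy enum (not an existing HPX API). It mirrors a
// subset of hwloc's memory-binding policies so public headers need no hwloc.h.
enum class membind_policy {
    default_policy,
    first_touch,
    bind,
    interleave,
    next_touch
};

// In a real wrapper this would live in a .cpp that includes <hwloc.h> and
// return the matching hwloc_membind_policy_t value; here it returns the
// constant's name so the sketch compiles without hwloc.
constexpr std::string_view hwloc_flag_name(membind_policy p) {
    switch (p) {
    case membind_policy::default_policy: return "HWLOC_MEMBIND_DEFAULT";
    case membind_policy::first_touch:    return "HWLOC_MEMBIND_FIRSTTOUCH";
    case membind_policy::bind:           return "HWLOC_MEMBIND_BIND";
    case membind_policy::interleave:     return "HWLOC_MEMBIND_INTERLEAVE";
    case membind_policy::next_touch:     return "HWLOC_MEMBIND_NEXTTOUCH";
    }
    return "";  // not reached for valid enumerators
}
```

With this shape only the enum is public; supporting another policy later means one new enumerator and one new case, not a mirror of the rest of hwloc.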
<hkaiser>
the hwloc API is very low-level and error-prone when used
<jbjnr>
ok, then I wrap things
<hkaiser>
that's why we wrapped it
<jbjnr>
<sigh>
<hkaiser>
if you're happy in your code with using hwloc directly - just do it
<jbjnr>
I'm happy in my code, but when I put in a PR I would like it to be acceptable for everyone
<jbjnr>
(cos I'm a conscientious contributor and all that)
<hkaiser>
well, that's a different story, indeed ;)
<hkaiser>
do you need to expose this at all?
<jbjnr>
for a quick hack, no, but for long term support, yes. we will need more hwloc stuff I think
<hkaiser>
k
<jbjnr>
we need hwloc++
<hkaiser>
I think that we should try to avoid exposing lower-level system properties as much as possible, but I don't understand your use cases
Rodario1 has joined #ste||ar
<jbjnr>
to summarise - I added a flag to the pool setup to switch memory mode to NUMA INTERLEAVED for my custom scheduler - and this solves our larger matrix issues - but it has a downside - it makes all pages allocated on the cores used by the pool interleaved - which impacts perf in other places. I need finer control - on an allocator basis and thus a clean API - we will want other features too -...
<jbjnr>
...this is just the start of our optimisation process and NUMA-aware everything will be next
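The "finer control on an allocator basis" idea can be sketched as a standard-conforming allocator that carries its NUMA policy in its type, so one container (say, the large matrix) can be interleaved while everything else keeps first touch. This is a minimal sketch under stated assumptions: `numa_policy` and `numa_allocator` are hypothetical names, and `allocate` is only a placeholder using global `operator new`; actual placement would have to be requested from the OS, for example through hwloc's memory-binding allocation functions.

```cpp
#include <cstddef>
#include <new>

// Hypothetical per-allocator NUMA policy (not an existing HPX API).
enum class numa_policy { first_touch, interleave };

// Minimal allocator that carries its NUMA policy in the type, so each
// container can choose its own placement instead of a whole pool switching.
template <typename T, numa_policy Policy>
struct numa_allocator {
    using value_type = T;
    static constexpr numa_policy policy = Policy;

    // Required because the non-type template parameter defeats the default
    // rebind provided by std::allocator_traits.
    template <typename U>
    struct rebind { using other = numa_allocator<U, Policy>; };

    numa_allocator() noexcept = default;
    template <typename U>
    numa_allocator(numa_allocator<U, Policy> const&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_array_new_length();
        // Placeholder: a real version would request memory placed according
        // to Policy (e.g. via hwloc). This only uses global operator new.
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Stateless allocators with the same policy can free each other's memory.
template <typename T, typename U, numa_policy P>
bool operator==(numa_allocator<T, P> const&, numa_allocator<U, P> const&) noexcept {
    return true;
}
template <typename T, typename U, numa_policy P>
bool operator!=(numa_allocator<T, P> const&, numa_allocator<U, P> const&) noexcept {
    return false;
}
```

Usage would look like `std::vector<double, numa_allocator<double, numa_policy::interleave>>` for the matrix, leaving other containers on the default allocator. Putting the policy in the type keeps the allocator stateless; a runtime policy member would also work but makes allocator equality depend on state.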
<hkaiser>
what's numa-interleaved?
<jbjnr>
alternate pages get allocated from alternate NUMA domains
<jbjnr>
I didn't know it existed - but our matrix was allocated at start and all of it from one NUMA domain
<hkaiser>
yah, that sounds like something for special allocators
<jbjnr>
this led to bad bandwidth issues
<hkaiser>
sure, makes sense
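The difference between the two placements discussed above can be modelled in a few lines: under first touch, every page of a matrix initialised by one thread at startup lands on that thread's NUMA domain, so all later accesses compete for one domain's memory bandwidth; under interleave, pages rotate across domains. This is a simplified model, not what the kernel literally does (for instance, the domain the rotation starts on may differ); `placement` and `page_domains` are hypothetical names and the 4096-byte page size in the test is an assumption.

```cpp
#include <cstddef>
#include <vector>

enum class placement { first_touch, interleave };

// Simplified model of which NUMA domain holds each page of a buffer.
// first_touch: every page goes to the domain of the thread that first writes it.
// interleave:  pages are assigned round-robin across domains, starting at 0.
std::vector<std::size_t> page_domains(std::size_t bytes, std::size_t page_size,
                                      std::size_t num_domains, placement p,
                                      std::size_t touching_domain) {
    std::size_t const pages = (bytes + page_size - 1) / page_size;  // round up
    std::vector<std::size_t> domain(pages);
    for (std::size_t i = 0; i < pages; ++i)
        domain[i] = (p == placement::interleave) ? i % num_domains
                                                  : touching_domain;
    return domain;
}
```

With two domains and four pages, first touch from domain 0 yields `{0, 0, 0, 0}` while interleave yields `{0, 1, 0, 1}`, which is why interleaving the matrix spreads its bandwidth demand.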
<jbjnr>
kokkos has a hwloc wrapper - we should steal it :)
Rodario has quit [Ping timeout: 248 seconds]
<hkaiser>
go ahead ;)
<jbjnr>
are we going to go ahead and make hwloc compulsory from now on (since RP needs it)?
<jbjnr>
if so, we could remove a load of cruft from the #ifdef HPX_HAVE_HWLOC etc etc
<jbjnr>
and do a major clean up
<hkaiser>
jaafar_: I think yes, that's what we'll have to do
<hkaiser>
jbjnr: ^
<jbjnr>
whilst we're at it, we should cmakify hwloc and add it as a subdir that can be checked out and built in like apex ...
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
* zao shakes a tiny fist at hwloc
<zao>
Or rather, HPX's assumptions about machine structure :P
<jbjnr>
zao: what assumptions?
mcopik has joined #ste||ar
<zao>
jbjnr: As far as I got before I stopped hacking, it assumes that a core has PU children directly below.
<zao>
I've got a machine that doesn't expose "core" to hwloc at all, due to a bit of lack of OS support.
<zao>
Also no cache or package info.
<zao>
Might have to fix the OS in this case.
<zao>
Or just not try to build on it anymore.
<zao>
The day someone builds a cluster on DragonFlyBSD, I'll be very scared.
<jbjnr>
hmmm. not strictly an hpx failed assumption then. is it a show stopper or have you worked around it?
<zao>
I gave up.
<jbjnr>
hmmm
<zao>
Got no idea what the code is supposed to be used for, and where else it's assumed to be like that.
<jbjnr>
PUs - but no cores - we ought to be able to work around that
<jbjnr>
just assume that if cores is zero, we set it to #PUs - but then the affinity masks would be incorrect. yes, it might be a serious problem
<jbjnr>
cos everything in hwloc uses masks for cores/domains etc
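A sketch of the "if cores is zero, use #PUs" fallback, working on plain counts rather than hwloc objects, shows both how it would work and why the resulting masks can be wrong. `core_masks` is a hypothetical helper; it assumes PUs are numbered contiguously within a core, that the PU count divides evenly by the core count, and at most 64 PUs (one bit each in a `uint64_t`).

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: build one PU bitmask per core. If the topology reports
// no cores at all, each PU is treated as its own pseudo-core.
std::vector<std::uint64_t> core_masks(unsigned pus, unsigned cores) {
    if (pus > 64) throw std::invalid_argument("sketch supports at most 64 PUs");
    if (pus == 0) return {};
    if (cores == 0) cores = pus;  // fallback: one pseudo-core per PU
    if (pus % cores != 0) throw std::invalid_argument("uneven PUs per core");
    unsigned const pus_per_core = pus / cores;
    std::vector<std::uint64_t> masks(cores, 0);
    for (unsigned pu = 0; pu < pus; ++pu)
        masks[pu / pus_per_core] |= std::uint64_t{1} << pu;  // PU bit into its core
    return masks;
}
```

With the real topology of 4 PUs on 2 cores the masks are `{0b0011, 0b1100}`; with the fallback they become `{0b0001, 0b0010, 0b0100, 0b1000}`. On an SMT machine the fallback therefore splits hyperthread siblings into separate "cores", so any binding that assumes one thread per core would double-book physical cores, which is the incorrect-affinity-mask concern raised above.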
<zao>
I should upstream what I have for now, but it's still rather broken logically.
<zao>
At least we found a hang in the RP ctor :)
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 240 seconds]
aserio has quit [Quit: aserio]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 252 seconds]
bikineev has joined #ste||ar
Rodario1 has joined #ste||ar
diehlpk_work has quit [Quit: Leaving]
Rodario has quit [Ping timeout: 248 seconds]
mcopik has quit [Ping timeout: 240 seconds]
Rodario has joined #ste||ar
Rodario1 has quit [Ping timeout: 240 seconds]
EverYoung has quit [Remote host closed the connection]
Rodario1 has joined #ste||ar
Rodario has quit [Ping timeout: 260 seconds]
Rodario1 has quit [Client Quit]
Rodario has joined #ste||ar
bikineev has quit [Remote host closed the connection]
Rodario has quit [Quit: Leaving.]
Rodario has joined #ste||ar
Rodario has quit [Quit: Leaving.]
zbyerly_ has joined #ste||ar
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]