aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
StefanLSU has joined #ste||ar
bikineev has quit [Remote host closed the connection]
StefanLSU has quit [Client Quit]
mcopik_ has quit [Ping timeout: 264 seconds]
jaafar has quit [Ping timeout: 252 seconds]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 246 seconds]
Matombo444 has joined #ste||ar
Matombo has quit [Ping timeout: 240 seconds]
Matombo444 has quit [Remote host closed the connection]
K-ballo has quit [Quit: K-ballo]
eschnett has quit [Quit: eschnett]
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 255 seconds]
AnujSharma has joined #ste||ar
bikineev has joined #ste||ar
parsa has joined #ste||ar
<github> [hpx] sithhell pushed 2 new commits to master: https://git.io/v5F9U
<github> hpx/master a9fda22 Thomas Heller: One more fix for service_executor...
<github> hpx/master 1f803a7 Thomas Heller: Removing superfluous ')'
<github> [hpx] sithhell pushed 1 new commit to master: https://git.io/v5F9s
<github> hpx/master 1120afc Thomas Heller: Fixing more typos within the PAPI perf counters
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
Smasher has quit [Changing host]
Smasher has joined #ste||ar
david_pfander has joined #ste||ar
bikineev has quit [Remote host closed the connection]
Matombo has joined #ste||ar
mcopik_ has joined #ste||ar
Matombo has quit [Remote host closed the connection]
mcopik_ has quit [Ping timeout: 240 seconds]
jaafar has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v5bfD
<github> hpx/gh-pages d4c7836 StellarBot: Updating docs
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
<github> [hpx] hkaiser closed pull request #2907: Optionaly force-delete remaining channel items on close (master...fixing_2890) https://git.io/v5dO2
jaafar has quit [Ping timeout: 255 seconds]
<heller> hkaiser: good morning
<heller> hkaiser: I hate the service_pool executor
<hkaiser> heller: g'morning
<zao> I wonder if it's HPX or my platform that's painfully broken.
<zao> All tests wedge on DragonFlyBSD.
<hkaiser> hmm
<hkaiser> probably some problem in hpx for that platform
<zao> On the thread I broke into, it was waiting for hpx::resource::get_partitioner
<zao> Should've looked at all threads, I guess.
<hkaiser> uhh
<hkaiser> that's relying on magic statics, but we do that in several spots...
<hkaiser> or hangs in the constructor of the rp
<zao> Is it trying to get the partitioner while throwing while constructing the partitioner?
bikineev has quit [Ping timeout: 246 seconds]
<hkaiser> uhh, the constructor of the rp tries to recursively call get_partitioner
<hkaiser> that hangs because of the 'magic statics' lock
<heller> zao: hwloc problems?
<heller> ahh
<zao> Release build, so very little debug info :(
<hkaiser> hwloc_topology_info::get_number_of_cores throws for some reason and throwing exceptions apparently calls get_partitioner
<hkaiser> no idea why (for both)
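A minimal sketch of the hang described above, under stated assumptions. It is neither HPX code nor the fix that later landed in fix_rp_hang. The names registry, get_registry and detail::constructing are hypothetical. If a function-local static ("magic static") is being initialized and its constructor calls the accessor again, the C++ standard leaves the behavior undefined, and in practice this can deadlock on the initialization guard. The sketch checks a thread-local flag before reaching the static declaration, so the recursion shows up as an exception rather than a hang.

```cpp
#include <stdexcept>

// Hypothetical stand-in for an object handed out by a "magic static" accessor.
struct registry
{
    registry();
    bool reentry_detected = false;
};

registry& get_registry();

namespace detail
{
    // True while the current thread is running the registry constructor.
    thread_local bool constructing = false;

    // RAII helper that keeps the flag set for the duration of the constructor.
    struct construction_scope
    {
        construction_scope() { constructing = true; }
        ~construction_scope() { constructing = false; }
    };
}

registry::registry()
{
    detail::construction_scope scope;
    try
    {
        // Stand-in for an error path that calls back into the accessor.
        get_registry();
    }
    catch (std::logic_error const&)
    {
        reentry_detected = true;
    }
}

registry& get_registry()
{
    // Checked before the static declaration, so that declaration is never
    // re-entered while its own initialization is still in progress.
    if (detail::constructing)
        throw std::logic_error("get_registry() re-entered during construction");

    static registry instance;    // initialized once, thread-safe since C++11
    return instance;
}
```

The flag is thread-local, so it only catches recursion on the constructing thread, which is the case discussed here. Other threads still block on the static's guard until construction finishes, as usual.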
<zao> hwloc_get_nbobjs_by_type yields 0 in a test program, but I'm not sure if I'm setting the topology up right.
<zao> (for HWLOC_OBJ_CORE)
<hkaiser> zao: can you do a debug build?
<zao> It'll take a good while, but sure.
<hkaiser> hold on for a sec, let me have a look
<zao> Output from hwloc-ls seems rather sparse. Are we making any assumptions about the shape of the node somehow?
<hkaiser> zao: shrug, I didn't think so
<hkaiser> zao: I think I can fix that particular problem, let's see - at least I can fix the hang
<zao> Building a single test, [115/182] targets atm.
<zao> get_number_of_cores indeed returns 0.
<zao> Note that lstopo only has PUs, no cores or cache info at all.
<hkaiser> nice
<hkaiser> so hwloc is broken for you :/
<zao> Broken and broken... less capable :D
K-ballo has joined #ste||ar
<hkaiser> heh
<hkaiser> zao: any idea what we could do for a workaround?
<zao> Not sure what the HPX code does with these facts and if we can make some assumption that there's 1 core per PU or something.
<zao> Feels like it's a legitimate hwloc structure, just strangely empty of a lot of the common components.
<zao> Simple solution would be to refuse to use hwloc on the platform, I guess.
<hkaiser> zao: well, if there is no core information then we have to assume one pu per core
<zao> Not sure how much we lose then, or is it required?
<zao> I always forget which libs are optional and not.
<hkaiser> we move more and more to hwloc being mandatory
<hkaiser> especially the rp code assumes that we have it
<hkaiser> zao: at least I will change the rp initialization avoiding the hang
<hkaiser> zao: I'll commit a fix for the hang in a sec, then we can start looking into the hwloc issue
<github> [hpx] hkaiser created fix_rp_hang (+1 new commit): https://git.io/v5bnJ
<github> hpx/fix_rp_hang 1c37b94 Hartmut Kaiser: Avoiding hang during creation of the resource partitioner
<hkaiser> zao: ^^
<hkaiser> zao: how many PUs are reported for you?
pree has joined #ste||ar
<zao> Four PUs, the machine has 1 socket with 4 cores.
<zao> CPU: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (3192.63-MHz K8-class CPU)
<zao> I've checked in the DragonFly IRC channel, seems like their exposure of topology to userspace is a bit lacking.
<zao> Let's see if I can rebase and build on the fix_rp_hang branch.
<hkaiser> ok, cool - thanks - I think we can add some workaround code if numcpus is reported as zero
<zao> HWLOC_OBJ_PU indeed says 4.
<hkaiser> hwloc_get_nbobjs_by_type( HWLOC_OBJ_PU) ?
<zao> Yeah, in my standalone test.
<github> [hpx] mlang opened pull request #2909: Fix typo in include path (master...typo) https://git.io/v5b8V
<hkaiser> ok, I'll add this as a fallback
<zao> Branch now crashes properly.
<zao> $ bin/reduce_test
<zao> terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
<zao> what(): hwloc_get_nbobjs_by_type failed: HPX(kernel_error)
<zao> [1] 64673 abort (core dumped) bin/reduce_test
<hkaiser> zao: good, so the hang is fixed
<github> [hpx] hkaiser pushed 1 new commit to fix_rp_hang: https://git.io/v5b4Y
<github> hpx/fix_rp_hang 00eb16c Hartmut Kaiser: Add fallback to topology::get_number_of_cores to looks at number of PUs reported
<hkaiser> here is the workaround ^^
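For readers following along, the fallback idea discussed above might look roughly like the sketch below. This is not the code from commit 00eb16c. The helper name effective_core_count is hypothetical, and the two arguments are assumed to be the values returned by hwloc_get_nbobjs_by_type for HWLOC_OBJ_CORE and HWLOC_OBJ_PU, as in zao's standalone test.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: choose a core count from the object counts reported
// by the topology library. If no usable core count is available, assume one
// PU per core, as suggested in the discussion.
std::size_t effective_core_count(int cores_reported, int pus_reported)
{
    if (cores_reported > 0)
        return static_cast<std::size_t>(cores_reported);

    if (pus_reported > 0)
        return static_cast<std::size_t>(pus_reported);    // one PU per core

    throw std::runtime_error("topology reports neither cores nor PUs");
}
```

On the DragonFlyBSD machine described here (0 cores, 4 PUs reported) this would give 4.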
hkaiser has quit [Quit: bye]
<zao> Further, but still hosed :)
<zao> /home/zao/stellar/hpx/src/runtime/threads/policies/hwloc_topology_info.cpp:235
<zao> I'll hack away on it eventually.
<zao> (got other stuff on the plate)
diehlpk_work has joined #ste||ar
aserio has joined #ste||ar
eschnett has joined #ste||ar
<zao> Seems like things like hwloc_topology_info::get_pu_number assume that cores exist and always have PUs as children.
hkaiser has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
pree has joined #ste||ar
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
rod_t has joined #ste||ar
parsa has joined #ste||ar
<hkaiser> heller: may I ask you to stop pushing directly to master
<zao> hkaiser: Much more is hosed. hwloc_topology_info has some rather deep assumptions that cores are parents to PUs.
<zao> Proper approach here may be to fix the OS.
<hkaiser> zao: yah, that's true
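To illustrate the assumption being discussed, here is a toy model, not hwloc's data structures and not HPX code. The types topo_obj and obj_type and both functions are hypothetical. Code that walks from a PU up to its enclosing core finds nothing when the topology has no core level, so a tolerant lookup would need a fallback such as treating the PU as its own core.

```cpp
// Toy topology model; all names here are hypothetical.
enum class obj_type { machine, core, pu };

struct topo_obj
{
    obj_type type;
    topo_obj const* parent;    // nullptr for the root object
};

// Walk up from a PU and return the first ancestor of type core,
// or nullptr if the topology has no core level above this PU.
topo_obj const* enclosing_core(topo_obj const& pu)
{
    for (topo_obj const* p = pu.parent; p != nullptr; p = p->parent)
    {
        if (p->type == obj_type::core)
            return p;
    }
    return nullptr;
}

// Tolerant variant: when no core level exists, treat the PU as its own core.
topo_obj const* core_or_self(topo_obj const& pu)
{
    topo_obj const* core = enclosing_core(pu);
    return core != nullptr ? core : &pu;
}
```

A topology like the one lstopo printed here (PUs directly under the machine) takes the fallback branch. A conventional machine to core to PU layout returns the core.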
aserio has quit [Ping timeout: 264 seconds]
aserio has joined #ste||ar
<heller> hkaiser: sure. I thought those changes were rather trivial
<heller> hkaiser: but yes, the service_executor is more painful than I thought...
<heller> The other changes were fixing typos that snuck in with the format merge
<heller> hkaiser: we haven't had this much green in quite some time...
<K-ballo> I'm amazed by that #include typo that I introduced about 4 years ago
<heller> Yeah...
<heller> We should remove that example altogether...
AnujSharma has quit [Ping timeout: 264 seconds]
EverYoung has joined #ste||ar
mcopik_ has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
mcopik_ has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
david_pfander has quit [Ping timeout: 255 seconds]
mbremer has joined #ste||ar
Matombo has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
<zbyerly_> is anyone else working on KNLs?
StefanLSU has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
jaafar has joined #ste||ar
<zbyerly_> I'm having trouble with avx512 stuff
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
EverYoun_ has quit [Ping timeout: 240 seconds]
pree has joined #ste||ar
hkaiser has quit [Ping timeout: 248 seconds]
akheir has joined #ste||ar
pree has quit [Ping timeout: 264 seconds]
aserio has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
<hkaiser> heller: why should we remove that example?
EverYoung has joined #ste||ar
pree has joined #ste||ar
StefanLSU has quit [Quit: StefanLSU]
StefanLSU has joined #ste||ar
StefanLSU has quit [Client Quit]
aserio has joined #ste||ar
StefanLSU has joined #ste||ar
<K-ballo> because it did not compile for 4 years and nobody noticed?
StefanLSU has quit [Quit: StefanLSU]
pree has quit [Ping timeout: 240 seconds]
mcopik_ has joined #ste||ar
pree has joined #ste||ar
<heller> hkaiser: what exactly does it demonstrate?
<heller> Also the fact that nobody noticed that it wasn't working for 4 years ;)
<zao> I'd like to shake a fist at the owner of the quickstart example on how HPX can be used in a library.
<zao> It's not exactly easy to pull out argc/argv from a library on cool OSes :D
<zao> Also abusing it for the memory counters. I hope it's not expensive to kvm_open/kvm_get*
<K-ballo> what's with `__argc/v` on linux? is that a windows detail leaked?
pree has quit [Remote host closed the connection]
<zao> It's assumedly a MSVCRT thing, which the example has hacked into working by defining globals by those names on macOS and Linux.
<zao> (and now FreeBSD/DragonFly in my branch)
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has quit [Quit: eschnett]
bikineev has quit [Remote host closed the connection]
Shahrzad has joined #ste||ar
bikineev has joined #ste||ar
<zbyerly_> does hpx use any automatically generated source code?
<K-ballo> how is that defined? as in a separate pre-compilation step?
<K-ballo> there's a tinsy bit of generated config headers, generated by cmake
Shahrzad has quit [Quit: Leaving]
Shahrzad has joined #ste||ar
Shahrzad has quit [Client Quit]
Shahrzad has joined #ste||ar
Shahrzad has quit [Client Quit]
Shahrzad has joined #ste||ar
Shahrzad has quit [Client Quit]
akheir has quit [Remote host closed the connection]
<diehlpk_work> hkaiser, see pm
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<aserio> hkaiser: If you do not specify a policy, does dataflow use async?
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
<hkaiser> aserio: uhh
<hkaiser> I think so, yes
<hkaiser> same as async()
<zbyerly_> K-ballo, libgeodecomp uses ruby to generate cpp sourcecode
<aserio> hkaiser: thanks!
<K-ballo> hpx used to generate preprocessed cpp source with wave, but we don't need to anymore; as far as I know, all that's left is those few config headers generated from cmake
<github> [hpx] aserio created add_checkpoint (+6 new commits): https://git.io/v5N0M
<github> hpx/add_checkpoint 3c2c29b aserio: Adding checkpoint.hpp
<github> hpx/add_checkpoint fc3bebf aserio: Adding testing for checkpoint.hpp
<github> hpx/add_checkpoint 07ccafa aserio: Preparing the test : checkpoint.cpp...
aserio has quit [Quit: aserio]
rod_t has left #ste||ar [#ste||ar]
StefanLSU has joined #ste||ar
<K-ballo> dataflow does not modify the types of the arguments nor the target callable, does it?
StefanLSU has quit [Quit: StefanLSU]
<K-ballo> I think I found the poisonous dataflow overload
<K-ballo> all these deduced return types everywhere make reading the code next to impossible
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoun_ has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
Matombo has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar