00:13
StefanLSU has joined #ste||ar
00:16
bikineev has quit [Remote host closed the connection]
00:17
StefanLSU has quit [Client Quit]
00:41
mcopik_ has quit [Ping timeout: 264 seconds]
01:15
jaafar has quit [Ping timeout: 252 seconds]
01:22
jaafar has joined #ste||ar
01:31
jaafar has quit [Ping timeout: 246 seconds]
01:37
Matombo444 has joined #ste||ar
01:40
Matombo has quit [Ping timeout: 240 seconds]
02:04
Matombo444 has quit [Remote host closed the connection]
02:16
K-ballo has quit [Quit: K-ballo]
03:08
eschnett has quit [Quit: eschnett]
03:24
hkaiser has quit [Quit: bye]
03:42
jaafar has joined #ste||ar
04:20
jaafar has quit [Ping timeout: 255 seconds]
05:42
AnujSharma has joined #ste||ar
05:47
bikineev has joined #ste||ar
05:54
parsa has joined #ste||ar
06:27
<
github >
hpx/master a9fda22 Thomas Heller: One more fix for service_executor...
06:27
<
github >
hpx/master 1f803a7 Thomas Heller: Removing superfluous ')'
06:32
<
github >
hpx/master 1120afc Thomas Heller: Fixing more typos within the PAPI perf counters
06:35
parsa has quit [Quit: Zzzzzzzzzzzz]
06:55
parsa has joined #ste||ar
07:30
parsa has quit [Quit: Zzzzzzzzzzzz]
07:56
Smasher has quit [Changing host]
07:56
Smasher has joined #ste||ar
08:06
david_pfander has joined #ste||ar
08:15
bikineev has quit [Remote host closed the connection]
08:47
Matombo has joined #ste||ar
08:52
mcopik_ has joined #ste||ar
09:28
Matombo has quit [Remote host closed the connection]
09:29
mcopik_ has quit [Ping timeout: 240 seconds]
09:41
jaafar has joined #ste||ar
09:51
<
github >
hpx/gh-pages d4c7836 StellarBot: Updating docs
10:09
bikineev has joined #ste||ar
10:22
hkaiser has joined #ste||ar
10:35
<
github >
[hpx] hkaiser closed pull request #2907: Optionaly force-delete remaining channel items on close (master...fixing_2890)
https://git.io/v5dO2
10:37
jaafar has quit [Ping timeout: 255 seconds]
10:39
<
heller >
hkaiser: good morning
10:39
<
heller >
hkaiser: I hate the service_pool executor
11:01
<
hkaiser >
heller: g'morning
11:25
<
zao >
I wonder if it's HPX or my platform that's painfully broken.
11:25
<
zao >
All tests wedge on DragonFlyBSD.
11:26
<
hkaiser >
probably some problem in hpx for that platform
11:26
<
zao >
On the thread I broke into, it was waiting for hpx::resource::get_partitioner
11:26
<
zao >
Should've looked at all threads, I guess.
11:27
<
hkaiser >
that's relying on magic statics, but we do that in several spots...
11:27
<
hkaiser >
or hangs in the constructor of the rp
11:30
<
zao >
Is it trying to get the partitioner while throwing while constructing the partitioner?
11:30
bikineev has quit [Ping timeout: 246 seconds]
11:33
<
hkaiser >
uhh, the constructor of the rp tries to recursively call get_partitioner
11:33
<
hkaiser >
that hangs because of the 'magic statics' lock
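A minimal sketch of the pattern under discussion, not the HPX code itself. Initialization of a function-local static ("magic static") is guarded since C++11, and re-entering the same function while that initialization is still running is undefined behavior, which in practice often shows up as a hang on the guard. The names `partitioner`, `query_cores` and `get_partitioner` below are hypothetical stand-ins. The point is that nothing reachable from the constructor may call the accessor again.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the resource partitioner (rp).
struct partitioner
{
    static int constructions;    // counts how often the constructor ran
    std::size_t cores;

    explicit partitioner(std::size_t n) : cores(n) { ++constructions; }
};
int partitioner::constructions = 0;

// Hypothetical topology query. It must not call get_partitioner(),
// directly or through an error-reporting path, because it runs while
// the static below is still being initialized.
std::size_t query_cores()
{
    return 4;
}

partitioner& get_partitioner()
{
    // Guarded, thread-safe initialization. Re-entering this function
    // recursively during initialization is undefined behavior.
    static partitioner instance(query_cores());
    return instance;
}
```

Gathering every input before the static is initialized, as `query_cores()` does here, keeps the constructor free of calls back into the accessor.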
11:35
<
heller >
zao: hwloc problems?
11:36
<
zao >
Release build, so very little debug info :(
11:41
<
hkaiser >
hwloc_topology_info::get_number_of_cores throws for some reason and throwing exceptions apparently calls get_partitioner
11:41
<
hkaiser >
no idea why (for both)
11:41
<
zao >
hwloc_get_nbobjs_by_type yields 0 in a test program, but I'm not sure if I'm setting the topology up right.
11:41
<
zao >
(for HWLOC_OBJ_CORE)
11:41
<
hkaiser >
zao: can you do a debug build?
11:42
<
zao >
It'll take a good while, but sure.
11:42
<
hkaiser >
hold on for a sec, let me have a look
11:44
<
zao >
Output from hwloc-ls seems rather sparse. Are we making any assumptions about the shape of the node somehow?
11:44
<
hkaiser >
zao: shrug, I didn't think so
11:46
<
hkaiser >
zao: I think I can fix that particular problem, let's see - at least I can fix the hang
11:47
<
zao >
Building a single test, [115/182] targets atm.
11:52
<
zao >
get_number_of_cores indeed returns 0.
11:53
<
zao >
Note that lstopo only has PUs, no cores or cache info at all.
11:56
<
hkaiser >
so hwloc is broken for you :/
11:57
<
zao >
Broken and broken... less capable :D
12:01
K-ballo has joined #ste||ar
12:09
<
hkaiser >
zao: any idea what we could do for a workaround?
12:14
<
zao >
Not sure what the HPX code does with these facts and if we can make some assumption that there's 1 core per PU or something.
12:14
<
zao >
Feels like it's a legitimate hwloc structure, just strangely empty of a lot of the common components.
12:14
<
zao >
Simple solution would be to refuse to use hwloc on the platform, I guess.
12:15
<
hkaiser >
zao: well, if there is no core information then we have to assume one pu per core
12:15
<
zao >
Not sure how much we lose then, or is it required?
12:15
<
zao >
I always forget which libs are optional and not.
12:15
<
hkaiser >
we move more and more to hwloc being mandatory
12:15
<
hkaiser >
especially the rp code assumes that we have it
12:16
<
hkaiser >
zao: at least I will change the rp initialization avoiding the hang
12:33
<
hkaiser >
zao: I'll commit a fix for the hang in a sec, then we can start looking into the hwloc issue
12:35
<
github >
hpx/fix_rp_hang 1c37b94 Hartmut Kaiser: Avoiding hang during creation of the resource partitioner
12:43
<
hkaiser >
zao: how many PUs are reported for you?
13:11
pree has joined #ste||ar
13:16
<
zao >
Four PUs, the machine has 1 socket with 4 cores.
13:17
<
zao >
CPU: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (3192.63-MHz K8-class CPU)
13:17
<
zao >
I've checked in the DragonFly IRC channel, seems like their exposure of topology to userspace is a bit lacking.
13:21
<
zao >
Let's see if I can rebase and build on the fix_rp_hang branch.
13:22
<
hkaiser >
ok, cool - thanks - I think we can add some workaround code if numcpus is reported as zero
13:23
<
zao >
HWLOC_OBJ_PU indeed says 4.
13:23
<
hkaiser >
hwloc_get_nbobjs_by_type( HWLOC_OBJ_PU) ?
13:23
<
zao >
Yeah, in my standalone test.
13:24
<
hkaiser >
ok, I'll add this as a fallback
13:26
<
zao >
Branch now crashes properly.
13:26
<
zao >
$ bin/reduce_test
13:26
<
zao >
terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
13:26
<
zao >
what(): hwloc_get_nbobjs_by_type failed: HPX(kernel_error)
13:26
<
zao >
[1] 64673 abort (core dumped) bin/reduce_test
13:28
<
hkaiser >
zao: good, so the hang is fixed
13:28
<
github >
hpx/fix_rp_hang 00eb16c Hartmut Kaiser: Add fallback to topology::get_number_of_cores to looks at number of PUs reported
13:28
<
hkaiser >
here is the workaround ^^
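The fallback described above can be sketched roughly as follows. This is not the code from commit 00eb16c. The helper `effective_core_count` is hypothetical, and its two parameters stand in for the values that `hwloc_get_nbobjs_by_type` returns for `HWLOC_OBJ_CORE` and `HWLOC_OBJ_PU`. It assumes one core per PU whenever the platform reports no core objects.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: nb_cores and nb_pus are the object counts a
// caller obtained from hwloc for HWLOC_OBJ_CORE and HWLOC_OBJ_PU.
std::size_t effective_core_count(int nb_cores, int nb_pus)
{
    // Normal case: the platform exposes core objects.
    if (nb_cores > 0)
        return static_cast<std::size_t>(nb_cores);

    // Fallback: no core information, so assume one core per PU.
    if (nb_pus > 0)
        return static_cast<std::size_t>(nb_pus);

    // Neither level is usable; report the problem to the caller.
    throw std::runtime_error("topology reports neither cores nor PUs");
}
```

With the numbers from this log (0 cores, 4 PUs) the helper yields 4, which matches the machine's 4 physical cores only because this CPU has no hyper-threading.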
13:32
hkaiser has quit [Quit: bye]
13:37
<
zao >
Further, but still hosed :)
13:38
<
zao >
/home/zao/stellar/hpx/src/runtime/threads/policies/hwloc_topology_info.cpp:235
13:38
<
zao >
I'll hack away on it eventually.
13:38
<
zao >
(got other stuff on the plate)
13:40
diehlpk_work has joined #ste||ar
13:52
aserio has joined #ste||ar
13:58
eschnett has joined #ste||ar
14:12
<
zao >
Seems like things like hwloc_topology_info::get_pu_number assume that cores exist and always have PUs as children.
14:15
hkaiser has joined #ste||ar
14:28
pree has quit [Read error: Connection reset by peer]
14:44
pree has joined #ste||ar
14:46
aserio has quit [Read error: Connection reset by peer]
14:48
aserio has joined #ste||ar
15:06
rod_t has joined #ste||ar
15:20
parsa has joined #ste||ar
15:26
<
hkaiser >
heller: may I ask you to stop pushing directly to master
15:29
<
zao >
hkaiser: Much more is hosed. hwloc_topology_info has some rather deep assumptions that cores are parents to PUs.
15:29
<
zao >
Proper approach here may be to fix the OS.
15:29
<
hkaiser >
zao: yah, that's true
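To illustrate the assumption zao describes, here is a simplified sketch using a hypothetical `node` type rather than hwloc's own object type. Code that walks from a PU up to its core has to cope with topologies where no core level exists, as on the DragonFly machine above.

```cpp
#include <cassert>

// Hypothetical, simplified topology node; not hwloc's object type.
enum class kind { machine, core, pu };

struct node
{
    kind type;
    const node* parent;
};

// Walk up from a PU to its enclosing core. Returns nullptr when the
// topology has no core above this PU, which callers must handle
// explicitly instead of assuming a core parent always exists.
const node* enclosing_core(const node* pu)
{
    assert(pu != nullptr && pu->type == kind::pu);
    for (const node* n = pu->parent; n != nullptr; n = n->parent)
    {
        if (n->type == kind::core)
            return n;
    }
    return nullptr;
}
```

A caller that receives `nullptr` could then treat the PU itself as its own core, in line with the one-PU-per-core assumption discussed earlier.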
15:30
aserio has quit [Ping timeout: 264 seconds]
15:39
aserio has joined #ste||ar
15:48
<
heller >
hkaiser: sure. I thought those changes were rather trivial
15:48
<
heller >
hkaiser: but yes, the service_executor is more painful than I thought...
15:49
<
heller >
The other changes were fixing typos that sneaked in during the format merge
15:55
<
heller >
hkaiser: we haven't had this much green in quite some time...
16:01
<
K-ballo >
I'm amazed by that #include typo that I introduced about 4 years ago
16:02
<
heller >
We should remove that example altogether...
16:03
AnujSharma has quit [Ping timeout: 264 seconds]
16:04
EverYoung has joined #ste||ar
16:07
mcopik_ has joined #ste||ar
16:12
pree has quit [Read error: Connection reset by peer]
16:14
mcopik_ has quit [Ping timeout: 240 seconds]
16:27
pree has joined #ste||ar
16:36
david_pfander has quit [Ping timeout: 255 seconds]
16:39
mbremer has joined #ste||ar
16:55
Matombo has joined #ste||ar
16:59
hkaiser has quit [Read error: Connection reset by peer]
17:04
<
zbyerly_ >
is anyone else working on KNLs?
17:16
StefanLSU has joined #ste||ar
17:16
pree has quit [Read error: Connection reset by peer]
17:17
hkaiser has joined #ste||ar
17:18
jaafar has joined #ste||ar
17:18
<
zbyerly_ >
I'm having trouble with avx512 stuff
17:21
EverYoun_ has joined #ste||ar
17:24
EverYoung has quit [Ping timeout: 246 seconds]
17:26
EverYoun_ has quit [Ping timeout: 240 seconds]
17:29
pree has joined #ste||ar
17:38
hkaiser has quit [Ping timeout: 248 seconds]
17:42
akheir has joined #ste||ar
17:42
pree has quit [Ping timeout: 264 seconds]
17:43
aserio has quit [Ping timeout: 246 seconds]
17:46
hkaiser has joined #ste||ar
17:55
<
hkaiser >
heller: why should we remove that example?
17:55
EverYoung has joined #ste||ar
17:55
pree has joined #ste||ar
18:02
StefanLSU has quit [Quit: StefanLSU]
18:06
StefanLSU has joined #ste||ar
18:10
StefanLSU has quit [Client Quit]
18:10
aserio has joined #ste||ar
18:11
StefanLSU has joined #ste||ar
18:26
<
K-ballo >
because it did not compile for 4 years and nobody noticed?
18:36
StefanLSU has quit [Quit: StefanLSU]
18:39
pree has quit [Ping timeout: 240 seconds]
18:45
mcopik_ has joined #ste||ar
18:52
pree has joined #ste||ar
19:04
<
heller >
hkaiser: what exactly does it demonstrate?
19:04
<
heller >
Also the fact that nobody noticed that it wasn't working for 4 years ;)
19:15
<
zao >
I'd like to shake a fist at the owner of the quickstart example on how HPX can be used in a library.
19:16
<
zao >
It's not exactly easy to pull out argc/argv from a library on cool OSes :D
19:17
<
zao >
Also abusing it for the memory counters. I hope it's not expensive to kvm_open/kvm_get*
19:19
<
K-ballo >
what's with `__argc/v` on linux? is that a windows detail leaked?
19:24
pree has quit [Remote host closed the connection]
19:32
<
zao >
It's presumably an MSVCRT thing, which the example has hacked into working by defining globals by those names on macOS and Linux.
19:32
<
zao >
(and now FreeBSD/DragonFly in my branch)
19:32
aserio has quit [Read error: Connection reset by peer]
19:34
aserio has joined #ste||ar
20:03
hkaiser has quit [Quit: bye]
20:26
bikineev has joined #ste||ar
20:37
hkaiser has joined #ste||ar
20:48
eschnett has quit [Quit: eschnett]
20:52
bikineev has quit [Remote host closed the connection]
20:53
Shahrzad has joined #ste||ar
20:53
bikineev has joined #ste||ar
20:55
<
zbyerly_ >
does hpx use any automatically generated source code?
20:56
<
K-ballo >
how is that defined? as in a separate pre-compilation step?
20:57
<
K-ballo >
there's a teensy bit of generated config headers, generated by cmake
21:06
Shahrzad has quit [Quit: Leaving]
21:07
Shahrzad has joined #ste||ar
21:07
Shahrzad has quit [Client Quit]
21:08
Shahrzad has joined #ste||ar
21:08
Shahrzad has quit [Client Quit]
21:08
Shahrzad has joined #ste||ar
21:09
Shahrzad has quit [Client Quit]
21:17
akheir has quit [Remote host closed the connection]
21:22
<
diehlpk_work >
hkaiser, see pm
21:30
EverYoung has quit [Remote host closed the connection]
21:30
EverYoung has joined #ste||ar
21:43
<
aserio >
hkaiser: If you do not specify a policy, does dataflow use async?
21:43
EverYoun_ has joined #ste||ar
21:46
EverYoung has quit [Ping timeout: 246 seconds]
21:49
<
hkaiser >
aserio: uhh
21:49
<
hkaiser >
I think so, yes
21:50
<
hkaiser >
same as async()
21:50
<
zbyerly_ >
K-ballo, libgeodecomp uses ruby to generate cpp source code
21:50
<
aserio >
hkaiser: thanks!
21:51
<
K-ballo >
hpx used to generate preprocessed cpp source with wave, but we don't need to anymore. As far as I know, all that's left is those few config headers generated from cmake
22:14
<
github >
hpx/add_checkpoint 3c2c29b aserio: Adding checkpoint.hpp
22:14
<
github >
hpx/add_checkpoint fc3bebf aserio: Adding testing for checkpoint.hpp
22:14
<
github >
hpx/add_checkpoint 07ccafa aserio: Preparing the test : checkpoint.cpp...
22:18
aserio has quit [Quit: aserio]
22:30
rod_t has left #ste||ar [#ste||ar]
22:33
StefanLSU has joined #ste||ar
22:36
<
K-ballo >
dataflow does not modify the types of the arguments nor the target callable, does it?
22:39
StefanLSU has quit [Quit: StefanLSU]
22:48
<
K-ballo >
I think I found the poisonous dataflow overload
22:49
<
K-ballo >
all these deduced return types everywhere make reading the code next to impossible
22:59
parsa has quit [Quit: Zzzzzzzzzzzz]
23:15
EverYoun_ has quit [Remote host closed the connection]
23:36
bikineev has quit [Remote host closed the connection]
23:53
Matombo has quit [Remote host closed the connection]
23:55
EverYoung has joined #ste||ar
23:56
EverYoung has quit [Remote host closed the connection]
23:56
EverYoung has joined #ste||ar