00:50
hkaiser has joined #ste||ar
00:50
hkaiser has quit [Client Quit]
01:02
<
zao >
`[116/116] Creating library symlink lib/libhpx.so.1 lib/libhpx.so`
01:02
<
zao >
This went better than expected.
01:45
Anushi1998 has joined #ste||ar
02:36
<
zao >
v-netbsd$ ninja core components
02:36
<
zao >
ninja: no work to do.
02:37
<
zao >
I'll see if I can push a branch with this work eventually so that the NetBSD lad and we can figure out what guards and tests to have where.
02:37
<
zao >
Just implemented memory counters via libkvm, like on FreeBSD and DragonFly.
02:45
<
K-ballo >
15.8.0 works with boost 1.68.0 after all
02:45
<
K-ballo >
triggers warning STL4019
02:51
nikunj has quit [Quit: goodnight]
02:58
K-ballo has quit [Quit: K-ballo]
02:58
<
zao >
`c++: internal compiler error: Killed (program cc1plus received signal 9)` :D
02:59
<
zao >
Hungry hungry partition tests.
03:03
parsa[w] has quit [Read error: Connection reset by peer]
03:07
Anushi1998 has quit [Ping timeout: 244 seconds]
03:07
parsa[w] has joined #ste||ar
03:48
Anushi1998 has joined #ste||ar
04:38
Anushi1998 has quit [Remote host closed the connection]
04:39
Anushi1998 has joined #ste||ar
06:12
AnujSharma has joined #ste||ar
07:12
Anushi1998 has quit [Ping timeout: 272 seconds]
07:17
Anushi1998 has joined #ste||ar
07:20
jgolinowski has joined #ste||ar
07:36
AnujSharma has quit [Read error: Connection reset by peer]
07:45
jaafar has quit [Ping timeout: 260 seconds]
08:06
AnujSharma has joined #ste||ar
08:08
AnSh has joined #ste||ar
08:11
AnujSharma has quit [Ping timeout: 240 seconds]
08:11
anushi has joined #ste||ar
08:14
Anushi1998 has quit [Ping timeout: 268 seconds]
08:18
nikunj has joined #ste||ar
09:07
ste||ar-github has joined #ste||ar
09:07
<
ste||ar-github >
hpx/gh-pages 08a7810 StellarBot: Updating docs
09:07
ste||ar-github has left #ste||ar [#ste||ar]
09:17
wash[m]_ has joined #ste||ar
09:18
wash[m] has quit [Ping timeout: 245 seconds]
09:18
wash[m]_ is now known as wash[m]
09:19
jgolinowski has quit [Ping timeout: 240 seconds]
09:58
anushi has quit [Quit: Bye]
10:20
mcopik has joined #ste||ar
11:09
<
zao >
Apartment too cold? Just touch a core HPX header - [1/1211]
11:10
<
heller >
did scandinavia cool down again?
11:10
<
zao >
18°C out now, was 8°C this morning.
11:10
<
zao >
It's currently tolerable.
11:10
<
heller >
can't wait to put my sweater back on ;)
11:11
<
zao >
heller: I applied some of my DragonFly fixes for NetBSD and wrote new memory counter code.
11:11
<
zao >
I think most tests build now, and all examples.
11:11
<
zao >
Is there any test I can run to see if my memory counters are right?
11:12
K-ballo has joined #ste||ar
11:12
<
zao >
Haven't run the test suite at all, expecting great explosions.
11:12
<
heller >
run any program with /usr/bin/time -v and the memory performance counter to see if it makes sense
11:12
<
heller >
there is no automated test, IIRC
11:13
<
zao >
Had to dig deep to get a "block size" constant out of NetBSD, it seems very tuneable.
11:14
<
zao >
Oh well, this build will run for well over an hour, in a VM on my desktop with "just" 16G of memory.
11:18
fjordprefect[m] has left #ste||ar ["Kicked by @appservice-irc:matrix.org : removing from IRC because user idle on matrix for 30+ days"]
11:22
<
zao >
Have to clamp it to -j3 because of the partition tests.
11:25
<
heller >
yeah, it sucks
11:25
<
heller >
we really have to work on those...
11:34
<
K-ballo >
before they turn into a real problem
11:45
mcopik has quit [Ping timeout: 240 seconds]
11:52
AnSh has quit [Ping timeout: 240 seconds]
11:57
eschnett has quit [Quit: eschnett]
12:34
<
zao >
[1203/1203] Linking CXX executable bin/serialization_overhead
12:34
<
zao >
Everything built \o/
12:35
<
zao >
Ooh, some tests even pass.
12:35
eschnett has joined #ste||ar
12:40
<
zao >
13% tests passed, 505 tests failed out of 578
12:40
<
zao >
2: terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
12:40
<
zao >
2: what(): Failed to get number of cores: HPX(no_success)
12:41
<
zao >
Pretty much all of them fail with this.
12:46
<
zao >
heller: Seems like NetBSD may be hosed in the same way DragonFly was w.r.t. our assumptions about topology.
12:48
<
zao >
I don't quite like our adherence to a particular hwloc hierarchy that happens to hold on Linux and current systems.
12:49
<
K-ballo >
no_success, very descriptive
12:51
<
heller >
zao: I see
12:52
<
heller >
so if I get this correctly, those particular systems only deal with PUs and not Cores and such?
12:53
<
zao >
As I understand it, they don't expose enough information in a way hwloc understands it, so it falls back to a rudimentary structure.
12:53
<
zao >
I don't have my DragonFly information handy, but I believe it was similar there.
12:55
<
heller >
so this is hwloc version 1.11.7
12:55
<
zao >
We kind of rigidly assume that there's a core->PU hierarchy IIRC.
12:55
<
heller >
do you mind checking if hwloc 2 has the same problem?
12:55
<
heller >
the verbose output shows Cores and PUs though
12:56
<
heller >
could it be a problem with the combination VM + *BSD?
12:56
<
zao >
One of the outputs is from FreeBSD, which should work.
12:56
<
zao >
The other is from NetBSD, which doesn't.
12:56
<
heller >
ah, gotcha
12:56
<
zao >
Got to build hwloc2 from source, no package in pkgsrc for that.
12:57
<
heller >
if the problem is fixed with hwloc2, should we go to the trouble of fixing it for hwloc1?
12:59
<
heller >
to `if (num_cores < 0) { ... }` and add an additional `if (num_cores == 0) num_cores = 1;`, this should fix the problem
13:04
<
zao >
Broke the build tree getting it to use hwloc2, so that test will take a while :D
13:10
aserio has joined #ste||ar
13:26
<
zao >
Same behaviour with hwloc 2.0.1, except there's a NUMANode at special depth -3.
13:28
K-ballo has quit [Quit: K-ballo]
13:28
K-ballo has joined #ste||ar
13:39
<
zao >
Naive clamp of number of cores shifts the problem down into get_pu_number, /home/zao/stellar/hpx/src/runtime/threads/topology.cpp:344
13:40
<
heller >
because there are no objects of type HWLOC_OBJ_CORE?
13:40
<
zao >
The hwloc_get_obj_by_type HWLOC_OBJ_CORE call returns nullptr, as there is _no_ Core.
13:40
<
heller >
alright ... then, I guess, we have to make a complete alternative code path ...
13:40
<
zao >
Fairly sure we/I dug around this the last time with DragonFly and didn't come to any decent conclusion.
13:41
<
zao >
The interface to the topology object is a bit icky, but I guess one could spoof it a bit inside.
13:41
<
heller >
if num_core == 0 --> hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, num_pu)->logical_index;
13:41
<
heller >
something like that...
13:42
<
heller >
or, instead of HWLOC_OBJ_CORE, we get the HWLOC_OBJ_MACHINE type
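[Editor's sketch] The fallback discussed above, treating each PU as its own core when the topology exposes no Core objects, could look roughly like the following. This is a minimal model under stated assumptions, not HPX's or hwloc's code: `toy_topology`, `effective_core_count` and `unit_index` are hypothetical names, and the struct merely stands in for what hwloc would report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hwloc topology: the logical indices of the
// Core objects (possibly none) and of the PU objects it reports.
struct toy_topology
{
    std::vector<unsigned> core_indices;    // empty in the NetBSD-like case
    std::vector<unsigned> pu_indices;
};

// Number of cores to report. If there are no Core objects, count the PUs
// instead, and never report fewer than one.
std::size_t effective_core_count(toy_topology const& topo)
{
    if (!topo.core_indices.empty())
        return topo.core_indices.size();
    if (!topo.pu_indices.empty())
        return topo.pu_indices.size();
    return 1;
}

// Logical index of the n-th scheduling unit: a Core when Cores exist,
// otherwise the n-th PU (the "num_core == 0" branch from the chat).
unsigned unit_index(toy_topology const& topo, std::size_t n)
{
    std::vector<unsigned> const& units =
        topo.core_indices.empty() ? topo.pu_indices : topo.core_indices;
    assert(n < units.size());
    return units[n];
}
```

With a topology of four PUs and no Cores, `effective_core_count` reports four and `unit_index` returns the PU indices directly. Whether the affinity mask functions can be adapted the same way is left open in the discussion.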
13:50
<
zao >
I'll poke around a bit after I get dinner sorted out.
13:57
jgolinowski has joined #ste||ar
14:03
aserio has quit [Ping timeout: 276 seconds]
14:29
aserio has joined #ste||ar
14:49
<
K-ballo >
I just found out about this AutoPCH folder VS has been creating lately, seems to grow on every build
14:49
<
K-ballo >
I noticed when the project folder for a simple silly project reached 3gb
14:50
<
K-ballo >
hah, about 5gb for hpx
14:52
jgolinowski has quit [Ping timeout: 276 seconds]
14:58
nikunj has quit [Read error: Connection reset by peer]
14:58
nikunj97 has joined #ste||ar
15:19
jgolinowski has joined #ste||ar
15:24
jgolinowski has quit [Ping timeout: 248 seconds]
15:39
aserio has quit [Ping timeout: 265 seconds]
15:49
aserio has joined #ste||ar
16:43
aserio has quit [Ping timeout: 240 seconds]
16:56
jgolinowski has joined #ste||ar
17:03
<
zao >
More sadness in all the thread affinity mask functions, yay.
17:05
<
nikunj97 >
Why are we using libdl?
17:05
<
zao >
nikunj97: Because that's where dlopen and friends live.
17:05
<
zao >
Except on OSes where they are in the actual libc.
17:06
<
K-ballo >
does CMAKE_DL_LIBS get that right?
17:06
<
zao >
It does on NetBSD at least.
17:06
<
nikunj97 >
zao, yes, I know what libdl does. what is it that we're trying to open at runtime?
17:07
<
zao >
nikunj97: Components, mostly.
17:07
<
nikunj97 >
interesting, why are we not linking them directly in the executable?
17:07
<
zao >
Because some of the point is that you can dynamically load components you're not overly aware of at link time?
17:07
<
zao >
(correct me if I'm way off)
17:08
<
nikunj97 >
yes, makes sense!
17:09
<
zao >
You should be able to delve into the codebase and see what might use the stuff in hpx/util/plugin.
17:09
<
nikunj97 >
I kinda got off track a little bit. I tried to compare it with one of my previous implementations where I used it
17:09
<
nikunj97 >
It is an entirely different case
17:15
<
heller >
We also use it to check for certain symbols in shared libraries, like additional command line options and stuff
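[Editor's sketch] Checking whether a shared library exports a given symbol, as described above, can be done with the POSIX `dlopen`/`dlsym` calls. The sketch below is only an illustration: `has_symbol` is a hypothetical helper, not part of hpx/util/plugin, and the library and symbol names a caller passes in are up to the caller. On systems where these functions live in a separate libdl, the program has to link against it (what `CMAKE_DL_LIBS` provides).

```cpp
#include <dlfcn.h>

// Returns true if `symbol` can be resolved through `library`.
// Passing nullptr as `library` asks dlopen for the main program itself,
// which also searches the libraries loaded at startup.
bool has_symbol(char const* library, char const* symbol)
{
    void* handle = dlopen(library, RTLD_LAZY);
    if (handle == nullptr)
        return false;    // library could not be loaded at all

    dlerror();    // clear any stale error state
    (void) dlsym(handle, symbol);
    // A symbol's value may legitimately be null, so the presence check
    // relies on dlerror() rather than on the returned address.
    bool const found = (dlerror() == nullptr);

    dlclose(handle);
    return found;
}
```

A caller could use this to probe an optional plugin for an entry point before trying to call it, and skip the plugin when the symbol is absent.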
17:27
jgolinowski has quit [Ping timeout: 240 seconds]
17:29
jgolinowski has joined #ste||ar
17:31
<
zao >
Bah, I have no idea what I'm doing around all these masks, seems quite messy.
17:43
<
zao >
Heh, gdb crashed trying to inspect objects :D
17:46
<
zao >
I'm hanging up my hat for a while, hard to hack on code you don't know how it's supposed to work.
17:46
<
zao >
Got to write some test programs to inspect how it actually hangs together.
17:48
<
zao >
Not helped by terminal Vim mangling your source code if you work at your regular pace, you need to space out operations a lot for some reason.
17:52
jaafar has joined #ste||ar
18:15
aserio has joined #ste||ar
18:17
AnSh has joined #ste||ar
18:58
jgolinowski has quit [Quit: Leaving]
18:58
jgolinowski has joined #ste||ar
19:15
RostamLog has joined #ste||ar
19:30
<
diehlpk_work >
My HPXCL build does not work with the current master
19:31
<
nikunj97 >
diehlpk_work, let me have a look
19:31
<
diehlpk_work >
Is it related to your changes? Because I just added one new example
19:31
<
nikunj97 >
did it work before?
19:31
<
nikunj97 >
with my implementation?
19:33
<
nikunj97 >
diehlpk_work, on which OS are you on?
19:34
<
diehlpk_work >
nikunj97, It happens on the circle-ci build
19:35
<
diehlpk_work >
OS is some Linux
19:35
<
zao >
Did we make some breaking change in how you need to include hpx_main and friends?
19:35
<
zao >
Didn't we have to mess with existing examples in the HPX repo?
19:36
<
zao >
examples that used to work perfectly fine before?
19:36
<
diehlpk_work >
HPXCL was working with 7$f$3$e$67
19:36
<
nikunj97 >
no we don't need to
19:36
<
nikunj97 >
things work out of the box
19:36
<
diehlpk_work >
7f3e67
19:36
<
zao >
nikunj97: We did have to do something recently. If you're not familiar with it, maybe you should take a look.
19:36
<
zao >
Some example that was "misusing" things.
19:37
<
nikunj97 >
oh, I was not aware of it
19:37
<
diehlpk_work >
This file does not compile, but it did with an older version of hpx
19:37
<
zao >
unfortunately this god-forgotten IRC client doesn't have easily accessible logs, so I can't dig it up.
19:38
<
nikunj97 >
diehlpk_work, I can't see any main function in it
19:38
<
K-ballo >
try the web logs?
19:39
<
nikunj97 >
the entry point is int main and not hpx_main. That's why the linker couldn't find main, hence the error
19:39
<
zao >
(thing being that I'm used to having logs on the filesystem in a terminal, which I can just grep the heck out of)
19:39
<
nikunj97 >
replacing the name of function from hpx_main to main should make things right imo
19:39
<
zao >
nikunj97: So why did this use to work before, and is the breaking change documented?
19:40
<
nikunj97 >
zao, iirc I did mention it. Hold on let me check the documentation I added
19:40
aserio has quit [Ping timeout: 240 seconds]
19:43
<
nikunj97 >
zao, the entry point has been defined as main. I don't think other examples would run as well (if they use hpx_main as an entry point), given they are using my implementation
19:44
<
nikunj97 >
I can look into ways to add hpx_main as an entry point as well
19:44
AnSh has quit [Ping timeout: 272 seconds]
19:44
<
zao >
I mean, it has historically worked. Something you changed in your implementation changed this.
19:44
<
zao >
Whether it's something to document as "fix your application" or if it's something to special-case, that's up to you and the rest :)
19:52
<
nikunj97 >
This is where its entry point was handled
19:53
<
nikunj97 >
from what I could dig up in last few minutes
19:54
<
diehlpk_work >
nikunj97, I will focus on the paper deadline and will look later into the changes
19:56
<
nikunj97 >
diehlpk_work, you could try changing the function name from hpx_main to main. Things should work. In the meantime I will look into it as well.
19:58
<
diehlpk_work >
nikunj97, Ok, I will try it tomorrow
19:58
<
nikunj97 >
diehlpk_work, ok
19:58
<
diehlpk_work >
I will let you know tomorrow
20:11
<
nikunj97 >
zao, I'll talk to hkaiser to know more about it. If the system used to work for both main and hpx_main then I can call the required function instead to let it decide. If the call returns to my function then I'll call main else entry point will be hpx_main
20:18
<
ms[m]1 >
jgolinowski_: here btw, in case my pms still aren't working
20:19
<
ms[m]1 >
nikunj97: better check with hkaiser, but afaik hpx_main was always the entry point when using hpx::init/start explicitly and never worked when including hpx_main.hpp (and never needed to work)
20:21
<
nikunj97 >
ms[m]1, Did things work if I included hpx_main.hpp with only an int hpx_main and no int main?
20:24
aserio has joined #ste||ar
20:25
<
ms[m]1 >
aha, not during my time but that example is already quite old, possible that things were different before
20:25
<
ms[m]1 >
99% sure though that hpx_main without a main is not supposed to work now
20:26
<
nikunj97 >
ms[m]1, yes, currently with my implementation it will not work. I wanted to know if things were the same before my implementation as well
20:27
<
ms[m]1 >
nikunj97: yeah, by now I mean last year or so, i.e. before your implementation, don't know how it was in 2016
20:28
<
nikunj97 >
ok, that gives me enough info to dig into how it works. Thanks a lot!
20:29
eschnett has quit [Quit: eschnett]
20:30
<
ms[m]1 >
You might find something just by looking at the history of hpx_main.hpp
21:42
aserio has quit [Quit: aserio]
22:22
nikunj97 has quit [Quit: goodnight]
22:38
jbjnr has quit [Ping timeout: 265 seconds]
23:36
jaafar has quit [Ping timeout: 272 seconds]