hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
<zao> `[116/116] Creating library symlink lib/libhpx.so.1 lib/libhpx.so`
<zao> This went better than expected.
Anushi1998 has joined #ste||ar
<zao> v-netbsd$ ninja core components
<zao> ninja: no work to do.
<zao> I'll see if I can push a branch with this work eventually, so that we and the NetBSD lad can figure out what guards and tests to have where.
<zao> Just implemented memory counters via libkvm, like on FreeBSD and DragonFly.
<K-ballo> 15.8.0 works with boost 1.68.0 after all
<K-ballo> triggers warning STL4019
nikunj has quit [Quit: goodnight]
K-ballo has quit [Quit: K-ballo]
<zao> `c++: internal compiler error: Killed (program cc1plus received signal 9)` :D
<zao> Hungry hungry partition tests.
parsa[w] has quit [Read error: Connection reset by peer]
Anushi1998 has quit [Ping timeout: 244 seconds]
parsa[w] has joined #ste||ar
Anushi1998 has joined #ste||ar
Anushi1998 has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
AnujSharma has joined #ste||ar
Anushi1998 has quit [Ping timeout: 272 seconds]
Anushi1998 has joined #ste||ar
jgolinowski has joined #ste||ar
AnujSharma has quit [Read error: Connection reset by peer]
jaafar has quit [Ping timeout: 260 seconds]
AnujSharma has joined #ste||ar
AnSh has joined #ste||ar
AnujSharma has quit [Ping timeout: 240 seconds]
anushi has joined #ste||ar
Anushi1998 has quit [Ping timeout: 268 seconds]
nikunj has joined #ste||ar
ste||ar-github has joined #ste||ar
<ste||ar-github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://github.com/STEllAR-GROUP/hpx/commit/08a78102524ed451fcb0db1234a862cf928b07ac
<ste||ar-github> hpx/gh-pages 08a7810 StellarBot: Updating docs
ste||ar-github has left #ste||ar [#ste||ar]
wash[m]_ has joined #ste||ar
wash[m] has quit [Ping timeout: 245 seconds]
wash[m]_ is now known as wash[m]
jgolinowski has quit [Ping timeout: 240 seconds]
anushi has quit [Quit: Bye]
mcopik has joined #ste||ar
<zao> Apartment too cold? Just touch a core HPX header - [1/1211]
<heller> :P
<heller> did scandinavia cool down again?
<zao> 18°C out now, was 8°C this morning.
<heller> great
<zao> It's currently tolerable.
<heller> can't wait to put my sweater back on ;)
<zao> heller: I applied some of my DragonFly fixes for NetBSD and wrote new memory counter code.
<heller> awesome
<zao> I think most tests build now, and all examples.
<heller> cool
<zao> Is there any test I can run to see if my memory counters are right?
K-ballo has joined #ste||ar
<zao> Haven't run the test suite at all, expecting great explosions.
<heller> run any program with /usr/bin/time -v and the memory performance counter to see if it makes sense
<heller> there is no automated test, IIRC
<zao> Had to dig deep to get a "block size" constant out of NetBSD, it seems very tuneable.
<zao> *page size
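Rather than hard-coding a page-size constant, a counter can ask the running system with the POSIX `sysconf(_SC_PAGESIZE)` call. The sketch below assumes the libkvm query has already produced a resident-set size in pages (on NetBSD presumably `kvm_getproc2()` filling a `struct kinfo_proc2` whose `p_vm_rssize` field holds that count; this is an assumption, not taken from zao's code). `resident_pages_to_bytes` is a hypothetical helper, not an HPX function.

```cpp
#include <cstdint>
#include <unistd.h>

// Hypothetical helper: convert a resident-set size reported in pages
// (e.g. by libkvm) into bytes, using the page size of the running system
// instead of a compile-time constant.
std::uint64_t resident_pages_to_bytes(std::uint64_t pages)
{
    // sysconf returns -1 on failure; report 0 bytes rather than garbage.
    long const page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return 0;
    return pages * static_cast<std::uint64_t>(page_size);
}
```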
<zao> Oh well, this build will run for well over an hour, in a VM on my desktop with "just" 16G of memory.
fjordprefect[m] has left #ste||ar ["Kicked by @appservice-irc:matrix.org : removing from IRC because user idle on matrix for 30+ days"]
<heller> wow
<zao> Have to clamp it to -j3 because of the partition tests.
<heller> yeah, it sucks
<heller> we really have to work on those...
<K-ballo> before they turn into a real problem
mcopik has quit [Ping timeout: 240 seconds]
AnSh has quit [Ping timeout: 240 seconds]
eschnett has quit [Quit: eschnett]
<zao> [1203/1203] Linking CXX executable bin/serialization_overhead
<zao> Everything built \o/
<zao> Ooh, some tests even pass.
eschnett has joined #ste||ar
<zao> 13% tests passed, 505 tests failed out of 578
<zao> 2: terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
<zao> 2: what(): Failed to get number of cores: HPX(no_success)
<zao> Pretty much all of them fail with this.
<zao> heller: Seems like NetBSD may be hosed in the same way DragonFly was w.r.t. our assumptions about topology.
<zao> I don't quite like our adherence to a particular hwloc hierarchy that happens to hold on Linux and current systems.
<K-ballo> no_success, very descriptive
<heller> zao: I see
<heller> one sec
<heller> so if I get this correctly, those particular systems only deal with PUs and not Cores and such?
<zao> As I understand it, they don't expose enough information in a way hwloc understands it, so it falls back to a rudimentary structure.
<zao> I don't have my DragonFly information handy, but I believe it was similar there.
<heller> If I had to guess, it's that line that throws the error: https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/threads/topology.cpp#L326
<heller> so this is hwloc version 1.11.7
<zao> We kind of rigidly assume that there's a core->PU hierarchy IIRC.
<heller> do you mind checking if hwloc 2 has the same problem?
<heller> yeah
<heller> the verbose output shows Cores and PUs though
<heller> could it be a problem with the combination VM + *BSD?
<zao> One of the outputs is from FreeBSD, which should work.
<zao> The other is from NetBSD, which doesn't.
<heller> ah, gotcha
<zao> Got to build hwloc2 from source, no package in pkgsrc for that.
<heller> if the problem is fixed with hwloc2, should we go into the trouble of fixing it for hwloc1?
<heller> if I am not mistaken, and we change that line here: https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/threads/topology.cpp#L324
<heller> to `if (num_cores < 0) { ... }` and add an additional `if (num_cores == 0) num_cores = 1;`, this should fix the problem
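A self-contained sketch of the change heller describes. The hwloc query (presumably `hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE)`) is replaced by a plain integer argument so this compiles without hwloc; `checked_core_count` is a hypothetical name, and `std::runtime_error` stands in for HPX's own exception type.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the core-count check in topology.cpp.
// `reported` is what hwloc returned for the number of Core objects:
// a negative value signals an error, zero means hwloc found no Cores.
int checked_core_count(int reported)
{
    if (reported < 0)
        throw std::runtime_error("Failed to get number of cores");

    // Treat "no Core objects" as a single core so later divisions
    // (e.g. PUs per core) never divide by zero.
    if (reported == 0)
        return 1;

    return reported;
}
```

As the log shows further down, clamping the count alone only moves the failure: code that later asks hwloc for the Core objects themselves still gets nullptr.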
<zao> Broke the build tree getting it to use hwloc2, so that test will take a while :D
aserio has joined #ste||ar
<zao> Same behaviour with hwloc 2.0.1, except there's a NUMANode at special depth -3.
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
<zao> Naive clamp of number of cores shifts the problem down into get_pu_number, /home/zao/stellar/hpx/src/runtime/threads/topology.cpp:344
<heller> because there are no objects of type HWLOC_OBJ_CORE?
<zao> The hwloc_get_obj_by_type HWLOC_OBJ_CORE call returns nullptr, as there is _no_ Core.
<heller> alright ... then, I guess, we have to make a complete alternative code path ...
<zao> Fairly sure we/I dug around this the last time with DragonFly and didn't come to any decent conclusion.
<zao> The interface to the topology object is a bit icky, but I guess one could spoof it a bit inside.
<heller> if (num_cores == 0) --> hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, num_pu)->logical_index;
<heller> something like that...
<heller> or, instead of HWLOC_OBJ_CORE, we get the HWLOC_OBJ_MACHINE type
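The PU fallback heller suggests can be modelled without hwloc. The sketch below uses a hypothetical, simplified topology (`hw_object`, `simple_topology`, `core_of_pu` are illustrative names, not HPX or hwloc types). When no Core level exists, each PU is treated as its own core and its logical index is returned.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of what hwloc reports. `parent_core` is
// an index into `cores`, or -1 when there is no Core level above the PU.
struct hw_object {
    unsigned logical_index;
    int parent_core;
};

struct simple_topology {
    std::vector<hw_object> cores;  // empty on systems that expose only PUs
    std::vector<hw_object> pus;
};

// Logical index of the core that owns PU `pu`. With no Core objects at
// all, fall back to the PU's own logical index.
unsigned core_of_pu(simple_topology const& topo, std::size_t pu)
{
    hw_object const& p = topo.pus.at(pu);
    if (topo.cores.empty() || p.parent_core < 0)
        return p.logical_index;
    return topo.cores.at(static_cast<std::size_t>(p.parent_core)).logical_index;
}
```

With real hwloc, the non-flat branch would presumably walk up from the PU with `hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_CORE, pu_obj)`, which is exactly the kind of Core lookup that comes back as nullptr on NetBSD.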
<zao> I'll poke around a bit after I get dinner sorted out.
jgolinowski has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
<K-ballo> I just found out about this AutoPCH folder VS has been creating lately; it seems to grow on every build
<K-ballo> I noticed when the project folder for a simple, silly project reached 3 GB
<K-ballo> hah, about 5 GB for hpx
jgolinowski has quit [Ping timeout: 276 seconds]
nikunj has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
jgolinowski has joined #ste||ar
jgolinowski has quit [Ping timeout: 248 seconds]
aserio has quit [Ping timeout: 265 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
jgolinowski has joined #ste||ar
<zao> More sadness in all the thread affinity mask functions, yay.
<nikunj97> Why are we using libdl?
<zao> nikunj97: Because that's where dlopen and friends live.
<zao> Except on OSes where they are in the actual libc.
<K-ballo> does CMAKE_DL_LIBS get that right?
<zao> Largely.
<zao> It does on NetBSD at least.
<nikunj97> zao, yes, I know what libdl does. what is it that we're trying to open at runtime?
<zao> nikunj97: Components, mostly.
<nikunj97> interesting, why are we not linking them directly in the executable?
<zao> Because some of the point is that you can dynamically load components you're not overly aware of at link time?
<zao> (correct me if I'm way off)
<nikunj97> yes, makes sense!
<zao> You should be able to delve into the codebase and see what might use the stuff in hpx/util/plugin.
<nikunj97> I got a bit sidetracked. I was comparing it with one of my previous implementations where I used it
<nikunj97> It is an entirely different case
<heller> We also use it to check for certain symbols in shared libraries, like additional command line options and stuff
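Probing for an optional symbol with `dlopen`/`dlsym` lets a missing symbol be treated as "not provided" rather than as an error. A minimal POSIX sketch, not HPX's `hpx/util/plugin` code: `exports_symbol` is a hypothetical helper that looks in the running program instead of a component library, and on glibc older than 2.34 it needs `${CMAKE_DL_LIBS}` (i.e. `-ldl`) at link time.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: does the running program (the main executable plus
// the shared libraries loaded at startup) export `symbol`?
bool exports_symbol(std::string const& symbol)
{
    // dlopen(nullptr, ...) returns a handle for the main program itself.
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (handle == nullptr)
        return false;

    dlerror();  // clear any stale error state before the lookup
    (void)dlsym(handle, symbol.c_str());
    // dlerror() is non-null only if the lookup failed.
    bool const found = (dlerror() == nullptr);

    dlclose(handle);
    return found;
}
```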
jgolinowski has quit [Ping timeout: 240 seconds]
jgolinowski has joined #ste||ar
<zao> Bah, I have no idea what I'm doing around all these masks, seems quite messy.
<zao> Heh, gdb crashed trying to inspect objects :D
<nikunj97> xD
<zao> I'm hanging up my hat for a while, hard to hack on code you don't know how it's supposed to work.
<zao> Got to write some test programs to inspect how it actually hangs together.
<zao> Not helped by terminal Vim mangling your source code if you work at your regular pace; you need to space out operations a lot for some reason.
jaafar has joined #ste||ar
aserio has joined #ste||ar
AnSh has joined #ste||ar
jgolinowski has quit [Quit: Leaving]
jgolinowski has joined #ste||ar
RostamLog has joined #ste||ar
<diehlpk_work> nikunj97, https://pastebin.com/Y3k8eYNi
<diehlpk_work> My HPXCL build does not work with the current master
<nikunj97> diehlpk_work, let me have a look
<diehlpk_work> Is it related to your changes? I just added one new example
<nikunj97> did it work before?
<nikunj97> with my implementation?
<nikunj97> diehlpk_work, which OS are you on?
<diehlpk_work> nikunj97, It happens on the circle-ci build
<diehlpk_work> OS is some Linux
<nikunj97> ohk
<zao> Did we make some breaking change in how you need to include hpx_main and friends?
<nikunj97> zao, no
<zao> Didn't we have to mess with existing examples in the HPX repo?
<zao> examples that used to work perfectly fine before?
<diehlpk_work> HPXCL was working with 7f3e67
<nikunj97> no we don't need to
<nikunj97> things work out of the box
<zao> nikunj97: We did have to do something recently. If you're not familiar with it, maybe you should take a look.
<zao> Some example that was "misusing" things.
<nikunj97> oh, I was not aware of it
<diehlpk_work> This file does not compile, but it did with a older version of hpx
<zao> Unfortunately this god-forsaken IRC client doesn't have easily accessible logs, so I can't dig it up.
<nikunj97> diehlpk_work, I can't see any main function in it
<K-ballo> try the web logs?
<nikunj97> the entry point is int main and not hpx_main. That's why the linker couldn't find main, hence the error
<zao> (thing being that I'm used to having logs on the filesystem in a terminal, which I can just grep the heck out of)
<nikunj97> renaming the function from hpx_main to main should make things right imo
<zao> nikunj97: So why did this use to work before, and is the breaking change documented?
<nikunj97> zao, iirc I did mention it. Hold on let me check the documentation I added
aserio has quit [Ping timeout: 240 seconds]
<nikunj97> zao, the entry point has been defined as main. I don't think other examples would run either (if they use hpx_main as an entry point), given they are using my implementation
<nikunj97> I can look into ways to add hpx_main as an entry point as well
AnSh has quit [Ping timeout: 272 seconds]
<zao> I mean, it has historically worked. Something you changed in your implementation changed this.
<zao> Whether it's something to document as "fix your application" or if it's something to special-case, that's up to you and the rest :)
<nikunj97> This is where its entry point was handled
<nikunj97> from what I could dig up in last few minutes
<diehlpk_work> nikunj97, I will focus on the paper deadline and will look later into the changes
<nikunj97> diehlpk_work, you could try changing the function name from hpx_main to main. Things should work. In the meantime I will look into it as well.
<diehlpk_work> nikunj97, Ok, I will try it tomorrow
<diehlpk_work> gtg
<nikunj97> diehlpk_work, ok
<diehlpk_work> I'll let you know tomorrow
<nikunj97> ok
<nikunj97> zao, I'll talk to hkaiser to learn more about it. If the system used to work for both main and hpx_main, then I can call the required function and let it decide: if the call returns to my function, I'll call main; otherwise the entry point will be hpx_main
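One way a library can "let it decide" which entry point the application provides is to declare both as weak references and check which one resolved. This is a swapped-in technique (GCC/Clang weak symbols on ELF platforms), not how HPX's `hpx_main.hpp` is implemented; `app_main`, `app_hpx_main`, and `dispatch_entry_point` are hypothetical names.

```cpp
#include <cstdio>

// Hypothetical application entry points, declared as weak references.
// If the application does not define one, its address is null at run
// time instead of causing a link error.
extern "C" int app_main(int argc, char** argv) __attribute__((weak));
extern "C" int app_hpx_main(int argc, char** argv) __attribute__((weak));

// Prefer app_main, fall back to app_hpx_main, report if neither exists.
int dispatch_entry_point(int argc, char** argv)
{
    if (app_main != nullptr)
        return app_main(argc, argv);
    if (app_hpx_main != nullptr)
        return app_hpx_main(argc, argv);
    std::fprintf(stderr, "no entry point defined\n");
    return -1;
}
```

An application that defines `extern "C" int app_hpx_main(int, char**)` would have that function picked up automatically, while one defining neither gets a clear runtime message instead of an unresolved-symbol link error.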
<ms[m]1> jgolinowski_: here btw, in case my PMs still aren't working
<ms[m]1> nikunj97: better check with hkaiser, but afaik hpx_main was always the entry point when using hpx::init/start explicitly and never worked when including hpx_main.hpp (and never needed to work)
<nikunj97> ms[m]1, did things work when you included hpx_main.hpp with no int main, only an int hpx_main?
aserio has joined #ste||ar
<ms[m]1> aha, not during my time, but that example is already quite old; it's possible that things were different before
<ms[m]1> 99% sure though that hpx_main without a main is not supposed to work now
<nikunj97> ms[m]1, yes currently with my implementation, it will not work. I wanted to know if things were same back before my implementation as well
<ms[m]1> nikunj97: yeah, by now I mean last year or so, i.e. before your implementation, don't know how it was in 2016
<nikunj97> ok, that gives me enough info to dig into how it works. Thanks a lot!
eschnett has quit [Quit: eschnett]
<ms[m]1> You might find something just by looking at the history of hpx_main.hpp
aserio has quit [Quit: aserio]
nikunj97 has quit [Quit: goodnight]
jbjnr has quit [Ping timeout: 265 seconds]
jaafar has quit [Ping timeout: 272 seconds]