hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
anushi has quit [Ping timeout: 260 seconds]
anushi has joined #ste||ar
eschnett_ has joined #ste||ar
eschnett has quit [Ping timeout: 240 seconds]
anushi has quit [Ping timeout: 240 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
parsa[w] has quit [Read error: Connection reset by peer]
parsa[w] has joined #ste||ar
anushi has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
nikunj has joined #ste||ar
<jbjnr> on power I've got problems with coroutines. I tried with GENERIC_COROUTINES ON and OFF but both give errors. I seem to recall that we had to use an older boost at one time (but I thought hk fixed it). Does anyone know if it's fixable with another option, or boost version etc. heller___ ?
anushi has quit [Ping timeout: 240 seconds]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
anushi has quit [Ping timeout: 240 seconds]
anushi has joined #ste||ar
david_pfander has joined #ste||ar
anushi has quit [Ping timeout: 240 seconds]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
anushi has joined #ste||ar
<github> [hpx] msimberg pushed 2 new commits to master: https://git.io/fNkt9
<github> hpx/master ddbd108 Mikael Simberg: Fix some more c++11 build problems
<github> hpx/master ef04f40 Mikael Simberg: Merge pull request #3372 from msimberg/fix-c++11...
<jakub_golinowski> M-ms - for me all the perf tests passed both for the opencv from master (pthreads) and opencv from hpx_backend branch
<jakub_golinowski> among them the test that was requested by the opencv ppl - but the results are far from satisfying - in all cases opencv with the hpx backend is considerably slower; sometimes the overall test run time for opencv with the hpx backend is several times longer than the opencv with pthreads runtime
<M-ms> jakub_golinowski: do you mean (opencv master branch with pthreads backend) and (opencv hpx_backend branch with pthreads backend)?
<M-ms> jakub_golinowski: mmh, that's a problem
<jakub_golinowski> ah, no sorry I mean: (1) opencv master with pthreads backend vs. (2) opencv hpx_backend branch with hpx backend
<jbjnr> jakub_golinowski: the time taken for the hpx runtime to start up is quite long and can contribute to longer tests. I hope the test time reported for the graphs is only the computation part and does not include the startup.
<jakub_golinowski> jbjnr, in the mandelbrot benchmark it is only the computation time. However, note that in the case of the start-stop backend the runtime needs to be started after the call to parallel_for_() and therefore is included in the computation time (but this is a fair measurement, since in the start-stop version we always have to start and stop the runtime whenever we call parallel_for)
<jbjnr> ok
<jakub_golinowski> M-ms, you said something that employing HPX might somehow disrupt OpenCL - maybe this is the reason?
<M-ms> jakub_golinowski: yeah, may be but I don't know what, it just seemed to be a commonality for some of the failing tests
<M-ms> so we know now that HPX is a bit slower but not by much in your mandelbrot benchmark
<M-ms> but something else is happening in the perf tests that's making it really slow
<M-ms> so you could try profiling one of the perf tests (also filtered to just one test) and see if there are any hints
<M-ms> although before that, all perf tests pass now? so what about unit tests? any change there or still some failing?
<jakub_golinowski> M-ms, do you have something exact in mind when saying profiling? Use some special tool?
eschnett_ has quit [Quit: eschnett_]
<M-ms> jakub_golinowski: yeah, my favourite is perf because all you need to do is "perf my_program" (and build with debug symbols)
<M-ms> it's not fancy but it gives something useful quickly
<heller___> jbjnr: what are the problems?
<heller___> jbjnr: compiler? linker? runtime?
<jbjnr> , funp_(&trampoline<Functor>)
<jbjnr> wtf?
<jbjnr> users/biddisco/src/hpx/hpx/runtime/threads/coroutines/detail/context_generic_context.hpp:209:35: error: use of undeclared identifier 'Functor'
<jbjnr> and
<jbjnr> users/biddisco/src/hpx/hpx/runtime/threads/coroutines/detail/coroutine_impl.hpp:62:17: error: use of class template 'context_base' requires template arguments
<jbjnr> typedef context_base super_type;
<jbjnr> context type problems
<jbjnr> boost-1.65.1
<jbjnr> too new?
<jakub_golinowski> M-ms, as for the unit tests I think there are still some failing; I am rerunning them now to make sure no new segfaults pop up.
<jakub_golinowski> M-ms, do you want me to push the test results to the repo?
<jbjnr> I didn't spend any time looking at the code yet cos I have other things to work on too, but I do recall us having issues with boost context on power and maybe I need to do something. But I can't remember, so I thought I'd ask instead before spending time on it
<jbjnr> heller___: ^^^^
<jakub_golinowski> M-ms, thank you for the reference to profiler - I will familiarize myself with it
<heller___> jbjnr: strange, I was compiling a more or less recent HPX with generic context coroutines for ARM the other day
<heller___> using boost 1.6 something
<jbjnr> PowerPC
<heller___> boost 1.67
<heller___> sure
<heller___> let me check
jakub_golinowski has quit [Ping timeout: 256 seconds]
quaz0r has quit [Ping timeout: 240 seconds]
quaz0r has joined #ste||ar
jakub_golinowski has joined #ste||ar
anushi has quit [Ping timeout: 265 seconds]
anushi has joined #ste||ar
<M-ms> jakub_golinowski: nah, no need to push the test results
<jakub_golinowski> M-ms, so after filtering out a few tests the accuracy tests for pthreads from master vs. tests for hpx from hpx_backend have the same results in terms of fail/pass
<jakub_golinowski> M-ms, and the result is that all the tests pass - I realized I was not correctly passing the path to the opencv_extra/testdata and this was the reason for tests that were failing yesterday
<M-ms> jakub_golinowski: ok, I assume the ones you're filtering out are still only failing with the hpx backend? how many are we roughly talking about?
<jakub_golinowski> M-ms, sth like 10 tests
<M-ms> hmm, that actually sounds pretty good
<jakub_golinowski> M-ms, I remember mostly they were producing the segfault with the error of sth like longjump sth sth
<M-ms> so some of these are the ocl tests?
<jakub_golinowski> yes
<M-ms> ah, the ones we saw in the beginning
<M-ms> so non-core tests are mostly good then
<jakub_golinowski> M-ms, yes and some other tests that at first sight do not seem to have a lot in common (at least by looking at their names)
<jakub_golinowski> I modified the run_tests script a bit and will push it in a second
<M-ms> ok, thanks
<jakub_golinowski> there is the full list of failing tests - or to be more specific, tests that I have seen fail at least once
<M-ms> could you also make it run perf tests (if you haven't already)?
<jakub_golinowski> M-ms, yes yes it runs perf tests as well
<M-ms> ok, good
<jakub_golinowski> M-ms, and about the changes I have as a PR - can I merge then now?
<M-ms> jakub_golinowski: sorry, I didn't have time to look at it properly
<M-ms> the only thing I wanted to comment was that you don't in principle need to have project(xxx) and find_package(HPX) in all the sub-CMakeLists.txt, it would be enough to do that at the top level
<M-ms> on the other hand they now still work as standalone examples which is still nice
<M-ms> but please go ahead and merge so you're not stuck juggling branches
<heller___> jbjnr: can you please post the full error message?
<jakub_golinowski> M-ms, thanks - with the projects at the single-example level the idea was exactly to still be able to build a single example. For instance the mandelbrot benchmark is based on rebuilding the single opencv_mandelbrot example
<jbjnr> error messages for both type of context coroutines
<jbjnr> I can't remember which I'm supposed to use
<jbjnr> M-ms: maybe you know which options matthieu used on power
<heller___> jbjnr: you should use GENERIC_CONTEXT
eschnett has joined #ste||ar
<jbjnr> heller___: ok. remind me, -DHPX_WITH_GENERIC_CONTEXT_COROUTINES=ON means use boost, or use our own version?
<heller___> jbjnr: I am not sure where your error comes from
<heller___> this means to use boost
<jbjnr> that's odd. the version in git seems different to mine
<jbjnr> I'd better check I'm not holding some merges that messed it up.
<jbjnr> heller___: I have merged the "Changing the coroutine implementations to do a lazy init" into my branch and that's why it is different.
<heller___> ahh, i see
<heller___> could you try with regular master please?
<jbjnr> that's annoying, because I can't unmerge that as I made a load of changes to fit myself to that branch
<jbjnr> I am building master now, but if there is a chance you could fix that lazy init branch ....
<jbjnr> heller___: master https://gist.github.com/biddisco/b4666f77c426be10f300f9ff0c7bae37 I think I need to tell clang to use its own c++ stdlib instead of the gcc one.
<heller___> yup
jakub_golinowski has quit [Ping timeout: 265 seconds]
<M-ms> jbjnr: I don't know but I assume he didn't have any special options
<M-ms> But maybe that's why he was having problems later...
<jbjnr> M-ms: it's ok. turns out my branch is screwed up. redoing everything now with stdlib=libstdc++ and clean master branch
<M-ms> ok
<zao> Sounds like you're having about as much fun as I do on FreeBSD :)
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
hkaiser has joined #ste||ar
<nikunj> hkaiser: So I looked into the cmake source code in detail today. Turns out hpx_wrap got added due to its addition in HPX_PKG_LIBRARIES and not due to hpx_targets. I made the change and tested the code with Debug/Release with dynamic main turned on and off
<nikunj> Phylanx worked fine on my laptop with the HPX build
anushi has quit [Remote host closed the connection]
<hkaiser> nikunj: does the hpx build pass now?
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> nikunj: good, thanks
<hkaiser> let's merge this, then
<hkaiser> nikunj: for Phylanx, do we still need #519?
<nikunj> hkaiser: yes
<hkaiser> ok, let's retrigger that once HPX has cycled
<nikunj> in my Phylanx pull I've added 2 lines of code
<nikunj> One for the linker flag, second for the linking library
<hkaiser> k, is that in #519?
<nikunj> yes
<nikunj> that's why we will need #519 as well
<hkaiser> k
<jbjnr> boost help anyone?
<jbjnr> the boost libs on power are generated with names like libboost_system-clang70-mt-d-p64-1_67.so
<hkaiser> wazzup?
<hkaiser> yes
<jbjnr> but cmake does not add the p64 to the lib name and so cannot find it
<jbjnr> can I tell b2 to not add p64 to the names
<hkaiser> try a newer cmake
<zao> Ooh, the new CPU tags.
<hkaiser> iirc that was added recently only
<jbjnr> can I just turn it off?
<hkaiser> you can generate boost without the tags, yes
<jbjnr> rebuilding cmake will take longer than relinking boost
<jbjnr> and I have no guarantee that it will work with a newer cmake
<zao> Amusingly enough, they don't seem to have implemented the Power one.
<zao> jbjnr: ^
<jbjnr> thanks. That tells me what I need to know. newer cmake won't help
<zao> Anyway, building a Boost in non-"versioned" flavour should get rid of it, together with the rest of the useful tags.
<jbjnr> ok. I will remove the versioned tag
<jbjnr> thanks
<jbjnr> that was what I wanted to know
K-ballo has joined #ste||ar
<zao> Not sure if the layout "tagged" will work, or if you need to go down to "system".
<zao> Or building a newer CMake, with a FindBoost handpatched :P
<jbjnr> trying now with 'tagged'
<jbjnr> I will build newer boost if this fails
<jbjnr> I begin to see why our users dislike boost so much.
<jbjnr> zao: thanks. libs look good in tagged mode
<jbjnr> now I try cmake on them
<jbjnr> yat \o/
<jbjnr> yay \o/
<jbjnr> I meant.
<jbjnr> works!!!!
<zao> \o/
<jbjnr> zao: cpp tests compiling, linking and giving the right output. All is well with the world. Cheers.
<jbjnr> now for hpx again ....
<nikunj> jbjnr: HPX cmake for some reason doesn't find boost if the layout is set to system or versioned but finds it with layout set to tagged. That's one thing I faced while installing HPX
<jbjnr> nikunj: ok. I usually use versioned without problems, but it looks like some new stuff is added and cmake has not caught up.
<jbjnr> heller___: M-ms hkaiser pycicle is being upgraded so that you can now trigger N builds per PR, with random combinations of CMake options and settings for each build. Options can be dependent on other options (so if HPX_WITH_CUDA is ON, then blah blah blah can be added with random combinations of sub options). Also, boost versions, compiler type (gcc/clang) can be handled in the same pass, so we now have a way of replicating buildbot
<jbjnr> and going further because we can explore option spaces more thoroughly.
<jbjnr> I will need help setting up good combinations of options for testing.
mcopik has joined #ste||ar
<jbjnr> anyone : openpower01:~/build/hpx$ bin/hello_world --hpx:threads=4
<jbjnr> terminating with uncaught exception of type std::invalid_argument: hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<jbjnr> Aborted (core dumped)
<jbjnr> Can someone remind me what causes the error above^^^ ?
<nikunj> jbjnr: which HPX build are you using?
quaz0r has quit [Ping timeout: 260 seconds]
<jbjnr> the one I have just done on a power machine
<nikunj> jbjnr: it might be due to a previous incomplete implementation that got merged
<nikunj> could you try building the current master?
<jbjnr> this is master
<jbjnr> from today
<jbjnr> M-ms: can you remember what causes the RP error above?
jakub_golinowski has joined #ste||ar
<nikunj> jbjnr: my commit got merged like an hour ago. Could you retry with that?
<jbjnr> ok
<M-ms> jbjnr: HPX_WITH_MAX_CPU_COUNT?
<jbjnr> nikunj: your merge has nothing to do with this error
<jbjnr> M-ms: aha!
<jbjnr> thanks
<jbjnr> I check ...
<jbjnr> we need to fix that. its a huge PITA
<jbjnr> I forget every time
<M-ms> not sure if it's that but there's an equally unhelpful message if you forget that
<nikunj> jbjnr: That error also appears when you try to access HPX functionality without initializing HPX (I faced it quite a lot while implementing). So I got confused :/
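For context, a minimal sketch of the initialisation pattern nikunj is describing (illustrative only, not taken from the failing program): HPX facilities such as the resource partitioner are only used from hpx_main, after hpx::init has parsed the command line.

    #include <hpx/hpx_init.hpp>
    #include <hpx/include/iostreams.hpp>

    int hpx_main(int argc, char* argv[])
    {
        // HPX facilities (including the resource partitioner) are only touched
        // here, after hpx::init has parsed the command line.
        hpx::cout << "hello world\n" << hpx::flush;
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // Parses --hpx:threads etc., then invokes hpx_main on the HPX runtime.
        return hpx::init(argc, argv);
    }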
<jbjnr> I think I will add a CMake check that greps /proc/cpuinfo and gives a warning just in case. It won't be reliable because of login/compute node differences, but at least it might help
<M-ms> does someone know why we can't have the masks be dynamic? was it for a performance reason at some point?
<jbjnr> yes. for <64 we use an int with bit masks, for larger we use a bitset. I doubt there's much difference in speed though
<heller___> shouldn't
<jakub_golinowski> M-ms, could you tell me what I should expect from perf? Is there an option to have it tell me how much time was spent in each function or sth like this?
<heller___> we used to use a dynamic bitset once. that was a performance hit
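A rough sketch of the scheme jbjnr and heller___ describe (illustrative only, not the actual HPX headers): the CPU mask type is fixed at compile time from the configured maximum CPU count, which is why a node with more PUs than HPX was configured for causes trouble. HPX_MAX_CPU_COUNT_SKETCH below is a hypothetical stand-in for the value of the HPX_WITH_MAX_CPU_COUNT CMake option.

    #include <bitset>
    #include <cstdint>

    // Hypothetical stand-in for the configured HPX_WITH_MAX_CPU_COUNT value.
    #ifndef HPX_MAX_CPU_COUNT_SKETCH
    #define HPX_MAX_CPU_COUNT_SKETCH 64
    #endif

    #if HPX_MAX_CPU_COUNT_SKETCH <= 64
    // Up to 64 PUs: a single integer word used as a bit mask.
    using mask_type = std::uint64_t;
    #else
    // More than 64 PUs: a fixed-size bitset, still sized at compile time
    // (a dynamic bitset was tried once and cost performance).
    using mask_type = std::bitset<HPX_MAX_CPU_COUNT_SKETCH>;
    #endif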
<jbjnr> balls.
<jbjnr> fixing CPU count did not make error go away
quaz0r has joined #ste||ar
<M-ms> jbjnr: ?
<jbjnr> ?
<M-ms> just to be sure, you set it high enough? because it has something like 8 hyperthreads, no?
<jbjnr> grrrr
<jbjnr> lstopo gives me 160 PUs and I set it to 256
<jakub_golinowski> M-ms, I got as far as this: Overhead Command Shared Object Symbol
<jakub_golinowski> 42.07% opencv_perf_dnn libopencv_dnn.so.4.0.0 [.] _ZN2cv3dnn8opt_AVX28fastConvEPKfmS
<M-ms> jakub_golinowski: yeah, so I usually run "perf record -F99 --call-graph dwarf my_program"
<M-ms> -F sets the sampling frequency, call-graph dwarf was useful for some reason
<M-ms> it then outputs a file which you can view with perf report
<hkaiser> jbjnr: I still think this is a problem in the resource_manager
<hkaiser> we need to correct things there, using the cmake option just prevents it from happening
<M-ms> ah, yeah, so you probably need to recompile with RelWithDebInfo for it to be useful
<jakub_golinowski> M-ms, but actually this is from the debug build :/
<jbjnr> M-ms: hwloc 1.x or 2.0 on power? opinion?
<M-ms> note that hpx is going to have a lot of entries there while opencv is doing single threaded stuff
<jbjnr> no difference I would hope
<hkaiser> M-ms: any opinion on that?
<M-ms> hrm
<M-ms> jbjnr: I think neither was significantly worse when we tested with mathieu (but some affinity test was giving problems)
<jbjnr> I was just looking for possible reasons why the RP was choking.
<hkaiser> jbjnr, M-ms: I believe the problem you're seeing is at least diagnosable, if not preventable
<M-ms> hkaiser: you mean checking properly in the resource manager if hpx is configured with a bad max cpu count?
<hkaiser> M-ms: at least, if not find a way for the resource manager to use only as many cores as hpx was configured with
<jbjnr> it might not be an RP problem. it might be some unrelated error
<hkaiser> jbjnr: there is also the hpx_main snafu on master nikunj is trying to resolve, currently
<hkaiser> could be resolved now, could be not
<M-ms> mmh, yeah if it's cpu count related I'm sure we could do something better there, if it's something else depends on what that something else is...
<hkaiser> results in a similarily useless error message
<M-ms> jakub_golinowski: so are you getting more lines of output than that one line?
<hkaiser> M-ms: well, you can work around the issue by specifying the cpu count, so it looks to be related
<nikunj> jbjnr: could you please tell the environment you're on?
<hkaiser> jbjnr: last resort: use HPX_WITH_DYNAMIC_HPX_MAIN=OFF
<jakub_golinowski> M-ms, ah yes, of course - but I still get the somewhat obfuscated Symbols
<M-ms> jakub_golinowski: another approach would be to just time the parallel_for loop to see if that's where the time is spent or if it's somewhere else
<hkaiser> M-ms, jakub_golinowski: have you considered using APEX? it allows you to collect traces, timings, etc.
<jbjnr> nikunj: I'm on a powerpc node. OS is RedHat Enterprise Server 7.5. 160 cores. Shitload of memory. etc etc
<M-ms> hkaiser: ah, ok
<hkaiser> shows every hpx thread in the end
<jbjnr> Clang 7.0 compiled from source today
<hkaiser> jbjnr: is it one of those summit nodes?
<jbjnr> nikunj: hkaiser I did not realize that nikunj was working on something related to this. I will pull from master again now
<hkaiser> 6 GPUs?
<jbjnr> hkaiser: similar
<hkaiser> cool
<jbjnr> it's a local node here we are using for summit type testing
<hkaiser> nice
<nikunj> jbjnr: redhat does come with glibc. So my code should work. Please try with the current merged commit. If it's related to my implementation then it should (most likely) get fixed
<M-ms> hkaiser: that's a good idea, wanted to start off easy though
<hkaiser> k
hkaiser has quit [Quit: bye]
<jbjnr> clang-7: error: unable to execute command: Segmentation fault (core dumped)
<jbjnr> grrrrrr
<jbjnr> tried to recompile jemalloc
<jbjnr> (again)
<jakub_golinowski> M-ms, so basically I should not see Symbols like this: _ZN3hpx7threads10coroutines6detail2lx10trampolineINS2_14coroutine_implEEEvPT_
<jakub_golinowski> but sth more readable?
<nikunj> jakub_golinowski: are you trying to see assembly for a code?
<jakub_golinowski> nikunj, no no :D I am trying to profile a cpp application. Specifically one of the OpenCV tests
<jakub_golinowski> using perf for the first time and trying to first get to know what should I expect from it
<nikunj> jakub_golinowski: oh my bad then. Those symbols come in assembly so I was a bit curious
<jbjnr> nikunj: using latest master I get the same exception
<jbjnr> terminating with uncaught exception of type std::invalid_argument: hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.
<jbjnr> Aborted (core dumped)
<nikunj> jbjnr: It might not be related to my code then. As a last resort can you try building with -DHPX_WITH_DYNAMIC_HPX_MAIN=OFF
<jbjnr> building now
<nikunj> jbjnr: just to be sure, it comes with glibc right?
<jbjnr> nikunj: -DHPX_WITH_DYNAMIC_HPX_MAIN=OFF fixes it.
<jbjnr> Thanks
<jbjnr> I have hello world from 160 cores
<nikunj> jbjnr: so it is related to my implementation. Could you please tell me the result of: ldd --version
<jbjnr> openpower01:~/build/hpx$ ldd --version
<jbjnr> -bash: otool: command not found
<jbjnr> we are missing some bintools by the looks of it
<nikunj> jbjnr: that explains it!
<jbjnr> how?
<nikunj> My implementation requires glibc. It was not found in your machine
<nikunj> that is why it did not work. I could not find a macro that specifically targeted glibc, so I used linux in general (thinking most of them come with glibc by default)
<jbjnr> ok. makes sense. I am not much of an expert with the sysadmin side of stuff.
<jakub_golinowski> M-ms, not sure if I'm doing sth wrong but other libraries have more readable Symbols (including the .so from OpenCV), however HPX still has this style:
<jakub_golinowski> _ZNSt8_Rb_treeIPKvSt4pairIKS1_N3hpx4util6detail9lock_dataEESt10_Select1stIS8_ESt4lessIS1_ESaIS8_EE4findERS3_
<zao> Judging by paths mentioned earlier, it's running RHEL/CentOS?
<jbjnr> M-ms: building all tests on power now, will confirm Matthieu's findings. Can you remember if he had many fails?
<jbjnr> nikunj: Thanks very much for helping. I can actually get some work done now.
<jbjnr> zao: I love doing a make -j32 test :)
<jbjnr> I could try more, but ....
<nikunj> jbjnr: I will find a suitable solution to fix it out of the box. until then please use -DHPX_WITH_DYNAMIC_HPX_MAIN=OFF or install glibc
<zao> Lovely.
<zao> We only have 72 cores on the largemem nodes, 260+ on the KNLs but that's cheating.
<nikunj> woah! that's a lot of cores
<nikunj> and I'm yet to work on an 8 core machine :p
<zao> Had to go for four sockets to support 3TB of memory.
<zao> So many sticks :)
<nikunj> xD
<nikunj> K-ballo: yt?
<M-ms> jakub_golinowski: I've had more readable names at some point, but seem to have similar names when I'm checking now
<M-ms> I'll check at home as well
<M-ms> you should still be able to see if there's something obvious that sticks out
<M-ms> but consider timing just the parallel for loops as well, that's bound to work
<M-ms> jbjnr: I have a list somewhere but I think it was two or three tests that failed consistently
<jakub_golinowski> M-ms, also in the perf docu they use the term build-id - maybe it has to be somehow explicitly enabled for hpx?
<jakub_golinowski> M-ms, as for timing parallel for loops, you mean also with the use of perf?
<M-ms> jakub_golinowski: no, for that I mean just with a timer, I don't think perf can be made to look at just a section of code
<jakub_golinowski> M-ms, a timer like in the opencv_mandelbrot, just by changing the source code (moving the instrumentation into the binary)?
<M-ms> yep, the "dumb but simple" way
<M-ms> jbjnr: 455 - tests.unit.parcelset.distributed.tcp.put_parcels_with_coalescing (Failed)
<M-ms> 506 - tests.unit.threads.thread_affinity (Failed)
<M-ms> I'm not sure if the first one always failed, the second one was the main problem
<jbjnr> thanks. thread_affinity is a biggy
<jbjnr> that's what I look at now.
<jbjnr> [100%] Built target tests
<M-ms> I think it was with hwloc 1.X and 2.0.0, 2.0.1 may have fixed something
galabc has joined #ste||ar
<K-ballo> nikunj: I'm here now
<nikunj> K-ballo: do you know a way (probably a macro) to know if the code is linked with glibc?
<zao> predef.sf.net is (used to be) the holy grail for detection defines.
<K-ballo> sorry, I froze over linked there
<K-ballo> `__GLIBC__` should be fine to see if it was included
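A minimal sketch of the check K-ballo suggests: on glibc systems <features.h> (pulled in by essentially every libc header) defines __GLIBC__ and __GLIBC_MINOR__, so code that depends on glibc's startup path can be guarded with it.

    #include <cstdio>   // any standard header pulls in <features.h> on glibc

    int main()
    {
    #if defined(__GLIBC__)
        std::printf("glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
    #else
        std::printf("not glibc\n");
    #endif
        return 0;
    }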
hkaiser has joined #ste||ar
<jbjnr> M-ms: I think my build wins the epic fail contest https://gist.github.com/biddisco/10f1e782c86a0483bc752753c3d1c056
<jbjnr> interesting that the affinity looks like it passed. but there are clearly problems with parallel algorithms
<K-ballo> that's an odd collection of algorithms.. why those and not others?
<nikunj> K-ballo: let me try
<M-ms> impressive!
<M-ms> which hwloc?
<K-ballo> jbjnr: what's the output for 553?
<nikunj> zao: I too saw that link, the problem was the inclusion of that header `features.h`
<M-ms> and how many of those fail only because the timed out tests don't properly die?
<jbjnr> K-ballo: 553: /users/biddisco/src/hpx/tests/unit/util/function/function_arith.cpp(30): test 'f(5, 3) == 5.f/3' failed in function 'int hpx_startup::user_main()': '1.66667' != '1.66667'
<jbjnr> useful error message!
<jbjnr> M-ms: hwloc 1.11.10
<K-ballo> what kind of floating point you have over there?
<jbjnr> I think
<jbjnr> K-ballo: no idea. I'd better check
<K-ballo> -fast-math?
<jbjnr> whatever they give us on PowerPC
<jbjnr> yes I used fast-math
<K-ballo> ok
<nikunj> hkaiser: could you please retrigger the Phylanx PR
<hkaiser> nikunj: has hpx passed now?
<nikunj> yes
<hkaiser> done
<jbjnr> K-ballo: https://gist.github.com/biddisco/c05978233004587758b2abca1c954477 I'm going to stick my neck out and surmise that we have a bug in our find implementation ...
<zao> **gasp**
<jbjnr> __kernel_sigtramp_rt64 is new to me
<zao> Assumedly involved in failure handling on your platform.
<zao> I can't find the logs, but I could swear that that test failed the other day for me.
<jbjnr> yes. I suppose that catches something, then the stack backtrace dies and segfaults. Not sure what the original error is
<zao> Huh... didn't we change all these tests over to something not rand()/srand()?
<zao> Ah, some of the failing ones use generators, probably not that then.
mbremer has joined #ste||ar
<nikunj> hkaiser: seems like all tests have passed!
<jakub_golinowski> M-ms, hkaiser: I built HPX in RelWithDebInfo mode and still get the somewhat "obfuscated" perf Symbols:
<jakub_golinowski> _ZN3hpx7threads6detail15scheduling_loopINS0_8policies30local_priority_queue_schedulerISt5mutexNS3_13lockfree_fifoES6_NS3_13lockfree_lifoEEEEEvmRT_RNS▒
<M-ms> jakub_golinowski: I don't know why that's happening
<M-ms> some more basic things you could try are change the number of threads hpx uses, and play with the idling settings
<jakub_golinowski> M-ms, to probe the performance increase
<jakub_golinowski> ?
<M-ms> you'll get 8 threads by default and it might be that the serial parts of the tests are slowed down significantly by a second thread spinning in the scheduling loop on the same core
<M-ms> just to see if it makes some difference to start with, don't care about exact numbers yet
<hkaiser> jakub_golinowski: have you tried to feed those symbols through c++filt?
<jakub_golinowski> hkaiser, no, what is that?
galabc has quit [Quit: Leaving]
<jakub_golinowski> Ah I see
<jakub_golinowski> hkaiser, it works :D
<hkaiser> jakub_golinowski: you can pass the whole file through it, it will ignore everything but the symbols
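For completeness, the demangling that c++filt performs can also be done programmatically through the compiler's ABI helper; a small sketch (the mangled string is one of the names from the perf output above):

    #include <cstdlib>
    #include <cxxabi.h>
    #include <iostream>
    #include <string>

    std::string demangle(const char* mangled)
    {
        int status = 0;
        // abi::__cxa_demangle allocates the result with malloc; status 0 means success.
        char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
        std::string result = (status == 0 && demangled) ? demangled : mangled;
        std::free(demangled);
        return result;
    }

    int main()
    {
        std::cout << demangle(
            "_ZN3hpx7threads10coroutines6detail2lx10trampolineINS2_14coroutine_implEEEvPT_")
                  << '\n';
    }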
<nikunj> hkaiser: all tests have passed with #519. This should fix things for now.
<hkaiser> nikunj: let me merge things and retrigger all the other PRs
david_pfander has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
mcopik has quit [Ping timeout: 265 seconds]
mcopik has joined #ste||ar
<jakub_golinowski> M-ms, hkaiser: I found out that the package perf for ubuntu is built with NO_DEMANGLE
<jakub_golinowski> I rebuilt the package manually, changing NO_DEMANGLE to 0, and now it works without the c++filt trick
jakub_golinowski has quit [Quit: Ex-Chat]
nikunj has quit [Quit: Leaving]
jakub_golinowski has joined #ste||ar
nikunj has joined #ste||ar
anushi has joined #ste||ar
<nikunj> jbjnr: yt?
<jbjnr> nikunj: just logged in from home
<nikunj> ohk, jbjnr could you share the options that ld gives you in Redhat (run: man ld and share a gist) the next time you operate on it.
<jbjnr> jakub_golinowski: you should try --hpx:threads=4 --hpx:bind=balanced to use only one PU per core
<jbjnr> nikunj: I'll try now
<jakub_golinowski> jbjnr, in order to boost performance?
<nikunj> jbjnr: thanks!
<jbjnr> jakub_golinowski: yes, if you have 4 cores, 8 hyperthreads, you often find that performance drops when you use both PUs on a core compared to just one
hkaiser has joined #ste||ar
<jakub_golinowski> ok, thanks for the hint!
<jakub_golinowski> jbjnr, I will run the mandelbrot benchmark with this config
<nikunj> jbjnr: could you run this code as well and tell me if it prints anything: https://gist.github.com/NK-Nikunj/f13dfc69e3deb580b421984c64fe8147
<jbjnr> use N/2 or however many actual cores you have
<jakub_golinowski> jbjnr, right
<jakub_golinowski> but actually sth went wrong and I cannot build opencv any more :/
<jbjnr> nikunj: I have added GLIBC to an existing check I have and the output is here https://gist.github.com/biddisco/4b9080a805775c4a9858b649a556bad5
<jbjnr> near the bottom
<nikunj> so glibc is defined
<nikunj> jbjnr: did you add glibc today or was it already present?
<jbjnr> I did not add it
<jbjnr> I compiled a small test using the same compiler settings as I use for HPX
<zao> The libc of choice on a system tends to be rather fixed in stone.
<zao> Unless you go extremely out of your way to build a separate one, but it won't be able to interop much with the world.
<nikunj> then this would mean glibc has defined its startup code differently for powerpc
anushi has quit [Remote host closed the connection]
<nikunj> in that case, I'll have to look up the source code for powerpc as well
<jbjnr> nikunj: https://github.com/biddisco/cpptest/blob/master/pi-lockfree.cxx is a test I wrote when I had problems with HPX on raspberry pi, I just added __GLIBC__ to it and ran it
<zao> jbjnr: Did you mention what kind of hardware and distro this was?
* zao prays it's not a Cray.
<jbjnr> gtg dinner is ready
<jbjnr> back in 25mins
<zao> enjoy!
<jakub_golinowski> nikunj, can this error be connected with ongoing changes: https://pastebin.com/Ln5VfHYW
<jakub_golinowski> hmm I think it might be not
<nikunj> jakub_golinowski: it is linked with my errors
<nikunj> jakub_golinowski: could you try building HPX from current master
<nikunj> that should fix things
<jakub_golinowski> nikunj, ok
<nikunj> jakub_golinowski, what environment are you using?
<jakub_golinowski> nikunj, ubuntu 16.04
<jakub_golinowski> gcc 5.4.0
<jakub_golinowski> but this error seems to only emerge when I try to build opencv against hpx in RelWithDebInfo mode
<nikunj> oh yes, I can understand. That's coz I mistakenly exported CMAKE_EXE_LINKER_FLAGS, which is the root cause of all the disruption you see
<nikunj> It added my linker flag (savior for hpx, killer for others) to it, so anything but an hpx executable would fail to build
<nikunj> I changed it today and it is now merged, so things should run fine now
<jakub_golinowski> nikunj, ok, rebuilding hpx
<nikunj> jbjnr: could you please share the assembly of a simple hello world program
anushi has joined #ste||ar
<hkaiser> nikunj: yt?
<hkaiser> nikunj: things are much better now, but not entirely good yet
<hkaiser> please look at the phylanx buildbot here: http://ktau.nic.uoregon.edu:8020/#/
<hkaiser> this might be a better view: http://ktau.nic.uoregon.edu:8020/#/builders
<hkaiser> three (out of nine) platforms pass now
anushi has quit [Remote host closed the connection]
<hkaiser> nikunj: better yet here: http://ktau.nic.uoregon.edu:8020/#/console
<nikunj> hkaiser: I'll have a look
<nikunj> hkaiser: are these separate architectures?
jakub_golinowski has quit [Ping timeout: 256 seconds]
<hkaiser> yes, powerpc, knl (XeonPhi) and a x86 system
<nikunj> hkaiser: ok
<nikunj> I'll have a look
<hkaiser> thanks
jakub_golinowski has joined #ste||ar
anushi has joined #ste||ar
<nikunj> hkaiser: do we have a similar buildbot for hpx?
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
<hkaiser> nikunj: no, but hpx is being built there as well (no tests are run, though)
<nikunj> hkaiser: ok
anushi has quit [Remote host closed the connection]
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
anushi has joined #ste||ar
<nikunj> hkaiser: I see only one failing test everywhere, am I missing anything?
<jbjnr> if you need extra debug symbols etc - I can recompile - think this was release
<nikunj> jbjnr, no that's enough
<nikunj> I only wanted to see how they handle call to __libc_start_main
<jbjnr> watching england-croatia now, so won't be replying much
<nikunj> jbjnr: England will win
nikunj has quit [Quit: Leaving]
<jbjnr> wow - what a goal!!!!
nikunj has joined #ste||ar
<M-ms> is phylanx using some fancy new version of buildbot?
<M-ms> jakub_golinowski: good that you got perf working
<M-ms> I'm going to rebuild all my stuff with latest masters as well now
<M-ms> try not just the mandelbrot benchmark with 4 threads but also the opencv perf tests
<nikunj> jbjnr: Please run the executable generated from the makefile here (https://github.com/NK-Nikunj/GSoC-experimental-codes/tree/master/powerpc) whenever you get time. If the output is not the same as written in README.md then please notify me about it.
hkaiser has quit [Quit: bye]
<jaafar> Someone challenged me on Twitter so I'm going to pull out some HPX on them https://twitter.com/lemire/status/1017058602161463296
<jaafar> Don't let me down, y'all
eschnett has quit [Quit: eschnett]
anushi has quit [Ping timeout: 264 seconds]
anushi has joined #ste||ar
<zao> :)
<nikunj> so __libc_start_main does not get wrapped!
<nikunj> That's the root cause of non-working hpx build on Powerpc then
<nikunj> jbjnr: could you provide me with the assembly of main?
<nikunj> that would help me decipher things
jakub_golinowski has quit [Ping timeout: 256 seconds]
jakub_golinowski has joined #ste||ar
hkaiser has joined #ste||ar
<jbjnr> nikunj: added to gist at bottom
<jbjnr> <nikunj> so __libc_start_main does not get wrapped!
<jbjnr> <nikunj> That's the root cause of non-working hpx build on Powerpc then
<jbjnr> <nikunj> jbjnr: could you provide me with the assembly of main?
<jbjnr> <nikunj> that would help me decipher things
<jbjnr> ◀━━ Quits: jakub_golinowski (~jakub@2a00:23c0:3201:c601:8c09:1667:2f74:f10b) (Ping timeout: 256 seconds)
<jbjnr> ━━▶ Joins: jakub_golinowski (~jakub@host31-52-138-38.range31-52.btcentralplus.com)
<jbjnr> ━━▶ Joins: hkaiser (~hkaiser@2600:1700:a50:99a0:cc2c:a554:2a09:bd8d)
<jbjnr> ❮▲❯ ChanServ gives channel operator status to hkaiser
<jbjnr> oops. keyboard error
<jbjnr> not sure what happend. Football is back on. bbiab
<nikunj> Now it's getting even stranger. __libc_start_main is getting wrapped but still the program is not working T-T
<nikunj> jbjnr, could you try building it again: https://github.com/NK-Nikunj/GSoC-experimental-codes/tree/master/powerpc
<nikunj> hkaiser, yt?
<hkaiser> here
<nikunj> hkaiser: there is one test failing in all the builds
<hkaiser> why is it red, then?
<nikunj> hkaiser, also from the stack frame I can see HPX_WITH_DYNAMIC_HPX_MAIN=OFF
<nikunj> anything you did?
<hkaiser> ok
<hkaiser> I'll look
<hkaiser> ok
<hkaiser> even that does not prove anything ;-)
<nikunj> it does not
<nikunj> I'm happy that most of the errors have passed
<hkaiser> nikunj: :D
<hkaiser> yah, good job!
<nikunj> jbjnr, I have changed the function signature, could you try it again: https://github.com/NK-Nikunj/GSoC-experimental-codes/tree/master/powerpc
<diehlpk_work> jbjnr, Exciting game :)
<nikunj> diehlpk_work, very exciting!
<jbjnr> nikunj: pulled from master but it gave me a conflict - I assume you force pushed. I reset --hard origin/master and recompiled. Same output, just main
<nikunj> jbjnr, yes it was a force push
<nikunj> jbjnr, i see
<nikunj> jbjnr, could you tell if it's a powerpc or a powerpc64?
<jbjnr> 64
<nikunj> also which environment is it (with version)
<jbjnr> what do you mean? what do you want to know
<nikunj> which version of redhat are you using?
mbremer has quit [Quit: Page closed]
mcopik has quit [Ping timeout: 240 seconds]
mcopik has joined #ste||ar
<M-ms> sorry jbjnr
<nikunj> I feel sad for england right now
<nikunj> jbjnr, I made changes specific to powerpc. Could you run make again? (https://github.com/NK-Nikunj/GSoC-experimental-codes/tree/master/powerpc)
jakub_golinowski has quit [Ping timeout: 244 seconds]
<jbjnr> nikunj: /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../lib64/crt1.o: In function `_start':
<jbjnr> (.text+0x24): undefined reference to `__wrap___libc_start_main'
<jbjnr> clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
<diehlpk_work> hkaiser, Could you finish your gsoc evaluation by today?
<jbjnr> nikunj: I tried adding extern "C" to the __wrap_xxx but it segfaulted when I did that
jakub_golinowski has joined #ste||ar
<nikunj> ok
<nikunj> hkaiser, yt?
<hkaiser> nikunj: here
<nikunj> I have a wonderful idea (which I should have thought of before). Instead of wrapping __libc_start_main, let's wrap main instead. The call to main is the same for every architecture, so it will work flawlessly
<hkaiser> ok
<nikunj> I thought that __libc_start_main would work similarly (so I never bothered changing main) but it turns out a slight difference in its definition won't allow it to work properly on powerpc
<hkaiser> ...or other platforms
<hkaiser> we new it was brittle
<hkaiser> knew*
<nikunj> yes
<nikunj> main should not be brittle
<nikunj> jbjnr, could you try the current master?
<nikunj> just to check if my main hypothesis is correct
<jbjnr> wrap_libc.cpp:(.text+0x44): undefined reference to `__real_main'
<nikunj> oh wait
<nikunj> I forgot to add a few things
<nikunj> jbjnr, try now, I've changed makefile
<jbjnr> nikunj: openpower01:~/src/GSoC-experimental-codes/powerpc (master<>)$ ./main
<jbjnr> __wrap_main
<jbjnr> main
<jbjnr> looks good
<nikunj> jbjnr, perfect!
<nikunj> so my hypothesis was right!
<jbjnr> what was the hypothesis
<nikunj> jbjnr, the --wrap option (from ld) provides a way to wrap a symbol and define your own wrapper for it. I chose __libc_start_main to wrap initially, thinking that we might be able to initiate the HPX runtime pretty early on. That failed pretty miserably, but the entry point was still changed to our own custom entry point.
<nikunj> Thinking that the solution was still portable enough, I implemented it, only to find out later that it won't work on powerpc
<nikunj> So now we are wrapping main instead, since that will be our new portable entry point
<nikunj> that should do things nicely
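A minimal sketch of the wrapping nikunj describes, assuming a standalone test program rather than the actual HPX sources: linking with -Wl,--wrap=main makes the startup code's reference to main resolve to __wrap_main, while __real_main resolves to the original main, which matches the "__wrap_main" / "main" output jbjnr reports below.

    // build (assumption): g++ wrap_main.cpp -Wl,--wrap=main
    #include <cstdio>

    extern "C" int __real_main(int argc, char** argv);  // resolves to the user's main

    extern "C" int __wrap_main(int argc, char** argv)
    {
        // In the real implementation this is where the HPX runtime would be
        // set up before any user code runs.
        std::printf("__wrap_main\n");
        return __real_main(argc, argv);
    }

    int main(int, char**)
    {
        std::printf("main\n");
        return 0;
    }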
<jbjnr> good work
<nikunj> jbjnr, thanks!
<nikunj> hkaiser: working on a pr now. Will try to get this merged, then rebase apple to merge it as well. sounds good?
<hkaiser> let's focus on one thing at a time
<nikunj> hkaiser: what should I focus on?
<hkaiser> finish linux support
<nikunj> hkaiser: the pr was to make the linux support better. This would prevent us from using multiple versions of __libc_start_main for different platforms
<hkaiser> nikunj: do you have ppc support under control now?
<nikunj> if we wrap main, we will have it under control. Things work on jbjnr's machine
<nikunj> hkaiser: changing the wrapper symbol will help us gain better overall control over the linux platform
<hkaiser> nikunj: isn't that what I suggested? finish linux support?
<nikunj> hkaiser: ohk, on it then, pr will arrive soon
<hkaiser> nikunj: is main a weak symbol too?
<nikunj> if it's letting us wrap it then yes it is as well
<nikunj> hkaiser: i'm not too sure though
<hkaiser> also, pls be careful, hpx_init defines its own main (which would have to be wrapped with a pp constant anyways in your case, now that I think about it)
<hkaiser> it even defines several versions of it
<nikunj> hkaiser: actually the thing with --wrap is it lets you wrap any function (weak or strong)
<hkaiser> does it?
<nikunj> Basically it tells the linker to resolve calls to the symbol to __wrap_<symbol_name> instead of the actual one
<nikunj> It's very much like the dlsym method. But the plus point here is that it can be extended to static executables as well
<nikunj> in the case of dlsym it is limited to dynamic executables only
<nikunj> that was the primary reason I chose wrap over dlsym. dlsym had one thing done right: if you chose to call dlsym when the symbol does not exist, then it would simply not refuse to work. In the case of wrap we had to add checks to get things right
<nikunj> *simply refuse to work
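For contrast, the dlsym route nikunj mentions would look roughly like the sketch below: an interposer built as a shared object and injected with LD_PRELOAD, which only works for dynamically linked executables. The exact __libc_start_main prototype varies slightly between glibc versions, so treat the signature as an assumption.

    // build (assumption): g++ -shared -fPIC interpose.cpp -o interpose.so -ldl
    // run   (assumption): LD_PRELOAD=./interpose.so ./my_program
    #include <dlfcn.h>   // RTLD_NEXT is a GNU extension (_GNU_SOURCE, default for g++)
    #include <cstdio>

    extern "C" int __libc_start_main(
        int (*main_fn)(int, char**, char**), int argc, char** argv,
        int (*init)(int, char**, char**), void (*fini)(void),
        void (*rtld_fini)(void), void* stack_end)
    {
        using start_main_t = int (*)(int (*)(int, char**, char**), int, char**,
            int (*)(int, char**, char**), void (*)(void), void (*)(void), void*);

        // Look up the next (real) definition of the symbol, i.e. the one in glibc.
        auto real_start = reinterpret_cast<start_main_t>(
            dlsym(RTLD_NEXT, "__libc_start_main"));

        std::printf("interposed __libc_start_main\n");
        // A real wrapper would substitute its own main here instead of main_fn.
        return real_start(main_fn, argc, argv, init, fini, rtld_fini, stack_end);
    }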
<hkaiser> k
<nikunj> hkaiser, check here to read about --wrap. I read it from here as well: https://linux.die.net/man/1/ld
<hkaiser> k
<jaafar> OK transform_reduce with par execution policy does pretty well, scales nicely
mcopik has quit [Ping timeout: 240 seconds]
<nikunj> hkaiser, build passes. Tests working out fine too
<hkaiser> jaafar: HPX?
<hkaiser> nikunj: nice
<jaafar> hkaiser: you got it :)
<hkaiser> jaafar: cool
<jaafar> It was just a dot product, I thought "there's an algorithm for that" :)
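A hedged sketch of the kind of measurement jaafar mentions: a dot product expressed with HPX's parallel transform_reduce and the par execution policy. Header names and namespaces follow the HPX 1.1-era API and are assumptions here, not jaafar's actual benchmark.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/parallel_transform_reduce.hpp>

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::size_t const n = 1 << 24;
        std::vector<double> a(n, 1.5), b(n, 2.0);

        double const dot = hpx::parallel::transform_reduce(
            hpx::parallel::execution::par,
            a.begin(), a.end(), b.begin(),
            0.0,                                         // initial value
            [](double x, double y) { return x + y; },    // reduction
            [](double x, double y) { return x * y; });   // element-wise product

        std::cout << "dot = " << dot << "\n";            // expect n * 3.0
        return 0;
    }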
mbremer has joined #ste||ar
<jaafar> If you want to follow on Twitter: https://twitter.com/lemire/status/1017058602161463296
<hkaiser> jaafar: thanks!
<hkaiser> jaafar: do you have your results posted somewhere?
quaz0r has quit [Ping timeout: 256 seconds]
<nikunj> hkaiser, phylanx builds and runs perfectly as well.
jbjnr has quit [Read error: Connection reset by peer]
<jaafar> hkaiser: ah no I didn't actually post them but I have them in a terminal window right now :)
mcopik has joined #ste||ar
<github> [hpx] NK-Nikunj opened pull request #3375: Replacing wrapper for __libc_start_main with main (master...Linux_better_impl) https://git.io/fNIXC
nikunj has quit [Quit: goodnight]
<jakub_golinowski> M-ms, for your reference and to start a discussion about how to read the perf logs: https://pastebin.com/8NueJTEa
mcopik has quit [Ping timeout: 260 seconds]
quaz0r has joined #ste||ar
quaz0r has quit [Ping timeout: 268 seconds]
hkaiser has quit [Read error: Connection reset by peer]
diehlpk has joined #ste||ar
quaz0r has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 240 seconds]