aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
vamatya has quit [Ping timeout: 252 seconds]
eschnett has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
mcopik has quit [Ping timeout: 248 seconds]
jfbastien has quit [Read error: Connection reset by peer]
parsa has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
parsa has quit [Quit: Zzzzzzzzzzzz]
bikineev has quit [Remote host closed the connection]
vamatya has joined #ste||ar
vamatya has quit [Read error: Connection reset by peer]
vamatya has joined #ste||ar
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
bikineev has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
Matombo has joined #ste||ar
<github> [hpx] biddisco pushed 1 new commit to master: https://git.io/v5ch5
<github> hpx/master 1a50ac6 John Biddiscombe: Merge pull request #2865 from STEllAR-GROUP/fixing_2862...
<github> [hpx] sithhell closed pull request #2782: Replace boost::atomic with std::atomic (where possible) (master...std-atomic-lite) https://git.io/v7tSF
<heller> jbjnr: what was the reason appveyor failed?
<jbjnr> timeout?
<heller> ok
<github> [hpx] sithhell deleted fixing_2862 at 6a34d9b: https://git.io/v5cjJ
<heller> I made some tweaks to buildbot, hopefully getting rid of all those strange failures
<jbjnr> Happy belated birthday to your wife. During our Skype call we tried to make some plans for what we will do next week, and who'd do what. Since you won't be there, we must scale back our ambitions to a bare minimum of getting kernels in place for FMM etc.
<heller> i'll plan on being virtually present
<heller> also, I am fixing the CUDA compile errors we have right now
<jbjnr> very good
<jbjnr> I want to test cuda on my laptop too for fast turnaround and debugging
<jbjnr> (I mean cuda+hpx)
<heller> yes
<heller> I am a little stuck at the moment with the compiler errors
<heller> but i'll get there
<jbjnr> So I had a meeting about hpx in Zurich yesterday and the upshot is that if we don't match the parsec results soon, then CSCS will have to start looking elsewhere.
<heller> ok
<heller> what's missing?
<heller> and (totally serious) what are the alternatives?
<heller> legion? thrust?
<jbjnr> HPX is perceived as a nice library written by people who know nothing about HPC; the expectation is that we will simply join with the US labs eventually. For the time being, I will continue with my hpx experiments and we may have a second CSCS chap working on stuff too.
<heller> knowing nothing about HPC might not be a bad thing :P
<jbjnr> well, the initial results we got with the cholesky were totally shit and I've had to rewrite the schedulers and implement thread pools to make any progress. If HPX were written with HPC in mind, this would already have been done
<jbjnr> none of the hpx developers are actually testing HPX against the competition - when was the last time you wrote an algorithm with another library and compared it to hpx? legion? starpu? others?
<heller> true
<heller> when was the last time I wrote a complete algorithm altogether
<jbjnr> I just hope I can get the cholesky up to parsec, then everyone will be happy.
<jbjnr> memory layout turns out to be different between our version and parsec, so there may be more work :(
<heller> wow
<heller> that's surprising
<heller> and might indeed be the cause for the difference
<jbjnr> my big problem at the moment is this ...
<heller> tell me
<heller> where's parsec in that picture?
<jbjnr> the 512 block size gets worse with more threads and when we look at the timing of each dgemm, we see this ...
<heller> what's the parsec performance?
<heller> for the 512 block size, it seems that your new scheduler isn't as effective
<heller> what's the reason to use 512 block? 256 block seems to be way better
<jbjnr> the time for a single dgemm is 8ms, but we have a significant number taking 16+
<jbjnr> someone is stealing our cpu timeslices
<heller> ahh, that picture again :
<jbjnr> new version of old pic
<heller> ok
<jbjnr> with 256 block size, we do not see this effect, as it generally finishes in one timeslice
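A quick back-of-the-envelope supports these timings (a sketch only; the ~33 GFLOP/s sustained per-core dgemm rate is an assumed figure, not from the discussion): a b x b x b dgemm costs 2*b^3 flops, giving roughly 8 ms for a 512 block and 1 ms for a 256 block. An 8 ms task is likely to straddle an OS scheduling quantum, so a single preemption roughly doubles it to the observed 16+ ms, while a 1 ms task fits comfortably inside one timeslice.

    // Back-of-the-envelope dgemm cost, assuming ~33 GFLOP/s sustained per core
    // (an illustrative assumption; measure the real rate on the target machine).
    #include <cstdio>

    int main()
    {
        const double gflops_per_core = 33.0;            // assumed dgemm throughput
        for (long b : {256L, 512L})
        {
            double flops = 2.0 * b * b * b;             // cost of one b x b x b dgemm
            double ms = flops / (gflops_per_core * 1e9) * 1e3;
            std::printf("block %4ld: %5.1f ms\n", b, ms);  // prints ~1.0 and ~8.1
        }
    }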
<heller> aha
<heller> strange
<heller> we removed the thread niceness level, right?
<jbjnr> so when we run hpx on fewer threads, we see improved perf in some cases - cos the system (?) jitter is not causing that problem
<jbjnr> I believe we removed it
<heller> do you know the idle rate for the 512 block run?
<jbjnr> it's practically zero
<jbjnr> I have disabled all perf counters
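For reference, re-enabling just the idle-rate measurement later is cheap: HPX exposes it as a performance counter when built with HPX_WITH_THREAD_IDLE_RATES=On, queried with something like --hpx:print-counter=/threads{locality#0/total}/idle-rate (counter path quoted from memory; the docs have the exact spelling).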
<heller> ok
<jbjnr> but I can see the task traces from Raffaele's profiler when I use it
<heller> that probably helps
<heller> do you have likwid available?
<jbjnr> there is no wasted CPU time, there are only blocks taking longer than they should.
<jbjnr> (no wasted CPU time in the more optimal cases, at least)
<heller> sure
<heller> would be nice to read out the hardware perf counters and see what's going on
<heller> cache misses, memory transfers etc.
<jbjnr> getting papi working is on my list
<jbjnr> bbiab
<heller> mind you that papi has significant overhead
<heller> likwid might give you an immediate first picture
<heller> (or vtune)
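As a concrete starting point, likwid-perfctr can sample an uninstrumented run end to end, e.g. likwid-perfctr -C 0-15 -g MEM ./app (core list, group, and binary are placeholders; likwid-perfctr -a lists the groups available on a given CPU).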
jaafar has quit [Ping timeout: 252 seconds]
<heller> why is rostam so overly broken?
Matombo has quit [Remote host closed the connection]
<heller> all tests fail due to misconfigured nodes, that's depressing :/
david_pfander has joined #ste||ar
bikineev has quit [Remote host closed the connection]
mcopik has joined #ste||ar
bikineev has joined #ste||ar
hkaiser has joined #ste||ar
<heller> hkaiser: rostam is a little broken :/
<heller> hkaiser: can you pass me the email of alireza please, so I can make a proper bug report
<hkaiser> heller: it's akheir here
<heller> akheir: hey
<heller> hkaiser: thanks
<hkaiser> heller: Alireza Kheirkhahan <akheir1@lsu.edu>
<heller> thanks
vamatya has quit [Ping timeout: 240 seconds]
<jbjnr> is it time to consider buildbot a dead end and adopt some other technology?
<hkaiser> jbjnr: feel free
<hkaiser> we don't have anything better right now
<jbjnr> also - I understand circleci - but what is this appveyor thing?
<heller> jbjnr: visual studio testing
<heller> jbjnr: I am totally with you there
<heller> jbjnr: also, I don't think it is buildbot per se. rostam just can't keep up with the load
<heller> we should update to 0.9 or so. There is buildbot_travis, which looks pretty awesome
K-ballo has joined #ste||ar
heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller has joined #ste||ar
bikineev has quit [Ping timeout: 246 seconds]
<github> [hpx] hkaiser created serialize_boost_variant (+1 new commit): https://git.io/v5CgG
<github> hpx/serialize_boost_variant 2f083ab Hartmut Kaiser: Changed serialization of boost.variant to use variadic templates
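For context, the variadic version replaces the old fixed-arity expansion. A minimal sketch of the technique (assumed shape for illustration, not HPX's actual code): save which(), then peel types off the parameter pack on load until the stored index selects the active alternative.

    #include <boost/variant.hpp>
    #include <utility>

    template <typename Archive>
    struct save_visitor : boost::static_visitor<void>
    {
        Archive& ar;
        explicit save_visitor(Archive& a) : ar(a) {}
        template <typename T>
        void operator()(T const& t) const { ar << t; }   // save the active alternative
    };

    template <typename Archive, typename... Ts>
    void save(Archive& ar, boost::variant<Ts...> const& v)
    {
        ar << v.which();                                  // store the index first
        boost::apply_visitor(save_visitor<Archive>(ar), v);
    }

    template <typename Archive, typename V>
    void load_alternative(Archive&, V&, int) {}           // pack exhausted

    template <typename Archive, typename V, typename T, typename... Rest>
    void load_alternative(Archive& ar, V& v, int which)
    {
        if (which == 0) { T t; ar >> t; v = std::move(t); return; }
        load_alternative<Archive, V, Rest...>(ar, v, which - 1);
    }

    template <typename Archive, typename... Ts>
    void load(Archive& ar, boost::variant<Ts...>& v)
    {
        int which = 0;
        ar >> which;
        load_alternative<Archive, boost::variant<Ts...>, Ts...>(ar, v, which);
    }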
twwright has quit [*.net *.split]
twwright has joined #ste||ar
pree has joined #ste||ar
eschnett has quit [Quit: eschnett]
bikineev has joined #ste||ar
jaafar has joined #ste||ar
aserio has joined #ste||ar
eschnett has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
pree has quit [Ping timeout: 260 seconds]
parsa has joined #ste||ar
bikineev has joined #ste||ar
pree has joined #ste||ar
<diehlpk_work> hkaiser, jbjnr, mcopik, parsa: Final evaluation of GSoC students opened today
<jbjnr> did the students all submit the stuff they needed to do?
<aserio> heller: did you punch buildbot?
<aserio> :p
<heller> aserio: always
<heller> aserio: rostam is a bit broken
<aserio> heller: just a little
<aserio> Does most of it stem from the RP merge?
<heller> mainly software failures
<aserio> lol
<heller> no
<aserio> oh?
<heller> *rostam* is broken
<heller> HPX is still broken
<heller> but there is no way to determine the level of brokenness of HPX with the current brokenness of rostam
<aserio> akheir, heller: Do we know what is broken with rostam, or is that part of the issue?
<heller> I sent a bug report to akheir
<aserio> ok
<heller> he is working on one of the problems that occur
<aserio> good to know
<heller> for the other problem, I don't know exactly what's going on
<aserio> I will watch this space
<heller> something to do with mpi not working properly
<heller> either the IB drivers are misconfigured, or MPI is using the wrong drivers
<aserio> sigh, we need a distributed testing infrastructure
<heller> I agree
<heller> I would be implementing it
<heller> the problem is that you guys keep swamping me with other stuff :P
<aserio> ...maybe I should try to hire a student this semester to build it
<aserio> heller: it's not just you :p
<heller> make sure the student knows what he is up to
<heller> ask him what cmake, a compiler and a linker is
zbyerly_ has joined #ste||ar
<aserio> It would take a hacker to put this together
<heller> yup
<aserio> We don't need a C++ guy for the job though right?
<aserio> There might be a gem or two that I can find
<heller> we need a guy who knows what the C++ features mean
<heller> sorry, C++ error messages
<heller> and how to fix them
<heller> he should ideally know python
<heller> and ctest
<aserio> Does buildbot use python?
<aserio> Or are you imagining re-implementing all of HPX testing
<heller> buildbot uses pythong
<heller> pythin
<heller> python
<heller> I am playing with the idea of implementing something from scratch, yes ;)
* aserio sings the pythong song
<heller> lol
<heller> it's a thong song
<heller> mainly to support our use case of running stuff through slurm, and maybe also supporting running stuff over ssh
<heller> maybe even reusing the output of ctest directly
<aserio> I don't know if I will be able to find a student to do all of that
<aserio> But I would be happy to have a student who is a crack at buildbot
<heller> ;)
<heller> I think an upgrade to buildbot 0.9 would be good enough already
<aserio> I will try to put some feelers out there
<heller> ^^ this one here looks pretty awesome
zbyerly_ has quit [Ping timeout: 248 seconds]
bikineev has quit [Ping timeout: 240 seconds]
<heller> jbjnr: you broke the throttling scheduling policy :/
<heller> hkaiser: ^^
bikineev has joined #ste||ar
<jbjnr> heller: what's wrong? How do I check/see the error?
<heller> jbjnr: run any application with --hpx:queuing=throttle
<jbjnr> aha. probably all the command line queuing options will fail too. I'll have a look later
<heller> na, it seems it is just that one
mcopik has quit [Ping timeout: 260 seconds]
<heller> jbjnr: got it
<jbjnr> (some of the throttling changes might have got lost during merges etc).
<heller> yup
<heller> threadmanager.cpp didn't initialize the pools correctly
<jbjnr> have you fixed it - or do you need me to look at anything?
<heller> jbjnr: I think I fixed it
<heller> compiling and testing right now
<jbjnr> thanks
pree has quit [Read error: Connection reset by peer]
<github> [hpx] sithhell created fix_throttle (+1 new commit): https://git.io/v5Cpg
<github> hpx/fix_throttle e16c7bd Thomas Heller: Adding missing support for throttling scheduler...
<github> [hpx] sithhell opened pull request #2871: Adding missing support for throttling scheduler (master...fix_throttle) https://git.io/v5Cpw
<heller> jbjnr: ^^
<heller> just a few oversights
<jbjnr> thanks again
<diehlpk_work> jbjnr, Yes, all students submitted
<jbjnr> great. thanks
<hkaiser> heller: the throttle scheduler should be removed from the main hpx repo altogether
<hkaiser> heller: can be moved to user space now
<heller> hkaiser: I agree
<heller> hkaiser: but in the meantime, we need something to work
<hkaiser> pls create a ticket for this
<hkaiser> I didn't even look at it for this reason
<heller> an example showing how to achieve the same behavior would be nice
<hkaiser> there is such an example
<heller> not really ;)
<hkaiser> there is
<hkaiser> simple_resource_partitioner or so
<heller> the examples that use the resource partitioners are overly complicated
<hkaiser> feel free to create one
<heller> as said: in the meantime, we need something to work
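In the meantime, a minimal sketch of what such an example could look like with the resource partitioner (API names recalled from the HPX of this period and may differ between versions; pool name and PU selection are arbitrary):

    #include <hpx/hpx_init.hpp>
    #include <hpx/include/resource_partitioner.hpp>

    int hpx_main(int, char**)
    {
        // work runs on the default pool unless scheduled onto "throttled"
        // via a pool executor
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // set up the partitioner before the runtime starts
        hpx::resource::partitioner rp(argc, argv);

        // dedicate one core's PUs to a separate pool; growing and shrinking
        // such a set is what the throttling scheduler did dynamically
        rp.create_thread_pool("throttled");
        rp.add_resource(rp.numa_domains()[0].cores()[0].pus(), "throttled");

        return hpx::init(argc, argv);
    }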
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
jakemp has joined #ste||ar
akheir has quit [Remote host closed the connection]
parsa has quit [Quit: *yawn*]
zbyerly_ has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
mcopik has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
patg[[w]] has joined #ste||ar
david_pfander has quit [Ping timeout: 248 seconds]
parsa has joined #ste||ar
jkleinh has joined #ste||ar
<jkleinh> NAMES
Matombo has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
bikineev has quit [Ping timeout: 252 seconds]
patg[[w]] has quit [Quit: Leaving]
EverYoung has joined #ste||ar
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
EverYoun_ has quit [Ping timeout: 246 seconds]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<jkleinh> Has anyone had any experience compiling hpx statically with jemalloc or tcmalloc?
<jkleinh> I am not able to do it without getting duplicate symbol errors
<jkleinh> something to do with the linking order
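Duplicate symbol errors in this setup usually mean malloc/free are defined twice on the static link line: once from the allocator HPX was configured with (typically -DHPX_WITH_MALLOC=jemalloc plus JEMALLOC_ROOT) and once more from another static library that bundles or links its own copy (memkind ships its own jemalloc, for instance). Only one definition may survive into the final static link, which is why the library order matters.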
<zao> If the lazy people here don't notice it, please open an issue on github or poke the mailing list.
<zao> Also, hi.
<diehlpk_work> jkleinh, Can you share your error message as a gist or on pastebin with us?
<diehlpk_work> Or in an issue on github?
<diehlpk_work> jkleinh, Which OS do you use? CMake version?
bikineev has joined #ste||ar
<hkaiser> jkleinh: do you need to link everything statically?
<heller> jkleinh: which jemalloc version do you use? there is a known problem with 4.5 iirc
ajaivgeorge has joined #ste||ar
<heller> We did the same with octotiger a few months ago. There shouldn't be any problems
<hkaiser> heller: but we linked everything statically
<hkaiser> not just jemalloc
<heller> Yes
<heller> That's what I just wanted to say ;)
<heller> jemalloc 5.0.1 should work fine
<heller> So you need to have static boost, hwloc and jemalloc as the minimal set of dependencies built statically
<heller> I never attempted a mixed build
<heller> jkleinh: how do you know your calls don't go through jemalloc/tcmalloc?
<hkaiser> heller: he's getting duplicate symbol errors while linking
<heller> Where can I see the error?
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<heller> The cmake steps would be important as well
<heller> And the cmake version
<jkleinh> I am compiling on cori with gcc-6.3.0, cmake 3.8.2, boost 1.62, hwloc 1.11.7
<jkleinh> I tried jemalloc 5.0.1 and gperftools 2.6.1
<jkleinh> I don't have a hard requirement to link statically but my understanding is that this is the preferred method on cori
<jkleinh> I am also linking everything else statically
<jkleinh> one second and I will post error message
<jkleinh> this is on hpx master
<heller> jkleinh: it helps when doing large scale runs, for initial development having dynamic libraries is just fine
<heller> jkleinh: for the record, we were having quite some problems with the libraries provided by the modules system when doing static builds on cori
<heller> (also haswell partition or knl?)
<heller> Please don't forget to post your cmake invocation
<jkleinh> yeah me as well. I compiled all dependencies myself to try to avoid that. The provided hwloc has other linking errors, and the memkind module seems to not actually provide the jemalloc headers that are needed
<jkleinh> trying to build on haswell currently
<jkleinh> we are interested in knl as well
<heller> Yeah, you need to avoid memkind with hpx
<jkleinh> here is build script
<heller> Ok, let's stick with haswell first
ajaivgeorge has quit [Ping timeout: 248 seconds]
<heller> Ok, you absolutely need to use the Cray toolchain file
<heller> And add HPX_WITH_PARCELPORT_LIBFABRIC=Off to your script
parsa has quit [Quit: Zzzzzzzzzzzz]
<jkleinh> here is error
ajaivgeorge has joined #ste||ar
<jkleinh> ok I will rerun with cray static tool chain
<heller> You need the CrayStatic one for haswell
<heller> And use jemalloc
<heller> I vaguely remember similar errors with tcmalloc for a static build
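Put together, the configure step would look roughly like this (paths are placeholders; the toolchain files live under cmake/toolchains/ in the HPX source tree): cmake -DCMAKE_TOOLCHAIN_FILE=<hpx-src>/cmake/toolchains/CrayStatic.cmake -DHPX_WITH_MALLOC=jemalloc -DJEMALLOC_ROOT=<jemalloc-prefix> -DHPX_WITH_PARCELPORT_LIBFABRIC=Off <hpx-src>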
<jkleinh> ok I'm reconfiguring now
<jkleinh> another question: is there a large performance difference between the different parcelports, especially mpi vs tcp?
<heller> Yes
<heller> Sorry, gtg now
<heller> If all fails, I suggest a regular build for now
<heller> And file an issue with the error.
<jkleinh> ok will do
<hkaiser> jkleinh: yes, tcp is slow, mpi should be relatively ok
<heller> It's late here so I won't look into it until tomorrow morning
<jkleinh> no problem. thanks!
<jkleinh> ok then there are more problems. building with the mpi parcelport generates another set of errors
<heller> That might help for a knl build
<heller> Uh oh
<hkaiser> jkleinh: pls create tickets with your build log and errors you're seeing
<hkaiser> we'll get back asap
<jkleinh> ok I will create issue
<jkleinh> new build script
<jkleinh> this with jemalloc now
aserio has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 260 seconds]
ajaivgeorge has joined #ste||ar
<hkaiser> add those links to the ticket, pls
aserio has quit [Ping timeout: 248 seconds]
eschnett has quit [Quit: eschnett]
jkleinh has quit [Quit: Page closed]
jakemp has quit [Ping timeout: 248 seconds]
jakemp has joined #ste||ar
<github> [hpx] hkaiser closed pull request #2830: Break the debugger when a test failed (master...test_break) https://git.io/v7QEm
<7JTAB6L7K> [hpx] hkaiser pushed 3 new commits to master: https://git.io/v5W7v
<7JTAB6L7K> hpx/master b1b1deb Denis Blank: Break the debugger when a test failed...
<7JTAB6L7K> hpx/master a3e1cfb Denis Blank: Allow to disable the hpx:attach-debugger option...
<7JTAB6L7K> hpx/master 8acb6a7 Hartmut Kaiser: Merge pull request #2830 from Naios/test_break...
<zao> A commit to break the debugger? Sounds legit :P
<18WAAECVA> [hpx] hkaiser pushed 1 new commit to disambiguate_base_lco: https://git.io/v5W7L
<18WAAECVA> hpx/disambiguate_base_lco cbf64bc Hartmut Kaiser: Merge branch 'master' into disambiguate_base_lco
<hkaiser> break into the debugger
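For context: the --hpx:attach-debugger option accepts 'startup' or 'exception' (values quoted from memory; check the docs), and the commits above extend the same break-in behavior to failing tests, plus a way to switch the option off.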
<github> [hpx] hkaiser force-pushed serialize_boost_variant from 2f083ab to 01245ef: https://git.io/v5W7o
<github> hpx/serialize_boost_variant 01245ef Hartmut Kaiser: Changed serialization of boost.variant to use variadic templates
<github> [hpx] hkaiser force-pushed inspect_assert from 7fea1b2 to 6008bec: https://git.io/v5LEL
<github> hpx/inspect_assert 6008bec Hartmut Kaiser: Deprecate use of BOOST_ASSERT and ensure HPX_ASSERT has corresponding #include present
<github> [hpx] hkaiser opened pull request #2874: Changed serialization of boost.variant to use variadic templates (master...serialize_boost_variant) https://git.io/v5W75
<github> [hpx] hkaiser opened pull request #2875: Deprecate use of BOOST_ASSERT (master...inspect_assert) https://git.io/v5W5Z
<heller> hkaiser: do you know if ali made progress on fixing the errors I reported?
eschnett has joined #ste||ar
<hkaiser> heller: no idea, LSU was closed today
<github> [hpx] hkaiser pushed 1 new commit to inspect_assert: https://git.io/v5W5b
<github> hpx/inspect_assert 7d07859 Hartmut Kaiser: Merge branch 'master' into inspect_assert
parsa has joined #ste||ar
<github> [hpx] hkaiser force-pushed serialize_boost_variant from 01245ef to 51f1be9: https://git.io/v5W7o
<github> hpx/serialize_boost_variant 51f1be9 Hartmut Kaiser: Changed serialization of boost.variant to use variadic templates
EverYoun_ has joined #ste||ar
<zao> Oh right, Harvey is visiting?
<hkaiser> nod
EverYoung has quit [Ping timeout: 240 seconds]
jakemp has quit [Read error: Connection reset by peer]
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
Matombo has quit [Remote host closed the connection]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
zbyerly_ has quit [Ping timeout: 240 seconds]
zbyerly_ has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<github> [hpx] hkaiser pushed 1 new commit to addressing_service: https://git.io/v5lv4
<github> hpx/addressing_service 7d93790 Hartmut Kaiser: Merge branch 'master' into addressing_service