aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
EverYoung has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
eschnett has joined #ste||ar
eschnett has quit [Client Quit]
eschnett has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
<github> [hpx] hkaiser created fixing_execution_parameters (+1 new commit): https://git.io/vdCrw
<github> hpx/fixing_execution_parameters 05c00ee Hartmut Kaiser: Fixing problems with new execution parameters interface
Matombo444 has joined #ste||ar
jaafar has quit [Ping timeout: 258 seconds]
kisaacs has joined #ste||ar
Matombo444 has quit [Remote host closed the connection]
Matombo has quit [Ping timeout: 258 seconds]
diehlpk has joined #ste||ar
kisaacs has quit [Ping timeout: 258 seconds]
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 248 seconds]
jaafar has joined #ste||ar
kisaacs has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk has quit [Ping timeout: 240 seconds]
kisaacs has quit [Ping timeout: 248 seconds]
vamatya_ has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
parsa has quit [Quit: Zzzzzzzzzzzz]
jbjnr has joined #ste||ar
jaafar has quit [Ping timeout: 246 seconds]
<jbjnr> I was really hoping master would be fixed when I woke up :(
<zao> 01:18:23 -- | Notice(github): hpx/master fed3fea Hartmut Kaiser: Stop-gap measure to make execution parameters compile for thread-executors...
<zao> 01:18:38 @hkaiser | heller: this should fix things for now ^^ - I will create a real fix through a PR
<zao> jbjnr: That didn't do it, whatever _that_ was? :)
<jbjnr> correct. it was another fail
<zao> Might've been for diehl's stuff.
<zao> Bummer.
<jbjnr> no it was an attempt to fix the compilation errors. The dashboard is redder than a very red thing today. Terrible. We destroyed it.
<heller> Fix it
<heller> Revert those changes and move on
kisaacs has joined #ste||ar
<heller> jbjnr: ^^
<heller> Not sure if Hartmut is aware of when the tutorial is going to happen
<jbjnr> I made a fix. testing now
<jbjnr> because I test my stuff before pushing to master!
<heller> Go for it!
<jbjnr> anyway heller - the problem is not just that one commit, it's the other couple before, and the merge of the branch a few days ago. We need to reset the master back a couple of days.
<heller> And you know, you better not make inspect fail, that's the worst ;)
<heller> Yes, I know
<jbjnr> my fix does not fix everything :(
simbergm has joined #ste||ar
<heller> :(
<heller> jbjnr: first thing I'm doing when I get is to implement proper testing
<heller> Circle taking 4 hours and not catching any real problems is just bad
<jbjnr> "when I get" what?
<jbjnr> hmm. time?
<jbjnr> reset_thread_distribution is causing the trouble at the moment, but I've no idea where this is coming from
<jbjnr> customization point for executors
kisaacs has quit [Ping timeout: 258 seconds]
<heller> jbjnr: get back
kisaacs has joined #ste||ar
Matombo has joined #ste||ar
<jbjnr> kisaacs: Are you the Kate Isaacs that works on Ravel?
kisaacs has quit [Ping timeout: 264 seconds]
<github> [hpx] biddisco created cscs2017 (+32 new commits): https://git.io/vdC7j
<github> hpx/cscs2017 459a043 Thomas Heller: Making Clang + CUDA work...
<github> hpx/cscs2017 4302b4c Thomas Heller: Fixing ICE with nvcc
<github> hpx/cscs2017 1ff0a9f Thomas Heller: Properly fixing NVCC builds...
<jbjnr> just fyi heller that cscs2017 branch is the one that I am using at the moment for the tutorial
<jbjnr> it mostly works
simbergm has quit [Ping timeout: 246 seconds]
<github> [hpx] biddisco pushed 1 new commit to master: https://git.io/vdC57
<github> hpx/master 05bca6d John Biddiscombe: Fix one compilation problem with customization points
<jbjnr> we're screwed. 70% tests passed, 63 tests failed out of 210
<jbjnr> just for a small test run.
<jbjnr> the last day when things were mostly green for tests is here http://cdash.cscs.ch/index.php?project=HPX&date=2017-09-21
simbergm has joined #ste||ar
<heller> jbjnr: OK. I think those are the tests that I broke
<heller> Shouldn't matter too much for the short term.
<jbjnr> ha! I knew it must have been your fault :)
<heller> :p
<jbjnr> stencil tutorials don't compile still.
<heller> Ok, why not?
<jbjnr> not got round to fixing them yet
<jbjnr> customization points and hpx::util::unwrapped changes
<heller> Might use some deprecated apis
<heller> Right
<jbjnr> that customization point commit has totally fucked everything.
<heller> I'm deep into family time right now
<heller> Yeah
<jbjnr> enjoy
<heller> We should just revert it...
<jbjnr> let's hope hartmut can't sleep and gets up early.
<heller> Yes ;)
<jbjnr> hopefully his conscience will be keeping him awake!
<heller> We just need better testing...
<heller> A PR got merged which broke some tests (two weeks ago), nobody cared about fixing it. More PRs get merged without even looking at buildbot anymore...
<heller> ttyl
simbergm has quit [Ping timeout: 255 seconds]
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 260 seconds]
simbergm has joined #ste||ar
simbergm has quit [Ping timeout: 246 seconds]
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 248 seconds]
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vdWqM
<github> hpx/gh-pages 5284a77 StellarBot: Updating docs
simbergm has joined #ste||ar
K-ballo has joined #ste||ar
simbergm has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
simbergm has joined #ste||ar
simbergm has quit [Ping timeout: 246 seconds]
simbergm has joined #ste||ar
simbergm has left #ste||ar [#ste||ar]
msimberg has joined #ste||ar
zbyerly_ has quit [Ping timeout: 248 seconds]
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vdW0n
<github> hpx/master 5f7ddc7 Hartmut Kaiser: Fixing core compilation problems
<jbjnr> Do we think that the issues with hello_world producing odd results are in any way connected to recent executor meddling?
<hkaiser> jbjnr: frankly I have no idea where this comes from - I don't think it's related to my recent changes to the execution parameters, however
<hkaiser> jbjnr: does it happen for bond=none only?
<hkaiser> bind=none*
<jbjnr> build error with master for some parallel algorithms
<hkaiser> grrr
<jbjnr> let me try the hello world again
<hkaiser> sec
<hkaiser> I hate doing all of this directly on master
<hkaiser> try again
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vdWu3
<github> hpx/master a1a0e05 Hartmut Kaiser: Properly forward declare type
<jbjnr> bin/hello_world -t2 --hpx:bind=balanced
<jbjnr> hello world from OS-thread 1 on locality 0 hello world from OS-thread 1 on locality 0 hello world from OS-thread 0 on locality 0
<jbjnr> three outputs for t2!
<hkaiser> right
<jbjnr> welcome to the world of weird
<hkaiser> is this reproducible now?
<jbjnr> not every time
<jbjnr> but frequently
<hkaiser> ok, will look today
<jbjnr> I have no real idea what might have caused this, other than a race somewhere, but I have no suspicions about where
<hkaiser> has to be a race
<jbjnr> ok. there was a mutex around the used_pu_mask access at one point that I might have messed up during RP changes ... ???
<hkaiser> works for me (tm) :/
<jbjnr> same error on master for parallel algos
<hkaiser> ok, I stop pushing to master now - will provide a properly tested fix
<jbjnr> home/biddisco/src/hpx/hpx/traits/v1/is_executor_parameters.hpp:32:55: error: ‘is_executor_parameters’ in namespace ‘hpx::parallel::execution::detail’ does not name a template type
<jbjnr> using is_executor_parameters = execution::detail::is_executor_parameters<T>;
<jbjnr> good plan
<hkaiser> jbjnr: where is that mutex problem you mentioned above?
<jbjnr> hold on
<jbjnr> I was not sure why it was there
<jbjnr> it isn't actually protecting anything afaict
<jbjnr> bbiam
aserio has joined #ste||ar
kisaacs has joined #ste||ar
<jbjnr> kisaacs: ping?
kisaacs has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
kisaacs has joined #ste||ar
jbjnr has quit [Read error: Connection reset by peer]
jbjnr has joined #ste||ar
kisaacs has quit [Ping timeout: 240 seconds]
jaafar has joined #ste||ar
hkaiser has joined #ste||ar
<diehlpk_work> jbjnr, Any news with the -t parameter issue?
<jbjnr> you need the HAVE_MORE than 64 cpus flag
<jbjnr> AND the MAX_CPUS set to 256 or whatever
<diehlpk_work> Ok, I used -DHPX_WITH_MAX_CPUS=128 and the issue still exists
<jbjnr> set the other one too. heller was mistaken and it is still needed
<jbjnr> if you have more than 256 cpus, set it to 512
<diehlpk_work> CMake Warning:
<diehlpk_work> Manually-specified variables were not used by the project:
<diehlpk_work> HPX_WITH_MAX_CPUS
<K-ballo> there's HPX_WITH_MAX_CPU_COUNT
<diehlpk_work> Ok, will try
<jbjnr> HPX_WITH_MORE_THAN_64_THREADS=ON
<jbjnr> HPX_WITH_MAX_CPUS=choose a big number that is larger than your cpu count
<diehlpk_work> Yes, I was using this one and HPX_WITH_MAX_CPUS was not detected by cmake
<jbjnr> HPX_WITH_MAX_CPU_COUNT as K-ballo says
eschnett has quit [Quit: eschnett]
<diehlpk_work> Maybe this is related to the problem
<diehlpk_work> not used [-Wunused-function]
<diehlpk_work> static mask_type get_full_machine_mask(std::size_t num_threads)
<jbjnr> yes, just a warning, ignore it for now
<jbjnr> when cpus is <64 we use an int, when >64 a bitset, some cruft needs tidying
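[Editor's sketch] The int-versus-bitset switch jbjnr describes could look roughly like the following. This is a minimal illustration and not HPX's actual code: every name prefixed with `example_` / `EXAMPLE_` is hypothetical, the macro only stands in for the limit set via HPX_WITH_MAX_CPU_COUNT, and using 64 itself as the last value that still fits in an integer is an assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical compile-time limit (stand-in for the configured CPU count).
#ifndef EXAMPLE_MAX_CPU_COUNT
#define EXAMPLE_MAX_CPU_COUNT 128
#endif

// Illustrative only: a plain 64-bit integer when it is wide enough,
// otherwise a std::bitset of the requested width.
template <std::size_t MaxCpus>
using example_mask_type = typename std::conditional<
    (MaxCpus <= 64), std::uint64_t, std::bitset<MaxCpus>>::type;

// Overloads so calling code does not care which representation is active.
inline void example_set(std::uint64_t& mask, std::size_t idx)
{
    mask |= (std::uint64_t(1) << idx);
}

template <std::size_t N>
inline void example_set(std::bitset<N>& mask, std::size_t idx)
{
    mask.set(idx);
}

inline bool example_test(std::uint64_t const& mask, std::size_t idx)
{
    return ((mask >> idx) & 1u) != 0;
}

template <std::size_t N>
inline bool example_test(std::bitset<N> const& mask, std::size_t idx)
{
    return mask.test(idx);
}

// Small self-check that works with either representation.
inline bool example_roundtrip()
{
    example_mask_type<EXAMPLE_MAX_CPU_COUNT> mask{};
    example_set(mask, 3);
    assert(example_test(mask, 3));
    return example_test(mask, 3) && !example_test(mask, 4);
}
```

Helpers written for only one of the two representations end up unused when the other one is selected, which is one plausible source of the -Wunused-function warning quoted above.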
<diehlpk_work> Ok, now the code from master does not compile
<diehlpk_work> using is_executor_parameters = execution::detail::is_executor_parameters<T>;
<diehlpk_work> execution does not name a type
<hkaiser> diehlpk_work: yes, that's expected - master is currently broken :/
<hkaiser> I'm working on a fix as we speak
<diehlpk_work> Ok, let me know when I should test it again
<hkaiser> will do
EverYoung has joined #ste||ar
eschnett has joined #ste||ar
jaafar has quit [Ping timeout: 258 seconds]
denis_blank has joined #ste||ar
kisaacs has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
kisaacs has quit [Ping timeout: 240 seconds]
EverYoung has quit [Ping timeout: 258 seconds]
EverYoung has joined #ste||ar
parsa has joined #ste||ar
jaafar has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
aserio has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
kisaacs has joined #ste||ar
denis_blank2 has joined #ste||ar
denis_blank has quit [Ping timeout: 240 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
kisaacs has quit [Ping timeout: 248 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
denis_blank has joined #ste||ar
denis_blank2 has quit [Ping timeout: 248 seconds]
<github> [hpx] biddisco pushed 1 new commit to cscs2017: https://git.io/vdltF
<github> hpx/cscs2017 2a8121c John Biddiscombe: Fix a couple of deprecated c++ features
<K-ballo> WOW!
<K-ballo> how did I never saw those?!
<K-ballo> *see
<K-ballo> beware with using namespace std::placeholders; at global scope, boost.bind may conflict
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 255 seconds]
aserio1 is now known as aserio
<zao> One of my users beat HPX in waste of space today - MPI implementation pushed around 6.6 MiB/s of diagnostic output through Slurm and out onto Lustre.
<zao> They pushed around 2.1 TiB of data before they noticed because quota command told them they were half full :)
jbjnr_ has joined #ste||ar
<jbjnr_> K-ballo: that placeholders is a single example and I thought it was safe there, but I can move it inside the scope block.
<K-ballo> boost.bind defines its placeholders in the global namespace, so it's problematic if it were to get included
<K-ballo> bad boost.bind
<zao> Meanwhile, the twitter peanut gallery laughs at the verbosity of std::placeholders::, ignorant.
<jbjnr_> nasty
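[Editor's sketch] The fix jbjnr_ mentions, moving the using-directive inside a scope block, can be illustrated with the standard library alone. The function names are hypothetical and Boost is deliberately not included. The point, following K-ballo's remark, is that `_1` and `_2` never become visible at global scope, so a header that declares global placeholders of the same name cannot make them ambiguous there.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper used only for this illustration.
inline int example_subtract(int a, int b)
{
    return a - b;
}

inline int example_swapped_subtract(int x, int y)
{
    // The using-directive is confined to this function body.
    using namespace std::placeholders;
    auto swapped = std::bind(example_subtract, _2, _1);
    return swapped(x, y);    // computes y - x
}

inline int example_add_ten(int x)
{
    // Alternative: spell the placeholder out and skip the directive.
    auto add_ten = std::bind(std::plus<int>(), std::placeholders::_1, 10);
    return add_ten(x);
}

// Small self-check of both variants.
inline void example_check()
{
    assert(example_swapped_subtract(10, 3) == -7);
    assert(example_add_ten(5) == 15);
}
```

The fully qualified form is more verbose, but it stays unambiguous wherever it is written.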
parsa has joined #ste||ar
rod_t has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
denis_blank has quit [Quit: denis_blank]
<hkaiser> jbjnr_: the branch fixing_execution_parameters looks good so far - I think that will solve the problems on master
<hkaiser> jbjnr_: I'll wait for the tests to finish cycling and will create a PR
<jbjnr_> ok - do I merge it to my master and try it out
<jbjnr_> is it just compilation fixes - or did you also solve hello world strangeness
<jbjnr_> and thanks very much for doing this
<hkaiser> jbjnr_: I have not looked into the hello_world strangeness yet, but it should fix most of the test failures we're seeing as well
<jbjnr_> great
<hkaiser> there is just the problem heller introduced a while back - no idea how to fix this yet - so some of the tests will still fail :/
<jbjnr_> did you push already? I see nothing new
<hkaiser> I pushed 3 hours ago - was waiting for the tests
<jbjnr_> ok ta
parsa has joined #ste||ar
<K-ballo> wow, tests.unit.parallel is taking as long as the entire tests.unit used to take last I looked at times
<jbjnr_> the partitioned vector stuff is outrageously slow
<jbjnr_> I think we need a special flag to enable it and have it off by default
<diehlpk_work> hkaiser, Is this related to broken master? /usr/local/include/hpx/lcos/future.hpp:813:11: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'hpx::lcos::future<boost::range_detail::integer_iterator<int> >' to 'const hpx::lcos::future<void> &' for 1st argument
<diehlpk_work> class future : public detail::future_base<future<R>, R>
jaafar has quit [Ping timeout: 240 seconds]
kisaacs has joined #ste||ar
EverYoun_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
EverYoung has quit [Ping timeout: 255 seconds]
aserio has quit [Quit: aserio]
jaafar has joined #ste||ar
<diehlpk_work> Here is the interview from the talk last week
<heller> hkaiser: I'll look into them tomorrow, I guess.
<heller> We always have the option to revert PRs...
kisaacs has quit [Ping timeout: 258 seconds]
<hkaiser> heller: other things may already rely on those changes, not sure if it's revertible
<heller> Sure
<heller> It boils down that we need to test our PRs more thoroughly...
<heller> I just didn't have the time to look into it.
<heller> I didn't merge it :p
hkaiser has quit [Quit: bye]
kisaacs has joined #ste||ar
<jbjnr_> heller: who did? was it me?
<jbjnr_> have I brought all this pain upon myself
<heller> I don't know who it was
<heller> We should just be more responsible... Instead we keep on pushing broken stuff...
EverYoun_ has quit [Remote host closed the connection]
<heller> Before cleaning up the mess
EverYoung has joined #ste||ar
kisaacs has quit [Ping timeout: 240 seconds]
kisaacs has joined #ste||ar
<heller> And when you clean up, you get asked to not commit to master directly...
<heller> Anyways... I'll be back tomorrow
kisaacs has quit [Ping timeout: 240 seconds]
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 248 seconds]
hkaiser has joined #ste||ar
<github> [hpx] biddisco pushed 1 new commit to fixing_execution_parameters: https://git.io/vdluy
<github> hpx/fixing_execution_parameters 7044c4d John Biddiscombe: Fix a bitset/serialization compile error for test
<jbjnr_> hkaiser: just pushed a tiny commit to your branch
<jbjnr_> ah. there it is
<hkaiser> ok, cool - thanks
<hkaiser> wasn't aware there was a problem
<hkaiser> why is that needed?
<hkaiser> jbjnr_: there is no bitset used anywhere
<jbjnr_> I'm using CPUs>64 so bitset is being pulled in somewhere
<jbjnr_> not sure why that test failed
<hkaiser> but then this #include is required somewhere else, not in that test
<jbjnr_> (did not look deep, just want a clean build)
<jbjnr_> ^correct
<hkaiser> ok, will investigate
<jbjnr_> partitioned_vector_transform_scan.cpp has a lot of warnings btw
<jbjnr_> remove my commit if you find the right place
<hkaiser> ok
<hkaiser> probably the target_allocator or somesuch
<jbjnr_> tests.unit.parallel : 87% tests passed, 28 tests failed out of 210
<jbjnr_> better
<hkaiser> jbjnr_: can you give the list of failing tests, pls?
eschnett has joined #ste||ar
<jbjnr_> thanks again. I gotta go.
<jbjnr_> goodnight all
jbjnr_ has quit [Quit: ChatZilla 0.9.93 [Firefox 55.0.2/20170820225132]]
<diehlpk_work> Bye
kisaacs has joined #ste||ar
<heller> Those are the ones related to the pool changes, I guess?
<hkaiser> heller: some of them
kisaacs has quit [Ping timeout: 258 seconds]
<heller> Ok
<github> [hpx] biddisco pushed 3 new commits to cscs2017: https://git.io/vdl2m
<github> hpx/cscs2017 396bc49 John Biddiscombe: Merge remote-tracking branch 'stellar/fixing_execution_parameters' into cscs2017
<github> hpx/cscs2017 5e472b0 John Biddiscombe: Merge remote-tracking branch 'stellar/fixing_execution_parameters' into cscs2017
<github> hpx/cscs2017 8322338 John Biddiscombe: Fix a bitset/serialization compile error for test
jaafar has quit [Ping timeout: 240 seconds]
<heller> hkaiser: let's see, we just need something for the tutorial on Thursday
<jbjnr> heller: I cannot see what I am typing, but the cscs2017 branch is built and running on daint with cuda using clang
<jbjnr> cublas demo ok
<jbjnr> module file created
<jbjnr> all will be fine
<jbjnr> bye
<hkaiser> heller: things will be fine by Thursday, I promise
<heller> jbjnr: thanks!
<heller> hkaiser: I'm sure they will be ;)
<heller> I'll join the fun tomorrow
<heller> hkaiser: systems with more than 64 cores seem to become more and more likely. Should we increase it to 512 by default?
zbyerly_ has joined #ste||ar
<hkaiser> heller: I don't think so
<hkaiser> 95% of all machines (if not more) have less than 64 cores/node
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
jaafar has joined #ste||ar
parsa has quit [Ping timeout: 240 seconds]
patg[[w]] has quit [Ping timeout: 258 seconds]
rod_t has left #ste||ar [#ste||ar]
<hkaiser> heller: I have now fixed all test failures except the executor related problem caused by that parent_pool pointer == nullptr
<github> [hpx] hkaiser opened pull request #2928: Fixing execution parameters (master...fixing_execution_parameters) https://git.io/vdloO
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
EverYoun_ has quit [Ping timeout: 258 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]