aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoung has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 260 seconds]
simbergm has quit [Quit: WeeChat 1.4]
EverYoun_ has quit [Ping timeout: 240 seconds]
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vbt6s
<github> hpx/master cd29421 Hartmut Kaiser: Adding more logging during startup
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
gedaj has quit [Quit: leaving]
EverYoung has quit [Ping timeout: 255 seconds]
K-ballo has quit [Quit: K-ballo]
eschnett has joined #ste||ar
ct-clmsn has joined #ste||ar
ct-clmsn is now known as ct_clmsn
ct_clmsn is now known as ct_clmsn__
hkaiser has quit [Quit: bye]
ct_clmsn__ has quit [Quit: Leaving]
eschnett has quit [Quit: eschnett]
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 252 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
jaafar has quit [Ping timeout: 240 seconds]
msimberg has joined #ste||ar
<jbjnr> heller: yt?
<heller> jbjnr: hey
<jbjnr> quick question - just uploading an image - hold on
<jbjnr> during startup - hpx takes around 9s or so before anything happens. then the code runs and takes a bit less time than that.
<jbjnr> Do you know why it takes so very long for the runtime to get going - this is one node
<jbjnr> so no network stuff or anything
<jbjnr> I didn't spend any time looking into it, but it spoils plots like this when the startup takes longer than the matrix solver
<heller> jbjnr: ugh
<heller> 9 seconds?
<heller> that's strange :/
<jbjnr> indeed
<heller> looks like it is mostly doing background work?
<heller> and one big task_object::apply
<jbjnr> it's always been slow to get going - but 9s is a bit iffy when it doesn't look as though anything of value is being done
<heller> does this always happen?
<jbjnr> yes
<jbjnr> I didn't debug ...
<heller> release build, I assume?
<jbjnr> this example is relwithdebinfo
<heller> ok, I never encountered that problem myself
<heller> obviously...
<jbjnr> I'm doing some profiling
<heller> sure, that shouldn't happen though
<heller> could you check if the same thing happens with a pure release build?
<jbjnr> ok, don't worry about it now, once the main performance is fixed, I'll spend some time on it
<jbjnr> I'll check
<jbjnr> but I think yes
<heller> hmmm
<heller> then it might be related to the HPX_WITH_NETWORKING=Off setting
<heller> or apex, maybe?
<jbjnr> all are possible - I only asked in case you already knew
<jbjnr> if it was a known 'feature'
<heller> no, never occurred to me
<jbjnr> looking at the trace - I think it is apex
<jbjnr> I can see many, many small tasks that hop from thread to thread, but are 'serial' in total, and they take 8s or so in total at start
<jbjnr> I'll look into that later ...
<jbjnr> I suspect that apex inits stuff for each thread and locks its output file or something and so each one waits for the others ...
david_pfander has joined #ste||ar
<heller> jbjnr: could be, shouldn't take 9 seconds though
<jbjnr> 9s is too long - I'll do more tests when I'm sure the $SCRATCH filesystem is working without any lags/issues
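For context, a rough sketch of the two configurations being compared above. HPX_WITH_NETWORKING and HPX_WITH_APEX are the HPX CMake options named in the discussion; the paths are placeholders:

```sh
# RelWithDebInfo build as profiled above (networking off, APEX on)
cmake /path/to/hpx -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DHPX_WITH_NETWORKING=Off -DHPX_WITH_APEX=On

# pure Release build with APEX off, to isolate the ~9s startup cost
cmake /path/to/hpx -DCMAKE_BUILD_TYPE=Release \
      -DHPX_WITH_NETWORKING=Off -DHPX_WITH_APEX=Off
```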
<msimberg> heller:
<msimberg> oops
<msimberg> anyway, heller, I will then ignore the periodic scheduler unless you give me your blessing to just remove it now
<heller> msimberg: ignore it for now
<heller> msimberg: did you run the whole test suite?
<msimberg> yeah, been running it now, the only ones that seem to fail are ones that also fail on master
<msimberg> it varies from run to run though
<heller> *nod*
<heller> the clang_tidy and fix_wrapper_heap are important fixes ...
<heller> yet, no one cares
<heller> except for jbjnr ;)
<msimberg> mmh, my changes should wait for a green master anyway so that I know I haven't messed something up
<msimberg> heller: I assume there's not really anything I can do to help with getting master green? all failing tests are somehow being worked on?
<heller> msimberg: not sure. you need to ask hkaiser, he introduced most of the newer ones
<heller> especially the action move count regressions
<msimberg> mmh, okay
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vbqIM
<github> hpx/gh-pages 7c1d962 StellarBot: Updating docs
K-ballo has joined #ste||ar
<jbjnr> heller: I care!
<jbjnr> msimberg: waiting for green master might make you an old man :)
<msimberg> jbjnr: maybe, but there won't be a release without a green master
<jbjnr> as release manager - you can enforce a rule that master is locked, except for merges that fix tests fails
<msimberg> at least not with a red one
<msimberg> yeah, that's a very reasonable rule
<jbjnr> use the hpx-users or devel mailing lists to set a date, after which only fixes for tests will be accepted
<jbjnr> before we can go back to feature additions etc
<jbjnr> prior to next release
<jbjnr> so (say) no merges after 10th dec - unless they address test failures until the dashboard is green - after that we can begin merging fixes and improvements that are needed for release etc etc until $DATE - after which master will be locked again prior to a release
<jbjnr> that kind of thing ....
<jbjnr> we only need heller and hkaiser to agree to it, and then ....
<jbjnr> we must also consider removing tests that use features that never pass and are never green and consider those features as unsupported until such time that they become green again
<jbjnr> (we need to start being brutal IMHO) - we can discuss this with adrian this afternoon and see what he thinks.
<msimberg> jbjnr: I agree with everything you said, I admit I've been too passive so far
<msimberg> was a bit slow in arranging this call with aserio
<msimberg> jbjnr: can you remind me, the hpx tutorial will be in march, no?
<msimberg> jbjnr: btw, I think now is a good time to not allow anything that doesn't fix tests ;)
<jbjnr> yes march tutorial
<jbjnr> ok - lets talk to adrian, then from monday we can begin ...
<jbjnr> (release mid feb at latest I'd say - for march tutorial)
<msimberg> jbjnr: yeah, came to the same conclusion
<msimberg> mid feb release, rc end of january, cleanup in january, test fixing in december
<msimberg> roughly
<heller> jbjnr: I more than agree
<heller> msimberg: sounds like a plan
<heller> please enforce this rule ;)
<heller> I have two PRs that fix stuff
<heller> they really block me ...
<heller> and I think we should switch to a policy such that no one is allowed to merge their own PRs
<K-ballo> that policy in itself would probably effectively halt development
<K-ballo> I haven't merged a PR, mine or otherwise, in months
<K-ballo> there's a lot more people writing PRs than there are merging PRs.. basically it's what, one and two halves?
<heller> well, the main reason I don't merge PRs at the moment is because there are so many tests failing
<heller> every other PR merged leads to new test failures
<heller> and it looks like no one really cares
hkaiser has joined #ste||ar
<heller> hkaiser: I would really like to move on with #3007 and #2982. Those PRs fix rather important bugs
<heller> and good morning ;)
<hkaiser> heller: see pm, pls
<zao> Mood gorning.
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<heller> grrr, inspect
<K-ballo> that reminds me, I'll drop my inspect-first PR.. we'll have to wait for the workflow based ci
<K-ballo> I cannot get circle to even consider artifacts being generated during the build
<K-ballo> or rather, in the case of a build failure
<heller> ok
<heller> I stalled the workflow based one, since it also runs the tests
<heller> it doesn't make sense to merge it to master if those keep failing
<msimberg> heller: what's actually holding back moving to circleci 2?
<heller> msimberg: I'd like to run the tests on circleci, but then, every PR is being flagged as failed
<heller> well, wouldn't be too bad, I think
<msimberg> that would be great
<heller> well, every PR including the move to circle2 one ;)
<msimberg> but you mean with circleci 2 we would be able to run all the tests? and with 1 we can't?
<zao> I really hate that my test machine is just yawning.
<heller> we should be able to do it with circle1 as well
<msimberg> okay, but that would be acceptable
<heller> but with the workflow pipeline, the different steps can be easily overlapped
<heller> in an ideal case, we'd be able to reduce the test time to about 1:30
<msimberg> but I think that's something that should be tested/broken as soon as possible, so that we can get on with not merging broken PRs
<zao> 1h30m or 1m30s?
<heller> 1h30m
<heller> including running the tests
<msimberg> huh, builds + tests?
<heller> yeah, running in parallel, on 4 builders
<zao> My biggest pain point for the tests is the ones that time out or are disproportionately long.
<msimberg> ah, that would be very nice
<heller> as an example
<zao> Can't really go much below 100s timeout before it starts culling genuinely long tests.
<heller> ah, 1:30 is just for building, as it seems
<heller> msimberg: i'd be happy if you rebase those commits to latest master and prepare the PR
<heller> we could even think about dropping the actual test running as a stop-gap measure
<heller> or only run selected tests
<heller> ha, I am wrong, that workflow ran the tests... just not the header tests
<msimberg> heller: yes! I'll gladly do that
<msimberg> heller: it's just the config file right? is there otherwise something that you'd still like to change or clean up there or is it just a matter of making a pr?
<github> [hpx] msimberg force-pushed circle_2 from a0cdcfc to c95bc59: https://git.io/vdKfd
<github> hpx/circle_2 c95bc59 Thomas Heller: Switching to CircleCI 2.0...
hkaiser has quit [Ping timeout: 268 seconds]
<heller> msimberg: it's just the config file, yeah. the clang-tidy changes are obsolete once the clang_tidy branch is merged
<heller> msimberg: and then it is about to fix the tests ;)
<msimberg> heller: okay, I guess I can leave it in there... the commented out stuff at the bottom of the config file is also just the old config file? I can remove that?
<heller> yes, that can be removed
<msimberg> yeah, sure, then it's "just" fixing tests ;) but at least we have a much better view of when things break
<heller> yes
<heller> and inspect before everything else!
<github> [hpx] msimberg pushed 1 new commit to circle_2: https://git.io/vbq2k
<github> hpx/circle_2 6cabb00 Mikael Simberg: Clean up CircleCI config file
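For reference, a minimal sketch of a CircleCI 2.0 workflow config of the kind being set up here; the Docker image name, build targets, and job split are illustrative assumptions, not the actual contents of the circle_2 branch:

```yaml
version: 2
jobs:
  build:
    docker:
      - image: stellargroup/build_env:debian_clang  # assumed image
    steps:
      - checkout
      - run: cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release
      - run: cmake --build build -- -j2
      - persist_to_workspace:
          root: .
          paths: [build]
  test:
    docker:
      - image: stellargroup/build_env:debian_clang
    steps:
      - attach_workspace:
          at: .
      - run: cd build && ctest --output-on-failure
workflows:
  version: 2
  build_and_test:
    jobs:
      - build
      - test:
          requires: [build]
```

Splitting build and test into separate workflow jobs is what allows the steps to be overlapped, as heller notes above.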
<msimberg> heller: so I guess if tests fail merging is completely blocked? or can that be overridden? all tests can't be fixed in one PR...
<heller> it can be overridden by admins
<heller> such as me and hartmut
<heller> and john
<heller> are you not one of them?
<msimberg> okay, good, but then there should be no reason to hold this back assuming it works as advertised ;)
<heller> let's see :P
<msimberg> I have some kind of rights as well, don't know if there are different levels
<heller> last time I checked, it failed to build the parallel unit tests
<heller> mostly because of partitioned_vector
<heller> personally, I'd move partitioned_vector to the "not tested" stage...
<heller> YMMV
<msimberg> is it new for 1.1? or was it there before as well?
<heller> essentially all of the distributed containers and algorithms, until the compile time and memory usage problem is fixed.
<heller> it has been there for a while
<jbjnr> I'd like to see partitioned_vector made a lot more optional - it bogs everything down
<heller> jbjnr: exactly
<jbjnr> couldn't we make a cmake option to have it or not...
<heller> probably ;)
<heller> go for it!
<msimberg> should we?
<heller> it would make sense for the circle-ci runner
<heller> let's see how the current tests go
<jbjnr> msimberg: should we what?
<heller> if they still fail, i'd say disable them
<msimberg> jbjnr: of course we can, but should we? ;)
<jbjnr> I was not sure what your question referred to. I presume my PV optional statement
<heller> I am not sure anyone uses them anyways
<msimberg> yeah, exactly that
<jbjnr> I'd say yes. make it opt-in
<jbjnr> exactly - it's an experimental feature used by LSU only, opt-in would work for me
<heller> let's discuss this with hartmut later, he might heavily object because some of his PhD students need it
<msimberg> my follow up was going to be that we decide either yes or no as soon as possible, so that it's not left hanging
<heller> open an issue
<jbjnr> HPX_WITH_DISTRIBUTED_CONTAINERS ... etc
<jbjnr> and if they are disabled, then so are the algorithms that use them ...
<heller> yes
<heller> the downside ... code that is being checked in for partitioned_vector will be left unchecked. both compilation and testing
<heller> so we essentially end up with dead and broken code
<jbjnr> well we will need at least one build that has it enabled
<jbjnr> on buildbot we can do that yes?
<heller> yes
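HPX_WITH_DISTRIBUTED_CONTAINERS is only the name proposed above; a minimal sketch of what such an opt-in switch could look like in CMake, with the directory layout an illustrative assumption:

```cmake
# Proposed opt-in switch -- OFF by default, as suggested above
option(HPX_WITH_DISTRIBUTED_CONTAINERS
       "Build partitioned_vector and the segmented algorithms" OFF)

if(HPX_WITH_DISTRIBUTED_CONTAINERS)
  # hypothetical paths; the real tree would need matching guards
  add_subdirectory(tests/unit/component/partitioned_vector)
  add_subdirectory(tests/unit/parallel/segmented_algorithms)
endif()
```

One buildbot builder would then configure with -DHPX_WITH_DISTRIBUTED_CONTAINERS=On, so the code doesn't end up completely untested.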
<msimberg> hmm, also shouldn't whatever is broken be fixed first? then whatever PR fixes the problems should enable the option and test on circleci
<heller> yeah
<heller> that's the current workflow
<msimberg> should ctest have the --output-on-failure flag? or does the output go into some file?
<heller> potentially, CircleCI supports junit files ... but that's not working with workflows at the moment :/
<heller> might be cool if we could convert it to html though
<heller> right, that was another problem ... the test output
<msimberg> that's already better than nothing
<heller> but yeah ... having the --output-on-failure would be a good thing until the test parsing works
hkaiser has joined #ste||ar
mcopik has joined #ste||ar
<msimberg> it shouldn't hurt
<msimberg> what actually generates the xml files? doing --output-on-failure won't disable that or something?
<heller> nope, it won't
<heller> it doesn't hurt, yes, let's just enable it
<msimberg> good, adding it
<hkaiser> heller: have you heard back from circleci wrt more resources?
<github> [hpx] msimberg pushed 1 new commit to circle_2: https://git.io/vbqw3
<github> hpx/circle_2 48017bd Mikael Simberg: Add --output-on-failure flag to CircleCI ctest runs
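The flag is standard ctest; a sketch of the resulting invocation, with the ~100s timeout floor zao mentioned earlier added for illustration (the -j value is an assumption):

```sh
# print the full output of failing tests; keep per-test timeouts above
# the ~100s floor below which genuinely long tests get culled
ctest -j4 --timeout 100 --output-on-failure
```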
eschnett has joined #ste||ar
aserio has joined #ste||ar
<heller> hkaiser: yeah, they won't give them away for free
<hkaiser> stupid
<heller> yeah...
eschnett has quit [Quit: eschnett]
<aserio> msimberg: I am ready if you are
<msimberg> aserio: yep, ready, see you on skype
eschnett has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
<github> [hpx] sithhell closed pull request #2982: Clang tidy (master...clang_tidy) https://git.io/vFnB3
mcopik has quit [Ping timeout: 258 seconds]
<jbjnr> who can tell me what the difference is between an acceptance token and a personal token on github?
<jbjnr> zao: ?
<jbjnr> ignore that question. I found what I needed to know
mcopik has joined #ste||ar
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 255 seconds]
<zao> Great, because I don't know :)
<zao> For my bot, I just issued a user token with no rights.
<zao> Enough to query for PRs and commits of public repos, I believe.
<K-ballo> zao: your bot?
<zao> K-ballo: The part of my grand soak-testing plan that figures out what to test.
<zao> By fetching all repo refs, polling github API, and explicit IRC requests.
<zao> (or something)
<K-ballo> where is this bot?
<zao> In pieces on some of my home machines.
<zao> The goal is to make some use out of the Ryzen 7.
<zao> So a build of the codebase on a single compiler and Boost version, aiming for a short turnaround from PR commit to tests, and running tests for many iterations to see if they're stable.
<zao> Which is something I need to build pretty much from scratch; it doesn't slot in well with buildbot exactly.
<zao> Doesn't help that I have no idea what I'm doing when it comes to building such daemons.
<zao> :)
<K-ballo> that's the fun part!
<zao> Been touching node.js for days without throwing up much!
<zao> I've got singularity figured out enough to have an image with prebuilt Boost and Clang compilers that I can mount a source/build tree into and run a build and tests with network isolation in one container.
<zao> (so I can run several test suites at once without them colliding on ports and stuff)
<zao> I've still got to write the runner that starts and monitors such processes, and the service that triggers them and figures out what to build.
<zao> Intent is to run it on all PRs until they're merged.
<zao> (also master, but that should be eternally green, shouldn't it? :P)
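As a rough sketch of the isolated runs described above (Singularity 3.x syntax; the image name and bind paths are assumptions, and the exact network flags may vary by version):

```sh
# run one test suite inside the prebuilt image with its own network
# namespace, so parallel suites don't collide on ports
singularity exec --net --network none \
    --bind /scratch/hpx-build:/build \
    hpx-clang-boost.sif \
    bash -c "cd /build && ctest --output-on-failure"
```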
<zao> Off to chill with the parents, there's glögg!
<K-ballo> master is... well... difficult
<jbjnr> zao: the reason I asked is because I started implementing a bot using pygithub
<jbjnr> I can query all PRs and then spawn builds for them on a CSCS machine
<jbjnr> I will add options to set the compiler and boost version etc etc
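A minimal sketch of the PR-polling side of such a pygithub bot; apart from the repository name, everything here (token handling, the build hook) is an illustrative assumption:

```python
# poll open PRs and hand each head commit to a build spawner (stub)
import os
from github import Github  # pygithub

gh = Github(os.environ["GITHUB_TOKEN"])  # a read-only token is enough
repo = gh.get_repo("STEllAR-GROUP/hpx")

for pr in repo.get_pulls(state="open"):
    print("PR #%d: %s @ %s" % (pr.number, pr.title, pr.head.sha))
    # spawn_build(pr.head.sha)  # e.g. submit a CSCS job, report to CDash
```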
<hkaiser> jbjnr: I hope you're not seriously implementing your own github bot/integration :/
<hkaiser> didn't you want to use some existing CI system?
<jbjnr> hkaiser: I'm just experimenting - all we need is to build all PRs + master and see the test results. CDash can show the results, the simple python bot can pull the PRs - just want to see how easy it is to get a simple bot running
<jbjnr> partly so that when cscs meets next week, I can show them
<hkaiser> jbjnr: is building sufficient, shouldn't we also run the tests?
<jbjnr> yes of course I'm running the tests
<hkaiser> k
<jbjnr> I didn't think I needed to say that
<jbjnr> (this is my spare time)
<hkaiser> sure ;)
<hkaiser> just asking
<jbjnr> next week, cscs has a CI meeting and they will tell us that we can't have this/that/the other because a/b/c ... and it will take 6 months ...
<hkaiser> nod
<hkaiser> would it be possible for those results to be visible to everybody?
<jbjnr> yes, they will go to the cscs cdash server that you can already see
<jbjnr> the difference will be we will see warnings/build/update
<jbjnr> etc
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
rtohid has joined #ste||ar
<zao> Got any spare free time I can steal? :)
eschnett has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
rtohid has left #ste||ar [#ste||ar]
eschnett has quit [Quit: eschnett]
mcopik has quit [Ping timeout: 248 seconds]
eschnett has joined #ste||ar