#ste||ar on 2017-12-01 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:14 EverYoung has quit [Ping timeout: 276 seconds]

00:23 EverYoung has joined #ste||ar

00:29 EverYoun_ has joined #ste||ar

00:30 EverYoun_ has quit [Remote host closed the connection]

00:31 EverYoun_ has joined #ste||ar

00:32 EverYoung has quit [Ping timeout: 260 seconds]

01:05 simbergm has quit [Quit: WeeChat 1.4]

01:18 EverYoun_ has quit [Ping timeout: 240 seconds]

01:22 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vbt6s

01:22 <github> hpx/master cd29421 Hartmut Kaiser: Adding more logging during startup

01:38 EverYoung has joined #ste||ar

01:49 EverYoung has quit [Remote host closed the connection]

01:49 EverYoung has joined #ste||ar

01:56 gedaj has quit [Quit: leaving]

01:57 EverYoung has quit [Ping timeout: 255 seconds]

02:27 K-ballo has quit [Quit: K-ballo]

02:38 eschnett has joined #ste||ar

02:55 ct-clmsn has joined #ste||ar

02:55 ct-clmsn is now known as ct_clmsn

02:55 ct_clmsn is now known as ct_clmsn__

03:13 hkaiser has quit [Quit: bye]

03:45 ct_clmsn__ has quit [Quit: Leaving]

04:14 eschnett has quit [Quit: eschnett]

04:43 nanashi55 has quit [Ping timeout: 240 seconds]

04:45 nanashi55 has joined #ste||ar

05:01 EverYoung has joined #ste||ar

05:06 EverYoung has quit [Ping timeout: 252 seconds]

06:00 EverYoung has joined #ste||ar

06:13 EverYoung has quit [Ping timeout: 246 seconds]

07:25 jaafar has quit [Ping timeout: 240 seconds]

07:28 msimberg has joined #ste||ar

08:09 <jbjnr> heller: yt?

08:09 <heller> jbjnr: hey

08:09 <jbjnr> quick question - just uploading an image - hold on

08:12 <jbjnr> https://pasteboard.co/GWadILQ.png

08:12 <jbjnr> during startup - hpx takes around 9s or so before anything happens. then the code runs and takes a bit less time than that.

08:12 <jbjnr> Do you know why it takes so very long for the runtime to get going - this is one node

08:12 <jbjnr> so no network stuff or anything

08:13 <jbjnr> I didn't spend any time looking into it, but it spoils plots like this when the startup takes longer than the matrix solver

08:14 <heller> jbjnr: ugh

08:14 <heller> 9 seconds?

08:14 <heller> that's strange :/

08:15 <jbjnr> indeed

08:15 <heller> looks like it is mostly doing background work?

08:15 <heller> and one bug task_object::apply

08:15 <jbjnr> it's always been slow to get going - but 9s is a bit iffy when it doesn't look as though anything of value is being done

08:15 <heller> does this always happen?

08:15 <jbjnr> yes

08:15 <jbjnr> I didn't debug ...

08:15 <heller> release build, I assume?

08:16 <jbjnr> this example is relwithdebinfo

08:16 <heller> ok, I never encountered that problem myself

08:16 <heller> obviously...

08:16 <jbjnr> I'm doing some profiling

08:16 <heller> sure, that shouldn't happen though

08:16 <heller> could you check if the same thing happens with a pure release build?

08:16 <jbjnr> ok, don't worry about it now, once the main performance is fixed, I'll spend some time on it

08:17 <jbjnr> I'll checkl

08:17 <jbjnr> but I think yes

08:17 <heller> hmmm

08:17 <heller> then it might be related to the HPX_WITH_NETWORKING=Off setting

08:17 <heller> or apex, maybe?

08:18 <jbjnr> all are possible - I onlky asked in case you already knew

08:18 <jbjnr> if it was a known 'feature'

08:18 <heller> no, never occured to me

08:27 <jbjnr> looking at the trace - I think it is apex

08:27 <jbjnr> I can see many many small tasks that hop from thread to thread, but are 'serial' in total, and they tak 8s or so in total at start

08:27 <jbjnr> I'll look into that later ...

08:28 <jbjnr> I suspec that apex inits stuff for each thread and locks it's output file or something and so each one waits for the others ...

08:40 david_pfander has joined #ste||ar

09:01 <heller> jbjnr: could be, shouldn't take 9 seconds though

09:09 <jbjnr> 9s is too long - I'll do more tests when I'm sure the $SCRATCH filesystem is working without any lags/issues

09:22 <msimberg> heller:

09:22 <msimberg> oops

09:23 <msimberg> anyway, heller, I will then ignore the periodic scheduler unless you give me your blessing to just remove it now

09:47 <heller> msimberg: ignore it for now

09:47 <heller> msimberg: did you run the whole test suite?

09:47 <msimberg> yeah, been running it now, the only ones that seem to fail are ones that also fail on master

09:47 <msimberg> it varies from run to run though

09:52 <heller> *nod*

09:52 <heller> the clang_tidy and fix_wrapper_heap are important fixes ...

09:55 <heller> yet, noone cares

09:55 <heller> except for jbjnr ;)

09:56 <msimberg> mmh, my changes should wait for a green master anyway so that I know I haven't messed something up

09:59 <msimberg> heller: I assume there's not really anything I can do to help with getting master green? all failing tests are somehow being worked on?

10:00 <heller> msimberg: not sure. you need to ask hkaiser, he introduced most of the newer once

10:00 <heller> ones

10:00 <heller> especially the action move count regressions

10:01 <msimberg> mmh, okay

10:06 <github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vbqIM

10:06 <github> hpx/gh-pages 7c1d962 StellarBot: Updating docs

10:17 K-ballo has joined #ste||ar

10:37 <jbjnr> heller: I care!

10:37 <jbjnr> msimberg: waiting for green master might make you an old man :)

10:38 <msimberg> jbjnr: may be, but there won't be a release without a green master

10:38 <jbjnr> as release manager - you can enforce a rule that master is locked, except for merges that fix tests fails

10:38 <msimberg> at least not with a red one

10:39 <msimberg> yeah, that's a very reasonable rule

10:39 <jbjnr> use the hpx-users or devel mailing lists to set a date, after which only fixes for tests will be accepted

10:39 <jbjnr> before we can go back to feature additions etc

10:39 <jbjnr> prior to next release

10:41 <jbjnr> so (say) no merges after 10th dec - unless they address tests failures until dashboard is green - after that we can begin merging fixes and improvements that are neeeded for release etc etc until $DATE - after which master will be locked again prior to a release

10:41 <jbjnr> that kind of thing ....

10:42 <jbjnr> we only need heller and hkaiser to agree to it, and then ....

10:43 <jbjnr> we must also consider removing tests that use features that never pass and are never green and consider those features as unsupported until such time tht they become green agin

10:44 <jbjnr> (we need to start being brutal IMHO) - we can discuss this with adrian this afternoon and see what he thinks.

10:49 <msimberg> jbjnr: I agree with everything you said, I admit I've been too passive so far

10:49 <msimberg> was a bit slow in arranging this call with aserio

10:50 <msimberg> jbjnr: can you remind me, the hpx tutorial will be in march, no?

10:54 <msimberg> jbjnr: btw, I think now is a good time to not allow anything that doesn't fix tests ;)

11:14 <jbjnr> yes march tutorial

11:14 <jbjnr> ok - lets talk to adrian, then from monday we can begin ...

11:15 <jbjnr> (release mid feb at latest I'd say - for march tutorial)

11:17 <msimberg> jbjnr: yeah, came to the same conclusion

11:17 <msimberg> mid feb release, rc end of january, cleanup in january, test fixing in december

11:17 <msimberg> roughly

11:26 <heller> jbjnr: I more than agree

11:27 <heller> msimberg: sounds like a plan

11:27 <heller> please enforce this rule ;)(

11:27 <heller> I have two PRs that fix stuff

11:27 <heller> they really block me ...

11:28 <heller> and I think we should switch to a policy such that noone is allowed to merge their own PRs

11:31 <K-ballo> that policy in itself would probably effectively halt development

11:32 <K-ballo> I haven't merged a PR, mine or otherwise, in months

11:34 <K-ballo> there's a lot more people writing PRs than there are merging PRs.. basically is what, one and two halfs?

11:43 <heller> well, the main reason I don't merge PRs at the moment is because there are so many tests failing

11:43 <heller> every other PR merged leads to new test failures

11:43 <heller> and it looks like no one really cares

11:45 hkaiser has joined #ste||ar

11:49 <heller> hkaiser: I would really like to move on with #3007 and #2982. Those PR fix rather important bugs

11:49 <heller> and good morning ;)

11:49 <hkaiser> heller: see pm, pls

11:57 <zao> Mood gorning.

12:42 hkaiser has quit [Read error: Connection reset by peer]

12:45 hkaiser has joined #ste||ar

13:30 <heller> grrr, inspect

13:31 <K-ballo> that reminds me, I'll drop my inspect-first PR.. we'll have to wait for the workflow based ci

13:32 <K-ballo> I cannot get circle to even consider artifacts being generated during the build

13:32 <K-ballo> or rather, in the case of a build failure

13:32 <heller> ok

13:33 <heller> I stalled the workflow based one, since it it also runs the tests

13:33 <heller> it doesn't make sense to merge it to master if those keep failing

13:38 <msimberg> heller: what's actually holding back moving to circleci 2?

13:39 <heller> msimberg: I'd like to run the tests on circleci, but then, every PR is being flagged as failed

13:39 <heller> well, wouldn't be too bad, I think

13:39 <msimberg> that would be great

13:40 <heller> well, every PR including the move to circle2 one ;)

13:40 <msimberg> but you mean with circleci 2 we would be able to run all the tests? and with 1 we can't?

13:40 <zao> I really hate that my test machine is just yawning.

13:40 <heller> we should be able to do it with circle1 as well

13:40 <msimberg> okay, but that would be acceptable

13:41 <heller> but with the workflow pipeline, the different steps can be easily overlapped

13:42 <heller> in an ideal case, we'd be able to reduce the test time to about 1:30

13:42 <msimberg> but I think that's something that should tested/broken as soon as possible, so that we can get on with not merging broken prs

13:42 <zao> 1h30m or 1m30s?

13:42 <heller> 1h30m

13:42 <heller> including running the tests

13:43 <msimberg> huh, builds + tests?

13:43 <heller> yeah, running in parallel, on 4 builders

13:43 <zao> My biggest pain point for the tests is the ones that time out or are disproportionally long.

13:43 <msimberg> ah, that would be very nice

13:43 <heller> https://circleci.com/workflow-run/b23ae4fc-685f-4a3c-b8c9-31b8a8c999ef

13:43 <heller> as an example

13:43 <zao> Can't really go much below 100s timeout before it starts culling genuinely long tests.

13:43 <heller> ah, 1:30 is just for building, as it seems

13:44 <heller> msimberg: i'd be happy if you rebase those commits to latest master and prepare the PR

13:45 <heller> we could even think about dropping the actual test running as a stop-gap measure

13:45 <heller> or only run selected tests

13:45 <heller> ha, I am wrong, that workflow ran the tests... just no the header tests

13:47 <msimberg> heller: yes! I'll gladly do that

13:51 <msimberg> heller: it's just the config file right? is there otherwise something that you'd still like to change or clean up there or is it just a matter of making a pr?

13:54 <github> [hpx] msimberg force-pushed circle_2 from a0cdcfc to c95bc59: https://git.io/vdKfd

13:54 <github> hpx/circle_2 c95bc59 Thomas Heller: Switching to CircleCI 2.0...

13:55 hkaiser has quit [Ping timeout: 268 seconds]

13:58 <heller> msimberg: it's just the config file, yeah. the clang-tidy changes are obsolete once the clang_tidy branch is merged

13:59 <heller> msimberg: and then it is about to fix the tests ;)

14:00 <msimberg> heller: okay, I guess I can leave it in there... the commented out stuff at the bottom of the config file is also just the old config file? I can remove that?

14:01 <heller> yes, that can be removed

14:02 <msimberg> yeah, sure, then it's "just" fixing tests ;) but at least we have a much better view of when things break

14:02 <heller> yes

14:03 <heller> and inspect before everything else!

14:04 <github> [hpx] msimberg pushed 1 new commit to circle_2: https://git.io/vbq2k

14:04 <github> hpx/circle_2 6cabb00 Mikael Simberg: Clean up CircleCI config file

14:05 <msimberg> heller: so I guess if tests fail merging is completely blocked? or can that be overriden? all tests can't be fixed in one PR...

14:05 <heller> it can be overriden by admins

14:06 <heller> such as me and hartmut

14:06 <heller> and john

14:06 <heller> are you not one of them?

14:06 <msimberg> okay, good, but then there should be no reason to hold this back assuming it works as advertised ;)

14:06 <heller> let's see :P

14:06 <msimberg> I have some kind of rights as well, don't know if there are different levels

14:06 <heller> last time I checked, it failed to build the parallel unit tests

14:07 <heller> mostly because of partitioned_vector

14:07 <heller> personally, I'd move partitioned_vector to the "not tested" stage...

14:07 <heller> YMMV

14:07 <msimberg> is it new for 1.1? or was it there before as well?

14:08 <heller> essentially all of the distributed container and algorithms until the compile time and memory usage problem is fixed.

14:08 <heller> it has been there since a while

14:08 <jbjnr> I'd like to see partitioned_vector made a lot more optional - it bogs everything down

14:08 <heller> jbjnr: exactly

14:08 <jbjnr> couldn't we make a cmake option to have it or not...

14:08 <heller> probably ;)

14:08 <heller> go for it!

14:08 <msimberg> should we?

14:08 <heller> it would make sense for the circle-ci runner

14:09 <heller> let's see how the current tests go

14:09 <jbjnr> msimberg: should we what?

14:09 <heller> if they still fail, i'd say disable them

14:09 <msimberg> jbjnr: of course we can, but should we? ;)

14:10 <jbjnr> I was not sure what you're question referred to. I presume my PV optional statement

14:10 <heller> I am not sure anyone uses them anyways

14:10 <msimberg> yeah, exactly that

14:10 <jbjnr> I'd say yes. make it opt-in

14:11 <jbjnr> exacly - it's an experimental feature used by LSU only, opt-in would work for me

14:11 <heller> let's discuss this with hartmut later, he might heavily object because some of his PhDs needs it

14:11 <msimberg> my follow up was going to be that we decide either yes or no as soon as possible, so that it's not left hanging

14:11 <heller> open an issue

14:12 <jbjnr> HPX_WITH_DISTRIBUTED_CONTAINERS ... etc

14:13 <jbjnr> and if they are disabled, the so are the algorothms that use them ...

14:13 <heller> yes

14:15 <heller> the downside ... code that is being checked in for partitioned_vector will be left unchecked. both compilation and testing

14:15 <heller> so we essentially end up with dead and broken code

14:17 <jbjnr> well we will need at least one build that has it enabled

14:17 <jbjnr> on buildbot we can do that yes?

14:19 <heller> yes

14:19 <msimberg> hmm, also shouldn't whatever is broken be fixed first? then whatever PR fixes the problems should enable the option and test on circleci

14:20 <heller> yeah

14:21 <heller> https://circleci.com/workflow-run/70bf9db2-e838-433e-8412-b0bd70c98606

14:22 <heller> that's the current workflow

14:23 <msimberg> shuold ctest have the --output-on-failure flag? or does the output go into some file?

14:23 <heller> potentially, CircleCI supports junit files ... but that's not working with workflows at the moment :/

14:23 <heller> https://8818-4455628-gh.circle-artifacts.com/0/unit/util/Test.xml

14:24 <heller> might be cool if we could convert it to html though

14:24 <heller> right, that was another problem ... the test output

14:25 <msimberg> that's already better than nothing

14:26 <heller> but yeah ... having the --output-on-failure would be a good thing until the test parsing works

14:28 hkaiser has joined #ste||ar

14:29 mcopik has joined #ste||ar

14:29 <msimberg> it shouldn't hurt

14:30 <msimberg> what actually generates the xml files? doing --output-on-failure won't disable that or something?

14:32 <heller> nope, it won't

14:32 <heller> it doesn't hurt, yes, lets' just enable it

14:33 <msimberg> good, adding it

14:35 <hkaiser> heller: have you heard back from circleci wrt more resources?

14:35 <github> [hpx] msimberg pushed 1 new commit to circle_2: https://git.io/vbqw3

14:35 <github> hpx/circle_2 48017bd Mikael Simberg: Add --output-on-failure flag to CircleCI ctest runs

14:38 eschnett has joined #ste||ar

14:53 aserio has joined #ste||ar

15:07 <heller> hkaiser: yeah, they won't give them away for free

15:09 <hkaiser> stupid

15:09 <heller> yeah...

15:53 eschnett has quit [Quit: eschnett]

16:03 <aserio> msimberg: I am ready if you are

16:04 <msimberg> aserio: yep, ready, see you on skype

16:17 eschnett has joined #ste||ar

16:40 EverYoung has joined #ste||ar

16:41 EverYoung has quit [Remote host closed the connection]

16:42 EverYoung has joined #ste||ar

16:54 david_pfander has quit [Quit: david_pfander]

18:05 <github> [hpx] sithhell closed pull request #2982: Clang tidy (master...clang_tidy) https://git.io/vFnB3

18:13 mcopik has quit [Ping timeout: 258 seconds]

18:37 <jbjnr> who can tell me what the difference is between an acceptance token and a personal token on gihub

18:37 <jbjnr> zao: ?

18:39 <jbjnr> ignore that question. I found what I needed to know

18:49 mcopik has joined #ste||ar

19:03 hkaiser has quit [Quit: bye]

19:20 jaafar has joined #ste||ar

19:33 hkaiser has joined #ste||ar

19:40 aserio has quit [Ping timeout: 255 seconds]

20:28 <zao> Great, because I don't know :)

20:29 <zao> For my bot, I just issued an user token with no rights.

20:29 <zao> Enough to query for PRs and commits of public repos, I believe.

20:30 <K-ballo> zao: your bot?

20:30 <zao> K-ballo: The part of my grand soak-testing plan that figures out what to test.

20:30 <zao> By fetching all repo refs, polling github API, and explicit IRC requests.

20:30 <zao> (or something)

20:32 <K-ballo> where is this bot?

20:34 <zao> In pieces on some of my home machines.

20:35 <zao> The goal is to make some use out of the Ryzen 7.

20:36 <zao> So a build of the codebase on a single compiler and Boost version, aiming for a short turnaround from PR commit to tests, and running tests for many iterations to see if they're stable.

20:36 <zao> Which is something I need to build pretty much from scratch, doesn't slot in well with buildbot exactly.

20:37 <zao> Doesn't help that I have no idea what I'm doing when it comes to building such daemons.

20:37 <zao> :)

20:37 <K-ballo> that's the fun part!

20:38 <zao> Been touching node.js for days without throwing up much!

20:39 <zao> I've got singularity figured out enough to have an image with a prebuilt Boost and Clang compilers, that I can mount a source/build tree into and run a build and run tests with network isolation in one container.

20:39 <zao> (so I can run several test suites at once without them colliding on ports and stuff)

20:39 <zao> I've still got to write the runner that starts and monitors such processes, and the service that triggers them and figures out what to build.

20:40 <zao> Intent is to run it on all PRs until they're merged.

20:40 <zao> (also master, but that should be eternally green, shouldn't it? :P)

20:41 <zao> Off to chill with the parents, there's glögg!

20:42 <K-ballo> master is... well... difficult

20:49 <jbjnr> zao: the reason I asked is because I started implementing a bot using pygithub

20:49 <jbjnr> I can query all PRs and then spawn builds for them on a CSCS machine

20:50 <jbjnr> I will add options to set the compiler and boost version etc etc

21:00 <hkaiser> jbjnr: I hope you're not seriously implementing your own github bot/integration :/

21:01 <hkaiser> didn't you want to use some existing CI system?

21:05 <jbjnr> hkaiser: I'm just experimenting - all we need is to build all PR's + master and see the test results. CDash can show the results, the simply python bot can pull the PRs - just want to see hiow easy it is to get a simple bot running

21:06 <jbjnr> partly so that when cscs meets next week, I can show them

21:06 <hkaiser> jbjnr: is building sufficient, shouldn't we also run the tests?

21:06 <jbjnr> yes of course I'm running the tests

21:06 <hkaiser> k

21:06 <jbjnr> I didn't think I needed to say that

21:06 <jbjnr> (this is my spare time)

21:07 <hkaiser> sure ;)

21:07 <hkaiser> just asking

21:10 <jbjnr> next week, cscs has a CI meeting and they will tell us that we can't have this/that/the other because a/b/c ... and it will take 6 months ...

21:10 <hkaiser> nod

21:10 <hkaiser> would it be possible for those results to be visible to everybody?

21:12 <jbjnr> yes, they will go to the cscs cdash server that you can already see

21:12 <jbjnr> the difference will be we will see warnings/build/update

21:12 <jbjnr> etc

21:36 aserio has joined #ste||ar

21:44 eschnett has quit [Quit: eschnett]

22:05 rtohid has joined #ste||ar

22:13 <zao> Got any spare free time I can steal? :)

22:20 eschnett has joined #ste||ar

22:45 aserio has quit [Quit: aserio]

22:53 aserio has joined #ste||ar

23:01 aserio has quit [Quit: aserio]

23:02 rtohid has left #ste||ar [#ste||ar]

23:13 eschnett has quit [Quit: eschnett]

23:17 mcopik has quit [Ping timeout: 248 seconds]

23:30 eschnett has joined #ste||ar