aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has joined #ste||ar
fane_faiz has joined #ste||ar
parsa[[[w]]] has quit [Read error: Connection reset by peer]
parsa[[[w]]] has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
mcopik has quit [Ping timeout: 248 seconds]
hkaiser has quit [Quit: bye]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
K-ballo has quit [Quit: K-ballo]
nanashi55 has quit [Ping timeout: 255 seconds]
nanashi55 has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
jaafar_ has quit [Ping timeout: 255 seconds]
Smasher has joined #ste||ar
fane_faiz has quit [Ping timeout: 248 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
EverYoung has joined #ste||ar
fane_faiz has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
<jbjnr> nobody is merging PRs - I want to test pycicle more!
<heller> ;)
<heller> jbjnr: make it report the status, and I'll volunteer
david_pfander has joined #ste||ar
<heller> should be relatively straight forward
<jbjnr> you mean report the status back to github?
<jbjnr> yes. I missed it before, but there is a thing in pygithub to let you do it - however, I need to scrape the xml test files and gather the pass/fail from them.
<jbjnr> got to do some real work today as well, but I might have a go at it
<jbjnr> The problem is, of course, that reporting the status is doomed to fail - master has test fails every time, so no PR will ever pass now that tests are actually run :(
<jbjnr> unless heller fixes all the tests in a PR!
<heller> right
<heller> having a failed status in the PRs is good!
<jbjnr> if you do that - I'll scrape the XML :)
<jbjnr> deal?
<heller> that gives us a better incentive to fix the failing tests
<heller> don't scrape the XML
<heller> report the cdash URL
<heller> exit code == 0 -> success, exit code != 0 -> fail. both cases point to cdash
<jbjnr> I don't understand - please enlighten me
<heller> jbjnr: the github API expects a json containing the state (error, failure, pending or success), the target_url (that is, a link to the test output - in our case the cdash URL), a description and a context
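(For reference: a minimal sketch of such a status POST, assuming a personal access token in GITHUB_TOKEN and using plain requests rather than pygithub; the function name and context string are illustrative.)
```python
# Hypothetical sketch of posting a commit status via the GitHub API;
# token handling and the "pycicle" context string are assumptions.
import os
import requests

def set_commit_status(owner, repo, sha, state, target_url, description):
    """POST one status (error, failure, pending or success) for a commit."""
    api = "https://api.github.com/repos/{}/{}/statuses/{}".format(owner, repo, sha)
    payload = {
        "state": state,              # "error", "failure", "pending" or "success"
        "target_url": target_url,    # e.g. the cdash page for this build
        "description": description,
        "context": "pycicle",
    }
    r = requests.post(api, json=payload,
                      headers={"Authorization": "token " + os.environ["GITHUB_TOKEN"]})
    r.raise_for_status()
```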
<jbjnr> my trouble is that the pycicle server is running on machine A and the build/test runs on machine B - on machine A I do not know what the result was, so I do not know what to send to github in the json
<jbjnr> (unless I scrape the xml - then it becomes easy)
<heller> well
<heller> pycicle doesn't need to know
<heller> you know at the end, right?
<jbjnr> only ctest knows
<heller> couldn't you have some kind of backchannel?
<jbjnr> on daint, ctest is running on a compute node and cannot send any http/other traffic out
<jbjnr> so on daint - we HAVE to scrape the xml anyway
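(A rough sketch of what that XML scraping could look like, assuming the usual CTest layout where Testing/&lt;tag&gt;/Test.xml holds one &lt;Test Status=...&gt; element per test; the path in the usage line is illustrative.)
```python
# Rough sketch: count pass/fail from a CTest Test.xml file.
# Assumes the standard CTest schema where each <Test> element carries a
# Status attribute ("passed", "failed", "notrun") and a <Name> child.
import xml.etree.ElementTree as ET

def summarize_ctest_xml(test_xml_path):
    root = ET.parse(test_xml_path).getroot()
    tests = root.findall(".//Testing/Test")
    failed = [t.findtext("Name") for t in tests if t.get("Status") == "failed"]
    return len(tests), failed

# illustrative path - the <tag> directory name comes from ctest itself
total, failed = summarize_ctest_xml("Testing/20171204-0100/Test.xml")
print("{} tests, {} failed: {}".format(total, len(failed), failed))
```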
<heller> oh, good point
<heller> scraping the XML doesn't give us anything, no?
<heller> we need to know the cdash url in any case
<jbjnr> so my thinking is that if I do that, then I can submit the results to cdash for daint from machine A, which does have connectivity, and update the github status as well.
<heller> so you can't send the results from ctest to the dashboard on daint anyway, no?
<jbjnr> I'm not sure how to generate the cdash URL for 'one' build, but I expect there is a way
<jbjnr> cdash might have a query mode too
<heller> query mode?
<jbjnr> heller: correct, ctest cannot submit from daint
<heller> so how do you do that right now?
<jbjnr> I don't! I'm running on greina
<heller> oh
<jbjnr> greina - the compute nodes there can see the outside world
<heller> good
<jbjnr> so ctest_submit just works
<heller> how about we care about daint later?
<jbjnr> ctest_submit might have a query mode that returns the ID, and then I can generate the URL etc. (that's what I was getting at - or maybe you can query cdash to get stuff)
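(One low-tech option, grounded in the dashboard URLs that appear later in this log: skip the per-build ID entirely and point the github status at the per-project, per-date cdash index page. A sketch:)
```python
# Build a cdash dashboard URL for a given project and date. This is the
# per-day index page, not a single-build page - a single-build URL would
# need the build id back from cdash (e.g. ctest_submit's BUILD_ID option
# in newer CMake versions).
import datetime

def cdash_url(server, project, date=None):
    date = date or datetime.date.today().isoformat()
    return "{}/index.php?project={}&date={}".format(server, project, date)

print(cdash_url("http://cdash.cscs.ch", "HPX"))
# -> http://cdash.cscs.ch/index.php?project=HPX&date=2017-12-04 (on that day)
```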
<jbjnr> if I get daint to work, then we have unlimited build/test capability. greina is smaller than rostam
<jbjnr> also tave/BGQ/etc - if I add the xml scrape feature, then we can run anywhere and all will be super-dooper
<heller> ok
<heller> yeha!
<jbjnr> also, if you want to send me a config for the FAU machine, we can test that from here too - all I need is ssh access
<heller> nice
<heller> well, I could just start a pycicle runner myself, no?
<jbjnr> yes, that would work, but then you might spawn duplicate builds for the same config - it really depends on how we update the github status, I expect. one machine or lots - does it matter?
<jbjnr> actually, you running your own instance would be great
<jbjnr> and one at lsu
<heller> yes, I think that would be best
<heller> we might still need a master that distributes for the different configs
<jbjnr> all doing different configs, then it's only the github status that needs work
<jbjnr> when N machines pass and one fails ....
<heller> right
<heller> all doing different configs would be the best, indeed
<heller> I have to do real work as well
<jbjnr> lsu can manage rostam, I can do ours, and you do yours - then we never run on each other's machines, and all are different enough that the coverage is good
<heller> (I'd love to hack on those things as well...)
<jbjnr> takes 5 minutes to set up. once I have daint xml scraping, then I'll tell you how to do a setup.
<heller> ok
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
<jbjnr> heller: if you want to start hacking the script sooner - then please do. there's only one setup step required. After xml scraping - I will need to add support for a list of machines instead of just one - feel free to do that for me :)
<heller> jbjnr: I have other things to do at the moment :/
<jbjnr> :)
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
parsa[[[w]]] has quit [Remote host closed the connection]
parsa[[[w]]] has joined #ste||ar
fane_faiz has quit [Ping timeout: 248 seconds]
fane_faiz has joined #ste||ar
fane_faiz has quit [Ping timeout: 248 seconds]
EverYoung has joined #ste||ar
<heller> jbjnr: can't you use ctest for the checkout as well?
EverYoung has quit [Ping timeout: 255 seconds]
<jbjnr> heller: ? what do you mean?
<heller> jbjnr: you manually invoke git right now, right? If I am not mistaken, ctest does code checkout as well?
<jbjnr> I do the manual step because ctest doesn't correctly display the changed files when you switch to another branch
<jbjnr> I wanted the update section of cdash to show N files different on branch X
<jbjnr> I spent quite a while playing with it to get it to display them. When I used ctest_update on its own, it was not showing everything. If you can make it work in a simpler manner ....
<jbjnr> the other thing I do manually is the initial clone. If you clone from github for each PR, it takes hours, so before you start pycicle you make one copy of the repo in the pycicle dir; then each branch can copy that and pull from github just the stuff it needs. Speeds things up enormously.
<jbjnr> that's the one setup step.
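(A hedged sketch of that clone-once-then-copy flow; the paths, the helper name, and the use of GitHub's refs/pull/N/head refs are assumptions, not pycicle's actual code.)
```python
# Illustrative sketch of the clone-once-then-copy setup described above;
# checkout_pr and its paths are hypothetical, not pycicle's real API.
import shutil
import subprocess

def checkout_pr(base_clone, work_dir, pr_number):
    # copy the pre-made clone instead of cloning github again (fast, local)
    shutil.copytree(base_clone, work_dir)
    run = lambda *cmd: subprocess.check_call(cmd, cwd=work_dir)
    run("git", "checkout", "master")
    run("git", "pull", "origin", "master")      # fetch only the new objects
    run("git", "fetch", "origin",
        "refs/pull/{}/head:pr".format(pr_number))
    run("git", "merge", "pr")  # needs history back to the merge base, hence no shallow clone
```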
K-ballo has joined #ste||ar
<jbjnr> so it would seem that I'm wrong and daint can send results to cdash.
<jbjnr> I tried it years ago and it didn't work - but actually it does now!
<heller> jbjnr: ok, ctest_update: got you.
<heller> regarding cloning: Why not just have a shallow clone of master?
hkaiser has joined #ste||ar
<jbjnr> because if you do a shallow clone and then try to merge a branch, it can't find the merge_base and craps out - you need history at least as far back as the divergence point. I decided that either way, cp -r was quicker and less prone to network issues etc.
<heller> ok
<heller> good enough ;)
<heller> this base needs to be updated for every PR then though
<jbjnr> yup. it's working and that's all that matters
<heller> and you need to make sure, that it is not being updated concurrently
<jbjnr> I'm going to trigger a build of all branches on daint now, I think, since it seems to be working
<hkaiser> jbjnr: github can provide you with a shallow copy already merged
<jbjnr> Interesting: but ... I need the diffed files for the dashboard display
<jbjnr> heller: no. there is one clone of everything, then each build makes a copy of it and pulls master and then merges the branch for the PR.
<heller> jbjnr: ahh, got it
<heller> jbjnr: just saying that this clone of everything should be updated every now and then
<jbjnr> ton of disk space used, but it's $SCRATCH
<jbjnr> yes, the clone should be updated every few days
<jbjnr> I will add that.
<heller> hkaiser: how?
<hkaiser> appveyor uses it, no idea how they do it ;)
<jbjnr> also - greina does a full build and test in under half an hour - IF I don't wipe the build tree. most of the time this is ok, but I am thinking that we can add a flag to say WIPE every build, or wipe every Nth build, just so that we test all scenarios.
<hkaiser> jbjnr: you can wipe builds once the PR was merged
<jbjnr> so if you push a new commit to a PR, then the build is triggered, it is pretty quick
<jbjnr> no, ctest usually expects to clean the build tree out at the start of each build, so you always get a from-zero build
<jbjnr> if I disable that, we have a quite fast turn-around
<hkaiser> I meant wiping logs, etc
<jbjnr> but it should be enabled now and then to ensure stale cache entries don't cause problems
<jbjnr> heller: done. daint has 11 builds running now :) http://cdash.cscs.ch/index.php?project=HPX
<heller> jbjnr: you are my hero :D
<heller> jbjnr: next step: have an allocation for one node for building, and two for running the tests
<jbjnr> I just hope this gets us closer to a green dashboard soon.
<heller> jbjnr: and of course, github status ;)
<heller> only if we fix the currently failing tests
<jbjnr> yes - I need multinode testing.
<heller> jbjnr: you do my tax declaration and I fix the tests, ok?
<jbjnr> the MPI tests will surely fail on daint - we'll see
<jbjnr> I will also add libfabric testing too :)
<jbjnr> once we get mpi running
<jbjnr> tax declaration: easy, just send me all your money and I'll take care of it in my swiss bank for you.
<heller> ;)
<heller> hkaiser: do we have documentation on how to use intel vtune?
<heller> I always fail to set it up :/
<hkaiser> heller: I don't know how to do that on linux
<hkaiser> heller: or are you referring to how to make it work with hpx?
<heller> hkaiser: I think I got it to build
<heller> more or less
<heller> now I want to profile my application
<heller> so I'd need the command line arguments for the task traces to show up etc.
<hkaiser> HPX_WITH_ITTNOTIFY=ON and -Ihpx.ittnotify=1
<heller> ok, thanks
<heller> there's a new 2018 version, IIRC, the performance counters didn't work on linux last time I tried
<hkaiser> yah, would be nice if they worked now
fane_faiz1 has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
mcopik has joined #ste||ar
fane_faiz1 has quit [Ping timeout: 240 seconds]
<msimberg> hkaiser: a few questions if you have time?
<hkaiser> msimberg: gotta run soon, but try me
<msimberg> ooh, ok
<msimberg> first: intel builds on rostam fail because the license has expired, who can I bug about getting that updated?
<hkaiser> msimberg: we know about this, working on it
<hkaiser> thanks
<msimberg> ok, good
<hkaiser> I'll remind Alireza today
<msimberg> second: would you be okay with not merging prs to master until tests are green? :)
<hkaiser> msimberg: how do you want to get tests green without merging things?
<hkaiser> ;)
<msimberg> well... unless they are meant to fix tests :p
<hkaiser> sure, however nobody talked to me about such an 'agreement' yet
<msimberg> I know, that's why I'm asking now
<msimberg> as in can we make it official with you as well now?
<hkaiser> ahh - I see
<msimberg> heller and jbjnr are already onboard
<hkaiser> well, was it 'official' before?
<msimberg> no, it never was
<msimberg> :)
<msimberg> afaik
<hkaiser> ok, I'm fine with that - I'd appreciate this kind of discussion on the ML, though
<hkaiser> to make sure everybody is informed and on board
<msimberg> okay, that's useful feedback
<msimberg> I wasn't sure if it's enough that the people here know, or if it should be announced more widely
<msimberg> the ops here seem to be the ones merging prs anyway
<hkaiser> I think this should be discussed widely - not everybody reads back here all the time
<hkaiser> msimberg: I think it's not just merging
<hkaiser> it's also asking everybody to focus towards fixing things first
<hkaiser> and that's much wider than just the ops here
<msimberg> yes, absolutely
<hkaiser> msimberg: also, making things green is good, but not sufficient for a possible release
<hkaiser> that would require putting together a list of things that have to be done before a release - not sure if we have all of that as tickets
<msimberg> no, I'm aware of that, but I think, if possible, making master green should be done before finalizing whatever remaining features/bugfixes are planned
<hkaiser> ok - fair enough
<hkaiser> thanks for pushing this, btw
<msimberg> you might have noticed I tried to clean up some issues from the 1.1 release, but I think more should be removed still if we want a release before march
<msimberg> so I might need your help also in judging which issues still are realistic to be worked on before that
<hkaiser> yes, we also need to create a bunch more tickets outlining missing things
<msimberg> yeah, are there ones you know of off the top of your head? I can add some
<msimberg> or would you have time sometime in december to add in the ones you know are missing?
<hkaiser> adding tests for the RP, for instance
<msimberg> ok
<jbjnr> (hkaiser: there are only 4 people who actually merge anything)
<msimberg> assuming we decide to focus on fixing tests first, would you have time to look at some of them soon?
<msimberg> (I want to help but I'm afraid fixing some of those tests would take very long for me)
<hkaiser> it's a fine line between 'fixing tests' (i.e. making things green) and 'fixing issues' we have no tests for
<msimberg> well, I see it as one thing at a time, first fix existing tests, then add tests for things which should be tested, and then add features
<hkaiser> msimberg: I can try - I'm 100% tied up with a personal problem, however - will do my best
<hkaiser> ok
<msimberg> hkaiser: ok, no problem, just want to know roughly how things will progress in the next few weeks :)
<hkaiser> do we have a list of tests in need of fixing?
<hkaiser> might be a good ticket to list those
<hkaiser> this ticket could be amended over time
<msimberg> I think not other than what buildbot tells us, I can add them to an issue
<hkaiser> k
<hkaiser> msimberg: is there anything else? I'd need to run now
<hkaiser> but we can talk later
<msimberg> no, thanks, last question I can add to github :)
<hkaiser> tks
<jbjnr> the tests that fail are very easy to see on here: http://cdash.cscs.ch/index.php?project=HPX - just click the link under the test fails. ignore the cray testing for now, until I fix whatever is wrong with mpi.
<jbjnr> there are around 8-15 that fail every time on every machine
<jbjnr> a bunch of builds are still running
<jbjnr> we must also fix the build errors
<msimberg> yeah, lots of things need to be fixed... :/
<msimberg> you also have a bunch of different configurations tested on master now it seems?
<msimberg> is this going to blow through any quotas if you run that all the time? :) or do we have practically unlimited hours?
<msimberg> jbjnr:
<jbjnr> on daint I have almost unlimited
<jbjnr> the testing will quiet down once I stop tweaking pycicle
<msimberg> at this pace rostam won't be needed anymore though...
<jbjnr> greina is now showing test results and it's the same 8 failing every time
eschnett has quit [Quit: eschnett]
hkaiser has quit [Quit: bye]
<jbjnr> msimberg: well, we're never going to convince the others to get rid of buildbot. pycicle is just a quick hack to help us get past the current problems
<msimberg> yeah, I wasn't entirely serious :) but it's remarkable how useful you got it already after a few days
<jbjnr> I just wish I'd done this a year ago
<msimberg> better late than never!
<heller> jbjnr: I think pycicle + cdash actually has the potential to replace buildbot
<heller> we need a few more iterations, but I think it's almost there
<jbjnr> jenkins?
<jbjnr> we have a meeting this week about it
<heller> what about jenkins?
<heller> i think your admins will revolt if you tell them that we want to hook up external machines ;)
<heller> that is, machines that are outside of the cscs network
<jbjnr> I guess we'll see. My current thinking is we should push them to get jenkins useful, but continue tweaking pycicle/other, since we can add LSU and FAU to pycicle easily, and whichever ends up being most useful, we can use ...
<jbjnr> I'll ask marco to upgrade our cdash server and purge the existing databases so we can have a clean start ...
<heller> good
<heller> eventually, we can still use pycicle as a script for a jenkins worker
<heller> or even buildbot worker
<jbjnr> yup
<jbjnr> heller: did you have a PR that fixes some of the build/test errors?
<heller> yes
<heller> jbjnr: #3038
<jbjnr> aha. already merged. ok
<heller> yes
aserio has joined #ste||ar
<heller> planning on fixing the action_invoke_no_more_than test later
<jbjnr> I will fix the get_stack_size test
eschnett has joined #ste||ar
<jbjnr> ooh heller - the invoke no more than test, I made an issue for it earlier. Please check
<jbjnr> I think we should get rid of it
<heller> jbjnr: the invoke and stack_size failures are related
<jbjnr> ok
<jbjnr> then I will hold on a bit
<jbjnr> I just ran the stack test locally and asan choked mightily
<heller> Yes
<heller> I know which knobs to turn
<jbjnr> some fault deep in the action code I suppose.
<heller> Yes
<jbjnr> if you know what to do, I'll move on to something else
<heller> Hartmut's local action improvement patch
<heller> jbjnr: you work on the github status report ;)
<jbjnr> ok. Sadly tomorrow I must go to ETHZ and start a different project, so I'd better get it working today ....
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
akheir has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
hkaiser has joined #ste||ar
jaafar_ has joined #ste||ar
fane_faiz1 has joined #ste||ar
<jbjnr> msimberg: yt?
<jbjnr> your branch finished testing http://cdash.cscs.ch/index.php?project=HPX&date=2017-12-04 looks like 3 new fails :(
<jbjnr> I love pycicle!
<hkaiser> jbjnr: frankly, I have no idea where to look in that table
<hkaiser> it's the worst UI I have seen in a while
<jbjnr> if you click build time, then they should be sorted by most recent results first - then the top one is "PR-3046-suspend-thread-Release", which is Mikael's PR - then click the test fails count (which is 11) - you can also see the 26 modified files etc.
<jbjnr> hkaiser: ^
<hkaiser> ok
RostamLog has joined #ste||ar
<github> [hpx] hkaiser closed pull request #2546: Broadcast async (master...broadcast_async) https://git.io/vydS0
EverYoung has joined #ste||ar
diehlpk has joined #ste||ar
rtohid has joined #ste||ar
diehlpk has quit [Ping timeout: 255 seconds]
david_pfander has quit [Ping timeout: 250 seconds]
akheir has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<akheir> msimberg: yt?
EverYou__ has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
<heller> akheir: the Intel compiler license problem
<heller> I think I queried you about that as well a few weeks ago
<heller> jbjnr: got it covered
EverYou__ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<akheir> heller: just fixed it. the next build should be able to use intel compiler
<heller> akheir: cool
<hkaiser> K-ballo: you could use one of the existing wave lexers to tokenize things for inspect
fane_faiz1 has quit [Ping timeout: 276 seconds]
<K-ballo> hkaiser: I already had the lexer implemented from elsewhere
<hkaiser> just saying
<K-ballo> the token based checks did not end up looking as I was expecting
<hkaiser> k
<heller> what's wrong with clang-tidy?
<hkaiser> nothing
<heller> probably too slow though
<heller> inspect is pretty fast as it is, clang-tidy takes a few hours on our code base
<heller> wave might even be in the same ballpark as inspect, I guess
<hkaiser> the wave lexer (tokenizer) is extremely fast
<heller> having a token stream is probably way more flexible than the regex parser we have now in inspect
<heller> the benefit of clang-tidy is that you have access to the fully parsed AST as well
<K-ballo> that was my thinking, and why I implemented the token stream for inspect.. but I don't like the results
<K-ballo> I did not particularly care about speed though, not for inspect checks
<hkaiser> right
<heller> well ... if we want to have fast turnaround times...
<K-ballo> anything under 2 minutes is super fast for inspect
<heller> yes
<hkaiser> heller: inspect takes not even 1% of the overall time the tests require
<heller> sure, just comparing to what clang-tidy would give us
<heller> (not sure if it isn't my storage though, inspect takes ages for me as well)
<hkaiser> heller: build inspect in release, speeds up things by a factor of > 10
<hkaiser> if not more
<heller> ha! never even thought about this ;)
<K-ballo> yeah, inspect in debug is useless
<K-ballo> so is inspect in release using msvc's regex
<heller> don't we build it in debug on circle?
<K-ballo> uhm...
<heller> hkaiser: so, I am thinking about ways to speed up partitioned_vector ... would it make sense if we completely type erase the partitions? So that only the client has the full type information?
<K-ballo> yes... yes we do... jah
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
<hkaiser> heller: could help
<hkaiser> gtg, sorry
hkaiser has quit [Quit: bye]
<heller> jbjnr: #3044 is just for you
<heller> jbjnr: btw, I think it should be cray(daint) and linux(greina) or daint(cray) and greina(linux)
<heller> jbjnr: regarding the mpirun failures on daint, did you start the tests with the env variable HPX_RUNWRAPPER=srun set?
<heller> sorry, HPXRUN_RUNWRAPPER=srun
<heller> woah, can't believe it. only one failed test with ctest -R tests.unit
aserio1 has joined #ste||ar
hkaiser has joined #ste||ar
Bibek has quit [Quit: Leaving]
Bibek has joined #ste||ar
aserio has quit [Quit: aserio]
aserio1 is now known as aserio
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
aserio1 has joined #ste||ar
eschnett has quit [Quit: eschnett]
<aserio> hkaiser: yt?
<hkaiser> aserio: here
<aserio> hkaiser: do you remember my username on the host server on the stellar websites?
<hkaiser> sec
<aserio> I am trying to make sure all of my tools work again
<hkaiser> the wordpress login? or the ftp account
<aserio> ftp
<hkaiser> aserio: username: aserio@stellar.cct.lsu.edu
<aserio> so should I use the host name stellar.cct.lsu.edu?
<hkaiser> hostname is ftp.crochetcutedolls.com
<aserio> I thought it was alpha.validns.com?
<hkaiser> that should work as well, yes
Smasher has quit [Remote host closed the connection]
rtohid has left #ste||ar [#ste||ar]
EverYoun_ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
rtohid has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
aserio1 has quit [Quit: aserio1]
aserio has quit [Quit: aserio]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar