hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
primef3 has quit [Ping timeout: 248 seconds]
primef3 has joined #ste||ar
primef4 has joined #ste||ar
primef3 has quit [Ping timeout: 260 seconds]
primef4 has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
rori has joined #ste||ar
<K-ballo> how do I get a raw build log out of cdash?
<jbjnr> We don't keep all the output, only the bit near the error (I think, can't remember)
<simbergm> I haven't found a way to get the log :/
<simbergm> jbjnr: is it possible to enable that? would be nice to have
<jbjnr> I think it can be adjusted, but it would generate a crap ton of useless info most of the time
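For illustration, the truncation jbjnr describes (keeping only the lines around an error instead of the full build log) can be sketched as below. This is a hypothetical helper, not how CTest or CDash actually do it. The error pattern and the context sizes are assumptions. CTest has its own knobs for this in `CTestCustom.cmake`, such as `CTEST_CUSTOM_ERROR_PRE_CONTEXT` and `CTEST_CUSTOM_ERROR_POST_CONTEXT`, which is presumably what "it can be adjusted" would mean in practice.

```python
import re

# Hypothetical pattern for lines that count as errors (assumption).
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)


def error_context(lines, pre=2, post=2):
    """Return only the lines within `pre` lines before / `post` lines after an error."""
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_RE.search(line):
            # Remember the error line plus its surrounding window, clamped to the log.
            keep.update(range(max(0, i - pre), min(len(lines), i + post + 1)))
    # Emit the kept lines in their original order; everything else is dropped.
    return [lines[i] for i in sorted(keep)]


if __name__ == "__main__":
    log = ["step 1", "step 2", "step 3", "foo.cpp:10: error: bad", "step 5", "step 6"]
    print(error_context(log, pre=1, post=1))
```

Raising `pre` and `post` keeps more of the log, which is exactly the trade-off jbjnr mentions: more context also means far more useless output for the common case.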
primef4 has joined #ste||ar
hkaiser has joined #ste||ar
jaafar has joined #ste||ar
primef5 has joined #ste||ar
primef4 has quit [Ping timeout: 252 seconds]
<zao> hkaiser: Some of the DOI links in the Publications page of stellar-group.org are missing the https://doi.org/ prefix, instead interpreting the number directly as an IP address.
<hkaiser> zao: thanks, I'll have a look
<zao> (f.ex, the two top ones under Conference publications)
<hkaiser> yah, I added those yesterday, and screwed it up :/
<zao> :D
<hkaiser> zao: ok, fixed
primef5 has left #ste||ar ["WeeChat 2.7"]
primef has joined #ste||ar
hkaiser has quit [Quit: bye]
<simbergm> hkaiser: heller we have another meeting until 5 (1 hour 10 minutes from now) on the coordination meeting number so either don't join early or join early if you want to eavesdrop...
rori has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
<simbergm> heller: are you joining today?
<heller2> @ms:matrix.org: doh.. totally forgot, sorry!
<diehlpk_work> jbjnr, I am still working on the performance issue. So no time to look into libfabric.
<diehlpk_work> How are things going for you?
primef has quit [Ping timeout: 272 seconds]
<K-ballo> hkaiser: you won't be in prague?
primef has joined #ste||ar
<hkaiser> K-ballo: sorry, no
<K-ballo> :/
primef has quit [Ping timeout: 246 seconds]
primef has joined #ste||ar
<heller2> ms: hkaiser: Did I miss anything important?
<simbergm> heller: always ;)
<simbergm> we talked about having a patch release in two weeks latest, some updates to the governance document, a bit about the survey, rostam buildbot (we'll keep the old one for now, maybe replace it with pycicle or newer buildbot in the future)
<simbergm> some pr talk
<simbergm> remind me next time to take meeting notes if I don't do it voluntarily... I meant to start today but forgot
<heller2> ok
<heller2> ;)
<heller2> so 15 users responded so far, that's pretty cool
<simbergm> heller: well, yes, the new one would be cool :) are you volunteering to set it up? I think we all want to change but keeping the old one in place is just the easiest for now
<heller2> yes, I am working on it right now
<simbergm> all right :P I interpreted your email a bit differently ;)
<simbergm> but then that was already yesterday...
<heller2> well
<heller2> "for the time being"
<heller2> I have no idea how fast I will be able to get it to a satisfactory state ;)
<hkaiser> heller2: why reinvent yet another buildbot environment?
<heller2> hkaiser: the number one thing I dislike most about buildbot, is that we can't easily control what is being built in the same repo where the code is, as opposed to circleci, travisci etc
<heller2> hkaiser: the number one thing I dislike most about circleci, travisci etc. is that it is impossible to have our own environment inside it
<heller2> own machines that is
<hkaiser> heller2: ok, what about using pycicle?
<heller2> I don't like cdash ;)
<hkaiser> well, it's your time you're spending
<heller2> the cdash results are very hard to interpret, but maybe that's just me
<hkaiser> I just don't like for us to waste time on tool development, if not absolutely necessary
<hkaiser> those usually stay in a half-assed and abandoned state
<heller2> sure
<heller2> take the status of this PR: https://github.com/STEllAR-GROUP/hpx/pull/4335
<hkaiser> heller2: what do you mean?
<heller2> now click on the details for the failed pycicle builds
<hkaiser> ok?
<hkaiser> what are you trying to say? that cdash is difficult to read - I'm fully aware of that
<hkaiser> does this warrant developing your own CI environment? I don't believe so
<heller2> well, you asked "why not pycicle"
<hkaiser> ok
<hkaiser> your call - but I will probably be against using yet another CI environment - especially one where we have no guarantee of it being maintained
<heller2> fair enough
<heller2> what other options do we have?
<hkaiser> well pycicle is guaranteed to be maintained as cscs is using it
<heller2> is it?
<heller2> FWIW: The maintenance uncertainty is the main reason why I responded to "keep running the old buildbot"
<hkaiser> it's used by other projects besides hpx
<hkaiser> well, in the end it's just my opinion, simbergm and jbjnr and the others might disagree with me
<heller2> so far, I have nothing of value to show. pycicle is there and seems to work for y'all
<heller2> so there really isn't a lot to discuss there
<heller2> right now
<hkaiser> sure
Yorlik has joined #ste||ar
<hkaiser> heller2: I don't want to discourage anyone from helping to move things forward, all I want is to use our time wisely
<heller2> I understand that perfectly
<simbergm> I'm not opposed to anything really, but my 5 cents is that pycicle is more "developing our own tools" than using a newer buildbot
<hkaiser> fair enough
<heller2> what I am trying to do is to develop our own tool though ;)
<simbergm> I don't particularly like cdash either (but I've gotten used to it), I just don't feel like looking at alternatives either :P
<simbergm> but based on buildbot? I guess I don't know exactly what you're trying to do
<heller2> yes, based on buildbot
<simbergm> so you've forked buildbot? :)
<heller2> well, essentially, I am trying to create a proper GitHub App around it to make it behave more like travis/circle
<heller2> no, no need to fork it
<heller2> and have the ability to attach builders that we control to it
<heller2> and buildbot supports this feature of "latent" workers, which we can use to, for example, dynamically start the buildbot worker on rostam, daint or a cloud provider (like packet)
<heller2> I started this effort about a year ago, and have only recently begun to revisit it
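To make the "latent worker" idea concrete for a Slurm-managed machine like rostam or daint, here is a minimal sketch under stated assumptions: a worker is started by submitting a batch job only when a build needs it, and stopped by cancelling that job. It deliberately does not use buildbot's own latent-worker classes. The class name `SlurmLatentWorker` and the `worker_cmd` string are hypothetical; only the `sbatch`/`scancel` commands, their `--job-name`, `--partition`, `--time`, `--nodes` and `--wrap` options, and sbatch's "Submitted batch job <id>" message are taken from Slurm.

```python
import re
import subprocess


def sbatch_command(job_name, worker_cmd, partition, time_limit="01:00:00", nodes=1):
    """Build an sbatch call that runs `worker_cmd` as a Slurm batch job."""
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--partition={partition}",
        f"--time={time_limit}",
        f"--nodes={nodes}",
        # --wrap turns the command string into a one-line batch script.
        f"--wrap={worker_cmd}",
    ]


def parse_job_id(sbatch_stdout):
    """Extract the job id from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return match.group(1)


class SlurmLatentWorker:
    """Hypothetical latent worker: the batch job exists only while a build needs it."""

    def __init__(self, name, worker_cmd, partition, run=subprocess.run):
        self.name = name
        self.worker_cmd = worker_cmd  # assumed to run the CI worker in the foreground
        self.partition = partition
        self.run = run  # injectable so the logic can be exercised without Slurm
        self.job_id = None

    def start(self):
        """Submit the batch job that hosts the worker and return its job id."""
        result = self.run(
            sbatch_command(self.name, self.worker_cmd, self.partition),
            capture_output=True,
            text=True,
            check=True,
        )
        self.job_id = parse_job_id(result.stdout)
        return self.job_id

    def stop(self):
        """Cancel the batch job, which tears the worker down."""
        if self.job_id is not None:
            self.run(["scancel", self.job_id], check=True)
            self.job_id = None
```

Note that a submitted job may sit in the queue for a while, so a real implementation would also have to wait for the worker to actually connect to the master before handing it a build.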
<simbergm> ok, I take my previous statement back, this does sound like developing our own tools
<simbergm> if you have the time and patience to do it and maintain it it's a worthy goal though
<heller2> unfortunately, it's more fun than developing C++ for any other project...
<heller2> and some of the fun is taken away from me, because I am frustrated with the tools supporting the project
<heller2> projects, even
<K-ballo> let's switch to Kotlin, they are going native
<Yorlik> Rust ...
<Yorlik> I believe modularization will help a lot in the long run. As things get larger their complexity grows exponentially.
<Yorlik> Creating smaller units of work should help efficiency.
<heller2> that doesn't solve the issue of integrating all those small units ;)
<Yorlik> It's a matter of aggregation and how to aggregate. It comes down to a usable divide and conquer strategy.
jbjnr1 has quit [*.net *.split]
<Yorlik> With n units you have n(n-1)/2 interactions, so complexity grows quadratically with the number of interacting units.
<Yorlik> By reducing these numbers through meaningful aggregation, it gets simpler.
<Yorlik> OFC - that can easily be a mammoth task, sure.
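The arithmetic behind Yorlik's n(n-1)/2 argument can be checked with a few lines. The grouped model is a simplifying assumption for illustration: units only interact inside their own group, and each group exposes one interface through which groups interact with each other. Under that model, 20 units give 190 possible pairs, while 4 groups of 5 give 4 × 10 + 6 = 46.

```python
def pairwise_interactions(n):
    """Number of distinct pairs among n units: n(n-1)/2."""
    return n * (n - 1) // 2


def grouped_interactions(group_sizes):
    """Interactions when units only talk within their group and groups talk
    to each other through a single interface each (simplifying assumption)."""
    within = sum(pairwise_interactions(size) for size in group_sizes)
    between = pairwise_interactions(len(group_sizes))
    return within + between


if __name__ == "__main__":
    print(pairwise_interactions(20))          # every unit may touch every other unit
    print(grouped_interactions([5, 5, 5, 5])) # same 20 units, aggregated into 4 modules
```

The flat count grows quadratically, so doubling the number of units roughly quadruples the possible interactions; aggregation only pays off if the module boundaries really do restrict who talks to whom.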
<heller2> noticed the contradiction ;)?
<Yorlik> Which contradiction?
<Yorlik> Like you need to invest time to save time?
<heller2> it getting simpler vs it being a mammoth task
<Yorlik> The refactoring / restructuring can be a mammoth task, sure - that's the investment to gain long term savings.
<Yorlik> I think the attempt to modularize HPX is such a thing. A good investment.
hkaiser has quit [Quit: bye]
<heller2> sure, the problem is, that an iterative approach takes ages...
<heller2> especially given the tools we have
<Yorlik> It's tough to know how to approach this. My own project is much smaller and I'm already hitting walls all the time because stuff gets too complex easily. And yes - we all hate CMake, right ;)?
<Yorlik> As brutal as it is - but I think the approach John Lakos suggests might be the least of all evils.
<heller2> it's not about cmake
<Yorlik> I know.
<Yorlik> CMake is just one of the not perfect tools.
<heller2> Main issue: I either want to submit a PR to HPX or review one, and there simply is no easy way to tell whether it's OK. Has there been one PR in the last year where all the status checks have been green? Why can we not filter out tests that have been failing for ages?
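One way to read "filter out tests that have been failing for ages" is sketched below, assuming the CI can provide the set of failing test names for each of the last few runs on master. That data format and the threshold of five runs are assumptions, not something the existing CI tools are known to expose in this form. A test counts as a known failure if it failed in every one of the recent master runs, and a PR is only flagged for failures outside that set.

```python
def known_failures(history, min_runs=5):
    """Tests that failed in every one of the last `min_runs` master runs.

    `history` is a list of sets of failing test names, oldest first
    (hypothetical data format).
    """
    if min_runs < 1:
        raise ValueError("min_runs must be at least 1")
    recent = history[-min_runs:]
    if len(recent) < min_runs:
        # Not enough history yet: treat nothing as a known failure.
        return set()
    return set.intersection(*recent)


def new_failures(pr_failures, history, min_runs=5):
    """Failures in a PR build that are not already known to fail on master."""
    return set(pr_failures) - known_failures(history, min_runs)
```

Requiring a failure in every recent run keeps flaky tests (which pass sometimes) visible, at the cost of needing a few runs of history before anything gets filtered.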
<Yorlik> So you're fighting the CD/CI?
<heller2> yes. because that should be our main gate keeper against getting bad code into the repository
<Yorlik> And it's a problem of available CI/CD tools?
<heller2> look at this one for example: https://github.com/STEllAR-GROUP/hpx/pull/4305
<heller2> the CI status checks are just plain broken
<heller2> it's more of a problem of not wanting to throw money at the problem
<Yorlik> So - is it a problem of the tools or the way how you customized them / use them?
<heller2> I think travis-ci or circle-ci is great and would easily solve most of our CI needs if we were able to invest around $1k a month
<Yorlik> FFS
<Yorlik> So jenkins can't do it?
<heller2> I don't like it ;)
<Yorlik> lol
<Yorlik> We're working on a jenkins container for our thing.
<heller2> last I looked at it, it was a pile of inefficient java stuff, configurable only with a web interface
<heller2> maintenance nightmare
<heller2> with buildbot, you can at least codify the config
<heller2> but you still separate your code from the actual testing
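Because buildbot's master configuration is itself a Python file, "codifying the config" can mean generating builders programmatically. The sketch below only produces plain dictionaries for a build matrix; the compilers and build types are hypothetical, and turning these descriptions into buildbot builder objects is left out so that no buildbot API is assumed. The CMake options used are the standard `CMAKE_CXX_COMPILER` and `CMAKE_BUILD_TYPE` cache variables.

```python
from itertools import product

# Hypothetical build axes, not HPX's actual CI configuration.
COMPILERS = {"gcc": "g++", "clang": "clang++"}
BUILD_TYPES = ["Debug", "Release"]


def build_matrix(compilers, build_types):
    """Return one builder description per (compiler, build type) pair."""
    return [
        {
            "name": f"{cc}-{bt.lower()}",
            "cmake_args": [f"-DCMAKE_CXX_COMPILER={cxx}", f"-DCMAKE_BUILD_TYPE={bt}"],
        }
        for (cc, cxx), bt in product(compilers.items(), build_types)
    ]


if __name__ == "__main__":
    for builder in build_matrix(COMPILERS, BUILD_TYPES):
        print(builder["name"], " ".join(builder["cmake_args"]))
```

This also illustrates heller's complaint: the matrix lives next to the buildbot master, so adding a configuration still means changing a file outside the HPX repository.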
<Yorlik> Isn't travis free if you host it yourself?
<heller2> Not that I am aware of
<heller2> same goes for pycicle, you need to adapt the tester to changes in the code
<Yorlik> It's beautiful and it has three colors ! :) </optimistic mode>
<Yorlik> on the travis site they say they have a free plan for open source projects
<heller2> yes
<heller2> time limit of an hour and only one build at a time
<heller2> we use it for windows builds
<heller2> note: not testing anything there
<Yorlik> Modularize HPX, make each a repo on its own and launch 20 open source projects .... ;)
<heller2> circle-ci allows us to have 4 parallel jobs, which is nice, but only linux and with really low powered virtual machines
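Making good use of a fixed number of parallel jobs (4 on circle-ci, as heller says) mostly comes down to balancing test run times across them. CircleCI has its own timing-based splitting (`circleci tests split`); the sketch below is a generic greedy "longest test first" heuristic, not CircleCI's algorithm, and the per-test durations are assumed to come from earlier runs.

```python
import heapq


def split_by_timing(durations, jobs=4):
    """Assign tests to `jobs` parallel executors, longest test first,
    always giving the next test to the currently least-loaded executor."""
    heap = [(0.0, i) for i in range(jobs)]  # (accumulated seconds, job index)
    buckets = [[] for _ in range(jobs)]
    # Sort by descending duration; ties broken by name for deterministic output.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (total + secs, i))
    return buckets
```

The greedy choice is not optimal in general, but it is cheap and usually keeps the slowest job close to the average, which is what determines the wall-clock time of a CI run.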
<heller2> not working. It is per organization
<Yorlik> crap
<Yorlik> thats puny
<heller2> also, brings you back to increasing the complexity when integrating things
<Yorlik> indeed - i wasn't really serious about that
<Yorlik> You need a strong industrial sponsor.
<Yorlik> Did you review all the available open source CI tools? A naive google search brings up a bunch
<heller2> yes
<Yorlik> So - you're screwed?
<heller2> so here is the thing: both LSU and CSCS have massive resources at their fingertips
<heller2> all the available tools don't integrate well with those
<heller2> they want you to run on a cloud provider
<heller2> submitting a job to a cluster managed by slurm? no way!
<Yorlik> Maybe it's time for a true open source CI tool. Maybe one powered by HPX tasks. :)
<Yorlik> But I agree: Hitting this type of wall seriously sucks.
<heller2> it seems that it is just me though
<Yorlik> Well - who's the HPX build / CI / CD engineer?
hkaiser has joined #ste||ar
<heller2> jbjnr did pycicle
<heller2> wash[m] had initially developed buildbot
<heller2> I continued it
<heller2> I introduced circleci
primef has quit [Ping timeout: 260 seconds]
<Yorlik> Maybe you need a group process / session to decide a direction to go here and consequently develop that. With the size of the project I am not surprised this becomes an issue - it had to become one sooner or later. After all HPX is not a trivial piece of code.
<heller2> there's only one thing that keeps me sane...
<Yorlik> Watching the Simpsons on a regular basis?
<heller2> no, being sure that throwing more manpower and money at such a problem only creates worse situations
<Yorlik> Yep. You need one single or very few crazy geniuses to solve this, because the usual diffusion of responsibility in groups keeps this from working.
<Yorlik> It requires extremely intimate knowledge of the software on a detail and a conceptual level and the ability to make sense of the mess and give it a better structure and glue it all together. That's why I said it's a mammoth task.
primef has joined #ste||ar
hkaiser has quit [Ping timeout: 260 seconds]