Figure out how to handle code in multiple repos #24343
Comments
I'd love to see an analysis of what is the best we can do without creating additional repos (of course re-organizing the directory structure would not count as creating additional repos). In other words, for each of the points you mentioned ("ACLs, notification management, issue triage, PR reviews, sequentialized submit testing, merge conflicts, etc."), what is the best we can do within the current framework, and what will these things look like if we have separate repos. So we can see how much things will be better with current repos (and understand what downsides there might be, other than the implementation work to get there). Full disclosure: my personal feeling is that creating more repos this year will create an amount of churn that will be counter-productive to the project's velocity, and while it may make some things better, it will make other things worse, and we don't have a good handle on either how much better things will become or how much worse things will become. |
Another downside is people will start saying "that's not my job", cross-repo. I think it'll make dealing with e2e flakes that much harder. |
The boundaries between repos are APIs and releases. It's already hard for people from different groups to figure out why other e2e tests failed?
I completely agree. Nevertheless, it's good timing to draw attention to the fact that the current development workflow is tangling people with different interests (scheduling, client side, testing, node) into one crowded path. I think what was suggested here is not forcing people to choose but opening new paths for people w.r.t. separation of concerns. The goal is agility. Starting to evaluate different approaches and understand potential problems is much better.
Some repos will be compiled into Kube, and so the boundaries are also Admission controllers may deserve separate repos, as well as authorizer On Fri, Apr 15, 2016 at 3:05 PM, Hongchao Deng notifications@github.com
That really depends on how coupled the project is. This is more a statement about the current state of our repo, not about software development philosophy. ~50% of the "Services" test flakes I debugged for the past release were not related to Services. If I'd just dropped the mic on those, things would probably be worse off. I think the distinction is that a lot of contributors don't debug test flakes (and this is being unfair to those contributors who do), but there's a very low chance I'm following anything into, e.g., kubelet code if it's a Godep. Cadvisor is a good example of this.
I think we're not really equipped to run multiple repos right now. I imagine that leads to a world of N submit queues, N times the tests, N times the vendoring, basically N times the problems. It will badly hurt velocity because what you can do in a single PR now, you'd instead have to do with a well-ordered series of PRs & dependency bumps. Building import walls in the repository is the thing we can do ~now and it will make an eventual split easier/possible (right now I think we'll have vendor loops if we split). The import-boss utility makes this possible. I think we desperately need OWNERs, & to scale the number of reviewers. I am not in favor of splitting our repository further until it looks like we're treating contrib/ with the same seriousness we treat this one. That means same tool stack, same testing standards, same set of code verification tests, same SLO on reviews, sane vendoring strategy. We shouldn't do more splits until our current split is looking like a success. I do think it's good to come up with a strategy for splitting, but I don't have bandwidth to participate heavily at the moment. However, before we do any splits, I think we need to split out the tools that operate on our repo. That means many/most of the verify-*.sh scripts, for example. We need consistent tooling to run over all of our repositories. Without that, instead of one tangled mess, we will end up with N tangled messes. |
Agree with OWNER and import walls now. On Fri, Apr 15, 2016 at 3:20 PM, Daniel Smith notifications@github.com
I like @lavalamp's suggestion, but it still doesn't help with managing notifications though... |
May I ask what an import wall is?
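To illustrate the idea only: an import wall is a rule saying which packages a given part of the tree is allowed to import, checked mechanically so that a later split does not run into dependency loops. The sketch below is not how import-boss works internally; the rule format, the function names, and the example package paths are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: for each walled package prefix, the import
# prefixes that packages under it are allowed to use.
RULES: Dict[str, List[str]] = {
    "k8s.io/kubernetes/pkg/client": [
        "k8s.io/kubernetes/pkg/api",
        "k8s.io/kubernetes/pkg/client",
    ],
}


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies below it, matching whole segments."""
    return path == prefix or path.startswith(prefix + "/")


def is_stdlib(path: str) -> bool:
    """Heuristic (an assumption): stdlib import paths have no dot in the first segment."""
    return "." not in path.split("/", 1)[0]


def violations(
    imports_by_pkg: Dict[str, List[str]],
    rules: Dict[str, List[str]] = RULES,
) -> List[Tuple[str, str]]:
    """Return (package, forbidden import) pairs that cross a wall."""
    found: List[Tuple[str, str]] = []
    for pkg, imports in sorted(imports_by_pkg.items()):
        for walled, allowed in rules.items():
            if not is_under(pkg, walled):
                continue  # this rule does not apply to pkg
            for imp in imports:
                if is_stdlib(imp):
                    continue  # the standard library is always allowed here
                if not any(is_under(imp, a) for a in allowed):
                    found.append((pkg, imp))
    return found
```

With rules like these, a client package that reached into kubelet code would be reported, while the kubelet importing the client would not, since the wall only restricts what is inside it.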
I'd like to see more justification of why we should split -- that is, identify problems caused by the single repo, determine how significant those problems actually are and what their impact is, then discuss specifically how splitting into multiple repos will solve those problems, along with what the downsides of splitting are, and whether there are alternative approaches to solving those problems that come with less cost or other benefits. Once we have that, we can decide whether cost/benefit works out vs. opportunity cost elsewhere. |
@vishh : you and @bgrant0607 both referred to a notifications problem. Can you explain this problem in more detail? In particular, what notifications do you receive today that you do not want, and what notifications do you not receive today that you do want? |
@davidopp I was referring to the notifications generated by github. If we were to have separate repos, I can choose to not watch or de-prioritize emails from certain parts of the system. As of now, it is difficult to identify the PRs and issues that I need to look at using notification emails. |
@vishh I feel there are solutions to that problem without going to separate repos. Step 1: Reorg the directory structure a bit, to have cleaner separation between areas This ensures, to first approximation, that you are subscribed to every issue and PR that you might be interested in. Then you can just ignore everything you are not subscribed to. |
To be clear, there is no specific timeframe for this, but I view it as inevitable and ultimately healthy. I wanted a place to centralize the discussion. More and more code is going into additional repos already, and I expect that trend to continue.
+1 to this being healthy. On Sat, Apr 16, 2016 at 2:36 AM, Brian Grant notifications@github.com
I agree it's healthy long term, but we do need to think hard about the chunks. Having all of the binaries be built and versioned together is pretty important, I think. In fact, I'd love for us only to build one binary - hyperkube. Breaking out client utils and libs is pretty obvious. Gabe from Deis said it hurt a lot but was worth doing... |
Request to break out service discovery / DNS: https://twitter.com/jbeda/status/711585271221866496 |
+1 on this as a long term goal. But I don't think we are ready to do this, especially for the top 4 areas listed above: Kubelet, Generic API infrastructure, Client libraries, and Misc. utilities. Taking Kubelet as an example: in the long term, I really want to run Kubelet as a standalone project, so that it can be packaged as a product to manage a single KNode without an API server. But to really achieve that, we need to solve the Generic API infrastructure issue first, the Kubelet checkpoint issue, etc. Not to mention that we first need to answer the questions related to version management, compatibility, testing, etc. Also, if one looks through all the features we have introduced to the core system so far, most of them still require changes to multiple components. I think splitting them out at this moment would have a negative impact on our velocity.
/cc @kubernetes/sig-testing |
We can't put anything new into the main repo. At minimum, new things need to go in other repos. This is why minikube and the node-problem-detector were put into new repos, for instance. Github is designed for small repos, small teams. ACLs (e.g., for label/PR management) are coarse-grain, on a per-repo basis. CI is on a per-repo basis. The notification flood from our giant repo is unmanageable. We have >3000 open issues and >1000 that we haven't really even looked at. PRs can't find reviewers and vice versa. We now have over 500 open PRs. PR merge latency is growing monotonically. The repo is infeasible to build-cop. We can barely keep our tests running. We hit a ceiling on commit rate over a year ago. https://github.com/kubernetes/kubernetes/graphs/contributors I haven't found any single repo on github with a sustained commit rate higher than 250 commits per week. The only way Docker can achieve 300 PRs / week is through multiple repos. We also need to break up contrib: That repo has no automation and isn't build-copped, and most people on the project don't pay attention to it. Even things that should be maintained are hard to maintain because so many of the notifications are about irrelevant things in contrib, so it gets ignored. Implementing OWNERS will enable us to give label/PR/wiki power to more people, and we need to do that, for automatic PR assignment/approval if nothing else, but it won't solve the notification, build-copping, and CI problems. What's urgent to extract? Ecosystem developers need stable client libraries, with minimal dependencies on the rest of our codebase, which can be imported independently. Cloudproviders / cluster provisioning / getting-started guides: Many of these aren't maintained, and we can't really review or test most of them. We have at least 3 new ones waiting to be merged right now. We're just slowing them down. Both of these require technical changes. 
I don't know if there is any lower-hanging fruit that also has significant value. We're going to need to make our PR automation and test infrastructure easy to replicate for new repos. We have to do that, anyway, since we have several active repos already: contrib, kubernetes.github.io, dashboard, heapster, minikube, helm, ... The documentation repo, which was created due to the way github hosts project sites, desperately needs automation. cc @philips |
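As a rough sketch of the automatic PR assignment mentioned above: one way it could work is to pick, for each changed file, the owners listed for the nearest enclosing directory. OWNERS is only a proposal at this point in the thread, so the directory-to-owners mapping, the reviewer names, and both function names below are hypothetical.

```python
import posixpath
from typing import Dict, List, Set

# Hypothetical mapping from a directory (relative to the repo root) to the
# people listed as owners there. "" stands for the repo root.
OWNERS: Dict[str, List[str]] = {
    "": ["root-reviewer"],
    "pkg/kubelet": ["node-reviewer-a", "node-reviewer-b"],
    "pkg/client": ["client-reviewer"],
}


def nearest_owners(path: str, owners: Dict[str, List[str]] = OWNERS) -> List[str]:
    """Walk up from the file's directory until a directory with owners is found."""
    directory = posixpath.dirname(path)
    while True:
        if directory in owners:
            return list(owners[directory])
        parent = posixpath.dirname(directory)
        if parent == directory:
            return []  # reached the top without finding any owners
        directory = parent


def suggest_reviewers(
    changed_files: List[str], owners: Dict[str, List[str]] = OWNERS
) -> List[str]:
    """Union of the nearest owners over all files touched by a PR, sorted."""
    reviewers: Set[str] = set()
    for changed in changed_files:
        reviewers.update(nearest_owners(changed, owners))
    return sorted(reviewers)
```

A scheme like this addresses assignment and approval only; as noted above, it does nothing for the notification, build-copping, and CI problems.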
How about picking one component and breaking it into a separate repo to exercise the change needed and define a process. We won't be breaking all at once. In some other projects, this would start with a [PROPOSAL] and then a [VOTE] on the mailing list. Start with |
@Runseb We decided we wanted to move out examples a long time ago. The challenge is finding someone to do the required work, and someone to review the necessary changes. |
@karlkfi Made some great points here: #16508 (comment) |
I agree that github is not good at big repos. But it's not clear that splitting into multiple repos is going to be any better. We had one separate repo (contrib) and it was worse on all the axes we are critiquing the main repo for: PRs and issues remained unattended, and they were less discoverable for being split across repos. We should prove with one extra repo that it is an improvement, before we go through all the churn involved in splitting into N repos. We should also give due weight to the costs of having multiple repos. We are aware of the problems of having one repo, but multiple repos will bring their own problems. Will issues actually be more discoverable? Do we really expect issues to be opened in the correct repo, or in practice will we just track issues on the main repo? How will we coordinate releases, or PRs that touch more than one component? The obvious precedent (OpenStack) went through a similar fragmentation into a large number of repos & projects, and the outcome was not what we are hoping for. Finally, I am not sure that the conclusion I would draw from "github is bad at big repos" is "not big repos". "Not github" feels like an equally valid position: many big projects seem to have gone that route, with a code mirror on github. I personally am much more excited about working on tooling that leap-frogs GitHub's functionality than I am about teasing apart a repo into GitHub-approved chunk sizes. I propose that we:
I propose that the client APIs would be a good candidate as a project to split off, because we definitely have a problem right now with the binary size of any program that uses the go client (I think being in one big repo means the client pulls in a lot more code than it strictly needs.) I think the protobuf work also makes this practical now, including having other language bindings to the k8s API. In some senses clients are a bad choice because the separation there should be more obvious than with some of our other components, but it also will give us a taste of the complexity because of the circular dependency (because our servers are also API clients) |
Is anyone working on a generic release process? If not I am raising my hand to help with that and get a release process for kops. One hurdle is that we need a place to put bits and containers. |
Just something I noticed this morning about our release process: we have |
Related: Development in other branches/forks: |
Some updates:
Is there a post mortem or a list of lessons learned? If you had to do it again, would you do it, or what would you do differently? And at what number of contributors or level of velocity would you recommend a split?
Update on Zuul, which I presented to the testing SIG a few months back as a potential solution for Jenkins scaling and multi-repo testing. OpenStack has fully migrated to Zuulv3 now. It can be considered "battle hardened" at the moment. If a solution for cross-repo testing has not been created yet, perhaps someone should make an attempt to set up an experimental Zuulv3. The main challenge to success in that now would be that Zuul still only knows how to get computing resources from
Just an FYI for everyone: the kops team and sig-release are working on an MVP of a build process for sub-projects. Currently kops is not released with kubernetes/kubernetes, and we are working on fleshing out a release process.
@kubernetes/sig-testing-feature-requests |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Automatic merge from submit-queue

Redirect all files in /examples folder to kubernetes/examples repo

**What this PR does / why we need it**: Examples are being moved to their own repository: https://github.com/kubernetes/examples
We need to remove them from the main repo, but first we need to keep a redirect. This is a *big* organizational change, but nothing technical (aside from e2e tests)

**Which issue this PR fixes** fixes part of kubernetes#24343

**Special notes for your reviewer**: WIP, I still need to figure out what to do with the BUILD script and tests, plus take care of the e2e tests that use some of these examples.

**release notes**
```release-note
Redirect all examples README to the kubernetes/examples repo
```
/lifecycle frozen |
Hey, just to let you know - I'm working on a multi-repo code review tool.
Here's a 2-min video of the tool: The grand idea is that having multi-repo code review, multi-repo cli and multi-repo CI, we can do cross-repo changes, repos can depend on head of other repos, so also cross-repo tests at head. These are the main advantages of a monorepo, while also staying multi-repo. Happy to answer any questions. |
/area code-organization This specific issue is no longer useful, so closing. |
A large monorepo works for Google, but not on github.
We hit the ceiling of achievable velocity of a single github repo in early 2015:
https://github.com/kubernetes/kubernetes/graphs/contributors
There are many reasons: ACLs, notification management, issue triage, PR reviews, sequentialized submit testing, merge conflicts, etc.
We're chipping away at these issues, but we need more than incremental improvement.
We've discussed moving a number of things to other repos:
pkg/util into a separate repo #24156
We need to seriously think about how to do this.
Known issues that need to be addressed:
An example of a Go project on github with good repo hygiene:
https://github.com/deis
I have no illusions that breaking the project into separate repos will be a silver bullet: it's necessary, but not sufficient. I also know that it will cause some pain. But that pain already exists: cadvisor, heapster, dashboard, contrib, docs, ....
Speaking of contrib, it needs to be broken up, too: kubernetes-retired/contrib#762
@thockin @smarterclayton @lavalamp @mikedanese @dchen1107 @davidopp @ixdy