Don't copy whole response during response marshalling #129304
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label.
/sig api-machinery
Great observation Marek! I think there are two main unknowns with the proposal:
Having better visibility for both will be helpful to understand the tradeoff we are making.
Tested pods, spreading data in 1KB chunks across containers, initContainers, volumes, and condition fields to create a large structure. The test results showed smaller, but still substantial, benefits. Memory usage went down from 27GB to 9GB. With pods the base memory usage is around 7GB, so it means we reduced allocations from 20GB to 2GB. I can accept that :P
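The test setup above could be sketched roughly like this. The type and field names are illustrative only (not the real Pod schema); the point is spreading a fixed payload evenly across several fields so no single field dominates:

```go
package main

import (
	"fmt"
	"strings"
)

// fakePod mimics spreading payload across several fields, as described
// above; the field names are illustrative, not the real Pod schema.
type fakePod struct {
	Containers     []string
	InitContainers []string
	Volumes        []string
	Conditions     []string
}

// newFakePod distributes `chunks` chunks of 1KB each round-robin
// across the four slices.
func newFakePod(chunks int) fakePod {
	chunk := strings.Repeat("x", 1024) // 1KB payload per chunk
	p := fakePod{}
	targets := []*[]string{&p.Containers, &p.InitContainers, &p.Volumes, &p.Conditions}
	for i := 0; i < chunks; i++ {
		t := targets[i%len(targets)]
		*t = append(*t, chunk)
	}
	return p
}

func main() {
	p := newFakePod(8)
	fmt.Println(len(p.Containers), len(p.Volumes)) // 2 2
}
```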
I think we can skip them for now. Our main focus should be on default client behavior, which is plain JSON. One sad thing is that we don't really have any JSON performance data. Scalability tests run everything in proto, based on the fact that K8s would not hit 5k nodes if we used JSON at all. Still, I would like to have the proto version implemented just to see the impact in scalability tests.
I imagine we'd like to triage this as accepted. What are the downsides? |
What would you like to be added?
When experimenting with measuring memory usage for large LIST requests I noticed one thing that surprised me. It's expected that the apiserver requires a lot of memory when listing from etcd: it needs to fetch the data and decode the etcd response. But what about listing from the cache?
I was surprised that listing from cache still required gigabytes of memory (10 concurrent LISTs of 1.5GB of data increased memory usage by 22GB). Why? The apiserver already has all the data it needs; we copy the structure of the data (e.g. when converting between types), but the allocations for that should be minuscule. Most of the data is stored in strings, which are immutable. This led me to revisit an old discussion I had with @mborsz and his experimental work on streaming lists from etcd (master...mborsz:kubernetes:streaming), where he proposed implementing a custom streaming encoder. I looked at the current implementation of encoding:
kubernetes/staging/src/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go
Lines 245 to 246 in 29101e9
The built-in json library encoder still marshals the whole value and writes it at once. While this is OK for single objects that weigh up to 2MB, it's bad for LIST responses, which can be up to 2GB.
I did a PoC that confirmed my suspicions: master...serathius:kubernetes:streaming-encoder-list. A simple hack over the encoder reduced the memory needed from 26GB to 4GB.
Proposal:
Options:
Other thoughts:
Why is this needed?
For LISTs served from the watch cache it prevents allocating memory proportional to the size of the response. This makes memory usage more proportional to CPU usage, improving the cost estimations of APF, which cares only about memory.
For LISTs served from etcd, there is an ongoing proposal to serve them from the cache as well.