Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not apply rate limits for sds server #41213

Merged
merged 6 commits into from
Oct 11, 2022

Conversation

ramaraochavali
Copy link
Contributor

@ramaraochavali ramaraochavali commented Sep 29, 2022

When DiscoveryServer is instantiated as a SDS Service in agent, it applies default rate limit of 1 which prints log lines like this in agent

2022-09-29T03:33:23.234076Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234092Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234052Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234060Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234010Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234077Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234125Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234132Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234033Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234041Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234067Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234010Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234096Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234179Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234183Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline
2022-09-29T03:33:23.234042Z	warn	ads	ADS: "@" exceeded rate limit: rate: Wait(n=1) would exceed context deadline

Also the log lines like

2022-09-29T03:33:23.515201Z	info	ads	ADS: new connection for node:istio-ingressgateway-processing-us-west-2c-6c74f945bc-6cpvz.sfproxy-13

for every secret request is confusing.

This PR fixes both

  • Ambient
  • Configuration Infrastructure
  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Policies and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali ramaraochavali requested review from a team as code owners September 29, 2022 09:42
@istio-policy-bot
Copy link

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

@istio-policy-bot istio-policy-bot added the release-notes-none Indicates a PR that does not require release notes. label Sep 29, 2022
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 29, 2022
@ramaraochavali
Copy link
Contributor Author

envoyproxy/envoy#21600 - This might improve things but still rate limiting is not needed for sds

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@@ -154,6 +154,9 @@ type DiscoveryServer struct {
// ClusterAliases are aliase names for cluster. When a proxy connects with a cluster ID
// and if it has a different alias we should use that a cluster ID for proxy.
ClusterAliases map[cluster.ID]cluster.ID

// Indicates this is running as an Sds Server.
SdsServer bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i donot think this is a good mark. I am thinking not add any filed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you think we should add any fields here?

Do you have any alternatives to identify otherwise? we can drive this by req.typeURL if you think that is better - I can change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry we can not do by typeURL at the place where we apply rate limit (new stream init)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, can we just make the rate limit configurable and make it MaxInt for SDS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configurable you mean in DiscoveryServer? I can make that change but some log lines are also confusing that for each SDS stream - connection id keep incrementing. Should we leave them as is?

new connection for node:istio-ingressgateway-processing-us-west-2c-6c74f945bc-6cpvz.sfproxy**-13**

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

@howardjohn made changes. PTAL

Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rate limit part lgtm, not sure about the logs

@@ -147,11 +147,19 @@ func (s *DiscoveryServer) receive(con *Connection, identities []string) {
req, err := con.stream.Recv()
if err != nil {
if istiogrpc.IsExpectedGRPCError(err) {
log.Infof("ADS: %q %s terminated", con.peerAddr, con.conID)
if req != nil && req.TypeUrl == v3.SecretType {
log.Infof("ADS: connection terminated for secrets:%v", strings.Join(req.ResourceNames, ","))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secret is used even for Istiod, and may have 10000s of secrets, not sure we should log this

@@ -58,6 +60,8 @@ var _ model.XdsResourceGenerator = &sdsservice{}

func NewXdsServer(stop chan struct{}, gen model.XdsResourceGenerator) *xds.DiscoveryServer {
s := xds.NewXDS(stop)
// No ratelimit for SDS calls in agent.
s.DiscoveryServer.RequestRateLimit = rate.NewLimiter(rate.Limit(math.MaxInt), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rate.Inf exists, but its basically the same

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@istio-testing istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 30, 2022
@ramaraochavali
Copy link
Contributor Author

@howardjohn fixed logs

@ramaraochavali
Copy link
Contributor Author

@howardjohn can you PTAL when you get chance?

@@ -58,6 +59,8 @@ var _ model.XdsResourceGenerator = &sdsservice{}

func NewXdsServer(stop chan struct{}, gen model.XdsResourceGenerator) *xds.DiscoveryServer {
s := xds.NewXDS(stop)
// No ratelimit for SDS calls in agent.
s.DiscoveryServer.RequestRateLimit = rate.NewLimiter(rate.Inf, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about not setting limit at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good. I see we already handle zero limit

if s.requestRateLimit.Limit() == 0 {

@@ -147,11 +147,15 @@ func (s *DiscoveryServer) receive(con *Connection, identities []string) {
req, err := con.stream.Recv()
if err != nil {
if istiogrpc.IsExpectedGRPCError(err) {
log.Infof("ADS: %q %s terminated", con.peerAddr, con.conID)
if req != nil && req.TypeUrl != v3.SecretType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check not secret type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because for secret types we do not want to log this messages as it can be 1000s as it logs for every secret.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, without ratelimit, will error happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually important for new connection with out this check

2022-09-29T03:33:23.515201Z info ads ADS: new connection for node:istio-ingressgateway-processing-us-west-2c-6c74f945bc-6cpvz.sfproxy-13

lots of these mis leading logs with come. So I protected at both places

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen only at L180, how about changing to debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that log at info level is useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are hiding a useful error message. This seems to be because we are mixing "SecretType" to mean "istio-agent SDS server", when its not the case...

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel comfortable hiding all the secret errors. IMO if we want to modify logging (which is a big if, it seems over complex), it should be done as DiscoveryServer params, not based on Type

@@ -147,11 +147,15 @@ func (s *DiscoveryServer) receive(con *Connection, identities []string) {
req, err := con.stream.Recv()
if err != nil {
if istiogrpc.IsExpectedGRPCError(err) {
log.Infof("ADS: %q %s terminated", con.peerAddr, con.conID)
if req != nil && req.TypeUrl != v3.SecretType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are hiding a useful error message. This seems to be because we are mixing "SecretType" to mean "istio-agent SDS server", when its not the case...

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

Since there is no consensus on logging changes, I have reverted them for now. I certainly think

2022-09-29T03:33:23.515201Z info ads ADS: new connection for node:istio-ingressgateway-processing-us-west-2c-6c74f945bc-6cpvz.sfproxy-13

is pretty confusing log line in agent but I will think about it a bit more to get proper solution.

@istio-testing istio-testing merged commit ac7f820 into istio:master Oct 11, 2022
@ramaraochavali ramaraochavali deleted the fix/sds_service branch October 11, 2022 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes-none Indicates a PR that does not require release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants