Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky Test (8/100k): RingHash_ReattemptWhenGoingFromTransientFailureToIdle #7464

Closed
easwars opened this issue Jul 31, 2024 · 11 comments · Fixed by #7784
Closed

Flaky Test (8/100k): RingHash_ReattemptWhenGoingFromTransientFailureToIdle #7464

easwars opened this issue Jul 31, 2024 · 11 comments · Fixed by #7784
Labels
Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Bug

Comments

@easwars
Copy link
Contributor

easwars commented Jul 31, 2024

https://github.com/grpc/grpc-go/actions/runs/10063024287/job/27816861885?pr=7430

--- FAIL: Test (12.91s)
    --- FAIL: Test/RingHash_ReattemptWhenGoingFromTransientFailureToIdle (0.08s)
        ringhash_balancer_test.go:1522: Created new snapshot cache...
        tlogger.go:116: INFO server.go:685 [core] [Server #547]Server created  (t=+1.309591ms)
        ringhash_balancer_test.go:1522: Registered Aggregated Discovery Service (ADS)...
        ringhash_balancer_test.go:1522: xDS management server serving at: 127.0.0.1:41641...
        server.go:229: Created new resource snapshot...
        server.go:235: Updated snapshot cache with resource snapshot...
        tlogger.go:116: INFO clientconn.go:1683 [core] original dial target is: "xds:///test.server"  (t=+2.954029ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #548]Channel created  (t=+3.010395ms)
        tlogger.go:116: INFO clientconn.go:192 [core] [Channel #548]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"xds", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/test.server", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+3.092208ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #548]Channel authority set to "test.server"  (t=+3.143404ms)
        tlogger.go:116: INFO server.go:881 [core] [Server #547 ListenSocket #549]ListenSocket created  (t=+3.301299ms)
        tlogger.go:116: INFO xds_resolver.go:86 [xds] [xds-resolver 0xc00103c240] Creating resolver for target: xds:///test.server  (t=+3.398251ms)
        tlogger.go:116: INFO client_refcounted.go:81 [xds] [xds-client 0xc000c363c0] Created client with name "xds:///test.server" to primary xDS management server: "passthrough:///127.0.0.1:41641-insecure"  (t=+4.02296ms)
        tlogger.go:116: INFO client_refcounted.go:86 [xds] xDS node ID: 4d0a6692-9942-4553-9a52-876e89657de8  (t=+4.09212ms)
        tlogger.go:116: INFO clientconn.go:1683 [core] original dial target is: "passthrough:///127.0.0.1:41641"  (t=+4.171568ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #550]Channel created  (t=+4.224939ms)
        tlogger.go:116: INFO clientconn.go:192 [core] [Channel #550]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/127.0.0.1:41641", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+4.296362ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #550]Channel authority set to "127.0.0.1:41641"  (t=+4.352016ms)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #550]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:41641",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:41641",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+4.587727ms)
        tlogger.go:116: INFO balancer_wrapper.go:106 [core] [Channel #550]Channel switches to new LB policy "pick_first"  (t=+4.709425ms)
        tlogger.go:116: INFO clientconn.go:847 [core] [Channel #550 SubChannel #551]Subchannel created  (t=+4.791308ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #550]Channel Connectivity change to CONNECTING  (t=+4.858024ms)
        tlogger.go:116: INFO clientconn.go:1[207](https://github.com/grpc/grpc-go/actions/runs/10063024287/job/27816861885?pr=7430#step:8:208) [core] [Channel #550 SubChannel #551]Subchannel Connectivity change to CONNECTING  (t=+4.957991ms)
        tlogger.go:116: INFO clientconn.go:1325 [core] [Channel #550 SubChannel #551]Subchannel picks a new address "127.0.0.1:41641" to connect  (t=+5.014947ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #550]Channel exiting idle mode  (t=+5.469658ms)
        tlogger.go:116: INFO transport.go:237 [xds] [xds-client 0xc000c363c0] [passthrough:///127.0.0.1:41641] Created transport to server "passthrough:///127.0.0.1:41641"  (t=+5.550189ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #548]Channel exiting idle mode  (t=+5.711882ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #550 SubChannel #551]Subchannel Connectivity change to READY  (t=+6.338214ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #550]Channel Connectivity change to READY  (t=+6.413635ms)
        tlogger.go:116: INFO transport.go:338 [xds] [xds-client 0xc000c363c0] [passthrough:///127.0.0.1:41641] ADS stream created  (t=+6.491641ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.listener.v3.Listener[test.server] and known map[]. Diff [test.server]
        logging.go:30: respond type.googleapis.com/envoy.config.listener.v3.Listener[test.server] version "" with version "1"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.listener.v3.Listener[test.server] and known map[test.server:{}]. Diff []
        logging.go:30: open watch 1 for type.googleapis.com/envoy.config.listener.v3.Listener[test.server] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "1"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] and known map[]. Diff [new_route]
        logging.go:30: respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] version "" with version "1"
        tlogger.go:116: INFO xds_resolver.go:270 [xds] [xds-resolver 0xc00103c240] For Listener resource "test.server" and RouteConfiguration resource "new_route", generated service config: {
              "loadBalancingConfig": [
                {
                  "xds_cluster_manager_experimental": {
                    "children": {
                      "cluster:cluster": {
                        "childPolicy": [
                          {
                            "cds_experimental": {
                              "cluster": "cluster"
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              ]
            }  (t=+8.84713ms)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #548]Resolver state updated: {
              "Addresses": null,
              "Endpoints": [],
              "ServiceConfig": {
                "Config": {
                  "Config": null,
                  "Methods": {}
                },
                "Err": null
              },
              "Attributes": {
                "\u003c%!p(resolver.csKeyType=grpc.internal.resolver.configSelector)\u003e": "\u003c0xc000b36aa0\u003e",
                "\u003c%!p(xdsclient.clientKeyType=grpc.xds.internal.client.Client)\u003e": "\u003c0xc000c42910\u003e"
              }
            } (service config updated)  (t=+9.241649ms)
        tlogger.go:116: INFO balancer_wrapper.go:106 [core] [Channel #548]Channel switches to new LB policy "xds_cluster_manager_experimental"  (t=+9.346796ms)
        tlogger.go:116: INFO clustermanager.go:59 [xds] [xds-cluster-manager-lb 0xc00016ab60] Created  (t=+9.404003ms)
        tlogger.go:116: INFO clustermanager.go:174 [xds] [xds-cluster-manager-lb 0xc00016ab60] Update with config {
              "LoadBalancingConfig": null,
              "Children": {
                "cluster:cluster": {
                  "ChildPolicy": [
                    {
                      "cds_experimental": {
                        "LoadBalancingConfig": null,
                        "Cluster": "cluster"
                      }
                    }
                  ]
                }
              }
            }, resolver state {Addresses:[] Endpoints:[] ServiceConfig:0xc00016a840 Attributes:{"<%!p(resolver.csKeyType=grpc.internal.resolver.configSelector)>": "<0xc000b36aa0>" , "<%!p(xdsclient.clientKeyType=grpc.xds.internal.client.Client)>": "<0xc000c42910>" }}  (t=+9.561247ms)
        tlogger.go:116: INFO balancergroup.go:305 [xds] [xds-cluster-manager-lb 0xc00016ab60] Adding child policy of type "cds_experimental" for locality "cluster:cluster"  (t=+9.627281ms)
        tlogger.go:116: INFO balancergroup.go:102 [xds] [xds-cluster-manager-lb 0xc00016ab60] Creating child policy of type "cds_experimental" for locality "cluster:cluster"  (t=+9.668347ms)
        tlogger.go:116: INFO cdsbalancer.go:109 [xds] [cds-lb 0xc00034f500] Created  (t=+9.7[208](https://github.com/grpc/grpc-go/actions/runs/10063024287/job/27816861885?pr=7430#step:8:209)16ms)
        tlogger.go:116: INFO cdsbalancer.go:121 [xds] [cds-lb 0xc00034f500] xDS credentials in use: false  (t=+9.805164ms)
        tlogger.go:116: INFO cdsbalancer.go:290 [xds] [cds-lb 0xc00034f500] Received balancer config update: {
              "LoadBalancingConfig": null,
              "Cluster": "cluster"
            }  (t=+9.869424ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc000ca6bb0,state:CONNECTING,stateToAggregate:CONNECTING]  (t=+10.04887ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to CONNECTING  (t=+10.110194ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] and known map[new_route:{}]. Diff []
        logging.go:30: open watch 2 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "1"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] and known map[]. Diff [cluster]
        logging.go:30: respond type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] version "" with version "1"
        tlogger.go:116: INFO cdsbalancer.go:414 [xds] [cds-lb 0xc00034f500] Received Cluster resource: {
              "ClusterType": 0,
              "ClusterName": "cluster",
              "EDSServiceName": "cluster",
              "LRSServerConfig": null,
              "SecurityCfg": null,
              "MaxRequests": null,
              "DNSHostName": "",
              "PrioritizedClusterNames": null,
              "LBPolicy": [
                {
                  "ring_hash_experimental": {
                    "minRingSize": 1024,
                    "maxRingSize": 8388608
                  }
                }
              ],
              "OutlierDetection": null,
              "Raw": {
                "type_url": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
                "value": "CgdjbHVzdGVyGg0KAhoAEgdjbHVzdGVyMAIQAw=="
              },
              "TelemetryLabels": {
                "csm.service_name": "unknown",
                "csm.service_namespace_name": "unknown"
              }
            }  (t=+12.024167ms)
        tlogger.go:116: INFO clusterresolver.go:85 [xds] [xds-cluster-resolver-lb 0xc0008ce4e0] Created  (t=+12.117442ms)
        tlogger.go:116: INFO cdsbalancer.go:455 [xds] [cds-lb 0xc00034f500] Created child policy 0xc0008ce4e0 of type cluster_resolver_experimental  (t=+12.174428ms)
        tlogger.go:116: INFO clusterresolver.go:187 [xds] [xds-cluster-resolver-lb 0xc0008ce4e0] Received new balancer config: {
              "discoveryMechanisms": [
                {
                  "cluster": "cluster",
                  "edsServiceName": "cluster",
                  "outlierDetection": {},
                  "telemetryLabels": {
                    "csm.service_name": "unknown",
                    "csm.service_namespace_name": "unknown"
                  }
                }
              ],
              "xdsLbPolicy": [
                {
                  "ring_hash_experimental": {
                    "minRingSize": 1024,
                    "maxRingSize": 8388608
                  }
                }
              ]
            }  (t=+12.57066ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] and known map[cluster:{}]. Diff []
        logging.go:30: open watch 3 for type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "1"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] and known map[]. Diff [cluster]
        logging.go:30: respond type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] version "" with version "1"
        tlogger.go:116: WARNING unmarshal_eds.go:125 [xds] [xds-resource] Ignoring locality {
              "region": "region-1",
              "zone": "zone-1",
              "subZone": "subzone-1"
            } with weight 0  (t=+13.409911ms)
        tlogger.go:116: INFO balancer.go:76 [xds] [priority-lb 0xc0000e81c0] Created  (t=+13.556706ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:CONNECTING Picker:0xc0001117e0}  (t=+14.46194ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:CONNECTING Picker:0xc0001117e0}  (t=+14.521051ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc0001117e0,state:CONNECTING,stateToAggregate:CONNECTING]  (t=+14.574571ms)
        tlogger.go:116: INFO balancergroup.go:305 [xds] [priority-lb 0xc0000e81c0] Adding child policy of type "outlier_detection_experimental" for locality "priority-0-0"  (t=+14.626268ms)
        tlogger.go:116: INFO balancergroup.go:102 [xds] [priority-lb 0xc0000e81c0] Creating child policy of type "outlier_detection_experimental" for locality "priority-0-0"  (t=+14.675109ms)
        tlogger.go:116: INFO balancer.go:74 [xds] [outlier-detection-lb 0xc0007f91e0] Created  (t=+14.727637ms)
        tlogger.go:116: INFO clusterimpl.go:84 [xds] [xds-cluster-impl-lb 0xc000174008] Created  (t=+14.796336ms)
        tlogger.go:116: INFO ringhash.go:55 [xds] [ring-hash-lb 0xc001108280] Created  (t=+14.871827ms)
        tlogger.go:116: INFO ringhash.go:271 [xds] [ring-hash-lb 0xc001108280] Received update from resolver, balancer config: {
              "minRingSize": 1024,
              "maxRingSize": 4096
            }  (t=+14.956245ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [priority-lb 0xc0000e81c0] Balancer state update from locality priority-0-0, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc000db6498}  (t=+15.054279ms)
        tlogger.go:116: WARNING balancer_child.go:118 [xds] [priority-lb 0xc0000e81c0] Failed to update state for child policy "priority-0-0": bad resolver state  (t=+15.097871ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc000db6498}  (t=+15.149177ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:TRANSIENT_FAILURE Picker:0xc000db6498}  (t=+15.19389ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc000db6498,state:TRANSIENT_FAILURE,stateToAggregate:TRANSIENT_FAILURE]  (t=+15.251809ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to TRANSIENT_FAILURE  (t=+15.309176ms)
        tlogger.go:116: INFO server.go:685 [core] [Server #554]Server created  (t=+15.654513ms)
        stubserver.go:266: Started test service backend at "127.0.0.1:45139"
        ringhash_balancer_test.go:1549: Updating EDS with a new backend endpoint.
        server.go:229: Created new resource snapshot...
        logging.go:30: respond open watch 1 type.googleapis.com/envoy.config.listener.v3.Listener[test.server] with new version "2"
        logging.go:30: respond type.googleapis.com/envoy.config.listener.v3.Listener[test.server] version "1" with version "2"
        logging.go:30: respond open watch 2 type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] with new version "2"
        logging.go:30: respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] version "1" with version "2"
        logging.go:30: respond open watch 3 type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] with new version "2"
        logging.go:30: respond type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] version "1" with version "2"
        server.go:235: Updated snapshot cache with resource snapshot...
        tlogger.go:116: INFO server.go:881 [core] [Server #554 ListenSocket #555]ListenSocket created  (t=+16.230751ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] and known map[cluster:{}]. Diff []
        logging.go:30: respond type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] version "1" with version "2"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.listener.v3.Listener[test.server] and known map[test.server:{}]. Diff []
        logging.go:30: open watch 4 for type.googleapis.com/envoy.config.listener.v3.Listener[test.server] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] and known map[new_route:{}]. Diff []
        logging.go:30: open watch 5 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[new_route] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] and known map[cluster:{}]. Diff []
        logging.go:30: open watch 6 for type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        tlogger.go:116: INFO balancergroup.go:375 [xds] [priority-lb 0xc0000e81c0] Removing child policy for locality "priority-0-0"  (t=+18.951905ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:CONNECTING Picker:0xc00046b550}  (t=+19.059957ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:CONNECTING Picker:0xc00046b550}  (t=+19.11468ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc00046b550,state:CONNECTING,stateToAggregate:TRANSIENT_FAILURE]  (t=+19.167128ms)
        tlogger.go:116: INFO balancergroup.go:305 [xds] [priority-lb 0xc0000e81c0] Adding child policy of type "outlier_detection_experimental" for locality "priority-0-1"  (t=+19.22193ms)
        tlogger.go:116: INFO balancergroup.go:102 [xds] [priority-lb 0xc0000e81c0] Creating child policy of type "outlier_detection_experimental" for locality "priority-0-1"  (t=+19.263979ms)
        tlogger.go:116: INFO balancer.go:74 [xds] [outlier-detection-lb 0xc0008625b0] Created  (t=+19.311668ms)
        tlogger.go:116: INFO clusterimpl.go:84 [xds] [xds-cluster-impl-lb 0xc0001758c8] Created  (t=+19.376239ms)
        tlogger.go:116: INFO ringhash.go:55 [xds] [ring-hash-lb 0xc001108a80] Created  (t=+19.431783ms)
        tlogger.go:116: INFO ringhash.go:271 [xds] [ring-hash-lb 0xc001108a80] Received update from resolver, balancer config: {
              "minRingSize": 1024,
              "maxRingSize": 4096
            }  (t=+19.511843ms)
        tlogger.go:116: INFO clientconn.go:847 [core] [Channel #548 SubChannel #556]Subchannel created  (t=+19.613073ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] and known map[cluster:{}]. Diff []
        logging.go:30: open watch 7 for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[cluster] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        tlogger.go:116: INFO balancergroup.go:544 [xds] [priority-lb 0xc0000e81c0] Balancer state update from locality priority-0-1, new state: {ConnectivityState:IDLE Picker:0xc000d3da28}  (t=+22.067637ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:IDLE Picker:0xc000d3da28}  (t=+22.148719ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:IDLE Picker:0xc000d3da28}  (t=+22.216936ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc000d3da28,state:IDLE,stateToAggregate:IDLE]  (t=+22.277089ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to IDLE  (t=+22.353271ms)
        tlogger.go:116: INFO ringhash.go:166 [xds] [ring-hash-lb 0xc001108a80] [subConn 0xc0001153e0] Executing a queued connect for subConn in state: IDLE  (t=+22.449852ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #548 SubChannel #556]Subchannel Connectivity change to CONNECTING  (t=+22.584935ms)
        tlogger.go:116: INFO ringhash.go:357 [xds] [ring-hash-lb 0xc001108a80] SubConn's effective old state was: IDLE, new state is CONNECTING  (t=+22.69[209](https://github.com/grpc/grpc-go/actions/runs/10063024287/job/27816861885?pr=7430#step:8:210)6ms)
        tlogger.go:116: INFO ringhash.go:376 [xds] [ring-hash-lb 0xc001108a80] Pushing new state CONNECTING and picker 0xc000ca8978  (t=+23.372038ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [priority-lb 0xc0000e81c0] Balancer state update from locality priority-0-1, new state: {ConnectivityState:CONNECTING Picker:0xc000ca89c0}  (t=+23.500399ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:CONNECTING Picker:0xc000ca89c0}  (t=+23.561233ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:CONNECTING Picker:0xc000ca89c0}  (t=+23.612428ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc000ca89c0,state:CONNECTING,stateToAggregate:CONNECTING]  (t=+23.664025ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to CONNECTING  (t=+23.720811ms)
        tlogger.go:116: INFO clientconn.go:1325 [core] [Channel #548 SubChannel #556]Subchannel picks a new address ":45139" to connect  (t=+23.80537ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #548 SubChannel #556]Subchannel Connectivity change to READY  (t=+24.906601ms)
        tlogger.go:116: INFO ringhash.go:357 [xds] [ring-hash-lb 0xc001108a80] SubConn's effective old state was: CONNECTING, new state is READY  (t=+25.040091ms)
        tlogger.go:116: INFO ringhash.go:376 [xds] [ring-hash-lb 0xc001108a80] Pushing new state READY and picker 0xc000d3dae8  (t=+25.711337ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [priority-lb 0xc0000e81c0] Balancer state update from locality priority-0-1, new state: {ConnectivityState:READY Picker:0xc000d3db30}  (t=+25.866027ms)
        tlogger.go:116: INFO balancergroup.go:544 [xds] [xds-cluster-manager-lb 0xc00016ab60] Balancer state update from locality cluster:cluster, new state: {ConnectivityState:READY Picker:0xc000d3db30}  (t=+25.957648ms)
        tlogger.go:116: INFO balancerstateaggregator.go:147 [xds] [xds-cluster-manager-lb 0xc00016ab60] State update from sub-balancer "cluster:cluster": {ConnectivityState:READY Picker:0xc000d3db30}  (t=+26.026177ms)
        tlogger.go:116: INFO balancerstateaggregator.go:191 [xds] [xds-cluster-manager-lb 0xc00016ab60] Child pickers: map[cluster:cluster:picker:0xc000d3db30,state:READY,stateToAggregate:READY]  (t=+26.082812ms)
        ringhash_balancer_test.go:1566: conn.GetState(): got CONNECTING, want READY
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to READY  (t=+27.04844ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #548]Channel Connectivity change to SHUTDOWN  (t=+27.138549ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #548]Closing the name resolver  (t=+27.20356ms)
        tlogger.go:116: INFO balancer_wrapper.go:144 [core] [Channel #548]ccBalancerWrapper: closing  (t=+27.295783ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[test.server:{}]. Diff []
        logging.go:30: open watch 8 for type.googleapis.com/envoy.config.listener.v3.Listener[] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[cluster:{}]. Diff []
        logging.go:30: open watch 9 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[] and known map[cluster:{}]. Diff []
        logging.go:30: open watch 10 for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment[] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        tlogger.go:116: INFO watch_service.go:59 [xds] [xds-resolver 0xc00103c240] Canceling watch on Listener resource "test.server"  (t=+27.727762ms)
        logging.go:30: nodeID "4d0a6692-9942-4553-9a52-876e89657de8" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[] and known map[new_route:{}]. Diff []
        logging.go:30: open watch 11 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[] from nodeID "4d0a6692-9942-4553-9a52-876e89657de8", version "2"
        tlogger.go:116: INFO watch_service.go:94 [xds] [xds-resolver 0xc00103c240] Canceling watch on RouteConfiguration resource "new_route"  (t=+28.466244ms)
        tlogger.go:116: INFO ringhash.go:441 [xds] [ring-hash-lb 0xc001108280] Shutdown  (t=+28.884677ms)
        tlogger.go:116: WARNING authority.go:422 [xds] [xds-client 0xc000c363c0] [passthrough:///127.0.0.1:41641] Watchers not notified since ADS stream failed after having received at least one response: rpc error: code = Canceled desc = context canceled  (t=+29.130918ms)
        tlogger.go:116: WARNING transport.go:480 [xds] [xds-client 0xc000c363c0] [passthrough:///127.0.0.1:41641] ADS stream closed: rpc error: code = Canceled desc = context canceled  (t=+29.175331ms)
        tlogger.go:116: INFO clientconn.go:540 [core] [Channel #550]Channel Connectivity change to SHUTDOWN  (t=+29.252185ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #550]Closing the name resolver  (t=+29.312979ms)
        tlogger.go:116: INFO balancer_wrapper.go:144 [core] [Channel #550]ccBalancerWrapper: closing  (t=+29.377189ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #550 SubChannel #551]Subchannel Connectivity change to SHUTDOWN  (t=+29.461857ms)
        tlogger.go:116: INFO clientconn.go:1556 [core] [Channel #550 SubChannel #551]Subchannel deleted  (t=+29.523783ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #550]Channel deleted  (t=+29.681408ms)
        tlogger.go:116: INFO clientimpl.go:100 [xds] [xds-client 0xc000c363c0] Shutdown  (t=+29.742172ms)
        tlogger.go:116: INFO xds_resolver.go:239 [xds] [xds-resolver 0xc00103c240] Shutdown  (t=+29.807294ms)
        tlogger.go:116: INFO clusterimpl.go:322 [xds] [xds-cluster-impl-lb 0xc000174008] Shutdown  (t=+29.982322ms)
        tlogger.go:116: INFO ringhash.go:441 [xds] [ring-hash-lb 0xc001108a80] Shutdown  (t=+30.072981ms)
        tlogger.go:116: INFO clusterimpl.go:322 [xds] [xds-cluster-impl-lb 0xc0001758c8] Shutdown  (t=+30.14176ms)
        tlogger.go:116: INFO balancergroup.go:375 [xds] [priority-lb 0xc0000e81c0] Removing child policy for locality "priority-0-1"  (t=+30.229164ms)
        tlogger.go:116: INFO clusterresolver.go:336 [xds] [xds-cluster-resolver-lb 0xc0008ce4e0] Shutdown  (t=+30.271413ms)
        tlogger.go:116: INFO cdsbalancer.go:380 [xds] [cds-lb 0xc00034f500] Shutdown  (t=+30.325614ms)
        tlogger.go:116: INFO clustermanager.go:192 [xds] [xds-cluster-manager-lb 0xc00016ab60] Shutdown  (t=+30.402478ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #548 SubChannel #556]Subchannel Connectivity change to SHUTDOWN  (t=+30.512895ms)
        tlogger.go:116: INFO clientconn.go:1556 [core] [Channel #548 SubChannel #556]Subchannel deleted  (t=+30.582194ms)
        tlogger.go:116: INFO clientconn.go:310 [core] [Channel #548]Channel deleted  (t=+30.733277ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #554 ListenSocket #555]ListenSocket deleted  (t=+30.840969ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #547 ListenSocket #549]ListenSocket deleted  (t=+31.008[212](https://github.com/grpc/grpc-go/actions/runs/10063024287/job/27816861885?pr=7430#step:8:213)ms)
FAIL
@easwars
Copy link
Contributor Author

easwars commented Aug 30, 2024

@aranjans
Copy link
Contributor

aranjans commented Sep 2, 2024

I ran this test on forge, 8 failed out of 100k runs.

@dfawley dfawley changed the title Flaky Test: RingHash_ReattemptWhenGoingFromTransientFailureToIdle Flaky Test (8/100k): RingHash_ReattemptWhenGoingFromTransientFailureToIdle Sep 10, 2024
@zasweq
Copy link
Contributor

zasweq commented Sep 10, 2024

@purnesh42H
Copy link
Contributor

@zasweq
Copy link
Contributor

zasweq commented Sep 17, 2024

@purnesh42H purnesh42H added Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. and removed Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. labels Sep 23, 2024
@eshitachandwani
Copy link
Member

@zasweq
Copy link
Contributor

zasweq commented Sep 26, 2024

https://github.com/zasweq/grpc-go/actions/runs/11061116597/job/30733087341 this is ring hash IDLE to READY? Same root cause perhaps? Maybe I should track separately but will put here since likely same flakiness.

@zasweq
Copy link
Contributor

zasweq commented Oct 8, 2024

https://github.com/grpc/grpc-go/actions/runs/11244635192/job/31263111110 another test failure in ring hash this time on RingHash_ContinuesConnectingWithoutPicksOneSubchannelAtATime, but perhaps on same root cause. Perhaps this is a real bug?

@purnesh42H
Copy link
Contributor

https://github.com/grpc/grpc-go/actions/runs/11361570171/job/31627478831?pr=7737

  • "Test/RingHash_ReattemptWhenGoingFromTransientFailureToIdle"

@easwars
Copy link
Contributor Author

easwars commented Oct 16, 2024

@purnesh42H
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants