Skip to content

Commit

Permalink
Merge pull request cloudwego#96 from AshleeT/patch-3
Browse files Browse the repository at this point in the history
Update retry.md
  • Loading branch information
GuangmingLuo authored Sep 1, 2021
2 parents 133575e + f255669 commit 09ee57c
Showing 1 changed file with 191 additions and 0 deletions.
191 changes: 191 additions & 0 deletions docs/guide/basic-features/retry.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
## Request Retry

### 1. Introduction

There are currently three types of retries: `Timeout Retry`, `Backup Request` and `Connection Failed Retry`. Among them, `Connection Failed Retry` is a network-level problem, since the request is not sent, the framework will retry by default. Here we only present the use of the first two types of retries:

- `Timeout Retry`: Improve the overall success rate of the service.
- `Backup Request`: Reduce delay jitter of request.

Because many requests are not idempotent, these two types of retries are not used as the default policy.

#### 1.1 Attention

- Confirm that your service is idempotent before enable retry.
- `Timeout Retry` will increase overall latency.

### 2. Retry Policy

Only one of the `Timeout Retry` and `Backup Request` policies can be configured.

- `Timeout Retry`

Configuration Item|Default value|Description|Limit
----|----|----|----
`MaxRetryTimes`|2|The first request is not included. If it is configured as 0, it means to stop retrying.|Value: [0-5]
`MaxDurationMS`|0|Including the time-consuming of the first failed request and the retry request. If the limit is reached, the subsequent retry will be stopped. 0 means unlimited. Note: if configured, the configuration item must be greater than the request timeout.
`EERThreshold`|10%|If the method-level request error rate exceeds the threshold, retry stops.|Value: (0-30%]
`ChainStop`|-|`Chain Stop` is enabled by default. If the upstream request is a retry request, it will not be retried after timeout.|>= v0.0.5 as the default policy.
`DDLStop`|false|If the timeout period of overall request chain is reached, the retry request won't be sent with this policy. Notice, Kitex doesn't provide build-in implementation, use `retry.RegisterDDLStop(ddlStopFunc)` to register is needed.
`BackOff`|None|Retry waiting strategy, `NoneBackOff` by default. Optional: `FixedBackOff`, `RandomBackOff`.
`RetrySameNode`|false|By default, Kitex selects another node to retry. If you want to retry on the same node, set this parameter to true.

- `Backup Request`

Configuration Item|Default value|Description|Limit
----|----|----|----
`RetryDelayMS`|-|Duration of waiting for initiating a Backup Requset when the first request is not returned. This parameter must be set manually. It is suggested to set as TP99.
`MaxRetryTimes`|1|The first request is not included. If it is configured as 0, it means to stop retrying.|Value: [0-2]
`EERThreshold`|10%|If the method-level request error rate exceeds the threshold, retry stops.|Value: (0-30%]
`ChainStop`|false|`Chain Stop` is enabled by default. If the upstream request is a retry request, it will not be retried after timeout.|>= v0.0.5 as the default policy.
`RetrySameNode`|false|By default, Kitex selects another node to retry. If you want to retry on the same node, set this parameter to true.

### 3. How to use

#### 3.1 Enable by Code Configuration

Note: Dynamic configuration (see 3.3) cannot take effect if retry is enabled by code configuration.

##### 3.1.1 Timeout Retry Configuration

- Configuration e.g.

```go
// import "github.com/cloudwego/kitex/pkg/retry"
fp := retry.NewFailurePolicy()
fp.WithMaxRetryTimes(3) // set the maximum number of retries to 3
xxxCli := xxxservice.NewClient("psm", client.WithFailureRetry(fp))
```

- Strategy selection:

```go
fp := retry.NewFailurePolicy()

// Number of retries. The default value is 2, excluding the first request.
fp.WithMaxRetryTimes(xxx)

// Total time consuming. Including the time-consuming for the first failed request and retry request. If the duration limit is reached, the subsequent retries are stopped.
fp.WithMaxDurationMS(xxx)

// Disable `Chain Stop`
fp.DisableChainRetryStop()

// Enable DDL abort
fp.WithDDLStop()

// Backoff policy. No backoff strategy by default.
fp.WithFixedBackOff(fixMS int) // Fixed backoff
fp.WithRandomBackOff(minMS int, maxMS int) // Random backoff

// Set errRate for retry circuit breaker
fp.WithRetryBreaker(errRate float64)

// Retry on the same node
fp.WithRetrySameNode()
```

##### 3.1.2 Backup Request Configuration

- Retry Delay recommendations

It is recommended to configure as TP99, then 1% request will trigger `Backup Request`.

- Configuration e.g.

```go
// If the first request is not returned after XXX ms, the backup request will be initiated and the `Chain Retry Stop` is enabled
bp := retry.NewBackupPolicy(xxx)
xxxCli := xxxservice.NewClient("psm", client.WithBackupRequest(bp))
```

- Strategy selection:

```go
bp := retry.NewBackupPolicy(xxx)

// Number of retries. The default value is 1, excluding the first request.
bp.WithMaxRetryTimes(xxx)

// Disable `Chain Stop`
bp.DisableChainRetryStop()

// Set errRate for retry circuit breaker
bp.WithRetryBreaker(errRate float64)

// Retry on the same node
bp.WithRetrySameNode()
```

#### 3.2 Circuit Breaker Reuse

When circuit breaker is enabled for a service, you can reuse the breaker's statistics to reduce additional CPU consumption. Note that the error rate threshold for retries must be lower than the threshold for a service, as follows:

```go
// 1. Initialize kitex's built-in cbsuite
cbs := circuitbreak.NewCBSuite(circuitbreak.RPCInfo2Key)// 2. Initialize retryContainer, passing in ServiceControl and ServicePanel
retryC := retry.NewRetryContainerWithCB(cs.cbs.ServiceControl(), cs.cbs.ServicePanel())

var opts []client.Option
// 3. Set retryContainer
opts = append(opts, client.WithRetryContainer(retryC))
// 4. Set Service circuit breaker
opts = append(opts, client.WithMiddleware(cbs.ServiceCBMW()))

// 5. Initialize Client and pass in the configuration option
cli, err := xxxservice.NewClient(targetService, opts...)
```

#### 3.3 Dynamic open or adjust strategy

If you want to adjust the policy in combination with remote configuration, dynamic open retry, or runtime, you can take effect through the `NotifyPolicyChange` method of `retryContainer`. Currently, the open source version of Kitex does not provide a remote configuration module, and users can integrate their own configuration center. Note: If it is turned on through code configuration, dynamic configuration cannot take effect.

Use case:

```go
retryC := retry.NewRetryContainer()
// demo
// 1. define your change func
// 2. exec yourChangeFunc in your config module
yourChangeFunc := func(key string, oldData, newData interface{}) {
newConf := newData.(*retry.Policy)
method := parseMethod(key)
retryC.NotifyPolicyChange(method, policy)
}


// configure retryContainer
cli, err := xxxservice.NewClient(targetService, client.WithRetryContainer(retryC))
```

#### 4. Tracking

Kitex records the retry times and previous request time in `rpcInfo`. You can report or output a retry request based on the `retry Tag` in Client's `metric` or log through:

```go
var retryCount string
var lastCosts string

toInfo := rpcinfo.GetRPCInfo(ctx).To()
if retryTag, ok := toInfo.Tag(rpcinfo.RetryTag); ok {
retryCount = retryTag
if lastCostTag, ok := toInfo.Tag(rpcinfo.RetryLastCostTag); ok {
lastCosts = lastCostTag
}
}
```

#### 5. Downstream identification

If using `TTHeader` as the transport protocol, you can determine if the downstream `handler` is currently a retry request and decide whether to continue processing.

```go
retryReqCount, exist := metainfo.GetPersistentValue(ctx,retry.TransitKey)
```

For example, `retryReqCount = 2`, which means the second retry request (excluding the first request), then the business degradation strategy can be adopted(non-retry requests do not have this information).

>Question: `Chain Stop` is enabled by default, is it necessary for services to identify retry requests?
>Answer:`Chain Stop` means that the retry request on the chain will not be retried. Assuming that there is a request chain `A->B->C`, `A` sends a retry request to `B`, while during `B->C`, if a timeout occurs or `Backup` is configured, `B` will not send a retry request to `C`. If the service can identify the retry request, it can directly decide whether to continue the request to `C`.
In short, `Chain Stop` avoids retry amplification caused by `B` sending a retry request to `C`. The service's own control can completely avoid requests from `B` to `C`.

0 comments on commit 09ee57c

Please sign in to comment.