Description
Is your feature request related to a problem? Please describe
vminsert has the `-replicationFactor=N` command-line flag to store N copies of every ingested sample on N distinct vmstorage nodes.
Currently, it takes two steps for vminsert to ingest data into vmstorage nodes:

- Step 1: vminsert hashes the metric name and labels to decide which node's buffer it will append the data to.
- Step 2: buffered data is sent to N distinct vmstorage nodes, starting with the corresponding node. For example, with `-replicationFactor=1`, data in the storageNode1 buffer is sent to storageNode1; if that fails, storageNode2 is tried, and so on.

This separates the sharding and replication logic. The behavior is also affected by the flags `-disableRerouting` (default `true`), `-disableReroutingOnUnavailable` (default `false`), and `-dropSamplesOnOverload` (default `false`).
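The two steps above can be sketched roughly as follows. This is a simplified illustration under assumptions, not VictoriaMetrics' actual code: the node count, the FNV hash, and the helper names `shardIdx`/`replicaNodes` are made up for clarity.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	nodeCount         = 4 // hypothetical cluster size
	replicationFactor = 2 // as if -replicationFactor=2
)

// shardIdx picks the buffer for a series from a hash of its
// metric name and labels (step 1).
func shardIdx(metricNameAndLabels string) int {
	h := fnv.New64a()
	h.Write([]byte(metricNameAndLabels))
	return int(h.Sum64() % nodeCount)
}

// replicaNodes returns the N distinct nodes a buffered sample is
// sent to, starting with the buffer's own node (step 2).
func replicaNodes(bufIdx int) []int {
	nodes := make([]int, 0, replicationFactor)
	for i := 0; i < replicationFactor; i++ {
		nodes = append(nodes, (bufIdx+i)%nodeCount)
	}
	return nodes
}

func main() {
	idx := shardIdx(`http_requests_total{job="api"}`)
	fmt.Println("buffer:", idx, "replicas:", replicaNodes(idx))
}
```

Note that replication is applied only in step 2, after the series has already been committed to a single node's buffer; that ordering is what the issues below hinge on.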
However, this can cause the following issues (assuming `-replicationFactor=2` and a series that is supposed to be assigned to the storageNode1 buffer):

- With the default `-disableRerouting=true, -disableReroutingOnUnavailable=false`: when storageNode2 is down, its workload will be evenly distributed among the other three nodes. However, the data that was replicated to storageNode2 from storageNode1's buffer will all be sent to storageNode3; see "vmstorage pod restart causing cpu spike on sequentially higher vmstorage pod" #8103.
- With `-disableReroutingOnUnavailable=true`: when storageNode1 is down, the series won't be appended to the buffer at all, so storageNode2 won't receive it, resulting in complete data loss.
- With `-dropSamplesOnOverload=true`: when storageNode1 is degraded due to host issues or a noisy neighbor, the series likewise won't be appended to the buffer, so storageNode2 won't receive it, resulting in complete data loss.

Issue 1 degrades the cluster performance. Issues 2 and 3 break the expected replication guarantee: only a single storage node is down or degraded, yet data is lost even though `-replicationFactor=2`.
Describe the solution you'd like
Perform data replication before sending data to the storageNode buffers.

With this implementation, if a series is assigned to storageNode1 and storageNode2 (referring to the issues mentioned above):

- With the default `-disableRerouting=true, -disableReroutingOnUnavailable=false`: when storageNode2 is down, the behavior remains unchanged, but it can be improved by setting `-disableReroutingOnUnavailable=true` if necessary; see "vmstorage pod restart causing cpu spike on sequentially higher vmstorage pod" #8103 (comment).
- With `-disableReroutingOnUnavailable=true`: when storageNode1 is down, the series goes only to the storageNode2 buffer.
- With `-dropSamplesOnOverload=true`: when storageNode1 is degraded, the series goes only to the storageNode2 buffer.
Regarding sending data from a buffer to the actual storage node:

- With the default `-disableRerouting=true`: if data can't be sent from the buffer to the corresponding storage node, we drop it directly.
- If the user specifies `-disableRerouting=false` and data can't be sent from the buffer to the corresponding storage node, we try to reroute it to another storage node (this may need an additional check to prevent writing two replicas of the same sample to the same node when `-replicationFactor>1`).
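The additional check mentioned above could look something like this. It is only a sketch under assumptions (the function name `rerouteTarget` and the `replicaSet` representation are invented for illustration): when rerouting, any node that already holds a replica of the same sample is skipped, so `-replicationFactor>1` never collapses two replicas onto one node.

```go
package main

import "fmt"

const nodeCount = 4

// rerouteTarget picks a fallback node for data that could not be
// sent from its buffer, skipping every node that already holds a
// replica of the same sample. It returns -1 if no eligible node
// exists, in which case the sample would be dropped.
func rerouteTarget(failed int, replicaSet map[int]bool) int {
	for i := 1; i < nodeCount; i++ {
		cand := (failed + i) % nodeCount
		if !replicaSet[cand] {
			return cand
		}
	}
	return -1
}

func main() {
	// Replicas of a sample live on nodes 1 and 2; node 1 failed.
	// Node 2 is skipped because it already holds the other replica.
	fmt.Println(rerouteTarget(1, map[int]bool{1: true, 2: true})) // prints 3
}
```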
Describe alternatives you've considered
No response