
vminsert: change replication implementation #8044

Open
@Haleygo

Description

Is your feature request related to a problem? Please describe

vminsert has the -replicationFactor=N command-line flag to store N copies of every ingested sample on N distinct vmstorage nodes.
Currently, it takes two steps for vminsert to send ingested data to the vmstorage nodes.

[Image: diagram of the current two-step ingestion flow]

step 1: vminsert hashes the metric name and labels to decide which node's buffer the data is appended to;
step 2: buffered data is sent to N distinct vmstorage nodes, starting with the corresponding node. For example, with -replicationFactor=1, data in the storageNode1 buffer is sent to storageNode1; if that fails, vminsert tries storageNode2, and so on.
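Step 1 can be sketched as follows. This is a deliberately simplified illustration, not vminsert's actual code: the real implementation uses its own consistent-hashing scheme rather than FNV modulo, and `nodeIndexFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodeIndexFor sketches step 1: hash the series identity (metric name plus
// labels) and map it to a storage node index. Illustrative only; real
// vminsert uses its own consistent-hashing scheme, not FNV modulo.
func nodeIndexFor(metricWithLabels string, nodeCount int) int {
	h := fnv.New64a()
	h.Write([]byte(metricWithLabels))
	return int(h.Sum64() % uint64(nodeCount))
}

func main() {
	// The same series always lands in the same node's buffer.
	idx1 := nodeIndexFor(`http_requests_total{job="api",instance="a"}`, 4)
	idx2 := nodeIndexFor(`http_requests_total{job="api",instance="a"}`, 4)
	fmt.Println(idx1 == idx2) // deterministic sharding
}
```

The key property is determinism: a given series always maps to the same primary node, which is what makes step 2's "start with the corresponding node" behavior possible.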

This separates the sharding and replication logic, and the behavior is also affected by the flags -disableRerouting (default true), -disableReroutingOnUnavailable (default false) and -dropSamplesOnOverload (default false).

However, it can cause the following issues (assuming -replicationFactor=2 and a series that is sharded to the storageNode1 buffer):

  1. With the default -disableRerouting=true, -disableReroutingOnUnavailable=false: when storageNode2 is down, its workload is evenly distributed among the other three nodes. However, the data that was replicated to storageNode2 from storageNode1's buffer will all be sent to storageNode3; see #8103 (vmstorage pod restart causing cpu spike on sequentially higher vmstorage pod).
  2. With -disableReroutingOnUnavailable=true: when storageNode1 is down, the series won't be appended to the buffer at all, so storageNode2 won't receive it either, resulting in complete data loss;
  3. With -dropSamplesOnOverload=true: when storageNode1 is degraded due to host issues or a noisy neighbor, the series likewise won't be appended to the buffer, so storageNode2 won't receive it, resulting in complete data loss.
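The data-loss mode in issues 2 and 3 boils down to replication happening only at flush time, from the primary node's buffer. A minimal sketch of that flow (names like `ingest` are illustrative, not vminsert's actual identifiers):

```go
package main

import "fmt"

// Sketch of the current (pre-change) flow with -replicationFactor=2:
// replication happens only when the primary node's buffer is flushed, so if
// the sample never enters that buffer, the replica is never written either.
type node struct {
	name      string
	available bool
	buffer    []string
}

// ingest appends the sample only to the primary node's buffer; replication to
// the next node happens later, at flush time, from that same buffer.
func ingest(sample string, primary *node) bool {
	if !primary.available {
		return false // sample dropped: the replica is lost as well
	}
	primary.buffer = append(primary.buffer, sample)
	return true
}

func main() {
	n1 := &node{name: "storageNode1", available: false}
	ok := ingest("sample-A", n1)
	fmt.Println(ok) // false: complete data loss despite replicationFactor=2
}
```

Because the replica copy is derived from the primary's buffer rather than written independently, a single unavailable or degraded node defeats the replication factor entirely.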

Issue 1 degrades cluster performance.
Issues 2 and 3 break the expected replication guarantee: only one storage node is down, yet with -replicationFactor=2 the data is still lost completely.

Describe the solution you'd like

Perform data replication before appending data to the storageNode buffers.

[Image: diagram of the proposed flow with replication performed before buffering]

In this implementation, if a series is assigned to storageNode1 and storageNode2 (referring back to the issues above):

  1. With the default -disableRerouting=true, -disableReroutingOnUnavailable=false: when storageNode2 is down, the behavior remains unchanged, but it can be improved by setting -disableReroutingOnUnavailable=true if necessary; see the comment on #8103 (vmstorage pod restart causing cpu spike on sequentially higher vmstorage pod).
  2. With -disableReroutingOnUnavailable=true: when storageNode1 is down, the series goes only to the storageNode2 buffer;
  3. With -dropSamplesOnOverload=true: when storageNode1 is degraded, the series goes only to the storageNode2 buffer.
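The proposed flow can be sketched as picking all replica targets up front and buffering independently on each, so an unavailable primary no longer blocks the replica. This is an illustration under the assumption that replicas go to the next nodes in ring order; `replicaIndexes` is a hypothetical name:

```go
package main

import "fmt"

// replicaIndexes sketches the proposed flow: choose replicationFactor
// distinct nodes up front and append the sample to each node's buffer
// independently, before any network send happens. Illustrative only.
func replicaIndexes(primary, replicationFactor, nodeCount int) []int {
	idxs := make([]int, 0, replicationFactor)
	for i := 0; i < replicationFactor && i < nodeCount; i++ {
		idxs = append(idxs, (primary+i)%nodeCount)
	}
	return idxs
}

func main() {
	// With -replicationFactor=2 and 4 nodes, a series sharded to node 0
	// is buffered on nodes 0 and 1; if node 0 is down or overloaded, the
	// copy in node 1's buffer is unaffected.
	fmt.Println(replicaIndexes(0, 2, 4)) // [0 1]
}
```

Because each replica is written to its own buffer independently, losing one node costs at most one copy, which is exactly what -replicationFactor=2 is supposed to guarantee.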

Regarding sending data from the buffer to the actual storage node:

  1. With the default -disableRerouting=true, if data can't be sent from the buffer to the corresponding storage node, it is dropped directly.
  2. If the user specifies -disableRerouting=false and data can't be sent from the buffer to the corresponding storage node, vminsert tries to reroute it to another storage node (an additional check may be needed to avoid writing two replicas to the same node when -replicationFactor>1).
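The additional check mentioned in point 2 could look like the following sketch: when rerouting a failed flush, skip any node that already holds a replica of the same data, so -replicationFactor>1 still yields distinct copies. `pickRerouteTarget` and its parameters are hypothetical names for illustration:

```go
package main

import "fmt"

// pickRerouteTarget sketches the extra check for rerouting with
// -disableRerouting=false: walk the nodes after the failed one and return the
// first node that does not already hold a replica, or -1 if none qualifies.
func pickRerouteTarget(failed int, replicaHolders map[int]bool, nodeCount int) int {
	for i := 1; i < nodeCount; i++ {
		candidate := (failed + i) % nodeCount
		if !replicaHolders[candidate] {
			return candidate
		}
	}
	return -1 // no eligible node; caller may drop or retry later
}

func main() {
	// Nodes 0 and 1 already hold replicas; the flush to node 0 failed, so
	// the reroute must land on node 2, not node 1.
	fmt.Println(pickRerouteTarget(0, map[int]bool{0: true, 1: true}, 4)) // 2
}
```

Without this check, rerouting could send the failed copy to a node that already holds the other replica, silently reducing the effective replication factor.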

Describe alternatives you've considered

No response

Additional information

related to #6801, #7995
