Deprecate mydumper tikv importer (pingcap#15289)
Frank945946 authored and Oreoxmt committed Nov 13, 2023
1 parent d03a949 commit e35db63
Showing 13 changed files with 17 additions and 178 deletions.
2 changes: 0 additions & 2 deletions binary-package.md
@@ -40,7 +40,6 @@ The `TiDB-community-toolkit` package contains the following contents.

| Content | Change history |
|---|---|
| tikv-importer-{version}-linux-{arch}.tar.gz | |
| pd-recover-{version}-linux-{arch}.tar.gz | |
| etcdctl | New in v6.0.0 |
| tiup-linux-{arch}.tar.gz | |
@@ -67,7 +66,6 @@ The `TiDB-community-toolkit` package contains the following contents.
| sync_diff_inspector | |
| reparo | |
| arbiter | |
| mydumper | New in v6.0.0 |
| server-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
| grafana-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
| alertmanager-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
2 changes: 1 addition & 1 deletion develop/dev-guide-timeouts-in-tidb.md
@@ -13,7 +13,7 @@ TiDB's transaction implementation uses the MVCC (Multiple Version Concurrency Co

By default, each MVCC version (consistency snapshots) is kept for 10 minutes. Transactions that take longer than 10 minutes to read will receive an error `GC life time is shorter than transaction duration`.

If you need longer read time, for example, when you are using **Mydumper** for full backups (**Mydumper** backs up consistent snapshots), you can adjust the value of `tikv_gc_life_time` in the `mysql.tidb` table in TiDB to increase the MVCC version retention time. Note that `tikv_gc_life_time` takes effect globally and immediately. Increasing the value will increase the life time of all existing snapshots, and decreasing it will immediately shorten the life time of all snapshots. Too many MVCC versions will impact TiKV's processing efficiency. So you need to change `tikv_gc_life_time` back to the previous setting in time after doing a full backup with **Mydumper**.
If you need longer read time, for example, when you are using **Dumpling** for full backups (**Dumpling** backs up consistent snapshots), you can adjust the value of `tikv_gc_life_time` in the `mysql.tidb` table in TiDB to increase the MVCC version retention time. Note that `tikv_gc_life_time` takes effect globally and immediately. Increasing the value will increase the life time of all existing snapshots, and decreasing it will immediately shorten the life time of all snapshots. Too many MVCC versions will impact TiKV's processing efficiency. So you need to change `tikv_gc_life_time` back to the previous setting in time after doing a full backup with **Dumpling**.

For more information about GC, see [GC Overview](/garbage-collection-overview.md).
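
As a rough illustration of the adjustment described in this hunk, the sketch below prints the SQL that raises `tikv_gc_life_time` before a long Dumpling backup and the SQL that restores it afterwards. The helper name `gc_life_time_sql`, the `720h` value, and the connection details in the commented usage are assumptions for illustration only. The statement also assumes that the `mysql.tidb` table exposes `VARIABLE_NAME` and `VARIABLE_VALUE` columns.

```sh
#!/bin/sh
# Hypothetical helper (not part of TiDB): print the SQL statement that sets
# tikv_gc_life_time in the mysql.tidb table to the duration passed as $1.
gc_life_time_sql() {
    printf "UPDATE mysql.tidb SET VARIABLE_VALUE = '%s' WHERE VARIABLE_NAME = 'tikv_gc_life_time';\n" "$1"
}

# Statement that raises the retention time before the backup (720h is an example value).
gc_life_time_sql 720h

# Statement that restores the 10-minute default once the backup has finished.
gc_life_time_sql 10m

# Assumed usage against a TiDB server (host, port, and user are placeholders):
#   gc_life_time_sql 720h | mysql -h 127.0.0.1 -P 4000 -u root
```

Printing the statements rather than executing them keeps the sketch free of side effects. Because the setting takes effect globally and immediately, it is worth reviewing the generated SQL before piping it to a client.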

6 changes: 2 additions & 4 deletions dumpling-overview.md
@@ -46,11 +46,9 @@ TiDB also provides other tools that you can choose to use as needed.

> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Go, and supports more optimizations that are specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
>
> For more information on Mydumper, refer to [v4.0 Mydumper documentation](https://docs.pingcap.com/tidb/v4.0/backup-and-restore-using-mydumper-lightning).
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. Starting from v7.5.0, [Mydumper](https://docs.pingcap.com/tidb/v4.0/mydumper-overview) is deprecated and most of its features have been replaced by [Dumpling](/dumpling-overview.md). It is strongly recommended that you use Dumpling instead of mydumper.

Compared to Mydumper, Dumpling has the following improvements:
Dumpling has the following advantages:

- Support exporting data in multiple formats, including SQL and CSV.
- Support the [table-filter](https://github.com/pingcap/tidb-tools/blob/master/pkg/table-filter/README.md) feature, which makes it easier to filter data.
2 changes: 1 addition & 1 deletion ecosystem-tool-user-guide.md
@@ -75,7 +75,7 @@ The following are the basics of Dumpling:

> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Golang, and provides more optimizations specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. Starting from v7.5.0, [Mydumper](https://docs.pingcap.com/tidb/v4.0/mydumper-overview) is deprecated and most of its features have been replaced by [Dumpling](/dumpling-overview.md). It is strongly recommended that you use Dumpling instead of mydumper.

### Full data import - TiDB Lightning

4 changes: 2 additions & 2 deletions migration-overview.md
@@ -8,8 +8,8 @@ summary: Learn the overview of data migration scenarios and the solutions.
This document gives an overview of the data migration solutions that you can use with TiDB. The data migration solutions are as follows:

- Full data migration.
    - To import Amazon Aurora snapshots, CSV files, or Mydumper SQL files into TiDB, you can use TiDB Lightning to perform the full migration.
    - To export all TiDB data as CSV files or Mydumper SQL files, you can use Dumpling to perform the full migration, which makes data migration from MySQL or MariaDB easier.
    - To import Amazon Aurora snapshots, CSV files, or SQL dump files into TiDB, you can use TiDB Lightning to perform the full migration.
    - To export all TiDB data as CSV files or SQL dump files, you can use Dumpling to perform the full migration, which makes data migration from MySQL or MariaDB easier.
    - To migrate all data from a database with a small data size volume (for example, less than 1 TiB), you can also use TiDB Data Migration (DM).

- Quick initialization of TiDB. TiDB Lightning supports quickly importing data and can quickly initialize a specific table in TiDB. Before you use this feature, pay attention that the quick initialization has a great impact on TiDB and the cluster does not provide services during the initialization period.
2 changes: 1 addition & 1 deletion tidb-cloud/changefeed-sink-to-mysql.md
@@ -69,7 +69,7 @@ To load the existing data:
SET GLOBAL tidb_gc_life_time = '720h';
```

2. Use [Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview) to export data from your TiDB cluster, then use community tools such as [mydumper/myloader](https://centminmod.com/mydumper.html) to load data to the MySQL service.
2. Use [Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview) to export data from your TiDB cluster, then use community tools such as myloader to load data to the MySQL service.

3. From the [exported files of Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview#format-of-exported-files), get the start position of MySQL sink from the metadata file:

93 changes: 1 addition & 92 deletions tidb-lightning/monitor-tidb-lightning.md
@@ -22,13 +22,6 @@ pprof-port = 8289
...
```

and in `tikv-importer.toml`:

```toml
# Listening address of the status server.
status-server-address = '0.0.0.0:8286'
```

You need to configure Prometheus to make it discover the servers. For instance, you can directly add the server address to the `scrape_configs` section:

```yaml
@@ -37,9 +30,6 @@ scrape_configs:
  - job_name: 'tidb-lightning'
    static_configs:
      - targets: ['192.168.20.10:8289']
  - job_name: 'tikv-importer'
    static_configs:
      - targets: ['192.168.20.9:8286']
```
## Grafana dashboard
@@ -134,88 +124,7 @@ If any of the duration is too high, it indicates that the disk used by TiDB Ligh

## Monitoring metrics

This section explains the monitoring metrics of `tikv-importer` and `tidb-lightning`, if you need to monitor other metrics not covered by the default Grafana dashboard.

### `tikv-importer`

Metrics provided by `tikv-importer` are listed under the namespace `tikv_import_*`.

- **`tikv_import_rpc_duration`** (Histogram)

Bucketed histogram for the duration of an RPC action. Labels:

- **request**: what kind of RPC is executed
    * `switch_mode`: switched a TiKV node to import/normal mode
    * `open_engine`: opened an engine file
    * `write_engine`: received data and written into an engine
    * `close_engine`: closed an engine file
    * `import_engine`: imported an engine file into the TiKV cluster
    * `cleanup_engine`: deleted an engine file
    * `compact_cluster`: explicitly compacted the TiKV cluster
    * `upload`: uploaded an SST file
    * `ingest`: ingested an SST file
    * `compact`: explicitly compacted a TiKV node
- **result**: the execution result of the RPC
    * `ok`
    * `error`

- **`tikv_import_write_chunk_bytes`** (Histogram)

Bucketed histogram for the uncompressed size of a block of KV pairs received from TiDB Lightning.

- **`tikv_import_write_chunk_duration`** (Histogram)

Bucketed histogram for the time needed to receive a block of KV pairs from TiDB Lightning.

- **`tikv_import_upload_chunk_bytes`** (Histogram)

Bucketed histogram for the compressed size of a chunk of SST file uploaded to TiKV.

- **`tikv_import_upload_chunk_duration`** (Histogram)

Bucketed histogram for the time needed to upload a chunk of SST file to TiKV.

- **`tikv_import_range_delivery_duration`** (Histogram)

Bucketed histogram for the time needed to deliver a range of KV pairs into a `dispatch-job`.

- **`tikv_import_split_sst_duration`** (Histogram)

Bucketed histogram for the time needed to split off a range from the engine file into a single SST file.

- **`tikv_import_sst_delivery_duration`** (Histogram)

Bucketed histogram for the time needed to deliver an SST file from a `dispatch-job` to an `ImportSSTJob`.

- **`tikv_import_sst_recv_duration`** (Histogram)

Bucketed histogram for the time needed to receive an SST file from a `dispatch-job` in an `ImportSSTJob`.

- **`tikv_import_sst_upload_duration`** (Histogram)

Bucketed histogram for the time needed to upload an SST file from an `ImportSSTJob` to a TiKV node.

- **`tikv_import_sst_chunk_bytes`** (Histogram)

Bucketed histogram for the compressed size of the SST file uploaded to a TiKV node.

- **`tikv_import_sst_ingest_duration`** (Histogram)

Bucketed histogram for the time needed to ingest an SST file into TiKV.

- **`tikv_import_each_phase`** (Gauge)

Indicates the running phase. Possible values are 1, meaning running inside the phase, and 0, meaning outside the phase. Labels:

- **phase**: `prepare`/`import`

- **`tikv_import_wait_store_available_count`** (Counter)

Counts the number of times a TiKV node is found to have insufficient space when uploading SST files. Labels:

- **store_id**: The TiKV store ID.

### `tidb-lightning`
This section explains the monitoring metrics of `tidb-lightning`.

Metrics provided by `tidb-lightning` are listed under the namespace `lightning_*`.

1 change: 0 additions & 1 deletion tidb-lightning/tidb-lightning-command-line-full.md
@@ -23,7 +23,6 @@ You can configure the following parameters using `tidb-lightning`:
| `--backend <backend>` | Select an import mode. `local` refers to [physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode.md); `tidb` refers to [logical import mode](/tidb-lightning/tidb-lightning-logical-import-mode.md). | `tikv-importer.backend` |
| `--log-file <file>` | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` |
| `--status-addr <ip:port>` | Listening address of the TiDB Lightning server | `lightning.status-port` |
| `--importer <host:port>` | Address of TiKV Importer | `tikv-importer.addr` |
| `--pd-urls <host:port>` | PD endpoint address | `tidb.pd-addr` |
| `--tidb-host <host>` | TiDB server host | `tidb.host` |
| `--tidb-port <port>` | TiDB server port (default = 4000) | `tidb.port` |
6 changes: 1 addition & 5 deletions tidb-lightning/tidb-lightning-configuration.md
@@ -53,10 +53,7 @@ enable-diagnose-logs = false
# The maximum number of engines to be opened concurrently.
# Each table is split into one "index engine" to store indices, and multiple
# "data engines" to store row data. These settings control the maximum
# concurrent number for each type of engines.
# These values affect the memory and disk usage of tikv-importer.
# The sum of these two values must not exceed the max-open-engines setting
# for tikv-importer.
# concurrent number for each type of engines. Generally, you can use the following two default values.
index-concurrency = 2
table-concurrency = 6

@@ -454,7 +451,6 @@ log-progress = "5m"
| --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `local` |
| --log-file *file* | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
| --pd-urls *host:port* | PD endpoint address | `tidb.pd-addr` |
| --tidb-host *host* | TiDB server host | `tidb.host` |
| --tidb-port *port* | TiDB server port (default = 4000) | `tidb.port` |
41 changes: 3 additions & 38 deletions tidb-lightning/tidb-lightning-faq.md
@@ -26,27 +26,8 @@ If only one table has an error encountered, the rest will still be processed nor

## How to properly restart TiDB Lightning?

If you are using Importer-backend, depending on the status of `tikv-importer`, the basic sequence of restarting TiDB Lightning is like this:

If `tikv-importer` is still running:

1. [Stop `tidb-lightning`](#how-to-stop-the-tidb-lightning-process).
2. Perform the intended modifications, such as fixing the source data, changing settings, replacing hardware etc.
3. If the modification previously has changed any table, [remove the corresponding checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-remove) too.
4. Start `tidb-lightning`.

If `tikv-importer` needs to be restarted:

1. [Stop `tidb-lightning`](#how-to-stop-the-tidb-lightning-process).
2. [Stop `tikv-importer`](#how-to-stop-the-tikv-importer-process).
3. Perform the intended modifications, such as fixing the source data, changing settings, replacing hardware etc.
4. Start `tikv-importer`.
5. Start `tidb-lightning` *and wait until the program fails with CHECKSUM error, if any*.
* Restarting `tikv-importer` would destroy all engine files still being written, but `tidb-lightning` did not know about it. As of v3.0 the simplest way is to let `tidb-lightning` go on and retry.
6. [Destroy the failed tables and checkpoints](/tidb-lightning/troubleshoot-tidb-lightning.md#checkpoint-for--has-invalid-status-error-code)
7. Start `tidb-lightning` again.

If you are using Local-backend or TiDB-backend, the operations are the same as those of using Importer-backend when the `tikv-importer` is still running.
1. [Stop the `tidb-lightning` process](#how-to-stop-the-tidb-lightning-process).
2. Start a new `tidb-lightning` task: execute the previous start command, such as `nohup tiup tidb-lightning -config tidb-lightning.toml`.

## How to ensure the integrity of the imported data?

@@ -93,16 +74,6 @@ sql-mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
...
```

## Can one `tikv-importer` serve multiple `tidb-lightning` instances?

Yes, as long as every `tidb-lightning` instance operates on different tables.

## How to stop the `tikv-importer` process?

To stop the `tikv-importer` process, you can choose the corresponding operation according to your deployment method.

- For manual deployment: if `tikv-importer` is running in foreground, press <kbd>Ctrl</kbd>+<kbd>C</kbd> to exit. Otherwise, obtain the process ID using the `ps aux | grep tikv-importer` command and then terminate the process using the `kill ${PID}` command.

## How to stop the `tidb-lightning` process?

To stop the `tidb-lightning` process, you can choose the corresponding operation according to your deployment method.
@@ -122,12 +93,6 @@ With the default settings of 3 replicas, the space requirement of the target TiK
- The space occupied by indices
- Space amplification in RocksDB

## Can TiKV Importer be restarted while TiDB Lightning is running?

No. TiKV Importer stores some information of engines in memory. If `tikv-importer` is restarted, `tidb-lightning` will be stopped due to lost connection. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) as those TiKV Importer-specific information is lost. You can restart TiDB Lightning afterwards.

See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb-lightning) for the correct sequence.

## How to completely destroy all intermediate data associated with TiDB Lightning?

1. Delete the checkpoint file.
@@ -140,7 +105,7 @@ See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb

If, for some reason, you cannot run this command, try manually deleting the file `/tmp/tidb_lightning_checkpoint.pb`.

2. If you are using Local-backend, delete the `sorted-kv-dir` directory in the configuration. If you are using Importer-backend, delete the entire `import` directory on the machine hosting `tikv-importer`.
2. If you are using Local-backend, delete the `sorted-kv-dir` directory in the configuration.

3. Delete all tables and databases created on the TiDB cluster, if needed.

1 change: 0 additions & 1 deletion tidb-lightning/tidb-lightning-physical-import-mode.md
@@ -69,7 +69,6 @@ It is recommended that you allocate CPU more than 32 cores and memory greater th

- TiDB Lightning >= v4.0.3.
- TiDB >= v4.0.0.
- If the target TiDB cluster is v3.x or earlier, you need to use Importer-backend to complete the data import. In this mode, `tidb-lightning` needs to send the parsed key-value pairs to `tikv-importer` via gRPC, and `tikv-importer` will complete the data import.

### Limitations

22 changes: 1 addition & 21 deletions tidb-lightning/troubleshoot-tidb-lightning.md
@@ -119,24 +119,6 @@ tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=
See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#checkpoints-control) section for other options.

### `ResourceTemporarilyUnavailable("Too many open engines …: …")`

**Cause**: The number of concurrent engine files exceeds the limit specified by `tikv-importer`. This could be caused by misconfiguration. Additionally, if `tidb-lightning` exited abnormally, an engine file might be left at a dangling open state, which could cause this error as well.

**Solutions**:

1. Increase the value of `max-open-engines` setting in `tikv-importer.toml`. This value is typically dictated by the available memory. This could be calculated by using:

    Max Memory Usage ≈ `max-open-engines` × `write-buffer-size` × `max-write-buffer-number`

2. Decrease the value of `table-concurrency` + `index-concurrency` so it is less than `max-open-engines`.

3. Restart `tikv-importer` to forcefully remove all engine files (default to `./data.import/`). This also removes all partially imported tables, which requires TiDB Lightning to clear the outdated checkpoints.

    ```sh
    tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
    ```

### `cannot guess encoding for input file, please convert to UTF-8 manually`

**Cause**: TiDB Lightning only recognizes the UTF-8 and GB-18030 encodings for the table schemas. This error is emitted if the file isn't in any of these encodings. It is also possible that the file has mixed encoding, such as containing a string in UTF-8 and another string in GB-18030, due to historical `ALTER TABLE` executions.
@@ -164,9 +146,7 @@ See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#chec
TZ='Asia/Shanghai' bin/tidb-lightning -config tidb-lightning.toml
```

2. When exporting data using Mydumper, make sure to include the `--skip-tz-utc` flag.

3. Ensure the entire cluster is using the same and latest version of `tzdata` (version 2018i or above).
2. Ensure the entire cluster is using the same and latest version of `tzdata` (version 2018i or above).

On CentOS, run `yum info tzdata` to check the installed version and whether there is an update. Run `yum upgrade tzdata` to upgrade the package.

0 comments on commit e35db63