remove Mydumper and tikv-importer #15135

Closed · wants to merge 7 commits
2 changes: 0 additions & 2 deletions binary-package.md
@@ -40,7 +40,6 @@ The `TiDB-community-toolkit` package contains the following contents.

| Content | Change history |
|---|---|
| tikv-importer-{version}-linux-{arch}.tar.gz | |
| pd-recover-{version}-linux-{arch}.tar.gz | |
| etcdctl | New in v6.0.0 |
| tiup-linux-{arch}.tar.gz | |
@@ -67,7 +66,6 @@ The `TiDB-community-toolkit` package contains the following contents.
| sync_diff_inspector | |
| reparo | |
| arbiter | |
| mydumper | New in v6.0.0 |
| server-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
| grafana-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
| alertmanager-{version}-linux-{arch}.tar.gz | New in v6.2.0 |
2 changes: 1 addition & 1 deletion develop/dev-guide-timeouts-in-tidb.md
@@ -13,7 +13,7 @@ TiDB's transaction implementation uses the MVCC (Multiple Version Concurrency Control)

By default, each MVCC version (consistency snapshots) is kept for 10 minutes. Transactions that take longer than 10 minutes to read will receive an error `GC life time is shorter than transaction duration`.

If you need longer read time, for example, when you are using **Mydumper** for full backups (**Mydumper** backs up consistent snapshots), you can adjust the value of `tikv_gc_life_time` in the `mysql.tidb` table in TiDB to increase the MVCC version retention time. Note that `tikv_gc_life_time` takes effect globally and immediately. Increasing the value will increase the life time of all existing snapshots, and decreasing it will immediately shorten the life time of all snapshots. Too many MVCC versions will impact TiKV's processing efficiency. So you need to change `tikv_gc_life_time` back to the previous setting in time after doing a full backup with **Mydumper**.
If you need longer read time, for example, when you are using **Dumpling** for full backups (**Dumpling** backs up consistent snapshots), you can adjust the value of `tikv_gc_life_time` in the `mysql.tidb` table in TiDB to increase the MVCC version retention time. Note that `tikv_gc_life_time` takes effect globally and immediately. Increasing the value will increase the life time of all existing snapshots, and decreasing it will immediately shorten the life time of all snapshots. Too many MVCC versions will impact TiKV's processing efficiency. So you need to change `tikv_gc_life_time` back to the previous setting in time after doing a full backup with **Dumpling**.
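As a sketch of that sequence (the `720h` window is illustrative — pick a value longer than the backup actually takes; `10m` is the default retention):

```sql
-- Before the Dumpling full backup: lengthen MVCC retention so the
-- long consistent-snapshot read is not garbage-collected mid-export.
UPDATE mysql.tidb SET VARIABLE_VALUE = '720h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';

-- ... run the full backup with Dumpling ...

-- After the backup: change the value back in time (10m is the default).
UPDATE mysql.tidb SET VARIABLE_VALUE = '10m' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```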

For more information about GC, see [GC Overview](/garbage-collection-overview.md).

2 changes: 1 addition & 1 deletion dm/dm-error-handling.md
@@ -105,7 +105,7 @@ However, you need to reset the data migration task in some cases. For details, r
| `code=11006` | Occurs when the built-in parser of DM parses the incompatible DDL statements. | Refer to [Data Migration - incompatible DDL statements](/dm/dm-faq.md#how-to-handle-incompatible-ddl-statements) for solution. |
| `code=20010` | Occurs when decrypting the database password that is provided in task configuration. | Check whether the downstream database password provided in the configuration task is [correctly encrypted using dmctl](/dm/dm-manage-source.md#encrypt-the-database-password). |
| `code=26002` | The task check fails to establish database connection. For more detailed error information, check the error message which usually includes the error code and error information returned for database operations. | Check whether the machine where DM-master is located has permission to access the upstream. |
| `code=32001` | Abnormal dump processing unit | If the error message contains `mydumper: argument list too long.`, configure the table to be exported by manually adding the `--regex` regular expression in the Mydumper argument `extra-args` in the `task.yaml` file according to the block-allow list. For example, to export all tables named `hello`, add `--regex '.*\\.hello$'`; to export all tables, add `--regex '.*'`. |
| `code=32001` | Abnormal dump processing unit | If the error message contains `mydumper: argument list too long.`, configure the table to be exported by manually adding the `--regex` regular expression in the `mydumpers.extra-args` argument in the `task.yaml` file according to the block-allow list. For example, to export all tables named `hello`, add `--regex '.*\\.hello$'`; to export all tables, add `--regex '.*'`. |
| `code=38008` | An error occurs in the gRPC communication among DM components. | Check `class`. Find out the error occurs in the interaction of which components. Determine the type of communication error. If the error occurs when establishing gRPC connection, check whether the communication server is working normally. |
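As an illustration of the `code=32001` workaround above, the `--regex` argument can be added to `task.yaml` roughly as follows (a minimal, hypothetical fragment; the rest of the task configuration is omitted):

```yaml
# Hypothetical task.yaml fragment: extra dump arguments are passed
# through the mydumpers section. Note the doubled backslash inside
# the quoted argument string.
mydumpers:
  global:
    extra-args: "--regex '.*\\.hello$'"   # export only tables named `hello`
```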

### What can I do when a migration task is interrupted with the `invalid connection` error returned?
6 changes: 2 additions & 4 deletions dumpling-overview.md
@@ -46,11 +46,9 @@ TiDB also provides other tools that you can choose to use as needed.

> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Go, and supports more optimizations that are specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
>
> For more information on Mydumper, refer to [v4.0 Mydumper documentation](https://docs.pingcap.com/tidb/v4.0/backup-and-restore-using-mydumper-lightning).
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. Starting from v7.5.0, [Mydumper](https://docs.pingcap.com/tidb/v4.0/mydumper-overview) is deprecated and most of its features have been replaced by [Dumpling](/dumpling-overview.md). It is strongly recommended that you use Dumpling instead of Mydumper.

Compared to Mydumper, Dumpling has the following improvements:
Dumpling has the following advantages:

- Support exporting data in multiple formats, including SQL and CSV.
- Support the [table-filter](https://github.com/pingcap/tidb-tools/blob/master/pkg/table-filter/README.md) feature, which makes it easier to filter data.
2 changes: 1 addition & 1 deletion ecosystem-tool-user-guide.md
@@ -75,7 +75,7 @@ The following are the basics of Dumpling:

> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Golang, and provides more optimizations specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. Starting from v7.5.0, [Mydumper](https://docs.pingcap.com/tidb/v4.0/mydumper-overview) is deprecated and most of its features have been replaced by [Dumpling](/dumpling-overview.md). It is strongly recommended that you use Dumpling instead of Mydumper.

### Full data import - TiDB Lightning

4 changes: 2 additions & 2 deletions migration-overview.md
@@ -8,8 +8,8 @@ summary: Learn the overview of data migration scenarios and the solutions.
This document gives an overview of the data migration solutions that you can use with TiDB. The data migration solutions are as follows:

- Full data migration.
- To import Amazon Aurora snapshots, CSV files, or Mydumper SQL files into TiDB, you can use TiDB Lightning to perform the full migration.
- To export all TiDB data as CSV files or Mydumper SQL files, you can use Dumpling to perform the full migration, which makes data migration from MySQL or MariaDB easier.
- To import Amazon Aurora snapshots, CSV files, or SQL dump files into TiDB, you can use TiDB Lightning to perform the full migration.
- To export all TiDB data as CSV files or SQL dump files, you can use Dumpling to perform the full migration, which makes data migration from MySQL or MariaDB easier.
- To migrate all data from a database with a small data size volume (for example, less than 1 TiB), you can also use TiDB Data Migration (DM).

- Quick initialization of TiDB. TiDB Lightning supports quickly importing data and can quickly initialize a specific table in TiDB. Before you use this feature, pay attention that the quick initialization has a great impact on TiDB and the cluster does not provide services during the initialization period.
2 changes: 1 addition & 1 deletion tidb-cloud/changefeed-sink-to-mysql.md
@@ -69,7 +69,7 @@ To load the existing data:
SET GLOBAL tidb_gc_life_time = '720h';
```

2. Use [Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview) to export data from your TiDB cluster, then use community tools such as [mydumper/myloader](https://centminmod.com/mydumper.html) to load data to the MySQL service.
2. Use [Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview) to export data from your TiDB cluster, then use community tools such as [myloader](https://centminmod.com/mydumper.html) to load data to the MySQL service.

3. From the [exported files of Dumpling](https://docs.pingcap.com/tidb/stable/dumpling-overview#format-of-exported-files), get the start position of MySQL sink from the metadata file:
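As a sketch of that step (the sample below mimics Dumpling's documented metadata layout, but treat the exact field names and spacing as assumptions):

```python
# Sketch: read the "Pos" value (used as the changefeed start position)
# from a Dumpling metadata file.
def parse_start_position(metadata_text):
    for line in metadata_text.splitlines():
        line = line.strip()
        if line.startswith("Pos:"):
            # Keep everything after the first colon, whitespace-trimmed.
            return line.split(":", 1)[1].strip()
    return None

sample = """\
Started dump at: 2020-11-10 10:40:19
SHOW MASTER STATUS:
    Log: tidb-binlog
    Pos: 420747102018863124
Finished dump at: 2020-11-10 10:40:20
"""

print(parse_start_position(sample))  # → 420747102018863124
```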

91 changes: 1 addition & 90 deletions tidb-lightning/monitor-tidb-lightning.md
@@ -22,13 +22,6 @@ pprof-port = 8289
...
```

and in `tikv-importer.toml`:

```toml
# Listening address of the status server.
status-server-address = '0.0.0.0:8286'
```

You need to configure Prometheus to make it discover the servers. For instance, you can directly add the server address to the `scrape_configs` section:

```yaml
scrape_configs:
  - job_name: 'tidb-lightning'
    static_configs:
      - targets: ['192.168.20.10:8289']
  - job_name: 'tikv-importer'
    static_configs:
      - targets: ['192.168.20.9:8286']
```

## Grafana dashboard
@@ -134,86 +124,7 @@ If any of the duration is too high, it indicates that the disk used by TiDB Lightning

## Monitoring metrics

This section explains the monitoring metrics of `tikv-importer` and `tidb-lightning`, if you need to monitor other metrics not covered by the default Grafana dashboard.

### `tikv-importer`

Metrics provided by `tikv-importer` are listed under the namespace `tikv_import_*`.

- **`tikv_import_rpc_duration`** (Histogram)

Bucketed histogram for the duration of an RPC action. Labels:

- **request**: what kind of RPC is executed
* `switch_mode`: switched a TiKV node to import/normal mode
* `open_engine`: opened an engine file
* `write_engine`: received data and written into an engine
* `close_engine`: closed an engine file
* `import_engine`: imported an engine file into the TiKV cluster
* `cleanup_engine`: deleted an engine file
* `compact_cluster`: explicitly compacted the TiKV cluster
* `upload`: uploaded an SST file
* `ingest`: ingested an SST file
* `compact`: explicitly compacted a TiKV node
- **result**: the execution result of the RPC
* `ok`
* `error`

- **`tikv_import_write_chunk_bytes`** (Histogram)

Bucketed histogram for the uncompressed size of a block of KV pairs received from TiDB Lightning.

- **`tikv_import_write_chunk_duration`** (Histogram)

Bucketed histogram for the time needed to receive a block of KV pairs from TiDB Lightning.

- **`tikv_import_upload_chunk_bytes`** (Histogram)

Bucketed histogram for the compressed size of a chunk of SST file uploaded to TiKV.

- **`tikv_import_upload_chunk_duration`** (Histogram)

Bucketed histogram for the time needed to upload a chunk of SST file to TiKV.

- **`tikv_import_range_delivery_duration`** (Histogram)

Bucketed histogram for the time needed to deliver a range of KV pairs into a `dispatch-job`.

- **`tikv_import_split_sst_duration`** (Histogram)

Bucketed histogram for the time needed to split off a range from the engine file into a single SST file.

- **`tikv_import_sst_delivery_duration`** (Histogram)

Bucketed histogram for the time needed to deliver an SST file from a `dispatch-job` to an `ImportSSTJob`.

- **`tikv_import_sst_recv_duration`** (Histogram)

Bucketed histogram for the time needed to receive an SST file from a `dispatch-job` in an `ImportSSTJob`.

- **`tikv_import_sst_upload_duration`** (Histogram)

Bucketed histogram for the time needed to upload an SST file from an `ImportSSTJob` to a TiKV node.

- **`tikv_import_sst_chunk_bytes`** (Histogram)

Bucketed histogram for the compressed size of the SST file uploaded to a TiKV node.

- **`tikv_import_sst_ingest_duration`** (Histogram)

Bucketed histogram for the time needed to ingest an SST file into TiKV.

- **`tikv_import_each_phase`** (Gauge)

Indicates the running phase. Possible values are 1, meaning running inside the phase, and 0, meaning outside the phase. Labels:

- **phase**: `prepare`/`import`

- **`tikv_import_wait_store_available_count`** (Counter)

Counts the number of times a TiKV node is found to have insufficient space when uploading SST files. Labels:

- **store_id**: The TiKV store ID.
This section explains the monitoring metrics of `tidb-lightning`, which you can use if you need to monitor metrics not covered by the default Grafana dashboard.

### `tidb-lightning`

1 change: 0 additions & 1 deletion tidb-lightning/tidb-lightning-command-line-full.md
@@ -23,7 +23,6 @@ You can configure the following parameters using `tidb-lightning`:
| `--backend <backend>` | Select an import mode. `local` refers to [physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode.md); `tidb` refers to [logical import mode](/tidb-lightning/tidb-lightning-logical-import-mode.md). | `tikv-importer.backend` |
| `--log-file <file>` | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` |
| `--status-addr <ip:port>` | Listening address of the TiDB Lightning server | `lightning.status-port` |
| `--importer <host:port>` | Address of TiKV Importer | `tikv-importer.addr` |
| `--pd-urls <host:port>` | PD endpoint address | `tidb.pd-addr` |
| `--tidb-host <host>` | TiDB server host | `tidb.host` |
| `--tidb-port <port>` | TiDB server port (default = 4000) | `tidb.port` |
4 changes: 0 additions & 4 deletions tidb-lightning/tidb-lightning-configuration.md
@@ -54,9 +54,6 @@ enable-diagnose-logs = false
# Each table is split into one "index engine" to store indices, and multiple
# "data engines" to store row data. These settings control the maximum
# concurrent number for each type of engines.
# These values affect the memory and disk usage of tikv-importer.
# The sum of these two values must not exceed the max-open-engines setting
# for tikv-importer.
index-concurrency = 2
table-concurrency = 6

@@ -462,7 +459,6 @@ log-progress = "5m"
| --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `local` |
| --log-file *file* | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
| --pd-urls *host:port* | PD endpoint address | `tidb.pd-addr` |
| --tidb-host *host* | TiDB server host | `tidb.host` |
| --tidb-port *port* | TiDB server port (default = 4000) | `tidb.port` |
40 changes: 1 addition & 39 deletions tidb-lightning/tidb-lightning-faq.md
@@ -24,29 +24,6 @@ For details about the permissions, see [Prerequisites for using TiDB Lightning](

If only one table has an error encountered, the rest will still be processed normally.

## How to properly restart TiDB Lightning?

If you are using Importer-backend, depending on the status of `tikv-importer`, the basic sequence of restarting TiDB Lightning is like this:

If `tikv-importer` is still running:

1. [Stop `tidb-lightning`](#how-to-stop-the-tidb-lightning-process).
2. Perform the intended modifications, such as fixing the source data, changing settings, replacing hardware etc.
3. If the modification previously has changed any table, [remove the corresponding checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-remove) too.
4. Start `tidb-lightning`.

If `tikv-importer` needs to be restarted:

1. [Stop `tidb-lightning`](#how-to-stop-the-tidb-lightning-process).
2. [Stop `tikv-importer`](#how-to-stop-the-tikv-importer-process).
3. Perform the intended modifications, such as fixing the source data, changing settings, replacing hardware etc.
4. Start `tikv-importer`.
5. Start `tidb-lightning` *and wait until the program fails with CHECKSUM error, if any*.
* Restarting `tikv-importer` would destroy all engine files still being written, but `tidb-lightning` did not know about it. As of v3.0 the simplest way is to let `tidb-lightning` go on and retry.
6. [Destroy the failed tables and checkpoints](/tidb-lightning/troubleshoot-tidb-lightning.md#checkpoint-for--has-invalid-status-error-code)
7. Start `tidb-lightning` again.

If you are using Local-backend or TiDB-backend, the operations are the same as those of using Importer-backend when the `tikv-importer` is still running.

## How to ensure the integrity of the imported data?

@@ -93,16 +70,6 @@ sql-mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
...
```

## Can one `tikv-importer` serve multiple `tidb-lightning` instances?

Yes, as long as every `tidb-lightning` instance operates on different tables.

## How to stop the `tikv-importer` process?

To stop the `tikv-importer` process, you can choose the corresponding operation according to your deployment method.

- For manual deployment: if `tikv-importer` is running in foreground, press <kbd>Ctrl</kbd>+<kbd>C</kbd> to exit. Otherwise, obtain the process ID using the `ps aux | grep tikv-importer` command and then terminate the process using the `kill ${PID}` command.

## How to stop the `tidb-lightning` process?

To stop the `tidb-lightning` process, you can choose the corresponding operation according to your deployment method.
@@ -122,11 +89,6 @@ With the default settings of 3 replicas, the space requirement of the target TiKV
- The space occupied by indices
- Space amplification in RocksDB

## Can TiKV Importer be restarted while TiDB Lightning is running?

No. TiKV Importer stores some information of engines in memory. If `tikv-importer` is restarted, `tidb-lightning` will be stopped due to lost connection. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) as those TiKV Importer-specific information is lost. You can restart TiDB Lightning afterwards.

See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb-lightning) for the correct sequence.

## How to completely destroy all intermediate data associated with TiDB Lightning?

Expand All @@ -140,7 +102,7 @@ See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb

If, for some reason, you cannot run this command, try manually deleting the file `/tmp/tidb_lightning_checkpoint.pb`.

2. If you are using Local-backend, delete the `sorted-kv-dir` directory in the configuration. If you are using Importer-backend, delete the entire `import` directory on the machine hosting `tikv-importer`.
2. If you are using Local-backend, delete the `sorted-kv-dir` directory in the configuration.

3. Delete all tables and databases created on the TiDB cluster, if needed.

1 change: 0 additions & 1 deletion tidb-lightning/tidb-lightning-physical-import-mode.md
@@ -69,7 +69,6 @@ It is recommended that you allocate CPU more than 32 cores and memory greater than

- TiDB Lightning >= v4.0.3.
- TiDB >= v4.0.0.
- If the target TiDB cluster is v3.x or earlier, you need to use Importer-backend to complete the data import. In this mode, `tidb-lightning` needs to send the parsed key-value pairs to `tikv-importer` via gRPC, and `tikv-importer` will complete the data import.

### Limitations
