titan doc update for release 7.6.0 #15986

Merged · 34 commits · Jan 25, 2024
Changes from 1 commit
64ecfce
titan doc update for release 7.6.0
tonyxuqqi Jan 5, 2024
9e58e74
lint issue
Jan 5, 2024
19eb7ef
Apply suggestions from code review
hfxsd Jan 8, 2024
9546630
Apply suggestions from code review
hfxsd Jan 8, 2024
b6dacd0
Update tikv-configuration-file.md
hfxsd Jan 8, 2024
481b007
Apply suggestions from code review
hfxsd Jan 9, 2024
c43a35c
change the default value of blob-file-compression to zstd
hfxsd Jan 9, 2024
3ca3d45
Update tikv-configuration-file.md
hfxsd Jan 9, 2024
a47c5bf
Update tikv-configuration-file.md
hfxsd Jan 9, 2024
02d78bb
Apply suggestions from code review
hfxsd Jan 16, 2024
331bfe1
polish titan doc
tonyxuqqi Jan 17, 2024
8bb32cb
Merge branch 'titan_7.6' of https://github.com/tonyxuqqi/docs into ti…
tonyxuqqi Jan 17, 2024
b86de77
address comments
tonyxuqqi Jan 17, 2024
adb3363
update gc thread count
tonyxuqqi Jan 22, 2024
d8b48fd
update num-threads
tonyxuqqi Jan 22, 2024
cc5fecf
titan: update titan doc for v7.6.0 (enable titan by default)
benmaoer Jan 23, 2024
17c7f43
Merge pull request #1 from benmaoer/15986-titan-doc-updates
tonyxuqqi Jan 23, 2024
3937b19
Merge remote-tracking branch 'upstream/master' into pr/15986
hfxsd Jan 24, 2024
8b8a477
synced cn changes
hfxsd Jan 24, 2024
8bb38a4
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
b6554f7
Update titan-configuration.md
hfxsd Jan 24, 2024
c93019e
Update titan-configuration.md
hfxsd Jan 24, 2024
4b9baf6
Update storage-engine/titan-overview.md
hfxsd Jan 24, 2024
a1bbf0a
Apply suggestions from code review
hfxsd Jan 24, 2024
51e07da
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
4c89679
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
5664bc7
add min blob size link
hfxsd Jan 24, 2024
44f95a4
Apply suggestions from code review
hfxsd Jan 24, 2024
5c6ae2c
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
92ff46a
Apply suggestions from code review
hfxsd Jan 24, 2024
a6ba25f
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
204afcd
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
aa0a9a4
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
665c9f9
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
Apply suggestions from code review
hfxsd authored Jan 8, 2024
commit 19eb7efd78de7ef4bc75f6446f2e3487678a962c
18 changes: 9 additions & 9 deletions storage-engine/titan-configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,13 +39,13 @@ Titan is compatible with RocksDB, so you can directly enable Titan on the existi
enabled = true
```

After Titan is enabled, the existing data stored in RocksDB is not immediately moved to the Titan engine. As new data is written to the TiKV foreground and RocksDB performs compaction, the values are progressively separated from keys and written to Titan. It's same for the data imported from snapshot restore, PiTR restore or TiDB lightning that initially it's in RocksDB format and converted to Titan during compaction. You can view the **TiKV Details** -> **Titan kv** -> **blob file size** panel to confirm the size of the data stored in Titan.
After Titan is enabled, the existing data stored in RocksDB is not immediately moved to the Titan engine. As new data is written to the TiKV foreground and RocksDB performs compaction, the values are progressively separated from keys and written to Titan. Similarly, SST files imported by existing data migration, incremental data migration, or TiDB Lightning are in RocksDB format, and the data is not imported directly into Titan. As compaction proceeds, the large values in the processed SSTs are separated into Titan. You can view the **TiKV Details** -> **Titan kv** -> **blob file size** panel to confirm the size of the data stored in Titan.

If you want to speed up the writing process, compact data of the whole TiKV cluster manually using tikv-ctl. For details, see [manual compaction](/tikv-control.md#compact-data-of-the-whole-tikv-cluster-manually). Because RocksDB has the Block cache and the access pattern in compaction is sequential read and thus the block cache hit rate can be pretty high. In our test, a 670 GiB TiKV data can be converted to Titan in less than 1 hour.
If you want to speed up the writing process, you can manually compact the data of the whole TiKV cluster using tikv-ctl. For details, see [manual compaction](/tikv-control.md#compact-data-of-the-whole-tikv-cluster-manually). Because RocksDB has a block cache and the access pattern during compaction is sequential reads, the block cache hit rate can be very high. In the test, by using tikv-ctl, 670 GiB of TiKV data can be converted to Titan in less than one hour.
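As a concrete sketch, the manual compaction can be triggered with the same tikv-ctl command that this document later uses for the reverse conversion; `<PD_ADDR>` is a placeholder for your PD endpoint:

```bash
# Compact the whole cluster down to the bottommost level so that
# large values are separated into Titan blob files during compaction.
tikv-ctl --pd <PD_ADDR> compact-cluster --bottommost force
```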

> **Note:**
>
> Starting from TiDB 7.6.0, the newly created empty cluster will by default enable Titan. And existing clusters' upgrade to TiDB 7.6.0 would keep the original configuration--- if the titan is not explicityly enabled, then it would still use RocksDB.
> Starting from v7.6.0, Titan is enabled by default on newly created clusters. Existing clusters upgraded from earlier versions to v7.6.0 keep the original configuration, which means that if Titan is not explicitly enabled, the cluster still uses RocksDB.

> **Warning:**
>
Expand All @@ -68,21 +68,21 @@ To adjust Titan-related parameters using TiUP, refer to [Modify the configuratio

+ Value size threshold.

When the size of the value written to the foreground is smaller than the threshold, this value is stored in RocksDB; otherwise, this value is stored in the blob file of Titan. Based on the distribution of value sizes, if you increase the threshold, more values are stored in RocksDB and TiKV performs better in reading small values. If you decrease the threshold, more values go to Titan, which further reduces RocksDB compactions. In our [test](/storage-engine/titan-overview.md#min-blob-sizes-performance-implications), 1 KB is a balanced threshold which has far better write throughput with about 10% scan throughput regression compared with RocksDB.
When the size of the value written to the foreground is smaller than the threshold, this value is stored in RocksDB; otherwise, this value is stored in the blob file of Titan. Based on the distribution of value sizes, if you increase the threshold, more values are stored in RocksDB and TiKV performs better in reading small values. If you decrease the threshold, more values go to Titan, which further reduces RocksDB compactions. In the [test](/storage-engine/titan-overview.md#min-blob-sizes-performance-implications), 1 KB is a balanced threshold that achieves far better write throughput with only about a 10% scan throughput regression compared with RocksDB.

```toml
[rocksdb.defaultcf.titan]
min-blob-size = "1KB"
```

+ The algorithm used for compressing values in Titan, which takes value as the unit. Starting from TiDB 7.6.0, the default compression is zstd.
+ The algorithm used for compressing values in Titan, which takes value as the unit. Starting from TiDB v7.6.0, the default compression is `zstd`.

```toml
[rocksdb.defaultcf.titan]
blob-file-compression = "zstd"
```

+ By default, zstd-dict-size is 0KB , which means Titan's compression is based on single value. But RocksDB compression is based on block (32 KB size by default),So when titan value's average size is less than 32 KB, Titan's comression ratio is smaller than RocksdDB Taking json as an example, Titan store size can be 30% ~ 50% bigger than RocksDB. The actual compression ratio depends on the value content and the similiarity among different values. A user can set zstd-dict-size (e.g. 16KB) to enable zstd dictionary compression to boost the compression ratio. Though the zstd dictionary compression can achieve similar compression ratio of RocksDB, it does leads to 10% throughput regression in a typical read-write workload.
+ By default, `zstd-dict-size` is `0KB`, which means Titan's compression is based on single values, whereas RocksDB compression is based on blocks (32 KB by default). When the average size of Titan values is less than 32 KB, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the Titan store size can be 30% to 50% bigger than that of RocksDB. The actual compression ratio depends on the value content and the similarity among different values. You can set `zstd-dict-size` (for example, to `16KB`) to enable zstd dictionary compression and increase the compression ratio. Although zstd dictionary compression can achieve a compression ratio similar to that of RocksDB, it can lead to about 10% throughput regression in a typical read-write workload.
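As a sketch of the setting described above (the `16KB` value is the illustrative size from the text, not a tuned recommendation), zstd dictionary compression is enabled like this:

```toml
[rocksdb.defaultcf.titan]
blob-file-compression = "zstd"
# 0KB (the default) compresses each value independently;
# a non-zero size enables zstd dictionary compression.
zstd-dict-size = "16KB"
```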

```toml
[rocksdb.defaultcf.titan]
Expand Down Expand Up @@ -129,7 +129,7 @@ To disable Titan, you can configure the `rocksdb.defaultcf.titan.blob-run-mode`
- When the option is set to `read-only`, all newly written values are written into RocksDB, regardless of the value size.
- When the option is set to `fallback`, all newly written values are written into RocksDB, regardless of the value size. Also, all compacted values stored in the Titan blob file are automatically moved back to RocksDB.
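A minimal sketch of switching the run mode as part of disabling Titan (the `fallback` value comes from the option list above):

```toml
[rocksdb.defaultcf.titan]
# "fallback" writes all new values to RocksDB and also moves
# compacted Titan values back to RocksDB.
blob-run-mode = "fallback"
```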

To fully disable Titan for all existing and future data, you can follow these steps. Note that in general you can skip step 2 as it would greatly impact online traffic performance. And in fact even without step 2, the data convertion takes extra IO and CPU and thus performance degrade (some times as large as 50%) is still observed when TiKV's IO or CPU resource reaches near limit.
To fully disable Titan for all existing and future data, you can follow these steps. Note that in general you can skip Step 2 because it can greatly impact online traffic performance. In fact, even without Step 2, the data conversion consumes extra I/O and CPU, so performance degradation (sometimes as much as 50%) is still observed when TiKV's I/O or CPU resources are near their limit.

1. Update the configuration of the TiKV nodes you wish to disable Titan for. You can update configuration in two methods:

Expand All @@ -147,7 +147,7 @@ To fully disable Titan for all existing and future data, you can follow these st
> When `discardable-ratio=1`, TiKV recycles a Titan blob file only when all of its data has been moved to RocksDB. This means that these Titan blob files are not deleted before the conversion completes. Therefore, if a TiKV node does not have sufficient disk space to store both the Titan and RocksDB data, this parameter should keep the default value instead of `1.0`. However, if the disk space is big enough, `discardable-ratio = 1.0` can help reduce blob file GC and disk I/O.
>

2. [Optional] Perform a full compaction using tikv-ctl. This process will consume large amount of I/O and CPU resources.
2. [Optional] Perform a full compaction using tikv-ctl. This process will consume a large amount of I/O and CPU resources.

```bash
tikv-ctl --pd <PD_ADDR> compact-cluster --bottommost force
Expand All @@ -164,7 +164,7 @@ To fully disable Titan for all existing and future data, you can follow these st

### Data conversion speed from Titan to RocksDB

Because Blob cache only helps when a value is accessed more than once, in compaction scenario, it's likely not useful. As a result, the data convertion from Titan to RocksDB can be 10x slower than RocksDB to Titan. In our test, a 800 GiB TiKV takes 12 hour to completely convert its data to RocksDB.
Because the values in Titan blob files are not contiguous, and Titan's cache works at the value level, the blob cache does not help during compaction. As a result, the conversion speed from Titan to RocksDB is an order of magnitude slower than that from RocksDB to Titan. In the test, it takes 12 hours to convert 800 GiB of Titan data on a TiKV node to RocksDB by using tikv-ctl to perform a full compaction.

## Level Merge (experimental)

Expand Down
16 changes: 8 additions & 8 deletions storage-engine/titan-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ The prerequisites for enabling Titan are as follows:
- No range query will be performed or you do not need a high performance of range query. Because the data stored in Titan is not well-ordered, its performance of range query is poorer than that of RocksDB, especially for the query of a large range. According to PingCAP's internal test, Titan's range query performance is 40% lower to a few times lower than that of RocksDB.
- Sufficient disk space (consider reserving a space twice of the RocksDB disk consumption with the same data volume). This is because Titan reduces write amplification at the cost of disk space. In addition, Titan compresses values one by one, and its compression rate is lower than that of RocksDB. RocksDB compresses blocks one by one. Therefore, Titan consumes more storage space than RocksDB, which is expected and normal. In some situations, Titan's storage consumption can be twice that of RocksDB.

In TiDB 7.6.0, a few optimizations are made to Titan and thus it's enabled by default for newly created clusters. Because small KV data would still be stored in RocksDB even Titan is enabled. So there's no harm to enable the Titan in the configuration.
Starting from v7.6.0, Titan is improved and is enabled by default for newly created clusters. Because small KV data is still stored in RocksDB even if Titan is enabled, it is safe to enable Titan in the configuration.

If you want to improve the performance of Titan, see the blog post [Titan: A RocksDB Plugin to Reduce Write Amplification](https://pingcap.com/blog/titan-storage-engine-design-and-implementation/).

Expand Down Expand Up @@ -127,13 +127,13 @@ Range Merge is an optimized approach of GC based on Level Merge. However, the bo

Therefore, the Range Merge operation is needed to keep the number of sorted runs within a certain level. At the time of OnCompactionComplete, Titan counts the number of sorted runs in a range. If the number is large, Titan marks the corresponding blob file as ToMerge and rewrites it in the next compaction.

### Scale-out and Scale-in
### Scale out and scale in

For backward compatibility, when TiKV sends snapshot to another TiKV in either scale-out or scale-in operation, the snapshot itself is in RocksDB format. And therefore the initial data format for a newly create TiKv node is RocksDB, which means it could have smaller store size meanwhile compaction's write amplification is larger. And later during the compaction, the RocksDB data would be gradually converted to Titan.
For backward compatibility, TiKV snapshots are still in the RocksDB format during scaling out and scaling in. Because newly scaled nodes start with RocksDB-format data only, they carry the characteristics of RocksDB, such as a higher compression rate than the old TiKV nodes, a smaller store size, and relatively larger write amplification during compaction. These SST files in RocksDB format are gradually converted to the Titan format as compaction proceeds.

### min-blob-size's performance implications
### Performance implications of `min-blob-size`

When Titan is enabled, if a value size is no less than `min-blob-size`, it would be stored in Titan. Otherwise, it's stored in RocksDB. Either too big or too small `min-blob-sizesize` would lead to poor performance in some workloads. Below are our test results for different `min-blob-size`'s performance under a few workloads.
The `min-blob-size` value determines whether a value is stored in Titan. If a value is greater than or equal to `min-blob-size`, it is stored in Titan; otherwise, it is stored in RocksDB's native format. Setting `min-blob-size` too small or too large can cause performance degradation. The following are the test results for the performance of different `min-blob-size` values under a few workloads.

| Value size(Bytes) | pointget | pointget(titan)| scan100 | scan100(titan)| scan10000 | scan10000(titan)| update | update(titan) |
Show resolved Hide resolved
| ---------------- | ---------| -------------- | --------| ------------- | --------- | --------------- | ------ | ------------ |
Expand All @@ -146,7 +146,7 @@ When Titan is enabled, if a value size is no less than `min-blob-size`, it would
|1024K| 1165 | 1165 | 11.7 |11.7 | NA |NA |32.3 | 233 |

> **Note:**
> >
> > scan100 means scan 100 records and scan10000 means scan 10000 records.
>
> `scan100` means to scan 100 records and `scan10000` means to scan 10000 records.

When value size is 2 KB, Titan's performance is better in all workloads above. When the size is 1 KB, Titan lags only in scan10000 workload by 15%, but gains 50% in update. Therefore, the default value of `min-blob-size` is 1 KB. User can choose the proper `min-blob-size` according to the workloads.
In this table, when the value size is 2 KB, Titan performs best in all the preceding workloads. When the value size is 1 KB, Titan lags behind only in the `scan10000` workload by 15%, but gains 50% in `update`. Therefore, `1KB` is an appropriate default value for `min-blob-size`. You can choose a proper `min-blob-size` value according to your workloads.
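For example, based on the preceding table, a workload dominated by large scans could raise the threshold; this is an illustrative sketch, not an official recommendation:

```toml
[rocksdb.defaultcf.titan]
# With a 2KB threshold, 1 KB values stay in RocksDB, avoiding the 15%
# scan10000 regression observed at 1 KB, at the cost of the update gain.
min-blob-size = "2KB"
```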
8 changes: 4 additions & 4 deletions tikv-configuration-file.md
Original file line number Diff line number Diff line change
Expand Up @@ -1316,7 +1316,7 @@ Configuration items related to Titan.
### `enabled`

+ Enables or disables Titan.
+ Default value: In TiDB 7.6.0 or newer version, the default value is `true` in newly created cluster. In other cases, it's `false`. So if you upgrade to TiDB 7.6.0 from older TiBD version, it would keep the original setting if it's set explictly or be false if it's not set.
+ Default value: for v7.5.0 and earlier versions, the default value is `false`. Starting from v7.6.0, the default value is `true` only for newly created clusters. Existing clusters upgraded to v7.6.0 or a later version retain the original configuration.
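Because the default depends on how the cluster was created, you can pin the behavior explicitly so that upgrades do not change the engine; a minimal sketch:

```toml
[rocksdb.defaultcf.titan]
enabled = true  # set explicitly to keep the intended engine across upgrades
```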

### `dirname`

Expand Down Expand Up @@ -1616,17 +1616,17 @@ Configuration items related to `rocksdb.defaultcf.titan`.

### `zstd-dict-size`

+ The zstd compression dictionary size. By default it's `0KB` which means zstd compression is based on single value, while RocksDB's comression unit is block (32KB size by default). So when zstd dictionary compression is off and the average value is less than 32KB, Titan's compression ratio is smaller than RocksDB. Using Json data as an example, Titan store size could be 30% to 50% more than RocksDB size. Users could set zstd-dict-size (e.g. 16KB) to enable zstd dictionary compression which can achieve similar compression ratio of RocksDB. Though, zstd dictionary compression can lead to 10% performance regression.
+ The zstd compression dictionary size. The default value is `"0KB"`, which means Titan's compression is based on single values, whereas RocksDB compression is based on blocks (32 KB by default). When the average size of Titan values is less than 32 KB, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the Titan store size can be 30% to 50% bigger than that of RocksDB. The actual compression ratio depends on the value content and the similarity among different values. You can set `zstd-dict-size` (for example, to `16KB`) to enable zstd dictionary compression and increase the compression ratio, so that the actual store size can be close to or even lower than that of RocksDB. However, zstd dictionary compression can lead to about 10% throughput regression in a typical read-write workload.
+
+ Default value: "0KB"`
+ Default value: `"0KB"`
+ Unit: KB|MB|GB

### `blob-cache-size`

+ The cache size of a Blob file
+ Default value: `"0GB"`
+ Minimum value: `0`
+ Recommended value: After TiKv runs for a while, set the RocksDB block cache (`storage.block-cache.capacity`) to have 95%+ block cache hit rate. Then the `blob-cache-size` is set with `total memory size * 50% - block cache size`. Block cache hit rate is more important than blob cache hit rate. If `storage.block-cache.capacity` is too small, the overall performance would not be good due to low block cache hit rate.
+ Recommended value: after the database has been running stably for a while, set the RocksDB block cache (`storage.block-cache.capacity`) based on monitoring so that the block cache hit rate is just above 95%, and set `blob-cache-size` to `total memory size * 50% - block cache size`. This ensures that the block cache is large enough to cache the entire RocksDB, while the blob cache is kept as large as possible. However, do not set the blob cache too large; otherwise, the block cache hit rate will drop significantly.
+ Unit: KB|MB|GB
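As a worked example of the sizing rule above (the numbers are hypothetical, not a recommendation): on a TiKV node with 32 GiB of memory where an 8 GiB block cache already yields a hit rate above 95%, the formula gives `32 GiB * 50% - 8 GiB = 8 GiB` for the blob cache:

```toml
# Hypothetical sizing for a 32 GiB node; derive your own values from monitoring.
[storage.block-cache]
capacity = "8GB"

[rocksdb.defaultcf.titan]
blob-cache-size = "8GB"  # total memory * 50% - block cache size
```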

### `min-gc-batch-size`
Expand Down