titan doc update for release 7.6.0 #15986

Merged
merged 34 commits into from
Jan 25, 2024
Changes from 1 commit
64ecfce
titan doc update for release 7.6.0
tonyxuqqi Jan 5, 2024
9e58e74
lint issue
Jan 5, 2024
19eb7ef
Apply suggestions from code review
hfxsd Jan 8, 2024
9546630
Apply suggestions from code review
hfxsd Jan 8, 2024
b6dacd0
Update tikv-configuration-file.md
hfxsd Jan 8, 2024
481b007
Apply suggestions from code review
hfxsd Jan 9, 2024
c43a35c
change the default value of blob-file-compression to zstd
hfxsd Jan 9, 2024
3ca3d45
Update tikv-configuration-file.md
hfxsd Jan 9, 2024
a47c5bf
Update tikv-configuration-file.md
hfxsd Jan 9, 2024
02d78bb
Apply suggestions from code review
hfxsd Jan 16, 2024
331bfe1
polish titan doc
tonyxuqqi Jan 17, 2024
8bb32cb
Merge branch 'titan_7.6' of https://github.com/tonyxuqqi/docs into ti…
tonyxuqqi Jan 17, 2024
b86de77
address comments
tonyxuqqi Jan 17, 2024
adb3363
update gc thread count
tonyxuqqi Jan 22, 2024
d8b48fd
update num-threads
tonyxuqqi Jan 22, 2024
cc5fecf
titan: update titan doc for v7.6.0 (enable titan by default)
benmaoer Jan 23, 2024
17c7f43
Merge pull request #1 from benmaoer/15986-titan-doc-updates
tonyxuqqi Jan 23, 2024
3937b19
Merge remote-tracking branch 'upstream/master' into pr/15986
hfxsd Jan 24, 2024
8b8a477
synced cn changes
hfxsd Jan 24, 2024
8bb38a4
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
b6554f7
Update titan-configuration.md
hfxsd Jan 24, 2024
c93019e
Update titan-configuration.md
hfxsd Jan 24, 2024
4b9baf6
Update storage-engine/titan-overview.md
hfxsd Jan 24, 2024
a1bbf0a
Apply suggestions from code review
hfxsd Jan 24, 2024
51e07da
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
4c89679
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
5664bc7
add min blob size link
hfxsd Jan 24, 2024
44f95a4
Apply suggestions from code review
hfxsd Jan 24, 2024
5c6ae2c
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
92ff46a
Apply suggestions from code review
hfxsd Jan 24, 2024
a6ba25f
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
204afcd
Update storage-engine/titan-configuration.md
hfxsd Jan 24, 2024
aa0a9a4
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
665c9f9
Update tikv-configuration-file.md
hfxsd Jan 24, 2024
Apply suggestions from code review
hfxsd authored Jan 9, 2024
commit 481b0071c152769d4e83ed0ced5a58d337444c06
7 changes: 3 additions & 4 deletions storage-engine/titan-configuration.md
@@ -45,7 +45,7 @@ If you want to speed up the writing process, compact data of the whole TiKV clus

> **Note:**
>
> Starting from v7.6.0, Titan is enabled by default on the newly created cluster. Existing clusters that are upgraded from earlier versions to v7.6.0 keep the original configuration, which means that if Titan is not explicitly enabled, it still uses RocksDB.
> Starting from v7.6.0, Titan is enabled by default on the newly created clusters. Existing clusters that are upgraded from earlier versions to v7.6.0 retain the original configuration, which means that if Titan is not explicitly enabled, it still uses RocksDB.

> **Warning:**
>
@@ -82,7 +82,7 @@ To adjust Titan-related parameters using TiUP, refer to [Modify the configuratio
blob-file-compression = "zstd"
```

+ By default, `zstd-dict-size` is `0KB`, which means Titan's compression is based on single values. But RocksDB compression is based on blocks (`32KB` by default). When the average size of Titan values is less than `32KB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the Titan store size can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on the value content and the similarity among different values. You can set `zstd-dict-size` (for example, set it to `16KB`) to enable zstd dictionary compression to increase the compression ratio. Although zstd dictionary compression can achieve a compression ratio similar to that of RocksDB, it can lead to about 10% throughput regression in a typical read-write workload.
+ By default, `zstd-dict-size` is `0KB`, which means Titan's compression is based on single values. But RocksDB compression is based on blocks (`32KB` by default). When the average size of Titan values is less than `32KB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the Titan store size can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on the value content and the similarity among different values. You can set `zstd-dict-size` (for example, set it to `16KB`) to enable the zstd dictionary compression to increase the compression ratio. Although the zstd dictionary compression can achieve a compression ratio similar to that of RocksDB, it can lead to about 10% throughput regression in a typical read-write workload.

```toml
[rocksdb.defaultcf.titan]
@@ -144,8 +144,7 @@ To fully disable Titan for all existing and future data, you can follow these st

> **Note:**
>
> When `discardable-ratio = 1.0`, TiKV recycles a Titan blob file only after all of its data has been moved to RocksDB. This means that before the conversion completes, these Titan blob files are not deleted. Therefore, if a TiKV node does not have sufficient disk space to store both Titan and RocksDB data, keep the default value instead of `1.0`. However, if the disk space is large enough, `discardable-ratio = 1.0` can help reduce blob file GC and disk I/O.
>
> Use the default value `0.5` for [`discardable-ratio`](/tikv-configuration-file.md#discardable-ratio) when there is not enough disk space to hold both Titan and RocksDB data. In general, it is recommended to use the default value if the free disk space is less than 50%. This is because when `discardable-ratio = 1.0`, the RocksDB data keeps growing. At the same time, the recovery of Titan's original blob file requires all the data in that file to be migrated to RocksDB, which is a slow process. If the disk size is big enough, setting `discardable-ratio = 1.0` can reduce the GC of the blob file itself during compaction, which saves bandwidth.

2. [Optional] Perform a full compaction using tikv-ctl. This process will consume a large amount of I/O and CPU resources.
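
The `discardable-ratio` guidance in the note above can be sketched as a configuration fragment. This is an illustration only: it assumes the `[rocksdb.defaultcf.titan]` section shown earlier in this file, and the 50% free-space threshold in the comment is the rule of thumb from the note, not a hard limit.

```toml
[rocksdb.defaultcf.titan]
# Ample free disk space: reclaim a blob file only after all of its data
# has been moved to RocksDB, which avoids blob file GC during compaction.
discardable-ratio = 1.0

# Limited free disk space (for example, less than 50% free): keep the
# default value instead, so that blob files are reclaimed sooner.
# discardable-ratio = 0.5
```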

10 changes: 5 additions & 5 deletions storage-engine/titan-overview.md
@@ -31,7 +31,7 @@ The prerequisites for enabling Titan are as follows:
- No range query will be performed or you do not need high range query performance. Because the data stored in Titan is not well-ordered, its range query performance is poorer than that of RocksDB, especially for queries over a large range. According to PingCAP's internal tests, Titan's range query performance is 40% to a few times lower than that of RocksDB.
- Sufficient disk space (consider reserving twice the disk space that RocksDB consumes for the same data volume). This is because Titan reduces write amplification at the cost of disk space. In addition, Titan compresses values one by one, whereas RocksDB compresses blocks one by one, so Titan's compression rate is lower than that of RocksDB. Therefore, Titan consumes more storage space than RocksDB, which is expected and normal. In some situations, Titan's storage consumption can be twice that of RocksDB.

Starting from v7.6.0, Titan is improved and is enabled by default for newly created clusters. Because small KV data is still stored in RocksDB even when Titan is enabled, you can enable Titan in the configuration.
Starting from v7.6.0, Titan is improved and is enabled by default for newly created clusters. Because small KV data is still stored in RocksDB even when Titan is enabled, you can still enable Titan in the configuration.
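
As a minimal sketch of what enabling Titan in the configuration can look like, assuming the `[rocksdb.titan]` section of the TiKV configuration file (the section name is not shown in this hunk, so check it against the TiKV configuration reference for your version):

```toml
[rocksdb.titan]
# Turn on Titan explicitly, for example on a cluster that was upgraded
# from a version earlier than v7.6.0 and therefore kept its old setting.
enabled = true
```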

If you want to improve the performance of Titan, see the blog post [Titan: A RocksDB Plugin to Reduce Write Amplification](https://pingcap.com/blog/titan-storage-engine-design-and-implementation/).

@@ -129,13 +129,13 @@ Therefore, the Range Merge operation is needed to keep the number of sorted runs

### Scale out and scale in

For backward compatibility, the TiKV snapshots are still in the RocksDB format during scaling up and scaling down. Because the scaled nodes are all from RocksDB at the beginning, they carry the characteristics of RocksDB, such as higher compression rate than the old TiKV nodes, smaller store size, and relatively larger write amplification in compaction. These SST files in RocksDB format will be gradually converted to Titan format after compaction.
For backward compatibility, the TiKV snapshots are still in the RocksDB format during scaling. Because the scaled nodes are all from RocksDB at the beginning, they carry the characteristics of RocksDB, such as higher compression rate than the old TiKV nodes, smaller store size, and relatively larger write amplification in compaction. These SST files in RocksDB format will be gradually converted to Titan format after compaction.

### Performance implications of `min-blob-size`

The `min-blob-size` is the basis for whether a value is stored in Titan or not. If the value is greater than or equal to `min-blob-size`, it is stored in Titan. Otherwise, it is stored in RocksDB's native format. If `min-blob-size` is set too small or too large, performance can degrade. The following test results show performance at different value sizes under a few workloads.
The value of `min-blob-size` decides whether a value is stored in Titan or in RocksDB. If the value is greater than or equal to `min-blob-size`, it is stored in Titan. Otherwise, it is stored in RocksDB's native format. If `min-blob-size` is set too small or too large, performance can degrade. The following test results show performance at different value sizes under a few workloads.

| Value size (Bytes) | pointget | pointget (Titan)| scan100 | scan100 (Titan)| scan10000 | scan10000 (Titan)| `UPDATE` | `UPDATE` (Titan) |
| Value size (Bytes) | `Point_Get` | `Point_Get` (Titan)| scan100 | scan100 (Titan)| scan10000 | scan10000 (Titan)| `UPDATE` | `UPDATE` (Titan) |
| ---------------- | ---------| -------------- | --------| ------------- | --------- | --------------- | ------ | ------------ |
| 256 | 156198 | 153487 | 15961 | 6489 | 269 | 119 | 30047 | 35181 |
| 500 | 161142 | 160234 | 16131 | 9267 | 223 | 99.1 | 24162 | 33113 |
@@ -149,4 +149,4 @@ The `min-blob-size` is the basis for whether a value is stored in Titan or not.
>
> `scan100` means to scan 100 records and `scan10000` means to scan 10000 records.

In this table, when the value size is 2 KB, Titan performance is the best in all workloads. When the size is 1 KB, Titan lags only in the `scan10000` workload, by 15%, but gains 50% in `UPDATE`. Therefore, the appropriate default value of `min-blob-size` is `1 KB`. You can choose a proper value for `min-blob-size` according to your workloads.
In this table, when the value size is `2KB`, Titan performance is the best in all workloads. When the size is `1KB`, Titan lags only in the `scan10000` workload, by 15%, but gains 50% in `UPDATE`. Therefore, the appropriate default value of `min-blob-size` is `1KB`. You can choose a proper value for `min-blob-size` according to your workloads.
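
To apply a chosen threshold, `min-blob-size` can be set in the Titan section of the default column family. The fragment below is a sketch that assumes the `[rocksdb.defaultcf.titan]` section used in the Titan configuration document; `1KB` simply mirrors the figure discussed above and is a starting point, not a recommendation for every workload.

```toml
[rocksdb.defaultcf.titan]
# Values of at least this size are stored in Titan blob files;
# smaller values stay in RocksDB's native format.
min-blob-size = "1KB"
```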