TSDB: sorted ingest (#247)
This adds an `ingest_order:sorted` option to the tsdb track. We believe
that in production data is quite likely to be ingested in mostly-sorted
order, but we were testing with unsorted data. This provides the option
to test with fully-sorted data. That's not *exactly* what we expect in
production, but it's much closer than the other option. In a future
follow-up we'll add the option to test with mostly-sorted data.
nik9000 authored Mar 3, 2022
1 parent 17d8dbd commit 3f65741
Showing 2 changed files with 47 additions and 12 deletions.
20 changes: 20 additions & 0 deletions tsdb/README.md
@@ -111,6 +111,25 @@ Once that finishes you need to generate `documents-1k.json` for easy testing:
head -n 1000 documents.json > documents-1k.json
```

Now you'll need to make the `-sorted` variant. First install https://github.com/winebarrel/jlsort .
Then:
```
mkdir tmp
TMPDIR=tmp ~/Downloads/jlsort/target/release/jlsort -k '@timestamp' documents.json > documents-sorted.json
rm -rf tmp
head -n 1000 documents-sorted.json > documents-sorted-1k.json
```
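
Before compressing, it may be worth confirming that the output really is ordered. The following is a minimal sketch, not part of the track: `is_sorted_by_timestamp` is a hypothetical helper, and it assumes every line is a JSON object with a top-level `@timestamp` whose values compare chronologically as plain strings (for example, ISO 8601 timestamps in one uniform format).

```python
import json
from typing import Iterable


def is_sorted_by_timestamp(lines: Iterable[str], key: str = "@timestamp") -> bool:
    """Return True if the JSON-lines records are in non-decreasing order of `key`.

    Assumption: values of `key` compare correctly with `<` (e.g. uniform
    ISO 8601 strings). This is an illustrative check, not Rally or jlsort code.
    """
    previous = None
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        current = json.loads(line)[key]
        if previous is not None and current < previous:
            return False  # found a record older than its predecessor
        previous = current
    return True
```

Passing an open file object, such as `open("documents-sorted-1k.json")`, streams the file line by line, so the check does not need to hold the whole corpus in memory.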

Now zip everything up:
```
pbzip2 documents-1k.json
pbzip2 documents-sorted-1k.json
pbzip2 documents.json
pbzip2 documents-sorted.json
```

Now upload all of that to the AWS location from `track.json`.
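
Each `documents` entry in `track.json` records a `document-count`, `compressed-bytes` and `uncompressed-bytes` for its `source-file`. One way those numbers could be derived from a compressed file is sketched below. This is an assumption about a workable approach, not necessarily how the values in this commit were produced, and `corpus_document_entry` is a hypothetical helper. It relies on Python's `bz2` module reading the multi-stream files that `pbzip2` writes.

```python
import bz2
import os


def corpus_document_entry(compressed_path: str) -> dict:
    """Build a track.json-style `documents` entry for one .bz2 corpus file.

    Illustrative only: counts one document per line and sums the
    decompressed line lengths while streaming through the archive.
    """
    document_count = 0
    uncompressed_bytes = 0
    with bz2.open(compressed_path, "rb") as f:
        for line in f:
            document_count += 1
            uncompressed_bytes += len(line)
    return {
        "source-file": os.path.basename(compressed_path),
        "document-count": document_count,
        "compressed-bytes": os.path.getsize(compressed_path),
        "uncompressed-bytes": uncompressed_bytes,
    }
```

Decompressing the full corpus this way is slow, so in practice it may be simpler to note the line count and size of the `.json` file before running `pbzip2`.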

### Parameters

This track allows to overwrite the following parameters using `--track-params`:
@@ -124,6 +143,7 @@ This track allows to overwrite the following parameters using `--track-params`:
* `source_enabled` (default: true): A boolean defining whether the `_source` field is stored in the index.
* `index_mode` (default: time_series): Whether to make a standard index (`standard`) or time series index (`time_series`)
* `codec` (default: default): The codec to use compressing the index. `default` uses more space and less cpu. `best_compression` uses less space and more cpu.
* `ingest_order` (default: jumbled): Should the data be loaded in `sorted` order or a more `jumbled`, mostly random order.

### License

39 changes: 27 additions & 12 deletions tsdb/track.json
@@ -10,18 +10,33 @@
}
],
"corpora": [
-{
-"name": "tsdb",
-"base-url": "https://benchmarks-elasticsearch-org.s3.us-west-2.amazonaws.com/corpora/tsdb/",
-"documents": [
-{
-"source-file": "documents.json.bz2",
-"document-count": 122613113,
-"compressed-bytes": 11502047679,
-"uncompressed-bytes": 138721940450
-}
-]
-}
+{%- if ingest_order is defined and ingest_order == "sorted" %}
+{
+"name": "tsdb",
+"base-url": "https://benchmarks-elasticsearch-org.s3.us-west-2.amazonaws.com/corpora/tsdb/",
+"documents": [
+{
+"source-file": "documents-sorted.json.bz2",
+"document-count": 122613113,
+"compressed-bytes": 10890215370,
+"uncompressed-bytes": 138721940450
+}
+]
+}
+{%- else %}
+{
+"name": "tsdb",
+"base-url": "https://benchmarks-elasticsearch-org.s3.us-west-2.amazonaws.com/corpora/tsdb/",
+"documents": [
+{
+"source-file": "documents.json.bz2",
+"document-count": 122613113,
+"compressed-bytes": 11502047679,
+"uncompressed-bytes": 138721940450
+}
+]
+}
+{%- endif %}
],
"operations": [
{{ rally.collect(parts="operations/*.json") }}
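
The corpus selection in the template above comes down to a single condition on `ingest_order`. As a rough illustration, the same decision can be written in plain Python. `select_source_file` is a hypothetical name, and Rally itself renders the Jinja template rather than calling anything like this.

```python
def select_source_file(track_params: dict) -> str:
    """Mirror the template's branch: sorted corpus only when explicitly requested.

    Illustrative stand-in for the Jinja condition
    `ingest_order is defined and ingest_order == "sorted"`.
    """
    if track_params.get("ingest_order") == "sorted":
        return "documents-sorted.json.bz2"
    # Unset, "jumbled", or any other value falls through to the original corpus.
    return "documents.json.bz2"
```

Note that any value other than `sorted`, including a misspelling, selects the jumbled corpus, because the template only tests for equality with `"sorted"`.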
