perf(sql): speed up parallel sample by with fill #5143

nwoolmer · 2024-11-06T13:41:25Z

WIP

Changelist

Parallel sample by:
- now supports fill with multiple values
- correctly uses all fill values instead of just the first
- will now generate radix sorts more frequently, which are 2x-10x faster than our tree-based sort
- will now only sort unfilled data, instead of sorting both filled and real data
- A mismatch between the column type and the fill value in FROM-TO queries previously caused an UnsupportedOperationException and a 500 response code. It now returns a useful message:
  - Example query: select timestamp, count from trades sample by 1d from '2008-12-28' to '2009-01-05' FILL(1.0)
  - Error: invalid fill value, cannot cast DOUBLE to LONG
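
To illustrate the kind of early check described in the last bullet, here is a minimal sketch. It is not the code from this PR: the `FillTypeCheck` class, the `Type` enum and the `validate` method are hypothetical names, and the casting rule (identical types, plus LONG widening to DOUBLE) is an assumption made only for the example.

```java
import java.util.List;

public class FillTypeCheck {
    // Simplified stand-in for column types; hypothetical, not QuestDB's ColumnType.
    enum Type { LONG, DOUBLE, TIMESTAMP }

    /**
     * Returns null when every fill value can be stored in its target column,
     * otherwise an error message naming the first offending pair.
     * Assumption: only identical types and LONG -> DOUBLE widening are accepted.
     */
    static String validate(List<Type> columnTypes, List<Type> fillTypes) {
        for (int i = 0; i < fillTypes.size() && i < columnTypes.size(); i++) {
            Type col = columnTypes.get(i);
            Type fill = fillTypes.get(i);
            boolean ok = col == fill || (fill == Type.LONG && col == Type.DOUBLE);
            if (!ok) {
                return "invalid fill value, cannot cast " + fill + " to " + col;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // count() yields LONG, while FILL(1.0) supplies a DOUBLE.
        System.out.println(validate(List.of(Type.LONG), List.of(Type.DOUBLE)));
    }
}
```

Checking the types while the plan is built, rather than when the cursor first reads a fill value, is what allows the query to fail with a readable message instead of a 500 response.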

Benchmarks

Table of 100m rows.
Schema - (TIMESTAMP, DOUBLE)
Data is over a 1 day period.

With 100m rows

Query:

select timestamp, avg(price) from bench
sample by 1T fill(null)

Plan before

```
Sort
  keys: [timestamp]
    Fill Range
      stride: '1T'
      values: [null]
        Async Group By workers: 8
          keys: [timestamp]
          values: [avg(price)]
          filter: null
            PageFrame
                Row forward scan
                Frame forward scan on: bench
```

Plan after

Fill Range
  stride: '1T'
  values: [null]
    Radix sort light
      keys: [timestamp]
        Async Group By workers: 8
          keys: [timestamp]
          values: [avg(price)]
          filter: null
            PageFrame
                Row forward scan
                Frame forward scan on: bench

Before: timeout (60s)
After: 22.75s

With 10m rows:

Query:

select timestamp, price from (
select timestamp, avg(price) as price from bench4
sample by 1T fill(null)
) limit -1

Before plan:

```
Limit lo: -1
    Sort
      keys: [timestamp]
        Fill Range
          stride: '1T'
          values: [null]
            Async Group By workers: 8
              keys: [timestamp]
              values: [avg(price)]
              filter: null
                PageFrame
                    Row forward scan
                    Frame forward scan on: bench4
```

After plan:

```
Limit lo: -1
    Fill Range
      stride: '1T'
      values: [null]
        Radix sort light
          keys: [timestamp]
            Async Group By workers: 8
              keys: [timestamp]
              values: [avg(price)]
              filter: null
                PageFrame
                    Row forward scan
                    Frame forward scan on: bench4
```

Before: 13.4s, 12.51s, 14.4s
After: 1.9s, 1.43s, 1.33s

Queries which do not require the sort will be slower, for example:

select max(price) from (
    select timestamp, avg(price) from bench
    sample by 1T fill(null)
)

This is because in the new version, a sort will be generated, and in the old, it would not. However, this was at the cost of the fill not necessarily being correct or filling on the right timestamp.

Alternative

If we allowed group by to return a timestampIndex in this case, even though the result will not be sorted (i.e. (timestamp, SCAN_DIRECTION_OTHER)), then we could still use parallel fill in its original form. This requires relaxing the rule that timestampIndex means an ASC (designated) timestamp.

Notes

This PR reworks the parallel sample by fill in an effort to improve performance and reliability. Previously, the fill would be generated after group by, and before order by. This meant we had to first exhaust the group by result, recording timestamps. Then we filled the rest of the timestamps.

Unfortunately, group by does not guarantee a designated timestamp, making it problematic to identify the correct column to fill. Furthermore, both filled and unfilled data were subsequently sorted, so sparse datasets were as expensive to sort as dense datasets.

By moving the fill to after order by, we ensure that we have a designated timestamp to orient the fill. We also generate radix sorts more frequently, which can be 2-10x faster than our tree-based sort. Finally, we only have to sort the unfilled data, leading to a much quicker and more efficient sort.
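
The ordering described above (sort the sparse group by output first, then fill the gaps) can be sketched roughly as follows. This is an illustration only, not the `FillRangeRecordCursorFactory` implementation: the `SortThenFill` class and its `fill` method are hypothetical, a comparison sort stands in for the radix sort, and timestamps are assumed to be already aligned to the stride.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortThenFill {
    /**
     * Sorts the sparse bucket timestamps produced by an unordered group by,
     * then emits one value per stride between the first and last bucket.
     * Buckets with no data are emitted as null, mirroring fill(null).
     * Assumptions: stride > 0 and every timestamp is aligned to the stride.
     */
    static List<Double> fill(long[] ts, double[] values, long stride) {
        List<Double> out = new ArrayList<>();
        if (ts.length == 0) {
            return out;
        }
        // Only the real (unfilled) rows are sorted; filled rows never enter the sort.
        Integer[] order = new Integer[ts.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> ts[i]));

        long first = ts[order[0]];
        long last = ts[order[order.length - 1]];
        int next = 0;
        // Walk the sorted buckets in timestamp order and fill the gaps on the fly.
        for (long t = first; t <= last; t += stride) {
            if (next < order.length && ts[order[next]] == t) {
                out.add(values[order[next]]);
                next++;
            } else {
                out.add(null);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {30, 10, 50};
        double[] values = {3.0, 1.0, 5.0};
        System.out.println(fill(ts, values, 10)); // [1.0, null, 3.0, null, 5.0]
    }
}
```

In this sketch three rows are sorted and five are emitted; the sort cost depends on the number of real buckets rather than on the size of the filled range.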

amunra · 2024-11-14T18:32:50Z

Not really code you've directly touched, just stumbled on it via your tests. The following error message is rather cryptic.

    private void guardAgainstFromToWithKeyedSampleBy(boolean isFromTo) throws SqlException {
        if (isFromTo) {
            throw SqlException.$(0, "FROM-TO intervals are not supported for keyed SAMPLE BY queries");
        }
    }

I have no idea what a keyed query is.

amunra

👍

nwoolmer · 2024-11-15T17:27:48Z

Thanks for the review!

bluestreak01

cc @amunra

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java

glasstiger · 2025-01-04T16:39:14Z

[PR Coverage check]

😍 pass : 28 / 29 (96.55%)

file detail

| | path | covered line | new line | coverage |
|---|---|---|---|---|
| 🔵 | io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java | 16 | 17 | 94.12% |
| 🔵 | io/questdb/griffin/SqlCodeGenerator.java | 12 | 12 | 100.00% |

… timestamp, i.e. sorted ascending by timestamp. The result is that we generate faster sorts in some cases, and also sort smaller amounts of data (only unfilled). There is still an outstanding issue with set operations, where a sort is not being appropriately pushed down, causing the fill to be lost.

# Conflicts: # core/src/main/java/io/questdb/griffin/model/QueryModel.java

add some basic validation to catch mismatched type error earlier

7d7b64d

nwoolmer added the Bug and SQL labels Nov 6, 2024

nwoolmer marked this pull request as draft November 6, 2024 13:42

nwoolmer marked this pull request as ready for review November 6, 2024 13:43

nwoolmer added 3 commits November 6, 2024 13:43

fix condition

7f91fb0

fix SqlOptimiserTest

bcf3703

condition was back to front

2c22e95

amunra self-requested a review November 14, 2024 16:06

amunra previously approved these changes Nov 15, 2024

bluestreak01 requested changes Nov 18, 2024

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java Outdated

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java Outdated

bluestreak01 reviewed Nov 18, 2024

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java Outdated

bluestreak01 reviewed Nov 18, 2024

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java Outdated

bluestreak01 reviewed Nov 18, 2024

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java

nwoolmer added 2 commits November 19, 2024 09:42

refactor to pass fill values positions

2e7d305

comments

536cdb7

nwoolmer dismissed amunra’s stale review via 536cdb7 November 19, 2024 09:46

nwoolmer and others added 6 commits November 19, 2024 09:50

fix condition

7de45a9

Merge branch 'master' into nw_fill_range_error

fe086c7

Merge remote-tracking branch 'origin/master' into nw_fill_range_error

3348c92

merge fallout

413d43e

fixes

1355786

formatting

23aa47c

bluestreak01 requested changes Jan 4, 2025

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java Outdated

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java Outdated

bluestreak01 reviewed Jan 4, 2025

core/src/main/java/io/questdb/griffin/engine/groupby/FillRangeRecordCursorFactory.java

nwoolmer changed the title ~~fix(sql): add better error message when sample by fill type is mismatched~~ perf(sql): speed up parallel sample by with fill Jan 7, 2025

nwoolmer added 10 commits January 7, 2025 16:49

fix tests, still todo - fix set operations

b3c763a

Merge branch 'master' into nw_fill_range_error

90e634f

improve error message

bbddae6

safety

e77eadb

safety

5c4ec03

still not a fix

3dd80f6

hacky fix, more attention required

9f1a08e

Merge branch 'refs/heads/master' into nw_fill_range_error

924b648

# Conflicts: # core/src/main/java/io/questdb/griffin/model/QueryModel.java

Merge branch 'master' into nw_fill_range_error

f8e51df

fix test

21ce459

bluestreak01 added the DO NOT MERGE label Jan 13, 2025

bluestreak01 marked this pull request as draft January 13, 2025 12:55
