-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
app/vmselect/promql: limit staleness detection for increase/increase_pure/delta #8052
Conversation
…pure/delta doInternal has adaptive staleness detection mechanism. It is calculated using timestamp distance between samples in selected list of samples. It is dynamic because VM can store signals from many sources with different samples resolution. And while it works for most of cases, there are edge cases for rollup functions that are comparing values between windows: increase, increase_pure, delta. The edge case 1. There was a gap between series because of the missed scrape or two. In this case staleness will trigger and increase-like functions will assume the value they need to compare with is 0. In result, this could produce spikes for a flappy data - see #894 This problem was solved by introducing a `realPrevValue` field - 1f19c16. It stores the closest real sample value on selected interval and is used for comparison when samples start after the gap. The edge case 2. `realPrevValue` doesn't respect staleness interval. In result, even if gap between samples is huge (hours), the increase-like functions will not consider it as a new series that started from 0. See #8002. Covering both edge cases is tricky, because `realPrevValue` has to respect and not respect the staleness interval in the same time. In other words, it should be able to ignore periodic missing gaps, but reset if the gap is too big. While "too big gap" can't be figured out empirically, I suggest using `-search.maxStalenessInterval` for this purpose. If `-search.maxStalenessInterval` is set to 0 (default), then `realPrevValue` ignores staleness interval. If `-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it as a staleness interval. This also covers #8045 Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
app/vmselect/promql/rollup_test.go
Outdated
testRowsEqual(t, gotValues, rc.Timestamps, valuesExpected, timestampsExpected) | ||
}) | ||
// even if LookbackDelta < gap | ||
t.Run("step>gapLookbackDelta<gap", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in all other sub-tests: Please use the same sub-test names in both tests (TestRollupDeltaWithStaleness and TestRollupIncreasePureWithStaleness). Up to you which one to choose.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
…pure/delta (#8052) `doInternal` has adaptive staleness detection mechanism. It is calculated using timestamp distance between samples in selected list of samples. It is dynamic because VM can store signals from many sources with different samples resolution. And while it works for most of cases, there are edge cases for rollup functions that are comparing values between windows: increase, increase_pure, delta. The edge case 1. There was a gap between series because of the missed scrape or two. In this case staleness will trigger and increase-like functions will assume the value they need to compare with is 0. In result, this could produce spikes for a flappy data - see #894 This problem was solved by introducing a `realPrevValue` field - 1f19c16. It stores the closest real sample value on selected interval and is used for comparison when samples start after the gap. The edge case 2. `realPrevValue` doesn't respect staleness interval. In result, even if gap between samples is huge (hours), the increase-like functions will not consider it as a new series that started from 0. See #8002. Covering both edge cases is tricky, because `realPrevValue` has to respect and not respect the staleness interval in the same time. In other words, it should be able to ignore periodic missing gaps, but reset if the gap is too big. While "too big gap" can't be figured out empirically, I suggest using `-search.maxStalenessInterval` for this purpose. If `-search.maxStalenessInterval` is set to 0 (default), then `realPrevValue` ignores staleness interval. If `-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it as a staleness interval. ### Checklist The following checks are **mandatory**: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit 7d2a676)
doInternal
has adaptive staleness detection mechanism. It is calculated using timestamp distance between samples in selected list of samples. It is dynamic because VM can store signals from many sources with different samples resolution. And while it works for most of cases, there are edge cases for rollup functions that are comparing values between windows: increase, increase_pure, delta.The edge case 1.
There was a gap between series because of the missed scrape or two. In this case staleness will trigger and increase-like functions will assume the value they need to compare with is 0. In result, this could produce spikes for a flappy data - see #894 This problem was solved by introducing a
realPrevValue
field - 1f19c16. It stores the closest real sample value on selected interval and is used for comparison when samples start after the gap.The edge case 2.
realPrevValue
doesn't respect staleness interval. In result, even if gap between samples is huge (hours), the increase-like functions will not consider it as a new series that started from 0. See #8002.Covering both edge cases is tricky, because
realPrevValue
has to respect and not respect the staleness interval in the same time. In other words, it should be able to ignore periodic missing gaps, but reset if the gap is too big. While "too big gap" can't be figured out empirically, I suggest using-search.maxStalenessInterval
for this purpose. If-search.maxStalenessInterval
is set to 0 (default), thenrealPrevValue
ignores staleness interval. If-search.maxStalenessInterval
is > 0, thenrealPrevValue
respects it as a staleness interval.Checklist
The following checks are mandatory: