While running our test suite we noticed that the backfill procedure would occasionally fail with confusing results, including chunks losing all their entries, the uniqueness constraint of a chunk/hypertable being broken, or a compression job being rescheduled for -infinity.
What seems to happen is that a brand-new database, created by our test runner, has not yet registered the Compression Policy worker.
The worker registration looks like this in the logs. There appears to be a generic per-database TimescaleDB scheduler worker that is responsible for launching jobs:
2022-05-12 14:33:46.027 UTC [1] DEBUG: registering background worker "TimescaleDB Background Worker Scheduler"
2022-05-12 14:33:46.027 UTC [1] DEBUG: starting background worker process "TimescaleDB Background Worker Scheduler"
2022-05-12 14:33:46.048 UTC [627] DEBUG: database scheduler starting for database 18419
2022-05-12 14:33:46.049 UTC [627] DEBUG: launching job 1000 "Compression Policy [1000]"
2022-05-12 14:33:46.049 UTC [1] DEBUG: registering background worker "Compression Policy [1000]"
2022-05-12 14:33:46.049 UTC [1] DEBUG: starting background worker process "Compression Policy [1000]"
backfill.sql reschedules the compression job for the given chunks before doing any other work. But when the compression policy job has not yet been created, the rescheduling silently does nothing.
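Since the reschedule is a silent no-op when the job does not exist, a caller could guard against that case explicitly and fail loudly instead. A minimal sketch, assuming TimescaleDB 2.x (where policy jobs are visible in the timescaledb_information.jobs view) and a hypothetical hypertable name:

```sql
-- Hypothetical guard (not part of backfill.sql): refuse to start the backfill
-- if no compression policy job has been registered yet for the hypertable.
-- Assumes TimescaleDB 2.x; 'my_hypertable' is a placeholder name.
DO $$
DECLARE
    job_count integer;
BEGIN
    SELECT count(*) INTO job_count
    FROM timescaledb_information.jobs
    WHERE proc_name = 'policy_compression'
      AND hypertable_name = 'my_hypertable';

    IF job_count = 0 THEN
        RAISE EXCEPTION 'no compression policy job registered yet; refusing to backfill';
    END IF;
END
$$;
```

With a guard like this, the race described below would surface as an immediate error in the test run rather than as data corruption.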
The race condition occurs when the registration of the background worker happens after the attempt to reschedule: the worker can then run concurrently with the main part of decompress_backfill(), corrupting data.
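One possible mitigation (a sketch, not what backfill.sql currently does) would be to pause the policy outright rather than only pushing its next start time forward, so that even a worker registered later cannot fire it mid-backfill. This assumes TimescaleDB 2.x's alter_job(); job id 1000 is taken from the log excerpt above:

```sql
-- Hypothetical mitigation: disable scheduling of the compression policy
-- entirely for the duration of the backfill, instead of rescheduling it.
SELECT alter_job(1000, scheduled => false);

-- ... run the backfill for the affected chunks ...

-- Re-enable the policy once the backfill has committed.
SELECT alter_job(1000, scheduled => true);
```

Pausing is more robust than rescheduling here because a paused job stays paused regardless of when its worker gets registered, whereas a new next_start only has an effect if the job already exists.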
Annoyingly, we've been unable to reproduce this outside of our test suite, and I'm not sure whether it can arise anywhere other than freshly created databases that don't yet have their background workers.
The rescheduling in question is in timescaledb-extras/backfill.sql, lines 98 to 108 (commit 2358d75).