-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[indexer] table info service backup support #14373
Conversation
⏱️ 2h 23m total CI duration on this PR
|
8daf2f6
to
d663816
Compare
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ab39429
to
84f816c
Compare
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
i am working on:
|
f2aa2fc
to
5823faa
Compare
5823faa
to
ecf8017
Compare
let backup_restore_operator = backup_restore_operator.clone(); | ||
let context = self.context.clone(); | ||
let _task = tokio::spawn(async move { | ||
let backup_restore_operator = backup_restore_operator.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to clone so many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
/// Backup service mode with GCS bucket name. | ||
Backup(String), | ||
/// Restore service mode with GCS bucket name. | ||
Restore(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you plan to use Restore mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question.
Restore mode is for people to use when bootstrap a new node. It can help when the node lags/has issues and the rest of the world has already pruned the data.
IndexingOnly - no backup or restore, async db only.
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
ecosystem/indexer-grpc/indexer-grpc-table-info/src/table_info_service.rs
Outdated
Show resolved
Hide resolved
let file_name = path.file_name().unwrap().to_string_lossy(); | ||
if path.is_dir() | ||
&& file_name.starts_with(&target_snapshot_directory_prefix) | ||
&& !file_name.ends_with(".tmp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it safe to assume that snapshot is done if tmp doesn't show up in the filename? cc @grao1991
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know? @msmouse any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What? You mean rocksdb adds a ".tmp" suffix while taking a checkpoint? That's great design (moving / renaming folder should be assumed atomic)
if rocksdb didn't do it, we should do it manually in your use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea then i'll use the tmp
for now to determine the snapshot is done.
let file_name = path.file_name().unwrap().to_string_lossy(); | ||
if path.is_dir() | ||
&& file_name.starts_with(&target_snapshot_directory_prefix) | ||
&& !file_name.ends_with(".tmp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know? @msmouse any idea?
} | ||
} | ||
|
||
backup_restore_operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you need to do something here to handle the case when multiple jobs uploading the backup at the same time. Can you double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean multiple backup services are running? No; my thought is to check it on the deployment side. If we do want to check within the program(runtime?), we can either use some distributed locking or some coarse method like, checking metadata versioning. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we want to have multiple jobs to be able to backup things to ensure a fair reliability. I don't have preference on how they corporate. Either doing it on client side or on server side are ok.
We can leave it as a TODO for future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'd just add a random suffix to the file name so we never write to the same remote file and we can observe the conflicting behavior better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I added a to-do as follow-up to check if concurrent jobs are running. One of the concurrent scenarios for now is multiple full nodes running with backup enabled.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|
Description
In this PR, we introduce the backup of db
index_indexer_async_v2_db
, i.e., async table info db.Design
There are two major loops:
Example Config
Breaking Change
This change is a breaking change around the configuration. It's not released yet and the only impact is our internal deployment. We need to update the configuration for the next deployment.
Type of Change
Which Components or Systems Does This Change Impact?
How Has This Been Tested?
tested locally, looks good
GCS:
Key Areas to Review
Checklist