Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add wait param for all snapshot API #1571

Merged
merged 12 commits into from
Apr 4, 2023

Conversation

Weijun-H
Copy link
Contributor

This pr is related to #1532

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you lint your code locally using cargo fmt command prior to submission?
  3. Have you checked your code using cargo clippy command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@Weijun-H Weijun-H force-pushed the wait-parm-for-snapshot branch 2 times, most recently from 9973712 to 34a6128 Compare March 16, 2023 11:50
@agourlay agourlay changed the title feat: add wait parm for all snapshot API feat: add wait param for all snapshot API Mar 16, 2023
Ok(true)
}

pub async fn do_list_full_snapshots(
toc: &TableOfContent,
dispatcher: &Dispatcher,
// toc: &TableOfContent,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over to remove :)


archiving.await??;
archiving.await.unwrap().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the unwrap here and below in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use ? firstly, but it will be error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.unwrap().unwrap() will kill the whole service

});

if wait {
let result = task.await.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about unwrap

@agourlay
Copy link
Member

agourlay commented Mar 16, 2023

Thank you for the contribution 👍

There are a few things missing but it is looking promising otherwise.

When changing the REST API, it is required to also update the openapi schema definition.

This schema is used for integration testing and generating clients down the line.

The following documentation will guide you through it.

At the end, it will be great to add an openapi integration test using the new wait param.

@Weijun-H Weijun-H force-pushed the wait-parm-for-snapshot branch from c81149a to f23eb18 Compare March 17, 2023 23:40
Copy link
Member

@agourlay agourlay Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would nice to test the usage of wait=false as well.

Having rock solid test assertions there is not clear to me, maybe:
1 - count number of snapshots on collection
2 - create snapshot async
3 - assert count is the same
4 - wait until count goes up by one

Not ideal as prone to timing issues if the snapshot creating is very fast.

Got other ideas maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following my commit, you could also assert on the status code 202

Copy link
Contributor Author

@Weijun-H Weijun-H Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to test them locally but met some errors when running ./tests/openapi_integration_test.sh . My env is MacOS 12.5.1

/usr/local/lib/python3.9/site-packages/requests/adapters.py:519: ConnectionError
=========================== short test summary info ============================
ERROR openapi_integration/test_alias.py::test_cant_create_alias_if_collection_exists
ERROR openapi_integration/test_alias.py::test_cant_create_collection_if_alias_exists
ERROR openapi_integration/test_alias.py::test_alias_operations - requests.exc...
ERROR openapi_integration/test_basic_retrieve_api.py::test_points_retrieve - ...
ERROR openapi_integration/test_basic_retrieve_api.py::test_exclude_payload - ...
ERROR openapi_integration/test_basic_retrieve_api.py::test_is_empty_condition
ERROR openapi_integration/test_basic_retrieve_api.py::test_recommendation - r...
ERROR openapi_integration/test_basic_retrieve_api.py::test_query_nested - req...
ERROR openapi_integration/test_basic_retrieve_api_on_disk.py::test_points_retrieve_on_disk
ERROR openapi_integration/test_basic_retrieve_api_on_disk.py::test_exclude_payload_on_disk
ERROR openapi_integration/test_basic_retrieve_api_on_disk.py::test_is_empty_condition_on_disk
ERROR openapi_integration/test_basic_retrieve_api_on_disk.py::test_recommendation_on_disk
ERROR openapi_integration/test_basic_retrieve_api_on_disk.py::test_query_nested_on_disk
ERROR openapi_integration/test_basic_retrieve_multivec_api.py::test_points_retrieve
ERROR openapi_integration/test_basic_retrieve_multivec_api.py::test_exclude_payload
ERROR openapi_integration/test_basic_retrieve_multivec_api.py::test_is_empty_condition
ERROR openapi_integration/test_basic_retrieve_multivec_api.py::test_recommendation
ERROR openapi_integration/test_basic_retrieve_multivec_api.py::test_query_nested
ERROR openapi_integration/test_collection_update.py::test_collection_update
ERROR openapi_integration/test_count.py::test_exact_count_search - requests.e...
ERROR openapi_integration/test_count.py::test_approx_count_search - requests....
ERROR openapi_integration/test_db_lock.py::test_lock_db_for_writes - requests...
ERROR openapi_integration/test_delete_points.py::test_delete_points - request...
ERROR openapi_integration/test_euclid_dist.py::test_search_with_threshold - r...
ERROR openapi_integration/test_filter.py::test_match_any - requests.exception...
ERROR openapi_integration/test_filter_values_count.py::test_filter_values_count
ERROR openapi_integration/test_fts.py::test_scroll_with_prefix - requests.exc...
ERROR openapi_integration/test_geo_payload_index.py::test_payload_operations
ERROR openapi_integration/test_multicollection_reco.py::test_recommend_with_wrong_vector_size
ERROR openapi_integration/test_multicollection_reco.py::test_recommend_from_another_collection
ERROR openapi_integration/test_nested_payload_indexing.py::test_payload_indexing_operations
ERROR openapi_integration/test_payload_indexing.py::test_payload_indexing_operations
ERROR openapi_integration/test_payload_operations.py::test_payload_operations
ERROR openapi_integration/test_score_threshold.py::test_search_with_threshold
ERROR openapi_integration/test_snapshot.py::test_snapshot_operations - reques...
ERROR openapi_integration/test_uuid_ops.py::test_uuid_operations - requests.e...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error also happened on original dev branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are working fine, they just require you to start the server manually beforehand.

cargo run --bin qdrant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, definitely a docker issue.

Not currently on Mac so it is hard to help concretely sorry.

Have you had a look at this doc?

For debugging purpose, you could try to to replace
QDRANT_HOST='localhost:6334' in ./tests/basic_grpc_test.sh
by
QDRANT_HOST='host.docker.internal:6334'

See if it helps :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to QDRANT_HOST='host.docker.internal:6334 can work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It it works, you could apply the same trick to openapi/tests/openapi_integration/helpers/settings.py for the time being.

Setting QDRANT_HOST should work as well.

This way you can make progress on your test 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should open a new ticket for this issue? If so, I am willing to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think an issue is necessary as we have other devs on Mac who have no issues running those tests.
It seems to be specific to your system :)

if let Err(e) = filename {
return Err(ErrorInternalServerError(e.to_string()));
}
let filename = filename.unwrap().map_err(storage_into_actix_error)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning the unwraps, there is one left here :)

let filename = filename.unwrap().map_err(storage_into_actix_error)?;
Ok(NamedFile::open(filename)?)
} else {
Ok(NamedFile::open("not_found")?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the snapshot does not exist, we need a proper 404

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return 202 here too, because it will not wait the background process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this function can be non waiting.
The user wants to retrieve the path via a single API call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes sense. Not all of API in snapshot should have wait parm.

Ok(snapshot.await??)
} else {
Ok(SnapshotDescription {
name: "".to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about using the snapshot description, empty or with message, to convey the semantic of the operation being in progress.

Ideally we should return a HTTP 202 I believe https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/202

WDYT?

Copy link
Contributor Author

@Weijun-H Weijun-H Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, return 202 is a better solution. But I am not sure how to return StatusCode here because of the function signature.

@agourlay
Copy link
Member

@Weijun-H I took the liberty to push to your branch to streamline the changes here.

My commit introduces a way to return HTTP 202 accepted when not waiting on the snapshot creation.

This takes place at the API level, meaning that we still need to create a dummy SnapshotDescription that won't be used.

WDYT?

@Weijun-H Weijun-H force-pushed the wait-parm-for-snapshot branch 2 times, most recently from d605ebf to a10b8b1 Compare March 23, 2023 19:23
@Weijun-H Weijun-H requested a review from agourlay March 25, 2023 13:04
Copy link
Member

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the integration tests 👍

I have added a few comments that need to be addressed to move forward.

This is slowly getting closer to completion.

result:
$ref: "#/components/schemas/SnapshotDescription"
'202':
description: operation in prohress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: operation in prohress
description: operation in progress

result:
$ref: "#/components/schemas/SnapshotDescription"
'202':
description: operation in prohress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: operation in prohress
description: operation in progress

@@ -83,6 +83,12 @@ paths:
required: true
schema:
type: string
- name: wait
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block needs to be removed given that there is no async. behavior possible on GET

)
assert response.ok
# wait for snapshot to be deleted
sleep(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds seems a bit excessive IMHO

Was it not working with lower values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must set the values to a minimum of 2 in order to pass the tests. If the values are any lower than 2, I will not meet the requirements for the tests.

)
assert response.status_code == 202
# wait for snapshot to be created
sleep(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same concern about duration as above

@@ -171,13 +182,12 @@ async fn recover_from_snapshot(

#[get("/collections/{name}/snapshots/{snapshot_name}")]
async fn get_snapshot(
toc: web::Data<TableOfContent>,
dispatcher: web::Data<Dispatcher>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one could rollback fully to using toc instead of dispatcher

}

#[get("/snapshots/{snapshot_name}")]
async fn get_full_snapshot(
toc: web::Data<TableOfContent>,
dispatcher: web::Data<Dispatcher>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about rollback to toc

@@ -63,21 +63,49 @@ pub async fn do_list_aliases(
}

pub async fn do_list_snapshots(
toc: &TableOfContent,
dispatcher: &Dispatcher,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing snapshot cannot be performed in an async fashion.

The caller has no mean to get the collection afterwards.

collection_name: &str,
) -> Result<Vec<SnapshotDescription>, StorageError> {
Ok(toc
Ok(dispatcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be rollback to toc as a side effect

) -> impl Responder {
let snapshot_name = path.into_inner();
let timing = Instant::now();
let response = do_delete_full_snapshot(toc.get_ref(), &snapshot_name).await;
let wait = params.wait.unwrap_or(true);
let response = do_delete_full_snapshot(dispatcher.get_ref(), &snapshot_name, wait).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also possible to return 202 for the delete operations as well?
This would make the API more consistent.

@@ -73,11 +73,23 @@ pub async fn do_list_snapshots(
.await?)
}

pub async fn do_create_snapshot(
toc: &TableOfContent,
pub async fn do_create_snapshot<'a>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure why this lifetime was made explicit. Why is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is redundant. I will remove it.

@Weijun-H
Copy link
Contributor Author

Hi @agourlay, I ran cargo test --all locally, and it passed. Not sure why Tests / test (windows-latest) (pull_request) failed

@agourlay
Copy link
Member

@Weijun-H This is indeed an issue with CI, I will fix it 👍

Copy link
Member

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thank you for your patience throughout the review process 👍

@agourlay
Copy link
Member

agourlay commented Apr 3, 2023

@Weijun-H A PR impacting the REST APIs has been merged.
Could you please fix the conflicts so that we can merge your PR!

@Weijun-H Weijun-H force-pushed the wait-parm-for-snapshot branch from 85117c2 to 9a479bf Compare April 3, 2023 15:41
@Weijun-H
Copy link
Contributor Author

Weijun-H commented Apr 3, 2023

@Weijun-H A PR impacting the REST APIs has been merged. Could you please fix the conflicts so that we can merge your PR!

It seems that I accidentally overwrote the changes by others.

@generall
Copy link
Member

generall commented Apr 3, 2023

image

@Weijun-H
Copy link
Contributor Author

Weijun-H commented Apr 3, 2023

image

Sorry for this inconvenience 😢

@generall
Copy link
Member

generall commented Apr 3, 2023

i might be able to recover from a local copy

@generall generall merged commit d1e5498 into qdrant:dev Apr 4, 2023
generall added a commit that referenced this pull request Apr 11, 2023
* feat: add wait parm for all snapshot API

* update openAPI

* avoid unwrap

* remove all unwarp

* return HTTP 202 when no waiting on snapshot creation

* return 202 when non wait

* update open API

* update param on python test

* update and test api

* optimize the test

* return 202 when non-wait delete

* recover fixes

---------

Co-authored-by: Arnaud Gourlay <arnaud.gourlay@gmail.com>
Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
@generall generall mentioned this pull request Apr 19, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants