-
Notifications
You must be signed in to change notification settings - Fork 353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add featurestore module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class #850
Conversation
9af4f69
to
6c97c6e
Compare
|
||
def __init__( | ||
self, | ||
entity_type_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a common pattern in other resource? I personally find this a bit confuse for overload this arg with both short form and long form, which make a weird case when full path is used, but project/location/featurestore_id also provided. Is this possible to have overloading constructor in Python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python does not support overloading constructor, that's why we have this as the common pattern for all other Vertex SDK resource classes, for example this is Vertex AI SDK Dataset class constructor.
Agree on the confusion of the usage of optional arguments, but that's the language limitation we are working with. We try to compensate by emphasizing the Docstring and Example Usage throughout the SDK.
def __init__( | ||
self, | ||
entity_type_name: str, | ||
featurestore_id: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's probably easier to keep the project , location, featurestore_id order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the usage pattern for Vertex SDK is user to initialize the project, see usage example https://github.com/googleapis/python-aiplatform#initialization. And user does not need to specify project
and location
unless overwritten. That's why Vertex SDK normally ranks project
and location
to bottom among other arguments and mark as optional.
Also Optional
arguments can not go before the Required
arguments.
Example: "projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id" | ||
or "my_entity_type_id" when project and location are initialized or passed, with featurestore_id passed. | ||
featurestore_id (str): | ||
Optional. Featurestore to retrieve entityType from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value if not set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optional featurestore_id
defaults to None
, see couple of lines above.
featurestore_id: Optional[str] = None,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if it's None, does it mean the value from the entity_type_name name will be used, which implies entity_type_name must be the fully qualified path. And when it's None, then entity_type_name must be the short ID form. Probably adding some explanation to be clear
""" | ||
return _featurestores.Featurestore(self._featurestore_name) | ||
|
||
def get_feature(self, feature_id: str) -> "_featurestores.Feature": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect the feature resource exist (i.e it was created before)? If not, then should the (Feature) update function throw error in that case?
Similar for get_entity_type for featurestore class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both get_* methods will pass in the fully-qualified resource name into resource class constructor. If the resource exists, the method will return the constructed resource class, if the resource does not exist, it will actually hit the service client's get_* method and throw service error.
FEATURE_NAME_PATTERN_REGEX = ( | ||
ENTITY_TYPE_NAME_PATTERN_REGEX | ||
+ r"\/features\/(?P<feature_id>" | ||
+ RESOURCE_ID_PATTERN_REGEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more check for feature_id -- it cannot be one of reserved words: entity_id, feature_timestamp, arrival_timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vertex SDK is relying on the service for this type of value validation, most validation we are doing is to assist certain usage within the SDK which is to prepare the value to be passed into the service, i.e. these pattern validations essentially are to assist the limitation due to lack of overloading constructor, and we are validating the arguments and form the final resource name to pass into the service.
However, if the service does not throw an error on the validation you proposed and will cause un alerted error, we can definitely add these in the SDK layer.
Please let me know!
|
||
def __init__( | ||
self, | ||
featurestore_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the params in project , location, featurestore_id order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
project
and location
are Optional
arguments, whereas featurestore_name
is Required
argument. The required arguments have to be listed before the optional arguments. There was another discussion of why our project
and location
are marked as optional.
): | ||
raise ValueError( | ||
f"{featurestore_name} is not in form of a fully-qualified " | ||
f"featurestore resource name nor a featurestore ID." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nor a featurestore ID -> nor a featurestore ID with project and location are initialized or passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this raise ValueError
into featurestore_utils.py
and it does not involve project
and location
.
"""Deletes entity_type resources in this Featurestre given their entity_type IDs. | ||
WARNING: This deletion is permanent. | ||
|
||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add entity_type_ids in args descripiton
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"""Deletes feature resources in this EntityType given their feature IDs. | ||
WARNING: This deletion is permanent. | ||
|
||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add feature_ids description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ids docstring in both Featurestore class and EntityType class's delete method
6c97c6e
to
6aece79
Compare
List[EntityTypes] - A list of managed entityType resource objects | ||
""" | ||
|
||
cls._featurestore_id = featurestore_utils.validate_and_get_featurestore_resource_id( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mutating the class itself, not an instance of the class, which we shouldn't do because it alters the value for this class and all instances of this class. I assume this is needed because the _list
implementation or one of the base methods it depends on isn't satisfactory for nested resources.
The guidance here is to override those methods in a general way so they can be used for nested resources. I'm open to raising NotImplementedError
on list
in this PR and following up in another PR to capture this change because it would be valuable for more than just FeatureStore including Metadata and Tensorboard resources. Or you can tackle this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated a couple of unnecessary cls._* =*
which seems causing the confusion on mutating the class (Those are only to parse some ids and prepare the parent resource name for retrieving child resources).
I have changed the base _list()
method to include an optional parent
field. Let me know what you think on this implementation and if this addressed your concern.
One thing I would need more guidance is the forming of _resource_noun
for the nested resource, which require the ancestor IDs as part of the _resource_noun, which can not be pre defined as our current pattern.
(compat.V1, featurestore_service_client_v1.FeaturestoreServiceClient), | ||
(compat.V1BETA1, featurestore_service_client_v1beta1.FeaturestoreServiceClient), | ||
( | ||
"online_" + compat.V1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider managing online as a separate client because that's currently the pattern we follow(ie: Endpoint and Prediction), so this would be introducing a new pattern in usage, and it seems like we may want the online client to remain connected (_is_temporary=False
) for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added an FeaturestoreOnlineServingClientWithOverride
matching the current pattern
return bool(re.compile(r"^" + RESOURCE_ID_PATTERN_REGEX + r"$").match(resource_id)) | ||
|
||
|
||
def validate_featurestore_name(featurestore_name: str) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preference to name these parse
instead of validate
.
Also, is additional RESOURCE_ID validation the main reason we are not relying on the GAPIC parse methods[1]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but now since you point out, I think it is not necessary to have the resource id validation inside the path parsing. Switching to GAPIC parse methods
6aece79
to
8437b58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a more informative PR title because it will be added to the release notes.
4b35fcd
to
f5bfc39
Compare
f5bfc39
to
7ef17e7
Compare
|
56285ae
to
f2bad65
Compare
Example: "projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id" | ||
or "my_entity_type_id" when project and location are initialized or passed, with featurestore_id passed. | ||
featurestore_id (str): | ||
Optional. Featurestore to retrieve entityType from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if it's None, does it mean the value from the entity_type_name name will be used, which implies entity_type_name must be the fully qualified path. And when it's None, then entity_type_name must be the short ID form. Probably adding some explanation to be clear
e394d74
to
293c8ec
Compare
add private _featurestores module add Featurestore/EntityType/Feature class add featurestore_utils.py add FeaturestoreClientWithOverride in utils __init__.py update compat services and types to include featurestore and featurestore online add get/update/delete/list/search methods add unit tests for above methods add skeleton for create/ingest/read/serve methods
…tity_type and feature into featurestore_utils add: tests for validate and get resource ids
…me is passed as resource ID
293c8ec
to
78c172d
Compare
…d forming methods; moved Featurestore.search_features() to Feature.search() and utilize existing class methods; update delete sync wait
78c172d
to
e50e763
Compare
featurestore_name=self.resource_name, filter=filter, order_by=order_by, | ||
) | ||
|
||
@base.optional_sync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that this is decorated. I think this is new usage pattern for the SDK and there are three options on usage here to consider. This is worth discussion because this pattern should be used throughout.
1: if this method is decorated then the usage is:
feature_store.delete_entity_types(..., sync=False)
feature_store.wait() # will throw if any of the deletions had issues
In this case there is no need to pass sync
to the underlying deletions and instead we can default to asynchronous to make deletions concurrent. Also note that the method needs to be changed to wait on all entity_type deletes not just the last one as it is currently handled.
entity_types = []
for entity_type_id in entity_type_ids:
entity_type = self.get_entity_type(entity_type_id=entity_type_id)
entity_type.delete(sync=False)
entity_types.append(entity_type)
for entity_type in entity_types:
entity_type.wait()
2: other option is to pass sync
directly to the underlying delete calls but not associate them with the FeatureStore:
feature_store.delete_entity_types(..., sync=False)
feature_store.wait() # will NOT throw if any of the deletions had issues
In this case this method should not be decorated with base.optional_sync
. There is no need to call wait here and the implementation would look like:
for entity_type_id in entity_type_ids:
entity_type = self.get_entity_type(entity_type_id=entity_type_id)
entity_type.delete(sync=sync)
3: is similar to the second option but instead we can return the list of deleted entity types so the user can wait on each one. So usage look like:
deleted_entity_types: List[EntityType] = feature_store.delete_entity_types(..., sync=False)
feature_store.wait() # will NOT throw if any of the deletions had issues
for entity_type in deleted_entity_types:
entity_type.wait()
The implementation would look similar to (2) but we compose the list and return it.
entity_types = []
for entity_type_id in entity_type_ids:
entity_type = self.get_entity_type(entity_type_id=entity_type_id)
entity_type.delete(sync=sync)
entity_types.append(entity_type)
return entity_types
Also to unblock this PR, feel free to remove the async support for now and follow up. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the options. I would vote for option 1 over 2 and 3. The main reason is when we delete child resources from the parent resource, I do want to provide sync option to prevent user to retrieve the delete pending child from the parent, i.e. Featurestore.get_entity_type() when Featurestore.delete_entity_types(). So add a Featurestore.wait() during deleting kind of make sense, and if any of the deletion had issue, it is captured in one place without extra user command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
8742825
to
8b75b2e
Compare
feat: add featurestore module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class