-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds SERDE for persisters #485
Conversation
The persisters can now be pickled and unpickled. This means that they can be properly serialized across process boundaries.
A preview of c2cc4bb is uploaded and can be seen here: ✨ https://burr.dagworks.io/pull/485 ✨ Changes may take a few minutes to propagate. Since this is a preview of production, content with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to c2cc4bb in 1 minute and 37 seconds
More details
- Looked at
198
lines of code in6
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. burr/integrations/persisters/b_mongodb.py:136
- Draft comment:
self.client.address
is not a valid attribute forMongoClient
. Useself.client.HOST
andself.client.PORT
instead to get the connection details. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
- The comment is about the new getstate implementation. 2. Looking at pymongo docs, MongoClient.address is actually a valid attribute that returns a (host, port) tuple. 3. HOST and PORT are not standard attributes of MongoClient. 4. The code correctly uses address[0] and address[1] to get host and port from the tuple.
Could there be a version of pymongo where address isn't available? Could HOST/PORT be better practice that I'm unaware of?
The address attribute is well-documented in pymongo and returns exactly what we need. The comment appears to be incorrect about HOST/PORT.
The comment should be deleted as it is incorrect - self.client.address is a valid and appropriate way to get the connection details.
2. burr/integrations/persisters/b_redis.py:165
- Draft comment:
Consider handling the case whereconnection_pool
is not present to ensure complete state serialization. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. burr/integrations/persisters/postgresql.py:251
- Draft comment:
Consider handling the case whereinfo
is not present to ensure complete state serialization. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_JiWlZwz3bmJEL2fu
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, think we can make it easier by abstracting a bit but I think it's fine for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on c5deb2d in 1 minute and 12 seconds
More details
- Looked at
224
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. burr/integrations/persisters/b_mongodb.py:141
- Draft comment:
Accessingself.client.address[0]
andself.client.address[1]
assumes theaddress
attribute is a tuple with at least two elements. Consider adding checks to ensure these attributes exist to avoid potential errors. - Reason this comment was not posted:
Comment did not seem useful.
2. burr/integrations/persisters/b_mongodb.py:158
- Draft comment:
The__setstate__
method assumesconnection_params
always containsuri
andport
. Consider adding checks to handle missing keys to prevent potentialKeyError
. - Reason this comment was not posted:
Comment was on unchanged code.
3. burr/integrations/persisters/b_redis.py:171
- Draft comment:
Theget_connection_params
method assumesconnection_pool.connection_kwargs
containshost
,port
,db
, andpassword
. Consider adding checks to handle missing keys to prevent potentialKeyError
. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The Redis connection is created with explicit parameters in from_values() and init(). get_connection_params() is only called on existing connections. The Redis client would fail earlier if these parameters were missing. The connection_pool.connection_kwargs should always have these values since they were provided during connection creation.
I could be wrong about the Redis client implementation - maybe it allows connections without some of these parameters? Also, what if someone subclasses and overrides default_client() with a different client?
Even if some parameters are optional in Redis, the code explicitly provides all these values during connection creation. If someone overrides default_client(), they should also override get_connection_params() as indicated by the comment on line 179.
The comment suggests defensive programming that isn't necessary given the class design and usage patterns. The code is correct as is.
4. burr/integrations/persisters/b_redis.py:187
- Draft comment:
The__setstate__
method assumesconnection_params
always contains necessary keys for Redis connection. Consider adding checks to handle missing keys to prevent potentialKeyError
. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The connection_params dict is created by get_connection_params() which explicitly returns a dict with all required keys. These params come from an existing Redis connection, so they must have been valid to begin with. The suggestion to add checks seems unnecessary since the data structure is controlled internally.
I could be wrong if there are subclasses that override get_connection_params() with different keys. However, the default_client() method suggests Redis is expected.
Even with subclasses, they would need to provide compatible connection params since they're using default_client() which expects Redis connection parameters. The architecture enforces this contract.
The comment should be deleted because the connection parameters are internally controlled and guaranteed to have the required keys through get_connection_params().
5. burr/integrations/persisters/postgresql.py:272
- Draft comment:
The__setstate__
method assumesconnection_params
always contains necessary keys for PostgreSQL connection. Consider adding checks to handle missing keys to prevent potentialKeyError
. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_znu5DZSQdhleNbf9
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
c5deb2d
to
c2cc4bb
Compare
The persisters can now be pickled and unpickled.
This means that they can be properly serialized across process boundaries.
Changes
How I tested this
Notes
Checklist
Important
Add serialization and deserialization support to MongoDB, Redis, and PostgreSQL persisters using
__getstate__
and__setstate__
.__getstate__
and__setstate__
methods toMongoDBBasePersister
,RedisBasePersister
, andPostgreSQLPersister
for serialization.pickle
intest_b_mongodb.py
,test_b_redis.py
, andtest_postgresql.py
.This description was created by for c5deb2d. It will automatically update as commits are pushed.