-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sk] Improved PostgreSQL Data Loader #651
Conversation
mage_ai/io/postgres.py
Outdated
df: DataFrame, | ||
table_name: str, | ||
index: bool = False, | ||
if_exists: str = 'replace', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's create constants/enum for the options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
mage_ai/io/postgres.py
Outdated
from mage_ai.io.export_utils import ( | ||
BadConversionError, | ||
clean_df_for_export, | ||
PandasTypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
self, df: DataFrame, name: str, index: bool = False, if_exists: str = 'replace', **kwargs | ||
self, | ||
df: DataFrame, | ||
table_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also need to specify schema_name to identify a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot about schemas, added schema_name
parameter + auto creation of schema if the given schema doesn't exist.
mage_ai/io/postgres.py
Outdated
) | ||
self.conn.commit() | ||
|
||
def __table_exists(self, table_name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. you need both schema_name and table_name to identify a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added schema name to table existence check.
Summary
This PR refactors the PostgreSQL IO handler to avoid using SQLAlchemy. There are a few benefits to this change:
The change made was to use PostgreSQLs
COPY TO
query to improve export times in place ofpandas.DataFrame.to_sql()
. To do this, a new framework was added to manage type conversion (mapping Pandas datatypes to PostgreSQL data types). This framework involves:np.timedelta64
has no appropriate type in PostgreSQL, so it is mapped to an integer before exporting)Tests
Changes tested locally with PostgreSQL database alongside unit tests.
cc
@wangxiaoyou1993 @dy46