pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).

aws/aws-sdk-pandas

AWS Data Wrangler

Pandas on AWS

Resources

Use Cases

Pandas

| FROM | TO | Features |
|------|----|----------|
| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes, KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Fixed-width formatted, Partitions, Parallelism, KMS Encryption, Multiple files |
| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines: `ctas_approach=False` for batching and restricted-memory environments; `ctas_approach=True` for speed, parallelism and enhanced data types |
| Pandas DataFrame | Amazon Redshift | Fast loading through parallel Parquet files on S3 behind the scenes. Append/Overwrite/Upsert modes |
| Amazon Redshift | Pandas DataFrame | Fast unloading through parallel Parquet files on S3 behind the scenes |
| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL. Fast loading through parallel CSV files on S3 behind the scenes. Append/Overwrite modes |
| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL. Fast unloading through parallel CSV files on S3 behind the scenes |
| CloudWatch Logs Insights | Pandas DataFrame | Query results |
| Glue Catalog | Pandas DataFrame | List and get table details. Good fit with Jupyter Notebooks. |
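
The DataFrame-to-S3 and Athena-to-DataFrame rows above can be sketched as follows. This is a minimal illustration, not the project's documented usage: it assumes a recent awswrangler release in which `wr.s3.to_parquet` and `wr.athena.read_sql_query` exist with these parameters (the release this table describes may expose different names), and it assumes `overwrite_partitions` corresponds to the "Partitions-Upsert" mode. The helper names `parquet_write_options`, `write_dataset` and `query_athena` are hypothetical. The library import is deferred so the module loads without awswrangler or AWS credentials.

```python
from typing import Any, Dict, List

# Write modes accepted by recent awswrangler releases (assumption: the
# "Partitions-Upsert" mode in the table maps to "overwrite_partitions").
ALLOWED_MODES = {"append", "overwrite", "overwrite_partitions"}


def parquet_write_options(path: str, partition_cols: List[str],
                          mode: str = "append") -> Dict[str, Any]:
    """Build keyword arguments for a partitioned Parquet dataset write."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    return {
        "path": path,
        "dataset": True,  # dataset mode enables partitions and write modes
        "mode": mode,
        "partition_cols": list(partition_cols),
    }


def write_dataset(df, database: str, table: str, **options):
    """Write a DataFrame to S3 as Parquet and register it in the Glue Catalog."""
    import awswrangler as wr  # deferred: requires the library and credentials

    return wr.s3.to_parquet(df=df, database=database, table=table, **options)


def query_athena(sql: str, database: str, low_memory: bool = False):
    """Run an Athena query with either of the two engines from the table."""
    import awswrangler as wr  # deferred: requires the library and credentials

    if low_memory:
        # ctas_approach=False with chunking yields an iterator of DataFrames,
        # which suits batching in restricted-memory environments.
        return wr.athena.read_sql_query(
            sql, database=database, ctas_approach=False, chunksize=True
        )
    # ctas_approach=True stages the result as Parquet and reads it in parallel.
    return wr.athena.read_sql_query(sql, database=database, ctas_approach=True)
```

A call such as `write_dataset(df, "my_db", "my_table", **parquet_write_options("s3://my-bucket/my_table/", ["year"]))` would then write one Hive-style prefix per `year` value; the bucket, database and table names here are placeholders.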

General

| Feature | Details |
|---------|---------|
| List S3 objects | e.g. `wr.s3.list_objects("s3://...")` |
| Delete S3 objects | Parallel |
| Delete listed S3 objects | Parallel |
| Delete NOT listed S3 objects | Parallel |
| Copy listed S3 objects | Parallel |
| Get the size of S3 objects | Parallel |
| Get CloudWatch Logs Insights query results | |
| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
| Create EMR cluster | "For humans" |
| Terminate EMR cluster | "For humans" |
| Get EMR cluster state | "For humans" |
| Submit EMR step(s) | "For humans" |
| Get EMR step state | "For humans" |
| Query Athena to receive Python primitives | Returns `Iterable[Dict[str, Any]]` |
| Load and unzip SageMaker job outputs | |
| Dump Amazon Redshift as Parquet files on S3 | |
| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
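
The "Delete NOT listed S3 objects" feature amounts to a set difference between what exists under a prefix and what should be kept. The sketch below shows one way to express that on top of `wr.s3.list_objects` (named in the table) and `wr.s3.delete_objects`, which is assumed to exist as in recent awswrangler releases. The helper names `objects_to_delete` and `delete_not_listed` are hypothetical and are not the library's own implementation.

```python
from typing import Iterable, List


def objects_to_delete(existing: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Return the paths present in `existing` that are not in `keep`, sorted."""
    keep_set = set(keep)
    return sorted(path for path in set(existing) if path not in keep_set)


def delete_not_listed(prefix: str, keep: Iterable[str]) -> List[str]:
    """Delete every object under `prefix` whose path is not in `keep`."""
    import awswrangler as wr  # deferred: requires the library and credentials

    doomed = objects_to_delete(wr.s3.list_objects(prefix), keep)
    if doomed:
        # delete_objects accepts a list of full S3 paths (assumed signature).
        wr.s3.delete_objects(doomed)
    return doomed
```

Keeping the path selection in a pure function makes it easy to preview what would be removed before any delete call is issued.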