Pandas on AWS
FROM | TO | Features |
---|---|---|
Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes, KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Fixed-width formatted, Partitions, Parallelism, KMS Encryption, Multiple files |
Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines: `ctas_approach=False` -> batching for memory-restricted environments; `ctas_approach=True` -> blazing fast, parallelism and enhanced data types |
Pandas DataFrame | Amazon Redshift | Blazing fast, using parallel Parquet on S3 behind the scenes; Append/Overwrite/Upsert modes |
Amazon Redshift | Pandas DataFrame | Blazing fast, using parallel Parquet on S3 behind the scenes |
Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL; blazing fast, using parallel CSV on S3 behind the scenes; Append/Overwrite modes |
Amazon Aurora | Pandas DataFrame | Supported engine: MySQL; blazing fast, using parallel CSV on S3 behind the scenes |
CloudWatch Logs Insights | Pandas DataFrame | Query results |
Glue Catalog | Pandas DataFrame | List tables and get table details; a good fit for Jupyter Notebooks |
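The table's round trip (DataFrame -> S3 -> Glue Catalog -> Athena -> DataFrame) can be sketched as below. The bucket `my-bucket`, Glue database `my_db`, and table `my_table` are placeholders, and the `wr.*` calls assume the awswrangler API plus valid AWS credentials at call time:

```python
import pandas as pd

# A toy DataFrame to ship to S3 (placeholder data).
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})


def round_trip(frame: pd.DataFrame) -> pd.DataFrame:
    """Write a DataFrame to S3 as a Parquet dataset registered in the
    Glue Catalog, then read it back through Athena. Requires AWS
    credentials; all names are placeholders."""
    import awswrangler as wr  # imported lazily so the sketch stays importable

    wr.s3.to_parquet(
        df=frame,
        path="s3://my-bucket/my_table/",
        dataset=True,          # enables partitions and write modes
        mode="overwrite",      # or "append" / "overwrite_partitions"
        database="my_db",      # Glue metadata -> queryable from Athena
        table="my_table",
    )
    # ctas_approach=True is the fast, parallel engine described above.
    return wr.athena.read_sql_query(
        "SELECT * FROM my_table", database="my_db", ctas_approach=True
    )
```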
Feature | Details |
---|---|
List S3 objects | e.g. `wr.s3.list_objects("s3://...")` |
Delete S3 objects | Parallel |
Delete listed S3 objects | Parallel |
Delete NOT listed S3 objects | Parallel |
Copy listed S3 objects | Parallel |
Get the size of S3 objects | Parallel |
Get CloudWatch Logs Insights query results | |
Load partitions on Athena/Glue table | Through `MSCK REPAIR TABLE` |
Create EMR cluster | "For humans" |
Terminate EMR cluster | "For humans" |
Get EMR cluster state | "For humans" |
Submit EMR step(s) | "For humans" |
Get EMR step state | "For humans" |
Query Athena to receive Python primitives | Returns `Iterable[Dict[str, Any]]` |
Load and Unzip SageMaker jobs outputs | |
Dump Amazon Redshift as Parquet files on S3 | |
Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
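The list/delete utilities above combine naturally, e.g. to clean up a temporary prefix. A minimal sketch, assuming the `wr.*` API shown in the table; the bucket name and prefix are placeholders and the `wr` calls require AWS credentials:

```python
def parquet_only(paths):
    """Pure helper: keep only Parquet object keys from an S3 listing."""
    return [p for p in paths if p.endswith(".parquet")]


def cleanup_temp_parquet(prefix="s3://my-bucket/temp/"):
    """List objects under a prefix and delete the Parquet ones in parallel.
    Requires AWS credentials; the bucket name is a placeholder."""
    import awswrangler as wr  # lazy import so the sketch stays importable

    listed = wr.s3.list_objects(prefix)   # parallel listing of s3:// paths
    targets = parquet_only(listed)
    if targets:
        wr.s3.delete_objects(targets)     # parallel delete of listed objects
    return len(targets)
```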