The missing link between AWS services and the most popular Python data libraries
AWS Data Wrangler aims to fill a gap between AWS Big Data Services (Glue, Athena, EMR, Redshift) and the most popular Python libraries for lightweight workloads.
Contents: Installation | Usage | Known Limitations | Contributing | Dependencies | License
pip install awswrangler
AWS Data Wrangler runs on Python 2 and 3.
import awswrangler

# df is a Pandas DataFrame; "col" below is one of its columns
awswrangler.s3.write(
    df=df,
    database="database",
    path="s3://...",
    file_format="parquet",
    preserve_index=True,
    mode="overwrite",
    partition_cols=["col"],
)
If a Glue database name is passed, all the metadata will be created in the Glue Catalog; if not, only the S3 data write will be done.
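For instance, a catalog-less write could look like the sketch below (the bucket path is a placeholder, and the assumption is that omitting the database argument skips the Glue Catalog step):

# Write only to S3, without registering metadata in the Glue Catalog
awswrangler.s3.write(
    df=df,
    path="s3://my-bucket/my-prefix/",  # placeholder path
    file_format="csv",
    preserve_index=False,
    mode="overwrite",
)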
# Run the query on Athena and load the result set into a Pandas DataFrame
df = awswrangler.athena.read("database", "select * from table")
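Assuming the result comes back as a regular Pandas DataFrame (as the assignment above suggests), the usual Pandas calls apply:

print(df.head())      # preview the first rows
print(len(df.index))  # number of rows returned by the query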
The AWS Data Wrangler project relies on other great initiatives:
- Boto3
- Pandas
- Apache Arrow
- s3fs (from the Dask ecosystem)
- For now, writes only Parquet and CSV file formats
- For now, reads only through AWS Athena
- Compression is not supported yet
- Nested types are not supported yet
Almost all features rely on AWS services that do not yet have community mock tools (AWS Glue, AWS Athena), so we focus on integration tests instead of unit tests.
You will therefore need to provide an S3 bucket and a Glue/Athena database through environment variables:
export AWSWRANGLER_TEST_BUCKET=...
export AWSWRANGLER_TEST_DATABASE=...
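As an illustration only (the real test fixtures may read the configuration differently), the integration tests would pick up these values from the environment along these lines:

import os

# Test resources configured above; names match the exported variables
test_bucket = os.environ["AWSWRANGLER_TEST_BUCKET"]
test_database = os.environ["AWSWRANGLER_TEST_DATABASE"]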
CAUTION: This may incur costs in your AWS account.
make init
Make your changes...
make format
make lint
make test
This library is licensed under the Apache 2.0 License.