Hello there,
I will assist you in building and optimizing your data transformation and ETL workflows using Apache Airflow. I will design ETL pipelines in Python and PySpark to handle large-scale data, and use Apache Airflow to automate the workflows, designing Directed Acyclic Graphs (DAGs) so that data flows reliably from source to destination with proper scheduling and error handling. I will use AWS Glue for data cataloging, AWS Athena for querying large datasets, and S3 for data storage.
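To give a concrete sense of this approach, here is a minimal DAG sketch. The DAG id, schedule, task callables, and alerting settings are illustrative placeholders under assumed requirements (a daily batch from a source system into S3), not your final pipeline:

```python
# Minimal sketch: daily ETL DAG with retries and failure alerts (placeholders throughout).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                      # basic error handling: retry failed tasks
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,          # alert when a task exhausts its retries
}

def extract_from_source(**context):
    """Pull the day's raw data and stage it in S3 (placeholder)."""
    ...

def transform_with_pyspark(**context):
    """Run the PySpark job that cleans and aggregates the staged data (placeholder)."""
    ...

def load_to_destination(**context):
    """Write the curated output and refresh the Glue catalog entry (placeholder)."""
    ...

with DAG(
    dag_id="etl_source_to_s3",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",        # proper scheduling
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    transform = PythonOperator(task_id="transform", python_callable=transform_with_pyspark)
    load = PythonOperator(task_id="load", python_callable=load_to_destination)

    extract >> transform >> load       # data flows from source to destination
```

The actual operators, schedule, and alerting would be tailored to your data volumes and SLAs once we discuss the requirements.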
Similar Projects
Project - Deployed a .NET application on AWS using CI/CD
Description
Designed and implemented a CI/CD pipeline using AWS services (CodeCommit, CodeBuild, CodePipeline) to
automate the deployment of a .NET application.
Managed CodeCommit, CodeBuild, and CodePipeline for source control and automation.
Orchestrated deployment to ECS using Fargate and ALB for scalability.
Configured IAM roles and S3 for artifact storage and security.
Monitored application health with CloudWatch to optimize performance.
Collaborated with development teams for continuous improvement.
Technical Skills
Python & PySpark: Data Transformation, Scripting, ETL Processes
Apache Airflow: DAGs, Workflow Automation, Scheduling, Monitoring
AWS Services: AWS Glue (data cataloging), AWS Athena (querying), S3 (data storage)
ETL Best Practices: Data Cleaning, Aggregation, and Transformation
Let’s connect and discuss how we can ensure the smooth execution of your data engineering project!
Regards,
Navina