💡 Expertise:
In my journey as a Big Data Engineer, I have honed my skills in:
🔹 Big Data Technologies: I have a strong command of Hadoop, Spark, and their ecosystems. I specialize in building scalable data pipelines, processing large datasets, and tuning jobs for efficient data processing.
🔹 Programming Languages: I am proficient in Python and SQL, and I work with Spark through PySpark, using them to develop data-centric applications, perform data analysis, and build machine learning models.
🔹 Data Warehousing: I have hands-on experience with data warehousing principles, including data modeling, ETL (Extract, Transform, Load) processes, and dimensional modeling. I am well-versed in designing and implementing data warehouses for improved data accessibility and reporting.
🔹 Database Management: I have a strong grasp of SQL and have worked extensively with both relational databases (MySQL, PostgreSQL) and NoSQL databases (MongoDB, Cassandra). I excel at writing complex queries, optimizing database performance, and ensuring data integrity.
🔹 Cloud Platforms: I am adept at working in cloud-based environments, particularly AWS and Azure.
🔹 Data Visualization: I have a keen eye for visualizing data insights and communicating complex findings to stakeholders. I am skilled in using tools like Tableau and Power BI to create intuitive dashboards and reports.
🧑‍💻 Programming Languages:
Python | SQL | Spark
⚙️ Distributed Frameworks:
Spark | Hadoop | Hive | Kafka | Sqoop
💾 Databases:
MySQL | MongoDB | Cassandra | HBase
🧬 Version Control:
Git | DVC
⏰ Workflow Management:
Airflow | Mage
☁️ AWS Services:
S3 | EC2 | EMR | RDS | Redshift | Glue | CloudWatch | ECS
☁️ Azure Services:
Data Factory | Databricks | Functions | Blob | Synapse | Delta Lake
🚀 MLOps:
Docker | Docker Compose | GitHub Actions | MLflow
💪 ML Frameworks:
Pandas | NumPy | scikit-learn | PySpark | PyTorch | Matplotlib | Seaborn | TFX