Welcome to my GitHub profile! I am a passionate Data Scientist and Data Engineer with over 4 years of professional experience across healthcare, manufacturing, and fintech. I focus on building scalable data pipelines, applying machine learning algorithms, and solving complex problems with data-driven insights.
- Languages: Python (PySpark), R, Java, C, C++, Scala, HiveQL
- Data Engineering: Docker, Jenkins, Airflow, Pentaho-ETL, Kafka, Databricks, Informatica
- Machine Learning & AI: PyTorch, TensorFlow, CNN, RNN, LSTM, GNN, Random Forest, Decision Trees
- Cloud Services: AWS (S3, Glue, SageMaker, Redshift), Azure Databricks, Kubernetes, IBM Watson
- Databases: Snowflake, DynamoDB, Redshift, Hadoop, PostgreSQL, MySQL, BigQuery, HBase
- Other Tools: Git, Linux, Amazon EMR (Elastic MapReduce), lakehouse architecture
- Algorithms & Techniques: PCA, SVM, CNN, RNN, LSTM, DBN, NAS, Unsupervised NLP, DQN
Fine-tuned an ImageNet-pretrained EfficientNet to predict the severity of pulmonary fibrosis from clinical data and DICOM imaging.
- Tech Stack: Python, TensorFlow, ImageNet, DICOM
- Key Features: Ensemble learning, medical data prediction, early intervention strategies
- Outcome: Achieved 68% accuracy in severity prediction, facilitating timely medical decisions
Developed a CNN-based model to detect defects in semiconductor wafers, improving manufacturing efficiency.
- Tech Stack: Python, CNN, TensorFlow, AWS EC2
- Key Features: Defect detection, semiconductor manufacturing, pattern recognition
- Outcome: Achieved 94% accuracy in defect detection, reducing fabrication errors significantly
Created a synthetic data generation pipeline using Unity3D and GANs to enhance autonomous systems' training datasets.
- Tech Stack: Unity3D, Blender, GANs, AWS EC2
- Key Features: High-fidelity synthetic data, scaling data production for real-world scenarios
- Outcome: Improved model accuracy by addressing edge cases, enhancing autonomous system performance
- Improved fraud detection by 35% using Amazon Rekognition and restructured the mobile app architecture.
- Boosted analytical insights by 30% with a HIPAA-compliant Snowflake DB architecture.
- Led cross-functional teams, improving application security by 12%.
- Drove 37% YoY growth by reengineering ETL pipelines and implementing dynamic pricing models.
- Transitioned ETL processes to Informatica, boosting SQL efficiency by 95% and reducing B2B costs by 20%.
- Increased pharmaceutical sales by 35% through planogram optimization and predictive modeling.
- Improved real-time data retrieval by 30% for the Google Places API (Nearby Search), serving over 1,000 daily searches.
- Expanded application reach by 50% to 95 hospitals, improving healthcare data access and interaction.
- Exploring MLOps and improving proficiency in Kubernetes for large-scale machine learning pipelines.
- Working on an AI-powered fitness tracker with integrated community engagement and meal recognition.
- Enhancing skills in Generative AI for autonomous system performance and NLP tasks.
- Email: patel.shrey4@northeastern.edu
- LinkedIn: Shrey Patel
- GitHub: GitHub Profile
- Website: Portfolio
I love applying data science techniques to everyday problems like optimizing travel routes or analyzing personal fitness data. Also, I’m a huge fan of photography and often combine it with data visualization!