Welcome to my portfolio! This repository showcases projects demonstrating my skills and experience in data science and various machine-learning techniques and applications.
This project demonstrates the process of predicting customer churn using machine learning techniques. Customer churn is when customers stop doing business with a company.
Customer churn, also called customer attrition, is a critical metric for many businesses, and predicting it lets a company take proactive steps to retain customers. I started with Logistic Regression as the baseline model because it is easy to implement and interpret; it reached an accuracy of 79.29% in predicting customer churn. I then improved this model with feature engineering and hyperparameter tuning, and experimented with more advanced algorithms, each with its own hyperparameter tuning. Finally, I compared the metrics of all models against the Logistic Regression baseline and selected the best one, XGBClassifier.
- Data preprocessing
- Feature engineering
- Implementation of logistic regression using scikit-learn
- Model evaluation
- Hyperparameter Tuning
- Decision Tree
- XGBoost
- Support Vector Machines
The best model is XGBClassifier, with an accuracy of 95.737% and a precision of 0.91 for the True class on the independent test set.
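The baseline-then-compare workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (generated with `make_classification`), a tuned Decision Tree stands in for the more advanced models, and the parameter grid and the `evaluate` helper are assumptions made for the example. The numbers it prints are unrelated to the results reported for this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a churn table (assumption: not the project's data).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def evaluate(model):
    """Hypothetical helper: score a fitted model on the held-out test set."""
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        # Precision for the positive (churn) class, labeled 1 in this toy data.
        "precision_churn": precision_score(y_test, pred, pos_label=1, zero_division=0),
    }


# Baseline: scaled features fed into Logistic Regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

# Candidate model: Decision Tree tuned with a small, illustrative grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5, 20]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# Compare every model against the baseline on the same test set.
results = {
    "logistic_regression": evaluate(baseline),
    "decision_tree_tuned": evaluate(search.best_estimator_),
}
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, "
          f"precision_churn={metrics['precision_churn']:.3f}")
```

Scoring every candidate on the same held-out split is what makes the comparison against the baseline meaningful; other models could be added to `results` in the same way.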
- Python
- Pandas and NumPy for data manipulation
- Scikit-learn for model building and evaluation
- Matplotlib and Seaborn for data visualization
- Imblearn (imbalanced-learn) for handling class imbalance
- XGBoost
Project 2: Dashboard for Sentiment Analysis of Tweets About Major NASDAQ-Listed Companies in 2019
This project uses the dataset "Tweets about the Top Companies from 2015 to 2020", which features tweets related to major NASDAQ-listed companies, restricted to tweets posted between 01-01-2019 and 31-12-2019. Sentiment analysis was performed using the NLTK library, along with a financial dictionary to better capture the nuances of financial terms. Engagement metrics were computed based on the number of likes, retweets, and comments each tweet received. You can find the preprocessing steps and code on the project’s GitHub repository.
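As a rough illustration of the engagement step, the sketch below computes a per-tweet engagement score and aggregates it by company with Pandas. The column names (`ticker`, `like_num`, `retweet_num`, `comment_num`), the sample rows, and the choice of an unweighted sum are all assumptions made for this example; the project's actual formula may differ.

```python
import pandas as pd

# Made-up rows with hypothetical column names (not the real dataset).
tweets = pd.DataFrame(
    {
        "ticker": ["AAPL", "AAPL", "TSLA", "MSFT"],
        "like_num": [10, 0, 5, 2],
        "retweet_num": [3, 1, 0, 0],
        "comment_num": [2, 0, 1, 0],
    }
)

# Assumption: engagement is the unweighted sum of likes, retweets, and comments.
tweets["engagement"] = tweets[["like_num", "retweet_num", "comment_num"]].sum(axis=1)

# Total and average engagement per company, most engaging company first.
engagement_by_company = (
    tweets.groupby("ticker")["engagement"]
    .agg(["sum", "mean"])
    .sort_values("sum", ascending=False)
)

# The most engaging tweets overall (at most five rows).
top_tweets = tweets.nlargest(5, "engagement")

print(engagement_by_company)
print(top_tweets[["ticker", "engagement"]])
```

A weighted score (for example, counting retweets more heavily than likes) would only change the line that defines `engagement`.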
- Number of tweets by company plot (Histogram, pie chart, timeline)
- Engagement plot (Histogram, pie chart, timeline)
- Detailed analysis for a selected company:
  - Number of tweets by sentiment for the selected company (histogram, pie chart, timeline)
  - Number of tweets by sentiment vs. stock price for the selected company
  - Word cloud by sentiment for the selected company
  - Random tweet about the selected company by sentiment
  - Top 5 most engaging tweets about the selected company by sentiment
- Python
- Pandas for data manipulation
- Streamlit
- NLTK library
- Recommender System for online store using deep learning and content-based filtering
- Product classification by Image with Convolutional Neural Networks
- ...
I am a Backend Developer with 6 years of experience in web development and e-commerce with Magento 2, using PHP, SQL, and JS. I participated in creating and maintaining some of the most popular Magento extensions, used in more than 40,000 stores worldwide.
My Computer Science degree from Belarusian State University gave me a solid foundation in programming, algorithms, and software engineering principles. I am now expanding my skill set into Data Science and am currently studying Machine Learning, with a focus on Deep Learning. This knowledge will allow me to combine data science and machine learning with backend development to create smarter, more efficient applications.
- LinkedIn: Maryia Snarava
- Email: snaravam@gmail.com
I am looking for new challenges and collaboration. If you're seeking a backend developer who combines technical expertise with a passion for innovation, feel free to contact me via direct message or by email.