Welcome to the MAANG-Stock-Prediction-MLOps repository. This project aims to provide an automated pipeline for forecasting the stock prices of the MAANG companies (Meta, Apple, Amazon, Netflix, and Google) using machine learning and MLOps practices powered by ZenML.
The world of stocks is ever-evolving, and forecasting stock prices can give stakeholders useful insights. This repository takes data from Kaggle and combines machine learning, ZenML pipelines, and a Streamlit frontend for inference to predict MAANG stock prices.
- Three main artifacts in ML-based software: Data, Model, and Code.
- Three main phases: Data Engineering, ML Model Engineering, and Code Engineering.
- Data Engineering is the process of collecting, cleaning, and transforming data into a usable format for building ML models.
- Data Engineering is often the most time-consuming part of the ML workflow.
- Data Ingestion: Collecting data from various sources.
- Data Validation: Validating the data to ensure that it is in the correct format.
- Data Transformation: Transforming the data into a usable format for ML models.
- Data Splitting: Splitting the data into training, validation, and testing sets.
- Data Versioning: Versioning the data to keep track of changes.
- Data Analysis: Analyzing the data to gain insights.
- ML Model Engineering is the process of building ML models.
- The objective of ML Model Engineering is to build a model that can make accurate predictions on unseen data.
- Feature Engineering: Extracting features from the data.
- Model Training: Training the model on the training data.
- Model Evaluation: Evaluating the model on the validation data.
- Model Selection: Selecting the best model based on the evaluation metrics.
- Model Versioning: Versioning the model to keep track of changes.
- Model Packaging: Packaging the model to make it usable in production.
- Code Engineering is the process of building a production-ready codebase for the ML model.
- Code Engineering is often the most overlooked part of the ML workflow.
- Code Packaging: Packaging the code to make it usable in production.
- Code Testing: Testing the code to ensure that it works as expected.
- Code Deployment: Deploying the code to a production environment.
- Code Monitoring: Monitoring the deployed code to ensure that it keeps working as expected in production.
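
To make the data and model engineering steps listed above more concrete, here is a minimal sketch in plain Python. It is not this repository's implementation: the function names, the synthetic price series, and the two baseline "models" are all assumptions chosen for illustration. The baselines have no parameters to fit, so the training step is omitted and only validation, splitting, feature engineering, evaluation, and selection are shown.

```python
from statistics import mean


def validate(prices):
    """Data validation: keep only positive prices (a stand-in for real checks)."""
    return [p for p in prices if p > 0]


def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Data splitting: cut by time order so later prices never leak into training."""
    n = len(series)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]


def make_lag_features(series, n_lags=3):
    """Feature engineering: pair each price with the n_lags prices before it."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def last_value(window):
    """Baseline: predict that the next price equals the most recent one."""
    return window[-1]


def moving_average(window):
    """Baseline: predict the mean of the lag window."""
    return mean(window)


def mean_absolute_error(model, X, y):
    """Model evaluation: average absolute gap between prediction and target."""
    return mean(abs(model(x) - target) for x, target in zip(X, y))


# Synthetic zigzag series with an upward trend (not real market data).
prices = [100.0 + 2 * i + (3 if i % 2 else 0) for i in range(30)]
raw = prices[:10] + [-1.0] + prices[10:]  # inject one invalid record

clean = validate(raw)
train, val, test = chronological_split(clean)
X_val, y_val = make_lag_features(val)

# Model selection: pick the candidate with the lowest validation error.
candidates = {"last_value": last_value, "moving_average": moving_average}
scores = {name: mean_absolute_error(m, X_val, y_val) for name, m in candidates.items()}
best_name = min(scores, key=scores.get)
print(scores, best_name)
```

The split is chronological rather than random because shuffling time-series data would let the model see future prices during training.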
A machine learning pipeline is simply an extension of model building: it includes the other steps you would do before or after building a model, like data acquisition, preprocessing, model deployment, or monitoring. The ML pipeline essentially defines a step-by-step procedure for your work as an ML practitioner.
Defining an ML pipeline explicitly in code is useful because:
- We can easily rerun all of the steps in the pipeline, which reduces bugs and makes our models reproducible.
- Data and models can be versioned, making it easy to track changes and revert to previous versions.
- We can automate many operational tasks, such as retraining models and redeploying or rolling out new and improved models with CI/CD workflows.
- We can easily share our work with others, making it easy to collaborate and get feedback.
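
As a rough illustration of what "defining a pipeline in code" means, the sketch below chains a few steps into one function that can be rerun with a single call. It deliberately uses plain Python rather than ZenML's API, and every name in it (`ingest`, `preprocess`, `train`, `predict`, `run_pipeline`) as well as the synthetic data is hypothetical, not taken from this repository.

```python
def ingest():
    """Stand-in for data acquisition; returns synthetic closing prices."""
    return [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]


def preprocess(prices):
    """Drop non-positive values that would indicate bad records."""
    return [p for p in prices if p > 0]


def train(prices):
    """Fit a trivial 'model': the average day-over-day price change."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    return {"avg_change": sum(changes) / len(changes)}


def predict(model, last_price):
    """Forecast the next price by adding the average change to the last price."""
    return last_price + model["avg_change"]


def run_pipeline():
    """Run every step in a fixed order so the whole workflow is repeatable."""
    prices = preprocess(ingest())
    model = train(prices)
    return predict(model, prices[-1])


forecast = run_pipeline()
print(forecast)
```

Because each step is an ordinary function with explicit inputs and outputs, the same structure can be tested step by step, rerun end to end, and handed to an orchestration framework later.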