# Forest Cover Classification

This project classifies forest cover types using cartographic variables. We employ machine learning models such as SVM and XGBoost, with hyperparameter tuning, to predict cover types in the Roosevelt National Forest, Colorado.
## Table of Contents

- Introduction
- Life cycle
- Problem Statement
- Dataset Information
- Project Structure
- Setup
- Usage
- Technologies Used
- Contributing
- License
## Introduction

The Forest Cover Classification project uses machine learning to predict forest cover types based on cartographic features, enhancing our understanding of ecological processes in minimally disturbed wilderness areas.
## Life Cycle

Forest Cover Type Prediction:
- Understanding the Problem Statement
- Data Collection
- Data Cleaning
- Exploratory Data Analysis
- Data Pre-Processing
- Model Training and Hyperparameter Tuning
- Model Selection
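The training-and-tuning step above can be sketched with scikit-learn's `GridSearchCV`. This is a minimal illustration only: it uses synthetic data in place of the real cover-type dataset, and the SVM parameter grid is an assumption, not the project's actual configuration.

```python
# Minimal sketch of the "Model Training and Hyperparameter Tuning" step.
# Synthetic data stands in for the real dataset; the parameter grid is
# illustrative, not the project's actual search space.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=6, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# SVMs are scale-sensitive, so standardization lives inside the pipeline
# and is re-fit on each cross-validation fold.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["rbf", "linear"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))
```

The same pattern applies to XGBoost by swapping the estimator and grid keys.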
## Dataset Information

- The data is from four wilderness areas located in the Roosevelt National Forest of northern Colorado. The wilderness areas are:
  - Rawah Wilderness Area
  - Neota Wilderness Area
  - Comanche Peak Wilderness Area
  - Cache la Poudre Wilderness Area
- The observations are taken from 30 m x 30 m patches of forest, each classified as one of seven cover types:
  - Spruce/Fir
  - Lodgepole Pine
  - Ponderosa Pine
  - Cottonwood/Willow
  - Aspen
  - Douglas-fir
  - Krummholz
- Source: UCI Machine Learning Repository - Forest Cover Type Dataset.
  Blackard, Jock. (1998). Covertype. UCI Machine Learning Repository. https://doi.org/10.24432/C50K5N.
- Region: Roosevelt National Forest, Colorado
- Variables: Cartographic variables only (no remotely sensed data)
- Independent Variables: Derived from USGS and USFS data, including binary columns for qualitative data (wilderness areas and soil types)
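To illustrate those binary qualitative columns: a categorical field is expanded into one 0/1 indicator column per category. The toy column below is hypothetical — the published dataset already ships with wilderness area and soil type in this expanded form — but the encoding is the same idea.

```python
import pandas as pd

# Toy frame with a hypothetical categorical column; the real dataset
# already stores wilderness area and soil type as 0/1 indicator columns.
df = pd.DataFrame(
    {"Wilderness_Area": ["Rawah", "Neota", "Rawah", "Cache_la_Poudre"]}
)

# Expand into one binary column per category, matching the UCI-style encoding.
binary = pd.get_dummies(df["Wilderness_Area"], prefix="Wilderness_Area", dtype=int)
print(binary)
```

Each row has exactly one 1 across the indicator columns, since every patch belongs to a single wilderness area.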
## Project Structure

```text
forest-cover-classification/
│
├── data/
│   ├── raw/                 # Raw data files
│   └── processed/           # Processed data files
│
├── notebooks/               # Jupyter notebooks for exploration and prototyping
│
├── models/                  # Trained models
│
├── src/                     # Source code for model training and evaluation
│   ├── data_processing.py   # Data preprocessing scripts
│   ├── model.py             # Model architecture and training
│   └── evaluation.py        # Evaluation metrics and visualization
│
├── app/                     # Flask application for deployment
│   └── app.py               # Flask API
│
├── .github/workflows/       # GitHub Actions configuration
│   └── ci.yml               # Continuous integration pipeline
│
└── README.md                # Project documentation
```
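The `ci.yml` workflow itself is not shown in this README. As a hedged sketch only — the job name, Python version, and steps below are assumptions, not the project's actual pipeline — a minimal GitHub Actions configuration could look like:

```yaml
# Hypothetical sketch of .github/workflows/ci.yml; job and step
# choices are assumptions, not the project's actual pipeline.
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python src/evaluation.py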
## Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/forest-cover-classification.git
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Train the model:

   ```bash
   python src/model.py
   ```

2. Evaluate the model:

   ```bash
   python src/evaluation.py
   ```

3. Run the Flask application:

   ```bash
   python app/app.py
   ```

4. Access the app at `http://localhost:8080`.
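The contents of `app/app.py` are not shown in this README. As a rough sketch of what a prediction endpoint could look like — the route name, payload format, and placeholder response below are assumptions, not the project's confirmed API:

```python
# Hypothetical sketch of app/app.py; the /predict route and the
# {"features": [...]} payload format are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [elevation, aspect, slope, ...]}.
    payload = request.get_json(force=True)
    features = payload.get("features", [])
    # A real implementation would load the trained model from models/ and
    # call its predict() here; this sketch just echoes the input size.
    return jsonify({"cover_type": None, "n_features": len(features)})
```

Run with `app.run(port=8080)` (or the `flask run` CLI), then POST cartographic features as JSON to `/predict`.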
## Technologies Used

- SVM and XGBoost: For model development and training.
- MLflow: To track and manage machine learning experiments.
- GitHub Actions: For continuous integration and deployment.
- Flask: For building the web application.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the MIT License.