This is a university project for Advanced Databases. We use PySpark (working directly with RDDs) to build three pipelines that read from a PostgreSQL database and from CSV files in order to train a Decision Tree classifier.
- pyspark
- pyspark.mllib: the Machine Learning Library (MLlib), used for the machine learning models
- main.py: Entry point from which you can access the pipelines. No parameters are needed to execute it (see the dispatch sketch after this list).
- utils.py: Imported by every other file. It contains the auxiliary functions we have created and all the necessary imports.
- management.py: Management pipeline. Execute main.py and then select the 'management' option.
- analysis.py: Analysis pipeline. Execute main.py and then select the 'analysis' option.
- runtime.py: Runtime pipeline. Execute main.py and then select the 'runtime' option.
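As a rough, hypothetical sketch (not the project's actual code), main.py could dispatch to the three pipelines like this; the run() entry functions are assumed:

```python
# Hypothetical dispatch logic for main.py; the module names match the
# files listed above, but the run() entry points are an assumption.
import management
import analysis
import runtime

PIPELINES = {
    'management': management.run,
    'analysis': analysis.run,
    'runtime': runtime.run,
}

if __name__ == '__main__':
    option = input("Select a pipeline (management/analysis/runtime): ").strip()
    if option in PIPELINES:
        PIPELINES[option]()
    else:
        print(f"Unknown option: {option!r}")
```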
- See the sketches in the Assumptions.pdf file
- General Pipeline Assumptions:
- The user is connected (or knows how to connect) to the FIB PostgreSQL database (see the connection sketch after this list).
- Sensor data is in CSV files whose names follow the format date-airport-airport-4digits-aircraft.csv (see the parsing sketch after this list).
example: 010615-FUE-TXL-3573-XY-YCV.csv
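A minimal sketch of reading a table from PostgreSQL into an RDD, assuming hypothetical host, database, table, and credentials (the real FIB connection details are not shown here) and that the PostgreSQL JDBC driver is on Spark's classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-sketch").getOrCreate()

# All connection details below are placeholders, not the FIB ones.
df = spark.read.jdbc(
    url="jdbc:postgresql://example-host:5432/exampledb",
    table="some_table",  # table name is an assumption
    properties={
        "user": "your_user",
        "password": "your_password",
        "driver": "org.postgresql.Driver",
    },
)
rdd = df.rdd  # drop down to the RDD API the project works with
```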
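And a sketch of splitting that filename format into its fields; the regex assumes a 6-digit date, 3-letter airport codes, a 4-digit number, and an aircraft registration that may itself contain a hyphen, as in the example above:

```python
import re

# date-airport-airport-4digits-aircraft.csv
FILENAME_RE = re.compile(r'^(\d{6})-([A-Z]{3})-([A-Z]{3})-(\d{4})-(.+)\.csv$')

def parse_sensor_filename(name):
    """Split e.g. '010615-FUE-TXL-3573-XY-YCV.csv' into its fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected filename: {name!r}")
    return match.groups()  # (date, origin, destination, digits, aircraft)

print(parse_sensor_filename('010615-FUE-TXL-3573-XY-YCV.csv'))
# ('010615', 'FUE', 'TXL', '3573', 'XY-YCV')
```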
- Management Pipeline Assumptions:
- All sensor data is located under the './resources/trainingData/' path (see the loading sketch after this list).
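Under that assumption, the management pipeline can pick up every CSV in one call while keeping the filename (and thus the aircraft and date) attached to each record; a minimal illustrative sketch:

```python
from pyspark import SparkContext

sc = SparkContext(appName="management-sketch")

# wholeTextFiles yields (path, content) pairs, so the aircraft/date
# encoded in each filename stays available for later processing.
files = sc.wholeTextFiles('./resources/trainingData/')

# One (path, line) pair per CSV row (the parsing here is illustrative).
rows = files.flatMap(lambda kv: [(kv[0], line) for line in kv[1].splitlines()])
print(rows.take(2))
```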
- Analysis Pipeline Assumptions:
- You have successfully executed the Management Pipeline.
- There is one and only one CSV file for each aircraft-date pair.
- (impurity='gini', maxDepth=5, maxBins=32) are good hyperparameters for the Decision Tree (see the training sketch after this list).
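With those hyperparameters, training via MLlib's RDD-based DecisionTree looks roughly like this; the labels and features below are placeholders, not the project's real sensor data, and the save path is an assumption:

```python
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

sc = SparkContext(appName="analysis-sketch")

# Placeholder training data: a binary label plus two numeric features.
data = sc.parallelize([
    LabeledPoint(0.0, [45.2, 3.1]),
    LabeledPoint(1.0, [61.7, 8.4]),
])

model = DecisionTree.trainClassifier(
    data,
    numClasses=2,
    categoricalFeaturesInfo={},
    impurity='gini',  # hyperparameters from the assumption above
    maxDepth=5,
    maxBins=32,
)
model.save(sc, './model')  # save path is an assumption
```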
- Runtime Pipeline Assumptions:
- You have successfully executed the Analysis Pipeline (see the prediction sketch below).
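That matters because the runtime pipeline reloads whatever the analysis pipeline persisted; a minimal sketch, assuming the model was saved under the hypothetical './model' path used above:

```python
from pyspark import SparkContext
from pyspark.mllib.tree import DecisionTreeModel

sc = SparkContext(appName="runtime-sketch")

# Reload the tree saved by the analysis pipeline (path is an assumption).
model = DecisionTreeModel.load(sc, './model')

# Classify a new, placeholder feature vector.
print(model.predict([52.3, 4.9]))
```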
- Miquel Palet López
- Gonzalo Córdova Pou