Absenteeism is a major problem when managing employees, especially with respect to assigning tasks, meeting deadlines, and achieving team goals at large. If a worker is frequently absent, the workflow will sooner or later be negatively affected unless the worker is replaced. This problem may be prevented during recruitment by developing an effective machine learning model that predicts whether a candidate is likely to be prone to absenteeism. The absenteeism model is a classification model that predicts whether or not a potential employee will be prone to absenteeism.
Data Source: The data was obtained from the Machine Learning Repository.
Classification Techniques Used:
- Logistic Regression (LOR)
- Support Vector Machine (SVM)
- K-Nearest Neighbor (KNN)
- Decision Tree (DT)
- Multi-Layer Perceptron Neural Network (MLP)
Packages Used: The models were built with the scikit-learn (sklearn) package, and their hyperparameters were tuned with its GridSearchCV class. After training, an average validation score and a test score were obtained for each model; in both cases the score is the model's accuracy. The best model was taken to be the one with the highest average validation score. Other tools used are pandas and NumPy.
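The tuning and selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data from make_classification stands in for the absenteeism dataset, and the parameter grids, the 80/20 split, and cv=5 are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the absenteeism data (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=10, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

# Hypothetical, deliberately small grids; the grids actually searched
# in the project are not stated in this document.
candidates = {
    "LOR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "SVM": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "DT": (DecisionTreeClassifier(random_state=123), {"max_depth": [3, 5, 7]}),
    "MLP": (
        MLPClassifier(max_iter=1000, random_state=123),
        {"hidden_layer_sizes": [(10,), (20,)]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated grid search, scored by accuracy.
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=5)
    search.fit(X_train, y_train)
    results[name] = {
        # Mean cross-validated accuracy of the best parameter setting.
        "validation": search.best_score_,
        # Accuracy of the refitted best estimator on the held-out test set.
        "test": search.score(X_test, y_test),
        "params": search.best_params_,
    }

# Select the model with the highest average validation score.
best_name = max(results, key=lambda n: results[n]["validation"])
print(best_name, results[best_name])
```

Selecting on the validation score rather than the test score keeps the test set as an independent check of the chosen model.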
Evaluation Metric: The accuracy score was used to evaluate the models.
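Accuracy is the fraction of predictions that match the true labels. The short sketch below computes it by hand on made-up labels (the label values are illustrative only, not taken from the dataset); sklearn.metrics.accuracy_score returns the same fraction with its default settings.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions equal to the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels: 1 = prone to absenteeism, 0 = not (assumed encoding).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```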
Best Model: The higher the accuracy score, the better the model. The Decision Tree model, DecisionTreeClassifier(max_depth=7, max_features=3, max_leaf_nodes=14, random_state=123), was found to be the best model, since it had the highest accuracy score.
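For reference, a decision tree with the hyperparameters reported above can be instantiated and scored as in the sketch below. The data here is a synthetic placeholder (an assumption, since the real dataset is not included), so the printed accuracy will not match the project's result.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

# Hyperparameters as reported for the best model.
best_tree = DecisionTreeClassifier(
    max_depth=7, max_features=3, max_leaf_nodes=14, random_state=123
)
best_tree.fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
test_accuracy = best_tree.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Here max_depth and max_leaf_nodes cap the size of the tree, which limits overfitting, while max_features=3 restricts each split to a random subset of three features.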