Preparing training and test sets, cross-validation, and hyperparameter selection, illustrated with the k-nearest neighbors method.
The data set is the Pima Indians Diabetes Database, used to predict diabetes in women: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
- Using the train_test_split function to split the data into training and test sets.
- Training a k-nearest neighbors model with an arbitrarily chosen hyperparameter K and evaluating its quality.
- Selecting the hyperparameter K using GridSearchCV, RandomizedSearchCV, and cross-validation.
- Evaluating the quality of the optimal model.
- Comparing the quality metrics of the initial and optimal models.
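
The steps above can be sketched in scikit-learn roughly as follows. This is a minimal sketch under several assumptions that are not stated in the outline: the helper names `make_knn` and `run_knn_experiment` are hypothetical, accuracy is used as the quality metric, features are standardized before the distance computation, the test share is 25%, and K is searched over 1..30 with 5-fold cross-validation.

```python
from scipy.stats import randint
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_knn(k):
    """Hypothetical helper: standardize features, then classify with K neighbors."""
    return Pipeline([
        ("scale", StandardScaler()),  # assumption: kNN distances benefit from scaling
        ("knn", KNeighborsClassifier(n_neighbors=k)),
    ])


def run_knn_experiment(X, y, initial_k=5, max_k=30, seed=42):
    """Hypothetical helper that walks through the listed steps and returns the metrics."""
    # Step 1: hold out a stratified test set (25% is an assumed split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Step 2: baseline model with an arbitrarily chosen K.
    baseline = make_knn(initial_k).fit(X_train, y_train)
    baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

    # Step 3a: exhaustive search over K with 5-fold CV on the training set only.
    grid = GridSearchCV(
        make_knn(initial_k),
        param_grid={"knn__n_neighbors": list(range(1, max_k + 1))},
        cv=5,
        scoring="accuracy",
    )
    grid.fit(X_train, y_train)

    # Step 3b: randomized search that samples 10 values of K from the same range.
    rand = RandomizedSearchCV(
        make_knn(initial_k),
        param_distributions={"knn__n_neighbors": randint(1, max_k + 1)},
        n_iter=10,
        cv=5,
        scoring="accuracy",
        random_state=seed,
    )
    rand.fit(X_train, y_train)

    # Step 4: evaluate the grid-search winner (refit on the full training set) on the test set.
    best_acc = accuracy_score(y_test, grid.best_estimator_.predict(X_test))

    # Step 5: return both metrics so the initial and tuned models can be compared.
    return {
        "baseline_k": initial_k,
        "baseline_accuracy": baseline_acc,
        "grid_best_k": int(grid.best_params_["knn__n_neighbors"]),
        "random_best_k": int(rand.best_params_["knn__n_neighbors"]),
        "grid_cv_accuracy": float(grid.best_score_),
        "best_accuracy": best_acc,
    }
```

To apply this to the diabetes data, one option is to read the downloaded CSV with pandas and pass the feature columns as `X` and the label column as `y`. This assumes the Kaggle file is saved locally as `diabetes.csv` and that its label column is named `Outcome`. The search runs on the training split only, so the test set stays untouched until the final comparison.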