Using MetPy to split up testing/training/validation xarray datasets for Machine Learning

### What should we add?

Creating testing/training/validation datasets is a key step in machine learning [workflows](https://machinelearningmastery.com/difference-test-validation-datasets/). For climate and weather ML analysis, we usually split these datasets along a time dimension.

Scikit-learn provides [`train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), which does this for 2D arrays and pandas DataFrames. This function can't split xarray datasets.

Improvements on the scikit-learn implementation:
1. Built for xarray datasets
2. Can create a validation dataset (a third dataset) in a single call, instead of calling the split function twice
3. Can split datasets in a way that suits time series analysis (do not split datasets randomly for time series analysis!)
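To make the request concrete, here is a minimal sketch of what such a function might look like. The name `split_by_time`, its parameters, and the default 70/15/15 fractions are all assumptions for illustration; nothing like this exists in MetPy, xarray, or scikit-learn today. It splits into contiguous, chronologically ordered blocks rather than sampling randomly.

```python
import numpy as np
import pandas as pd
import xarray as xr


def split_by_time(ds, time_dim="time", fractions=(0.7, 0.15, 0.15)):
    """Split ``ds`` into contiguous train/validation/test blocks along ``time_dim``.

    Hypothetical helper, not part of MetPy, xarray, or scikit-learn.
    """
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    # Sort so the earliest samples go to training and the latest to testing.
    ds = ds.sortby(time_dim)
    n = ds.sizes[time_dim]
    train_end = int(round(n * fractions[0]))
    val_end = train_end + int(round(n * fractions[1]))
    train = ds.isel({time_dim: slice(0, train_end)})
    val = ds.isel({time_dim: slice(train_end, val_end)})
    test = ds.isel({time_dim: slice(val_end, None)})
    return train, val, test


# Small synthetic dataset: 100 daily time steps, 4 grid points.
times = pd.date_range("2020-01-01", periods=100, freq="D")
ds = xr.Dataset(
    {"temperature": (("time", "x"), np.random.default_rng(0).normal(size=(100, 4)))},
    coords={"time": times, "x": np.arange(4)},
)
train, val, test = split_by_time(ds)
```

Because the blocks are contiguous, every validation sample is later than every training sample, and every testing sample is later than every validation sample. A real implementation would probably also want to accept explicit date boundaries instead of fractions.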

Big questions:
1. Where should this go? 
2. Can we use MetPy's `Dataset.metpy.parse_cf()` in a smart way to pull the time dimension automagically? This might not be required anyway.
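On the second question, one possible fallback that does not depend on `parse_cf()` is to inspect the dimension coordinates directly. The sketch below uses a hypothetical `guess_time_dim` helper (my own name, not an existing API) that looks for CF-style `standard_name`/`axis` attributes first and then for a `datetime64` dtype. It would miss time axes stored as `cftime` objects, so MetPy's own CF parsing may still be the better route.

```python
import numpy as np
import pandas as pd
import xarray as xr


def guess_time_dim(ds):
    """Return the name of the first dimension coordinate that looks like time.

    Hypothetical helper: checks CF-style attributes first, then the dtype.
    """
    for name in ds.sizes:
        if name not in ds.coords:
            continue  # dimension without a coordinate variable
        coord = ds.coords[name]
        if coord.attrs.get("standard_name") == "time" or coord.attrs.get("axis") == "T":
            return name
        if np.issubdtype(coord.dtype, np.datetime64):
            return name
    raise ValueError("no time-like dimension coordinate found")


# The time axis is deliberately not called "time" here.
example = xr.Dataset(
    {"pressure": (("x", "valid_time"), np.zeros((3, 5)))},
    coords={
        "x": np.arange(3),
        "valid_time": pd.date_range("2021-06-01", periods=5, freq="h"),
    },
)
time_dim = guess_time_dim(example)
```

The returned name could then be passed to whatever splitting function is added, with an explicit argument still available for datasets where the guess fails.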

### Reference

_No response_

Using MetPy to split up testing/training/validation xarray datasets for Machine Learning #3579