Using MetPy to split up testing/training/validation xarray datasets for Machine Learning #3579
Description
What should we add?
Creating testing/training/validation datasets is a key step in machine learning workflows. For climate and weather ML analysis, we usually split these datasets along the time dimension.
Scikit-learn has a function that does this for 2D arrays and pandas DataFrames here. This function can't split xarray datasets.
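For comparison, a chronological split can already be done by hand in plain xarray with label-based slicing along time. A minimal sketch, assuming a synthetic daily dataset whose time dimension is named `time` and arbitrarily chosen date boundaries:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for the leap year 2000 (366 days), standing in for real data
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
ds = xr.Dataset(
    {"temperature": (("time", "station"), np.random.default_rng(0).normal(size=(time.size, 3)))},
    coords={"time": time, "station": ["A", "B", "C"]},
)

# .sel with a slice includes BOTH endpoints, so the date ranges must not overlap
train = ds.sel(time=slice("2000-01-01", "2000-08-31"))
validation = ds.sel(time=slice("2000-09-01", "2000-10-31"))
test = ds.sel(time=slice("2000-11-01", "2000-12-31"))
```

This works, but the user has to pick and keep track of the boundary dates by hand, which is the bookkeeping a dedicated helper would remove.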
Improvements over the scikit-learn implementation:
- Built for xarray datasets
- Can create a validation dataset (a third split) in a single call, instead of calling the function twice
- Splits datasets in a way suited to time series analysis, i.e. into contiguous blocks along time (do not split datasets randomly for time series analysis!)
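The requested behaviour might look roughly like the sketch below. The function name `train_validation_test_split`, its `fractions` parameter, and the 70/15/15 default are hypothetical choices for illustration, not an existing MetPy or scikit-learn API. Each split is a contiguous block along the time dimension, so later data never end up in the training set.

```python
import numpy as np
import pandas as pd
import xarray as xr


def train_validation_test_split(ds, dim="time", fractions=(0.7, 0.15, 0.15)):
    """Split ``ds`` into contiguous training, validation, and test blocks along ``dim``.

    Hypothetical helper for illustration only. Nothing is shuffled:
    the earliest data go to training and the latest to testing.
    """
    if len(fractions) != 3 or any(f < 0 for f in fractions) or not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must be three non-negative numbers summing to 1")
    ds = ds.sortby(dim)  # make sure the data are in chronological order
    n = ds.sizes[dim]
    # Boundary indices from cumulative fractions; rounding (not truncating) keeps
    # products like 69.99999 from becoming an off-by-one boundary
    end_train, end_val = np.round(np.cumsum(fractions[:2]) * n).astype(int)
    train = ds.isel({dim: slice(0, end_train)})
    validation = ds.isel({dim: slice(end_train, end_val)})
    test = ds.isel({dim: slice(end_val, n)})
    return train, validation, test


# Usage with a synthetic 100-day dataset
time = pd.date_range("2020-01-01", periods=100, freq="D")
ds = xr.Dataset(
    {"temperature": ("time", np.linspace(270.0, 290.0, time.size))},
    coords={"time": time},
)
train, validation, test = train_validation_test_split(ds)
```

Returning all three pieces from one call covers the validation-set case directly, and sorting first means an out-of-order input still yields chronological blocks.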
Big questions:
- Where should this go?
- Can we use MetPy's xr.Dataset.metpy.parse_cf() in a smart way to pull the time dimension automagically? This might not be required anyway.
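On the second question: with MetPy, one possible route is `ds = ds.metpy.parse_cf()` followed by `ds["some_variable"].metpy.time`, which identifies the time coordinate from CF metadata, so it depends on the file carrying suitable attributes. As a dependency-free point of comparison, the sketch below uses a simpler heuristic instead: it picks the one dimension whose coordinate has a `datetime64` dtype. The helper `find_time_dim` and the example variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def find_time_dim(ds):
    """Return the one dimension whose coordinate holds datetime64 values.

    Hypothetical helper: a plain-xarray dtype heuristic, not MetPy's CF parsing.
    """
    candidates = [
        dim
        for dim in ds.dims
        if dim in ds.coords and np.issubdtype(ds[dim].dtype, np.datetime64)
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one datetime64 dimension, found {candidates}")
    return candidates[0]


# Usage: the time dimension has a non-standard name
ds = xr.Dataset(
    {"precip": (("valid_time", "lat"), np.zeros((4, 3)))},
    coords={
        "valid_time": pd.date_range("2021-06-01", periods=4, freq="D"),
        "lat": [30.0, 35.0, 40.0],
    },
)
print(find_time_dim(ds))  # valid_time
```

This heuristic misses time axes stored as cftime objects (object dtype), which is one argument for leaning on CF metadata via parse_cf() instead.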
Reference
No response