update kats 204

facebookresearch · iamxiaodong · May 6, 2021 · Apr 25, 2021 · Apr 29, 2021 · Apr 30, 2021
commit 48baf39cc3b8d781fbdd390b8496acf16599a7aa
diff --git a/tutorials/kats_204.ipynb b/tutorials/kats_204.ipynb
@@ -0,0 +1,134 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  },
+  "orig_nbformat": 2,
+  "kernelspec": {
+   "name": "python383jvsc74a57bd05b6e8fba36db23bc4d54e0302cd75fdd75c29d9edcbab68d6cfc74e7e4b30305",
+   "display_name": "Python 3.8.3 64-bit"
+  },
+  "metadata": {
+   "interpreter": {
+    "hash": "5b6e8fba36db23bc4d54e0302cd75fdd75c29d9edcbab68d6cfc74e7e4b30305"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "source": [
+    "<h1><center>Kats 204: Forecasting with Meta-Learning</center></h1>"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "source": [
+    "We propose a meta-learning framework that treats forecasting predictability, model selection, and hyper-parameter tuning (HPT) as supervised learning problems, which yields accurate forecasts with less computation time and fewer resources.\n",
+    "\n",
+    "It first classifies whether a time series is predictable (i.e., whether a good forecast can be obtained without much human engineering effort). It then uses a classification algorithm to predict the best-performing model from features extracted from the given time series, and a multi-task neural network to predict the best hyper-parameters for a given model and time series.\n",
+    "\n",
+    "The meta-learning framework contains four steps:\n",
+    "\n",
+    "1. Meta-data collection. Exhaustively tune each candidate model to obtain the best-performing model for a given time series, along with the best hyper-parameters for each model and data combination.\n",
+    "2. Predict whether a time series is predictable. This is an examination step that tells the user whether the target time series can be fitted easily by a single model or needs more careful human engineering effort.\n",
+    "\n",
+    "3. Predict forecasting model for the target time series.\n",
+    "\n",
+    "4. Predict hyper-parameters for the target time series.\n",
+    "\n",
+    "Kats provides APIs for all these steps.\n",
+    "\n",
+    "\n"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "source": [
+    "## 1. **Meta-data collection**\n",
+    "\n",
+    "Extract the meta-data of a time series, which includes:\n",
+    "\n",
+    "1. the best hyper-parameters and the corresponding error metrics of the 6 candidate models after hyper-parameter tuning;\n",
+    "\n",
+    "2. 40 time series features; \n",
+    "\n",
+    "3. the search method used for hyper-parameter tuning (e.g., random search, grid search, or Bayesian optimization);\n",
+    "\n",
+    "4. the error metric used to evaluate hyper-parameters (MAE is the default; MAPE is recommended).\n",
+    "\n",
+    "Parameters of the GetMetaData class:\n",
+    "* **data**: TimeSeriesData\n",
+    "* **all_models**: Dict\\[str, m.Model], a dictionary that includes all candidate models.\n",
+    "* **all_params**: Dict\\[str, Params], a dictionary that includes all candidate hyper-params corresponding to all candidate models.\n",
+    "* **min_length**: int, minimum length of the time series. Time series that are too short are excluded.\n",
+    "* **scale**: bool, whether to scale the time series to obtain a more comparable feature vector. If true, each value is divided by the maximum value of the series.\n",
+    "* **method**: SearchMethodEnum, search method for hyper-parameter tuning.\n",
+    "* **executor**: Any, a callable parallel executor used to tune the candidate models in parallel. The default executor is a native implementation based on Python's multiprocessing.\n",
+    "* **error_method**: str, type of error metric. Supported values are mape, smape, mae, mase, mse, and rmse.\n",
+    "* **num_trials**: optional, Number of trials in RandomSearch.\n",
+    "* **num_arm**: optional, Number of arms in RandomSearch."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "source": [
+    "### 1.1 Collect meta-data from a time series"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "error",
+     "ename": "ModuleNotFoundError",
+     "evalue": "No module named 'fbprophet'",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-1-2f7063eddc3b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconsts\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mTimeSeriesData\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodels\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetalearner\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_metadata\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mGetMetaData\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      7\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      8\u001b[0m \u001b[0;31m#load data and transform it into TimeSeriesData\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m~/fb_internal/Kats/kats/models/metalearner/get_metadata.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     24\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     25\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconsts\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mParams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mSearchMethodEnum\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTimeSeriesData\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 26\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodels\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0marima\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mholtwinters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprophet\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msarima\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstlf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtheta\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     27\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtsfeatures\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtsfeatures\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mTsFeatures\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     28\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m~/fb_internal/Kats/kats/models/prophet.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     10\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodels\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mm\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     11\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mfbprophet\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mProphet\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     13\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mkats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconsts\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mParams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTimeSeriesData\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     14\u001b[0m from kats.utils.parameter_tuning_utils import (\n",
+      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'fbprophet'"
+     ]
+    }
+   ],
+   "source": [
+    "import pandas as pd\n",
+    "import sys\n",
+    "sys.path.append(\"../\")\n",
+    "\n",
+    "from kats.consts import TimeSeriesData\n",
+    "from kats.models.metalearner.get_metadata import GetMetaData\n",
+    "\n",
+    "#load data and transform it into TimeSeriesData\n",
+    "data = pd.read_csv(\"../data/air_passengers.csv\")\n",
+    "data.columns = [\"time\", \"y\"]\n",
+    "TSdata = TimeSeriesData(data)\n",
+    "\n",
+    "# create an object MD of class GetMetaData\n",
+    "MD = GetMetaData(data=TSdata, error_method='mape')\n",
+    "\n",
+    "# get meta data, as well as search method and type of error metric\n",
+    "my_meta_data = MD.get_meta_data()"
+   ]
+  }
+ ]
+}