new notebooks for alphalens tut

parrondo · Oct 25, 2018 · 0900f6f
1 parent ec7bc78
Showing 4 changed files with 813 additions and 0 deletions.
diff --git a/notebooks/tutorials/3_alphalens_lesson_2/notebook.ipynb b/notebooks/tutorials/3_alphalens_lesson_2/notebook.ipynb
@@ -0,0 +1,179 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Companion notebook for Alphalens tutorial lesson 2\n",
+    "\n",
+    "# Creating tear sheets with Alphalens\n",
+    "\n",
+    "In the previous lesson, you learned what Alphalens is. In this lesson, you will learn a four-step process for using it:\n",
+    "\n",
+    "1. Express an alpha factor and define a trading universe by creating and running a Pipeline over a certain time period.\n",
+    "2. Query pricing data for the assets in our universe during that same time period with `get_pricing()`.\n",
+    "3. Align the alpha factor data with the pricing data with `get_clean_factor_and_forward_returns()`.\n",
+    "4. Visualize how well our alpha factor predicts future price movements with `create_full_tear_sheet()`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Build And Run A Pipeline\n",
+    "Execute the following code to express an alpha factor based on asset growth, then run it with `run_pipeline()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from quantopian.pipeline.data import factset \n",
+    "\n",
+    "from quantopian.pipeline import Pipeline\n",
+    "from quantopian.research import run_pipeline\n",
+    "from quantopian.pipeline.filters import QTradableStocksUS\n",
+    "\n",
+    "def make_pipeline():\n",
+    "    \n",
+    "    # Measures a company's asset growth rate.\n",
+    "    asset_growth = factset.Fundamentals.assets_gr_qf.latest \n",
+    "    \n",
+    "    return Pipeline(\n",
+    "        columns={'Asset Growth': asset_growth},\n",
+    "        screen=QTradableStocksUS() & asset_growth.notnull()\n",
+    "    )\n",
+    "\n",
+    "factor_data = run_pipeline(pipeline=make_pipeline(), start_date='2014-1-1', end_date='2016-1-1')\n",
+    "\n",
+    "# Show the first 5 rows of factor_data\n",
+    "factor_data.head(5) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Query Pricing Data\n",
+    "\n",
+    "Now that we have factor data, let's get pricing data for the same time period. `get_pricing()` returns pricing data for a list of assets over a specified time period. It requires four arguments:\n",
+    "- A list of assets for which we want pricing.\n",
+    "- A start date.\n",
+    "- An end date.\n",
+    "- Whether to use open, high, low, or close pricing.\n",
+    "\n",
+    "Execute the following cell to get pricing data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pricing_data = get_pricing(\n",
+    "    symbols=factor_data.index.levels[1], # Finds all assets that appear at least once in \"factor_data\"  \n",
+    "    start_date='2014-1-1',\n",
+    "    end_date='2016-2-1', # must be after run_pipeline()'s end date. Explained more in lesson 4\n",
+    "    fields='open_price' # Generally, you should use open pricing. Explained more in lesson 4\n",
+    ")\n",
+    "\n",
+    "# Show the first 5 rows of pricing_data\n",
+    "pricing_data.head(5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Align Data\n",
+    "\n",
+    "`get_clean_factor_and_forward_returns()` aligns factor data from a Pipeline with pricing data from `get_pricing()`, and returns an object suitable for analysis with Alphalens' charting functions. It requires two arguments:\n",
+    "- The factor data we created with `run_pipeline()`.\n",
+    "- The pricing data we created with `get_pricing()`.\n",
+    "\n",
+    "Execute the following cell to align the factor data with the pricing data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "from alphalens.utils import get_clean_factor_and_forward_returns\n",
+    "\n",
+    "merged_data = get_clean_factor_and_forward_returns(\n",
+    "    factor=factor_data, \n",
+    "    prices=pricing_data\n",
+    ")\n",
+    "\n",
+    "# Show the first 5 rows of merged_data\n",
+    "merged_data.head(5) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Visualize Results\n",
+    "\n",
+    "Finally, execute the following cell to pass the output of `get_clean_factor_and_forward_returns()` to a function called `create_full_tear_sheet()`. This will create what's known as a tear sheet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "from alphalens.tears import create_full_tear_sheet\n",
+    "\n",
+    "create_full_tear_sheet(merged_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## That's It!\n",
+    "\n",
+    "In the next lesson, we will show you how to interpret the charts produced by `create_full_tear_sheet()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
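The comments in the lesson 2 pricing cell say that the `get_pricing()` end date must be after the `run_pipeline()` end date. A minimal sketch of why, using plain Python lists instead of Quantopian data: a forward return on a given day needs a price from a later day, so the last factor dates have nothing to align with unless pricing extends past them. The function name `forward_returns` and the sample prices are hypothetical, and this is not how Alphalens implements the alignment.

```python
def forward_returns(prices, periods):
    """Return the `periods`-step forward return for each position in `prices`.

    Positions that do not have a price `periods` steps ahead get None.
    (Hypothetical helper for illustration only.)
    """
    result = []
    for i, price in enumerate(prices):
        j = i + periods
        if j < len(prices):
            # Simple return from buying at prices[i] and selling at prices[j].
            result.append(prices[j] / price - 1.0)
        else:
            # No future price available, so the return is undefined.
            result.append(None)
    return result


# Made-up daily open prices for a single asset.
open_prices = [100.0, 102.0, 101.0, 105.0, 110.0]

one_day = forward_returns(open_prices, 1)
two_day = forward_returns(open_prices, 2)

print(one_day)
print(two_day)
```

The trailing `None` entries are the dates that cannot be evaluated. Querying prices for an extra month after the factor's end date, as the lesson 2 cell does, gives those final dates a future price to be compared against.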
diff --git a/notebooks/tutorials/3_alphalens_lesson_3/notebook.ipynb b/notebooks/tutorials/3_alphalens_lesson_3/notebook.ipynb
@@ -0,0 +1,210 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Companion notebook for Alphalens tutorial lesson 3\n",
+    "\n",
+    "# Interpreting Alphalens Tear Sheets\n",
+    "\n",
+    "In the previous lesson, you learned how to query and process data so that we can analyze it with Alphalens tear sheets. In this lesson, you will experience a few iterations of the alpha discovery phase of the [quant workflow](https://blog.quantopian.com/a-professional-quant-equity-workflow/) by analyzing those tear sheets.\n",
+    "\n",
+    "In this lesson, we will:\n",
+    "\n",
+    "1. Analyze how well an alpha factor predicts future price movements with `create_information_tear_sheet()`.\n",
+    "2. Try to improve our original alpha factor by combining it with another alpha factor.\n",
+    "3. Preview how profitable our alpha factor might be with `create_returns_tear_sheet()`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Our Starting Alpha Factor\n",
+    "\n",
+    "The following code expresses an alpha factor based on a company's net income and market cap, and then creates an information tear sheet for that alpha factor. We will start analyzing the alpha factor by looking at its information coefficient (IC). The IC is a number ranging from -1 to 1 that quantifies the predictiveness of an alpha factor. Any number above 0 is considered somewhat predictive.\n",
+    "\n",
+    "The first number you should look at is the IC mean, which is an alpha factor's average IC over a given time period. You want your factor's IC mean to be as high as possible. Generally speaking, a factor is worth investigating if it has an IC mean over 0. If it has an IC mean close to 0.1 (or higher) over a large trading universe, that factor is probably really good.\n",
+    "\n",
+    "**Run the cell below to create an information tear sheet for our alpha factor. Notice how the IC Mean figures (the first numbers on the first chart) are all positive. That is a good sign!**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "from quantopian.pipeline.data import factset\n",
+    "from quantopian.pipeline import Pipeline\n",
+    "from quantopian.research import run_pipeline\n",
+    "from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage\n",
+    "from quantopian.pipeline.filters import QTradableStocksUS\n",
+    "from alphalens.tears import create_information_tear_sheet\n",
+    "from alphalens.utils import get_clean_factor_and_forward_returns\n",
+    "\n",
+    "\n",
+    "def make_pipeline():\n",
+    "    \n",
+    "    # 1 year moving average of year over year net income\n",
+    "    net_income_moving_average = SimpleMovingAverage( \n",
+    "        inputs=[factset.Fundamentals.net_inc_af], \n",
+    "        window_length=252\n",
+    "    )\n",
+    "    \n",
+    "    # 1 year moving average of market cap\n",
+    "    market_cap_moving_average = SimpleMovingAverage( \n",
+    "        inputs=[factset.Fundamentals.mkt_val], \n",
+    "        window_length=252\n",
+    "    )\n",
+    "    \n",
+    "    average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)\n",
+    "    \n",
+    "    # the last quarter's net income\n",
+    "    net_income = factset.Fundamentals.net_inc_qf.latest \n",
+    "    \n",
+    "    projected_market_cap = average_market_cap_per_net_income * net_income\n",
+    "    \n",
+    "    return Pipeline(\n",
+    "        columns={'projected_market_cap': projected_market_cap},\n",
+    "        screen=QTradableStocksUS() & projected_market_cap.notnull()\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "factor_data = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')\n",
+    "pricing_data = get_pricing(factor_data.index.levels[1], '2010-1-1', '2012-2-1', fields='open_price')\n",
+    "merged_data = get_clean_factor_and_forward_returns(factor_data, pricing_data)\n",
+    "\n",
+    "create_information_tear_sheet(merged_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Add Another Alpha Factor\n",
+    "\n",
+    "**Alphalens is useful for identifying alpha factors that aren't predictive early in the quant workflow. This allows you to avoid wasting time running a full backtest on a factor that could have been discarded earlier in the process.**\n",
+    "\n",
+    "Run the following cell to express another alpha factor called `price_to_book`, combine it with `projected_market_cap` using z-scores, then create another information tear sheet based on our new (and hopefully improved) alpha factor.\n",
+    "\n",
+    "Notice how the IC figures are lower than they were in the first chart. That means the factor we added is making our predictions worse!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "def make_pipeline():\n",
+    "\n",
+    "    net_income_moving_average = SimpleMovingAverage( # 1 year moving average of year over year net income\n",
+    "        inputs=[factset.Fundamentals.net_inc_af], \n",
+    "        window_length=252\n",
+    "    )\n",
+    "    \n",
+    "    market_cap_moving_average = SimpleMovingAverage( # 1 year moving average of market cap\n",
+    "        inputs=[factset.Fundamentals.mkt_val], \n",
+    "        window_length=252\n",
+    "    )\n",
+    "    \n",
+    "    average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)\n",
+    "    \n",
+    "    net_income = factset.Fundamentals.net_inc_qf.latest # the last quarter's net income\n",
+    "    \n",
+    "    projected_market_cap = average_market_cap_per_net_income * net_income\n",
+    "    \n",
+    "    price_to_book = factset.Fundamentals.pbk_qf.latest\n",
+    "    \n",
+    "    factor_to_analyze = projected_market_cap.zscore() + price_to_book.zscore()\n",
+    "    \n",
+    "    return Pipeline(\n",
+    "        columns={'factor_to_analyze': factor_to_analyze},\n",
+    "        screen=QTradableStocksUS() & factor_to_analyze.notnull()\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "\n",
+    "factor_data = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')\n",
+    "pricing_data = get_pricing(factor_data.index.levels[1], '2010-1-1', '2012-2-1', fields='open_price')\n",
+    "new_merged_data = get_clean_factor_and_forward_returns(factor_data, pricing_data)\n",
+    "\n",
+    "create_information_tear_sheet(new_merged_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### See If Our Alpha Factor Might Be Profitable\n",
+    "\n",
+    "We found that the first iteration of our alpha factor had more predictive value than the second one. Let's see if the original alpha factor might make any money.\n",
+    "\n",
+    "`create_returns_tear_sheet()` splits your universe into quantiles, then shows the returns generated by each quantile over different time periods. Quantile 1 is the 20% of assets with the lowest alpha factor values, and quantile 5 is the highest 20%.\n",
+    "\n",
+    "This function creates six types of charts, but the two most important ones are:\n",
+    "\n",
+    "- **Mean period wise returns by factor quantile:** This chart shows the average return for each quantile in your universe, per time period. You want the quantiles on the right to have higher average returns than the quantiles on the left.\n",
+    "- **Cumulative return by quantile:** This chart shows you how each quantile performed over time. You want to see quantile 1 consistently performing the worst, quantile 5 consistently performing the best, and the other quantiles in the middle.\n",
+    "\n",
+    "**Run the following cell, and notice how quantile 5 doesn't have the highest returns. Ideally, you want quantile 1 to have the lowest returns, and quantile 5 to have the highest returns. This tear sheet is telling us we still have work to do!**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "from alphalens.tears import create_returns_tear_sheet\n",
+    "\n",
+    "create_returns_tear_sheet(merged_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this lesson, you experienced a few cycles of the alpha discovery stage of the quant workflow. Making good alpha factors isn't easy, but Alphalens allows you to iterate through them quickly to find out if you're on the right track! You can usually improve existing alpha factors in some way by getting creative with moving averages, looking for trend reversals, or any number of other strategies.\n",
+    "\n",
+    "Try looking around [Quantopian's forums](https://www.quantopian.com/posts), or reading academic papers for inspiration. **This is where you get to be creative!** In the next lesson, we'll discuss advanced Alphalens concepts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
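Lesson 3 describes the information coefficient as a number between -1 and 1 that quantifies how predictive a factor is. As a rough illustration, the sketch below computes a Spearman rank correlation between factor values and forward returns for a single date, on the assumption that this is the kind of statistic the information tear sheet reports. The helper names (`average_ranks`, `pearson`, `rank_ic`) and the sample numbers are hypothetical, and the code is a plain-Python stand-in, not Alphalens's own implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, fwd_returns):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(factor_values), average_ranks(fwd_returns))


# Made-up factor values and forward returns for four assets on one date.
factor_values = [0.5, -1.2, 2.0, 0.1]
fwd_returns = [0.01, -0.02, 0.00, 0.03]

ic = rank_ic(factor_values, fwd_returns)
print(round(ic, 6))  # 0.2
```

A factor that ranks assets in exactly the same order as their forward returns gives an IC of 1, and one that ranks them in exactly the reverse order gives -1. Averaging such per-date values over the sample period corresponds to the IC mean discussed in the lesson.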