From 661930598851b932f44cccdfa33b3b4311b70772 Mon Sep 17 00:00:00 2001 From: Cal Mitchell Date: Fri, 16 Nov 2018 11:55:35 -0500 Subject: [PATCH] added qe alphalens tut notebooks --- .../3_alphalens_lesson_4/notebook.ipynb | 148 ++++---- .../notebook.ipynb | 189 +++++++++++ .../notebook.ipynb | 235 +++++++++++++ .../notebook.ipynb | 315 ++++++++++++++++++ .../notebook.ipynb | 182 ++++++++++ 5 files changed, 995 insertions(+), 74 deletions(-) create mode 100644 notebooks/tutorials/3_factset_alphalens_lesson_2/notebook.ipynb create mode 100644 notebooks/tutorials/3_factset_alphalens_lesson_3/notebook.ipynb create mode 100644 notebooks/tutorials/3_factset_alphalens_lesson_4/notebook.ipynb create mode 100644 notebooks/tutorials/3_factset_alphalens_lesson_5/notebook.ipynb diff --git a/notebooks/tutorials/3_alphalens_lesson_4/notebook.ipynb b/notebooks/tutorials/3_alphalens_lesson_4/notebook.ipynb index 9d83a35c..3439d239 100644 --- a/notebooks/tutorials/3_alphalens_lesson_4/notebook.ipynb +++ b/notebooks/tutorials/3_alphalens_lesson_4/notebook.ipynb @@ -70,6 +70,75 @@ ")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Analyzing Alpha Factors By Group\n", + "\n", + "Alphalens allows you to group assets using a classifier. A common use case for this is creating a classifier that specifies which sector each equity belongs to, then comparing your alpha factor's returns among sectors.\n", + "\n", + "You can group assets by any classifier, but sector is most common. The Pipeline in the first cell of this lesson returns a column named `sector`, whose values represent the corresponding Morningstar sector code. 
All we have to do now is pass that column to the `groupby` argument of `get_clean_factor_and_forward_returns()`.\n", + "\n", + "**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs in different sectors.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from alphalens.tears import create_returns_tear_sheet\n", + "\n", + "sector_labels, sector_labels[-1] = dict(Sector.SECTOR_NAMES), \"Unknown\"\n", + "\n", + "factor_data = get_clean_factor_and_forward_returns(\n", + "    factor=pipeline_output['factor_to_analyze'],\n", + "    prices=pricing_data,\n", + "    groupby=pipeline_output['sector'],\n", + "    groupby_labels=sector_labels,\n", + ")\n", + "\n", + "create_returns_tear_sheet(factor_data=factor_data, by_group=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Writing Group Neutral Strategies\n", + "\n", + "Not only does Alphalens allow us to simulate how our alpha factor would perform in a long/short trading strategy, it also allows us to simulate how it would do if we went long/short on every group! \n", + "\n", + "Grouping by sector, and going long/short on each sector allows you to limit exposure to the overall movement of sectors. For example, you may have noticed in step three of this tutorial that certain sectors had all positive returns, or all negative returns. 
That information isn't useful to us, because that just means the sector group outperformed (or underperformed) the market; it doesn't give us any insight into how our factor performs within that sector.\n", + "\n", + "Since we grouped our assets by sector in the previous cell, going group neutral is easy; just make the following two changes:\n", + "- Pass `binning_by_group=True` as an argument to `get_clean_factor_and_forward_returns()`.\n", + "- Pass `group_neutral=True` as an argument to `create_full_tear_sheet()`.\n", + "\n", + "**The following cell has made the appropriate changes. Try running it and notice how the results differ from the previous cell.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "factor_data = get_clean_factor_and_forward_returns(\n", + "    pipeline_output['factor_to_analyze'],\n", + "    prices=pricing_data,\n", + "    groupby=pipeline_output['sector'],\n", + "    groupby_labels=sector_labels,\n", + "    binning_by_group=True,\n", + ")\n", + "\n", + "create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ @@ -169,75 +238,6 @@ "*Note: MaxLossExceededError has two possible causes; forward returns computation and binning. We showed you how to fix forward returns computation here because it is much more common. Try passing `quantiles=None` and `bins=5` if you get MaxLossExceededError because of binning.*" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Analyzing Alpha Factors By Group\n", - "\n", - "Alphalens allows you to group assets using a classifier. A common use case for this is creating a classifier that specifies which sector each equity belongs to, then comparing your alpha factor's returns among sectors.\n", - "\n", - "You can group assets by any classifier, but sector is most common. 
The Pipeline in the first cell of this lesson returns a column named `sector`, whose values represent the corresponding Morningstar sector code. All we have to do now is pass that column to the `groupby` argument of `get_clean_factor_and_forward_returns()`\n", - "\n", - "**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs in different sectors.**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from alphalens.tears import create_returns_tear_sheet\n", - "\n", - "sector_labels, sector_labels[-1] = dict(Sector.SECTOR_NAMES), \"Unknown\"\n", - "\n", - "factor_data = get_clean_factor_and_forward_returns(\n", - " factor=pipeline_output['factor_to_analyze'],\n", - " prices=pricing_data,\n", - " groupby=pipeline_output['sector'],\n", - " groupby_labels=sector_labels,\n", - ")\n", - "\n", - "create_returns_tear_sheet(factor_data=factor_data, by_group=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Writing Group Neutral Strategies\n", - "\n", - "Not only does Alphalens allow us to simulate how our alpha factor would perform in a long/short trading strategy, it also allows us to simulate how it would do if we went long/short on every group! \n", - "\n", - "Grouping by sector, and going long/short on each sector allows you to limit exposure to the overall movement of sectors. For example, you may have noticed in step three of this tutorial, that certain sectors had all positive returns, or all negative returns. 
That information isn't useful to us, because that just means the sector group outperformed (or underperformed) the market; it doesn't give us any insight into how our factor performs within that sector.\n", - "\n", - "Since we grouped our assets by sector in the previous cell, going group neutral is easy; just make the two following changes:\n", - "- Pass `binning_by_group=True` as an argument to `get_clean_factor_and_forward_returns()`.\n", - "- Pass `group_neutral=True` as an argument to `create_full_tear_sheet()`.\n", - "\n", - "**The following cell has made the approriate changes. Try running it and notice how the results differ from the previous cell.**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "factor_data = get_clean_factor_and_forward_returns(\n", - " pipeline_output['factor_to_analyze'],\n", - " prices=pricing_data,\n", - " groupby=pipeline_output['sector'],\n", - " groupby_labels=sector_labels,\n", - " binning_by_group=True,\n", - ")\n", - "\n", - "create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -255,21 +255,21 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 2", + "display_name": "Python 3", "language": "python", - "name": "python2" + "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.12" + "pygments_lexer": "ipython3", + "version": "3.7.0" } }, "nbformat": 4, diff --git a/notebooks/tutorials/3_factset_alphalens_lesson_2/notebook.ipynb b/notebooks/tutorials/3_factset_alphalens_lesson_2/notebook.ipynb new file mode 100644 index 00000000..4ee2d1a0 --- /dev/null +++ b/notebooks/tutorials/3_factset_alphalens_lesson_2/notebook.ipynb @@ -0,0 +1,189 @@ +{ + 
"cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Companion notebook for Alphalens tutorial lesson 2\n", + "\n", + "# Creating Tear Sheets With Alphalens\n", + "\n", + "In the previous lesson, you learned what Alphalens is. In this lesson, you will learn a four step process for how to use it:\n", + "\n", + "1. Express an alpha factor and define a trading universe by creating and running a Pipeline over a certain time period.\n", + "2. Query pricing data for the assets in our universe during that same time period with `get_pricing()`.\n", + "3. Align the alpha factor data with the pricing data with `get_clean_factor_and_forward_returns()`.\n", + "4. Visualize how well our alpha factor predicts future price movements with `create_full_tear_sheet()`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build And Run A Pipeline\n", + "The following code creates a trading universe and expresses an alpha factor within a pipeline, then runs it with `run_pipeline()`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from quantopian.pipeline import Pipeline\n", + "from quantopian.research import run_pipeline\n", + "from quantopian.pipeline.factors import AverageDollarVolume\n", + "from quantopian.pipeline.data import factset, USEquityPricing\n", + "\n", + "def make_pipeline():\n", + " # Filter out equities with low market capitalization\n", + " market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000\n", + "\n", + " # Filter out equities with low volume\n", + " volume_filter = AverageDollarVolume(window_length=200) > 2500000\n", + "\n", + " # Filter out equities with a close price below $5\n", + " price_filter = USEquityPricing.close.latest > 5\n", + "\n", + " # Our final base universe\n", + " base_universe = market_cap_filter & volume_filter & price_filter\n", + "\n", + " # Measures a company's asset growth rate.\n", + " asset_growth = 
factset.Fundamentals.assets_gr_qf.latest\n", + "\n", + "    return Pipeline(\n", + "        columns={'asset_growth': asset_growth},\n", + "        screen=base_universe & asset_growth.notnull()\n", + "    )\n", + "\n", + "pipeline_output = run_pipeline(pipeline=make_pipeline(), start_date='2014-1-1', end_date='2016-1-1')\n", + "\n", + "# Show the first 5 rows of factor data\n", + "pipeline_output.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Query Pricing Data\n", + "\n", + "Now that we have factor data, let's get pricing data for the same time period. `get_pricing()` returns pricing data for a list of assets over a specified time period. It requires four arguments:\n", + "- A list of assets for which we want pricing.\n", + "- A start date.\n", + "- An end date.\n", + "- Whether to use open, high, low, or close pricing.\n", + "\n", + "Execute the following cell to get pricing data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pricing_data = get_pricing(\n", + "  symbols=pipeline_output.index.levels[1], # Finds all assets that appear at least once in \"pipeline_output\" \n", + "  start_date='2014-1-1',\n", + "  end_date='2016-2-1', # must be after run_pipeline()'s end date. Explained more in lesson 4\n", + "  fields='open_price' # Generally, you should use open pricing. Explained more in lesson 4\n", + ")\n", + "\n", + "# Show the first 5 rows of pricing_data\n", + "pricing_data.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Align Data\n", + "\n", + "`get_clean_factor_and_forward_returns()` aligns the factor data created by `run_pipeline()` with the pricing data created by `get_pricing()`, and returns an object suitable for analysis with Alphalens' charting functions. 
It requires two arguments:\n", + "- The factor data we created with `run_pipeline()`.\n", + "- The pricing data we created with `get_pricing()`.\n", + "\n", + "Execute the following cell to align the factor data with the pricing data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "from alphalens.utils import get_clean_factor_and_forward_returns\n", + "\n", + "factor_data = get_clean_factor_and_forward_returns(\n", + "    factor=pipeline_output, \n", + "    prices=pricing_data\n", + ")\n", + "\n", + "# Show the first 5 rows of factor_data\n", + "factor_data.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualize Results\n", + "\n", + "Finally, execute the following cell to pass the output of `get_clean_factor_and_forward_returns()` to a function called `create_full_tear_sheet()`. This will create what's known as a tear sheet." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "from alphalens.tears import create_full_tear_sheet\n", + "\n", + "create_full_tear_sheet(factor_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## That's It!\n", + "\n", + "In the next lesson, we will show you how to interpret the charts produced by `create_full_tear_sheet()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/tutorials/3_factset_alphalens_lesson_3/notebook.ipynb b/notebooks/tutorials/3_factset_alphalens_lesson_3/notebook.ipynb new file mode 100644 index 00000000..37c92e57 --- /dev/null +++ b/notebooks/tutorials/3_factset_alphalens_lesson_3/notebook.ipynb @@ -0,0 +1,235 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Companion notebook for Alphalens tutorial lesson 3\n", + "\n", + "# Interpreting Alphalens Tear Sheets\n", + "\n", + "In the previous lesson, you learned how to query and process data so that we can analyze it with Alphalens tear sheets. In this lesson, you will experience a few iterations of the alpha discovery phase of the [quant workflow](https://blog.quantopian.com/a-professional-quant-equity-workflow/) by analyzing those tear sheets.\n", + "\n", + "In this lesson, we will:\n", + "\n", + "1. Analyze how well an alpha factor predicts future price movements with `create_information_tear_sheet()`.\n", + "2. Try to improve our original alpha factor by combining it with another alpha factor.\n", + "3. Preview how profitable our alpha factor might be with `create_returns_tear_sheet()`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Our Starting Alpha Factor\n", + "\n", + "The following code expresses an alpha factor based on a company's net income and market cap, then creates an information tear sheet for that alpha factor. 
We will start analyzing the alpha factor by looking at its information coefficient (IC). The IC is a number ranging from -1 to 1, which quantifies the predictiveness of an alpha factor. Any number above 0 is considered somewhat predictive.\n", + "\n", + "The first number you should look at is the IC mean, which is an alpha factor's average IC over a given time period. You want your factor's IC mean to be as high as possible. Generally speaking, a factor is worth investigating if it has an IC mean over 0. If it has an IC mean close to 0.1 (or higher) over a large trading universe, that factor is probably **exceptionally good**. In fact, you might want to check to make sure there isn't some lookahead bias if your alpha factor's IC mean is over 0.1.\n", + "\n", + "**Run the cell below to create an information tear sheet for our alpha factor. Notice how the IC Mean figures (the first numbers on the first chart) are all positive. That is a good sign!**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "from quantopian.pipeline import Pipeline\n", + "from quantopian.research import run_pipeline\n", + "from alphalens.tears import create_information_tear_sheet\n", + "from quantopian.pipeline.data import factset, USEquityPricing\n", + "from alphalens.utils import get_clean_factor_and_forward_returns\n", + "from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage, AverageDollarVolume\n", + "\n", + "\n", + "def make_pipeline():\n", + "    # Filter out equities with low market capitalization\n", + "    market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000\n", + "\n", + "    # Filter out equities with low volume\n", + "    volume_filter = AverageDollarVolume(window_length=200) > 2500000\n", + "\n", + "    # Filter out equities with a close price below $5\n", + "    price_filter = USEquityPricing.close.latest > 5\n", + "\n", + "    # Our final base universe\n", + "    base_universe = 
market_cap_filter & volume_filter & price_filter\n", + "\n", + "    # 1 year moving average of year over year net income\n", + "    net_income_moving_average = SimpleMovingAverage( \n", + "        inputs=[factset.Fundamentals.net_inc_af], \n", + "        window_length=252\n", + "    )\n", + "\n", + "    # 1 year moving average of market cap\n", + "    market_cap_moving_average = SimpleMovingAverage( \n", + "        inputs=[factset.Fundamentals.mkt_val], \n", + "        window_length=252\n", + "    )\n", + "\n", + "    average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)\n", + "\n", + "    # the last quarter's net income\n", + "    net_income = factset.Fundamentals.net_inc_qf.latest \n", + "\n", + "    projected_market_cap = average_market_cap_per_net_income * net_income\n", + "\n", + "    return Pipeline(\n", + "        columns={'projected_market_cap': projected_market_cap},\n", + "        screen=base_universe & projected_market_cap.notnull()\n", + "    )\n", + "\n", + "\n", + "pipeline_output = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')\n", + "pricing_data = get_pricing(pipeline_output.index.levels[1], '2010-1-1', '2012-2-1', fields='open_price')\n", + "factor_data = get_clean_factor_and_forward_returns(pipeline_output, pricing_data)\n", + "\n", + "create_information_tear_sheet(factor_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add Another Alpha Factor\n", + "\n", + "**Alphalens is useful for identifying alpha factors that aren't predictive early in the quant workflow. This allows you to avoid wasting time running a full backtest on a factor that could have been discarded earlier in the process.**\n", + "\n", + "Run the following cell to express another alpha factor called `price_to_book`, combine it with `projected_market_cap` using zscores, then create another information tear sheet based on our new (and hopefully improved) alpha factor. \n", + "\n", + "Notice how the IC figures are lower than they were in the first chart. 
That means the factor we added is making our predictions worse!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "def make_pipeline():\n", + " # Filter out equities with low market capitalization\n", + " market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000\n", + "\n", + " # Filter out equities with low volume\n", + " volume_filter = AverageDollarVolume(window_length=200) > 2500000\n", + "\n", + " # Filter out equities with a close price below $5\n", + " price_filter = USEquityPricing.close.latest > 5\n", + "\n", + " # Our final base universe\n", + " base_universe = market_cap_filter & volume_filter & price_filter\n", + "\n", + " # 1 year moving average of year over year net income\n", + " net_income_moving_average = SimpleMovingAverage( \n", + " inputs=[factset.Fundamentals.net_inc_af], \n", + " window_length=252\n", + " )\n", + "\n", + " # 1 year moving average of market cap\n", + " market_cap_moving_average = SimpleMovingAverage( \n", + " inputs=[factset.Fundamentals.mkt_val], \n", + " window_length=252\n", + " )\n", + "\n", + " average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)\n", + "\n", + " # The last quarter's net income\n", + " net_income = factset.Fundamentals.net_inc_qf.latest\n", + "\n", + " projected_market_cap = average_market_cap_per_net_income * net_income\n", + "\n", + " # The alpha factor we are adding\n", + " price_to_book = factset.Fundamentals.pbk_qf.latest \n", + "\n", + " factor_to_analyze = projected_market_cap.zscore() + price_to_book.zscore()\n", + "\n", + " return Pipeline(\n", + " columns={'factor_to_analyze': factor_to_analyze},\n", + " screen=base_universe & factor_to_analyze.notnull()\n", + " )\n", + "\n", + "\n", + "\n", + "pipeline_output = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')\n", + "pricing_data = get_pricing(pipeline_output.index.levels[1], '2010-1-1', '2012-2-1', 
fields='open_price')\n", + "new_factor_data = get_clean_factor_and_forward_returns(pipeline_output, pricing_data)\n", + "\n", + "create_information_tear_sheet(new_factor_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### See If Our Alpha Factor Might Be Profitable\n", + "\n", + "We found that the first iteration of our alpha factor had more predictive value than the second one. Let's see if the original alpha factor might make any money.\n", + "\n", + "`create_returns_tear_sheet()` splits your universe into quantiles, then shows the returns generated by each quantile over different time periods. Quantile 1 is the 20% of assets with the lowest alpha factor values, and quantile 5 is the highest 20%.\n", + "\n", + "This function creates six types of charts, but the two most important ones are:\n", + "\n", + "- **Mean period-wise returns by factor quantile:** This chart shows the average return for each quantile in your universe, per time period. You want the quantiles on the right to have higher average returns than the quantiles on the left.\n", + "- **Cumulative return by quantile:** This chart shows you how each quantile performed over time. You want to see quantile 1 consistently performing the worst, quantile 5 consistently performing the best, and the other quantiles in the middle.\n", + "\n", + "**Run the following cell, and notice how quantile 5 doesn't have the highest returns. Ideally, you want quantile 1 to have the lowest returns, and quantile 5 to have the highest returns. 
This tear sheet is telling us we still have work to do!**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "from alphalens.tears import create_returns_tear_sheet\n", + "\n", + "create_returns_tear_sheet(factor_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this lesson, you experienced a few cycles of the alpha discovery stage of the quant workflow. Making good alpha factors isn't easy, but Alphalens allows you to iterate through them quickly to find out if you're on the right track! You can usually improve existing alpha factors in some way by getting creative with moving averages, looking for trend reversals, or any number of other strategies.\n", + "\n", + "Try looking around [Quantopian's forums](https://www.quantopian.com/posts), or reading academic papers for inspiration. **This is where you get to be creative!** In the next lesson, we'll discuss advanced Alphalens concepts." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/tutorials/3_factset_alphalens_lesson_4/notebook.ipynb b/notebooks/tutorials/3_factset_alphalens_lesson_4/notebook.ipynb new file mode 100644 index 00000000..dfa7c2e0 --- /dev/null +++ b/notebooks/tutorials/3_factset_alphalens_lesson_4/notebook.ipynb @@ -0,0 +1,315 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Companion notebook for Alphalens tutorial lesson 4\n", + "\n", + "# Advanced Alphalens concepts\n", + "\n", + "You've learned the basics of using Alphalens. This lesson explores the following advanced Alphalens concepts:\n", + "\n", + "1. Grouping assets by market cap, then analyzing each cap type individually.\n", + "2. Writing group neutral strategies.\n", + "3. Determining an alpha factor's decay rate.\n", + "4. Dealing with a common Alphalens error named MaxLossExceededError.\n", + "\n", + "**All sections of this lesson will use the data produced by the Pipeline created in the following cell. Please run it.**\n", + "\n", + "**Important note**: Until this lesson, we passed the output of `run_pipeline()` to `get_clean_factor_and_forward_returns()` without any changes. This was possible because the previous lessons' Pipelines only returned one column. This lesson's Pipeline returns two columns, which means we need to *specify the column* we're passing as factor data. 
Look for commented code near `get_clean_factor_and_forward_returns()` in the following cell to see how to do this." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from quantopian.pipeline import Pipeline\n", + "from quantopian.research import run_pipeline\n", + "from quantopian.pipeline.factors import AverageDollarVolume\n", + "from quantopian.pipeline.data import factset, USEquityPricing\n", + "from alphalens.utils import get_clean_factor_and_forward_returns\n", + "\n", + "\n", + "def make_pipeline():\n", + " # Filter out equities with low market capitalization\n", + " market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000\n", + "\n", + " # Filter out equities with low volume\n", + " volume_filter = AverageDollarVolume(window_length=200) > 2500000\n", + "\n", + " # Filter out equities with a close price below $5\n", + " price_filter = USEquityPricing.close.latest > 5\n", + "\n", + " # Our final base universe\n", + " base_universe = market_cap_filter & volume_filter & price_filter\n", + " \n", + " change_in_working_capital = factset.Fundamentals.wkcap_chg_qf.latest\n", + " ciwc_processed = change_in_working_capital.winsorize(.2, .98).zscore()\n", + " \n", + " sales_per_working_capital = factset.Fundamentals.sales_wkcap_qf.latest\n", + " spwc_processed = sales_per_working_capital.winsorize(.2, .98).zscore()\n", + "\n", + " factor_to_analyze = (ciwc_processed + spwc_processed).zscore()\n", + "\n", + " # The following columns will help us group assets by market cap. 
This will allow us to analyze\n", + " # whether our alpha factor's predictiveness varies among assets with different market caps.\n", + " market_cap = factset.Fundamentals.mkt_val.latest\n", + " is_small_cap = market_cap.percentile_between(0, 100)\n", + " is_mid_cap = market_cap.percentile_between(50, 100)\n", + " is_large_cap = market_cap.percentile_between(90, 100)\n", + "\n", + " return Pipeline(\n", + " columns = {\n", + " 'factor_to_analyze': factor_to_analyze, \n", + " 'small_cap_filter': is_small_cap,\n", + " 'mid_cap_filter': is_mid_cap,\n", + " 'large_cap_filter': is_large_cap,\n", + " },\n", + " screen = (\n", + " base_universe\n", + " & factor_to_analyze.notnull()\n", + " & market_cap.notnull()\n", + " )\n", + " )\n", + "\n", + "\n", + "pipeline_output = run_pipeline(make_pipeline(), '2013-1-1', '2014-1-1')\n", + "pricing_data = get_pricing(pipeline_output.index.levels[1], '2013-1-1', '2014-3-1', fields='open_price')\n", + "\n", + "# To group by market cap, we will follow the following steps.\n", + "\n", + "# Convert the \"True\" values to ones, so they can be added together\n", + "pipeline_output[['small_cap_filter', 'mid_cap_filter', 'large_cap_filter']] *= 1\n", + "\n", + "# If a stock passed the large_cap filter, it also passed the mid_cap and small_cap filters.\n", + "# This means we can add the three columns, and stocks that are large_cap will get a value of 3,\n", + "# stocks that are mid cap will get a value of 2, and stocks that are small cap will get 1.\n", + "pipeline_output['cap_type'] = (\n", + " pipeline_output['small_cap_filter'] + pipeline_output['mid_cap_filter'] + pipeline_output['large_cap_filter']\n", + ")\n", + "\n", + "# drop the old columns, we don't need them anymore\n", + "pipeline_output.drop(['small_cap_filter', 'mid_cap_filter', 'large_cap_filter'], axis=1, inplace=True)\n", + "\n", + "# rename the 1's, 2's and 3's for clarity\n", + "pipeline_output['cap_type'].replace([1, 2, 3], ['small_cap', 'mid_cap', 'large_cap'], 
inplace=True)\n", + "\n", + "# the final product\n", + "pipeline_output.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Analyzing Alpha Factors By Group\n", + "\n", + "Alphalens allows you to group assets using a classifier. A common use case for this is classifying equities by market cap, then comparing your alpha factor's returns among cap types.\n", + "\n", + "You can group assets by any classifier, but sector and market cap are most common. The Pipeline in the first cell of this lesson returns a column named `cap_type`, whose values represent each asset's market cap category. All we have to do now is pass that column to the `groupby` argument of `get_clean_factor_and_forward_returns()`.\n", + "\n", + "**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs among different cap types.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "from alphalens.tears import create_returns_tear_sheet\n", + "\n", + "factor_data = get_clean_factor_and_forward_returns(\n", + "    factor=pipeline_output['factor_to_analyze'],\n", + "    prices=pricing_data,\n", + "    groupby=pipeline_output['cap_type'],\n", + ")\n", + "\n", + "create_returns_tear_sheet(factor_data=factor_data, by_group=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Writing Group Neutral Strategies\n", + "\n", + "Not only does Alphalens allow us to simulate how our alpha factor would perform in a long/short trading strategy, it also allows us to simulate how it would do if we went long/short on every group! \n", + "\n", + "Grouping by cap type, and going long/short on each cap type allows you to limit exposure to the overall movement of those market cap groups. For example, you may have noticed in step three of this tutorial that certain cap types had all positive returns, or all negative returns. 
That information isn't useful to us, because that just means the market cap group outperformed (or underperformed) the market; it doesn't give us any insight into how our factor performs within that cap type.\n", + "\n", + "Since we grouped our assets by cap type in the previous cell, going group neutral is easy; just make the following two changes:\n", + "- Pass `binning_by_group=True` as an argument to `get_clean_factor_and_forward_returns()`.\n", + "- Pass `group_neutral=True` as an argument to `create_full_tear_sheet()`.\n", + "\n", + "**The following cell has made the appropriate changes. Try running it and notice how the results differ from the previous cell.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "factor_data = get_clean_factor_and_forward_returns(\n", + "    pipeline_output['factor_to_analyze'],\n", + "    prices=pricing_data,\n", + "    groupby=pipeline_output['cap_type'],\n", + "    binning_by_group=True,\n", + ")\n", + "\n", + "create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualizing An Alpha Factor's Decay Rate\n", + "\n", + "A lot of fundamental data only comes out 4 times a year in quarterly reports. Because of this low frequency, it can be useful to increase the amount of time `get_clean_factor_and_forward_returns()` looks into the future to calculate returns. \n", + "\n", + "**Tip:** A month usually has 21 trading days, a quarter usually has 63 trading days, and a year usually has 252 trading days.\n", + "\n", + "Let's say you're creating a strategy that buys stock in companies with rising profits (data that is released every 63 trading days). Would you only look 10 days into the future to analyze that factor? Probably not! But how do you decide how far to look forward?\n", + "\n", + "**Run the following cell to chart our alpha factor's IC mean over time. 
The point where the line dips below 0 represents when our alpha factor's predictions stop being useful.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "longest_look_forward_period = 63 # week = 5, month = 21, quarter = 63, year = 252\n", + "range_step = 5\n", + "\n", + "factor_data = get_clean_factor_and_forward_returns(\n", + " factor = pipeline_output['factor_to_analyze'],\n", + " prices = pricing_data,\n", + " periods = range(1, longest_look_forward_period, range_step)\n", + ")\n", + "\n", + "from alphalens.performance import mean_information_coefficient\n", + "mean_information_coefficient(factor_data).plot(title=\"IC Decay\");" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What do you think the chart will look like if we calculate the IC a full year into the future?\n", + "\n", + "*Hint*: This is a setup for the next part of this lesson." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "factor_data = get_clean_factor_and_forward_returns(\n", + " pipeline_output['factor_to_analyze'], \n", + " pricing_data,\n", + " periods=range(1,252,20) # The third argument to the range statement changes the \"step\" of the range\n", + ")\n", + "\n", + "mean_information_coefficient(factor_data).plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dealing With MaxLossExceededError\n", + "\n", + "Oh no! What does `MaxLossExceededError` mean?\n", + "\n", + "`get_clean_factor_and_forward_returns()` looks at how alpha factor data affects pricing data *in the future*. 
This means we need our pricing data to go further into the future than our alpha factor data **by at least as long as our forward looking period.** \n", + "\n", + "In this case, we'll change `get_pricing()`'s `end_date` to be at least a year after `run_pipeline()`'s `end_date`.\n", + "\n", + "**Run the following cell to make those changes. As you can see, this alpha factor's IC decays quickly after a quarter, but comes back even stronger six months into the future. Interesting!**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "new_pipeline_output = run_pipeline(\n", + " make_pipeline(),\n", + " start_date='2013-1-1', \n", + " end_date='2014-1-1' # *** NOTE *** Our factor data ends in 2014\n", + ")\n", + "\n", + "new_pricing_data = get_pricing(\n", + " pipeline_output.index.levels[1], \n", + " start_date='2013-1-1',\n", + " end_date='2015-2-1', # *** NOTE *** Our pricing data ends in 2015\n", + " fields='open_price'\n", + ")\n", + "\n", + "new_factor_data = get_clean_factor_and_forward_returns(\n", + " new_pipeline_output['factor_to_analyze'], \n", + " new_pricing_data,\n", + " periods=range(1,252,20) # Change the step to 10 or more for long look forward periods to save time\n", + ")\n", + "\n", + "mean_information_coefficient(new_factor_data).plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Note: MaxLossExceededError has two possible causes; forward returns computation and binning. We showed you how to fix forward returns computation here because it is much more common. Try passing `quantiles=None` and `bins=5` if you get MaxLossExceededError because of binning.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's it! This tutorial got you started with Alphalens, but there's so much more to it. Check out our [API docs](http://quantopian.github.io/alphalens/) to see the rest!" 
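The padding requirement described above — pricing data must extend past the factor data by at least the longest forward-returns period — can be sketched outside the Quantopian environment with plain pandas. The helper name below and the business-day approximation of trading days are our own illustration, not part of Alphalens:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

def pricing_end_is_sufficient(factor_end, pricing_end, max_period):
    """Check that pricing data extends past the factor data by at least
    `max_period` trading days (approximated here with business days)."""
    required_end = pd.Timestamp(factor_end) + BDay(max_period)
    return pd.Timestamp(pricing_end) >= required_end

# Factor data ending 2014-1-1 with a 252-day look-forward needs pricing
# data reaching roughly a year further out:
print(pricing_end_is_sufficient('2014-1-1', '2014-2-1', 252))  # False
print(pricing_end_is_sufficient('2014-1-1', '2015-2-1', 252))  # True
```

This is why the cell above moves `get_pricing()`'s `end_date` out to 2015 while the pipeline's `end_date` stays at 2014.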
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/notebooks/tutorials/3_factset_alphalens_lesson_5/notebook.ipynb b/notebooks/tutorials/3_factset_alphalens_lesson_5/notebook.ipynb
new file mode 100644
index 00000000..c67fbe43
--- /dev/null
+++ b/notebooks/tutorials/3_factset_alphalens_lesson_5/notebook.ipynb
@@ -0,0 +1,182 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Alphalens Quickstart Template"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from quantopian.pipeline import Pipeline\n",
+ "from quantopian.research import run_pipeline\n",
+ "from quantopian.pipeline.data import factset, USEquityPricing\n",
+ "from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume\n",
+ "\n",
+ "from alphalens.performance import mean_information_coefficient\n",
+ "from alphalens.utils import get_clean_factor_and_forward_returns\n",
+ "from alphalens.tears import create_information_tear_sheet, create_returns_tear_sheet"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Define Your Alpha Factor Here\n",
+ "\n",
+ "Spend your time in this cell creating good factors. Then simply run the rest of the notebook to analyze `factor_to_analyze`!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def make_pipeline():\n",
+ " # Filter out equities with low market capitalization\n",
+ " market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000\n",
+ "\n",
+ " # Filter out equities with low volume\n",
+ " volume_filter = AverageDollarVolume(window_length=200) > 2500000\n",
+ "\n",
+ " # Filter out equities with a close price below $5\n",
+ " price_filter = USEquityPricing.close.latest > 5\n",
+ "\n",
+ " # Our final base universe\n",
+ " base_universe = market_cap_filter & volume_filter & price_filter\n",
+ " \n",
+ " assets_moving_average = SimpleMovingAverage(inputs=[factset.Fundamentals.assets], window_length=252)\n",
+ " current_assets = factset.Fundamentals.assets.latest\n",
+ " \n",
+ " # This is the factor that the rest of the notebook will analyze\n",
+ " factor_to_analyze = (current_assets - assets_moving_average)\n",
+ " \n",
+ " # The following columns will help us group assets by market cap. This will allow us to analyze\n",
+ " # whether our alpha factor's predictiveness varies among assets with different market caps.\n",
+ " market_cap = factset.Fundamentals.mkt_val.latest\n",
+ " is_small_cap = market_cap.percentile_between(0, 100)\n",
+ " is_mid_cap = market_cap.percentile_between(50, 100)\n",
+ " is_large_cap = market_cap.percentile_between(90, 100)\n",
+ "\n",
+ " return Pipeline(\n",
+ " columns = {\n",
+ " 'factor_to_analyze': factor_to_analyze, \n",
+ " 'small_cap_filter': is_small_cap,\n",
+ " 'mid_cap_filter': is_mid_cap,\n",
+ " 'large_cap_filter': is_large_cap,\n",
+ " },\n",
+ " screen = (\n",
+ " base_universe\n",
+ " & factor_to_analyze.notnull()\n",
+ " & market_cap.notnull()\n",
+ " )\n",
+ " )\n",
+ "\n",
+ "# Run the pipeline and load pricing data before transforming the output.\n",
+ "# (Example dates; pricing extends past the factor data to cover forward returns.)\n",
+ "pipeline_output = run_pipeline(make_pipeline(), start_date='2013-1-1', end_date='2014-1-1')\n",
+ "pricing_data = get_pricing(pipeline_output.index.levels[1], start_date='2013-1-1', end_date='2015-2-1', fields='open_price')\n",
+ "\n",
+ "# To group by market cap, we'll take the following steps.\n",
+ "\n",
+ "# Convert the \"True\" values to ones, so they can be added together\n",
+ "pipeline_output[['small_cap_filter', 'mid_cap_filter', 'large_cap_filter']] *= 1\n",
+ "\n",
+ "# If a stock passed the large_cap filter, it also passed the mid_cap and small_cap filters.\n",
+ "# This means we can add the three columns, and stocks that are large_cap will get a value of 3,\n",
+ "# stocks that are mid cap will get a value of 2, and stocks that are small cap will get 1.\n",
+ "pipeline_output['cap_type'] = (\n",
+ " pipeline_output['small_cap_filter'] + pipeline_output['mid_cap_filter'] + pipeline_output['large_cap_filter']\n",
+ ")\n",
+ "\n",
+ "# Drop the old columns; we don't need them anymore\n",
+ "pipeline_output.drop(['small_cap_filter', 'mid_cap_filter', 'large_cap_filter'], axis=1, inplace=True)\n",
+ "\n",
+ "# Rename the 1's, 2's and 3's for clarity\n",
+ "pipeline_output['cap_type'].replace([1, 2, 3], ['small_cap', 'mid_cap', 'large_cap'], inplace=True)\n",
+ "\n",
+ "# the final product\n",
+ "pipeline_output.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create Group Neutral Tear Sheets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "factor_data = get_clean_factor_and_forward_returns(\n",
+ " factor = pipeline_output['factor_to_analyze'],\n",
+ " prices = pricing_data,\n",
+ " groupby = pipeline_output['cap_type'],\n",
+ " binning_by_group = True,\n",
+ " periods = (1, 5, 10)\n",
+ ")\n",
+ "\n",
+ "create_information_tear_sheet(factor_data, by_group=True, group_neutral=True)\n",
+ "create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Determine The Decay Rate Of Your Alpha Factor"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "longest_look_forward_period = 63 # week = 5, month = 21, quarter = 63, year = 252\n",
+ "range_step = 5\n",
+ "\n",
+ "factor_data = get_clean_factor_and_forward_returns(\n",
+ " factor = pipeline_output['factor_to_analyze'],\n",
+ " prices = pricing_data,\n",
+ " periods = range(1, longest_look_forward_period, range_step)\n",
+ ")\n",
+ "\n",
+ "mean_information_coefficient(factor_data).plot(title=\"IC Decay\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
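The summed-boolean-filter trick the template above uses to build `cap_type` (overlapping percentile filters added together, then relabeled) can be reproduced with plain pandas. The toy market caps and asset names below are purely illustrative:

```python
import pandas as pd

# Toy market caps for five hypothetical assets
market_cap = pd.Series([1e8, 5e8, 2e9, 5e10, 8e11],
                       index=['A', 'B', 'C', 'D', 'E'])

# Overlapping filters, mirroring percentile_between(0, 100) /
# (50, 100) / (90, 100): every asset passes "small", the top half
# also passes "mid", and the top decile also passes "large".
pct = market_cap.rank(pct=True)
small = (pct > 0.0).astype(int)  # all assets
mid = (pct > 0.5).astype(int)    # top half
large = (pct > 0.9).astype(int)  # top decile

# Summing the overlapping filters yields 1, 2, or 3; relabel for clarity
cap_type = (small + mid + large).map(
    {1: 'small_cap', 2: 'mid_cap', 3: 'large_cap'})
print(cap_type.tolist())
# ['small_cap', 'small_cap', 'mid_cap', 'mid_cap', 'large_cap']
```

Because the filters overlap rather than partition the universe, a single addition is enough to assign each asset to exactly one bucket — the same design the pipeline's `small_cap_filter`/`mid_cap_filter`/`large_cap_filter` columns rely on.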