update glm rolling regression #197

chiral-carbon · 2021-07-30T14:12:48Z

Addresses issue #89 and aims to use bambi instead of pymc3 GLM module.

review-notebook-app · 2021-07-30T14:12:52Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

review-notebook-app · 2021-08-01T09:16:17Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-01T09:16:16Z
----------------------------------------------------------------

Line #2.        trace_rw = pm.sample(tune=2000, cores=4, target_accept=0.9)

This should use return_inferencedata=True

review-notebook-app · 2021-08-03T19:28:02Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-03T19:28:01Z
----------------------------------------------------------------

just adding what we already commented here in the PR too. Here use coords when defining the model

review-notebook-app · 2021-08-03T19:28:02Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-03T19:28:02Z
----------------------------------------------------------------

slice at dataset level and choose variables afterwards plus try to remove the double loop. I think no loop will be needed, but maybe 1 loop will be to get the right coloring.

twiecki · 2021-08-04T10:22:48Z

I feel like we should just do the linear regression by hand here instead of using bambi, that'd be more obvious.

review-notebook-app · 2021-08-06T15:35:06Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-06T15:35:05Z
----------------------------------------------------------------

try fixing xticklabels

OriolAbril commented on 2021-08-26T15:45:55Z
----------------------------------------------------------------

try

fig.autofmt_xdate()

reference: https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html

review-notebook-app · 2021-08-10T22:42:59Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-10T22:42:58Z
----------------------------------------------------------------

need help here on how to convert this to a single loop. I sliced it at the dataset level and converted xi to a DataArray but after we still need each value in the selection to be multiplied with xi , as we can't multiply xi with alpha or beta since they have different shapes (50 and 9 respectively).

I'm afraid it's still unclear to me after you explained it :/

review-notebook-app · 2021-08-11T07:12:16Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-11T07:12:15Z
----------------------------------------------------------------

Line #15.    xi = xr.DataArray(np.linspace(prices_zscored.GFI.min(), prices_zscored.GFI.max(), 50))

The DataArray needs to have it's dimension named. Then you can do alpha + beta * xi and xarray will know how to align and broadcast the result. If the dimension is not named, xarray doesn't know if the dimension of xi is supposed to be one of the ones in alpha or beta and can't know if it has to align xi with one of the current dimensions in alpha/beta or create a new dimension in alpha/beta and align with this new one. (note, this sum and product will work in xarray both if being done before and after stacking, so I would stack afterwards to make sure the sample dimension is the last one which will be needed for plotting)

To reduce the code to a single looping, I would do something like (this is pseudocode, it won't work as is, it's only illustrational):

dataarray = (a + b * x).stack(...)
for t in dataarray.time:

ax.plot(x, dataarray.sel(time=t), color=color_fun(t), ...)

I think this will differ slightly from the current plot because the current plot has a color slighly different for every line, however, the color should represent the time according to the colormap on the plot, so multiple samples from the same time should have the exactly the same color not a similar one.

review-notebook-app · 2021-08-17T13:05:37Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-17T13:05:36Z
----------------------------------------------------------------

replaced the glm with manual linear regression here

review-notebook-app · 2021-08-17T13:05:37Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-17T13:05:37Z
----------------------------------------------------------------

the kernel dies here when I run this code, need some help with fixing this

OriolAbril commented on 2021-08-24T16:28:23Z
----------------------------------------------------------------

where and how does the kernel die? Have you tried for example only looping and subetting the data, no stacking nor plotting? or stacking before looping with something like

reg_eq = (trace_rw.posterior["alpha"] + trace_rw.posterior["beta"] * xi).stack(sample=("chain", "draw"))

chiral-carbon commented on 2021-08-25T06:03:04Z
----------------------------------------------------------------

it dies here:

for i in prices.index.values[::15]:
   reg = reg_eq.sel(time=i).stack(pooled_chain=("chain", "draw"))
   plt.plot(xi, reg)

will try stacking before looping, should reduce the complexity somewhat.

OriolAbril · 2021-08-24T16:28:24Z

where and how does the kernel die? Have you tried for example only looping and subetting the data, no stacking nor plotting? or stacking before looping with something like

reg_eq = (trace_rw.posterior["alpha"] + trace_rw.posterior["beta"] * xi).stack(sample=("chain", "draw"))

View entire conversation on ReviewNB

review-notebook-app · 2021-08-24T16:32:16Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-24T16:32:16Z
----------------------------------------------------------------

what are the dimensions and coordinates of alpha? have you tried using xarray plotting to get labels generated automatically?

chiral-carbon · 2021-08-25T06:03:05Z

it dies here:

for i in prices.index.values[::15]:
   reg = reg_eq.sel(time=i).stack(pooled_chain=("chain", "draw"))
   plt.plot(xi, reg)

will try stacking before looping, should reduce the complexity somewhat.

View entire conversation on ReviewNB

OriolAbril · 2021-08-26T15:45:56Z

try `fig.autofmt_xdate()

reference: https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html

View entire conversation on ReviewNB

review-notebook-app · 2021-08-26T15:54:57Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-26T15:54:56Z
----------------------------------------------------------------

now plotting works, but still needs some work

double check slicing. If I haven't messed up checking the above cells or while operating, the slicing on the sample dimension is not working. There are 4 chains and 1000 draws -> 4000 samples. If we take one every 200 samples we should have 20 lines per time. I think I see more than 20 lines of the same color, or maybe the lines are very thick?
I think lines should be a bit transparent and be below the scatterplot (play with zorder). Moreover, it looks like unlike in the above plot the scatterplot doesn't have the right coloring. This should be fixed now that we are updating and correcting the notebook.
The colormap is now broken, it looks like it does a max mix between the old code and the code above to fix the locator warnings, but unlike above, here it's not working.

chiral-carbon commented on 2021-08-28T08:12:16Z
----------------------------------------------------------------

we are plotting 23 lines, and that's because we are slicing prices (which has length 1136) every 50 entries. so I think I should slice the sample dimension < 200 times?

should I use different colormaps for the scatter plot and the lines? currently both use plt.get_cmap("winter")

OriolAbril commented on 2021-08-28T20:06:21Z
----------------------------------------------------------------

We have 4 chains, 1000 draws and 1136 times (note we are slicing the index of prices which is the time, not the prices themselves). So we have 4544000 lines that we could plot which is way too much if we plan to interpret the plot and not have it be a mess of overlapping lines.

So we loop over 1 out of every 10 times and for each of those we plot multiple lines (we could plot 4000 lines per time!). I think a handful of lines ~10-30 is enough to show that the model predictions take some uncertainty into account even within a single time. Note also that we want to encode the time each line corresponds to into its color. We will then have multiple lines with the same color because we are plotting multiple samples that correspond to the same time.

The 23 therefore is the number of times we are plotting, not the number of lines. The number of lines is 23 * 4000 / 200 ~= 450 If my mental math is right.

As I said before, we want the color to encode the time information, so we need both scatter and line plots to use the same encoding -> they should use the same colormap and the same scale to transform time values to 0-1 values used by the colormap.

chiral-carbon · 2021-08-28T08:12:16Z

we are plotting 23 lines, and that's because we are slicing prices (which has length 1136) every 50 entries. so I think I should slice the sample dimension < 200 times?

should use different colormaps for the scatter plot and the lines? currently both use plt.get_cmap("winter")

View entire conversation on ReviewNB

review-notebook-app · 2021-08-28T08:58:05Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-28T08:58:05Z
----------------------------------------------------------------

using FixedLocator removes this warning:

UserWarning: FixedFormatter should only be used together with FixedLocator

but the yticklabels in the colorbar end up not getting printed properly. this happens in the plots below in the notebook as well.

OriolAbril commented on 2021-08-28T19:41:29Z
----------------------------------------------------------------

The suggestions provided by warnings can be useful, but they do what they can in trying to help avoid the warning. It may only cover the most common case or we might be triggering the warning via an edge case the developers did not consider. Here using this fixedlocator approach is not a solution because while it's true it prevents the warning from being printed, it completely breaks the plot, you have to try something else. This example from matplotlib docs might be worth playing with: https://matplotlib.org/stable/gallery/ticks_and_spines/colorbar_tick_labelling_demo.html

chiral-carbon commented on 2021-08-30T11:34:42Z
----------------------------------------------------------------

I tried the examples provided in the link, mainly by passing a ticks argument to plt.colorbar() but that does the same thing as before; in which the warning does get removed but the labels on the colorbar disappear like we see here and in the other plots below.

OriolAbril commented on 2021-08-30T16:48:37Z
----------------------------------------------------------------

This works for me:

review-notebook-app · 2021-08-28T08:58:18Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-28T08:58:17Z
----------------------------------------------------------------

using FixedLocator removes this warning:

UserWarning: FixedFormatter should only be used together with FixedLocator

but the yticklabels in the colorbar end up not getting printed properly. this happens in the plots below in the notebook as well.

OriolAbril · 2021-08-28T19:41:30Z

The suggestions provided by warnings can be useful, but they do what they can in trying to help avoid the warning. It may only cover the most common case or we might be triggering the warning via an edge case the developers did not consider. Here using this fixedlocator approach is not a solution because while it's true it prevents the warning from being printed, it completely breaks the plot, you have to try something else. This example from matplotlib docs might be worth playing with: https://matplotlib.org/stable/gallery/ticks_and_spines/colorbar_tick_labelling_demo.html

View entire conversation on ReviewNB

OriolAbril · 2021-08-28T20:06:22Z

We have 4 chains, 1000 draws and 1136 times (note we are slicing the index of prices which is the time, not the prices themselves). So we have 4544000 lines that we could plot which is way too much if we plan to interpret the plot and not have it be a mess of overlapping lines.

So we loop over 1 out of every 10 times and for each of those we plot multiple lines (we could plot 4000 lines per time!). I think a handful of lines ~10-30 is enough to show that the model predictions take some uncertainty into account even within a single time. Note also that we want to encode the time each line corresponds to into its color. We will then have multiple lines with the same color because we are plotting multiple samples that correspond to the same time.

The 23 therefore is the number of times we are plotting, not the number of lines. The number of lines is 23 * 4000 / 200 ~= 450 If my mental math is right.

As I said before, we want the color to encode the time information, so we need both scatter and line plots to use the same encoding -> they should use the same colormap and the same scale to transform time values to 0-1 values used by the colormap.

View entire conversation on ReviewNB

chiral-carbon · 2021-08-30T11:34:42Z

I tried the examples provided in the link, mainly by passing a ticks argument to plt.colorbar() but that does the same thing as before; in which the warning does get removed but the labels on the colorbar disappear like we see here and in the other plots below.

View entire conversation on ReviewNB

OriolAbril · 2021-08-30T16:48:37Z

This works for me:

I have also realized that we should change the colormap to one of the perceptually uniform sequential colormaps, see https://matplotlib.org/stable/tutorials/colors/colormaps.html (for all plots)

View entire conversation on ReviewNB

review-notebook-app · 2021-09-08T15:32:12Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-09-08T15:32:12Z
----------------------------------------------------------------

Line #4.    mymap = plt.get_cmap("winter")

change this colormap to a perceptually sequential colormap (see link from previous comment)

chiral-carbon commented on 2021-09-08T17:10:08Z
----------------------------------------------------------------

does this colormap look better? there are of course others I could change this with

OriolAbril commented on 2021-09-08T17:20:35Z
----------------------------------------------------------------

It's not an appearance issue. Any of the perceptually sequential colormaps is fine by me. Here we are encoding the time which is sequential in our case, so with our aim of promoting best practices in notebooks as much as possible, not only with pymc3 related things we should use a perceptually sequential colormap. If instead we were interested in a diverging quantity such as values being above or below the mean or timestamps being earlier or later than a given date, we would use a diverging colormap.

The link I shared in 2 comments ago from matplotlib docs (https://matplotlib.org/stable/tutorials/colors/colormaps.html) explains it much better, the above paragraph is a barebones overview

review-notebook-app · 2021-09-08T15:32:13Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-09-08T15:32:12Z
----------------------------------------------------------------

Line #9.    colors = np.linspace(0.1, 1, len(prices))

start linspaces at 0

Line #12.    mymap = plt.get_cmap("winter")
Line #13.    mymap_sc = plt.get_cmap("winter")

there is no need to create two colormaps nor to overwrite the original one. plt.get_cmap("winter") returns a callable which takes values between 0-1 and converts them to a color

chiral-carbon · 2021-09-08T17:10:09Z

does this colormap look better? there are of course others I could change this with

View entire conversation on ReviewNB

OriolAbril · 2021-09-08T17:20:36Z

It's not an appearance issue. Any of the perceptually sequential colormaps is fine by me. Here we are encoding the time which is sequential in our case, so with our aim of promoting best practices in notebooks as much as possible, not only with pymc3 related things we should use a perceptually sequential colormap. If instead we were interested in a diverging quantity such as values being above or below the mean or timestamps being earlier or later than a given date, we would use a diverging colormap.

The link I shared in 2 comments ago from matplotlib docs (https://matplotlib.org/stable/tutorials/colors/colormaps.html) explains it much better, the above paragraph is a barebones overview

View entire conversation on ReviewNB

review-notebook-app · 2021-09-11T05:24:28Z

View / edit / reply to this conversation on ReviewNB

Sayam753 commented on 2021-09-11T05:24:27Z
----------------------------------------------------------------

Pairs trading link is not working

chiral-carbon commented on 2021-09-11T10:15:50Z
----------------------------------------------------------------

ooh thanks a lot for pointing it out

review-notebook-app · 2021-09-11T05:24:28Z

View / edit / reply to this conversation on ReviewNB

Sayam753 commented on 2021-09-11T05:24:28Z
----------------------------------------------------------------

And stochastic volatility model link is also dead

chiral-carbon · 2021-09-11T10:15:51Z

ooh thanks a lot for pointing it out

View entire conversation on ReviewNB

OriolAbril · 2021-09-11T11:57:33Z

There is also an issue with many broken links throughout the notebooks linked in https://github.com/pymc-devs/pymc-examples/wiki/Notebook-updates-overview

correction: use return_inferencedata=True in pm.sample commits: using coords and removing double loops change plotting code correction in plotting fix warnings change colormap revised: colormap

chiral-carbon · 2021-09-16T19:51:26Z

not sure if this needs a references section? the link to pairs trading in the beginning of the notebook is the only external reference, should I include it in references.bib file?

chiral-carbon added the outreachy2021 label Jul 30, 2021

chiral-carbon changed the title ~~update glm rolling regression to use bambi~~ update glm rolling regression Aug 28, 2021

update glm rolling regression to use bambi

0099606

correction: use return_inferencedata=True in pm.sample commits: using coords and removing double loops change plotting code correction in plotting fix warnings change colormap revised: colormap

chiral-carbon force-pushed the rolling-regression branch from dfd1548 to 0099606 Compare September 16, 2021 06:34

chiral-carbon added 2 commits September 16, 2021 12:14

fix broken links

3d9184d

add tags

0b130e0

chiral-carbon added 2 commits September 17, 2021 01:30

fix tags

c6c689f

fix links

d03787f

OriolAbril approved these changes Sep 22, 2021

View reviewed changes

OriolAbril merged commit 1f40572 into pymc-devs:main Sep 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update glm rolling regression #197

update glm rolling regression #197

chiral-carbon commented Jul 30, 2021

review-notebook-app bot commented Jul 30, 2021

review-notebook-app bot commented Aug 1, 2021 •

edited

Loading

review-notebook-app bot commented Aug 3, 2021 •

edited

Loading

review-notebook-app bot commented Aug 3, 2021

twiecki commented Aug 4, 2021

review-notebook-app bot commented Aug 6, 2021 •

edited

Loading

review-notebook-app bot commented Aug 10, 2021

review-notebook-app bot commented Aug 11, 2021

review-notebook-app bot commented Aug 17, 2021 •

edited

Loading

review-notebook-app bot commented Aug 17, 2021 •

edited

Loading

OriolAbril commented Aug 24, 2021

review-notebook-app bot commented Aug 24, 2021

chiral-carbon commented Aug 25, 2021

OriolAbril commented Aug 26, 2021

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

chiral-carbon commented Aug 28, 2021

review-notebook-app bot commented Aug 28, 2021 •

edited

Loading

review-notebook-app bot commented Aug 28, 2021 •

edited

Loading

OriolAbril commented Aug 28, 2021

OriolAbril commented Aug 28, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Sep 8, 2021 •

edited

Loading

review-notebook-app bot commented Sep 8, 2021

chiral-carbon commented Sep 8, 2021

OriolAbril commented Sep 8, 2021

review-notebook-app bot commented Sep 11, 2021 •

edited

Loading

review-notebook-app bot commented Sep 11, 2021 •

edited

Loading

chiral-carbon commented Sep 11, 2021

OriolAbril commented Sep 11, 2021

chiral-carbon commented Sep 16, 2021

update glm rolling regression #197

update glm rolling regression #197

Conversation

chiral-carbon commented Jul 30, 2021

review-notebook-app bot commented Jul 30, 2021

review-notebook-app bot commented Aug 1, 2021 • edited Loading

review-notebook-app bot commented Aug 3, 2021 • edited Loading

review-notebook-app bot commented Aug 3, 2021

twiecki commented Aug 4, 2021

review-notebook-app bot commented Aug 6, 2021 • edited Loading

review-notebook-app bot commented Aug 10, 2021

review-notebook-app bot commented Aug 11, 2021

review-notebook-app bot commented Aug 17, 2021 • edited Loading

review-notebook-app bot commented Aug 17, 2021 • edited Loading

OriolAbril commented Aug 24, 2021

review-notebook-app bot commented Aug 24, 2021

chiral-carbon commented Aug 25, 2021

OriolAbril commented Aug 26, 2021

review-notebook-app bot commented Aug 26, 2021 • edited Loading

chiral-carbon commented Aug 28, 2021

review-notebook-app bot commented Aug 28, 2021 • edited Loading

review-notebook-app bot commented Aug 28, 2021 • edited Loading

OriolAbril commented Aug 28, 2021

OriolAbril commented Aug 28, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021 • edited Loading

review-notebook-app bot commented Sep 8, 2021 • edited Loading

review-notebook-app bot commented Sep 8, 2021

chiral-carbon commented Sep 8, 2021

OriolAbril commented Sep 8, 2021

review-notebook-app bot commented Sep 11, 2021 • edited Loading

review-notebook-app bot commented Sep 11, 2021 • edited Loading

chiral-carbon commented Sep 11, 2021

OriolAbril commented Sep 11, 2021

chiral-carbon commented Sep 16, 2021

review-notebook-app bot commented Aug 1, 2021 •

edited

Loading

review-notebook-app bot commented Aug 3, 2021 •

edited

Loading

review-notebook-app bot commented Aug 6, 2021 •

edited

Loading

review-notebook-app bot commented Aug 17, 2021 •

edited

Loading

review-notebook-app bot commented Aug 17, 2021 •

edited

Loading

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

review-notebook-app bot commented Aug 28, 2021 •

edited

Loading

review-notebook-app bot commented Aug 28, 2021 •

edited

Loading

OriolAbril commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Sep 8, 2021 •

edited

Loading

review-notebook-app bot commented Sep 11, 2021 •

edited

Loading

review-notebook-app bot commented Sep 11, 2021 •

edited

Loading