-
-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update glm rolling regression #197
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-01T09:16:16Z Line #2. trace_rw = pm.sample(tune=2000, cores=4, target_accept=0.9) This should use return_inferencedata=True |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-03T19:28:01Z just adding what we already commented here in the PR too. Here use coords when defining the model |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-03T19:28:02Z slice at dataset level and choose variables afterwards plus try to remove the double loop. I think no loop will be needed, but maybe 1 loop will be to get the right coloring. |
I feel like we should just do the linear regression by hand here instead of using bambi, that'd be more obvious. |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-06T15:35:05Z try fixing xticklabels OriolAbril commented on 2021-08-26T15:45:55Z try
fig.autofmt_xdate()
reference: https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-10T22:42:58Z need help here on how to convert this to a single loop. I sliced it at the dataset level and converted I'm afraid it's still unclear to me after you explained it :/ |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-11T07:12:15Z Line #15. xi = xr.DataArray(np.linspace(prices_zscored.GFI.min(), prices_zscored.GFI.max(), 50))
The DataArray needs to have it's dimension named. Then you can do
To reduce the code to a single looping, I would do something like (this is pseudocode, it won't work as is, it's only illustrational):
dataarray = (a + b * x).stack(...)
I think this will differ slightly from the current plot because the current plot has a color slighly different for every line, however, the color should represent the time according to the colormap on the plot, so multiple samples from the same time should have the exactly the same color not a similar one. |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-17T13:05:36Z replaced the glm with manual linear regression here
|
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-17T13:05:37Z the kernel dies here when I run this code, need some help with fixing this OriolAbril commented on 2021-08-24T16:28:23Z where and how does the kernel die? Have you tried for example only looping and subetting the data, no stacking nor plotting? or stacking before looping with something like
reg_eq = (trace_rw.posterior["alpha"] + trace_rw.posterior["beta"] * xi).stack(sample=("chain", "draw")) chiral-carbon commented on 2021-08-25T06:03:04Z it dies here: for i in prices.index.values[::15]: reg = reg_eq.sel(time=i).stack(pooled_chain=("chain", "draw")) plt.plot(xi, reg)
will try stacking before looping, should reduce the complexity somewhat. |
where and how does the kernel die? Have you tried for example only looping and subetting the data, no stacking nor plotting? or stacking before looping with something like
reg_eq = (trace_rw.posterior["alpha"] + trace_rw.posterior["beta"] * xi).stack(sample=("chain", "draw")) View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-24T16:32:16Z what are the dimensions and coordinates of |
it dies here: for i in prices.index.values[::15]: reg = reg_eq.sel(time=i).stack(pooled_chain=("chain", "draw")) plt.plot(xi, reg)
will try stacking before looping, should reduce the complexity somewhat. View entire conversation on ReviewNB |
try `fig.autofmt_xdate()
reference: https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-26T15:54:56Z now plotting works, but still needs some work
chiral-carbon commented on 2021-08-28T08:12:16Z we are plotting 23 lines, and that's because we are slicing
should I use different colormaps for the scatter plot and the lines? currently both use OriolAbril commented on 2021-08-28T20:06:21Z We have 4 chains, 1000 draws and 1136 times (note we are slicing the index of prices which is the time, not the prices themselves). So we have 4544000 lines that we could plot which is way too much if we plan to interpret the plot and not have it be a mess of overlapping lines.
So we loop over 1 out of every 10 times and for each of those we plot multiple lines (we could plot 4000 lines per time!). I think a handful of lines ~10-30 is enough to show that the model predictions take some uncertainty into account even within a single time. Note also that we want to encode the time each line corresponds to into its color. We will then have multiple lines with the same color because we are plotting multiple samples that correspond to the same time.
The 23 therefore is the number of times we are plotting, not the number of lines. The number of lines is 23 * 4000 / 200 ~= 450 If my mental math is right.
As I said before, we want the color to encode the time information, so we need both scatter and line plots to use the same encoding -> they should use the same colormap and the same scale to transform time values to 0-1 values used by the colormap. |
we are plotting 23 lines, and that's because we are slicing
should use different colormaps for the scatter plot and the lines? currently both use View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-28T08:58:05Z using FixedLocator removes this warning: UserWarning: FixedFormatter should only be used together with FixedLocator but the yticklabels in the colorbar end up not getting printed properly. this happens in the plots below in the notebook as well. OriolAbril commented on 2021-08-28T19:41:29Z The suggestions provided by warnings can be useful, but they do what they can in trying to help avoid the warning. It may only cover the most common case or we might be triggering the warning via an edge case the developers did not consider. Here using this fixedlocator approach is not a solution because while it's true it prevents the warning from being printed, it completely breaks the plot, you have to try something else. This example from matplotlib docs might be worth playing with: https://matplotlib.org/stable/gallery/ticks_and_spines/colorbar_tick_labelling_demo.html chiral-carbon commented on 2021-08-30T11:34:42Z I tried the examples provided in the link, mainly by passing a OriolAbril commented on 2021-08-30T16:48:37Z This works for me:
|
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-28T08:58:17Z using FixedLocator removes this warning: UserWarning: FixedFormatter should only be used together with FixedLocator but the yticklabels in the colorbar end up not getting printed properly. this happens in the plots below in the notebook as well. |
The suggestions provided by warnings can be useful, but they do what they can in trying to help avoid the warning. It may only cover the most common case or we might be triggering the warning via an edge case the developers did not consider. Here using this fixedlocator approach is not a solution because while it's true it prevents the warning from being printed, it completely breaks the plot, you have to try something else. This example from matplotlib docs might be worth playing with: https://matplotlib.org/stable/gallery/ticks_and_spines/colorbar_tick_labelling_demo.html View entire conversation on ReviewNB |
We have 4 chains, 1000 draws and 1136 times (note we are slicing the index of prices which is the time, not the prices themselves). So we have 4544000 lines that we could plot which is way too much if we plan to interpret the plot and not have it be a mess of overlapping lines.
So we loop over 1 out of every 10 times and for each of those we plot multiple lines (we could plot 4000 lines per time!). I think a handful of lines ~10-30 is enough to show that the model predictions take some uncertainty into account even within a single time. Note also that we want to encode the time each line corresponds to into its color. We will then have multiple lines with the same color because we are plotting multiple samples that correspond to the same time.
The 23 therefore is the number of times we are plotting, not the number of lines. The number of lines is 23 * 4000 / 200 ~= 450 If my mental math is right.
As I said before, we want the color to encode the time information, so we need both scatter and line plots to use the same encoding -> they should use the same colormap and the same scale to transform time values to 0-1 values used by the colormap. View entire conversation on ReviewNB |
I tried the examples provided in the link, mainly by passing a View entire conversation on ReviewNB |
This works for me: I have also realized that we should change the colormap to one of the perceptually uniform sequential colormaps, see https://matplotlib.org/stable/tutorials/colors/colormaps.html (for all plots) View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-08T15:32:12Z Line #4. mymap = plt.get_cmap("winter") change this colormap to a perceptually sequential colormap (see link from previous comment) chiral-carbon commented on 2021-09-08T17:10:08Z does this colormap look better? there are of course others I could change this with OriolAbril commented on 2021-09-08T17:20:35Z It's not an appearance issue. Any of the perceptually sequential colormaps is fine by me. Here we are encoding the time which is sequential in our case, so with our aim of promoting best practices in notebooks as much as possible, not only with pymc3 related things we should use a perceptually sequential colormap. If instead we were interested in a diverging quantity such as values being above or below the mean or timestamps being earlier or later than a given date, we would use a diverging colormap.
The link I shared in 2 comments ago from matplotlib docs (https://matplotlib.org/stable/tutorials/colors/colormaps.html) explains it much better, the above paragraph is a barebones overview |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-08T15:32:12Z Line #9. colors = np.linspace(0.1, 1, len(prices))
start linspaces at 0
Line #12. mymap = plt.get_cmap("winter")
there is no need to create two colormaps nor to overwrite the original one. |
does this colormap look better? there are of course others I could change this with View entire conversation on ReviewNB |
It's not an appearance issue. Any of the perceptually sequential colormaps is fine by me. Here we are encoding the time which is sequential in our case, so with our aim of promoting best practices in notebooks as much as possible, not only with pymc3 related things we should use a perceptually sequential colormap. If instead we were interested in a diverging quantity such as values being above or below the mean or timestamps being earlier or later than a given date, we would use a diverging colormap.
The link I shared in 2 comments ago from matplotlib docs (https://matplotlib.org/stable/tutorials/colors/colormaps.html) explains it much better, the above paragraph is a barebones overview View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB Sayam753 commented on 2021-09-11T05:24:27Z Pairs trading link is not working chiral-carbon commented on 2021-09-11T10:15:50Z ooh thanks a lot for pointing it out |
View / edit / reply to this conversation on ReviewNB Sayam753 commented on 2021-09-11T05:24:28Z And stochastic volatility model link is also dead |
ooh thanks a lot for pointing it out View entire conversation on ReviewNB |
There is also an issue with many broken links throughout the notebooks linked in https://github.com/pymc-devs/pymc-examples/wiki/Notebook-updates-overview |
correction: use return_inferencedata=True in pm.sample commits: using coords and removing double loops change plotting code correction in plotting fix warnings change colormap revised: colormap
dfd1548
to
0099606
Compare
not sure if this needs a |
Addresses issue #89 and aims to use
bambi
instead of pymc3 GLM module.