update gaussian mixture model notebooks #208
Conversation
chiral-carbon commented on 2021-08-11T22:46:08Z: here I wanted to add
thought of something like this: with pm.Model(coords={"k": np.empty(shape=k), "ndata": np.empty(shape=ndata)}):
but that didn't make sense

OriolAbril commented on 2021-08-12T07:33:32Z:
coords={"cluster": np.arange(k), "obs_id": np.arange(ndata)}
so that both the dimension name and the coordinate values are somewhat informative
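For context, a minimal sketch of how that coords suggestion could sit inside the model block (this is not the notebook's code: the sizes and the Dirichlet weight prior are placeholders, assuming PyMC3-era imports):

import numpy as np
import pymc3 as pm

k, ndata = 3, 500  # placeholder sizes for illustration

with pm.Model(coords={"cluster": np.arange(k), "obs_id": np.arange(ndata)}) as model:
    # Named dims make the posterior self-describing: the sampled "p" will carry
    # a "cluster" dimension with coordinate values 0..k-1 in the trace.
    p = pm.Dirichlet("p", a=np.ones(k), dims="cluster")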
chiral-carbon commented on 2021-08-11T22:46:08Z: the

OriolAbril commented on 2021-08-12T12:00:04Z: It looks like it has not converged. Try increasing the tune value and see what happens.
Also, I suspect these "full trace" and "after convergence" traces were originally plotting all warm-up and posterior samples, so they were different; they no longer have any difference. Selecting one out of every five samples should only have some memory/performance effects, not change the plotting, and ArviZ functions expect the chain and draw dimensions to be present, as they provide important information that the plots use. They should not be passed stacked arrays; somehow plot_trace still works, but plot_autocorr doesn't because it needs the chain and draw dimensions.

chiral-carbon commented on 2021-08-12T12:28:56Z: oh okay. So should I forgo dividing this into different sections (full trace and after convergence)?

OriolAbril commented on 2021-08-12T14:44:53Z: yes. This has nothing to do with the mixture models but with the Metropolis sampler, which needs a burn-in period in order to discard the samples before reaching stationarity. It should be (and I think it already is) covered in the Metropolis and/or sampling notebooks.
Also, as a side note, "after convergence" can be a bit misleading because there are no convergence guarantees. In your case, for example, even though you are plotting only the after burn-in samples, the sampler has not converged.

chiral-carbon commented on 2021-08-12T15:54:04Z: okay. tried changing the

OriolAbril commented on 2021-08-12T16:37:07Z: does it still look as wiggly?

chiral-carbon commented on 2021-08-14T06:57:07Z: yes, it is still very wiggly

OriolAbril commented on 2021-08-24T17:22:19Z: I think I know the reason for the apparently increased wiggliness. The KDE implemented in ArviZ used to be a bit too smoothing, so distributions with very small wiggles would look smooth even though they weren't really that smooth.
The tune (which is actually a burn-in in this case) is long enough (otherwise we would still have seen non-stationarity and convergence to the typical set in the first trace plot, which wasn't the case). The problem is probably a too-small effective sample size. How does the plot change with a few more thousand samples?

chiral-carbon commented on 2021-08-25T06:36:07Z: I changed the number of samples to 12000, but it is still similar, and not smooth at all.

chiral-carbon commented on 2021-08-25T19:56:59Z: I'll run this again with more samples since the plot is still not smooth; have pushed this for now.

OriolAbril commented on 2021-08-26T15:30:23Z: before doing that, what is the effective sample size? Also try doing
az.plot_ess(trace, var_names=["p", "sd", "means"], kind="evolution");

chiral-carbon commented on 2021-08-30T17:59:18Z: this is the output of

OriolAbril commented on 2021-08-31T16:35:16Z: just leave it wiggly then, focus on the trace plot comments and I think it will be done.
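For reference, a minimal sketch of the effective-sample-size checks being discussed (untested here; trace is the notebook's InferenceData and the variable names are the same ones passed to plot_ess above):

import arviz as az

# Effective sample size per variable; small values explain a wiggly KDE even
# when the trace itself looks stationary.
print(az.ess(trace, var_names=["p", "sd", "means"]))

# Evolution plot: ESS should grow roughly linearly with the number of draws.
az.plot_ess(trace, var_names=["p", "sd", "means"], kind="evolution")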
chiral-carbon commented on 2021-08-11T22:46:09Z: I thought the slicing syntax here was correct, but clearly not :( I wanted to know how to correct this.
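The cell in question isn't shown, but the usual pattern for slicing an ArviZ trace is label-based selection on the posterior rather than raw positional indexing; a generic sketch (the dimension and variable names follow the coords used elsewhere in this thread, and the indices are arbitrary illustrations):

# Label-based selection: pick a few observations by their obs_id coordinate,
# keeping the chain and draw dimensions intact for later ArviZ plots.
subset = trace.posterior["category"].sel(obs_id=[3, 956, 5674])

# Positional selection also works, e.g. the first chain and every 5th draw.
thinned = trace.posterior["category"].isel(chain=0, draw=slice(None, None, 5))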
OriolAbril commented on 2021-08-12T12:00:23Z: Line #4. step2 = pm.ElemwiseCategorical(vars=[category], values=[0, 1, 2])
According to the deprecation warning this should be replaced by
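The comment is cut off before naming the replacement; assuming it refers to the step method the PyMC3 deprecation warning pointed to at the time, the line would look roughly like this:

# Sketch only: CategoricalGibbsMetropolis is assumed to be the intended
# replacement for the deprecated ElemwiseCategorical step method.
step2 = pm.CategoricalGibbsMetropolis(vars=[category])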
chiral-carbon commented on 2021-08-12T13:53:15Z: all the three graphs for

OriolAbril commented on 2021-08-12T14:47:08Z: I think it's fine like this. If you want to recover the previous behaviour, though, it's only passing
chiral-carbon commented on 2021-08-12T15:54:10Z: I printed this twice (this cell and the one after it) just to show that I thought we needed to do stacking here, but I was still confused by both results.
chiral-carbon commented on 2021-08-12T15:54:11Z: I think here we need to do stacking (see the result of the cell before this one) because previously the code took 1 row from the

OriolAbril commented on 2021-08-12T16:51:33Z: this step plot is using stacked samples, yes, and the cluster_posterior I think also needs the stacked trace because matplotlib doesn't flatten multidim arrays passed to

chiral-carbon commented on 2021-08-14T06:58:43Z: will keep in mind, thanks!
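For reference, stacking the chain and draw dimensions of the trace is normally done through xarray; a minimal sketch (the variable name "category" matches the one used elsewhere in this thread, everything else is illustrative):

# Collapse chain and draw into a single "sample" dimension so that matplotlib
# receives a flat array of posterior samples per observation.
stacked = trace.posterior.stack(sample=("chain", "draw"))
category_samples = stacked["category"].values  # shape: (n_obs, n_samples)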
OriolAbril commented on 2021-08-24T17:47:00Z: Each observation has a category assigned at every sample. With the old data, the first observation was assigned category 2 most of the time, and category 1 a handful of times. Now it's always assigned category 1.
I think this plot would be clearer if it had a handful of subplots (say 3 columns, 1 row), each with a different observation (out of the 10k available), and if it were a scatter plot where each chain had a different color, to more or less see that all chains have the same distribution.
I have two ideas in mind to do that (disclaimer, as usual I haven't tried the code):
az.plot_trace(trace, var_names="category", sel={"obs_id": [3, 956, 5674]}, trace_kwargs={"ls": "none", "marker": "."})
or
trace.posterior.sel(obs_id=[3, 956, 5674]).plot.scatter(x="draw", y="category", hue="chain", col="obs_id", marker=".")
I think I'll prefer the plot_trace option because it already includes this cluster_posterior thingy as the left column. But it's always nice to show how xarray plotting can be used for simple plots, and that if you have used coords sensibly you get automatic labels.
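Neither snippet above was tested by the reviewer; a rough adaptation of the first one that matches the released ArviZ API (selection goes through coords rather than sel, and the observation indices are simply the ones quoted above) might look like:

import arviz as az

# One row per selected observation, with the trace panel drawn as points.
az.plot_trace(
    trace,
    var_names="category",
    coords={"obs_id": [3, 956, 5674]},
    compact=False,
    trace_kwargs={"ls": "none", "marker": "."},
)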
chiral-carbon commented on 2021-08-25T20:00:58Z: when we had tried running this with a smaller number of samples and plotting it, we got three distinct columns as expected, but now the plots are weird. I think they might be overlapping here? I'll check this and try plotting again; have pushed as is for now.
OriolAbril commented on 2021-08-26T15:38:11Z: this means that these 3 observations are always assigned the same category: always 2, always 0 and always 0. In the example with different data that was used before, the observation was nearly always assigned category 2, with a few 1s too. Try choosing other observations, preferably close to -2 or close to 2, which look like the boundaries between categories in the first plot; these should have multiple categories assigned to them.
I would also share the axis. I think using
Also remember to add some alpha to the trace scatterplot on the right column.

chiral-carbon commented on 2021-08-31T16:52:34Z: this is the error message: TypeError: __init__() got an unexpected keyword argument 'sharex'. I passed it to

OriolAbril commented on 2021-08-31T17:01:35Z: It looks like backend kwargs are not passed to the general ArviZ helper function that creates the axes, so you'll have to create the axes manually. Something like:
fig, ax = plt.subplots(..., sharex="col", sharey="col")
az.plot_trace(..., ax=ax)
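A sketch of that workaround with concrete (assumed) shapes: three selected observations, compact=False so each gets its own row, and the two columns sharing axes. Note the keyword is named axes in recent ArviZ releases; older versions used ax as in the comment above.

import arviz as az
import matplotlib.pyplot as plt

# 3 rows (one per observation), 2 columns (density on the left, trace on the
# right), with each column sharing its x and y axes across rows.
fig, axes = plt.subplots(3, 2, figsize=(12, 6), sharex="col", sharey="col")
az.plot_trace(
    trace,
    var_names="category",
    coords={"obs_id": [3, 956, 5674]},
    compact=False,
    trace_kwargs={"ls": "none", "marker": ".", "alpha": 0.3},
    axes=axes,
)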
OriolAbril commented on 2021-09-08T15:26:28Z: The plot looks much better now, and I'm not sure which are the observed values for the points 2, 16 and 422 you are using now, but I think it would be better to get at least one point with significant competition between two clusters, maybe even make sure to plot the point we are most uncertain about.

chiral-carbon commented on 2021-09-09T08:10:06Z: I agree, but I am not sure how to get those points, since in the histogram we refer to at the beginning of the notebook, where -2 or 2 look like boundary points, the observed data is normalized, but here we have to pass values between 0 and 500.

Sayam753 commented on 2021-09-19T16:42:40Z: The values between 0 and 500 are indices for the data. So, as Oriol said, we need to find data points close to -2 and 2. On top of that, we need their indices so that they can be passed in. Rephrasing:
data close to -2
data + 2 close to 0
|data + 2| close to 0  # taking the absolute value to put things on the R+ scale
argmin(|data + 2|)  # will give the index
So, I think something like

chiral-carbon commented on 2021-09-19T20:36:27Z: yup, understood. thanks a lot for explaining.
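Sayam753's "something like" is cut off; a minimal NumPy sketch of the same idea (assuming the observations live in a 1-D array, here called data as in the comment):

import numpy as np

# Indices of the observations closest to the apparent cluster boundaries at -2 and 2.
idx_near_minus2 = int(np.argmin(np.abs(data + 2)))
idx_near_plus2 = int(np.argmin(np.abs(data - 2)))
boundary_obs = [idx_near_minus2, idx_near_plus2]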
Commits:
- fixes correct selection while plotting
- correction: coords/dims naming and plot
- minor changes in coords/dims init
- change plotting axes
- change obs indices
Force-pushed from 84a8750 to 51f299b
I was proposing a different approach, though, so that we don't rely on the initial exploration suggesting that -2/2 look like boundaries, and instead take the results and plot the observation to which the model assigned more "weight" for the 2nd category. Both will probably be fine, so feel free to try whichever looks easier, and if it doesn't work, try the other.
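A possible sketch of that posterior-driven selection (not code from the PR; it assumes the trace has a category variable with chain, draw and obs_id dimensions, as discussed above):

import numpy as np

# Fraction of posterior samples assigning each observation to each cluster.
category = trace.posterior["category"]  # dims: (chain, draw, obs_id)
n_clusters = 3
probs = np.stack(
    [(category == k).mean(dim=("chain", "draw")).values for k in range(n_clusters)],
    axis=-1,
)  # shape: (n_obs, n_clusters)

# The observation whose most probable cluster still has a low probability is
# the one the model is most uncertain about.
most_uncertain_idx = int(np.argmin(probs.max(axis=1)))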
@OriolAbril I had tried @Sayam753's method and have used that to get points that have greater competition between clusters. Do they look okay?
Addresses issues #102 and #103 to advance them to best practices.