
update gaussian mixture model notebooks #208

Merged
merged 2 commits into pymc-devs:main on Oct 13, 2021

Conversation

chiral-carbon (Collaborator) commented:

Addresses issues #102 and #103, bringing the notebooks up to best practices.


review-notebook-app bot commented Aug 11, 2021

chiral-carbon commented on 2021-08-11T22:46:08Z
----------------------------------------------------------------

here I wanted to add coords and dims instead of shape, but was unsure how to implement it since k and ndata are scalar values.

thought of something like this:

with pm.Model(coords={"k": np.empty(shape=k), "ndata": np.empty(shape=ndata)}):

but that didn't make sense


OriolAbril commented on 2021-08-12T07:33:32Z
----------------------------------------------------------------

k is the number of clusters/mixture components, and ndata the number of observations. We could use something like:

coords={"cluster": np.arange(k), "obs_id": np.arange(ndata)}

so that both the dimension name and coordinate value are somewhat informative
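A minimal sketch of that suggestion (assuming pymc3; the values of k and ndata and the Dirichlet/Categorical pair are illustrative, not the notebook's exact model):

import numpy as np
import pymc3 as pm

k, ndata = 3, 1000  # illustrative values

coords = {"cluster": np.arange(k), "obs_id": np.arange(ndata)}
with pm.Model(coords=coords) as model:
    # dims replace shape: one weight per cluster, one assignment per observation
    p = pm.Dirichlet("p", a=np.ones(k), dims="cluster")
    category = pm.Categorical("category", p=p, dims="obs_id")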

review-notebook-app bot commented Aug 11, 2021

chiral-carbon commented on 2021-08-11T22:46:08Z
----------------------------------------------------------------

the p and sd graphs are quite different from before


OriolAbril commented on 2021-08-12T12:00:04Z
----------------------------------------------------------------

It looks like it has not converged. Try increasing the tune value and see what happens.

Also, I suspect these "full trace" and "after convergence" traces originally plotted all warm-up and posterior samples, so they used to differ; they no longer have any difference. Selecting one out of every five samples should only have memory/performance effects, not change the plots. Moreover, ArviZ functions expect the chain and draw dimensions to be present, as they provide important information that the plots use, so they should not be passed stacked arrays; plot_trace happens to still work, but plot_autocorr doesn't because it needs the chain and draw dimensions.
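To make the dimensions point concrete, a sketch (assuming trace is the InferenceData object returned by sampling):

import arviz as az

az.plot_autocorr(trace, var_names=["p"])  # fine: chain and draw dims are present

# stacking collapses chain and draw into a single dimension, which is what
# breaks plot_autocorr when stacked arrays are passed in
stacked = trace.posterior["p"].stack(sample=("chain", "draw"))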

chiral-carbon commented on 2021-08-12T12:28:56Z
----------------------------------------------------------------

oh okay. so should I forgo dividing this into different sections (full trace and after convergence)?

OriolAbril commented on 2021-08-12T14:44:53Z
----------------------------------------------------------------

yes. This has nothing to do with the mixture models but with the Metropolis sampler, which needs a burn-in period so that the samples taken before reaching stationarity can be discarded. It should be (and I think already is) covered in the Metropolis and/or sampling notebooks.

Also, as a side note, "after convergence" can be a bit misleading because there are no convergence guarantees. In your case, for example, even though you are plotting only the _after burn-in_ samples, the sampler has not converged.

chiral-carbon commented on 2021-08-12T15:54:04Z
----------------------------------------------------------------

okay. I tried changing the tune but the results are similar; it also takes longer to run, probably because of CategoricalGibbsMetropolis.

OriolAbril commented on 2021-08-12T16:37:07Z
----------------------------------------------------------------

does it still look as wiggly?

chiral-carbon commented on 2021-08-14T06:57:07Z
----------------------------------------------------------------

yes it is still very wiggly

OriolAbril commented on 2021-08-24T17:22:19Z
----------------------------------------------------------------

I think I know the reason for the apparent increase in wiggliness. The KDE implemented in ArviZ used to smooth a bit too much, so distributions with very small wiggles would look smooth even though they weren't really.

The tune (which acts as a burn-in in this case) is actually long enough; otherwise we would still have seen non-stationarity and convergence towards the typical set in the first trace plot, which wasn't the case. The problem is probably a too-small effective sample size. How does the plot change with a few thousand more samples?

chiral-carbon commented on 2021-08-25T06:36:07Z
----------------------------------------------------------------

I changed the number of samples to 12000 but it is still similar, and not smooth at all.

chiral-carbon commented on 2021-08-25T19:56:59Z
----------------------------------------------------------------

I'll run this again with more samples since the plot is still not smooth; I have pushed this version for now.

OriolAbril commented on 2021-08-26T15:30:23Z
----------------------------------------------------------------

before doing that, what is the effective sample size? Also try doing

az.plot_ess(trace, var_names=["p", "sd", "means"], kind="evolution");
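For the raw numbers, a sketch (az.ess returns a Dataset of per-variable effective sample sizes):

az.ess(trace, var_names=["p", "sd", "means"])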

chiral-carbon commented on 2021-08-30T17:59:18Z
----------------------------------------------------------------

this is the output of az.plot_ess(trace, var_names=["p", "sd", "means"], kind="evolution"); for the above sampling:

OriolAbril commented on 2021-08-31T16:35:16Z
----------------------------------------------------------------

just leave it wiggly then, focus on the trace plot comments and I think it will be done

review-notebook-app bot commented Aug 11, 2021

chiral-carbon commented on 2021-08-11T22:46:09Z
----------------------------------------------------------------

I thought the slicing syntax here was correct, but clearly not :( I wanted to know how to correct this.



review-notebook-app bot commented Aug 12, 2021

OriolAbril commented on 2021-08-12T12:00:23Z
----------------------------------------------------------------

Line #4.        step2 = pm.ElemwiseCategorical(vars=[category], values=[0, 1, 2])

According to the deprecation warning this should be replaced by CategoricalGibbsMetropolis 
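A sketch of the replacement, keeping the same variable assignment as the original line (note that CategoricalGibbsMetropolis takes the variables but no values argument):

step2 = pm.CategoricalGibbsMetropolis(vars=[category])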



review-notebook-app bot commented Aug 12, 2021

chiral-carbon commented on 2021-08-12T13:53:15Z
----------------------------------------------------------------

all three graphs for w and mu get clubbed into one plot, is this fine?


OriolAbril commented on 2021-08-12T14:47:08Z
----------------------------------------------------------------

I think it's fine like this. If you want to recover the previous behaviour, though, it's just a matter of passing compact=False to az.plot_trace.
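For reference, a sketch of both calls (variable names taken from the comment above):

az.plot_trace(trace, var_names=["w", "mu"])                 # compact: overlaid per variable
az.plot_trace(trace, var_names=["w", "mu"], compact=False)  # one subplot per dimension entry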


review-notebook-app bot commented Aug 12, 2021

chiral-carbon commented on 2021-08-12T15:54:10Z
----------------------------------------------------------------

I printed this twice (this cell and the one after it) just to show that I thought we needed to do stacking here, but I was still confused by both results.

review-notebook-app bot commented Aug 12, 2021

chiral-carbon commented on 2021-08-12T15:54:11Z
----------------------------------------------------------------

I think here we need to do stacking (see result of the cell before this one) because previously the code took 1 row from the trace.posterior["category"] and plotted that.


OriolAbril commented on 2021-08-12T16:51:33Z
----------------------------------------------------------------

this step plot is using stacked samples, yes, and I think cluster_posterior also needs the stacked trace because matplotlib doesn't flatten multidimensional arrays passed to plt.hist; it needs them to already be 1d. Note, however, that the main difference is more in what is selected, not so much in the shape of the data. Using [0] on a DataArray works but should never be used (there may be some exceptions, but only in very rare cases).

i is an index designed to subset the obs_id dimension, so the [0] or [i] should be changed to sel(obs_id=i) to be explicit and make sure it subsets along the right dimension. In the second version of cluster_posterior this is not happening: the [0] is subsetting along the chain dimension instead, because it's the first one. When stacking, xarray moves the new stacked dimension to the last position, so selection happens along obs_id as it should, but we should be explicit about that.
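A sketch of the explicit selection (assuming the stacked DataArray from the cell in question):

post = trace.posterior["category"].stack(sample=("chain", "draw"))
obs = post.sel(obs_id=i)  # explicit: always subsets along obs_id
# post[i] would subset along whichever dimension happens to come first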

chiral-carbon commented on 2021-08-14T06:58:43Z
----------------------------------------------------------------

will keep in mind, thanks!


@MarcoGorelli MarcoGorelli self-requested a review August 15, 2021 21:02

review-notebook-app bot commented Aug 24, 2021

OriolAbril commented on 2021-08-24T17:47:00Z
----------------------------------------------------------------

Each observation has a category assigned at every sample. With the old data, the first observation was assigned category 2 most of the time, and category 1 a handful of times. Now it's always assigned category 1.

I think this plot would be clearer if it had a handful of subplots (say 3 columns, 1 row), each with a different observation (out of the 10k available), and if it were a scatter plot where each chain had a different color, to more or less see that all chains have the same distribution.

I have two ideas in mind to do that (disclaimer: as usual, I haven't tried the code):

az.plot_trace(trace, var_names="category", sel={"obs_id": [3, 956, 5674]}, trace_kwargs={"ls": "none", "marker": "."})

or

trace.posterior.sel(obs_id=[3, 956, 5674]).plot.scatter(x="draw", y="category", hue="chain", col="obs_id", marker=".")

I think I'd prefer the plot_trace option because it already includes this cluster_posterior thingy as the left column. But it's always nice to show how xarray plotting can be used for simple plots, and that if you have used coords sensibly you get automatic labels.


review-notebook-app bot commented Aug 24, 2021

OriolAbril commented on 2021-08-24T17:47:01Z
----------------------------------------------------------------

N is the number of observations; the length of W (which corresponds to K in the math notation) is the number of clusters. Here I would again use cluster and obs_id as dimensions, with aranges of length K or N as coord values.



review-notebook-app bot commented Aug 25, 2021

chiral-carbon commented on 2021-08-25T20:00:58Z
----------------------------------------------------------------

when we tried running this with a smaller number of samples and plotting it, we got three distinct columns as expected, but now the plots are weird. I think they might be overlapping here? I'll check this and try plotting again; I have pushed it as is for now.



review-notebook-app bot commented Aug 26, 2021

OriolAbril commented on 2021-08-26T15:38:11Z
----------------------------------------------------------------

this means that these 3 observations are always assigned the same category: always 2, always 0 and always 0. In the example with different data that was used before, the observation was nearly always assigned category 2, with a few 1s too. Try choosing other observations, preferably close to -2 or close to 2, which look like the boundaries between categories in the first plot; these should have multiple categories assigned to them.

I would also share the axes. I think using backend_kwargs={"sharex": "col", "sharey": "col"} when calling plot_trace will do the trick. With this, the x axis of the left column should include all categories from 0 to 2, as should the y axis of the right column.

Also remember to add some alpha to the trace scatter plot in the right column.


chiral-carbon commented on 2021-08-31T16:52:34Z
----------------------------------------------------------------

backend_kwargs={"sharex": "col", "sharey": "col"} gives an error. I think sharex or sharey might not be correct arguments?

This is the error message:

TypeError: __init__() got an unexpected keyword argument 'sharex'

I passed it to plot_kwargs instead; that didn't raise an error, but it didn't change anything either.

OriolAbril commented on 2021-08-31T17:01:35Z
----------------------------------------------------------------

It looks like backend kwargs are not passed to the general ArviZ helper function that creates the axes, so you'll have to create the axes manually. Something like:

fig, ax = plt.subplots(..., sharex="col", sharey="col")
az.plot_trace(..., ax=ax)
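A concrete version of that workaround (the grid shape, figure size, alpha value, and observation indices are assumptions; compact=False is what yields one row per observation):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 2, figsize=(12, 8), sharex="col", sharey="col")
az.plot_trace(
    trace,
    var_names="category",
    coords={"obs_id": [3, 956, 5674]},
    compact=False,
    trace_kwargs={"ls": "none", "marker": ".", "alpha": 0.3},
    axes=ax,
)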



review-notebook-app bot commented Sep 8, 2021

OriolAbril commented on 2021-09-08T15:26:28Z
----------------------------------------------------------------

The plot looks much better now. I'm not sure what the observed values are for the points 2, 16 and 422 you are using now, but I think it would be better to get at least one point with significant competition between two clusters, and maybe even make sure to plot the point we are most uncertain about.


chiral-carbon commented on 2021-09-09T08:10:06Z
----------------------------------------------------------------

I agree. I am not sure how to get those points, though: in the histogram we refer to at the beginning of the notebook, where -2 and 2 look like boundary points, the observed data is normalized, but here we have to pass values between 0 and 500.

Sayam753 commented on 2021-09-19T16:42:40Z
----------------------------------------------------------------

The values between 0 and 500 are indices into the data. So, as Oriol said, we need to find data points close to -2 and 2. On top of that, we need their indices so that they can be passed in coords={"obs_id": [...]}.

Rephrasing -

data close to -2
data + 2 close to 0
|data + 2| close to 0  # Taking absolute to make values on R+ scale.
argmin(|data + 2|)  # Will give the index

So, I think something like np.argmin(np.abs(data + 2)) will return the index of the data point closest to -2. The same can be done for any other data point of interest.
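In code, following that reasoning (a sketch; data is the observed array from the notebook):

import numpy as np

idx_left = np.argmin(np.abs(data + 2))   # index of the data point closest to -2
idx_right = np.argmin(np.abs(data - 2))  # index of the data point closest to +2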

chiral-carbon commented on 2021-09-19T20:36:27Z
----------------------------------------------------------------

yup, understood. thanks a lot for explaining.


OriolAbril (Member) commented:

  1. Count how many occurrences of each integer there are along all chains and draws for each observation. This will generate a 2d array with dims (obs_id, mixture_component).
  2. Sort along the mixture component dimension and keep only the second column.
  3. Find the argmax. This is the observation whose 2nd mixture component has the highest weight, so it should be the least clear-cut one (see the sketch below).
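A sketch of those three steps (untested, like the other snippets here; assumes trace.posterior["category"] has dims (chain, draw, obs_id) and k mixture components):

import numpy as np

# counts[i, c]: number of draws in which observation i was assigned category c
counts = np.stack(
    [(trace.posterior["category"] == c).sum(dim=("chain", "draw")).values for c in range(k)],
    axis=-1,
)
counts.sort(axis=-1)                     # ascending along the category axis
most_ambiguous = counts[:, -2].argmax()  # observation where the runner-up
                                         # category carries the most weight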


fixes

correct selection while plotting

correction: coords/dims naming and plot

minor changes in coords/dims init

change plotting axes

change obs indices
OriolAbril (Member) commented, quoting @Sayam753's explanation above:
I was proposing a different approach, though, so that we don't rely on the initial exploration suggesting that -2/2 look like boundaries, and instead take the results and plot the observation to which the model assigned the most "weight" for the 2nd category. Both will probably be fine, so feel free to try whichever looks easier, and if it doesn't work, try the other.

chiral-carbon (Collaborator Author) commented:

@OriolAbril I tried @Sayam753's method and used it to get points with greater competition between clusters. Do they look okay?

@OriolAbril OriolAbril merged commit 0e35678 into pymc-devs:main Oct 13, 2021