Skip to content

Commit

Permalink
Updated 3 survival notebooks to v5 (pymc-devs#501)
Browse files Browse the repository at this point in the history
  • Loading branch information
fonnesbeck authored Jan 20, 2023
1 parent 462700a commit 5288e05
Show file tree
Hide file tree
Showing 6 changed files with 525 additions and 329 deletions.
87 changes: 53 additions & 34 deletions examples/survival_analysis/censored_data.ipynb

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions examples/survival_analysis/censored_data.myst.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ jupytext:
format_name: myst
format_version: 0.13
kernelspec:
display_name: pymc_env
display_name: pymc
language: python
name: pymc_env
name: python3
---

(censored_data)=
Expand Down Expand Up @@ -241,3 +241,6 @@ As we can see, both censored models appear to capture the mean and variance of t
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl
```

:::{include} ../page_footer.md
:::
276 changes: 182 additions & 94 deletions examples/survival_analysis/survival_analysis.ipynb

Large diffs are not rendered by default.

52 changes: 32 additions & 20 deletions examples/survival_analysis/survival_analysis.myst.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,30 +5,36 @@ jupytext:
format_name: myst
format_version: 0.13
kernelspec:
display_name: Python 3
display_name: pymc
language: python
name: python3
---

(survival_analysis)=
# Bayesian Survival Analysis

Author: Austin Rochford

[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC3.
[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC.

We illustrate these concepts by analyzing a [mastectomy data set](https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/mastectomy.html) from `R`'s [HSAUR](https://cran.r-project.org/web/packages/HSAUR/index.html) package.

:::{post} Jan 17, 2023
:tags: censored, survival analysis
:category: intermediate, how-to
:author: Austin Rochford, Chris Fonnesbeck
:::

```{code-cell} ipython3
import arviz as az
import numpy as np
import pandas as pd
import pymc3 as pm
import theano
import pymc as pm
import pytensor
%matplotlib inline
from matplotlib import pyplot as plt
from pymc3.distributions.timeseries import GaussianRandomWalk
from theano import tensor as T
from pymc.distributions.timeseries import GaussianRandomWalk
from pytensor import tensor as T
print(f"Running on PyMC v{pm.__version__}")
```

```{code-cell} ipython3
Expand Down Expand Up @@ -189,7 +195,7 @@ ax.set_ylabel("Number of observations")
ax.legend();
```

With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc3`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)
With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)

We define indicator variables based on whether the $i$-th subject died in the $j$-th interval,

Expand All @@ -214,7 +220,7 @@ exposure[patients, last_period] = df.time - interval_bounds[last_period]

Finally, denote the risk incurred by the $i$-th subject in the $j$-th interval as $\lambda_{i, j} = \lambda_j \exp(\mathbf{x}_i \beta)$.

We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc3` model.
We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc` model.

```{code-cell} ipython3
coords = {"intervals": intervals}
Expand Down Expand Up @@ -244,7 +250,6 @@ with model:
n_samples,
tune=n_tune,
target_accept=0.99,
return_inferencedata=True,
random_seed=RANDOM_SEED,
)
```
Expand Down Expand Up @@ -326,23 +331,23 @@ fig.suptitle("Bayesian survival model");

We see that the cumulative hazard for metastasized subjects increases more rapidly initially (through about seventy months), after which it increases roughly in parallel with the baseline cumulative hazard.

These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc3` is the inherent quantification of uncertainty in our estimates.
These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc` is the inherent quantification of uncertainty in our estimates.

+++

##### Time varying effects

Another of the advantages of the model we have built is its flexibility. From the plots above, we may reasonable believe that the additional hazard due to metastization varies over time; it seems plausible that cancer that has metastasized increases the hazard rate immediately after the mastectomy, but that the risk due to metastization decreases over time. We can accommodate this mechanism in our model by allowing the regression coefficients to vary over time. In the time-varying coefficient model, if $s_j \leq t < s_{j + 1}$, we let $\lambda(t) = \lambda_j \exp(\mathbf{x} \beta_j).$ The sequence of regression coefficients $\beta_1, \beta_2, \ldots, \beta_{N - 1}$ form a normal random walk with $\beta_1 \sim N(0, 1)$, $\beta_j\ |\ \beta_{j - 1} \sim N(\beta_{j - 1}, 1)$.

We implement this model in `pymc3` as follows.
We implement this model in `pymc` as follows.

```{code-cell} ipython3
coords = {"intervals": intervals}
with pm.Model(coords=coords) as time_varying_model:
lambda0 = pm.Gamma("lambda0", 0.01, 0.01, dims="intervals")
beta = GaussianRandomWalk("beta", tau=1.0, dims="intervals")
beta = GaussianRandomWalk("beta", init_dist=pm.Normal.dist(), sigma=1.0, dims="intervals")
lambda_ = pm.Deterministic("h", lambda0 * T.exp(T.outer(T.constant(df.metastasized), beta)))
mu = pm.Deterministic("mu", exposure * lambda_)
Expand Down Expand Up @@ -384,15 +389,15 @@ ax.step(interval_bounds[:-1], beta_hat, color="C0")
ax.scatter(
interval_bounds[last_period[(df.event.values == 1) & (df.metastasized == 1)]],
beta_hpt.isel(intervals=last_period[(df.event.values == 1) & (df.metastasized == 1)]),
beta_hat.isel(intervals=last_period[(df.event.values == 1) & (df.metastasized == 1)]),
color="C1",
zorder=10,
label="Died, cancer metastasized",
)
ax.scatter(
interval_bounds[last_period[(df.event.values == 0) & (df.metastasized == 1)]],
beta_hpt.isel(intervals=last_period[(df.event.values == 0) & (df.metastasized == 1)]),
beta_hat.isel(intervals=last_period[(df.event.values == 0) & (df.metastasized == 1)]),
color="C0",
zorder=10,
label="Censored, cancer metastasized",
Expand Down Expand Up @@ -501,11 +506,18 @@ We have really only scratched the surface of both survival analysis and the Baye

This tutorial is available as an [IPython](http://ipython.org/) notebook [here](https://gist.github.com/AustinRochford/4c6b07e51a2247d678d6). It is adapted from a blog post that first appeared [here](http://austinrochford.com/posts/2015-10-05-bayes-survival.html).

+++

## Authors

- Originally authored by [Austin Rochford](https://github.com/AustinRochford).
- Updated by [Fernando Irarrázaval](https://github.com/cuchoi) in June 2022 to PyMC v4 ([pymc-examples#372](https://github.com/pymc-devs/pymc-examples/pull/372)).
- Updated by [Chris Fonnesbeck](https://github.com/fonnesbeck) in January 2023 to PyMC v5.

```{code-cell} ipython3
%load_ext watermark
%watermark -n -u -v -iv -w -p xarray
```

```{code-cell} ipython3
```
:::{include} ../page_footer.md
:::
398 changes: 230 additions & 168 deletions examples/survival_analysis/weibull_aft.ipynb

Large diffs are not rendered by default.

34 changes: 23 additions & 11 deletions examples/survival_analysis/weibull_aft.myst.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,21 +5,29 @@ jupytext:
format_name: myst
format_version: 0.13
kernelspec:
display_name: Python 3
display_name: pymc
language: python
name: python3
---

(weibull_aft)=

# Reparameterizing the Weibull Accelerated Failure Time Model

:::{post} January 17, 2023
:tags: censored, survival analysis, weibull
:category: intermediate, how-to
:author: Junpeng Lao, George Ho, Chris Fonnesbeck
:::

```{code-cell} ipython3
import arviz as az
import numpy as np
import pymc3 as pm
import pymc as pm
import pytensor.tensor as pt
import statsmodels.api as sm
import theano.tensor as tt
print(f"Running on PyMC3 v{pm.__version__}")
print(f"Running on PyMC v{pm.__version__}")
```

```{code-cell} ipython3
Expand Down Expand Up @@ -83,8 +91,8 @@ with pm.Model() as model_1:
mu = pm.Normal("mu", mu=0, sigma=100)
alpha_raw = pm.Normal("a0", mu=0, sigma=0.1)
alpha = pm.Deterministic("alpha", tt.exp(alpha_sd * alpha_raw))
beta = pm.Deterministic("beta", tt.exp(mu / alpha))
alpha = pm.Deterministic("alpha", pt.exp(alpha_sd * alpha_raw))
beta = pm.Deterministic("beta", pt.exp(mu / alpha))
y_obs = pm.Weibull("y_obs", alpha=alpha, beta=beta, observed=y[~censored])
y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], alpha, beta))
Expand All @@ -93,7 +101,7 @@ with pm.Model() as model_1:
```{code-cell} ipython3
with model_1:
# Change init to avoid divergences
data_1 = pm.sample(target_accept=0.9, init="adapt_diag", return_inferencedata=True)
data_1 = pm.sample(target_accept=0.9, init="adapt_diag")
```

```{code-cell} ipython3
Expand All @@ -114,7 +122,7 @@ For more information, see [this Stan example model](https://github.com/stan-dev/
with pm.Model() as model_2:
alpha = pm.Normal("alpha", mu=0, sigma=10)
r = pm.Gamma("r", alpha=1, beta=0.001, testval=0.25)
beta = pm.Deterministic("beta", tt.exp(-alpha / r))
beta = pm.Deterministic("beta", pt.exp(-alpha / r))
y_obs = pm.Weibull("y_obs", alpha=r, beta=beta, observed=y[~censored])
y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], r, beta))
Expand All @@ -123,7 +131,7 @@ with pm.Model() as model_2:
```{code-cell} ipython3
with model_2:
# Increase target_accept to avoid divergences
data_2 = pm.sample(target_accept=0.9, return_inferencedata=True)
data_2 = pm.sample(target_accept=0.9)
```

```{code-cell} ipython3
Expand All @@ -144,7 +152,7 @@ logtime = np.log(y)
def gumbel_sf(y, mu, sigma):
"""Gumbel survival function."""
return 1.0 - tt.exp(-tt.exp(-(y - mu) / sigma))
return 1.0 - pt.exp(-pt.exp(-(y - mu) / sigma))
```

```{code-cell} ipython3
Expand All @@ -159,7 +167,7 @@ with pm.Model() as model_3:
```{code-cell} ipython3
with model_3:
# Change init to avoid divergences
data_3 = pm.sample(init="adapt_diag", return_inferencedata=True)
data_3 = pm.sample(init="adapt_diag")
```

```{code-cell} ipython3
Expand All @@ -174,8 +182,12 @@ az.summary(data_3, round_to=2)

- Originally collated by [Junpeng Lao](https://junpenglao.xyz/) on Apr 21, 2018. See original code [here](https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/65447fdb431c78b15fbeaef51b8c059f46c9e8d6/PyMC3QnA/discourse_1107.ipynb).
- Authored and ported to Jupyter notebook by [George Ho](https://eigenfoo.xyz/) on Jul 15, 2018.
- Updated for compatibility with PyMC v5 by Chris Fonnesbeck on Jan 16, 2023.

```{code-cell} ipython3
%load_ext watermark
%watermark -n -u -v -iv -w
```

:::{include} ../page_footer.md
:::

0 comments on commit 5288e05

Please sign in to comment.