Add missing reference model components to delay_group_lmpf and generated quantities #147

Merged
merged 60 commits into from
Sep 20, 2022
b58ecfe
add required delay_group_lmpf changes for missing ref model
seabbs Jul 31, 2022
3cadc35
debug to compilation
seabbs Jul 31, 2022
a004d8e
add missing reference model to gq
seabbs Jul 31, 2022
5dd2dbc
use segment where possible
seabbs Aug 1, 2022
b235579
add first draft of missing reference look-up
seabbs Aug 1, 2022
48acfdc
correct delay structure
seabbs Aug 1, 2022
9a47a13
model fitting
seabbs Aug 1, 2022
f172d8a
debug allocation of missing reference effects
seabbs Aug 1, 2022
c1e8396
model fitting but not recovering simulated proportion
seabbs Aug 1, 2022
28c47bd
update snaps for enw_missing
seabbs Aug 1, 2022
77cee2d
add plot
seabbs Aug 2, 2022
1856b89
make a output processing
seabbs Aug 2, 2022
a9f8e01
use the correct likelihood you tool
seabbs Aug 2, 2022
28a0d58
make example multi-threaded
seabbs Aug 2, 2022
ddee268
use the correct helper function (log1m_exp not log1m)
seabbs Aug 2, 2022
85a4617
add enw_incidence_to_cumulativ and update enw_new_reports to match
seabbs Aug 10, 2022
0223a5c
update nowcast date for missing example and clean up code
seabbs Aug 10, 2022
7bbd646
reset to same date as used in all other examples
seabbs Aug 10, 2022
3da960e
fix merge issues and turn off warning
seabbs Aug 10, 2022
f726334
use a fixed proportion missing
seabbs Aug 10, 2022
d5dbd24
explore example
seabbs Aug 11, 2022
09e5536
add enw_incidence_to_cumulative
seabbs Aug 11, 2022
c009112
add enw_incidence_to_cumulative
seabbs Aug 11, 2022
ac9c2dd
local CRAN check
seabbs Aug 11, 2022
6a7aac5
solve merge conflicts
seabbs Aug 11, 2022
7a00cbe
add internal helper functions for missing reference lookup
seabbs Aug 11, 2022
4aa7a18
add new global variables
seabbs Aug 11, 2022
be5bbe8
update wordlist
seabbs Aug 11, 2022
d9e8f4f
Merge branch 'develop' into feature-missing-reference-function
seabbs Aug 12, 2022
8a09ec7
add missing refeerence model definition
seabbs Aug 12, 2022
2d1964e
Merge branch 'feature-missing-reference-function' of https://github.c…
seabbs Aug 12, 2022
0d45c7b
write tests for enw_reps_with_complete_refs
seabbs Aug 16, 2022
8e7c1de
fix indexing bug with enw_reference_by_report
seabbs Aug 16, 2022
e77c079
merge develop
seabbs Aug 16, 2022
74ad2df
fix issues from #151 causing spurious warnings
seabbs Aug 16, 2022
8029a18
fix failing tests due to sorting standardisation
seabbs Aug 16, 2022
0b64683
debug ordering changes
seabbs Aug 16, 2022
caef167
more test fixes
seabbs Aug 16, 2022
83ef418
fix enw_complete_dates tests
seabbs Aug 17, 2022
bd0a430
add enw_simulate_missing_reference to package
seabbs Aug 17, 2022
b6cc38f
exporte enw_simulate_missing_reference
seabbs Aug 17, 2022
5808134
make cmdstanr tests skip locally
seabbs Aug 17, 2022
6a2ab6d
complete missing model convergence check
seabbs Aug 17, 2022
79154a5
update news and contribnuting
seabbs Aug 17, 2022
38ae0d0
typo in contributing
seabbs Aug 18, 2022
40af056
Change temp variable name
adrian-lison Aug 30, 2022
b890f81
Refactor filt_obs_indexed
adrian-lison Aug 30, 2022
1a7cb2c
Fix filt_obs_indexes
adrian-lison Aug 30, 2022
5231dec
Improve in-model code doc
adrian-lison Aug 31, 2022
a68bf80
Streamline time wording in in-model doc
adrian-lison Aug 31, 2022
3cbe6b4
Refactor variable names in delay_group_lpmf
adrian-lison Sep 1, 2022
6f3ff55
add localisation changes from main
seabbs Sep 1, 2022
da55c32
Merge branch 'develop' into feature-missing-reference-function
seabbs Sep 1, 2022
0e6a1c7
add usage warning for the missing data MVP
seabbs Sep 1, 2022
11d647b
update news
seabbs Sep 1, 2022
2686216
spelling and global variables
seabbs Sep 1, 2022
2ec8058
add handling of group-wise missing reference observations and look-ups
seabbs Sep 1, 2022
c3494e5
use filtered missing reference obs in likelihood
seabbs Sep 1, 2022
aeb7eef
add missing lookup variables
seabbs Sep 1, 2022
8bdf530
update test snapshot
seabbs Sep 1, 2022
add enw_simulate_missing_reference to package
seabbs committed Aug 17, 2022
commit bd0a430730c03925ff451589306d6720d43fa358
57 changes: 57 additions & 0 deletions R/simulate.R
@@ -0,0 +1,57 @@
#' Simulate observations with a missing reference date.
#'
#' A simple binomial simulator of missing data by reference date using simulated
#' or observed data as an input. This function may be used to validate missing
#' data models, as part of examples and case studies, or to explore the
#' implications of missing data for your use case.
#'
#' @param proportion Numeric, the proportion of observations that are missing a
#' reference date, indexed by reference date. Currently only a fixed proportion
#' is supported and this defaults to 0.2.
#'
#' @return A `data.table` of the same format as the input but with a simulated
#' proportion of observations now having a missing reference date.
#'
#' @inheritParams enw_cumulative_to_incidence
#' @family simulate
#' @examples
#' # Load and filter germany hospitalisations
#' nat_germany_hosp <- subset(
#'   germany_covid19_hosp, location == "DE" & age_group %in% "00+"
#' )
#' nat_germany_hosp <- enw_filter_report_dates(
#'   nat_germany_hosp,
#'   latest_date = "2021-08-01"
#' )
#'
#' # Make sure observations are complete
#' nat_germany_hosp <- enw_complete_dates(
#'   nat_germany_hosp,
#'   by = c("location", "age_group"), missing_reference = FALSE
#' )
#'
#' # Simulate
#' enw_simulate_missing_reference(
#'   nat_germany_hosp,
#'   proportion = 0.35, by = c("location", "age_group")
#' )
enw_simulate_missing_reference <- function(obs, proportion = 0.2, by = c()) {
  obs <- enw_cumulative_to_incidence(obs, by = by)

  obs[, missing := purrr::map2_dbl(
    new_confirm, proportion, ~ rbinom(1, .x, .y)
  )]
  obs[, new_confirm := new_confirm - missing]

  complete_ref <- enw_incidence_to_cumulative(obs, by = by)
  complete_ref[, c("new_confirm", "delay", "missing") := NULL]

  missing_ref <- obs[, .(confirm = sum(missing)),
    by = c(by, "report_date")
  ]
  missing_ref[, reference_date := as.IDate(NA)]

  obs <- rbind(complete_ref, missing_ref, use.names = TRUE)
  data.table::setkeyv(obs, c(by, "reference_date", "report_date"))
  return(obs[])
}
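The binomial thinning step described in the roxygen header can be illustrated without the package. The sketch below uses base R only and is not the package implementation: the `toy` data frame, its counts, and the seed are made up for illustration, and the data are assumed to already be incident (not cumulative) counts.

```r
# Hypothetical toy data: incident counts by reference and report date.
set.seed(123)
toy <- data.frame(
  reference_date = as.Date("2021-08-01") + c(0, 0, 1, 1),
  report_date = as.Date("2021-08-01") + c(0, 1, 1, 2),
  new_confirm = c(10L, 4L, 8L, 3L)
)
proportion <- 0.35 # assumed fixed proportion missing a reference date

# Binomial thinning: each incident count gives up a Binomial(n, p) share
# to the pool of observations with a missing reference date.
toy$missing <- rbinom(nrow(toy), size = toy$new_confirm, prob = proportion)
toy$new_confirm <- toy$new_confirm - toy$missing

# Observations without a reference date can only be indexed by report date,
# so the thinned counts are summed within each report date.
missing_ref <- aggregate(missing ~ report_date, data = toy, FUN = sum)
missing_ref$reference_date <- as.Date(NA)
```

Because the thinning only moves counts between the two tables, the total across `toy$new_confirm` and `missing_ref$missing` equals the original total of 25.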
4 changes: 4 additions & 0 deletions _pkgdown.yml
@@ -68,6 +68,10 @@ reference:
   desc: Package datasets used in examples and by users to explore the package functionality.
   contents:
   - has_concept("data")
+- title: Simulate Datasets
+  desc: Tools for simulating datasets
+  contents:
+  - has_concept("simulate")
 - title: Check inputs
   desc: Functions to check the structure of user inputs.
   contents:
31 changes: 4 additions & 27 deletions inst/examples/germany_missing.R
@@ -11,7 +11,7 @@ options(mc.cores = 4)
 nat_germany_hosp <- germany_covid19_hosp[location == "DE"][age_group %in% "00+"]
 nat_germany_hosp <- enw_filter_report_dates(
   nat_germany_hosp,
-  latest_date = "2021-10-01"
+  latest_date = "2021-08-01"
 )

# Make sure observations are complete
@@ -24,29 +24,6 @@ nat_germany_hosp <- enw_complete_dates(
 # Set proportion missing at 35%
 prop_miss <- 0.35

-# Prototypes for simulating missing data - likely to be implemented in 0.2.0
-enw_simulate_missing_reference <- function(obs, proportion = 0.2, by = c()) {
-  obs <- check_dates(obs)
-  obs <- enw_cumulative_to_incidence(obs, by = by)
-
-  obs[, missing := purrr::map2_dbl(
-    new_confirm, proportion, ~ rbinom(1, .x, .y)
-  )]
-  obs[, new_confirm := new_confirm - missing]
-
-  complete_ref <- enw_incidence_to_cumulative(obs, by = by)
-  complete_ref[, c("new_confirm", "delay", "missing") := NULL]
-
-  missing_ref <- obs[, .(confirm = sum(missing)),
-    by = c(by, "report_date")
-  ]
-  missing_ref[, reference_date := as.IDate(NA)]
-
-  obs <- rbind(complete_ref, missing_ref, use.names = TRUE)
-  obs[order(reference_date, report_date)]
-  return(obs[])
-}
-
# Simulate using this function
nat_germany_hosp <- enw_simulate_missing_reference(
nat_germany_hosp,
@@ -67,7 +44,7 @@ retro_nat_germany <- enw_filter_reference_dates(
 latest_obs <- enw_latest_data(nat_germany_hosp)
 latest_obs <- enw_filter_reference_dates(
   latest_obs,
-  remove_days = 40, include_days = 20
+  remove_days = 40, include_days = 60
 )

# Preprocess observations (note this maximum delay is likely too short)
@@ -81,14 +58,14 @@ model <- enw_model(threads = FALSE)
 # dates and produce a nowcast
 # Note that we have reduced samples for this example to reduce runtimes
 nowcast <- epinowcast(pobs,
-  missing = enw_missing(~week, data = pobs),
+  missing = enw_missing(~ (1 | week), data = pobs),
   report = enw_report(~ (1 | day_of_week), data = pobs),
   fit = enw_fit_opts(
     save_warmup = FALSE, pp = TRUE,
     chains = 4, iter_warmup = 500, iter_sampling = 500,
     likelihood_aggregation = "groups", adapt_delta = 0.9
   ),
-  obs = enw_obs(family = "negbin", data = pobs),
+  obs = enw_obs(family = "poisson", data = pobs),
   model = model
 )

56 changes: 56 additions & 0 deletions man/enw_simulate_missing_reference.Rd

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions man/roxygen/meta.R
@@ -11,6 +11,7 @@ list( # nolint
   nowcast = "Functions used for nowcasting", # nolint
   generatedata = "Functions to generate simulated data",
   scenarios = "Functions to define and create scenarios",
+  simulate = "Tools for data simulation",
   data = "Package data sets",
   check = "Functions used for checking inputs",
   utils = "Utility functions"