sars2pack

Overview

The sars2pack R package provides one-line access to over 40 COVID-related datasets. Datasets are accessed in real time directly from their sources and then transformed to tidy-data form where possible and applicable. The result of each dataset accessor is a ready-to-use R dataset, often a dataframe. Documentation includes dataset descriptions, sources and references, and examples. Online documentation is available in two locations:

The sars2pack documentation, which includes reference docs and detailed dataset descriptions.
Extended workflows and use cases, presented as an online book.

Questions addressed by sars2pack

What are the current and historical total, new cases, and deaths of COVID-19 at the city, county, state, national, and international levels?
How do changes in infection rates differ across locations?
What are the non-pharmacological interventions in place at the local and national levels?
In the United States, what is the geographical distribution of healthcare capacity (ICU beds, total beds, doctors, etc.)?
What are the published values of key epidemic parameters, as curated from the literature?

Installation

# If you do not have BiocManager installed:
install.packages('BiocManager')

# Then, if sars2pack is not already installed:
BiocManager::install('seandavi/sars2pack')
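The two comments above describe conditional steps. As a minimal sketch, the same logic can be written so that the script is safe to re-run. It relies only on base R's requireNamespace(); the helper names is_missing and install_sars2pack_if_missing are hypothetical and are not part of sars2pack or BiocManager.

```r
# Hypothetical helper (not part of sars2pack or BiocManager):
# TRUE when `pkg` cannot be found in the current library paths.
is_missing <- function(pkg) {
    !requireNamespace(pkg, quietly = TRUE)
}

# Wrap the two steps so the script can be re-run without reinstalling.
install_sars2pack_if_missing <- function() {
    if (is_missing("BiocManager")) {
        install.packages("BiocManager")
    }
    if (is_missing("sars2pack")) {
        BiocManager::install("seandavi/sars2pack")
    }
    invisible(!is_missing("sars2pack"))
}

# Requires network access, so the call is left commented out:
# install_sars2pack_if_missing()
```

Uncomment the last line to run the installation; on a machine where both packages are already present, the call does nothing.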

After the one-time installation, load the package to get started.

library(sars2pack)

Available datasets

| name | accessor | data_type | geographical | geospatial | region | resolution | url |
|---|---|---|---|---|---|---|---|
| United States county-level geographic details | us_county_geo_details | c("demographics", "geographic") | TRUE | TRUE | United States | admin2 | [LINK](https://github.com/josh-byster/fips_lat_long) |
| OECD International Unemployment Data | oecd_unemployment_data | c("economics", "time series") | TRUE | FALSE | World | admin0 | [LINK](https://oecd.org) |
| healthdata.org COVID-19 Mobility Observations and Projections | healthdata_mobility_data | c("mobility", "time series", "projections") | TRUE | FALSE | International | c("admin0", "admin1") | [LINK](https://covid19.healthdata.org/projections) |
| healthdata.org COVID-19 Testing Observations and Projections | healthdata_testing_data | c("testing", "time series", "projections") | TRUE | FALSE | International | c("admin0", "admin1") | [LINK](https://covid19.healthdata.org/projections) |
| Our World In Data testing and cases reporting | owid_data | c("time series", "cases", "deaths", "testing") | TRUE | FALSE | World | admin0 | [LINK](https://ourworldindata.org/coronavirus) |
| CovidTracker data | covidtracker_data | c("time series", "cases", "deaths", "testing") | TRUE | FALSE | United States | admin1 | [LINK](https://covidtracking.com/) |
| European CDC world tracking | ecdc_data | c("time series", "cases", "deaths") | TRUE | FALSE | World | admin0 | [LINK](https://www.ecdc.europa.eu/en/covid-19) |
| EU data Github aggregator | eu_data_cache_data | c("time series", "cases", "deaths") | TRUE | FALSE | Europe | c("admin0", "admin1") | [LINK](https://github.com/covid19-eu-zh/covid19-eu-data) |
| USA Facts | usa_facts_data | c("time series", "cases", "deaths") | TRUE | FALSE | United States | admin1 | [LINK](https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/) |
| Johns Hopkins dataset | jhu_data | c("time series", "cases", "deaths") | TRUE | FALSE | World | admin0 | [LINK](https://github.com/CSSEGISandData/COVID-19) |
| Johns Hopkins US-centric data | jhu_us_data | c("time series", "cases", "deaths") | TRUE | FALSE | United States | c("admin1", "admin2") | [LINK](https://github.com/CSSEGISandData/COVID-19) |
| New York Times county level data | nytimes_county_data | c("time series", "cases", "deaths") | TRUE | FALSE | United States | admin2 | [LINK](https://raw.githubusercontent.com/nytimes/covid-19-data) |
| New York Times state level data | nytimes_state_data | c("time series", "cases", "deaths") | TRUE | FALSE | United States | admin1 | [LINK](https://raw.githubusercontent.com/nytimes/covid-19-data) |
| The Economist: Excess deaths during COVID pandemic | economist_excess_deaths | c("time series", "deaths", "excess deaths") | TRUE | FALSE | International | c("admin0", "admin1") | [LINK](https://github.com/TheEconomist/covid-19-excess-deaths-tracker) |
| The Financial Times: Excess deaths during COVID pandemic | financial_times_excess_deaths | c("time series", "deaths", "excess deaths") | TRUE | FALSE | International | c("admin0", "admin1") | [LINK](https://github.com/Financial-Times/coronavirus-excess-mortality-data) |
| US CDC excess deaths dataset | cdc_excess_deaths | c("time series", "deaths", "excess deaths") | TRUE | FALSE | United States | admin1 | [LINK](https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.html) |
| Descartes Labs Mobility Data | descartes_mobility_data | c("time series", "mobility") | TRUE | FALSE | United States | admin1 | [LINK](https://raw.githubusercontent.com/descarteslabs/DL-COVID-19) |
| Apple mobility data from maps | apple_mobility_data | c("time series", "mobility") | TRUE | FALSE | World | c("admin0", "admin1", "admin2", "admin3") | [LINK](https://www.apple.com/covid19/mobility) |
| Healthdata.org projections of hospital utilization and deaths | healthdata_projections_data | c("time series", "projections", "cases", "deaths") | TRUE | FALSE | c("United States", "World") | c("admin1", "admin2") | [LINK](http://www.healthdata.org/covid) |
| Healthdata.org mobility data | healthdata_mobility_data | c("time series", "projections", "mobility") | TRUE | FALSE | c("United States", "World") | c("admin1", "admin2") | [LINK](http://www.healthdata.org/covid) |
| United States CDC Social Vulnerability Index | cdc_social_vulnerability_index | demographics | TRUE | FALSE | United States | admin2 | [LINK](https://svi.cdc.gov/) |
| US county health rankings from ‘’ | us_county_health_rankings | demographics | TRUE | FALSE | United States | c("admin0", "admin1", "admin2") | [LINK](https://www.countyhealthrankings.org) |
| Country metadata from restcountries.eu | country_metadata | demographics | TRUE | FALSE | World | admin0 | [LINK](https://restcountries.eu) |
| Extensive United States hospital capabilities | us_hospital_details | healthcare capacity | TRUE | TRUE | United States | individual hospital | [LINK](https://hub.arcgis.com/datasets/geoplatform::hospitals) |
| Kaiser Family Foundation ICU bed data | kff_icu_beds | healthcare capacity | TRUE | TRUE | United States | Individual hospital | [LINK](https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds) |
| CovidCare United States Healthcare Capacity | us_healthcare_capacity | healthcare capacity | TRUE | TRUE | United States | Individual hospital | [LINK](https://github.com/covidcaremap/covid19-healthsystemcapacity) |
| GISAID metadata from thousands of SARS-CoV-2 sequences | cov_glue_lineage_data | line list | TRUE | FALSE | World | multiple | [LINK](https://github.com/hCoV-2019/lineages) |
| beoutbreakprepared | beoutbreakprepared_data | line list | TRUE | FALSE | World | patient | [LINK](https://github.com/beoutbreakprepared/nCoV2019) |
| Published epidemic parameters for COVID-19 | param_estimates_published | miscellaneous | FALSE | FALSE | list() | list() | [LINK](https://github.com/midas-network/COVID-19/blob/master/parameter_estimates/2019_novel_coronavirus/estimates.csv) |
| Google mobility data | google_mobility_data | mobility | TRUE | FALSE | World | c("admin0", "admin1", "admin2") | [LINK](https://www.google.com/covid19/mobility/) |
| Newick tree from thousands of SARS-CoV-2 sequences | cov_glue_newick_data | phylogenetic | FALSE | FALSE | World | multiple | [LINK](https://github.com/hCoV-2019/lineages) |
| Aggregated projections from US CDC | cdc_aggregated_projections | projections | TRUE | FALSE | list() | c("admin0", "admin1") | [LINK](https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html) |
| CoronaNet government response database | coronanet_government_response_data | public policy | TRUE | FALSE | World | c("admin0", "admin1") | [LINK](https://coronanet-project.org/index.html) |
| Oxford Government Policy Intervention time series | government_policy_timeline | public policy | TRUE | FALSE | World | admin0 | [LINK](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker) |
| United States social distancing policies | us_state_distancing_policy | public policy | TRUE | FALSE | United States | admin1 | [LINK](https://github.com/COVID19StatePolicy/SocialDistancing/) |

Case tracking

Up-to-date tracking of confirmed cases, deaths, and testing at the city, county, state, national, and international levels is critical to driving policy, implementing interventions, and measuring their effectiveness. Case-tracking datasets include a date, a count, and usually several other fields, such as the reporting location.

Case-tracking datasets are typically accessed with one function per dataset. The example here uses data from the European Centre for Disease Prevention and Control (ECDC).

ecdc = ecdc_data()

Get a quick overview of the dataset.

head(ecdc)

## # A tibble: 6 x 8
## # Groups:   location_name, subset [6]
##   date       location_name iso2c iso3c population_2019 continent subset    count
##   <date>     <chr>         <chr> <chr>           <dbl> <chr>     <chr>     <dbl>
## 1 2019-12-31 Afghanistan   AF    AFG          38041757 Asia      confirmed     0
## 2 2019-12-31 Afghanistan   AF    AFG          38041757 Asia      deaths        0
## 3 2019-12-31 Algeria       DZ    DZA          43053054 Africa    confirmed     0
## 4 2019-12-31 Algeria       DZ    DZA          43053054 Africa    deaths        0
## 5 2019-12-31 Armenia       AM    ARM           2957728 Europe    confirmed     0
## 6 2019-12-31 Armenia       AM    ARM           2957728 Europe    deaths        0
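The output above is in long form: one row per date, location, and subset. As a sketch of working with that layout, the following derives daily new counts from the count column. It runs on a small hand-made tibble that reuses some of the column names; the values are invented for illustration and are not real data. It also assumes that count is cumulative, which is how the later examples in this section treat it.

```r
library(dplyr)

# Toy stand-in for the result of ecdc_data(); values are invented.
toy <- tibble::tibble(
    date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                     "2020-03-01", "2020-03-02", "2020-03-03")),
    location_name = rep(c("A", "B"), each = 3),
    subset = "deaths",
    count = c(0, 2, 5, 1, 1, 4)
)

daily <- toy %>%
    group_by(location_name, subset) %>%
    arrange(date, .by_group = TRUE) %>%
    # Difference of consecutive cumulative values; the first day has no
    # predecessor, so its cumulative value is used as-is.
    mutate(new_count = count - lag(count, default = 0)) %>%
    ungroup()
```

Grouping before calling lag() matters here: without it, the first row of location B would be differenced against the last row of location A.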

The ecdc dataset is just a data.frame (actually, a tibble), so standard R or tidyverse functions can answer basic questions with little code. The next code block builds top10, a table of the countries with the most deaths recorded to date. Note that if you run this on your own computer, the results will reflect the data available on that day.

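library(dplyr)
# Keep deaths only, find each country's peak cumulative count, and rank.
top10 = ecdc %>% filter(subset=='deaths') %>% 
    group_by(location_name) %>%
    # Rows tied for a country's maximum are all kept.
    filter(count==max(count)) %>%
    arrange(desc(count)) %>%
    # First ten rows overall, then drop columns not needed for the table.
    head(10) %>% select(-starts_with('iso'),-continent,-subset) %>%
    mutate(rate_per_100k = 1e5*count/population_2019)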

Finally, present a nice table of those countries:

knitr::kable(
    top10,
    caption = "Reported COVID-19-related deaths in ten most affected countries.",
    format = 'pandoc')

Reported COVID-19-related deaths in ten most affected countries.

| date | location_name | population_2019 | count | rate_per_100k |
|---|---|---|---|---|
| 2020-07-06 | United_States_of_America | 329064917 | 129947 | 39.489776 |
| 2020-07-06 | Brazil | 211049519 | 64867 | 30.735441 |
| 2020-07-06 | United_Kingdom | 66647112 | 44220 | 66.349462 |
| 2020-07-06 | Italy | 60359546 | 34861 | 57.755570 |
| 2020-07-06 | Mexico | 127575529 | 30639 | 24.016361 |
| 2020-07-04 | France | 67012883 | 29893 | 44.607841 |
| 2020-07-05 | France | 67012883 | 29893 | 44.607841 |
| 2020-07-06 | France | 67012883 | 29893 | 44.607841 |
| 2020-05-24 | Spain | 46937060 | 28752 | 61.256500 |
| 2020-07-06 | India | 1366417756 | 19693 | 1.441214 |
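France appears three times in the table above because filter(count==max(count)) keeps every row tied for the group maximum, so the ten rows cover only eight countries. One possible way to keep a single row per country is to pick the most recent of the tied dates with slice_max(), available in dplyr 1.0.0 and later. The sketch below uses a toy tibble: the France rows and the final Italy row are copied from the table above, and the earlier Italy row is invented for illustration.

```r
library(dplyr)

toy <- tibble::tibble(
    date = as.Date(c("2020-07-04", "2020-07-05", "2020-07-06",
                     "2020-07-05", "2020-07-06")),
    location_name = c("France", "France", "France", "Italy", "Italy"),
    # The 34800 value is invented; the others come from the table above.
    count = c(29893, 29893, 29893, 34800, 34861)
)

one_row_each <- toy %>%
    group_by(location_name) %>%
    filter(count == max(count)) %>%
    # Among tied rows, keep only the latest date.
    slice_max(date, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    arrange(desc(count))
```

With one row per country, a subsequent head(10) would return ten distinct countries rather than repeated entries.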

Examine the spread of the pandemic across the world by plotting cumulative deaths reported for the top 10 countries above.

ecdc_top10 = ecdc %>% filter(location_name %in% top10$location_name & subset=='deaths')
plot_epicurve(ecdc_top10,
              filter_expression = count > 10, 
              color='location_name')

Comparing the features of disease spread is easiest if all curves are shifted to “start” at the same absolute level. In this case, shift the origin for all countries to the first time point at which more than 100 cumulative deaths had been reported (the data here are the deaths subset). Note how some curves cross others, which suggests less infection control at the same relative time in the pandemic for that country (e.g., Brazil).

ecdc_top10 %>% align_to_baseline(count>100,group_vars=c('location_name')) %>%
    plot_epicurve(date_column = 'index',color='location_name')
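To make the idea of shifting curves to a common origin concrete, here is a rough dplyr sketch of the same kind of alignment. It is not the implementation of align_to_baseline(), whose internals are not shown here. The definition of index used below (days since the first date on which the threshold is exceeded) is an assumption, and the data are invented.

```r
library(dplyr)

toy <- tibble::tibble(
    date = as.Date("2020-03-01") + rep(0:3, times = 2),
    location_name = rep(c("A", "B"), each = 4),
    count = c(50, 120, 200, 350,   # A first exceeds 100 on 2020-03-02
              10, 40, 90, 150)     # B first exceeds 100 on 2020-03-04
)

aligned <- toy %>%
    group_by(location_name) %>%
    # Keep only observations above the baseline threshold.
    filter(count > 100) %>%
    # Days elapsed since each location first exceeded the threshold.
    mutate(index = as.integer(date - min(date))) %>%
    ungroup()
```

After this step both locations start at index 0, so their curves can be compared by time since crossing the threshold rather than by calendar date.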
