PLoS One. 2020 May 29;15(5):e0233686.
doi: 10.1371/journal.pone.0233686. eCollection 2020.

A framework to build similarity-based cohorts for personalized treatment advice - a standardized, but flexible workflow with the R package SimBaCo


Lucas Wirbka et al. PLoS One. 2020.

Abstract

Along with the growing number of big data sources and increasing computing performance, real-world evidence from such sources is likewise gaining importance. While this mostly applies to population-averaged results from analyses based on all available data, it is also possible to conduct so-called personalized analyses based on a data subset whose observations resemble a particular patient for whom a decision is to be made. Claims data from statutory health insurance companies could provide the information necessary for such personalized analyses. Deriving treatment recommendations from them for a particular patient in everyday care would require an automated, reproducible, and efficiently programmed workflow. We introduce the R package SimBaCo (Similarity-Based Cohort generation), which offers a simple, modular, and intuitive framework for this task. With its six built-in R functions, this framework allows the user to create similarity cohorts tailored to the characteristics of particular patients. An exemplary workflow illustrates the distinct steps, beginning with an initial cohort selection according to inclusion and exclusion criteria. A plotting function facilitates investigating a particular patient's characteristics relative to their distribution in a reference cohort, for example the initial cohort, or the precision cohort obtained after the data have been trimmed according to the variables chosen for similarity finding. Such precision cohorts allow any form of personalized analysis, for example personalized analyses of comparative effectiveness or customized prediction models developed from precision cohorts. In our exemplary workflow, we provide such a treatment comparison, on which a treatment decision for a particular patient could be based. This is only one field of application where personalized results can directly support the process of clinical reasoning by leveraging information from individual patient data. With this modular package at hand, personalized studies can efficiently weigh the benefits and risks of treatment options for particular patients.
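The abstract names two core steps: selecting an initial cohort by inclusion and exclusion criteria, then trimming it to a precision cohort of records that resemble one patient on chosen variables. The Python sketch below illustrates that idea only; it is not SimBaCo's R API. The similarity rule (exact matching on categorical variables, an absolute tolerance window on continuous ones), the helper names `initial_cohort` and `precision_cohort`, and the toy records are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical record type: one dict per patient (toy data, not claims data).
Record = Dict[str, Any]
Criterion = Callable[[Record], bool]


def initial_cohort(records: Sequence[Record],
                   include: Sequence[Criterion],
                   exclude: Sequence[Criterion]) -> List[Record]:
    """Keep records that meet every inclusion and no exclusion criterion."""
    return [r for r in records
            if all(c(r) for c in include) and not any(c(r) for c in exclude)]


def precision_cohort(cohort: Sequence[Record], patient: Record,
                     exact_vars: Sequence[str],
                     tolerances: Dict[str, float]) -> List[Record]:
    """Trim a cohort to records resembling `patient`.

    Assumed similarity rule (illustrative only): exact match on every
    variable in `exact_vars`, and |record - patient| <= tolerance for
    every continuous variable listed in `tolerances`.
    """
    def is_similar(r: Record) -> bool:
        if any(r[v] != patient[v] for v in exact_vars):
            return False
        return all(abs(r[v] - patient[v]) <= tol
                   for v, tol in tolerances.items())
    return [r for r in cohort if is_similar(r)]


# Toy records, invented for illustration only.
records = [
    {"id": 1, "age": 72, "sex": "f", "dialysis": False},
    {"id": 2, "age": 80, "sex": "f", "dialysis": False},
    {"id": 3, "age": 70, "sex": "m", "dialysis": False},
    {"id": 4, "age": 74, "sex": "f", "dialysis": True},
    {"id": 5, "age": 68, "sex": "f", "dialysis": False},
]
base = initial_cohort(records,
                      include=[lambda r: r["age"] >= 65],
                      exclude=[lambda r: r["dialysis"]])
patient = {"age": 73, "sex": "f"}
similar = precision_cohort(base, patient, exact_vars=["sex"],
                           tolerances={"age": 5})
print([r["id"] for r in base])     # [1, 2, 3, 5]
print([r["id"] for r in similar])  # [1, 5]
```

Fixed tolerance windows keep the trimming rule transparent and easy to audit; a distance-based nearest-neighbour search would be an equally plausible alternative, since the abstract does not specify SimBaCo's exact similarity criterion.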


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Scale charts of Patient 1 (A) and Patient 2 (B) in the initial, unselected cohort. The scale chart compares a patient’s characteristics (red vertical line) to the distribution of characteristics in the reference cohort (blue vertical bars). Each bar represents ranges derived from medians and interquartile ranges for continuous variables, and relative frequency for categorical variables. Different shades of blue indicate different categories or ranges above and below the median, respectively.
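The Fig 1 caption describes how each scale-chart bar is built: from the median and interquartile range for continuous variables, and from relative frequencies for categorical ones. A minimal Python sketch of that bookkeeping follows, locating a patient's value among the reference cohort's quartiles. It is an illustration, not SimBaCo's plotting function; the helper names, the band labels, and the quantile convention (`method="inclusive"`, which corresponds to R's default `quantile()` type 7) are assumptions.

```python
import statistics
from collections import Counter
from typing import Hashable, Sequence, Tuple


def continuous_band(reference: Sequence[float],
                    value: float) -> Tuple[Tuple[float, float, float], str]:
    """Return the reference quartiles (Q1, median, Q3) and the band that
    the patient's `value` falls into relative to them."""
    q1, med, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    if value < q1:
        band = "below Q1"
    elif value < med:
        band = "Q1 to median"
    elif value <= q3:
        band = "median to Q3"
    else:
        band = "above Q3"
    return (q1, med, q3), band


def category_share(reference: Sequence[Hashable], value: Hashable) -> float:
    """Relative frequency of the patient's category in the reference cohort."""
    return Counter(reference)[value] / len(reference)


# Toy reference cohort, invented for illustration only.
ages = [60, 62, 65, 68, 70, 72, 75, 78, 81]
print(continuous_band(ages, 73))                  # ((65.0, 70.0, 75.0), 'median to Q3')
print(category_share(["f", "m", "f", "f"], "f"))  # 0.75
```

Running the same helpers against the initial cohort and then against a precision cohort shows how a patient's position shifts once the reference population has been trimmed, which is the contrast between Fig 1 and Fig 2.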
Fig 2
Scale charts of Patient 1 (A) and Patient 2 (B) from the respective personalized precision cohorts after the similarity search.
Fig 3
Kaplan-Meier plots comparing overall survival between two treatments (ATC codes B01AE07 and B01AF01) in the personalized precision cohorts trimmed for Patient 1 (A) and Patient 2 (B).
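Fig 3 compares overall survival between two treatments with Kaplan-Meier curves. The product-limit estimator behind such curves sets S(t) to the product, over distinct event times t_i <= t, of (1 - d_i/n_i), where d_i is the number of deaths at t_i and n_i the number of patients still at risk just before t_i. The self-contained Python sketch below computes that estimate per treatment arm; the function name and the toy data are hypothetical, and it is not the code used to produce Fig 3. In practice an established survival-analysis library would normally be used.

```python
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (time, survival) at each distinct
    event time. events[i] is 1 for death, 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations sharing time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at t leaves the risk set afterwards.
        at_risk -= removed
    return curve


# Toy follow-up data per treatment arm (invented, not study results).
arms: Dict[str, Tuple[List[float], List[int]]] = {
    "A": ([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]),
    "B": ([2, 3, 4, 6, 7, 8], [0, 1, 0, 0, 1, 0]),
}
for arm, (t, e) in arms.items():
    print(arm, [(time, round(s, 3)) for time, s in kaplan_meier(t, e)])
```

Censored patients never lower the curve themselves; they only shrink the risk set for later event times, which is why arm B's single censoring at t = 2 changes the denominator at t = 3.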




Grants and funding

This work was supported by the German Innovation Funds according to § 92a (2) Volume V of the Social Insurance Code (§ 92a Abs. 2, SGB V - Fünftes Buch Sozialgesetzbuch), grant number: 01VSF18019. URL: https://innovationsfonds.g-ba.de/

Andreas D. Meid is funded by the Physician-Scientist Programme of Heidelberg University, Faculty of Medicine. URL: http://www.medizinische-fakultaet-hd.uni-heidelberg.de/Physician-Scientist-Programm.111367.0.html

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.