To the Editor — The levels-of-evidence hierarchy stratifies the quality of medical research for evidence-based medicine (EBM) therapeutic decisions (Fig. 1)1. Physicians and policy makers are encouraged to find the highest level of evidence to solve clinical questions. An important issue that remains poorly addressed is how to balance levels of evidence and urgency of medical need. This issue is crucial to the discovery of treatments for terminal cancers, as well as other lethal diseases, but has been most recently highlighted by the global turmoil caused by the COVID-19 pandemic and the search for countermeasures2. Here, we provide our perspective on updated established levels of evidence3,4 and discuss the debate regarding a properly balanced strategy for identifying and accessing new treatments for medical emergencies and disease outbreaks.

Fig. 1: Levels of evidence by time period.

a, Canadian Task Force on the Periodic Health Examination’s Levels of Evidence (1979)1. b, Levels of evidence from Sackett (1989)5. c, OCEBM Levels of Evidence Working Group (March 2009); levels of evidence shown were for therapy4. d, OCEBM Levels of Evidence Working Group (2011); levels of evidence shown were for therapy3. Footnote a: systematic review (SR) with homogeneity means one that is free of concerning variations in the directions and degrees of the results. Footnote b: individual RCTs should have a narrow confidence interval. Footnote c: all or none means that all patients died before treatment became available but some now survive on it, or some patients died before the treatment became available but none now die on it. Footnote d: poor quality means a study that failed to clearly define comparison groups and/or measure exposures and outcomes in the same (preferably blinded) objective way in controls and cases, and/or failed to identify and control for confounders. Footnote e: a first principle is a basic assumption that cannot be deduced from any other proposition or assumption. Footnote f: an n-of-1 trial was defined as a type of randomized controlled trial in which a sequence of alternative therapies is randomly given to a patient; the outcomes of regimens are compared, with the aim of deciding on the optimum treatment for the patient. The OCEBM also clarified that the intended interpretation of the first tier was not either systematic reviews of randomized trials or systematic reviews of n-of-1 trials, but rather was “either n-of-1 randomized trials or systematic reviews of randomized trials”3.

The gold standard

Because they attenuate bias by allocating people to arms on the basis of chance alone, randomized controlled trials (RCTs) are always assigned the top rung on the levels-of-evidence ladder1,3,4,5. However, not all RCTs are conducted properly, and their conclusions should therefore be scrutinized carefully in light of each study’s statistical power and its susceptibility to different types of error. Even so, RCTs rightly remain a gold standard for EBM.

Notably, RCTs can also take years to perform. Although several RCTs of treatments for COVID-19 were published in 2020, such trials generally take time6,7,8,9,10. Serious diseases, however, require more urgent answers, and we may need to weigh the life years lost while waiting for conclusions from RCTs against the possible deaths from therapies adopted on the basis of other types of evidence, should those therapies turn out to be ineffective with longer follow-up. In this regard, a survey of all 31 anticancer drugs approved in the United States over 34 years on the basis of surrogate endpoints for survival (response rate or progression-free survival), without an RCT, demonstrated that these agents fared well, with all showing long-term safety and efficacy11.

Importantly, in the time since the original levels-of-evidence hierarchy was created in 1979 and expanded in 1989 (Fig. 1a,b)1,5, there has been an evolution in thinking about levels of evidence, as depicted in the updated hierarchies from the Oxford Centre for Evidence-Based Medicine (OCEBM) (Fig. 1c,d)3,4. Indeed, in these modernized guidelines, some types of observational studies with striking effects now occupy the first or second tier of the levels-of-evidence pyramid.

In recent years, digital technology has also advanced remarkably. Computerized medical records now give us access to dense data on millions of patients, with the possibility of using real-world observations for drug discovery and even approvals. Indeed, the US Food and Drug Administration (FDA) has already set a precedent by approving two drugs for advanced cancers on the basis of high response rates identified in part or in whole through retrospective data mining or real-world data12,13. These considerations are important because of the urgency of certain diseases and of crises such as the present pandemic.

Moving to next-generation evidence

Levels of evidence were increasingly adopted after the publications on this topic in 1979 and the updates in 1989 (Fig. 1a,b)1,5. However, these first-generation levels of evidence have since been extensively reappraised by expert committees3,4. Notably, the OCEBM hierarchies for therapeutic studies (Fig. 1c,d)3,4 changed the equation for levels of evidence. In 2009, all-or-none studies were moved to the top tier (all-or-none meaning that all patients died before the treatment existed but some now survive, or that some patients died before the treatment existed but none now die). In 2011, observational studies with dramatic results were raised to the second tier, thereby upgrading “dramatic results from uncontrolled studies such as introduction of penicillin” from a very low tier in the 1979 levels-of-evidence pyramid (Fig. 1a)1. Effectively, these changes established the significance of certain types of high-impact observational studies and recognized a paradigm shift in trial design14 driven by a variety of rapidly emerging platform technologies, biological tools and digital technologies, while keeping RCTs at the top of the hierarchy.

The earliest levels-of-evidence hierarchies were conceived more than 40 years ago1, before the advent of modern computer technology. To put this into context, there was no Internet or Google; computers were the size of a large room, used paper punch cards and had only 1 kilobyte of memory. In contrast, a contemporary iPhone may have 4 gigabytes (4 million kilobytes) of random-access memory and 512 gigabytes of storage. Today’s most powerful computers have 160 terabytes of random-access memory (160 billion kilobytes), which was unimaginable in the era when levels of evidence were first developed. This type of digital power provides fertile soil for the growth of new and powerful types of evidence15.

Structured trials have undergone major changes with the advent of adaptive designs that exploit advanced statistical methodology to optimize understanding of response and toxicity with many fewer patients than in classic RCTs. Modernized evidence now also includes that derived from mining clinical trial databases or real-world electronic medical or insurance records, as well as data from downloadable apps (including rapidly developed direct-to-patient apps to collect self-reported information on COVID-19) processed via machine learning (https://covid.joinzoe.com/us; Fig. 2)14.

Fig. 2: Emerging technologies fuel new types of trials and evidence.

Advances in powerful genomic sequencing, digital technologies and machine learning have enabled novel trial design. Adaptive trials refer to studies in which data collection and analysis are ongoing throughout the life of the trial, and the number of patients in each arm or other characteristics of the arms are adapted in real time on the basis of that data. n-of-one individualized trial design in this context refers to trials in which each patient receives a different treatment on the basis of that patient’s characteristics; the success of the trial is judged by the effectiveness of the strategy to determine treatments, rather than on the effectiveness of any one type of treatment. Real-world data can now be collected from millions of computerized electronic medical or insurance or other similar records and analyzed. A master observation protocol also collects real-world data, but the data may follow a certain preconceived structure for consistency.

Adaptive designs augment clinical trial flexibility by continuously reassessing results accumulating in the trial to modify the trial’s course in accordance with prespecified guidelines16. Adaptive designs for exploratory clinical trials deal mostly with dose–response modeling and/or with identifying safe and effective doses. In confirmatory trials, the adaptive lexicon encompasses telescoped or seamless phase 1–3 designs, trials with ongoing sample size re-estimation, biomarker-driven adaptive population enrichment studies (allocating a larger proportion of the participants to treatment groups that are performing well and hence minimizing the number of participants in treatment groups that are doing poorly), and adaptive group sequential design (which permits alteration of sample size and/or endpoints during the course of the trial). Adaptive trials can often allow accurate conclusions to be drawn quickly and with much smaller numbers of patients than are needed for standard RCTs, which is particularly important in the case of COVID-19.
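The idea of allocating a larger proportion of participants to arms that are performing well can be illustrated with a small simulation. The sketch below uses Thompson sampling with Beta posteriors, which is one common way to implement response-adaptive allocation and is chosen here purely for illustration; it is not the method of any trial cited in this letter. The function name, the two response rates and the patient count are hypothetical.

```python
import random

def simulate_adaptive_allocation(true_response_rates, n_patients, seed=0):
    """Simulate response-adaptive randomization with Thompson sampling.

    true_response_rates: hypothetical per-arm response probabilities
    (unknown in a real trial; used here only to generate outcomes).
    Returns the number of patients allocated to each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_response_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    allocated = [0] * n_arms

    for _ in range(n_patients):
        # Draw a plausible response rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then assign the next patient to the arm
        # whose draw is highest.
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: draws[i])
        allocated[arm] += 1

        # Observe the simulated binary outcome and update that arm's counts.
        if rng.random() < true_response_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return allocated

# Hypothetical example: arm 0 responds 20% of the time, arm 1 60% of the time.
counts = simulate_adaptive_allocation([0.2, 0.6], n_patients=200)
print(counts)
```

As outcomes accumulate, most patients end up on the better-performing arm. A real adaptive trial would prespecify its adaptation rules, as noted above, and would typically add safeguards such as a fixed initial allocation period, which this sketch omits.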

Large-scale, rapid evaluation of real-world data has also become a reality, leading to regulatory approvals. For example, the anti-programmed cell death 1 (PD-1) human IgG4 monoclonal antibody Keytruda (pembrolizumab) received FDA approval on the basis, in part, of a retrospective, pooled analysis and data mining of five single-arm trials in various tumor types (objective response rate of ~40%)12. Another example of using real-world data and digital technology for regulatory purposes is the FDA approval of Ibrance (palbociclib), a small-molecule inhibitor of cyclin-dependent kinases 4 and 6 (CDK4 and CDK6), for men with breast cancer13. The data included information on 2,675 patients that was collected over six years, including analysis from the PALOMA-2 and PALOMA-3 clinical studies, insurance claims, and electronic health records. This was the first oncology approval, to our knowledge, to have been derived largely or in whole from real-world data without a trial13. Going forward, the question is whether, in health emergencies like COVID-19, rapid collection and analysis of massive amounts of data can find clinically meaningful benefits without the lengthy process required for a prospective trial.

Recent years have seen the emergence of yet another type of data collection: the master observational trial; for example, the local PREDICT trial at the University of California17, the IMPACT trial at MD Anderson Cancer Center18 or the national Master Registry of Oncology Outcomes Associated With Testing and Treatment (ROOT) trial (NCT04028479). The ROOT study, as an example, plans to prospectively follow patients for data collection and allows analysis of biological as well as clinical information14,19. The ROOT master observational trial differs from real-world data collection in that the former prospectively structures the data, whereas real-world data collection involves the downloading of information from medical records or other databases.

A related development is the use of smartphone apps for self-reporting by patients in the community. This has been exploited for COVID-19, with the launch in March 2020 of a free downloadable smartphone app for symptom tracking (https://covid.joinzoe.com/us-2) developed by Zoe Global in collaboration with the Massachusetts General Hospital, King’s College London and the University of Nottingham. In a few weeks (from launch until 21 April 2020), an astounding 2,618,862 people (including 2,450,569 from the United Kingdom and 168,293 from the United States) used the app to report COVID-19-relevant symptoms. The app gathers data and tracks, in real time, how the disease progresses by recording self-reported health information on a daily basis: demographics, symptoms, hospitalization, test outcomes and pre-existing medical conditions. The results showed that, among 18,401 individuals who had undergone a SARS-CoV-2 test, the proportion of participants who reported loss of smell and taste was higher in those with a positive test result (4,668 of 7,178 individuals; 65.03%) than in those with a negative test result (2,436 of 11,223 participants; 21.71%) (odds ratio = 6.74; 95% confidence interval = 6.31–7.21)20. A model built on these self-reported data was able to predict likely COVID-19 infection without patients having to be tested. Using machine learning, the mobile application will also offer data on geographical hot spots, risk factors, harbinger symptoms and clinical outcomes. It represents a proof of concept for exploiting digital approaches to scale epidemiologic data collection at a remarkable pace21.
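The association between loss of smell and taste and test result can be checked directly from the counts quoted above. The sketch below computes the crude (unadjusted) odds ratio from the 2×2 table with a Wald 95% confidence interval on the log scale; the function name is hypothetical, and the choice of a crude estimate with a Wald interval is an assumption for illustration, not a description of how the cited study derived its figures.

```python
import math

def crude_odds_ratio(symptom_pos, total_pos, symptom_neg, total_neg, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    symptom_pos / total_pos: symptom reporters and total among test-positives.
    symptom_neg / total_neg: symptom reporters and total among test-negatives.
    """
    a = symptom_pos                 # positive test, symptom reported
    b = total_pos - symptom_pos     # positive test, symptom not reported
    c = symptom_neg                 # negative test, symptom reported
    d = total_neg - symptom_neg     # negative test, symptom not reported

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts as quoted in the text: 4,668 of 7,178 and 2,436 of 11,223.
or_value, ci_low, ci_high = crude_odds_ratio(4668, 7178, 2436, 11223)
print(f"crude OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The crude estimate comes out at about 6.7, close to but not identical with the reported 6.74 (6.31–7.21). The small difference presumably reflects the analysis model used in the cited study (for example, covariate adjustment), which is not described here.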

Interestingly, in the most recent version of the OCEBM (Fig. 1d), n-of-1 trials, in which there is randomization of treatment in the individual patient, share the highest level of evidence with systematic reviews of RCTs3. The most common form of n-of-1 trial uses a multiple-crossover design; multiple exposures to reversible treatments are given in a random order, and the patient’s response to each treatment can be compared with each of his or her other responses. These n-of-1 studies have been carried out in chronic fatigue, sleep disturbances, reflux disease and depression, for example, but are rarely if ever carried out in oncology22. Indeed, the classic n-of-1 trial typically cannot be applied to aggressive or acute illnesses because randomizing patients with lethal diseases to multiple treatments, some of which may be ineffective, may result in permanent disability or death.
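The multiple-crossover design described above can be sketched as a randomized treatment schedule for a single patient. The example below assumes block randomization, in which each block contains every treatment once in random order; this is one common way to organize such a schedule and is an illustrative assumption, as are the function name and the treatment labels "A" and "B".

```python
import random

def n_of_1_schedule(treatments, n_blocks, seed=None):
    """Build a multiple-crossover schedule for one patient.

    Each block exposes the patient to every treatment exactly once,
    in an order randomized independently for that block.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # random order of treatments within this block
        schedule.append(block)
    return schedule

# Hypothetical example: two reversible treatments compared over three blocks.
schedule = n_of_1_schedule(["A", "B"], n_blocks=3, seed=1)
for number, block in enumerate(schedule, start=1):
    print(f"Block {number}: {' -> '.join(block)}")
```

Because each block contains both treatments, the patient’s responses can be compared within blocks. A real protocol would also need washout periods between exposures, which this sketch does not model.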

The classic type of n-of-1 trial described above should be differentiated from a distinct new terminology wherein n-of-1 refers to individualizing therapy in the precision medicine setting23. Using cancer as an example, these types of n-of-1 studies acknowledge that metastatic tumors are genomically complex and distinct from each other, indicating that each patient needs a customized combination therapy solution. Thus, the classic analysis that determines how well a drug regimen works in a group of people is not applicable (because each patient receives a different regimen). For these types of n-of-1 precision studies, the efficacy of the matching strategy is assessed, rather than the efficacy of any drug or combination of drugs. Effective genomic-sequencing-based matching approaches demonstrate improved outcomes for n-of-1 precision medicine studies in patients with lethal malignancies23 and might be translatable to other complicated diseases that require individualized treatment tactics. In this context, COVID-19 may illustrate the need to individualize the clinical approach on the basis of patient age, type and number of comorbidities, and presenting symptoms, as well as host immune response and genetic predisposition when data on the latter become available.

Emerging lessons from COVID-19

In the five weeks after the pandemic was declared on 11 March 2020, 142 studies were registered on https://clinicaltrials.gov as “Recruiting, Active, not recruiting Studies | Interventional Studies | COVID”. For context, in the case of metastatic lung cancer, one of the most lethal cancers, there were 345 studies that were “Recruiting, Active, not recruiting Studies | Interventional Studies | metastatic lung cancer” at that time, but these studies had been opened over a period of years rather than just a few weeks. The number of deaths per year from lung cancer is ~150,000 in the USA alone24 and ~2 million globally25; COVID-19 has killed >400,000 people in one year in the United States. In essence, just over one month after the pandemic was declared, the number of recruiting or active interventional studies for COVID-19 was already more than 40% of that for metastatic lung cancer (142 versus 345). Furthermore, these trials are being rapidly published. Indeed, the New England Journal of Medicine published three COVID-19 therapeutic trials in the eight weeks after the pandemic was declared (only one of which was an RCT; Table 1)7,26,27.

Table 1 Clinical trials on COVID-19 treatments published in the New England Journal of Medicine from 12 March through 17 July 2020

The need for more speed and efficiency in clinical trial development and completion has long been recognized in the cancer field, where median times to opening a trial often range from 6 months to over 1.5 years and require hundreds of administrative steps28. The COVID-19 pandemic clearly demonstrates that a road to rapid trial activation, completion and reporting exists. In the wake of the pandemic, this road should be traversable for the benefit of patients with other lethal diseases, such as cancer.

Although mechanism-based reasoning occupies the lowest tier of the levels-of-evidence pyramid (Fig. 1d) and is often inadequate by itself to establish a new therapy, such reasoning was the foundation on which clinical trials to advance COVID-19 care were built. From preclinical studies, it was hoped that human immunodeficiency virus (HIV) protease inhibitors like Kaletra (lopinavir and ritonavir) would be efficacious against SARS-CoV-2. In the first reported RCT for a COVID-19 treatment, however, Kaletra was found to yield no benefit7. Preclinical studies29 also suggested that the endosomal inhibitor hydroxychloroquine (HCQ), a drug that also decreases viral budding in vitro and has known anti-inflammatory properties, inhibits replication of SARS-CoV-2. These observations provided the rationale for trials in the setting of pre-exposure (NCT04334148) and post-exposure (NCT04308668) prophylaxis in healthcare workers. Arguments were also made for interleukin-6 (IL-6) inhibitors (Actemra (tocilizumab), Kevzara (sarilumab) and Sylvant (siltuximab)) on the basis of their ability to suppress cytokine storm in severe COVID-19. The race to find treatments for COVID-19 led to the FDA’s decision to grant an Emergency Use Authorization for HCQ in March 2020, although it was revoked in June 2020. To date, the antiviral Veklury (remdesivir) is the only drug to have received full approval for COVID-19, on the basis of three RCTs.

Data are also being curated at record pace, posted before peer review and published after peer review in prominent medical journals within weeks of the start of the pandemic. Rapid review should not, however, mean a compromise of reproducibility and transparency standards, an issue that arose when The Lancet and The New England Journal of Medicine published COVID-19 papers that required retraction. On the other hand, many rapidly published papers provide urgently needed data. For instance, observations from compassionate use of Veklury showed that 57% (17 of 30) of previously mechanically ventilated patients were extubated26. An observational study of 1,376 patients, also published in The New England Journal of Medicine (less than 2 months after the pandemic was declared), showed no significant association between HCQ use and intubation27. Ultimately, these preliminary observations must be explored in RCTs, but they still provide important evidence in a pressing situation.

The pursuit of a COVID-19 therapy is unveiling the capability to rapidly investigate and deploy medications, which could be a lasting positive legacy of the pandemic. Indeed, several aspects of the COVID-19 reaction highlight how the slow processes for conceiving and activating clinical trials, as well as evaluating and approving drugs, can become immensely more efficient during a public health crisis. These clinical trials and publications, with 142 interventional studies registered on clinicaltrials.gov within five weeks of the declaration of the pandemic and three therapeutic studies published in The New England Journal of Medicine within eight weeks (Table 1), have been fast-tracked on the basis of the perceived emergency generated by the COVID-19 situation. The search for a COVID-19 treatment has been fueled by a mechanism-based understanding of COVID-19 biology, as well as anecdotal reports (ironically, the lowest tiers in the OCEBM levels-of-evidence pyramid for therapies; Fig. 1d)3,4.

Newer forms of evidence are also being interrogated. For instance, in a program providing rapid access to compassionate use of the antiviral Veklury, ~60% of patients hospitalized for severe COVID-19 demonstrated improvement, a finding that was quickly disseminated by publication26. These findings also raise the possibility of implementing master observational studies for COVID-19, as has been proposed for cancer with clinical trials such as ROOT that plan large-scale structured data acquisition in an observational setting14,19. In addition, acquisition of real-world data by exploiting digital technology to download medical or insurance records or to mine clinical trial databases has also led to approvals in cancer12,13 and may provide rapid access to important information related to COVID-19 therapeutic effectiveness. It is understood that some of the studies that are ongoing or proposed for COVID-19 are not RCTs and, therefore, while providing proof of concept, may still need to be confirmed by RCTs. Still, it is critical to appreciate how our response to COVID-19 has demonstrated that we do not need to become mired in old or misinterpreted dogma concerning levels-of-evidence rankings to advance a field where there is urgency.

It is also important to recognize that levels-of-evidence hierarchies have been extensively updated since their earliest renditions, 30 to 40 years ago1,5. Indeed, in 2009 and 2011, the OCEBM levels-of-evidence pyramid for treatment studies (Fig. 1c,d)3,4 raised several forms of non-RCT evidence with dramatic effects to the top evidence tiers. For these types of observations, therapeutic efficacy may be so great that randomization to a control arm may not be ethical30. The key is to balance the risk of authorizing a therapy that may later be disproven against that of delaying adoption of a life-saving therapy by requiring an RCT that would likely take years to perform30. Indeed, there are quantifiable threshold values above which it is highly likely that effectiveness seen in non-randomized trials will consistently translate to improved survival.

In summary, contemporary levels-of-evidence hierarchies have already been broadened to acknowledge the important role played by non-RCTs (Fig. 1). Furthermore, powerful digital and molecular technologies exist today that were inconceivable when the earliest levels of evidence were formulated, over 40 years ago1. Newer types of evidence are being exploited, including real-world data and the use of genomic sequencing and mechanism-based reasoning to select cancer patients for matched gene- and immune-targeted treatments. The COVID-19 pandemic has revealed that we can exploit novel types of evidence, including those generated by observational studies (Table 1) and by digital technologies, including downloadable apps. The latter can produce clinically relevant information self-reported by millions of individuals within a few weeks20,21. In all, the COVID-19 pandemic has shown that we must balance scientific rigor, reflected by classic levels of evidence, with the need for urgency. The lessons learned may expedite the discovery of important treatments for other deadly diseases.