Main

The World Health Organization (WHO) estimates that close to one million children die from malaria every year1, and the highest mortality is found among African children. Increased control efforts have reduced the malaria burden in some areas, but evidence of rebounding malaria2 brings these general trends into question and confirms that we have much to learn to defeat this important human pathogen. It is now more important than ever to understand key aspects of malaria biology and transmission, to identify targets that are vulnerable to intervention strategies and to create tools to interpret the changing landscape of infection. One powerful approach uses population-biology-based investigations to provide crucial insight into the causes and spread of disease. This strategy aids biological discovery by using population structure and genetic diversity to identify loci under selection or associated with clinical phenotypes and to develop tools for monitoring and evaluating interventions.

Plasmodium falciparum is a eukaryotic pathogen with a complex life cycle, spending part of its lifespan in its definitive host, the Anopheles mosquito, as a mostly diploid organism, and the remainder of the time in its human host as a haploid organism, where it gives rise to numerous clinical manifestations from mild to life-threatening illness. The 24 Mb genome of the parasite is distributed among 14 linear chromosomes, and the parasite contains two extra-chromosomal DNA circles that constitute the apicoplast and mitochondrial genomes. The full P. falciparum genome sequence was published in 2002 (Ref. 3) and was followed by the publication of other Plasmodium genome sequences, including that of Plasmodium vivax4. These data have allowed the elucidation of basic genome architecture and the identification of key structural elements, common metabolic and biosynthesis pathways and unique aspects that are shared among several Plasmodium parasites3,4,5,6. P. vivax causes extensive human malaria globally, but genomic data for this species are limited4,7. We therefore focus on P. falciparum while recognizing the need to develop and apply population-genomics approaches to P. vivax.

The P. falciparum genome is evolving in response to natural selection pressures of the human host immune system, the mosquito vector and various environmental factors, including drug treatment and changes in transmission intensity owing to specific interventions8,9,10. The data imply that parasites can 'escape' both natural and artificial selection through evolution and further provide a historical 'roadmap' of the parasite's evolutionary path. These changes to P. falciparum population structure can potentially be leveraged to identify and to circumvent survival strategies used by the parasite with an eye towards, for example, new drug and vaccine development for malaria control, elimination and eradication. Indeed, similar approaches have been applied to much simpler viral genomes, for example, for tracking influenza outbreaks and for developing effective influenza vaccines11.

In this Review, we discuss opportunities that are afforded by new technologies for genomic analyses to understand the genetic diversity of P. falciparum further12,13,14 and to identify signatures of selection. We also describe genetic studies to reveal loci that are connected to the important clinical phenotype of drug resistance. Finally, we discuss population structure and its implications for assessing the impact of intervention strategies and for monitoring malaria transmission dynamics to inform best practices for global eradication campaigns.

Technological advancements

To exploit P. falciparum genomics fully for studies of malaria, a new toolkit is being developed. Various genome-wide array-based methods have been developed for whole-genome genotyping13,15,16,17,18,19,20 that can capture hundreds to thousands of markers. Depending on the technology, these methods can be used to characterize genomic variation at different levels, including SNPs, microsatellite variations (MSVs), insertions or deletions (indels) and copy number variants (CNVs). However, array-based methods need to be custom-built for each organism, requiring specific tools for P. falciparum13,15,16,17,18,19,20 and P. vivax21 to be constructed. Although some arrays remain valuable (for example, for identifying CNVs), next-generation sequencing methodologies are similar in cost to array-based methods and provide information about a much larger number of SNP markers without the species-specific customization that is required of arrays. Consequently, additional P. falciparum genomes have been sequenced at a greater depth to identify essentially all genetic variation in those genomes, thus allowing delineation of the composition and relative proportions of parasite types within a human infection. In parallel, advances in functional genomics, including transcriptome, proteome and metabolome analysis, provide valuable insight into the basic molecular functions of the parasite. An important toolkit has been developed for such 'omics' studies (Box 1). Interrogating the diverse populations of P. falciparum in humans has also required a shift in focus from animal model systems towards analysing material that has been directly isolated from infected human patients (Box 2).

These practical advancements are enhanced by several key improvements in bioinformatics analysis, many of which were developed for the analysis of other organisms and driven by international collaborative efforts, such as the Human Genome Project22. New computational strategies for the identification of SNPs and other variants, including MSVs, indels and CNVs from sequencing data, are being developed and applied to P. falciparum, providing additional markers, such as CNVs, that are potentially associated with drug resistance. Availability of genome sequences for closely related species, including Plasmodium reichenowi12 (which infects non-human primates)23, will greatly advance our ability to make population genomic inferences by identifying derived alleles for selection analysis.

After it has been obtained, this rich genomic information can yield important information about the biology of the malaria parasite, as described in the following sections. Applications include gaining insights into parasite population structure and interventions or identifying important regions of the genome — either those that show evidence of evolutionary selection or specific loci identified by genome-wide association studies (GWASs) or mutant screens — that are responsible for particular parasite phenotypes.

P. falciparum population structure and LD

Genetic variation in parasites12,13,14 reveals the exposure history of a given parasite or parasite population to selective pressures. Central to our understanding of genetic variation in P. falciparum is determining its current population structure, including how allele frequencies vary between different populations within the species, and the degree to which alleles at neighbouring variant sites are correlated by linkage disequilibrium (LD). Most data that are used to inform our current understanding of population structure and LD are derived from either genome-wide array-based methods16,19,20,24,25 or from SNP information provided by sequencing data12,13,14 from a few dozen published genomes. Genomic structure among isolates has been delineated using principal components analysis (PCA), from which one can infer the relatedness of samples.
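As a rough illustration of how PCA can separate samples by relatedness, the sketch below computes first-principal-component scores for a small matrix of haploid SNP calls using power iteration. The 0/1 allele encoding, the function name and the toy matrix are assumptions made for this example; they are not taken from the studies cited here, which used dedicated software on far larger marker sets.

```python
from math import sqrt

def first_pc_scores(genotypes, iterations=100):
    """Return a unit-length vector of sample scores on the first principal component.

    genotypes: list of samples, each a list of 0/1 allele calls
    (hypothetical encoding; one column per SNP, haploid parasites).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre every SNP column on its mean allele frequency.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample Gram matrix; its top eigenvector holds the PC1 scores.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    vec = [float(i + 1) for i in range(n)]   # fixed, non-constant starting vector
    for _ in range(iterations):              # power iteration
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return [0.0] * n                 # no variation among samples
        vec = [x / norm for x in nxt]
    return vec

# Toy data: three samples from one hypothetical population, three from another.
toy = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1],
]
scores = first_pc_scores(toy)
```

In this toy matrix the first four SNPs differ consistently between the two groups, so the two groups receive scores of opposite sign on the first component, whereas the two SNPs that vary within groups contribute nothing to it.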

Genomic diversity across world regions. Advances in genomics technology have enabled analysis of many different parasites derived from distinct geographic locations. Large-scale population structure in P. falciparum follows continental lines, and there are major branches in Africa, South and Central America and South and East Asia (extending to Papua New Guinea)19,26. The picture within each group depends on the region. Within Africa, population differences between countries have ranged from undetectable (as in Uganda versus Congo27 and Cameroon versus Congo28) or very small (as in Zimbabwe versus Uganda and Congo27) to modest over the longest distances (as in Nigeria versus Sudan versus South Africa29). By contrast, in South East Asia and the western hemisphere, local population structure is pronounced even within a single country14,26. The overall picture is consistent with a recent geographic spread from a source population in Africa, which has remained by far the largest population. Estimates of genetic diversity measurements present the same picture: the highest values consistently occur within Africa, and the lowest values occur in the Americas. From these types of analyses, we can identify mutations that are fixed in one parasite population but that are distinct from other populations and thus may be useful to identify specific parasites. Knowledge of the genetic characteristics of geographically distinct populations is important for controlling for population stratification in GWASs, tracking persistent parasite types as interventions are applied and localizing new sources of infection to maximize the effectiveness of control measures.

Explanations for observed LD structure. Levels of genomic diversity in different populations are reflected in the observed patterns of LD. Very little LD is seen in Africa, and it extends less than 1 kb14,26. LD is slightly higher in South East Asia (mean squared correlation coefficient (r²) = 0.3 for markers that are less than 1 kb apart) and more so in South America (mean r² = 0.5 for markers that are less than 1 kb apart), where it spans ~10 kb. This difference could stem from demographic history. For example, population bottlenecks in the non-African populations could have eliminated many allele combinations, leaving strong correlations, and thus high LD, in the remaining parasites. It could also stem from a smaller effective population size outside Africa, which can sustain fewer allele combinations, or from lower outcrossing rates that small population sizes entail, both of which contribute to higher levels of LD. Detailed understanding of recombination and of the demographic history of different populations is needed to distinguish the two causes.
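To make the squared correlation coefficient concrete, the following minimal sketch computes it for a pair of biallelic sites in haploid genomes. The 0/1 encoding and the function name are assumptions for illustration; the cited studies averaged this quantity over many marker pairs binned by physical distance.

```python
def r_squared(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites in haploid genomes.

    site_a and site_b are equal-length lists of 0/1 alleles, one entry per
    parasite (hypothetical encoding: 0 = reference, 1 = alternate).
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n                                   # allele-1 frequency at site A
    p_b = sum(site_b) / n                                   # allele-1 frequency at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                                 # undefined at a monomorphic site
    return d * d / denom

# Perfectly correlated sites give 1.0; independent sites give 0.0.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```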

LD patterns also reflect the transmission history of strains within that population. The sexual phase of the P. falciparum life cycle occurs in the midgut of female Anopheles mosquitoes, following consumption of a blood meal containing male and female P. falciparum gametocytes. Because female Anopheles mosquitoes typically only bite a single human host during each egg-brooding cycle, the gametocyte pool that is available for sexual union in the midgut matches the gametocyte composition in individual infected human hosts. In geographical regions with high levels of malaria transmission, a high complexity of infection (COI) is thought to be produced through 'super-infection' (that is, multiple bites from distinct, P. falciparum-infected mosquitoes), although new evidence supports a model of co-infection of distinct P. falciparum parasites from single bites30. A high COI makes recombination (outcrossing) possible between genetically distinct P. falciparum gametocytes during the sexual phase (providing that genetic diversity is high enough to ensure genetically distinct parasites) and results in short blocks of LD. By contrast, if only a single P. falciparum strain is present, the gametocytes will be identical, and the lack of recombination will result in longer blocks of LD.

Ultimately, as parasite population sizes get extremely small, it would be anticipated that LD should become extended and should theoretically approach a value of one. Thus, tools for measuring changes in parasite population structure have the potential to inform reductions in malaria transmission, as outcrossing rates are reduced to the point at which self-fertilizing (selfing) among parasites occurs in a given population, such as might be expected during a successful intervention strategy, as discussed later.

Signatures of selection

Of the two host environments of P. falciparum, selection by the human host rather than the mosquito places the most pressure on the parasite. Evidence abounds from recent studies of parasite diversity for two broad classes of strong natural selection in the parasite genome (Fig. 1).

Figure 1: Signatures of selection.

a | In a given population (shown by the large grey circle), there are numerous individuals (shown by the square matrix of circles) each containing alleles (shown by the red or yellow circles) across their genomes. Diversity refers to the amount of allelic variation among individuals in a population, whereas divergence refers to the amount of allelic variation between different populations. Under balancing selection, high diversity would be expected at a locus under selection but low divergence would be expected between populations. b | Distribution of loci based on both within-population differences (diversity, as denoted by π) and between-population differences (divergence, as measured by the fixation index (FST), between parasites from Senegal and Thailand) is shown. Loci that are classified as transporters or enzymes, including the acyl-CoA-synthetase (ACS) genes, are shown as blue diamonds, loci that are classified as antigens, including var, rifin, stevor and surfin molecules, are shown as red diamonds, and all other loci are shown as grey diamonds. Genes near the x axis (which have a high diversity and a low divergence) are under diversifying selection, and these include a number of known antigens. By contrast, genes near the y axis (which have a low diversity and a high divergence) are more likely to be under directional selection, and these include known drug resistance genes. c | Selective sweep as a consequence of selection for drug-resistant parasites results in the reduction of diversity (shown by the red line) compared with average diversity values for that genomic region (shown by the blue line). The sweep is caused by selection for an allele (shown by the red box) that confers survival under drug pressure. Neighbouring alleles are maintained along with the advantageous allele, resulting in a fairly large area of the genome with reduced diversity. 
Identification of genomic regions with reduced diversity in drug-resistant parasites as a consequence of directional selection reveals candidate drug-resistant genes. d | A haplotype bifurcation diagram36 visualizes long-range associations for a given SNP. The thickness of the line represents the relative frequency of each haplotype in the population under study. Although the long-range associations between the ancestral (A) allele have been whittled away by recombination, the derived (T) allele maintains long-range associations with other SNPs, suggesting that it arose recently and that insufficient time has passed for recombination to break down these associations substantially. Part b of the figure is adapted from Ref. 25. Part d of the figure is adapted, with permission, from Ref. 14 © (2007) Macmillan Publishers Ltd. All rights reserved.

Balancing selection. Genotyping and sequence analysis indicate that an unusually large fraction of the P. falciparum genome exhibits the polymorphism profile of immune-mediated balancing selection: a high density of high-frequency polymorphisms is seen in hundreds of antigenic genes9,10. Balancing selection maintains polymorphisms with the potential to encode alternative immunological identities and indefinitely keeps them at an intermediate population frequency (Fig. 1a).

When parasite populations are geographically separated, genes that are subject to balancing selection are unlikely to diverge as rapidly as other genes, because the selection prevents differences from differentially fixing in the populations. Genome-wide comparisons of diversity (within a population, denoted by π) and divergence (between populations, measured by the fixation index (FST)) identify genetic loci that are more likely to be affected by this diversifying selection25 in that they exhibit elevated diversity with low divergence. (Figure 1b shows an example of diversity and divergence analysis between parasites from Senegal and Thailand25.) From these analyses, known antigens and vaccine candidates are identified, as well as novel genetic loci that encode putative antigens that trigger the human immune response. This prediction was validated when several highly polymorphic genes were expressed and recognized by human immune sera, including seven previously unknown antigens13. This result suggests that diverse genomic regions may encode antigenic loci that are useful for vaccine approaches. However, a number of vaccine studies suggest that the ability to target a polymorphic locus successfully, such as merozoite surface protein 1 (MSP1)31 or apical membrane antigen 1 (AMA1)32, may be undermined by the parasite's ability to survive the elicitation of a locus-specific immune response. Thus, strategies for using a combination of non-variant but immunogenic vaccine targets may be warranted.
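The contrast between diversity and divergence can be sketched with direct pairwise counting. This example assumes aligned haploid sequences of equal length and uses a Hudson-style estimator (one minus the ratio of within-population to between-population pairwise differences); the cited analysis may have used a different FST estimator, and the sequences below are invented toy data.

```python
from itertools import combinations, product

def pairwise_diff(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def nucleotide_diversity(seqs):
    """Pi: mean per-site difference over all pairs within one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_diff(a, b) for a, b in pairs) / (len(pairs) * len(seqs[0]))

def between_diversity(pop1, pop2):
    """Mean per-site difference over all pairs drawn one from each population."""
    pairs = list(product(pop1, pop2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / (len(pairs) * len(pop1[0]))

def fst(pop1, pop2):
    """Hudson-style FST: 1 - (mean within-population pi) / (between-population pi)."""
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between = between_diversity(pop1, pop2)
    return float("nan") if between == 0 else 1 - within / between

# Toy locus with a polymorphism shared by both populations (high pi, low FST).
shared = ["ACGT", "TCGT", "ACGT", "TCGT"]
# Toy locus with a difference fixed between populations (zero pi, FST of 1).
fixed_a, fixed_b = ["ACGT"] * 3, ["TCGT"] * 3

pi_shared = nucleotide_diversity(shared)
fst_shared = fst(shared, shared)
fst_fixed = fst(fixed_a, fixed_b)
```

With samples this small the estimator can return slightly negative values for loci with no real divergence; these are usually read as zero.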

Diversity and divergence analyses can identify loci that, conversely, diverge between parasite populations (FST >0.4). Divergent loci between populations from Senegal and Thailand25 encode proteins that are proposed to have various cellular functions, including DNA replication (for example, PF10_0165, PF14_0278 and PF14_0316), lipid metabolism (for example, PFB0695c, PFE1250w, PFB0685c and PFC0050c), gametocytogenesis or sexual development molecules (for example, PF13_0248 and PFC0640w) and transporters (for example, PFL1125w, PF14_0342 and PF14_0455) (Fig. 1b). The reasons for the divergence are currently unclear and require further investigation but may be a consequence of differences in vector populations or other distinct selective pressures between Senegal and Thailand.

Directional selection. Directional selection in the context of a traditional selective sweep33 leaves a distinctive genomic imprint that consists of depleted polymorphism and enhanced LD (Fig. 1c,d). This genomic signature is detectable via 'haplotype-based' tests of natural selection, such as the long-range haplotype test (LRH test). In P. falciparum, there is generally short LD as a consequence of considerable recombination35. In response to strong selective pressures, long haplotype signals that result from the rapid rise of variants that are linked to flanking mutations are easily detected as they stand out from the normally short LD of the genomic background36 (Fig. 1d). It is important to note that these signals may be absent if directional selection began on common or standing variation37. Equally important is the difficulty in identifying a clear demarcation between selective sweeps and neutral processes without having a detailed understanding of demographic history and recombination rate variation — knowledge that is lacking for P. falciparum. Nevertheless, in genome-wide scans for selective sweeps, a number of loci show strong evidence for recent directional selection, and they all point to a single, recent evolutionary pressure: drugs. Loci that are known to confer resistance to formerly effective anti-malarial drugs — including the chloroquine resistance transporter (pfcrt) for chloroquine38 and bifunctional dihydrofolate reductase thymidylate synthase (dhfr-ts) for pyrimethamine39 — show all of the signs: a local desert of diversity and a strong LD between those SNPs found in the swept region (Fig. 1c). Other key modifier genes, including P. falciparum multiple drug resistance gene (pfmdr1)40,41 and the GTP cyclohydrolase gene (gch1)18,42, have been implicated in some drug responses, generally through adaptive changes in their copy number.
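Haplotype-based tests such as the LRH test build on extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying the same core allele are identical over the whole interval from the core to a given marker. The sketch below computes that quantity only; it assumes phased haploid haplotypes stored as strings, and the function name and toy haplotypes are invented for illustration. A full test would also compare the result with other alleles of matching frequency across the genome.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH among carriers of core_allele, from core_index out to end_index (inclusive)."""
    lo, hi = sorted((core_index, end_index))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")                  # homozygosity needs at least one pair
    counts = Counter(carriers)               # how many chromosomes share each extended haplotype
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

# Toy haplotypes; the core SNP is at index 2 (derived allele T, ancestral allele A).
toy_haplotypes = [
    "GCTAG", "GCTAG", "GCTAG", "GCTAG",      # derived allele on one unbroken haplotype
    "GCAAG", "GCACG", "TCAAC", "GGACC",      # ancestral allele on varied backgrounds
]
derived_ehh = ehh(toy_haplotypes, 2, "T", 4)
ancestral_ehh = ehh(toy_haplotypes, 2, "A", 4)
```

Here the derived allele keeps an EHH of 1 at the furthest marker while the ancestral allele has decayed to 0, which is the pattern expected when a variant has risen in frequency too quickly for recombination to break up its background.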

A plethora of additional genes show weaker evidence for recent positive selection24,25,43, raising the possibility that they are also associated with drug responses. Their products have various putative functional roles25, including cell surface adhesion, membrane transport, genome maintenance, transcriptional regulation, metabolism and post-translational modification, such as ubiquitylation. Evidence for sweeps at multiple genes in a single pathway suggests that selection has been involved. For example, several genes in the ubiquitylation pathway44 are under positive selection, as seen in worldwide populations and in a thorough population analysis of parasites recently isolated in Senegal. Similarly, proteins in the fatty acid and lipid metabolism pathway have among the highest signals of selection, implicating the human or mosquito physiological state as strong selective forces on parasite survival and propagation. Key to the success of these approaches is the functional characterization of candidate loci and demonstrating their involvement in conferring important clinical phenotypes, such as drug resistance25. Preliminary functional data suggest that members of the cytoadherence-linked asexual gene (clag) family can modulate parasite drug responses45: a finding that is consistent with the observation that some surface molecules are under positive selection24,25.

Demographic confounders. As mentioned above, genomic patterns that are indicative of selection can be difficult to interpret if they can also be caused by demographic history. For example, population bottlenecks and expansion, which have occurred in the P. falciparum lineage46, can also reduce diversity, increase haplotype lengths and alter allele frequency ranges; in these instances, assessing the significance of any selection test result usually requires computational simulations across a range of reasonable demographic scenarios47. Past studies that have estimated the demographic parameters for P. falciparum (that is, the ages and scales of bottlenecks and expansions) vary somewhat in their methodologies and outcomes46,48,49, but truly genome-wide studies24,25 benefit from being able to use most of the genome as the null distribution, using the assumption that most SNPs reflect demography more than selection. Moreover, recent computational methods relax that assumption by jointly modelling demography and selection50,51.
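Using the genome as its own null distribution amounts to ranking each locus against all others. The sketch below assigns every locus an empirical tail fraction; the locus names and scores are invented, and the approach rests on the assumption stated above that most loci reflect demography rather than selection.

```python
from bisect import bisect_left

def empirical_p_values(scores):
    """Map each locus to the fraction of loci scoring at least as high.

    scores: dict of locus name -> selection statistic (larger = more extreme).
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    return {
        locus: (n - bisect_left(ordered, value)) / n
        for locus, value in scores.items()
    }

# Hypothetical statistics for four loci; real scans rank thousands of loci.
toy_scores = {"locus_a": 0.4, "locus_b": 1.1, "locus_c": 0.7, "locus_d": 6.3}
ranks = empirical_p_values(toy_scores)
```

An empirical rank of this kind identifies outliers relative to the rest of the genome; it does not by itself show that an outlier is under selection.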

Monitoring transmission using signatures of selection. Understanding the relationship between the distribution of mutations within and between parasite populations has important implications for the discovery of alleles under either balancing or directional selection, and knowledge of this variation can potentially provide powerful tools for monitoring malaria transmission. For example, identifying divergent alleles that have become fixed in distinct populations could be useful as biomarkers to identify sources of new infection in an endemic population when transmission becomes very low or to distinguish whether new cases are coming from changing epidemiology within that endemic population or from distinct areas of transmission. This knowledge has great use in terms of informing the types and timing of intervention strategies ultimately to eliminate malaria in those places where it is endemic.

Identification of causal loci for specific phenotypes

Parasite phenotypes. A key challenge for identifying causal loci from genomic or functional studies is the classification and quantification of robust and reliable phenotypes. Clinical phenotypes that are related to pathogenesis (such as anaemia or severe disease), immunity or parasite clearance rates (that are associated with drug resistance) are the most informative and most reliable when they are assessed in parasites that are directly taken from the patient during a natural infection. However, the human host unavoidably complicates interpretation of these traits, and their assessment can often only be obtained once. A thorough phenotypic assessment would therefore necessitate large sample numbers and thoughtful study design to account for variation both within the human and within the parasite populations. Thus far, phenotyping has mainly been carried out on culture-adapted52 parasites that have been isolated from patients. These can then be tested for various in vitro phenotypes, including drug response, invasion types, cytoadherence properties, the ability to produce gametocytes and their metabolic profiles.

Below, we elaborate on the use of genomics to identify mutations that are associated with altered drug responses, but these approaches (for example, GWASs) could be extended to other biologically important phenotypes.

Linkage analysis in P. falciparum. Linkage mapping in P. falciparum has been accomplished using laboratory genetic crosses to identify segregation patterns in the progeny that are associated with specific phenotypes, including drug response53,54,55,56,57,58, pathogenesis59 or mosquito infectivity60. Originally, MSVs were used as genomic markers61, but these have now been augmented with variants determined using whole-genome methods (whether array- or sequencing-based). Linkage studies leverage the reasonably high level of recombination in P. falciparum to map the genetic determinants for specific traits: one round of recombination between parents and progeny results in large haplotype blocks that require fewer markers to identify than for population-based association studies8. In some geographic regions, where low diversity means that recombination rarely results in the reassortment of haplotype blocks, it may even be possible to carry out similar analyses using field isolates62,63. Combining linkage analysis with other independent tests, such as association mapping, provides a potentially powerful means of prioritizing candidate genes that are responsible for a given phenotype.

Challenges of GWASs in P. falciparum. The molecular and genetic mechanisms of many phenotypic traits that are most relevant to elimination and eradication, such as variability in parasite responsiveness to drugs, are poorly understood. Because GWASs do not require prior knowledge of gene functions or trait mechanisms, they are useful for identifying important genetic variants in organisms such as P. falciparum that have many genetic loci with no known functional homologues. Although these candidate variants require functional validation, the use of GWASs as hypothesis-generating experiments provides a powerful starting point for identifying traits and is one of the most effective approaches available in our modern genomic toolkit.

Undertaking GWASs in P. falciparum requires overcoming various challenges, including identifying heritable traits, coping with low LD, using appropriate sample collection or other methods to deal with population stratification and functionally validating associations. These challenges are described below, followed by examples of successful GWASs.

When surveying the P. falciparum genome for genotype–phenotype associations, only phenotypes with a strong genetic basis (those with high heritability) will be detected by a GWAS. Heritability of P. falciparum traits, such as drug resistance, can be variable: recent studies of parasite clearance rates in South East Asia found that the heritability of this phenotype depends on when and where samples were collected64,65. Compounding this complication, anti-malarial responses can be quantified in various ways, including in vitro-based metrics, such as IC50, or clinically derived metrics, such as in vivo parasite clearance rates.
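As an illustration of an in vitro metric, the sketch below estimates an IC50 (the drug concentration at which parasite viability falls to half of the untreated control) by linear interpolation on a log concentration scale. This is a simplification chosen for brevity: laboratories typically fit a sigmoidal dose-response curve instead, and the function name, units and readings here are hypothetical.

```python
from math import log10

def ic50_by_interpolation(concentrations, viabilities):
    """Estimate IC50 by linear interpolation on a log10 concentration scale.

    concentrations: ascending positive drug concentrations (any consistent unit).
    viabilities: parasite viability at each concentration, as a fraction of
    the untreated control.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Fraction of the way from c_lo to c_hi at which viability hits 0.5.
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            return 10 ** (log10(c_lo) + frac * (log10(c_hi) - log10(c_lo)))
    return None  # 50% viability was not bracketed by the tested range

# Hypothetical assay readings across a tenfold dilution series.
estimate = ic50_by_interpolation([1, 10, 100, 1000], [1.0, 0.9, 0.1, 0.0])
```

Because the estimate depends on the dilution series and on how viability is measured, values from different assays are not directly comparable, which is part of why the choice of phenotype metric matters for association studies.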

The short blocks of LD in P. falciparum, particularly in African populations19,25, are an important consideration for study design. Traditional GWASs rely on the genotyped markers being correlated to causal mutations through high LD66. In a population with low LD, an array-based GWAS may not have sufficient detection power unless the causal mutation is present on the array. However, when a signal is found, short LD makes localizing the signal to a single gene much easier67. Loss of detection power owing to limited LD can therefore be circumvented by using whole-genome sequencing to identify all variants in the genome. Use of sequence data for GWASs reveals stronger association signals, provides more supporting markers in areas of high LD and can detect candidate loci in areas of low LD that were previously missed by array-based GWASs. Sequencing-based approaches are thus a promising avenue for future GWASs in malaria.

Population demography — particularly in the form of population stratification — can hinder GWAS analyses if it is not appropriately controlled for. The presence of closely related individuals in the data set or, conversely, broad genetic differences between groups of samples owing to differing population histories can erroneously inflate associations and can produce false positives68. The ideal GWAS would eliminate such confounders by studying a phenotype that is heritable, measurable and strongly apparent in a subset of parasites (such as drug resistance) while entirely sampling from a single, non-stratified population. However, this is not always possible, and many approaches have been developed to eliminate false positives69,70,71 while analysing stratified data sets. In particular, mixed-model approaches72,73,74,75 have successfully been used to control for population stratification in malaria studies25.

For studies with small sample sizes, which include all malaria-based GWASs to date, gains in study power can be achieved by using multi-marker or haplotype-based association approaches25,66 instead of standard single-marker tests. Positively selected variants typically lie on long haplotypes36 that are more easily detectable by multi-marker tests, such as the LRH test.
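To see why single-marker tests struggle at small sample sizes, consider a minimal sketch of one: an exact permutation test for a difference in mean phenotype between carriers and non-carriers of an allele, compared against a Bonferroni-corrected threshold. The test choice, function names, marker count and phenotype values are all assumptions for illustration, and the sketch ignores population stratification, which the published analyses handled with mixed models.

```python
from itertools import combinations

def exact_permutation_p(genotypes, phenotypes):
    """Two-sided exact permutation p-value for a difference in mean phenotype
    between carriers (1) and non-carriers (0) of an allele."""
    n, k = len(genotypes), sum(genotypes)
    if k == 0 or k == n:
        return 1.0                            # monomorphic marker: nothing to test
    total = sum(phenotypes)

    def abs_diff(carrier_sum):
        return abs(carrier_sum / k - (total - carrier_sum) / (n - k))

    observed = abs_diff(sum(p for g, p in zip(genotypes, phenotypes) if g == 1))
    hits = count = 0
    for subset in combinations(range(n), k):  # every relabelling of the carriers
        count += 1
        if abs_diff(sum(phenotypes[i] for i in subset)) >= observed - 1e-12:
            hits += 1
    return hits / count

def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Eight hypothetical parasites; the four carriers have the four highest phenotypes.
p_value = exact_permutation_p([1, 1, 1, 1, 0, 0, 0, 0], [8, 7, 6, 5, 4, 3, 2, 1])
threshold = bonferroni_threshold(0.05, 1000)
```

Even with perfect separation of carriers and non-carriers, eight samples cannot yield a p-value below 2/70, which is far above the corrected threshold for a thousand markers. This illustrates why small studies gain from multi-marker approaches.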

GWAS approaches towards identifying drug resistance loci. Although the GWAS approach is promising, we are still in the early days of applying this methodology for loci discovery in the malaria parasite, and the few GWASs that have been carried out have primarily investigated novel variants that are associated with drug resistance. These studies24,25,43 generate long lists of loci that are hypothesized to be associated with specific phenotypes, and a current challenge is winnowing the list to the most likely candidates. One strategy involves combining results of independent tests to identify the most likely candidate genes for a functional follow-up (Fig. 2).

Figure 2: Identifying drug resistance loci using GWASs and functional studies.

Aa | Schematic representing data from a drug-treatment parasite viability assay, which allows the determination of the IC50; this represents the drug concentration at which parasite viability is decreased by half. Such assays can be used to classify parasite responses to anti-malarial compounds, thus distinguishing a drug-resistant parasite (shown by the red line) from a drug-sensitive parasite (shown by the black line). Ab | Genome-wide association study (GWAS) analysis results shown as a Manhattan plot in which P values for variants across the 14 chromosomes (represented by different colours across the x axis) are shown. The Bonferroni level for genome-wide significance is shown as a dotted line, and genetic variants that rise above this level are associated with the drug phenotype that was observed in panel Aa. B | Multiple independent analysis approaches can be combined to improve the power of locus identification for functional follow-up studies. For example, alleles identified by GWASs, long-range haplotype (LRH) tests or diversity and divergence analyses are identified, and a gene expression construct is created comprising the genetic variant that is associated with drug resistance. To test whether each gene variant is necessary and sufficient to confer the observed drug-resistant phenotype, each putative drug resistance variant is introduced into a drug-sensitive parasite by transfection followed by testing for drug resistance.

Providing validation for GWAS-based approaches, several genes that are already known to be associated with drug resistance have been identified, such as pfcrt, pfmdr1 and dhfr-ts; additionally, one novel candidate has been identified and functionally validated. The GWAS by Van Tyne et al.25 identified a highly polymorphic locus, PF10_0355, as being associated with halofantrine resistance based on a small set of globally diverse parasites. PF10_0355 was classified as a member of the msp3 gene family76. When a variant of PF10_0355 from a drug-resistant parasite was introduced into a drug-sensitive parasite through transfection, the parasite was rendered resistant not only to halofantrine but also to the chemically similar drugs mefloquine and lumefantrine, but not to the chemically distinct compounds chloroquine, artemisinin and atovaquone. This is the first functional demonstration that a potential drug resistance locus identified by a GWAS confers a drug resistance phenotype. This study used a modest number of parasites that had been sampled from many populations using a limited marker set; current studies now assay larger parasite numbers from single populations using sequencing data to capture essentially all genetic variation.

In addition to profiling sequence variants, array-based methods identified a novel P. falciparum CNV corresponding to the first gene in the folate pathway, GTP cyclohydrolase (gch1)77, which was later validated42 as being associated with antifolate use in P. falciparum in Thailand.

We anticipate increased power to detect mutations that are associated with, or responsible for, key clinical phenotypes using sequencing-based GWASs, which are now being applied to several phenotypes.

Laboratory-based selection of drug resistance. A second application of genomics, which is useful for understanding drug resistance and for promoting drug development, involves the study of artificially selected drug-resistant parasites and the use of sequencing to find the causal mutations conferring resistance. This approach allows the identification of genetic variants that are capable of conferring drug resistance, which in many cases have been found in the drug target. Typically, it involves placing a clonal parasite line under drug selection in vitro to create mutant forms of the parasite that are resistant to the compound; then, whole-genome analysis (either array- or sequencing-based) is performed to identify the causal mutation that is associated with drug resistance. This approach has successfully been implemented for a number of anti-malarial agents16,78,79,80,81,82, demonstrating its use for identifying drug targets.
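
The logic of comparing selected lines with their parent can be sketched as a set operation. This is a simplified illustration under stated assumptions: variants are already called and represented as (gene, change) pairs, and a gene is treated as a candidate only if every independently selected clone acquired a new variant in it. The function name, gene names and amino-acid changes are hypothetical.

```python
def candidate_resistance_genes(parent_variants, clone_variants):
    """Return genes that gained a new variant in every resistant clone.

    parent_variants: set of (gene, change) pairs in the drug-sensitive parent.
    clone_variants: dict mapping clone name to its set of (gene, change) pairs.
    """
    genes_per_clone = []
    for variants in clone_variants.values():
        acquired = variants - parent_variants  # drop pre-existing variants
        genes_per_clone.append({gene for gene, _ in acquired})
    if not genes_per_clone:
        return set()
    return set.intersection(*genes_per_clone)


# Hypothetical variant calls; names are placeholders, not real loci.
parent = {("gene_X", "A10T")}
clones = {
    "clone_1": {("gene_X", "A10T"), ("gene_T", "G100S"), ("gene_Y", "L5F")},
    "clone_2": {("gene_X", "A10T"), ("gene_T", "D200N")},
    "clone_3": {("gene_X", "A10T"), ("gene_T", "G100S"), ("gene_Z", "K7R")},
}
candidates = candidate_resistance_genes(parent, clones)
```

Only `gene_T` is mutated in all three clones, even though the exact change differs between them. Matching at the gene level rather than the variant level is a design choice that allows for different mutations in the same target.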

Selection studies using different drugs have revealed intriguing distinctions in the drug sensitivity profiles between different mutant strains, and there is a potential relevance to therapeutic strategies for malaria control. For example, mutations in pfcrt conferring chloroquine resistance were also found to confer sensitivity to some novel compounds43. As a consequence, these novel compounds were effective against chloroquine-resistant but not chloroquine-sensitive parasites. Furthermore, continued exposure to these novel compounds selected for reversion of the pfcrt mutation, thus resensitizing parasites to chloroquine43. Such observations give hope to the idea of repurposing chloroquine, which is an excellent anti-malarial drug that is only compromised by drug resistance. It could be speculated that giving compounds with these complementary effects in combination might prevent the emergence of drug-resistant parasites.

Many drug selection studies focus on compounds that are not yet in clinical use, so we do not currently expect to see these mutations in wild populations. Nevertheless, these studies suggest that the application of anti-malarial pressure in nature will select for drug-resistant forms of the parasite and that the use of drugs in combination is a key strategy for protecting against the selection of drug-resistant mutants. Although there may be problems with creating evolved strains, and laboratory culture may not mimic what happens in humans, in vitro studies have the advantage of selecting only for a few mutations; this leads to low ambiguity and, providing that the results are replicated in independent clones, there can be high statistical confidence in the involvement of a specific allele. This level of statistical power would be more difficult to achieve with GWASs.

Emerging strategies for assessing interventions

Genetic and genomic information can identify novel mechanisms of drug resistance and can assist with new drug development and target identification. We propose that understanding parasite population structure and LD would allow a novel (and as yet untested) means of monitoring parasite dynamics related to transmission, assessing interventions such as drugs or vaccines and identifying sources of new infection. Such assessments, which are carried out through easily deployed genomic tools, are key requirements of the current malaria eradication campaign83.

As discussed above, decreases in transmission levels reduce outcrossing and ultimately shift allele frequencies and increase LD. As intervention strategies (such as insecticide spraying, bed net use or vaccination) are deployed, transmission intensity is expected to decline, thus reducing COI. Furthermore, as we approach parasite elimination, we anticipate observation of clonal populations of parasites in a given transmission area. Ultimately, as individual parasite types have their own 'fingerprint' or genomic signature, we should be able to track these parasite types using molecular barcodes as interventions are applied.

Determining and applying COI. Classic means of evaluating transmission intensity are based on indicators such as the entomologic inoculation rate (EIR), seropositivity for malaria antigens or gametocyte carriage rates; however, these measures can be difficult to obtain, are indirect measures of the parasite types within individuals and provide mainly population-level information rather than specific information about the parasites that survive or escape an applied intervention. By contrast, evaluation of patient COI (Fig. 3a) can assess transmission directly in humans. Empirical observations of LD in parasites from regions that exhibit differing levels of COI support the predicted link between transmission intensity and LD: there is much lower LD in high-transmission African sites than in moderate-to-low-transmission sites in South East Asia or low-transmission sites in the Americas13,14,26.

Figure 3: Plasmodium falciparum complexity of infection.

a | Complexity of infection (COI) represents the number of different parasite types in an infected individual. A person infected with one parasite type is represented by a homogeneous colour, whereas a person infected with multiple parasite types is represented with multiple colours. b | When the mean COI is exactly 1, meaning that a single parasite type exclusively determines infection, LD is perfect in the parasite genome, irrespective of chromosomal distance (shown by the brown line). However, when the mean COI is barely above 1 (1.01 would be equivalent to 99% single infections and 1% double infections among infected individuals in a population), expected LD begins to drop precipitously as a function of distance to the extent that mean COI levels above 2 are incapable of producing significant further reduction in the statistic. c | Next-generation sequencing approaches can be used to determine COI. In the example shown, DNA is derived from a patient sample with a COI of 3 and subjected to PCR amplification across a highly polymorphic locus, such as the circumsporozoite (csp) gene. PCR primers can be engineered with unique sequence tags to identify each individual patient sample. The PCR products from multiple samples are pooled and then sequenced to high coverage. Bioinformatic tools are used to filter out systematic errors (to distinguish true haplotypes from PCR or sequencing error haplotypes) and to deconvolute the pooled sequence reads based on the sample-specific sequence tags. The number and type of different haplotypes for each sample are tallied to quantify COI values for each individual across the population. d | Expectations under high transmission intensity would be that individuals within that population would be infected with multiple parasite types (with a high COI). By contrast, under low transmission intensity, COI would be expected to be low, representative of infected individuals harbouring only a single parasite type (shown as a homogeneous colour; black represents uninfected individuals).

Figure 3b shows the expected relationship between parasite LD and mean COI in a population. This indicates that LD might be useful for determining the approximate historical COI and therefore the relative rate of transmission in populations that undergo low-to-moderate levels of infection. However, population-level LD is not sensitive to change in locations where infection rates are high; for example, a further increase in transmission in a high-transmission area where LD is already very low will have a negligible effect, as nearly all infections already permit outcrossing. Population-level LD also does not respond rapidly to change: a dramatic increase in transmission in a high-LD region might be detectable within a year or two, but more modest increases and decreases would take substantially longer. Thus, rather than measuring LD, determining the actual distribution of COI among individuals in the population might be a better means of inferring changes in transmission rates over short periods of time.
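
For readers unfamiliar with how LD is quantified, the sketch below computes the widely used r² statistic between two biallelic loci. It assumes phased haploid genotypes from single-genotype infections, with alleles coded as 0 or 1; the function name and example data are hypothetical, and this is one common LD measure rather than necessarily the statistic plotted in Fig. 3b.

```python
def ld_r_squared(locus_a, locus_b):
    """Compute r^2 between two biallelic loci from haploid genotypes.

    locus_a and locus_b are equal-length lists of 0/1 allele codes,
    one entry per parasite. Returns None if either locus is monomorphic.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # r^2 is undefined without variation at both loci
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


# Alleles always travel together: perfect LD.
linked = ld_r_squared([0, 0, 1, 1], [0, 0, 1, 1])
# All four allele combinations equally common: no LD.
unlinked = ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

The first example corresponds to the clonal extreme, where r² equals 1, and the second to free recombination between the loci, where r² equals 0.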

Historically, COI has been determined by assaying highly polymorphic loci84,85,86 (Fig. 3c). Several groups are now using PCR-directed next-generation sequencing of highly polymorphic loci to estimate COI. If a locus with suitably high levels of polymorphism is chosen, such that the probability that two strains will share the same haplotype within an individual infection is small, tallying the number of distinct haplotypes can sensitively — and, to some extent, quantitatively — be used to measure COI. Broadly deploying this kind of deep sequencing of highly polymorphic loci could provide a practical approach towards monitoring COI as interventions are applied (Fig. 3d).
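
The tallying step can be sketched as follows. This is an illustrative simplification: it assumes that reads have already been assigned to samples and collapsed to haplotype labels, and it treats any haplotype below an arbitrary read count or read fraction as PCR or sequencing error. The thresholds, function names and example values are hypothetical.

```python
from collections import Counter


def estimate_coi(reads, min_fraction=0.02, min_reads=5):
    """Count distinct haplotypes among one sample's reads.

    Haplotypes supported by fewer than `min_reads` reads, or by less than
    `min_fraction` of all reads, are treated as errors and ignored.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0
    return sum(
        1 for n in counts.values()
        if n >= min_reads and n / total >= min_fraction
    )


def mean_coi(coi_values):
    """Mean COI among infected individuals (COI of 0 means uninfected)."""
    infected = [c for c in coi_values if c > 0]
    return sum(infected) / len(infected)


def haplotype_match_probability(frequencies):
    """Chance that two independently drawn strains share a haplotype."""
    return sum(f * f for f in frequencies)


# Hypothetical sample: three real haplotypes plus a rare error haplotype.
sample_reads = ["hapA"] * 600 + ["hapB"] * 300 + ["hapC"] * 95 + ["hapErr"] * 5
coi = estimate_coi(sample_reads)

# 99 single infections and 1 double infection, as in the Fig. 3b example.
population_mean = mean_coi([1] * 99 + [2])

# Ten equally common haplotypes at the chosen locus.
match_prob = haplotype_match_probability([0.1] * 10)
```

In this example the sample is assigned a COI of 3, the population mean is 1.01, and two strains would be expected to share a haplotype about 10% of the time, which would lead to COI being underestimated. A more polymorphic locus lowers that probability.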

Molecular barcoding to identify parasite types. Measuring COI is anticipated to be useful for distinguishing populations that undergo moderate and high levels of transmission (Fig. 3); it is less useful for populations that undergo very low levels of transmission, where infected individuals would typically all have a COI of 1 (hence measuring COI would provide no additional information). In these cases, alternative population genomic tools that can identify particular parasite types, rather than just the number of distinct parasites, have potential use. An example is the molecular barcode87, which is a tool that evaluates 24 independent, highly variable SNPs to identify a parasite type uniquely within a population of parasites (Fig. 4a). Under moderate- or high-transmission conditions, almost every parasite in a population will exhibit a distinct molecular barcode signature. Under extremely low transmission conditions, such as those that might result from a disease elimination campaign, transmission of individual strains will be stochastic, resulting in the random extinction of some lineages and the amplification of others, which can then be observed repeatedly. Parasite clonality, where LD has reached a value of 1, is a direct theoretical consequence of small population size. The molecular barcode tool can follow individual parasite types and can identify micro-epidemiological factors. Under the right conditions, this type of tool could be deployed to distinguish reinfection from recrudescence, to see whether specific parasite types are escaping vaccine-induced protection and to follow parasite transmission patterns both across and within transmission seasons (Fig. 4b,c). Leveraging parasite genetic diversity to fingerprint parasites and to follow them in a population can give insight into parasite population dynamics that will be a key advancement towards elimination.
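
The idea of observing the same parasite type repeatedly can be sketched by grouping samples on their barcode. The sketch assumes that each sample is a single-genotype infection and that its barcode is stored as a 24-character string of SNP calls; the function names, sample names and barcodes are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 24  # number of SNPs in the barcode described in the text


def group_by_barcode(barcodes):
    """Group sample names by identical barcode string."""
    groups = defaultdict(list)
    for sample, code in barcodes.items():
        if len(code) != BARCODE_LENGTH:
            raise ValueError(f"{sample}: expected {BARCODE_LENGTH} SNP calls")
        groups[code].append(sample)
    return dict(groups)


def shared_barcode_fraction(barcodes):
    """Fraction of samples whose barcode is seen in more than one sample."""
    if not barcodes:
        return 0.0
    groups = group_by_barcode(barcodes)
    shared = sum(len(s) for s in groups.values() if len(s) > 1)
    return shared / len(barcodes)


# Hypothetical barcodes: three patients carry the same parasite type.
example = {
    "patient_1": "ACGT" * 6,
    "patient_2": "ACGT" * 6,
    "patient_3": "ACGT" * 6,
    "patient_4": "ACGT" * 5 + "TTTT",
}
clonal_fraction = shared_barcode_fraction(example)
```

A fraction near 0 would be expected when almost every parasite has a distinct barcode, and a rising fraction over time would be consistent with the clonal expansion described above.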

Figure 4: Molecular barcoding to identify parasite types.

a | The molecular barcode comprises 24 unlinked SNPs that are present in a global parasite population at a high minor allele frequency87. The molecular barcode can distinguish one parasite type (A, B or C) from another. b | This schematic indicates the molecular barcode for the parasite causing the peak in parasitemia (A or C). The molecular barcode tool can be applied in drug or vaccine trials to determine whether the parasite that is present before the intervention (parasite A) is the cause of subsequent infection (recrudescence) or whether a distinct parasite type (parasite C) is the cause of that subsequent infection (reinfection). c | The molecular barcode can be applied to assess changing transmission levels during therapeutic interventions and is particularly useful when the mean complexity of infection (COI) among infected individuals approaches a value of 1. In populations with high transmission and high COI levels, both multiple parasite genomes in each sample and highly variable combinations of barcodes among individuals in the population would be expected. Under moderate transmission and low COI levels, although single barcodes would be found in infected individuals (COI = 1), these are likely to differ among individuals in the population. Finally, as the disease nears eradication, not only will each infected individual harbour parasites with a single barcode, but these barcodes will also be shared among the few infected individuals, representing clonal parasite populations.
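
The comparison in panel b can be sketched as a position-by-position barcode match. This is a hedged illustration: it assumes single-genotype infections, uses "X" as a hypothetical code for a missing SNP call, and allows a configurable number of mismatches to tolerate genotyping error. The function name and barcodes are hypothetical.

```python
def classify_recurrence(before, after, max_mismatches=0):
    """Classify a recurrent infection from two barcodes.

    Positions with a missing call ("X") in either barcode are skipped.
    Returns "recrudescence" if the barcodes match within the tolerance,
    "reinfection" if they differ, and "undetermined" if nothing is comparable.
    """
    if len(before) != len(after):
        raise ValueError("barcodes must have the same length")
    compared = [(a, b) for a, b in zip(before, after) if a != "X" and b != "X"]
    if not compared:
        return "undetermined"
    mismatches = sum(1 for a, b in compared if a != b)
    return "recrudescence" if mismatches <= max_mismatches else "reinfection"


# Hypothetical barcodes for parasite types A and C.
parasite_a = "ACGT" * 6
parasite_c = "ACGT" * 3 + "TGCA" * 3

same_type = classify_recurrence(parasite_a, parasite_a)
new_type = classify_recurrence(parasite_a, parasite_c)
```

One caveat follows from panel c: where clonal parasites are shared among individuals, a new infection can carry the same barcode as the original one, so a match does not by itself prove recrudescence.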

Application of genomic tools to evaluate vaccines. Genomic tools for estimating COI and for identifying particular parasite infections by molecular barcode can now be applied to field-based investigations, including vaccine studies. Monovalent vaccines that are designed to enhance immunity to highly polymorphic targets, such as the csp locus, may not completely confer cross-protective immunity to the range of segregating csp variants in the parasite population. Deployment of genomic tools in clinical trials will enable investigators to detect this phenomenon and anticipate the evolution of vaccine resistance in the parasite population85. Genomic approaches that are applied to the ongoing RTS,S (also known as Mosquirix) vaccine Phase III trial88, in which approximately half of all vaccinated individuals are protected from disease, might help us to understand why only that subset was protected. For example, the vaccine may work to reduce the burden of disease by decreasing the number of distinct parasite types in the host. Alternatively, the vaccine might be successfully eliminating all variants that are represented in the vaccine, and individuals who are not protected may harbour parasites with novel variants that are not represented in the original vaccine. Although molecular analysis of RTS,S Phase I and Phase IIb samples did not suggest selection of non-vaccine genotypes89,90,91,92, these studies only evaluated individual polymorphisms rather than haplotypic differences. The deep coverage that is afforded by next-generation sequencing, when applied with appropriate analysis methods, allows us to identify the specific variants that are present in individuals within a large patient population with confidence, enabling us to distinguish between these two cases. This kind of approach will enable researchers not only to monitor the success of interventions but also to evaluate the causes of success and failure.

Population history suggests that natural immunity induces balancing selection but that monovalent vaccines may induce directional selection. Similar to the situation for drug resistance, where parasites escape drug pressure by becoming resistant through changes at a defined locus, it might be anticipated that parasites escaping monovalent vaccine pressure have forms of the target allele that do not elicit specific immune responses through vaccination, resulting in parasite populations that are refractory to the given vaccine. Thus, vaccine pressure may enrich for parasite forms containing intractable versions of the vaccine target, as they are selected to high prevalence in the population. It might be imagined that careful monitoring using genetic approaches — as is currently done for influenza vaccine strategies — could identify parasites that escape this immune response and inform the development of a modified vaccine to include these specific parasite subtypes.

Conclusions and future directions

Genomic analysis is providing rich and unique insight into the biology and life history of P. falciparum. There are several natural selective forces that leave their imprint on the parasite genome: the human immune response is implicated in the diversifying selection that is evident in over 500 genes encoding membrane and membrane-associated proteins, and there is strong support for directional selection by anti-malarial drugs. Perhaps what is most surprising is the evidence for directional selection of many genes encoding components of metabolic, cell-signalling, protein-processing and protein-degradation pathways. The underlying selective mechanisms are not known, but this work provides the basis for developing hypotheses that can be tested with functional analysis. For example, the positive selection that has recently been seen for genes in the ubiquitylation pathway leads us to propose that regulation of protein degradation might be an important fitness component of parasite survival. The goal of contemporary disease elimination efforts should be to prevent selective sweeps from occurring in response to novel compounds, even if resistance mutations arise, through careful genomic surveillance and flexible, combinatorial application of drug treatments.

In addition to uncovering new biology, we hypothesize that genetic measurements of P. falciparum population structure in infected human populations can also be used to infer transmission levels. This idea will need to be tested using field-based samples, but it opens the possibility of directly measuring changes in transmission, particularly during elimination and eradication efforts, from patient-derived samples, thereby providing outcome-based data in real time. In the decade since the publication of the P. falciparum genome sequence3, malaria genomics and genome biology have advanced our understanding of malaria biology. However, we are just beginning to leverage population genetic information from this organism to discover novel variants that are responsible for crucial clinical phenotypes and to develop tools for assessment of interventions as we work towards elimination of this important human pathogen.