Significance
We explored the limit of noninvasive prenatal testing by performing genome-wide sequencing of maternal plasma DNA at 195× and 270× haploid genome coverages. Combined with the use of a series of bioinformatics filters, fetal de novo mutations could be detected with a positive predictive value that was two orders of magnitude higher than previously reported. A de novo BRAF mutation was noninvasively detected in a case with cardiofaciocutaneous syndrome. The maternal inheritance of the fetus could be ascertained on a genome-wide level without the use of maternal haplotypes, hence greatly increasing the resolution of such analysis. Finally, we showed that certain genomic locations were overrepresented at the ends of plasma DNA fragments with fetal or maternal selectivity.
Keywords: noninvasive prenatal testing, massively parallel sequencing, DNA fragmentation patterns
Abstract
Plasma DNA obtained from a pregnant woman was sequenced to a depth of 270× haploid genome coverage. Comparing the maternal plasma DNA sequencing data with the parental genomic DNA data and using a series of bioinformatics filters, fetal de novo mutations were detected at a sensitivity of 85% and a positive predictive value of 74%. These results represent a 169-fold improvement in the positive predictive value over previous attempts. Improvements in the interpretation of the sequence information of every base position in the genome allowed us to interrogate the maternal inheritance of the fetus for 618,271 of 656,676 (94.2%) heterozygous SNPs within the maternal genome. The fetal genotype at each of these sites was deduced individually, unlike previously, where the inheritance was determined for a collection of sites within a haplotype. These results represent a 90-fold enhancement in the resolution in determining the fetus’s maternal inheritance. Selected genomic locations were more likely to be found at the ends of plasma DNA molecules. We found that a subset of such preferred ends exhibited selectivity for fetal- or maternal-derived DNA in maternal plasma. The ratio of the number of maternal plasma DNA molecules with fetal preferred ends to those with maternal preferred ends showed a correlation with the fetal DNA fraction. Finally, this second generation approach for noninvasive fetal whole-genome analysis was validated in a pregnancy diagnosed with cardiofaciocutaneous syndrome with maternal plasma DNA sequenced to 195× coverage. The causative de novo BRAF mutation was successfully detected through the maternal plasma DNA analysis.
The discovery of cell-free fetal DNA in maternal plasma has enabled the development of noninvasive prenatal testing (NIPT) (1). Over the last few years, NIPT has been implemented globally for the noninvasive prenatal investigation of fetal chromosomal aneuploidies (2–5). With higher depth of sequencing and improved bioinformatics analyses, NIPT has now been extended to the detection of a variety of subchromosomal aberrations (6, 7). Further expanding the applications of NIPT, we showed in 2010 that it was possible to deduce the fetal genome by deep sequencing of maternal plasma (8). Work by Fan et al. (9) and Kitzman et al. (10) confirmed these results. In these previous efforts, the depths of maternal plasma DNA sequencing ranged from 52.7× to 78× haploid human genome coverages (8–10).
These previous studies had a number of limitations. For example, Kitzman et al. (10) explored the possibility of detecting fetal de novo mutations on a genome-wide level from the maternal plasma DNA sequencing data. In one variation of bioinformatics analysis, they found 2.5 × 10⁷ candidate fetal de novo mutation sites in the plasma DNA sequencing data. Only 39 of these were true fetal de novo mutations. Because the studied fetus had a total of 44 de novo mutations, the positive predictive value (PPV) was 0.000156%, and the sensitivity was 88.6%. With additional refinement in bioinformatics analysis, Kitzman et al. (10) improved the PPV to 0.438%, although the sensitivity was reduced to 38.6%. These data, thus, indicate the enormous challenge of detecting fetal de novo mutations on a genome-wide scale using NIPT. In particular, dramatic improvement in the PPV would be needed for such an approach to be clinically practical.
A second area that needs improvement concerns the detection of sequences that the fetus has inherited from its mother. Previous efforts in elucidating the maternal inheritance of the fetus on a genome-wide scale have generally used a haplotype-based strategy, which has been referred to as the relative haplotype dosage (RHDO) approach (8–10). Hence, for a pregnant woman who has two haplotypes in a particular chromosomal region, she would pass one of these onto her fetus. Because her plasma contains a mixture of her own DNA and that from the fetus, there would be a slight overrepresentation of the haplotype shared by both the pregnant mother and her fetus. This haplotype-based approach has imposed two limitations to NIPT. First, it requires the elucidation of the maternal haplotype using a direct haplotyping approach (11–13), via pedigree analysis (8, 14), or via founder haplotype analysis in selected populations (15). Second, this haplotype-based approach has limited the resolution in which the maternal inheritance of the fetus can be determined. In this regard, the mean length of the haplotype blocks that had been used in previous efforts ranged from 300 kb to over 1 Mb (8–10).
Recently, there has been considerable interest in the fragmentation patterns of plasma DNA (16–18). Studies showed that plasma DNA fragmentation sites are located in clusters across the genome that bear relationships with positions of nucleosome arrays and open chromatin domains. Based on these data, researchers have concluded that the fragmentation process of plasma DNA is nonrandom. One interpretation of such observations is that plasma DNA fragments are cleaved at the accessible parts of the genome. We have gone further in this study to explore whether the actual ending sites of plasma DNA were nonrandom down to a single-base level. In other words, are plasma DNA molecules repeatedly cut at a selected set of genome coordinates? Previous studies could not address this question, because the sequencing depths of individual samples were not high enough, and PCR duplicates were discarded before additional data analysis. We hypothesized that some of these “duplicates” were not created by PCR. Instead, they were plasma DNA molecules that had originated from different cells and were cleaved at the same genome coordinates. To investigate this hypothesis, we processed some of the studied plasma samples using a PCR-free library preparation protocol.
In this study, we aimed to develop a second generation approach to noninvasively decipher the fetal genome from maternal plasma using a sequencing depth that had not been achieved in previous studies (Fig. 1). We hypothesized that the use of ultradeep genome-wide plasma DNA sequencing and multiparametric bioinformatics analysis that takes into account the sequence, concentration, and size of the circulating DNA fragments would allow one to significantly improve the detection of fetal de novo mutations, interrogate fetal inheritance at single-base resolution, and study the end characteristics of plasma DNA in a molecule-by-molecule manner. We further illustrated how such a detailed level of fetal genomic information would be of clinical utility.
Results
Clinical Samples and DNA Sequencing.
Pregnant women were recruited from the Department of Obstetrics and Gynecology, Prince of Wales Hospital with informed consent. Hypotheses were first explored and methodologies were developed based on the analysis of a plasma sample from a pregnant woman at 38 wk of gestation. These “second generation” fetal genome methodologies were then applied to the analysis of plasma collected from a second trimester pregnancy affected by cardiofaciocutaneous syndrome. Plasma samples from 26 first trimester pregnant women were analyzed to further investigate the phenomenon of preferred ending sites among plasma DNA molecules.
Key Maternal Plasma DNA Sequencing Metrics.
Maternal plasma DNA of the third trimester case was sequenced using the Illumina TruSeq PCR-free library preparation protocol to a depth of 270× coverage of a haploid human genome. The maternal blood cells, paternal blood cells, and umbilical cord blood cells were sequenced to 40×, 45×, and 50× haploid human genome coverages, respectively, using the same sequencing protocol. We analyzed SNPs where both parents were homozygous but for different alleles and identified 239,816 such SNPs. For these SNPs, the fetus would be an obligatory heterozygote for the paternal and maternal alleles. The paternal alleles were detectable in the maternal plasma for all 239,816 SNPs. The fetal DNA fraction (F) in maternal plasma was deduced as 31.3% from the aggregated numbers of paternally inherited fetal alleles (p) and the maternal alleles (m) at these SNPs using the formula F = 2p/(p + m).
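The fetal DNA fraction formula can be illustrated with a minimal sketch. The function name and the read counts below are hypothetical and were chosen only to show the arithmetic; they are not the aggregated counts from this study.

```python
def fetal_fraction(paternal_allele_reads: int, maternal_allele_reads: int) -> float:
    """Fetal DNA fraction F = 2p / (p + m) from aggregated allele counts
    at SNPs where both parents are homozygous for different alleles."""
    total = paternal_allele_reads + maternal_allele_reads
    if total == 0:
        raise ValueError("no reads cover the informative SNPs")
    return 2 * paternal_allele_reads / total


# Hypothetical aggregated counts (illustration only).
example_f = fetal_fraction(paternal_allele_reads=313, maternal_allele_reads=1687)
print(f"F = {example_f:.1%}")
```

The factor of 2 reflects that only one of the two fetal alleles at these loci is paternal, so the paternal allele accounts for half of the fetal contribution.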
We next measured the difference in the sizes of fetally and maternally derived DNA in the maternal plasma at individual SNPs. We compared the difference in size distribution of fragments carrying the fetal-specific allele and the allele shared by the fetus and the mother at a particular SNP (Fig. S1). Such size information would be useful for determining the likelihood of any variant allele detected in maternal plasma being a fetal de novo mutation. The median difference between fragments carrying the fetal-specific alleles and the shared alleles was 22 bp (Fig. S2A). At 90% of these SNP loci, the difference was larger than 10 bp (Fig. S2A).
Detection of Fetal de Novo Mutations from Maternal Plasma.
Fifty-six single-nucleotide variants were found to be present in the umbilical cord blood cells and absent in the parents. These 56 single-nucleotide variants were thus deemed de novo mutations possessed by the fetus. Deep genome-wide sequencing is needed for the screening of fetal de novo mutations from maternal plasma. At this level of sequencing, a vast number of sequencing errors would be generated and hinder the sensitive and specific detection of true fetal de novo mutations. To tackle the challenge, we designed a bioinformatics protocol that used a series of steps to filter out the false-positive mutations (Fig. 2). The bioinformatics protocol made use of steps that identified high-quality base sequences in combination with steps that took into account the known biological characteristics of cell-free fetal DNA.
First, we identified genomic locations where the maternal plasma DNA sequencing showed at least one sequence variant, whereas the parental DNA analysis showed that both parents shared the same sequence. Second, we applied a dynamic cutoff algorithm to determine if the variant detected in the maternal plasma was more likely to have originated from the fetus or sequencing errors. In this algorithm, a variant would need to be observed in a threshold number of sequenced reads so as to qualify as a candidate de novo mutation (Table S1). This threshold was determined from a binomial probability function based on the total number of times that a particular nucleotide was actually sequenced and the sequencing error rate, so that the theoretical probability of a putative de novo mutation being a sequencing error would be less than 1 in 3 × 10⁹. Using this dynamic cutoff algorithm, 60,111 candidate de novo mutations were identified, and these candidate de novo mutations included 54 of 56 (i.e., 96%) de novo mutations present in the cord blood (Fig. 2).
Table S1.
No. of sequenced reads covering the nucleotide position | No. of sequenced reads covering the variant required to qualify the variant as a candidate de novo mutation in the fetus |
50–56 | ≥6 |
57–110 | ≥7 |
111–188 | ≥8 |
189–288 | ≥9 |
289–410 | ≥10 |
411–552 | ≥11 |
553–713 | ≥12 |
714–892 | ≥13 |
893–1,000 | ≥14 |
The number of sequenced reads carrying a variant required to qualify a variant as a candidate de novo mutation was dependent on the total number of sequenced reads covering the variant’s nucleotide position.
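The dynamic cutoff can be sketched as a search for the smallest variant read count whose binomial tail probability falls below 1 in 3 × 10⁹. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, and the per-base error rate of 0.001 is an assumed placeholder because the rate used is not stated in this section. The resulting thresholds therefore need not match Table S1.

```python
import math


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return total


def min_variant_reads(depth: int, error_rate: float, max_error_prob: float = 1 / 3e9):
    """Smallest number of variant reads whose probability of arising purely
    from sequencing errors is below max_error_prob (None if no count qualifies)."""
    for k in range(1, depth + 1):
        if binomial_upper_tail(depth, k, error_rate) < max_error_prob:
            return k
    return None


ASSUMED_ERROR_RATE = 1e-3  # assumption for illustration, not taken from the study
for depth in (50, 100, 270, 500, 1000):
    print(depth, min_variant_reads(depth, ASSUMED_ERROR_RATE))
```

As in Table S1, the required number of variant reads rises with local sequencing depth, because more reads give sequencing errors more opportunities to recur at the same position.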
Then, we realigned the sequenced reads covering the 60,111 candidate de novo mutation sites to the reference human genome using the BOWTIE2 software (19). The initial alignment software, Short Oligonucleotide Alignment Program 2 (SOAP2) (20), and the realignment software, BOWTIE2, were based on two different matching algorithms, namely the Burrows–Wheeler transform algorithm and Smith–Waterman-like algorithm, respectively (19, 20). A variant was removed from consideration as a candidate de novo mutation if the two alignment algorithms did not map the variant to the same genomic location. This realignment procedure significantly reduced the number of false-positive results caused by alignment errors. After this realignment process, the number of candidate de novo mutations was reduced to 148, which included 52 (93%) of the de novo mutations detected in the cord blood (Fig. 2). This result is equivalent to a PPV of 35%.
To further reduce the number of false positives, we applied a filter based on the fractional concentration of the variants. Because the fetal DNA fraction in the sample was 31.3%, 95% of the variants would be expected to be present at a level of ≥10% of the sequenced reads covering a particular nucleotide based on the Poisson distribution. Hence, we filtered out the candidate mutations that were present at a fraction of <10%. Eighty-five candidate de novo mutations were present at ≥10% of the sequenced reads covering the nucleotide and included 51 (91%) of the mutations detected in the cord blood (Fig. 2). This result represents a PPV of 60%.
Next, we included a filter based on the size of the plasma DNA molecules carrying the candidate de novo mutations. As illustrated in Fig. S2A, the difference between the mean fragment size for fragments carrying the fetal-specific alleles and the shared alleles was larger than 10 bp for 90% of loci at which the fetus was heterozygous and the mother was homozygous. Thus, we applied a size filter to the candidate de novo mutations, requiring the mean size of the plasma DNA fragments carrying the mutant alleles to be at least 10 bp shorter than that of the fragments carrying the parental alleles. Sixty-five candidates passed this size-filtering criterion, including 48 (85%) of the mutations detected in the cord blood (Fig. 2). This result represents a PPV of 74%.
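The effect of each filter on the PPV can be checked with a short worked calculation. The counts are those reported above for the third trimester case; the helper names are illustrative only.

```python
TRUE_DE_NOVO = 56  # de novo mutations identified in the cord blood

# (filter step, candidates remaining, true de novo mutations retained)
FILTER_STEPS = [
    ("dynamic cutoff", 60111, 54),
    ("realignment", 148, 52),
    ("fractional concentration >= 10%", 85, 51),
    ("size difference >= 10 bp", 65, 48),
]


def ppv(true_positives: int, candidates: int) -> float:
    """Fraction of candidate mutations that are true de novo mutations."""
    return true_positives / candidates


def sensitivity(true_positives: int, total_true: int = TRUE_DE_NOVO) -> float:
    """Fraction of the true de novo mutations still retained."""
    return true_positives / total_true


for name, candidates, retained in FILTER_STEPS:
    print(f"{name}: PPV = {ppv(retained, candidates):.2%}, "
          f"sensitivity = {sensitivity(retained):.1%}")

# Fold improvement over the previously reported PPV of 0.438% (10).
fold_improvement = ppv(48, 65) / 0.00438
print(f"fold improvement in PPV: {fold_improvement:.0f}")
```

Each step removes far more false positives than true mutations, which is why the PPV rises steeply while the sensitivity declines only modestly.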
Paternal Inheritance Analysis.
For paternal inheritance analysis, we focused on SNPs that were heterozygous in the father and homozygous in the mother. In principle, the presence or absence of the paternal-specific allele in the maternal plasma would indicate the inheritance of paternal-specific allele or the shared parental allele by the fetus, respectively. To enhance the accuracy of deducing the paternal inheritance, we used a binomial mixture model, which took into account the fetal DNA concentration, the sequencing error rate, and the number of times that the paternal-specific allele was observed in the maternal plasma (details are in Materials and Methods) (21). For 667,586 SNPs for which the mother was homozygous and the father was heterozygous, the overall accuracy was 99.99% for the deduction of the paternal inheritance.
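The idea behind such a model can be sketched as a comparison of two binomial hypotheses for each SNP. This is a simplified illustration rather than the model described in Materials and Methods: the function name, the equal prior, and the error rate of 0.001 used in the example are assumptions.

```python
import math


def prob_paternal_allele_inherited(variant_reads: int, depth: int,
                                   fetal_fraction: float, error_rate: float,
                                   prior: float = 0.5) -> float:
    """Posterior probability that the fetus inherited the paternal-specific allele.

    Two-component binomial mixture (simplifying assumption):
      inherited     -> each read carries the allele with probability F / 2
      not inherited -> the allele appears only through sequencing error
    """
    def log_likelihood(q: float) -> float:
        # The binomial coefficient is identical under both hypotheses and cancels.
        return variant_reads * math.log(q) + (depth - variant_reads) * math.log1p(-q)

    log_odds = (log_likelihood(fetal_fraction / 2) - log_likelihood(error_rate)
                + math.log(prior / (1 - prior)))
    if log_odds > 700:   # avoid overflow in math.exp
        return 1.0
    if log_odds < -700:
        return 0.0
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical read counts at 270x depth with F = 31.3% (illustration only).
print(prob_paternal_allele_inherited(40, 270, 0.313, 1e-3))
print(prob_paternal_allele_inherited(0, 270, 0.313, 1e-3))
```

At high depth the two hypotheses separate sharply, so a single stray read from a sequencing error is unlikely to be mistaken for an inherited allele.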
Genome-Wide Relative Allele Dosage Analysis.
Previous efforts in prenatally elucidating the maternal inheritance of the fetus on a genome-wide level had all been based on the RHDO approach (8–10). Enabled by the high-quality genome-wide base information, we showed that the maternal inheritance at SNPs that were heterozygous in the mother could be resolved without the use of the haplotype information of the mother. The dosages of the two maternal alleles in maternal plasma were compared to determine, statistically, whether the two alleles were present at the same or different concentrations. The fetus was deduced to be homozygous at SNP loci where one allele was present at a significantly higher concentration and heterozygous at loci where the two alleles were present at statistically nondifferentiable concentrations. We denote this method as genome-wide relative allelic dosage (GRAD) analysis. This method is a genome-wide and sequencing-based version of our previously described relative mutation dosage method (22). There were a total of 656,676 heterozygous SNPs in the mother’s genome. Using GRAD analysis, the maternal inheritance of the fetus was resolved with statistically significant results in 618,271 SNPs (i.e., 94.2%). The allelic counts were insufficient to give statistically significant results in 38,405 SNPs (i.e., 5.8%). The fetus was deduced to be homozygous at 335,988 (54%) SNPs and heterozygous at 282,283 (46%) SNPs. Among 618,271 SNPs with statistically significant results, the deduction of maternal inheritance was correct for 610,084 (98.7%) SNPs compared with the genotypes from the analysis of the cord blood cells.
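One way to picture a per-SNP allelic dosage test is as a likelihood ratio between two expected allele fractions. The sketch below is an assumption-laden illustration and not the authors' statistical procedure: the function name and the likelihood ratio threshold of 1,000 are hypothetical. It relies only on the fact that, at a maternal heterozygous SNP, a homozygous fetus raises the fraction of the shared allele to (1 + F)/2, whereas a heterozygous fetus leaves both alleles at 0.5.

```python
import math


def classify_maternal_inheritance(count_a: int, count_b: int, fetal_fraction: float,
                                  lr_threshold: float = 1000.0) -> str:
    """Classify the fetal genotype at a SNP where the mother is heterozygous.

    Returns "homozygous", "heterozygous", or "unclassified" (insufficient evidence).
    lr_threshold is a hypothetical stringency setting.
    """
    depth = count_a + count_b
    major = max(count_a, count_b)  # candidate overrepresented allele
    q_hom = (1 + fetal_fraction) / 2  # expected fraction if fetus is homozygous
    q_het = 0.5                       # expected fraction if fetus is heterozygous
    log_lr = (major * math.log(q_hom / q_het)
              + (depth - major) * math.log((1 - q_hom) / (1 - q_het)))
    bound = math.log(lr_threshold)
    if log_lr >= bound:
        return "homozygous"
    if log_lr <= -bound:
        return "heterozygous"
    return "unclassified"


# Hypothetical allele counts with F = 31.3% (illustration only).
for counts in [(177, 93), (135, 135), (156, 114), (20, 10)]:
    print(counts, classify_maternal_inheritance(*counts, fetal_fraction=0.313))
```

Loci with low coverage or intermediate allele ratios remain unclassified, mirroring the subset of SNPs for which the allelic counts were insufficient to give statistically significant results.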
The resolution of GRAD analysis would be affected by the sequencing depth and the fetal DNA fraction in the maternal plasma sample. We performed computer simulation analyses to determine the number of SNPs required for achieving one classification of maternal inheritance with an overall accuracy of over 95% (Fig. S3). With a sequencing depth of 250×, we would be able to achieve close to single-nucleotide resolution for the deduction of maternal inheritance when the fetal DNA fraction was above 20%. When the fetal DNA fraction was 10% and a sequencing depth was at 300×, maternal inheritance could be deduced in 1 of every 5 SNPs with an accuracy of over 95%, but the resolution of maternal inheritance would drop to 1 classification per 30 SNPs with a sequencing depth of 150×.
Preferred End Sequences in Maternal Plasma DNA.
Facilitated by the high sequencing depth of a non-PCR–amplified library of the maternal plasma DNA sample, we investigated if certain base positions in the genome would be preferentially represented at the ends of plasma DNA fragments. In this regard, we showed that plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 3.5% of all DNA fragments. Across the genome, 14% of the fragments had at least one identical fragment that shared the same ending sites at both ends. If the cleavage or breakage of plasma DNA was completely random, only 1.45% of the DNA molecules would be expected to have at least one counterpart sharing common ends (details are in Materials and Methods). Had the sequencing been performed using a PCR-amplified library, then 14% of the fragments would have been wrongly assumed to be PCR duplicates and would have been filtered off.
We further investigated if there would be differences in the preferred ending sites for maternal and fetal DNA in the same cell-free DNA sample. To show this phenomenon, informative SNP loci where the mother was homozygous (genotype denoted as AA) and the fetus was heterozygous (genotype denoted as AB) were identified. In this illustrative example, the B allele would be fetal-specific, and the A allele would be shared by the mother and the fetus. The fetal-specific and shared reads covering one such informative SNP are shown in Fig. 3. For comparison, the sequencing results of a DNA sample obtained from blood cells and artificially fragmented using sonication are also shown. Clusters of preferred ending positions could be observed among the plasma DNA data. For the plot of the probability of a genomic location being an end of DNA fragments (also referred to as the fragment end probability), three peaks were observed for each of two groups of fragments carrying the fetal-specific allele and the allele shared by the mother. These peaks represent the hotspots for the end positions of fetal- and maternal-derived DNA in maternal plasma, respectively. In contrast, the fragmentation pattern of the sonicated DNA seems to be random without such clusters, and the fragment end probability is similar across the region (Fig. 3).
We further studied the coordinates that had an increased probability of being an ending position for plasma DNA fragments. We focused our search based on fragments covering the informative SNPs, so that the fragments carrying fetal-specific alleles and alleles shared by the mother and the fetus could be evaluated separately. We determined if certain locations within the human genome had a significantly increased probability of being an ending position of plasma DNA fragments using a Poisson probability function (Materials and Methods). A P value of <0.01 had been chosen to indicate statistical significance. Statistically significant ending positions were determined for DNA fragments carrying the shared allele and the fetal-specific allele independently (Fig. 4A).
We identified a total of 8,242 (Set A) and 23,857 (Set B) nucleotide positions with a significantly increased chance of being an end for plasma DNA fragments carrying fetal-specific alleles and shared alleles, respectively, covering 10,233 SNPs; 8,909 of the nucleotide positions were observed to be overrepresented among the plasma DNA molecules with the fetal-specific allele (Set A) as well as molecules with the shared allele (Set B). We called this overlapping set of ends Set C (Fig. 5A). There were 48,707, 53,901, and 65,205 plasma DNA fragments carrying fetal-specific alleles ending on Set A, Set B, and Set C positions, respectively. In other words, multiple fetal-specific DNA molecules showed identical ending positions. A median of 6 (range = 6–41) plasma DNA fragments carrying fetal-specific alleles terminated at a Set A ending position. Based on a sequencing depth of 270×, fetal DNA fraction of 31.3%, and the size distribution of fetal DNA fragments, it was expected that only 0.29 fragment would end at each site if the fragmentation of plasma DNA had been random. Thus, the probability of ending at these sites was 20 times higher than expected. There were 54,541, 376,343, and 182,791 plasma DNA fragments carrying shared alleles ending at Set A, Set B, and Set C positions, respectively. The genomic coordinates of Set A, Set B, and Set C positions are shown in Dataset S1.
Using the same principle, we analyzed the ending positions for maternal-specific plasma DNA fragments. SNPs that were heterozygous in the mother (genotype AB) and homozygous in the fetus (genotype AA) were noted, and plasma DNA molecules that carried any one of the maternal-specific alleles (B allele) were deemed to be maternally derived. Among the maternal-specific plasma DNA molecules, 7,527 (Set X) and 18,829 (Set Y) nucleotide positions showed increased occurrence as an ending position for plasma DNA fragments carrying maternal-specific alleles and shared alleles, respectively, covering 9,489 SNPs (Fig. 4B); 10,534 positions were observed to be overrepresented among the plasma DNA molecules with the maternal-specific allele (Set X) as well as molecules with the shared allele (Set Y). We called this overlapping set of ends Set Z (Fig. 5B). There were 69,136, 82,413, and 121,607 plasma DNA fragments carrying maternal-specific alleles ending at Set X, Set Y, and Set Z positions, respectively. A median of 10 (range = 9–54) plasma DNA fragments carrying maternal-specific alleles terminated at a Set X base position. If the fragmentation of plasma DNA had been random, it was expected that only 0.56 fragment would terminate at each site. Thus, the probability for ending at these sites was 18 times higher than expected. There were 46,554, 245,037, and 181,709 plasma DNA fragments carrying shared alleles ending on Set X, Set Y, and Set Z positions, respectively. The genomic coordinates of the Set X, Set Y, and Set Z positions of this case are shown in Dataset S1.
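The significance test for preferred ending positions can be sketched with a Poisson upper-tail probability. This is a minimal illustration: the function names are hypothetical, and the expected number of fragment ends per position is taken as an input because its derivation is given in Materials and Methods rather than here.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed over the upper tail."""
    if observed <= 0:
        return 1.0
    total = 0.0
    i = observed
    while True:
        term = math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
        total += term
        # Beyond the mean the terms only shrink, so stop once they are negligible.
        if i > expected and term <= 1e-17 * total:
            break
        i += 1
    return total


def is_preferred_end(observed: int, expected: float, alpha: float = 0.01) -> bool:
    """True if a position is an end significantly more often than expected."""
    return poisson_upper_tail(observed, expected) < alpha


# Six fetal-specific fragments ending at a position where 0.29 are expected.
print(poisson_upper_tail(6, 0.29), is_preferred_end(6, 0.29))
# A single fragment end at the same expectation is not significant.
print(poisson_upper_tail(1, 0.29), is_preferred_end(1, 0.29))
```

With an expectation well below one fragment per site, even a handful of fragments sharing an end is highly improbable under random fragmentation.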
Using Preferred End Positions to Deduce Fetal DNA Fractions in Maternal Plasma in First Trimester Pregnancies.
After having identified sets of maternal plasma DNA preferred end positions from one third trimester case, we explored if such ends would be detectable in samples of other pregnancies and whether the relative abundance of plasma DNA ending at these sets of nucleotide positions would reflect the fetal DNA fraction. We sequenced the plasma DNA of 26 first trimester (10–13 wk) pregnancies, each involving a male fetus. Sequencing libraries were prepared from these maternal plasma samples using the KAPA Library Preparation Kits (Kapa Biosystems), which included a PCR amplification step to enrich the adaptor-ligated molecules. The median number of mapped reads per sample was 16 million (range = 12–22 million). The proportion of sequenced reads aligning to chromosome Y was used to calculate the actual fetal DNA fraction in each plasma sample. A positive correlation could be observed between the relative abundance [denoted as the fetal/maternal (F/M) ratio] of plasma DNA with recurrent fetal (Set A) and maternal (Set X) ends and the fetal DNA fraction (R = 0.66, P < 0.001, Pearson correlation) (Fig. 6). A median of 248 Set A ends was observed among 26 samples. A median of 286 Set X ends was observed among those samples.
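The F/M ratio can be sketched as a simple count of fragments whose ends fall on the two sets of preferred positions. The representation of fragments as (chromosome, start, end) tuples, the rule of counting a fragment when either end matches, and all names and coordinates below are assumptions for illustration.

```python
def fetal_maternal_end_ratio(fragments, fetal_preferred, maternal_preferred) -> float:
    """Ratio of fragments ending on fetal-preferred (Set A) positions to
    fragments ending on maternal-preferred (Set X) positions.

    fragments: iterable of (chromosome, start, end) tuples
    fetal_preferred, maternal_preferred: sets of (chromosome, position) tuples
    """
    def ends_in(fragment, positions) -> bool:
        chrom, start, end = fragment
        return (chrom, start) in positions or (chrom, end) in positions

    fragments = list(fragments)
    fetal_count = sum(ends_in(f, fetal_preferred) for f in fragments)
    maternal_count = sum(ends_in(f, maternal_preferred) for f in fragments)
    if maternal_count == 0:
        raise ValueError("no fragments end on maternal-preferred positions")
    return fetal_count / maternal_count


# Toy coordinates (hypothetical, not taken from Dataset S1).
set_a = {("chr1", 100), ("chr2", 200)}
set_x = {("chr1", 150)}
toy_fragments = [
    ("chr1", 100, 266), ("chr1", 20, 100), ("chr2", 200, 343),
    ("chr1", 150, 316), ("chr1", 5, 150), ("chr3", 10, 176),
]
toy_ratio = fetal_maternal_end_ratio(toy_fragments, set_a, set_x)
print(toy_ratio)
```

Because the ratio depends only on where fragments end, it can in principle be computed from shallow sequencing data once the preferred positions are known.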
Size Distribution of Plasma DNA Fragments Terminating on the Fetal-Specific End Positions.
Because fetal DNA in maternal plasma is shorter than the maternal-derived DNA in maternal plasma, we compared the size distributions of plasma DNA ending on the Set A and Set X positions. For the deeply sequenced third trimester case, the size distribution for fragments ending at Set A positions was shorter than those ending at Set X positions (Fig. 7 A and B). Then, we analyzed the pooled sequenced reads from 26 first trimester plasma samples used for fetal DNA fraction analysis. A shorter size distribution was observed for fragments ending at fetal-preferred Set A positions compared with those ending at maternal-preferred Set X positions (Fig. 8 A and B). These results were consistent with the fact that fetal DNA was shorter than maternal-derived DNA and further support the hypothesis that the end positions derived from one pregnant woman can potentially be generalized to other pregnant cases.
To investigate the specificity of these positions, we analyzed the size distribution of DNA fragments ending on the Set A or Set X positions when the genomic coordinates of these positions were shifted by one to five nucleotides. To quantify the size difference, cumulative frequencies for fragments ending at the two sets of positions were plotted against size. The difference in the two cumulative frequency curves (denoted as ΔS) would reflect the magnitude of the size difference (Figs. 7B and 8B). For the third trimester sample, the difference in size distribution diminished as the number of shifted nucleotides increased and was no longer observable when the coordinates were shifted by four nucleotides (Fig. 7 C and D). For the pooled sequenced reads from 26 first trimester cases, a reduction in the difference in size distribution with nucleotide shift was similarly observed (Fig. 8 C and D).
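The ΔS calculation can be sketched as the pointwise difference between two cumulative size frequency curves. The function names, the sign convention (Set A minus Set X, so that positive values indicate shorter Set A fragments), and the toy fragment sizes are assumptions for illustration.

```python
from itertools import accumulate


def cumulative_frequency(sizes, max_size: int):
    """Cumulative fraction of fragments with size <= s, for s = 0..max_size."""
    counts = [0] * (max_size + 1)
    for size in sizes:
        counts[min(size, max_size)] += 1
    return [running / len(sizes) for running in accumulate(counts)]


def delta_s(sizes_set_a, sizes_set_x):
    """Difference between the cumulative size curves of fragments ending on
    Set A and Set X positions, indexed by fragment size in bp."""
    max_size = max(max(sizes_set_a), max(sizes_set_x))
    cum_a = cumulative_frequency(sizes_set_a, max_size)
    cum_x = cumulative_frequency(sizes_set_x, max_size)
    return [a - x for a, x in zip(cum_a, cum_x)]


# Toy fragment sizes in bp (hypothetical).
toy_delta = delta_s([140, 150, 160], [160, 166, 170])
print(max(toy_delta), toy_delta.index(max(toy_delta)))
```

Repeating the calculation after shifting the Set A and Set X coordinates by a few nucleotides would show whether the size difference is specific to the exact preferred positions.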
Fetal Genome Analysis of a Second Trimester Pregnancy Affected by Cardiofaciocutaneous Syndrome.
To validate the above-mentioned methodologies, we sequenced a maternal plasma sample collected at 18 wk of gestation to 195× coverage. This case presented with increased nuchal translucency in the first trimester. A chorionic villus sample was obtained at 11 wk of gestation, which showed a normal karyotype. At 14 wk of gestation, a cystic hygroma and mild club foot deformity were detected on ultrasound scan, and hence, microarray analysis was performed on the chorionic villus sample to detect gene mutations that were associated with Noonan syndrome and related conditions. The microarray analysis detected a mutation (c.770A>G) in the BRAF (B-Raf proto-oncogene, serine/threonine kinase) gene that resulted in cardiofaciocutaneous syndrome. Because the mutation was absent in the mother’s and father’s genomes, it was deemed a de novo mutation. The fetus was aborted, and placental tissues were collected after the procedure.
DNA from the maternal buffy coat, paternal buffy coat, and placental tissues collected after termination was sequenced to 40×, 60×, and 60× human genome coverages, respectively. The fetal DNA fraction in the maternal plasma was 24%. A difference of at least 8 bp was observed between the size of fragments carrying fetal-specific alleles and alleles shared by the fetus and the mother at over 90% of informative SNPs (Fig. S2B). Because the size distribution of plasma DNA fragments carrying the alleles shared by the fetus and the mother is dependent on the fractional fetal DNA concentration in plasma, the size difference required for filtering de novo mutation would need to be determined on a case by case basis. In this case, at least 8-bp size difference between the variant and the parental alleles was used as the criterion for qualifying the variant as a candidate de novo mutation; 75 candidate de novo mutations were identified in the plasma, of which 47 were confirmed to be present in the placenta. Because a total of 58 de novo mutations were detected in the placenta, the plasma analysis had a detection rate of 81% and a PPV of 62% (Fig. S4). The de novo BRAF mutation was among 47 variants detected by the plasma DNA analysis. Using GRAD analysis, the maternal inheritance of the fetus was deduced in 528,008 (68% of 775,456 SNPs where the mother was heterozygous) SNPs with an accuracy of 96.8%. Regarding preferred end sites, plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 2.4% of all DNA fragments. Strikingly, 59% of the most prevalent 0.5% plasma DNA end sites overlapped with those of the third trimester case. Based on the analysis of informative SNPs, 10,401 fetal-preferred (Set A) (Fig. S5A) and 6,562 maternal-preferred (Set X) (Fig. S5B) end sites were identified. The preferred end sites for fragments carrying alleles shared between the mother and the fetus were also identified (Set B, Set C, Set Y, and Set Z) (Fig. S5). 
The genomic coordinates of each set of preferred positions of this case are shown in Dataset S2. In 26 first trimester maternal plasma samples, a positive correlation could be observed between the relative abundance (i.e., the F/M ratio) of the plasma DNA molecules with fetal-preferred (Set A) and maternal-preferred (Set X) ends and the fetal DNA fraction (R = 0.66, P < 0.001, Pearson correlation) (Fig. S6).
Discussion
In this work, we performed ultradeep genome-wide sequencing of the plasma DNA obtained from two pregnant women to 195× and 270× haploid genome coverages. We believe that these are the deepest genome-wide sequencings yet reported for plasma DNA obtained from any one pregnancy. Previous efforts in the genome-wide sequencing of plasma DNA obtained from pregnant women had attempted sequencing depths from 52.7× to 78× haploid genome coverages (8–10). The depth of sequencing achieved in our analyzed case has allowed us to push forward the limit and expand the applications of NIPT.
First, we investigated the possibility of searching for fetal de novo mutations on a genome-wide scale using the maternal plasma DNA sequencing data. It has been reported that there are ∼74 de novo point mutations per person and that the mutation rate increases with paternal age (23). It is now increasingly recognized that de novo mutations play an important role in human genetic diseases and that diseases associated with de novo mutations are not rare collectively (23). Thus, it is clinically relevant to be able to screen for de novo mutations prenatally. Efforts in the noninvasive prenatal detection of fetal de novo mutations are limited to diseases associated with structural abnormalities detectable on ultrasound and where those diseases are known to be associated with hotspot de novo mutations (24). For example, >90% of achondroplasia cases are caused by a de novo mutation in the FGFR3 (fibroblast growth factor receptor 3) gene with no prior familial history of skeletal dysplasia (25). A clinical suspicion for achondroplasia could, therefore, be followed by maternal plasma DNA analysis with the aim to specifically detect fetal FGFR3 mutations (26).
Genome-wide scanning for fetal de novo mutations without a priori information on potentially affected genes or diseases is a huge technological challenge. Previous efforts in this area had reported a PPV of 0.438% at a sensitivity of 38.6% (10). Hence, using the depth of sequencing as a foundation, we used a series of bioinformatics filters taking into account the sequencing error rate, minimizing the realignment errors, and considering the plasma fraction of a putative mutation as well as the size difference between fetal and maternal DNA in maternal plasma. Using such filters, we were able to detect fetal de novo mutations with a PPV of 74% at a sensitivity of 85% for the third trimester case. For the second trimester case, we achieved a PPV of 62% and a sensitivity of 81%. The PPVs, in particular, represent a significant step forward over previous efforts and are at a level that suggests that, with additional improvements, such an approach might eventually have clinical impact. It is noteworthy that the approach successfully detected the de novo BRAF mutation that caused the cardiofaciocutaneous syndrome in the second trimester case. For the analysis of cases with earlier gestational ages and lower fetal DNA fractions, higher sequencing depths would be required to achieve similar levels of performance. We should perhaps add a cautionary note that, although we have shown the technological feasibility of detecting fetal de novo mutations from maternal plasma, the interpretation of the clinical implications of each of these mutations is a great challenge that would require a well-supported clinical genetics infrastructure and continual improvements in our understanding of the human genome.
Second, the depth of maternal plasma DNA sequencing and refined bioinformatics strategies used in this study allow one to determine the maternal inheritance of the fetus on a genome-wide level without resorting to maternal haplotype dosage analysis (8). Using GRAD analysis, the maternal inheritance of the fetus could be resolved in 94.2% of 656,676 heterozygous SNPs in the maternal genome of the third trimester pregnancy and 92.4% of 775,456 heterozygous SNPs in the genome of the second trimester pregnant woman. In our previous study on noninvasive fetal genome analysis, the maternal inheritance was deduced using RHDO analysis (8). The mean size of the haplotype blocks was 409 kb. For other studies involving noninvasive fetal genome analysis, the mean size of haplotype blocks with maternal inheritance determined ranged from 300 kb to over 1 Mb (9, 10, 27). For our previous study (8), the maternal inheritance of 7,332 haplotype blocks was successfully determined with an accuracy of 99.1%. The present work, thus, represents an ∼90-fold (i.e., 656,676/7,332 and 716,646/7,332 for the third and second trimester cases, respectively) increase in resolution in elucidating the maternal inheritance of the fetus.
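The fold increase quoted in this paragraph is simple arithmetic on the counts given in the text. A minimal Python sketch reproduces it; the function name is hypothetical and used only for illustration.

```python
def fold_increase(n_snp_sites, n_haplotype_blocks=7332):
    """Ratio of individually assessed SNP sites to the haplotype blocks resolved in ref. 8."""
    return n_snp_sites / n_haplotype_blocks

third_trimester = fold_increase(656_676)    # count quoted for the third trimester case
second_trimester = fold_increase(716_646)   # count quoted for the second trimester case
print(f"{third_trimester:.1f}-fold and {second_trimester:.1f}-fold")
```

Both ratios round to roughly 90 to 100, which is the basis of the "∼90-fold" summary figure.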
Third, the deep sequencing of maternal plasma DNA samples prepared using non–PCR-amplified libraries has allowed us to investigate if there might be genomic locations that would be preferentially represented at the ends of plasma DNA fragments and whether such ends would exhibit differences depending on their tissue of origin (i.e., from the placenta of the fetus or the mother). Our data indicate that such recurrent ends are indeed present in plasma DNA. For example, our data from the third trimester case revealed that plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 3.5% of all DNA fragments and that 25% of the fragments had at least one additional fragment sharing identical coordinates at both ends. Interestingly, the preferred ending positions identified in the third trimester case were also detectable in 26 first trimester cases, even when conventional PCR-amplified DNA libraries were sequenced to a relatively shallow depth. These observations indicate that the ending sites that we have identified are indeed preferential plasma DNA fragmentation sites that are detectable recurrently across different samples at different gestational ages and with more than one library preparation protocol.
Furthermore, we have found that there were subsets of such recurrent plasma DNA ends that were preferentially associated with fetal or maternal DNA (Figs. 3, 4, and 5). Using such fetal- or maternal-preferred ending sites, one could estimate the fetal DNA fraction in a particular sample (Fig. 6 and Fig. S6). This latter correlation is particularly striking because the fetal- and maternal-preferred ending sites were originally identified in the third and second trimester cases, yet the abundance of such fetal-preferred ending sites correlated with the fetal DNA fractions in the series of independent first trimester cases.
For two cases in the study that were subjected to ultradeep sequencing (270× and 195× haploid genome coverages), it was desirable to use PCR-free library preparations, because the probability of encountering PCR duplicates increased with the depth of sequencing. Furthermore, the use of PCR-free libraries was also the most convincing approach to answer the fundamental question as to whether there were preferred ending sites for plasma DNA, because one could positively identify them without the worry that one was seeing an artefactual result (i.e., PCR duplicates).
To deduce the fetal DNA fractions for the first trimester samples from the recurrent ends, we used PCR-amplified sequencing libraries. However, the depth of sequencing was very shallow (median of 15 million reads per case; i.e., 0.8× haploid genome coverage). Hence, one would not expect any of the preferred end sites to appear more than once in a particular sample. Consequently, the filtering step that is normally performed to remove PCR duplicates, leaving one read for subsequent bioinformatic analysis, would not be expected to bias the results. The good correlation that we observed between the F/M ratio (based on preferred ends) and the fetal DNA fraction is consistent with this expectation.
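The coverage figure and the expectation that end coordinates rarely repeat at this depth can be checked with a short calculation. The sketch below is illustrative only. It assumes a 3 × 10^9 bp haploid genome, a mean fragment size of 166 bp, one fragment per sequenced read pair, and purely random (Poisson) fragmentation. Preferred ends are by definition enriched, so the true repeat probability at such sites would be higher than this baseline.

```python
import math

GENOME_BP = 3.0e9        # assumed haploid genome size
MEAN_FRAGMENT_BP = 166   # assumed mean plasma DNA fragment size

def haploid_coverage(n_fragments):
    """Approximate haploid genome coverage from a fragment count."""
    return n_fragments * MEAN_FRAGMENT_BP / GENOME_BP

def prob_repeated_end(n_fragments):
    """Poisson probability that two or more fragments share one end coordinate."""
    lam = n_fragments / GENOME_BP   # expected fragments ending at a given position on one side
    return 1.0 - math.exp(-lam) * (1.0 + lam)

coverage = haploid_coverage(15_000_000)
p_repeat = prob_repeated_end(15_000_000)
print(f"coverage = {coverage:.2f}x, P(repeated end) = {p_repeat:.2e}")
```

Under these assumptions the coverage is about 0.8× and the chance of seeing the same end coordinate twice is on the order of 1 in 100,000 per position, so duplicate removal would discard very few genuine repeats.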
The plasma DNA fragments with the fetal-preferred ending sites had a shorter size distribution than those with the maternal-preferred ending sites (Fig. 7). Most importantly, this size difference was observed not only in the third trimester case in which the recurrent ends were identified but also in 26 independent first trimester cases (Fig. 8). It is also interesting to note that, as we shifted the end coordinates by one to five nucleotides on both sides of the fetal- and maternal-preferred end sites, the size difference rapidly disappeared (Fig. 7 C and D). A similar finding was also observed in the 26 first trimester cases (Fig. 8 C and D). These data suggest that there is a high level of specificity in the process of plasma DNA fragmentation to the extent where, in some parts of the genome, the cleavage or breakage occurs preferentially at specific base locations.
There are active recent discussions on the nucleosomal origin of plasma DNA. One clue to such an origin came from work on the high-resolution size profiling of maternal plasma DNA (8, 28). Such work has revealed that the size profile of the total DNA in maternal plasma, in which a predominant proportion is from the pregnant mother, contains a dominant population of plasma DNA molecules with a size of 166 bp. Such data have also shown that, when one focuses on the placentally derived fetal DNA in maternal plasma, one would observe a predominant population with a size of 143 bp. The 166-bp molecules have been postulated to represent molecules containing the nucleosome core plus the linker (8). The 143-bp molecules, however, have been postulated to represent molecules containing the nucleosomal core without the linker. Straver et al. (17) created an artificial “nucleosome track” by pooling maternal plasma DNA sequencing data from multiple cases. They found that the frequency of reads with starting sequences within regions 73 bp upstream and downstream of the deduced nucleosome center showed a positive correlation with the fetal DNA fraction. Outside of the context of pregnancy, Snyder et al. (16) also explored the nucleosomal footprint of plasma DNA. They developed a metric that they called the windowed protection score, which has been defined as the number of molecules spanning a particular genomic window minus those molecules with an end point within the window. Conclusions drawn from those studies are that plasma DNA cleavages or breakages cluster around locations in the genome in relation to positions of nucleosome arrays and open chromatin domains. However, as shown in Figs. 7 and 8, the preferred plasma DNA ending sites identified by this study are exquisitely sensitive to the genomic location. 
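The windowed protection score described in this paragraph can be written down directly from its verbal definition. The following Python sketch is a minimal illustration, not the implementation of Snyder et al. (16). The fragment representation (inclusive start and end coordinates on one chromosome) and the choice to treat "spanning" as strictly extending beyond both window edges are assumptions made here.

```python
def windowed_protection_score(fragments, window_start, window_end):
    """Fragments spanning the window minus fragments with an end point inside it.

    fragments: iterable of (start, end) inclusive coordinates on one chromosome.
    """
    spanning = 0
    ending_inside = 0
    for start, end in fragments:
        if start < window_start and end > window_end:
            spanning += 1        # fragment covers the whole window
        elif (window_start <= start <= window_end) or (window_start <= end <= window_end):
            ending_inside += 1   # at least one end point falls within the window
    return spanning - ending_inside

example_fragments = [(100, 300), (150, 200), (90, 180), (400, 500)]
score = windowed_protection_score(example_fragments, 160, 190)
print(score)
```

In this toy example, two fragments span the window and one ends inside it, so the score is 1. A window-level score of this kind is coarser than the single-nucleotide end coordinates analyzed in the present study.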
We have, therefore, shown that there is another layer of complexity associated with the nonrandom nature of plasma DNA fragmentation that is beyond factors just related to genome architecture or structural domains. Additional studies are needed to understand the relationship between the various factors and mechanisms resulting in the highly orchestrated and location-specific patterns of plasma DNA fragmentation. We also believe that a catalog of such tissue-specific end sites would provide an alternative approach for elucidating the origin of plasma DNA in addition to methods based on methylation analysis (29) and nucleosome footprinting (16).
In summary, we have developed a second generation approach that produces noninvasive fetal genomes at high resolution using maternal plasma DNA sequencing. We believe that this work has significantly pushed forward the limit of NIPT by showing the feasibility of detecting fetal de novo mutations on a genome-wide level from maternal plasma. We have also substantially increased the resolution at which one could determine the maternal inheritance of the fetus using a nonhaplotype-based approach. Currently, the potential clinical implementation of this approach is limited by the costs involved in sequencing maternal plasma DNA to the depth reported here (195× and above; e.g., over US $40,000 in our center). However, with the continual reduction in the costs of massively parallel sequencing, it is envisioned that this cost barrier would eventually be removed. Finally, our demonstration of preferred end sites in plasma DNA that exhibit specificity of their origin (e.g., the placenta of the fetus) has opened up numerous avenues of investigation. In the context of NIPT, it would allow a relatively simple approach for estimating the fetal DNA fraction. Additional work would need to determine the relative merit between this approach and others previously reported [e.g., using plasma DNA size (28) and nucleosomal profiles (17)]. Actually, these methods could potentially be used in combination for NIPT. It would also be of great interest to explore the presence of such recurrent end positions in clinical and pathological scenarios outside of the pregnancy context (e.g., cancer).
Materials and Methods
Clinical Samples.
The study was approved by the Joint Chinese University of Hong Kong and Hospital Authority New Territories East Cluster Clinical Research Ethics Committee. For each of the two cases whose plasma DNA samples were sequenced to high depths, 20 mL maternal venous peripheral blood was collected; 10 mL paternal venous peripheral blood was also collected. For the third trimester pregnant case, 3 mL cord blood was collected after delivery. For the case with cardiofaciocutaneous syndrome, the placenta was collected after therapeutic abortion. Twenty-six first trimester pregnant women were recruited for fetal DNA fraction analysis.
Sample Processing.
Blood samples were centrifuged at 1,600 × g for 10 min at 4 °C. The plasma portion was harvested and recentrifuged at 16,000 × g for 10 min at 4 °C to remove the blood cells. The blood cell portion was recentrifuged at 2,500 × g, and any residual plasma was removed. DNA from the blood cells and that from maternal plasma were extracted with the blood and body fluid protocol of the QIAamp DNA Blood Mini Kit and the QIAamp DSP DNA Blood Mini Kit (Qiagen), respectively. DNA from the placenta was extracted with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s tissue protocol.
DNA Library Construction.
For the two cases subjected to deep sequencing, DNA libraries for the genomic DNA samples and the maternal plasma sample were constructed with the TruSeq DNA PCR-Free Library Preparation Kit (Illumina) according to the manufacturer’s protocol, except that one-fifth of the indexed adapter was used for plasma DNA library construction. For the construction of plasma DNA libraries, DNA extracted from 8 mL plasma was used. For the genomic DNA samples, including the mother’s buffy coat DNA, the father’s buffy coat DNA, and the cord blood buffy coat DNA, 1 μg sonicated DNA was used for library construction. The library concentrations ranged from 34 to 58 nM in a 20-μL library. For the first trimester samples, plasma DNA sequencing libraries were constructed using a KAPA Library Preparation Kit (Kapa Biosystems).
Sequencing of DNA Libraries.
The libraries for maternal plasma DNA and genomic DNA were sequenced using the HiSeq2500 and the HiSeq1500 Sequencing Systems (Illumina), respectively, using a paired end sequencing protocol. Seventy-five nucleotides were sequenced from each end.
Alignment of Sequencing Data.
The paired end sequencing data were analyzed using SOAP2 in the paired end mode (20). The paired end reads were aligned to the nonrepeat-masked reference human genome (hg19). Up to two nucleotide mismatches were allowed for the alignment of each end. The genomic coordinates of these potential alignments for the two ends were then analyzed to determine whether any combination would allow the two ends to be aligned to the same chromosome with the correct orientation, spanning an insert size ≤600 bp and mapping to a single location in the reference human genome. On average, 86% of the reads were aligned to the reference human genome (hg19). For the detection of fetal de novo mutations from maternal plasma, an additional step of realignment using the Bowtie 2 software was performed (19). All sequenced reads covering the candidate mutation sites and exhibiting the variant alleles were realigned using Bowtie 2.
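The pairing rule described above (same chromosome, correct orientation, insert size ≤600 bp, single location) can be illustrated with a short Python sketch. This is not SOAP2 code or its output format. The tuple layout `(chromosome, leftmost position, strand)`, the function name, and the use of the 75-nt read length to compute the insert size are assumptions made for illustration.

```python
from itertools import product

def pick_unique_pair(end1_hits, end2_hits, max_insert=600, read_len=75):
    """Return (chrom, start, end) if exactly one combination of hits is concordant.

    Each hit is (chrom, leftmost_position, strand) with strand '+' or '-'.
    """
    concordant = []
    for (c1, p1, s1), (c2, p2, s2) in product(end1_hits, end2_hits):
        if c1 != c2 or s1 == s2:
            continue                       # different chromosome or same strand
        fwd, rev = (p1, p2) if s1 == "+" else (p2, p1)
        insert_size = rev + read_len - fwd
        if fwd <= rev and insert_size <= max_insert:
            concordant.append((c1, fwd, rev + read_len - 1))
    # Keep the pair only when it maps to a single location
    return concordant[0] if len(concordant) == 1 else None

unique = pick_unique_pair([("chr1", 1000, "+"), ("chr5", 200, "+")],
                          [("chr1", 1091, "-")])
print(unique)
```

In the example, only the chr1 combination is concordant, giving a 166-bp fragment; a pair with two equally valid placements would be discarded.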
Binomial Mixture Model Analysis for Paternal Inheritance.
To deduce the paternal inheritance of the fetus using maternal plasma DNA, we focused on SNPs at which the mother was homozygous (genotype AA) and the father was heterozygous (genotype AB). At these SNPs, the genotype of the fetus would be either AA or AB. Hence, we used a two-component binomial mixture model to fit the observed allelic counts at each SNP locus (21). For each SNP locus i, the counts for the A and B alleles are denoted as ai and bi, respectively, and Ni is the total number of reads covering the locus. The binomial mixture model was then used to determine whether the fetus would be homozygous (AA) or heterozygous (AB) at each locus i based on the observed allelic counts ai and bi, the sequencing error rate, and the fetal DNA fraction (21).
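The per-locus decision can be sketched as a comparison of two binomial likelihoods. The Python example below is a simplified illustration, not the fitted mixture model of ref. 21. It assumes that the fetal DNA fraction is already known, that the expected B-allele fraction is the sequencing error rate when the fetus is AA and half the fetal DNA fraction when the fetus is AB, and that the two genotypes are equally likely a priori. All function and parameter names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def posterior_fetus_heterozygous(a_i, b_i, fetal_fraction, error_rate=0.003, prior_ab=0.5):
    """Posterior probability that the fetus is AB at a locus where the mother is AA."""
    n_i = a_i + b_i
    log_aa = math.log(1.0 - prior_ab) + log_binom_pmf(b_i, n_i, error_rate)
    log_ab = math.log(prior_ab) + log_binom_pmf(b_i, n_i, fetal_fraction / 2.0)
    top = max(log_aa, log_ab)              # subtract the maximum for numerical stability
    w_aa = math.exp(log_aa - top)
    w_ab = math.exp(log_ab - top)
    return w_ab / (w_aa + w_ab)

print(posterior_fetus_heterozygous(250, 20, fetal_fraction=0.2))
print(posterior_fetus_heterozygous(270, 0, fetal_fraction=0.2))
```

With 20 of 270 reads carrying the B allele and a 20% fetal DNA fraction, the posterior strongly favors AB; with no B reads it strongly favors AA.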
GRAD Analysis for Maternal Inheritance.
For SNPs in which the mother was heterozygous, we tested whether the two alleles were present in maternal plasma at the same or different concentrations using the sequential probability ratio test (SPRT). An odds ratio of 20 was used for the calculation of the threshold for accepting the null or the alternative hypothesis. The null hypothesis for each SPRT analysis was the absence of dosage imbalance between the read counts for the two maternal alleles. The alternative hypothesis was the overrepresentation of one allele. The calculation of the upper and lower boundaries of the SPRT curves was as previously described (8).
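A generic Wald-type SPRT for allelic imbalance is sketched below in Python. It is offered only as an illustration of the idea and does not reproduce the boundary calculation of ref. 8. It assumes that, when the fetus is homozygous for one maternal allele, that allele is expected at a plasma fraction of 0.5 plus half the fetal DNA fraction, and it tests only the more abundant allele, which is a simplification. The function name and return labels are hypothetical.

```python
import math

def sprt_classify(count_a, count_b, fetal_fraction, odds_ratio=20.0):
    """Classify a maternal heterozygous SNP as balanced, imbalanced, or unclassified."""
    n = count_a + count_b
    k = max(count_a, count_b)            # reads for the more abundant allele
    p0 = 0.5                             # null: no dosage imbalance
    p1 = 0.5 + fetal_fraction / 2.0      # alternative: one allele overrepresented
    log_lr = (k * math.log(p1 / p0)
              + (n - k) * math.log((1.0 - p1) / (1.0 - p0)))
    bound = math.log(odds_ratio)
    if log_lr >= bound:
        return "imbalanced"              # accept the alternative hypothesis
    if log_lr <= -bound:
        return "balanced"                # accept the null hypothesis
    return "unclassified"                # more reads would be needed

for counts in [(180, 90), (135, 135), (6, 4)]:
    print(counts, sprt_classify(*counts, fetal_fraction=0.3))
```

Loci that remain unclassified at a given depth illustrate why the proportion of resolvable SNPs depends on both the sequencing depth and the fetal DNA fraction.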
Dynamic Cutoff Analysis for de Novo Mutation.
Dynamic cutoff filtering criteria were developed to distinguish de novo mutations of the fetus from sequencing errors. The sequencing error rate was assumed to be 0.3% (30). The probability that the same variant would be observed in a number of sequenced reads because of sequencing errors ($P_{\mathrm{err}}$) would be calculated as
$$P_{\mathrm{err}} = \binom{N_i}{b_i}\,\varepsilon^{b_i}\,(1-\varepsilon)^{N_i-b_i}$$
where $\binom{N_i}{b_i}$ represents a mathematical combination function, $\varepsilon$ represents the sequencing error rate, $N_i$ represents the total number of sequenced reads covering the locus $i$, and $b_i$ represents the number of sequenced reads carrying the variant. The dynamic cutoff values were determined based on a theoretical false-positive rate of <1 in $3 \times 10^{9}$. In other words, the theoretical number of false-positive sites caused by sequencing errors would be less than one for a haploid genome.
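One way to turn this criterion into a depth-dependent cutoff is sketched below in Python. The sketch assumes that the error probability is the binomial probability of observing exactly the given number of variant reads among all reads covering the locus at the stated error rate, and it searches for the smallest read count whose probability falls below 1 in 3 × 10^9. The function names are hypothetical, and a cumulative (tail) probability would be a stricter alternative.

```python
import math

def log_binom_pmf(b, n, eps):
    """Natural log of C(n, b) * eps**b * (1 - eps)**(n - b)."""
    return (math.lgamma(n + 1) - math.lgamma(b + 1) - math.lgamma(n - b + 1)
            + b * math.log(eps) + (n - b) * math.log1p(-eps))

def dynamic_cutoff(depth, eps=0.003, max_false_positive_rate=1 / 3e9):
    """Smallest variant-read count whose error-only probability is below the target."""
    log_target = math.log(max_false_positive_rate)
    start = int(depth * eps) + 1          # begin above the expected number of error reads
    for b in range(start, depth + 1):
        if log_binom_pmf(b, depth, eps) < log_target:
            return b
    return None                           # depth too low for any count to reach the target

for depth in (100, 195, 270):
    print(depth, dynamic_cutoff(depth))
```

The required number of variant-supporting reads grows with the local depth, which is why the cutoff is described as dynamic rather than fixed.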
Plasma DNA End Position Analysis.
To screen for genomic coordinates that were preferred ending positions for maternally and fetally derived DNA fragments, fetal-specific informative SNP sites (i.e., at which the mother was homozygous and the fetus was heterozygous) and maternal-specific informative SNP sites (i.e., at which the fetus was homozygous and the mother was heterozygous) were identified. For each informative SNP site, all of the sequencing reads covering the informative SNP sites were analyzed.
For the analysis of SNPs where the mother was homozygous (genotype AA) and the fetus was heterozygous (genotype AB), the A allele would be the “shared allele,” and the B allele would be the fetal-specific allele. The number of sequenced reads carrying the shared allele and the fetal-specific allele would be counted. In the size distribution of plasma DNA, a peak would be observed at 166 bp for both the fetally and maternally derived DNA. If the fragmentation of the plasma DNA had been random, the two ends would be evenly distributed across a region 166 bp upstream and 166 bp downstream of the informative SNP. A P value would be calculated to determine if a particular position has a significantly increased probability for being an end for the reads carrying the shared allele or the fetal-specific allele based on a Poisson probability function:
$$P\ \text{value} = \mathrm{Poisson}(N_{\mathrm{actual}},\, N_{\mathrm{predict}})$$
where $\mathrm{Poisson}()$ is the Poisson probability function, $N_{\mathrm{actual}}$ is the actual number of reads ending at the particular nucleotide, and $N_{\mathrm{predict}}$ is the total number of reads divided by 166.
A P value of <0.01 was used as a cutoff to define preferred ending positions for the reads carrying the fetal-specific allele or the shared allele.
For the SNPs where the mother was heterozygous (AB) and the fetus was homozygous, the reads carrying the maternal-specific allele (B allele) and the shared alleles were analyzed.
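The screening for preferred ending positions around one informative SNP can be sketched as follows. This Python example is illustrative only. It assumes that the P value is the upper-tail Poisson probability of observing at least the actual number of read ends at a position when the expected number is the total read count divided by 166; the text does not state whether a point or tail probability was used. The function names and the dictionary input format are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)     # P(X = 0)
    cdf = 0.0
    for j in range(k):
        cdf += term           # accumulate P(X = j)
        term *= lam / (j + 1)
    return max(0.0, 1.0 - cdf)

def preferred_end_positions(end_counts, window_bp=166, p_cutoff=0.01):
    """Positions whose read-end count is significantly above the random expectation.

    end_counts: dict mapping genomic position -> number of reads ending there,
    for reads carrying one allele (e.g., the fetal-specific allele).
    """
    n_predict = sum(end_counts.values()) / window_bp
    return sorted(pos for pos, n_actual in end_counts.items()
                  if poisson_upper_tail(n_actual, n_predict) < p_cutoff)

# Toy data: 161 positions with 2 read ends each, plus one position with 10
counts = {pos: 2 for pos in range(161)}
counts[500] = 10
print(preferred_end_positions(counts))
```

In the toy data the expected count per position is 2, so only the position with 10 read ends passes the P < 0.01 cutoff. The same function would be applied separately to reads carrying the shared allele and to reads carrying the fetal-specific or maternal-specific allele.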
Expected Proportion of Fragments Sharing Two Identical Ends.
The probability of having $n$ fragments sharing the same ending position on one side, $P(n)$, was calculated using a Poisson probability function assuming that the average size of plasma DNA fragments was 166 bp and that the average sequencing depth was 270×:
$$P(n) = \frac{\lambda^{n} e^{-\lambda}}{n!}, \qquad \lambda = \frac{270}{166}$$
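The probability of two fragments having the same size ($P_{\mathrm{size}}$) can be calculated from the size distribution of the plasma DNA fragments:
$$P_{\mathrm{size}} = \sum_{\mathrm{size}} \left[\%(\mathrm{size})\right]^{2}$$
where $\%(\mathrm{size})$ is the proportion of fragments having the particular size.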
Therefore, the overall probability of finding two fragments with identical ends on both sides (Pboth) can be calculated as
where represents a mathematical combination function .
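The first two quantities in this calculation can be sketched in Python. The example assumes that the Poisson mean is the sequencing depth divided by the mean fragment size (270/166), and that the probability of two fragments having the same size is the sum of squared size proportions. The function names and the two-size toy distribution are hypothetical, and the combined probability for both ends is not reproduced here.

```python
import math

def prob_n_fragments_share_end(n, depth=270.0, mean_fragment_bp=166.0):
    """Poisson probability that n fragments end at the same position on one side."""
    lam = depth / mean_fragment_bp   # expected fragments per end coordinate
    return math.exp(-lam) * lam ** n / math.factorial(n)

def prob_same_size(size_proportions):
    """Probability that two randomly drawn fragments have identical sizes.

    size_proportions: dict mapping fragment size (bp) -> proportion of fragments.
    """
    return sum(p * p for p in size_proportions.values())

for n in range(4):
    print(n, round(prob_n_fragments_share_end(n), 4))

toy_sizes = {143: 0.5, 166: 0.5}     # illustrative distribution, not measured data
print(prob_same_size(toy_sizes))
```

A measured size distribution would replace the toy dictionary; because real plasma DNA sizes are spread over many values, the resulting probability would be far lower than in this two-size example.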
Acknowledgments
This work was supported by the Hong Kong Research Grants Council Theme-Based Research Scheme T12-403/15. Y.M.D.L. was supported by an endowed chair from the Li Ka Shing Foundation.
Footnotes
Conflict of interest statement: R.W.K.C. and Y.M.D.L. received research support from Sequenom, Inc. R.W.K.C. and Y.M.D.L. were consultants to Sequenom, Inc. K.C.A.C., R.W.K.C., and Y.M.D.L. hold equities in Sequenom, Inc. K.C.A.C., R.W.K.C., and Y.M.D.L. are founders of Xcelom and Cirina. K.C.A.C., P.J., and R.W.K.C. are consultants to Xcelom. P.J. is a consultant to Cirina. K.C.A.C., P.J., R.W.K.C., and Y.M.D.L. have filed patent applications (PCT/CN2016/073753 and PCT/CN2016/091531) based on the data generated from this work, which have been licensed to Cirina.
Data deposition: The sequence data for the subjects studied in this work who had consented to data archiving have been deposited in the European Genome-Phenome Archive (EGA), https://www.ebi.ac.uk/ega/, hosted by the European Bioinformatics Institute (EBI; accession no. EGAS00001001882).
See Commentary on page 14173.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1615800113/-/DCSupplemental.
References
- 1. Lo YM, et al. Presence of fetal DNA in maternal plasma and serum. Lancet. 1997;350(9076):485–487. doi: 10.1016/S0140-6736(97)02174-0.
- 2. Chiu RW, et al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci USA. 2008;105(51):20458–20463. doi: 10.1073/pnas.0810641105.
- 3. Chiu RW, et al. Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: Large scale validity study. BMJ. 2011;342:c7401. doi: 10.1136/bmj.c7401.
- 4. Bianchi DW, et al.; CARE Study Group. DNA sequencing versus standard prenatal aneuploidy screening. N Engl J Med. 2014;370(9):799–808. doi: 10.1056/NEJMoa1311037.
- 5. Norton ME, et al. Cell-free DNA analysis for noninvasive examination of trisomy. N Engl J Med. 2015;372(17):1589–1597. doi: 10.1056/NEJMoa1407349.
- 6. Srinivasan A, Bianchi DW, Huang H, Sehnert AJ, Rava RP. Noninvasive detection of fetal subchromosome abnormalities via deep sequencing of maternal plasma. Am J Hum Genet. 2013;92(2):167–176. doi: 10.1016/j.ajhg.2012.12.006.
- 7. Yu SC, et al. Noninvasive prenatal molecular karyotyping from maternal plasma. PLoS One. 2013;8(4):e60968. doi: 10.1371/journal.pone.0060968.
- 8. Lo YM, et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci Transl Med. 2010;2(61):61ra91. doi: 10.1126/scitranslmed.3001720.
- 9. Fan HC, et al. Non-invasive prenatal measurement of the fetal genome. Nature. 2012;487(7407):320–324. doi: 10.1038/nature11251.
- 10. Kitzman JO, et al. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med. 2012;4(137):137ra76. doi: 10.1126/scitranslmed.3004323.
- 11. Fan HC, Wang J, Potanina A, Quake SR. Whole-genome molecular haplotyping of single cells. Nat Biotechnol. 2011;29(1):51–57. doi: 10.1038/nbt.1739.
- 12. Kitzman JO, et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol. 2011;29(1):59–63. doi: 10.1038/nbt.1740.
- 13. Lam KW, et al. Noninvasive prenatal diagnosis of monogenic diseases by targeted massively parallel sequencing of maternal plasma: Application to β-thalassemia. Clin Chem. 2012;58(10):1467–1475. doi: 10.1373/clinchem.2012.189589.
- 14. New MI, et al. Noninvasive prenatal diagnosis of congenital adrenal hyperplasia using cell-free fetal DNA in maternal plasma. J Clin Endocrinol Metab. 2014;99(6):E1022–E1030. doi: 10.1210/jc.2014-1118.
- 15. Zeevi DA, et al. Proof-of-principle rapid noninvasive prenatal diagnosis of autosomal recessive founder mutations. J Clin Invest. 2015;125(10):3757–3765. doi: 10.1172/JCI79322.
- 16. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell. 2016;164(1-2):57–68. doi: 10.1016/j.cell.2015.11.050.
- 17. Straver R, Oudejans CB, Sistermans EA, Reinders MJ. Calculating the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles. Prenat Diagn. 2016;36(7):614–621. doi: 10.1002/pd.4816.
- 18. Chandrananda D, Thorne NP, Bahlo M. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Med Genomics. 2015;8:29. doi: 10.1186/s12920-015-0107-z.
- 19. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
- 20. Li R, et al. SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics. 2009;25(15):1966–1967. doi: 10.1093/bioinformatics/btp336.
- 21. Jiang P, et al. FetalQuant: Deducing fractional fetal DNA concentration from massively parallel sequencing of DNA in maternal plasma. Bioinformatics. 2012;28(22):2883–2890. doi: 10.1093/bioinformatics/bts549.
- 22. Lun FM, et al. Noninvasive prenatal diagnosis of monogenic diseases by digital size selection and relative mutation dosage on DNA in maternal plasma. Proc Natl Acad Sci USA. 2008;105(50):19920–19925. doi: 10.1073/pnas.0810373105.
- 23. Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nat Rev Genet. 2012;13(8):565–575. doi: 10.1038/nrg3241.
- 24. Chitty LS, et al. Non-invasive prenatal diagnosis of achondroplasia and thanatophoric dysplasia: Next-generation sequencing allows for a safer, more accurate, and comprehensive approach. Prenat Diagn. 2015;35(7):656–662. doi: 10.1002/pd.4583.
- 25. Vajo Z, Francomano CA, Wilkin DJ. The molecular and genetic basis of fibroblast growth factor receptor 3 disorders: The achondroplasia family of skeletal dysplasias, Muenke craniosynostosis, and Crouzon syndrome with acanthosis nigricans. Endocr Rev. 2000;21(1):23–39. doi: 10.1210/edrv.21.1.0387.
- 26. Saito H, Sekizawa A, Morimoto T, Suzuki M, Yanaihara T. Prenatal DNA diagnosis of a single-gene disorder from maternal plasma. Lancet. 2000;356(9236):1170. doi: 10.1016/S0140-6736(00)02767-7.
- 27. Chen S, et al. Haplotype-assisted accurate non-invasive fetal whole genome recovery through maternal plasma sequencing. Genome Med. 2013;5(2):18. doi: 10.1186/gm422.
- 28. Yu SC, et al. Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing. Proc Natl Acad Sci USA. 2014;111(23):8583–8588. doi: 10.1073/pnas.1406103111.
- 29. Sun K, et al. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc Natl Acad Sci USA. 2015;112(40):E5503–E5512. doi: 10.1073/pnas.1508736112.
- 30. Ross MG, et al. Characterizing and measuring bias in sequence data. Genome Biol. 2013;14(5):R51. doi: 10.1186/gb-2013-14-5-r51.