Targeted realignment of LC-MS profiles by neighbor-wise compound-specific graphical time warping with misalignment detection

Bioinformatics. 2020 May 1;36(9):2862-2871.

doi: 10.1093/bioinformatics/btaa037.

Chiung-Ting Wu¹, Yizhi Wang¹, Yinxue Wang¹, Timothy Ebbels², Ibrahim Karaman³,⁴, Gonçalo Graça², Rui Pinto³,⁴, David M Herrington⁵, Yue Wang¹, Guoqiang Yu¹

Affiliations

¹ Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
² Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, UK.
³ Department of Epidemiology and Biostatistics, Imperial College London, London W2 1PG, UK.
⁴ UK Dementia Research Institute, Imperial College London, London, UK.
⁵ Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA.

PMID: 31950989
PMCID: PMC7203744
DOI: 10.1093/bioinformatics/btaa037

Chiung-Ting Wu et al. Bioinformatics. 2020.

Abstract

Motivation: Liquid chromatography-mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. Unfortunately, the retention time (RT) of the same compound varies between samples, and these shifts must be corrected (aligned) during data processing. Classic alignment methods, such as those in the popular XCMS package, often assume a single time-warping function for each sample. Thus, the potentially varying RT drift for compounds with different masses in a sample is neglected in these methods. Moreover, the systematic change in RT drift across run order is often not considered by alignment algorithms. Therefore, these methods cannot effectively correct all misalignments. For a large-scale experiment involving many samples, the existence of misalignment becomes inevitable and concerning.

Results: Here, we describe an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on warping functions of neighboring samples. Validated with both realistic synthetic data and internal quality control samples, ncGTW applied to two large-scale metabolomics LC-MS datasets identifies many misaligned features and successfully realigns them. These features would otherwise be discarded or left uncorrected by existing methods. The ncGTW software tool is currently implemented as a plug-in that detects and realigns misaligned features present in standard XCMS output.

Availability and implementation: An R package of ncGTW is freely available at Bioconductor and https://github.com/ChiungTingWu/ncGTW. A detailed user's manual and a vignette are provided within the package.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Examples of the observed misalignments due to the single warping function assumption. Five samples over two *m/z* bins from each dataset are shown here for demonstration, where the upper and lower rows represent two different *m/z* bins (see details in Section 2.1). **(a)** An example from the Rotterdam dataset shows that even with similar RT, the drift of each sample can differ significantly between two *m/z* bins. Using only a single warping function, XCMS aligns one bin (the upper one) well but not the other, as shown in the right part. **(b)** A similar example is observed in the MESA dataset.

**Fig. 2.**
Illustrative example of detecting misaligned features. After initial alignment, relevant peaks are detected in only some of the 70 samples (the indices in blue), and some of the features are obviously misaligned. With lower resolution grouping by XCMS, these peaks are all grouped into a single feature, as shown between the two blue dashed lines. With higher resolution grouping, this feature is split into three features, 1–3, separated by the red dashed lines. The sample index sets of these features are shown in red. The P-values of features 1–3 are all smaller than 0.05, so they pass the first criterion. Because the sample index sets of these three features are also disjoint, they pass the second criterion. Accordingly, ncGTW detects the misalignment and realigns the whole blue feature produced by the lower resolution grouping. (Color version of this figure is available at *Bioinformatics* online.)
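
The two criteria in this caption can be sketched as a small check. This is only an illustration, not the ncGTW package code: the function name `is_misaligned`, the input layout, and the requirement of at least two sub-features are assumptions, and the P-values are taken as given because the caption does not say how they are computed.

```python
from itertools import combinations


def is_misaligned(subfeatures, alpha=0.05):
    """Flag a low-resolution feature as misaligned (hypothetical helper).

    subfeatures: list of (p_value, sample_indices) tuples, one per feature
    produced by the higher resolution grouping.
    """
    # Assumption: a feature that was not split cannot show this pattern.
    if len(subfeatures) < 2:
        return False
    # Criterion 1: every sub-feature has a P-value below the threshold.
    if any(p >= alpha for p, _ in subfeatures):
        return False
    # Criterion 2: the sample index sets are pairwise disjoint.
    return all(
        set(a).isdisjoint(b)
        for (_, a), (_, b) in combinations(subfeatures, 2)
    )


# Made-up example: three sub-features with disjoint sample sets.
split = [(0.01, {1, 4, 7}), (0.02, {2, 5}), (0.03, {3, 6, 9})]
flagged = is_misaligned(split)
```

With the made-up input above, both criteria hold, so the feature would be passed on for realignment.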

**Fig. 3.**
Basic concept of the DTW grid and graph. **(a)** DTW alignment aims to find the pairs of corresponding points over which the maximum profile similarity is achieved. Here the paired corresponding points are (1, 1), (2, 2), (3, 2), (4, 3) and (4, 4), which form the warping function. **(b)** DTW grid for solving the warping function, shown in blue edges and dots, where the cost of each edge is determined by the intensity distance of each point pair. By solving the ‘shortest path’ problem, the warping function is obtained (black path), whose paired corresponding points are reflected in (a). **(c)** Based on the duality property of planar graphs, the DTW grid is transformed into the DTW graph (red and orange lines and dots), where each red or orange edge crosses one blue edge, and the cost of the red or orange edge is the same as the cost of the blue edge. Note that orange lines link only the vertices (red dots enclosed by a blue-lined exterior triangle) to a single source or sink. The shortest path problem then becomes a maximum-flow/minimum-cut problem. **(d)** Solving the alignment problem amounts to finding a ‘cut’ that separates the DTW graph into two parts with the minimum cost, with one part including the source and the other including the sink. The cut with the minimum cost corresponds to the warping function (black path). (Color version of this figure is available at *Bioinformatics* online.)
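
The shortest-path view in panel (b) can be illustrated with textbook dynamic programming. This is a minimal sketch under stated assumptions, not the ncGTW implementation: it places the absolute intensity difference on grid nodes rather than on edges, allows only diagonal, vertical and horizontal steps, and breaks ties in favor of the diagonal step. The function name `dtw` and the toy profiles are hypothetical.

```python
import math


def dtw(x, y):
    """Return (total_cost, path); path holds 1-based (i, j) point pairs."""
    n, m = len(x), len(y)
    # cost[i][j]: cheapest cumulative cost of aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # intensity distance of the pair
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both profiles
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
            )
    # Backtrack from the last pair to recover the warping function.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


# Toy profiles: y repeats the middle point of x once.
total, pairs = dtw([1, 2, 3], [1, 2, 2, 3])
# total == 0 and pairs == [(1, 1), (2, 2), (2, 3), (3, 4)]
```

The recovered list of pairs plays the role of the black path in the caption. The maximum-flow/minimum-cut formulation in panels (c) and (d) solves the same problem on the dual graph, which is what allows several warping functions to be linked and solved jointly.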

**Fig. 4.**
Construction of the various GTW graphs used in the different steps of the ncGTW algorithm (Wang *et al.*, 2016). **(a)** Two small DTW graphs are linked to form a GTW graph via additional ‘connecting’ edges (green lines between the vertices at the same position in the two DTW graphs). **(b)** After adding the green edges, we can solve all warping functions at the same time, with the similarity among warping functions as constraints. The cost on the green edges controls the similarity between warping functions. For example, if the cost is very high, no green edge will be cut and all the warping functions will be the same. **(c)** GTW graph formed by two linked DTW graphs of two neighboring samples (x_i and x_(i+1)) with a common reference, extendable to all neighboring samples, where the orange lines link vertices to a single source or sink, forming a large maximum flow graph, while green edges link the corresponding vertices of the two DTW graphs (only edges linking the top three vertices are shown here). **(d)** Part of the graph constructed in Stage 1 of ncGTW without using a common reference, where x_i and x_m are neighboring samples, and x_j and x_n are neighboring samples. **(e)** Part of the graph constructed in Stage 2 based on all pairwise warping functions obtained in Stage 1, where the warping function Φ_(i→j) guides the links between the corresponding vertices in the GTW graphs, with x_c being the virtual reference. (Color version of this figure is available at *Bioinformatics* online.)
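
The role of the green edges in panel (b) can be written as a joint objective. The form below is a sketch of the general GTW idea as this caption describes it, not a formula quoted from the paper; the symbols κ (cost on the green edges), 𝓔 (the set of neighboring sample pairs) and dist (a measure of difference between two warping functions) are notation assumed here.

```latex
\min_{\{\Phi_i\}} \;
  \sum_{i=1}^{N} \operatorname{cost}(\Phi_i)
  \;+\; \kappa \sum_{(i,j) \in \mathcal{E}} \operatorname{dist}(\Phi_i, \Phi_j)
```

Here cost(Φ_i) is the DTW alignment cost of sample x_i under warping function Φ_i. With κ = 0 the problem decouples into N independent DTW problems, while a very large κ forces all Φ_i to coincide, which matches the caption's remark that no green edge is cut when its cost is very high.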

**Fig. 5.**
Flowchart of the ncGTW algorithm. **(a)** With a two-stage alignment strategy, all input samples (curves) are aligned simultaneously to a virtual reference. **(b)** Stage 1 of ncGTW with three illustrative samples. First, ncGTW builds a pairwise warping flow map (blue arrows). Then ncGTW incorporates structural information as the constraint and applies it to all pairs (a pair is shown as a red dot and a constraint as a red dashed line). Lastly, ncGTW estimates all pairwise warping functions (Φ_(i,j)) jointly with, e.g., a smoothness constraint on neighboring sample pairs. **(c)** Stage 2 of ncGTW with three illustrative samples. ncGTW aligns every sample to a common virtual reference x_c, where the warping functions {Φ_(i,j)} obtained in Stage 1 provide warping correspondences and the final warping functions {Φ_(i,c)} are calculated by solving the maximum flow problem. (Color version of this figure is available at *Bioinformatics* online.)

**Fig. 6.**
An illustrative experimental result on realignment and peak-distortion correction by ncGTW, where a feature from the MESA dataset was initially misaligned by XCMS. The color mapping (green to blue and blue to red) corresponds to the sample index. **(a)** Raw LC-MS data associated with the feature of interest (before alignment). **(b)** The feature misaligned by XCMS, which is correctly detected and reported by the misalignment detection module of the ncGTW package. **(c)** Realignment by ncGTW, where apices are well aligned but with observable peak shape distortion. **(d)** Peak shape distortion is efficiently corrected by the post-processing module of the ncGTW package. (Color version of this figure is available at *Bioinformatics* online.)

**Fig. 7.**
Workflow of ncGTW. As a plug-in to XCMS, ncGTW uses the grouping results provided by XCMS as its inputs (one lower resolution and one higher resolution, as explained in **Fig. 2**). Then, ncGTW detects all misaligned features using the aforementioned criteria and performs realignment on these features. Lastly, ncGTW calculates final warping functions for each sample, which can be sent back to XCMS for re-grouping or peak-filling.

**Fig. 8.**
Illustrative realignment successfully performed by ncGTW incorporating a ‘line’ structure in a small-scale real LC-MS dataset. **(a)** LC-MS profiles of ten samples in total. **(b)** The same LC-MS profiles, with the curves shifted vertically to separate them. The 8 indexed peaks on sample 10 mark the 8 peak groups. The peaks indicated by arrows in group 2 were misaligned by most peer methods except GTW and ncGTW. **(c)** The arrow-indicated peaks were wrongly aligned to the third peak group (and all peaks were severely distorted) by DBA. **(d)** The arrow-indicated peaks were misaligned by CPM. **(e)** The arrow-indicated peaks were well aligned by GTW, while the fourth and fifth peak groups were misaligned. **(f)** All nine peak groups were correctly and accurately aligned by ncGTW in this challenging case.

**Fig. 9.**
Application of the ncGTW realignment method to the Rotterdam and MESA datasets. Among the detected misaligned features, the blue circles represent true positives and the red crosses represent false positives. **(a)** The average pairwise correlation coefficients on the Rotterdam dataset. **(b)** The average pairwise correlation coefficients on the MESA dataset. **(c)** The average pairwise total overlapping area on the Rotterdam dataset. **(d)** The average pairwise overlapping area on the MESA dataset. (Color version of this figure is available at *Bioinformatics* online.)
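
The two alignment-quality measures named in this caption can be sketched as follows. The exact definitions used in the paper are not given here, so this is an illustration under assumptions: correlation is taken as the Pearson coefficient, overlapping area as the sum of pointwise minima of two profiles sampled on the same RT grid, and all function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def overlap_area(a, b):
    """Assumed definition: area under the pointwise minimum (unit spacing)."""
    return sum(min(u, v) for u, v in zip(a, b))


def mean_pairwise(profiles, metric):
    """Average a symmetric metric over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Made-up, well-aligned profiles: same apex position, different heights.
aligned = [[0, 1, 2, 1, 0], [0, 1, 2, 1, 0], [0, 2, 4, 2, 0]]
avg_corr = mean_pairwise(aligned, pearson)
avg_overlap = mean_pairwise(aligned, overlap_area)
```

Both measures should rise when peaks from different samples are brought to the same RT, which is presumably why they are compared before and after realignment.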

**Fig. 10.**
The comparisons of CV with versus without ncGTW realignment after the peak-filling step of XCMS. The blue circles represent the true positives and the red crosses represent the false positives. **(a)** The CV comparison on Rotterdam dataset. **(b)** The CV comparison on MESA dataset. (Color version of this figure is available at *Bioinformatics* online.)
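
As a reminder of what is being compared, the coefficient of variation of a feature is the standard deviation of its intensities divided by their mean. The sketch below assumes the sample standard deviation and uses made-up intensities. The helper name `coefficient_of_variation` is hypothetical, and the source does not state which samples the CV is computed over.

```python
import statistics


def coefficient_of_variation(intensities):
    """CV = sample standard deviation / mean of a feature's intensities."""
    return statistics.stdev(intensities) / statistics.mean(intensities)


# Made-up intensities of one feature across three samples.
cv_value = coefficient_of_variation([8.0, 10.0, 12.0])
```

A lower CV after realignment and peak-filling would indicate that the same compound is being quantified more consistently across samples.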

Cited by

Alignment of multiple metabolomics LC-MS datasets from disparate diseases to reveal fever-associated metabolites.
Năstase AM, Barrett MP, Cárdenas WB, Cordeiro FB, Zambrano M, Andrade J, Chang J, Regato M, Carrillo E, Botana L, Moreno J, Regnault C, Milne K, Spence PJ, Rowe JA, Rogers S. PLoS Negl Trop Dis. 2023 Jul 24;17(7):e0011133. doi: 10.1371/journal.pntd.0011133. PMID: 37486920.

Finding Correspondence between Metabolomic Features in Untargeted Liquid Chromatography-Mass Spectrometry Metabolomics Datasets.
Climaco Pinto R, Karaman I, Lewis MR, Hällqvist J, Kaluarachchi M, Graça G, Chekmeneva E, Durainayagam B, Ghanbari M, Ikram MA, Zetterberg H, Griffin J, Elliott P, Tzoulaki I, Dehghan A, Herrington D, Ebbels T. Anal Chem. 2022 Apr 12;94(14):5493-5503. doi: 10.1021/acs.analchem.1c03592. PMID: 35360896.

Alignstein: Optimal transport for improved LC-MS retention time alignment.
Skoraczyński G, Gambin A, Miasojedow B. Gigascience. 2022 Nov 3;11:giac101. doi: 10.1093/gigascience/giac101. PMID: 36329619.

New software tools, databases, and resources in metabolomics: updates from 2020.
Misra BB. Metabolomics. 2021 May 11;17(5):49. doi: 10.1007/s11306-021-01796-1. PMID: 33977389. Review.

metabCombiner 2.0: Disparate Multi-Dataset Feature Alignment for LC-MS Metabolomics.
Habra H, Meijer JL, Shen T, Fiehn O, Gaul DA, Fernández FM, Rempfert KR, Metz TO, Peterson KE, Evans CR, Karnovsky A. Metabolites. 2024 Feb 15;14(2):125. doi: 10.3390/metabo14020125. PMID: 38393017.
