A multi-bin rarefying method for evaluating alpha diversities in TCR sequencing data

doi:10.1093/bioinformatics/btae431

Bioinformatics. 2024 Jul 1;40(7):btae431.

Mo Li¹, Xing Hua², Shuai Li³, Michael C Wu², Ni Zhao³

Affiliations

¹ Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA, 70504, United States.
² Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States.
³ Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, United States.

PMID: 38950175
PMCID: PMC11246167
DOI: 10.1093/bioinformatics/btae431

Mo Li et al. Bioinformatics. 2024.

Abstract

Motivation: T cell receptors (TCRs) constitute a major component of our adaptive immune system, governing the recognition of, and response to, internal and external antigens. Studying TCR diversity via sequencing technology is critical for a deeper understanding of immune dynamics. However, library sizes differ substantially across samples, hindering accurate estimation and comparison of alpha diversities. To address this, researchers frequently use an overall rarefying approach in which all samples are sub-sampled to an even depth. Despite the pervasive use of this approach, its efficacy has never been rigorously assessed.
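As a rough illustration of the overall rarefying approach described above, the following Python sketch sub-samples every sample to a common depth and then computes two alpha diversities that appear in the figure captions (unique sequence counts and the Shannon index). The function names, the choice of the smallest library size as the default depth, and the use of sampling without replacement are assumptions made for this example; they are not taken from the authors' code.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a vector of clonotype counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("rarefying depth exceeds the library size")
    # Multivariate hypergeometric sampling is sub-sampling reads without replacement.
    return rng.multivariate_hypergeometric(counts, int(depth))


def richness(counts):
    """Number of unique sequences (clonotypes with at least one read)."""
    return int(np.count_nonzero(counts))


def shannon(counts):
    """Shannon index with natural logarithm."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def overall_rarefy(samples, rng, depth=None):
    """Rarefy all samples to one even depth (assumed default: smallest library size)."""
    if depth is None:
        depth = min(int(np.sum(s)) for s in samples)
    return [rarefy(s, depth, rng) for s in samples]
```

After rarefying, every sample has the same number of reads, so differences in richness or Shannon index are no longer driven by the raw read totals alone. Whether that is sufficient to remove library size confounding is the question the paper examines.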

Results: In this paper, we develop an innovative "multi-bin" rarefying approach that partitions samples into multiple bins according to their library sizes, conducts rarefying within each bin for alpha diversity calculations, and performs meta-analysis across bins. Extensive simulations using real-world data highlight the inadequacy of the overall rarefying approach in controlling the confounding effect of library size. Our method proves robust in addressing library size confounding, outperforming competing normalization strategies by achieving better-controlled type-I error rates and enhanced statistical power in association tests.
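The three steps named above (binning by library size, rarefying within each bin, and meta-analysis across bins) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the binning rule, the per-bin association test, or the meta-analysis method, so equal-sized bins, a Spearman correlation test, and a weighted Stouffer combination are stand-ins chosen for illustration. Rarefying to the lowest library size within each bin follows the Figure 4 caption. The name `multibin_test` and all of its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def rarefy(counts, depth, rng):
    """Sub-sample `depth` reads without replacement from clonotype counts."""
    counts = np.asarray(counts, dtype=np.int64)
    return rng.multivariate_hypergeometric(counts, int(depth))


def shannon(counts):
    """Shannon index with natural logarithm."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def multibin_test(samples, phenotype, n_bins=3, alpha_fn=shannon, seed=0):
    """Hypothetical multi-bin association test; returns (combined z, two-sided p)."""
    rng = np.random.default_rng(seed)
    phenotype = np.asarray(phenotype, dtype=float)
    lib_sizes = np.array([int(np.sum(s)) for s in samples])
    order = np.argsort(lib_sizes)

    z_scores, weights = [], []
    # Assumption: bins hold roughly equal numbers of samples, ordered by library size.
    for idx in np.array_split(order, n_bins):
        depth = lib_sizes[idx].min()  # rarefy to the lowest library size in the bin
        alpha = np.array([alpha_fn(rarefy(samples[i], depth, rng)) for i in idx])
        # Assumption: Spearman correlation as the within-bin association test.
        rho, p = stats.spearmanr(alpha, phenotype[idx])
        if not np.isfinite(rho):  # constant input: treat as no evidence
            rho, p = 0.0, 1.0
        p = max(p, 1e-300)  # avoid an infinite z-score when p underflows to 0
        z_scores.append(np.sign(rho) * stats.norm.isf(p / 2))
        weights.append(np.sqrt(len(idx)))

    # Assumption: Stouffer's method with sqrt(bin size) weights as the meta-analysis.
    z = np.dot(weights, z_scores) / np.sqrt(np.sum(np.square(weights)))
    return float(z), float(2 * stats.norm.sf(abs(z)))
```

Combining signed z-scores rather than effect sizes is a deliberate choice in this sketch: alpha diversities computed at different rarefying depths are on different scales across bins, so per-bin slopes would not be directly comparable.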

Availability and implementation: The code is available at https://github.com/mli171/MultibinAlpha. The datasets are freely available at https://doi.org/10.21417/B7001Z and https://doi.org/10.21417/AR2019NC.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Type I error rates for Simulation A. Panels (A) and (B): type I error rates for all simulated data ($|\rho| > 0$) and for subsets restricted to datasets in which $|\rho| > 0.01, |\rho| > 0.02, \dots, |\rho| > 0.1$, for the unique sequence counts (A) and Shannon index (B). Panels (C) and (D): QQ plots of the empirical versus expected P-values (both $-\log_{10}$ transformed), for the unique sequence counts (C) and Shannon index (D).

**Figure 2.**
Data generation process for Simulation B.

**Figure 3.**
Type I errors and powers in Simulation B. Panels (A) and (B) are under simulations when the library size is not a confounder between alpha diversity and phenotype. Panels (C) and (D) are under simulations when library size confounds the relationship between alpha diversity and phenotype. Note that when p = 0 we are evaluating the type I error.

**Figure 4.**
Spearman correlation analyses with scatter plots and LOESS curve fits for alpha diversities within six bins. Panels (A) and (B) analyze unique sequence counts and Shannon index, respectively, against library sizes rarefied to 1e5 across all samples. Panels (C) and (D) follow the same analyses but with samples rarefied to the lowest library size in each bin using the “multi-bin” approach. P-values were calculated to test the significance of Spearman correlations between alpha diversities and the library sizes across all samples (black) and within each bin (colored).

**Figure 5.**
Correlations between covariates and alpha diversity calculated from overall-rarefied samples with $L^{*} = 10^{6}$. Left column: unique sequence counts plotted against the clinical variables of interest. Middle column: Shannon index plotted against the clinical variables of interest. Right column: log base 10 transformed library size plotted against the clinical variables of interest. The curves represent LOWESS fits for the “Age” variable in the first row. P-values at the bottom of each sub-figure are obtained from Spearman rank-based correlation tests. Correlation tests in the right column assess associations between initial library size and clinical variables.
