RNA CoMPASS: a dual approach for pathogen and host transcriptome analysis of RNA-seq datasets

PLoS One. 2014 Feb 25;9(2):e89445.

doi: 10.1371/journal.pone.0089445. eCollection 2014.

Guorong Xu¹, Michael J Strong², Michelle R Lacey³, Carl Baribault³, Erik K Flemington², Christopher M Taylor⁴

Affiliations

¹ Department of Computer Science, University of New Orleans Lakefront, New Orleans, Louisiana, United States of America.
² Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America.
³ Department of Mathematics, Tulane University, New Orleans, Louisiana, United States of America.
⁴ Department of Microbiology, Immunology & Parasitology, Louisiana State University Health Sciences Center, New Orleans, Louisiana, United States of America ; Research Institute for Children, Children's Hospital of New Orleans, New Orleans, Louisiana, United States of America.

PMID: 24586784
PMCID: PMC3934900
DOI: 10.1371/journal.pone.0089445

Guorong Xu et al. PLoS One. 2014.

Abstract

High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the analysis of a biological specimen's associated microbiome can also be performed using RNA-seq data, and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for analysis and visualization of transcriptome data. Despite the availability of an array of next-generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large data sets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein-Barr virus (EBV), while another 23 samples were derived from Burkitt's lymphomas (BLs), some of which arose in part through infection with EBV. Appropriately, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-κB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data.
RNA CoMPASS is freely available at http://rnacompass.sourceforge.net/.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic of RNA CoMPASS (RNA comprehensive multi-processor analysis system for sequencing) architecture.**
RNA CoMPASS is a graphical user interface (GUI) based parallel computation pipeline for the analysis of both exogenous and human sequences from RNA-seq data. It employs one commercial and several open-source programs to analyze RNA-seq data sets, including Novoalign, SAMMate, BLAST, and MEGAN. Each step subtracts the reads it maps, so that the remaining unmapped reads can be analyzed further for pathogen discovery. The mapped reads are analyzed separately. The end result of the pipeline is pathogen discovery and host transcriptome analysis.
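
The read-subtraction idea in this caption can be illustrated with a toy sketch. This is not the RNA CoMPASS implementation: the function names (`subtract_reads`, `make_substring_mapper`), the exact-substring "aligner", and the toy sequences are all assumptions made for illustration, standing in for real tools such as Novoalign and BLAST.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for an aligner: given reads, return the IDs it can map.
Mapper = Callable[[Dict[str, str]], Set[str]]


def subtract_reads(
    reads: Dict[str, str], steps: List[Tuple[str, Mapper]]
) -> Tuple[Dict[str, Dict[str, str]], Dict[str, str]]:
    """Run each step only on reads left unmapped by the previous steps.

    Returns (reads mapped at each step, reads still unmapped at the end).
    """
    remaining = dict(reads)
    mapped_by_step: Dict[str, Dict[str, str]] = {}
    for name, mapper in steps:
        hit_ids = mapper(remaining)
        # Mapped reads are set aside for separate analysis ...
        mapped_by_step[name] = {rid: remaining[rid] for rid in hit_ids}
        # ... and subtracted from the pool passed to the next step.
        remaining = {rid: seq for rid, seq in remaining.items() if rid not in hit_ids}
    return mapped_by_step, remaining


def make_substring_mapper(reference: str) -> Mapper:
    """Toy 'aligner': a read maps if it is an exact substring of the reference."""
    return lambda reads: {rid for rid, seq in reads.items() if seq in reference}


toy_reads = {"r1": "ACGT", "r2": "TTGA", "r3": "GGCC"}
toy_steps = [
    ("host", make_substring_mapper("AAACGTAA")),      # toy host reference
    ("pathogen", make_substring_mapper("CCTTGACC")),  # toy pathogen reference
]
mapped, unmapped = subtract_reads(toy_reads, toy_steps)
print(mapped, unmapped)
```

In this toy run, `r1` is set aside at the host step, `r2` at the pathogen step, and `r3` remains unmapped.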

**Figure 2. Performance Analysis of RNA CoMPASS.**
RNA CoMPASS was deployed on a local cluster and benchmarked. An Akata RNA-seq data set was split into six files of increasing size: (1) 393.4 MB, 1,397,139 reads; (2) 757 MB, 2,685,149 reads; (3) 1.44 GB, 5,120,805 reads; (4) 2.72 GB, 9,651,466 reads; (5) 5.01 GB, 25,465,406 reads; (6) 8.99 GB, 50,930,812 reads. Overall run time was measured for each file on a single machine (blue columns) and on the local 4-node cluster (red columns). Speedup is shown as a green line.
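
The caption does not define speedup. A minimal sketch, assuming the conventional definition (single-machine run time divided by cluster run time), is shown here; the timings are hypothetical placeholders, not the values measured in the paper.

```python
def speedup(single_machine_seconds: float, cluster_seconds: float) -> float:
    """Conventional speedup: serial run time divided by parallel run time."""
    if cluster_seconds <= 0:
        raise ValueError("cluster_seconds must be positive")
    return single_machine_seconds / cluster_seconds


def parallel_efficiency(speedup_value: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved on `nodes` nodes."""
    return speedup_value / nodes


# Hypothetical timings for one input file (not taken from the paper).
s = speedup(single_machine_seconds=3600.0, cluster_seconds=1200.0)
e = parallel_efficiency(s, nodes=4)
print(s, e)
```

With these placeholder timings the speedup is 3.0, which corresponds to 75% of ideal scaling on a 4-node cluster.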

**Figure 3. Detection of EBV in Human B-Cells using RNA CoMPASS.**
All 45 single-end RNA-seq data sets (22 lymphoblastoid cell lines, 23 Burkitt's lymphomas) were analyzed using RNA CoMPASS. (A) The virome branches of the taxonomy trees for two representative LCLs and two representative Burkitt's lymphomas were generated using the metagenome analysis tool MEGAN 4. (B) EBV reads were quantified in all 45 RNA-seq data sets and are expressed as EBV reads per 5,000,000 total sequence reads.
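
The normalization in panel (B) amounts to scaling each sample's EBV read count to a common library size of 5,000,000 reads. A minimal sketch follows; the function name `reads_per_5m` and the example counts are assumptions for illustration only.

```python
def reads_per_5m(ebv_reads: int, total_reads: int, scale: int = 5_000_000) -> float:
    """Scale an EBV read count to a library of `scale` total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return ebv_reads * scale / total_reads


# Hypothetical sample: 1,000 EBV reads among 10,000,000 total reads.
normalized = reads_per_5m(1_000, 10_000_000)
print(normalized)
```

Scaling to a fixed library size makes EBV read counts comparable between samples sequenced to different depths.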

**Figure 4. Circos plot of two EBV samples shows distinct gene expression.**
An annotated Circos plot depicts the EBV read coverage across the EBV genome for two samples. The graph displays the number of reads mapped to each nucleotide position of the genome, plotted on a log scale. Blue features represent lytic genes, red features represent latency genes, green features represent potential non-coding genes, and black features represent non-gene features (e.g. repeat regions and origins of replication).

**Figure 5. Heat Map representing Human B-Cells analyzed using RNA CoMPASS.**
Human transcript counts from the 45 B-cell samples were imported into the R software environment and analyzed using the edgeR package. Genes with low transcript counts (less than 1 count per million (CPM)) in the majority of samples were filtered out. The Manhattan (L1) distance matrix for the samples was computed using the remaining transcript counts, and this was taken as input for hierarchical clustering using the Ward algorithm. After assigning each sample to one of two groups identified by hierarchical clustering (Human B-Cell or Burkitt's Lymphoma), the glmFit function was used to fit the mean log(CPM) for each group, and likelihood ratio tests were used to identify genes that were differentially expressed, with adjusted P<0.05 following the Benjamini-Hochberg correction for multiple testing. The fitted log(CPM) values for the subset of genes that were differentially expressed in the LCL samples relative to the Burkitt's lymphoma samples were then clustered using the Euclidean distance and the complete linkage algorithm to detect groups of co-expressed genes. The expression heat map displays the top 250 differentially expressed genes.
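
Three of the steps named in this caption (the CPM filter, the Manhattan distance, and the Benjamini-Hochberg adjustment) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' edgeR workflow: the function names and toy counts are hypothetical, the distance is computed on CPM values for simplicity, and Ward clustering and the `glmFit` likelihood ratio tests are not reproduced.

```python
from typing import List


def cpm(counts: List[int]) -> List[float]:
    """Counts per million for one sample (library size = sum of its counts)."""
    library_size = sum(counts)
    return [c * 1_000_000 / library_size for c in counts]


def keep_expressed(cpm_by_sample: List[List[float]], min_cpm: float = 1.0) -> List[int]:
    """Drop genes below `min_cpm` in the majority of samples; return kept indices."""
    n_samples = len(cpm_by_sample)
    kept = []
    for g in range(len(cpm_by_sample[0])):
        n_low = sum(1 for sample in cpm_by_sample if sample[g] < min_cpm)
        if n_low <= n_samples / 2:  # not low in a majority of samples
            kept.append(g)
    return kept


def manhattan(a: List[float], b: List[float]) -> float:
    """Manhattan (L1) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts: 3 samples x 4 genes, each sample summing to 1,000,000 reads.
toy_counts = [
    [500_000, 499_999, 1, 0],
    [400_000, 599_998, 2, 0],
    [600_000, 399_999, 0, 1],
]
toy_cpm = [cpm(sample) for sample in toy_counts]
kept_genes = keep_expressed(toy_cpm)
filtered = [[sample[g] for g in kept_genes] for sample in toy_cpm]
d01 = manhattan(filtered[0], filtered[1])
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(kept_genes, d01, adj)
```

In the toy data, the last gene is below 1 CPM in two of three samples and is filtered out, while the third gene is low in only one sample and is retained.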

Cited by

PhytoPipe: a phytosanitary pipeline for plant pathogen detection and diagnosis using RNA-seq data.
Hu X, Hurtado-Gonzales OP, Adhikari BN, French-Monar RD, Malapi M, Foster JA, McFarland CD. BMC Bioinformatics. 2023 Dec 13;24(1):470. doi: 10.1186/s12859-023-05589-2. PMID: 38093207.
Profiling of Microbial Landscape in Lung of Chronic Obstructive Pulmonary Disease Patients Using RNA Sequencing.
Shin D, Kim J, Lee JH, Kim JI, Oh YM. Int J Chron Obstruct Pulmon Dis. 2023 Nov 10;18:2531-2542. doi: 10.2147/COPD.S426260. eCollection 2023. PMID: 38022823.
RNA-seq data science: From raw data to effective interpretation.
Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J, Muszyńska A, Munteanu V, Yang H, Rotman J, Tao L, Balliu B, Tseng E, Eskin E, Zhao F, Mohammadi P, Łabaj PP, Mangul S. Front Genet. 2023 Mar 13;14:997383. doi: 10.3389/fgene.2023.997383. eCollection 2023. PMID: 36999049. Review.
Computational Studies of the Intestinal Host-Microbiota Interactome.
Christley S, Cockrell C, An G. Computation (Basel). 2015 Mar;3(1):2-28. doi: 10.3390/computation3010002. Epub 2015 Jan 14. PMID: 34765258.
Assessment of viral RNA in idiopathic pulmonary fibrosis using RNA-seq.
Yin Q, Strong MJ, Zhuang Y, Flemington EK, Kaminski N, de Andrade JA, Lasky JA. BMC Pulm Med. 2020 Apr 3;20(1):81. doi: 10.1186/s12890-020-1114-1. PMID: 32245461.
