How to Handle and Merge Feature Counts for Mixed Single-End and Paired-End RNA-Seq Data? Post:
1
0
Entering edit mode
7 weeks ago
roxana ▴ 10

Hello everyone,

I’m working on RNA-Seq data analysis and encountered an issue when using the featureCounts tool from the Subread package. I have two types of RNA-Seq data:

The first three samples are single-end reads (control samples). The next three samples are paired-end reads (disease samples). I attempted to use the following featureCounts command:

featureCounts -a Homo_sapiens.GRCh38.112.gtf -F GTF -p -C -T 8 -G Homo_sapiens.GRCh38.dna.primary_assembly.fa -o Sixsamplecontrolearly-PCOS.txt \
/home/tj/SRR868862/SRR868862Aligned_sorted.bam \
/home/tj/SRR868881/SRR868881Aligned_sorted.bam \
/home/tj/SRR868884/SRR868884Aligned_sorted.bam \
/home/tj/SRR868859/Aligned_sorted.bam \
/home/tj/SRR868863/SRR868863Aligned_sorted.bam \
/home/tj/SRR868873/SRR868873Aligned_sorted.bam

However, I ran into a problem because the -p flag, which enables paired-end counting, causes an issue with the single-end reads in the first three samples. The -p flag forces featureCounts to treat all the BAM files as paired-end data, which is incorrect for the single-end samples.

Solution Approach: After some research and advice, I realized that the correct approach is to handle single-end and paired-end samples separately:

Step 1: Run featureCounts Separately for Single-End and Paired-End Samples For Single-End Samples (Control Samples): Run featureCounts without the -p flag:

featureCounts -a Homo_sapiens.GRCh38.112.gtf -F GTF -T 8 -G Homo_sapiens.GRCh38.dna.primary_assembly.fa -o ControlSamplesFeatureCounts.txt \
/home/tj/SRR868862/SRR868862Aligned_sorted.bam \
/home/tj/SRR868881/SRR868881Aligned_sorted.bam \
/home/tj/SRR868884/SRR868884Aligned_sorted.bam

For Paired-End Samples (Disease Samples): Run featureCounts with the -p flag for the paired-end data:

featureCounts -a Homo_sapiens.GRCh38.112.gtf -F GTF -p -C -T 8 -G Homo_sapiens.GRCh38.dna.primary_assembly.fa -o DiseaseSamplesFeatureCounts.txt \
/home/tj/SRR868859/Aligned_sorted.bam \
/home/tj/SRR868863/SRR868863Aligned_sorted.bam \
/home/tj/SRR868873/SRR868873Aligned_sorted.bam

Step 2: Merge the Resulting Count Files After running featureCounts separately, I needed to merge the two resulting count files into a single count matrix for downstream analysis. Here are a few ways to accomplish this:

Using Command-Line Tools (paste or join):

Using paste: Merges files side by side:

paste ControlSamplesFeatureCounts.txt DiseaseSamplesFeatureCounts.txt > MergedFeatureCounts.txt
Using join: Merges based on a common field (gene ID):


sort -k1,1 ControlSamplesFeatureCounts.txt > sorted_ControlSamplesFeatureCounts.txt
sort -k1,1 DiseaseSamplesFeatureCounts.txt > sorted_DiseaseSamplesFeatureCounts.txt
join -1 1 -2 1 sorted_ControlSamplesFeatureCounts.txt sorted_DiseaseSamplesFeatureCounts.txt > MergedFeatureCounts.txt

Using R:

# Load count data from both files
control_counts <- read.table("ControlSamplesFeatureCounts.txt", header = TRUE, row.names = 1)
disease_counts <- read.table("DiseaseSamplesFeatureCounts.txt", header = TRUE, row.names = 1)

# Merge the data frames by row names (gene IDs)
merged_counts <- merge(control_counts, disease_counts, by = "row.names", all = TRUE)

# Set row names back to gene IDs after merging
rownames(merged_counts) <- merged_counts$Row.names
merged_counts <- merged_counts[, -1]  # Remove redundant row names column

# Write the merged data to a new file
write.table(merged_counts, file = "MergedFeatureCounts.txt", sep = "\t", quote = FALSE)

My Question: Are there any better or more efficient methods to handle and merge count files from mixed single-end and paired-end RNA-Seq samples? Are there any potential pitfalls with this approach that I should be aware of?

I appreciate any insights or suggestions from the community!

Thank you!

RNA-Seq • 355 views
ADD COMMENT
0
Entering edit mode

Interstingly, the SRA records for both your control and disease samples claim to be single ended.

ADD REPLY
1
Entering edit mode
7 weeks ago

You can't really hand mixed single and paired-end reads together in the same experiment, parituclarly if your control samples are single ended, and your treatment samples are pair-ended. The best way to proceed is to discard the read2s from the treatment samples, and only use the read1s.

ADD COMMENT
0
Entering edit mode

Also, if read lengths are different, trim the longer so it matches the other samples.

Are your u sure these are appropriate controls? Library prepped at the same time as the treatments?

ADD REPLY

Login before adding your answer.

Traffic: 1492 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6