I am running a script on paired-end Illumina data that uses Trimmomatic to remove adaptor sequences and low-quality reads.
The script is below:
input_dir="/home/reads"
# Define the output directory
output_dir="trimmed"
# Create the output directory if it doesn't exist
mkdir -p "$output_dir"
# Loop through all files in the input directory that match the pattern *_1.fastq.gz
for file1 in "$input_dir"/*_1.fastq.gz; do
    # Extract the base name (two-letter code) from the file name
    base_name=$(basename "$file1" _1.fastq.gz)
    file2="${input_dir}/${base_name}_2.fastq.gz"
    # Check if the corresponding _2.fastq.gz file exists
    if [[ -f "$file2" ]]; then
        # Run the trimmomatic command
        trimmomatic PE "$file1" "$file2" \
            -baseout "${output_dir}/${base_name}.fastq.gz" \
            ILLUMINACLIP:adaptors.fasta:4:30:10 MINLEN:30
    else
        echo "Warning: Corresponding file for $file1 not found. Skipping."
    fi
done
The script runs for several hours on my cluster, but when I check the output files that should contain the reads removed by Trimmomatic, they are empty. I have also run FastQC on the sequence files before and after Trimmomatic, and the HTML reports are identical. The FastQC output shows that every file fails "Adapter Content" and contains a high percentage of Illumina Universal Adapter sequence.
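One way to check what each run actually produced is to count the reads in every output file rather than relying on file size. The sketch below assumes the usual -baseout naming, where a base name of XX.fastq.gz yields XX_1P, XX_1U, XX_2P and XX_2U files (the P files hold surviving pairs, the U files hold reads whose mate was dropped). The helper names count_reads and report_sample, and the sample name AB, are made up for illustration.

```shell
#!/usr/bin/env bash
# Count reads in a gzipped FASTQ file (a FASTQ record is 4 lines).
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Report read counts for the four files derived from one -baseout name.
# dir and base are supplied by the caller; nothing here is specific to Trimmomatic
# beyond the assumed _1P/_1U/_2P/_2U suffixes.
report_sample() {
    local dir="$1" base="$2" suffix f
    for suffix in 1P 1U 2P 2U; do
        f="${dir}/${base}_${suffix}.fastq.gz"
        if [[ -f "$f" ]]; then
            echo "${f}: $(count_reads "$f") reads"
        else
            echo "${f}: missing"
        fi
    done
}

# Example call (hypothetical sample name):
# report_sample trimmed AB
```

If the 1P and 2P counts equal the input read counts and the U files report 0 reads, no read was dropped, which would be consistent with the unchanged FastQC reports.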
Here is the sequence I am using for the Illumina Universal Adapter:
>IlluminaUniversalAdapterNA
AGATCGGAAGAG
Why is Trimmomatic not removing any reads?
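One thing that may be worth checking is whether a 12-base adaptor can ever reach the simple clip threshold of 10 given as the third ILLUMINACLIP value. The Trimmomatic manual describes each matching base as adding log10(4), about 0.6, to the alignment score, so the best possible score depends on adaptor length. This is a rough sketch of that arithmetic only, not of Trimmomatic's full alignment logic; the variable names are my own.

```shell
#!/usr/bin/env bash
# Best-case ILLUMINACLIP score for a perfect, full-length match of one adaptor,
# using the per-base match score of log10(4) described in the Trimmomatic manual.
adapter="AGATCGGAAGAG"       # sequence from the post
simple_clip_threshold=10     # third ILLUMINACLIP value in the post's command

# awk has only the natural log, so log10(4) is computed as log(4)/log(10).
max_score=$(awk -v n="${#adapter}" 'BEGIN { printf "%.2f", n * log(4) / log(10) }')
echo "adaptor length: ${#adapter}, best score: ${max_score}, threshold: ${simple_clip_threshold}"

# Compare numerically; awk exits 0 when the best score is below the threshold.
if awk -v s="$max_score" -v t="$simple_clip_threshold" 'BEGIN { exit !(s < t) }'; then
    echo "Even a perfect match stays below the threshold, so simple-mode clipping cannot trigger."
fi
```

Under this reading, either a longer adaptor sequence or a lower simple clip threshold would be needed before any clipping could happen.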
Thanks for the advice. It seems likely that I have the incorrect adaptor sequences, though I am still surprised that no low-quality reads were removed.
Perhaps there were no low quality reads either.
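Another possibility is that the command never asks for quality trimming: its only steps are ILLUMINACLIP and MINLEN, and MINLEN looks at read length, not base quality. A minimal sketch of a step list with a quality step added is below. SLIDINGWINDOW is a real Trimmomatic step, but the window size of 4 and mean quality of 20 are assumed example values, and the sample name is a placeholder. The command is printed rather than run so it can be inspected first.

```shell
#!/usr/bin/env bash
# Steps are applied in the order given, so MINLEN comes last and
# filters on the length left after clipping and quality trimming.
steps=(
    "ILLUMINACLIP:adaptors.fasta:4:30:10"   # unchanged from the post
    "SLIDINGWINDOW:4:20"                    # assumed example values, not from the post
    "MINLEN:30"                             # unchanged from the post
)

base_name="sample"   # placeholder sample name
cmd=(trimmomatic PE
     "/home/reads/${base_name}_1.fastq.gz" "/home/reads/${base_name}_2.fastq.gz"
     -baseout "trimmed/${base_name}.fastq.gz"
     "${steps[@]}")

# Print the assembled command on one line for review.
printf '%s ' "${cmd[@]}"
echo
```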