Hello everyone, I am new to bioinformatics, and I am trying to retrieve a .fasta file of a gene located on human chr6 from a published Neandertal sequence. The file I am starting with is a .bam file containing the chr6 sequencing data of a single organism.
I searched and found that a useful tool for this is samtools, which I have already successfully installed. I tried to view the file with the following syntax:
samtools view filename.bam chr6:posxx-posxx+1
I got an error saying I should index it, so I proceeded to do so, running the required sort command before indexing, as follows:
samtools sort filename.bam -o file.sorted
samtools index filename.sorted
After sorting I got a single file with the .sorted extension, to which I then applied the index command. This gave me a filename.sorted.bai file plus two other tmp.xxxx.bam files. I assumed the filename.sorted.bai file was the indexed file, since it was the result of my last command.
After this, I am trying to view the desired genic region to check whether it is present in the sequencing data, before applying mpileup to the files and retrieving my desired .fasta file from the sequence.
According to the first error message I got, an indexed file should be viewable with the view command, so I ran the following command:
samtools view filename.sorted.bai chr6:xx-xx+1
and it gives me the following error:
[E::hts_hopen] Failed to open file filename.sorted.bai
[E::hts_open_format] Failed to open file filename.sorted.bai
samtools view: failed to open "filename.sorted.bai", for reading: Exec format error
I tried applying the same syntax to other files, for example the temporary files, and got the following error:
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files
... I have gone back to sort and index the same original filename.bam file, but asking for output in .bam format at the end rather than the .bai, which is supposed to be the default output format when indexing. Nothing has worked.
In the end I tried viewing the whole file instead of asking for a region, and it worked nicely with a filename.bam file, which was not indexed, just sorted, and also with a filename.sorted.bam.temp.xxxx.bam file, using the following commands:
samtools view filename.bam
samtools view filename.sorted.bam.temp.xxxx.bam
Both commands retrieved a huge amount of reads, with headers and everything, but when I narrow my command by giving the region it does not work!
Perhaps I am missing something with the REGION syntax? Do I have to perform any processing on the filename.bai file before trying to view a specific region? I would appreciate any help! I am using samtools version 1.6.
I apologize for the length of my post, but I tried to be as explicit as possible about the errors and the pipeline I followed, for the sake of clarity.
Which version of samtools are you using? Have you tried the following?
samtools view filename.sorted.bam chr6:xx-xx
You do not use the .bai file. You just need to have it available in the same directory.
I am using samtools-1.6
I haven't got any "sorted.bam" file... Anyway, are you suggesting that I use the file I got after the "samtools sort filename.bam" command?
That name is just a placeholder. If you were able to successfully sort your bam file, then use whatever file name you have for that sorted file.
Since you have the .tmp files still around, you were NOT able to successfully complete the sorting of the bam file, so you would need to repeat that. The .tmp files should be deleted automatically once the sorting is successful. I don't think the order of the arguments is critical, but if you repeat, try:
samtools sort -o file_sorted.bam original.filename.bam
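For anyone following along, here is a minimal shell sketch of re-running the sort and only moving on to indexing and viewing once the previous step has succeeded. The function name `sort_index_view` is made up for this illustration, and the file name and region are placeholders, as elsewhere in this thread.

```shell
#!/bin/sh
# Sort, index, then query a region. Each step runs only if the one before it succeeded.
# sort_index_view is a hypothetical helper, not part of samtools.
sort_index_view() {
    in_bam="$1"                             # original, unsorted BAM
    region="$2"                             # placeholder region, e.g. chr6:START-END
    sorted_bam="${in_bam%.bam}.sorted.bam"  # filename.bam -> filename.sorted.bam

    # 1. Coordinate-sort; -o names the output file explicitly.
    samtools sort -o "$sorted_bam" "$in_bam" || { echo "sort failed" >&2; return 1; }

    # 2. Index the sorted BAM; by default this writes "$sorted_bam.bai" next to it.
    samtools index "$sorted_bam" || { echo "index failed" >&2; return 1; }

    # 3. Query the sorted BAM itself, not the .bai file.
    samtools view "$sorted_bam" "$region"
}

# Example call (placeholder coordinates):
# sort_index_view filename.bam chr6:1000-2000
```

If the sort step stops with an error, nothing is indexed, which avoids building an index on an incomplete file.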
Thanks a lot, it is useful to know that I should use the sorted file instead of the .bai one... Still, I think I should be able to run that syntax only after indexing and having the .bai file in my folder, right?
I had worked through the pipeline a couple of times, so now I have deleted everything but the original .bam file and am repeating the tasks. I hope it works this time!
Oh, and... those temp files appeared after the index command, not sort, but I get your idea.
You need to first sort the original bam. Let that complete (no .tmp files should remain). Then index the sorted file. Finally do:
samtools view filename.sorted.bam chr6:xx-xx
That is the correct order.
As I told shussainather below:
Thanks, I tried with the sorted file, and it seems something different happened; now it returns:
[main_samview] region "chr6:xx-xx+1" specifies an unknown reference name. continue anyway.
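This warning means the reference name given in the region was not found among the sequence names in the BAM header. The thread does not show what the reference is called in this file, so the following is only a guess: some BAM files name the chromosome `6` rather than `chr6`. A small sketch for listing the names that are actually in the header is below. The helper `list_reference_names` is made up for this illustration; it just reads SAM header text and prints the `SN` value of every `@SQ` line.

```shell
#!/bin/sh
# Print the reference names (SN fields of @SQ lines) from SAM header text on stdin.
# list_reference_names is a hypothetical helper, not a samtools command.
list_reference_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)   # drop the "SN:" prefix
    }'
}

# Typical use, assuming the sorted BAM from the steps above:
# samtools view -H filename.sorted.bam | list_reference_names
```

Whatever name is printed there is the one to use before the colon in the region argument.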