Hello,
I wanted to ask what solutions are out there for random access to BAM files via HTTP.
Of course, the first answer here is samtools/htslib/pysam, but the current version of htslib creates open-ended range GET requests, and those requests lead to inflated egress costs when working on S3 infrastructure.
I described this behavior here:
I was curious whether anybody else has experienced this behavior and maybe has a workaround for it.
IGV/IGVjs creates clean range requests when accessing data via HTTP, but I don't see an option to use this functionality outside of those programs, for example in a pipeline or a command-line tool.
A solution could be to parse the .bai file and derive the byte range to request from that data; maybe somebody has some code to share.
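To make the .bai idea concrete, here is a rough Python sketch of how it might look. It is untested against real data, and the function names (reg2bins aside, which follows the binning scheme in the SAM specification) are made up for illustration. It reads the bin index from the raw .bai bytes, collects the chunks of every bin that may overlap a region, and turns the smallest and largest virtual offsets into a closed byte range. Because the index does not record how long the last BGZF block is, the sketch assumes the worst case and pads the end by the 64 KiB maximum block size. It ignores the linear index, so it may request more bytes than strictly needed.

```python
import struct

BGZF_MAX_BLOCK = 65536  # a BGZF block is at most 64 KiB


def reg2bins(beg, end):
    """Bins that may overlap the 0-based half-open region [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


def parse_bai(data):
    """Parse raw .bai bytes into one {bin: [(chunk_beg, chunk_end), ...]} per reference."""
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI index")
    (n_ref,) = struct.unpack_from("<i", data, 4)
    pos = 8
    refs = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        bins = {}
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            # Each chunk is a pair of 64-bit BGZF virtual offsets.
            bins[bin_id] = [
                struct.unpack_from("<QQ", data, pos + 16 * i) for i in range(n_chunk)
            ]
            pos += 16 * n_chunk
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4 + 8 * n_intv  # skip the linear index (not used in this sketch)
        refs.append(bins)
    return refs


def byte_range(refs, ref_id, beg, end):
    """Closed (first_byte, last_byte) covering all candidate chunks, or None."""
    chunks = [c for b in reg2bins(beg, end) for c in refs[ref_id].get(b, [])]
    if not chunks:
        return None
    # The upper 48 bits of a virtual offset are the compressed file offset.
    first = min(c[0] for c in chunks) >> 16
    # Assumption: pad by one maximum-size block so the last block is complete.
    last = (max(c[1] for c in chunks) >> 16) + BGZF_MAX_BLOCK - 1
    return first, last


def range_header(first, last):
    """HTTP header for a closed range request (both ends inclusive)."""
    return {"Range": f"bytes={first}-{last}"}
```

The resulting header could be passed to any HTTP client. The point is that it has the closed form bytes=first-last rather than the open-ended bytes=first-. The BAM header at the start of the file would still have to be fetched separately, and the downloaded bytes would need to be decompressed block by block and filtered to the exact region.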
Happy about any feedback on this topic.
Best,
Stephan
Hi Lucas, I am looking for similar functionality, as I'm working with a large volume of CRAM files on S3, and downloading them whole would cost tens of thousands of dollars. Meanwhile, studying a gene locus only requires downloading a few MB of data per individual. Have you been able to figure out a workaround or a simple tool to download just the byte range for a specified genetic locus?
Thanks