Patched 0 SNPs -> an old .tbi index file seems to be the culprit. #24
Description
Similar to #19, but in this case, the same chromosome names are used, so that shouldn't be the issue. I'm using the genome that was used to create the VCF files, as provided by Sanger.
I'm using g2gtools 0.2.7, installed via: conda install -c kbchoi g2gtools=0.2.7=py36_0
If I download these VCF and FASTA files for CAST:
ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/strain_specific_vcfs/CAST_EiJ.mgp.v5.snps.dbSNP142.vcf.gz
ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/strain_specific_vcfs/CAST_EiJ.mgp.v5.indels.dbSNP142.normed.vcf.gz
ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa
And then start the first two steps to create a transcriptome:
g2gtools vcf2vci -p 12 -i snps/CAST_EiJ.mgp.v5.snps.dbSNP142.vcf.gz \
-i snps/CAST_EiJ.mgp.v5.indels.dbSNP142.normed.vcf.gz \
-o snps/CAST_EiJ.vci -s CAST_EiJ -f fasta/GRCm38_68.fa --pass --quality
g2gtools patch -p 12 -i fasta/GRCm38_68.fa -c snps/CAST_EiJ.vci.gz \
-o fasta/CAST_EiJ.patched.fa 2> fasta/CAST_EiJ.patched.log
The log output says:
==> fasta/CAST_EiJ.patched.log <==
[g2gtools] Patched 0 SNPs total
[g2gtools] Patch complete: 00:00:14.33
And the output FASTA is identical to the reference.
If I exclude the indels file, I get:
==> fasta/CAST_EiJ.patched.log <==
[g2gtools] Patched 552,805 SNPs total
[g2gtools] Patch complete: 00:00:18.53
I've tried re-ordering the SNP and indel files, and I've tried with and without the --pass and --quality filters. I've also tried merging the two VCFs into one file (removing the header of the second one).
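Since the title points at an old .tbi index as the likely culprit, here is a rough sketch of how one might check for that before running vcf2vci. It assumes the VCFs sit under snps/ as in the commands above, that each index is stored next to its VCF as <file>.vcf.gz.tbi, and that tabix (from htslib) is on the PATH. The helper name index_is_stale is made up for this example, and comparing modification times is only a heuristic: an index can be out of date even when its timestamp is newer.

```shell
# Hypothetical helper: succeeds (exit 0) if the .tbi index next to a VCF
# is missing or has an older modification time than the VCF itself.
index_is_stale() {
    [ ! -e "$1.tbi" ] || [ "$1.tbi" -ot "$1" ]
}

for vcf in snps/CAST_EiJ.mgp.v5.snps.dbSNP142.vcf.gz \
           snps/CAST_EiJ.mgp.v5.indels.dbSNP142.normed.vcf.gz; do
    [ -e "$vcf" ] || continue      # skip files that are not present
    if index_is_stale "$vcf"; then
        echo "re-indexing $vcf" >&2
        tabix -f -p vcf "$vcf"     # -f overwrites an existing index
    fi
done
```

If the timestamps look fine but the index is still suspect, deleting the .tbi and running tabix -p vcf on the file again should rebuild it from scratch.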