Indexing the Drosophila genome for kallisto
4 months ago
gogeni5529 ▴ 50

I want to index the Drosophila genome to run kallisto (v0.50).

I was wondering which FASTA file I should download from the Ensembl FTP site: the dna.toplevel FASTA or the cdna.all FASTA?

If I take the DNA file, the command would then be, e.g., kallisto index -i Dme.BDGP6.46.idx Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa.gz

Is that correct?

4 months ago
gogeni5529 ▴ 50

Thanks, dsull, for the explanation; that makes sense. I managed to build the Drosophila index using the following command (copied from your GitHub repository):

kb ref --workflow=standard -i index.idx -g t2g.txt -f1 Drosophila_melanogaster.BDGP6.46.cdna.fa \
  --include-attribute gene_biotype:protein_coding \
  --include-attribute gene_biotype:lncRNA \
  --include-attribute gene_biotype:lincRNA \
  --include-attribute gene_biotype:antisense \
  --include-attribute gene_biotype:IG_LV_gene \
  --include-attribute gene_biotype:IG_V_gene \
  --include-attribute gene_biotype:IG_V_pseudogene \
  --include-attribute gene_biotype:IG_D_gene \
  --include-attribute gene_biotype:IG_J_gene \
  --include-attribute gene_biotype:IG_J_pseudogene \
  --include-attribute gene_biotype:IG_C_gene \
  --include-attribute gene_biotype:IG_C_pseudogene \
  --include-attribute gene_biotype:TR_V_gene \
  --include-attribute gene_biotype:TR_V_pseudogene \
  --include-attribute gene_biotype:TR_D_gene \
  --include-attribute gene_biotype:TR_J_gene \
  --include-attribute gene_biotype:TR_J_pseudogene \
  --include-attribute gene_biotype:TR_C_gene \
  Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa Drosophila_melanogaster.BDGP6.46.112.gtf
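The biotype list in that command reads like a template written for vertebrate annotations: the IG_* and TR_* values are immunoglobulin and T-cell receptor gene biotypes, which a fly GTF should not contain. As I understand kb-python's --include-attribute option, only GTF entries matching one of the listed key:value pairs are kept, so it seems worth checking which gene_biotype values the Drosophila GTF actually uses. Below is a minimal bash sketch, assuming Ensembl-style gene_biotype attributes on the "gene" rows; list_gene_biotypes is a hypothetical helper name, not part of kb-python or kallisto.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count GTF "gene" records per gene_biotype value.
# Assumes Ensembl-style attributes such as: gene_biotype "protein_coding";
list_gene_biotypes() {
  local gtf="$1"
  # gzip -cdf decompresses .gz input and passes plain text through unchanged;
  # awk extracts the biotype, then sort/uniq tallies it (most common first).
  gzip -cdf -- "$gtf" |
    awk -F'\t' '$3 == "gene" && match($9, /gene_biotype "[^"]+"/) {
      # Drop the 14-character prefix up to the opening quote, and the closing quote.
      print substr($9, RSTART + 14, RLENGTH - 15)
    }' |
    sort | uniq -c | sort -k1,1nr
}
```

Running e.g. list_gene_biotypes Drosophila_melanogaster.BDGP6.46.112.gtf and comparing the output against the --include-attribute list shows which listed biotypes match nothing and which fly biotypes would be left out of the index.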
4 months ago
dsull ★ 6.9k

You should use the cdna one.

kallisto index works on the reference TRANSCRIPTOME (i.e., the cDNA), not the genome. The dna.toplevel file is the genome, so that's not what you want.
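Because kallisto index expects transcript sequences, a quick sanity check before indexing is to look at the first FASTA header. The sketch below assumes Ensembl's header layout, where the second whitespace-separated field is cdna for transcript files and starts with dna: for genome files (e.g. dna:chromosome or dna:primary_assembly); fasta_kind is a hypothetical helper, and files from other providers will simply come back as unknown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess whether an Ensembl FASTA holds transcripts or a genome.
# Ensembl cDNA header:   >FBtr... cdna <coord_system>:BDGP6.46:...
# Ensembl genome header: >2L dna:<coord_system> <coord_system>:BDGP6.46:...
fasta_kind() {
  local fa="$1" header
  # gzip -cdf reads both .fa and .fa.gz; keep only the first header line.
  header=$(gzip -cdf -- "$fa" | grep -m 1 '^>')
  # Classify on the second whitespace-separated field of that header.
  case "$(printf '%s\n' "$header" | awk '{print $2}')" in
    cdna)       echo "transcriptome" ;;
    dna|dna:*)  echo "genome" ;;
    *)          echo "unknown" ;;
  esac
}
```

If it reports transcriptome, the file can be indexed directly, e.g. kallisto index -i Dme.BDGP6.46.cdna.idx Drosophila_melanogaster.BDGP6.46.cdna.all.fa.gz (kallisto reads gzipped FASTA; the index file name is arbitrary).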


But if you look at the examples given by the people who created the example indices, they also used the DNA rather than the cDNA FASTA to create the index in version 0.50.

See kallisto-transcriptome-indices.

The files they use are all DNA primary-assembly files, not cDNA.


I was the one who created those indices and I replied to you on the GitHub issues. My reply is reproduced below:

These indices were created by kb-python: kb-python takes in the GENOME fasta (together with a GTF annotation) and extracts a TRANSCRIPTOME fasta from it, then calls the kallisto index command on the TRANSCRIPTOME fasta it just extracted.

If you don't use kb-python and simply stick with calling the kallisto index command, then you should be using the TRANSCRIPTOME fasta. The kallisto index command always uses the TRANSCRIPTOME fasta.
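To make the genome-to-transcriptome step more concrete, here is a toy bash/awk sketch of the general idea: take each transcript's exon rows from the GTF, cut those intervals out of the genome sequence, join them in genomic order, and reverse-complement transcripts on the minus strand. It is not kb-python's implementation; extract_transcripts is a hypothetical helper that loads the whole genome into memory and does no validation, so for real data kb ref (as in the command above) is the way to go.

```bash
#!/usr/bin/env bash
# Toy illustration only, NOT kb-python's code. extract_transcripts is hypothetical.
# GTF coordinates are 1-based and inclusive.
extract_transcripts() {
  local genome="$1" gtf="$2"
  # Step 1 (awk): keep exon rows, emit transcript_id, chrom, start, end, strand.
  # Step 2 (sort): group exons by transcript, ordered by genomic start.
  # Step 3 (awk): splice exon sequences; reverse-complement minus-strand ones.
  awk -F'\t' '$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)
      print tid "\t" $1 "\t" $4 "\t" $5 "\t" $7
    }' "$gtf" |
    sort -t $'\t' -k1,1 -k3,3n |
    awk -F'\t' -v genome="$genome" '
      BEGIN {
        # Load each genome sequence into memory, keyed by the first header word.
        while ((getline line < genome) > 0) {
          if (line ~ /^>/) { split(substr(line, 2), h, " "); chr = h[1] }
          else seq[chr] = seq[chr] toupper(line)
        }
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N"
      }
      function revcomp(s,    i, r) {
        r = ""
        for (i = length(s); i > 0; i--) r = r comp[substr(s, i, 1)]
        return r
      }
      function emit(    out) {
        if (cur == "") return
        out = (strand == "-") ? revcomp(acc) : acc
        print ">" cur
        print out
      }
      {
        # A new transcript_id starts: flush the previous transcript first.
        if ($1 != cur) { emit(); cur = $1; acc = ""; strand = $5 }
        acc = acc substr(seq[$2], $3, $4 - $3 + 1)
      }
      END { emit() }'
}
```

For example, with a genome chr1 = AAACCCGGGTTT, a plus-strand transcript with exons 1-3 and 7-9 comes out as AAAGGG, and a minus-strand transcript covering 2-5 (AACC) comes out as its reverse complement, GGTT.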

