Question

Kallisto-Bustools Output problems

10 months ago

myoui3122010 ▴ 20

I used Kallisto-Bustools to quantify snRNA-seq data. The output gives gene names (Ensembl gene IDs), whereas most tutorials seem to show gene symbols.

First 10 lines of t2g.txt input file:

ENST00000456328.2 ENSG00000223972.5 DDX11L1
ENST00000450305.2 ENSG00000223972.5 DDX11L1
ENST00000488147.1 ENSG00000227232.5 WASH7P
ENST00000619216.1 ENSG00000278267.1 MIR6859-1
ENST00000473358.1 ENSG00000243485.5 MIR1302-2HG
ENST00000469289.1 ENSG00000243485.5 MIR1302-2HG
ENST00000607096.1 ENSG00000284332.1 MIR1302-2
ENST00000417324.1 ENSG00000237613.2 FAM138A
ENST00000461467.1 ENSG00000237613.2 FAM138A
ENST00000606857.1 ENSG00000268020.3 OR4G4P

First 10 lines of genes.txt output file:

ENSG00000001460.18
ENSG00000001461.17
ENSG00000010072.16
ENSG00000008118.10
ENSG00000009780.15
ENSG00000048707.15
ENSG00000034971.17
ENSG00000059588.10
ENSG00000041988.15
ENSG00000049245.13

a. Would swapping the positions of the gene name and gene symbol columns give valid (actually correct) gene names in the output?
b. Are there any other possible solutions besides manually changing the names?

Thanks for your time.

Bustools Kallisto • 757 views

score 1 · Accepted Answer · 2023-12-21

10 months ago

dsull ★ 6.9k

In kb-python (around version 0.27.3 or so), you could run kb count with the --gene-names option to get gene names instead of gene IDs.
Yes, swapping the gene name and symbol in the t2g.txt file would work. Be careful if you do so: if a gene name field is empty (i.e. the gene ID has no corresponding gene name associated with it), errors will arise.
In any case, you could use R or Python to convert the gene IDs in your genes.txt into the corresponding gene names based on the t2g.txt file.
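As a minimal sketch of that last option in Python: the functions below read t2g.txt (transcript ID, gene ID, gene name, as in the excerpt in the question) and rewrite genes.txt line by line with gene names. The function names and the output filename are illustrative assumptions, not part of kb-python. Where the name field is empty or the ID is absent from t2g.txt, the gene ID is kept, which sidesteps the empty-field problem mentioned above.

```python
def _split_fields(line):
    # t2g files are normally tab-separated; fall back to any whitespace
    # because the excerpt in the question is displayed with spaces.
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split()


def load_id_to_name(t2g_path):
    """Map gene ID (column 2) to gene name (column 3).

    If the name field is missing or empty, the gene ID itself is used.
    """
    id_to_name = {}
    with open(t2g_path) as handle:
        for line in handle:
            fields = _split_fields(line)
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            gene_id = fields[1]
            name = fields[2].strip() if len(fields) > 2 else ""
            # Many transcripts share one gene; keep the first entry seen.
            id_to_name.setdefault(gene_id, name or gene_id)
    return id_to_name


def convert_genes(genes_path, t2g_path, out_path):
    """Write one gene name per line, in the same order as genes_path."""
    id_to_name = load_id_to_name(t2g_path)
    with open(genes_path) as src, open(out_path, "w") as dst:
        for line in src:
            gene_id = line.strip()
            # Unknown IDs are passed through unchanged so that the line
            # count and order still match the count matrix.
            dst.write(id_to_name.get(gene_id, gene_id) + "\n")
```

Calling convert_genes("genes.txt", "t2g.txt", "gene_symbols.txt") would then produce a file aligned row for row with genes.txt. Note that different gene IDs can share the same gene name, so the resulting names are not guaranteed to be unique.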

(Side note: a new version of kb-python, 0.28.0, is out, in which a gene names file is output automatically by default. It requires upgrading your index and several other things, but it improves memory usage and accuracy.)

I installed version 0.28.0, but the index could not be built properly (extremely small files, no errors). I downloaded a prebuilt index to run the program, but it threw the following error:

kb count -i index.idx -g t2g.txt -c1 cdna_t2c.txt -c2 intron_t2c.txt -x 10xv3 -o output -t 4 --workflow nucleus /mnt/d/Raw/GSE219280/GSM6781917/SRR22512224/SRR22512224_1.fastq /mnt/d/Raw/GSE219280/GSM6781917/SRR22512224/SRR22512224_2.fastq

usage: kb [-h] [--list] <CMD> ...
kb: error: --sum incompatible with lamanno/nucleus

I used awk to create a new file with the corresponding gene symbols, and the result is satisfactory. Version 0.26.0 was installed by default, so I had to create a new conda environment to upgrade to 0.27.3. The upgrade gave a good increase in speed.

Thank you very much, have a great day.

Reply • 10 months ago by myoui3122010 ▴ 20

This is as I said: there are many changes you need to make in order to use kb-python 0.28.0. For one, --workflow nucleus is no longer supported. We describe the usage of 0.28.0 in a new preprint.

If you already have 0.27.3 working, just go with that.

Reply • 10 months ago by dsull ★ 6.9k