I have the following information from blastx annotations of bacterial genes predicted by prodigal:
Sequence name	Sequence desc.	Sequence length	Hit desc.	Hit ACC
gene_1_contig_1	excinuclease ABC subunit A	228	gi|1055624747|ref|WP_067265422.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054]	WP_067265422, KZY51396
gene_2_contig_1	excinuclease ABC subunit A	210	gi|1055651942|ref|WP_067291557.1| excinuclease ABC subunit A [Sulfitobacter sp. EhC04] gi|1032103716|gb|OAN76192.1| excinuclease ABC subunit A [Sulfitobacter sp. EhC04]	WP_067291557, OAN76192
gene_3_contig_1	MFS transporter	432	gi|1055624744|ref|WP_067265419.1| MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]	WP_067265419, KZY51395
gene_4_contig_1	MFS transporter	561	gi|1055624744|ref|WP_067265419.1| MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]	WP_067265419, KZY51395
I wish to fetch gene symbols using the information (either the gi identifiers or the protein accessions) from the blastx results, maybe using Entrez efetch.
So, the result would be as below:
Gene name	Gene symbol
excinuclease ABC subunit A	UvrA
See the link here. However, I am not sure how to proceed in this case. Can anybody please suggest something?
Hi Vijay, did you try using BioMart? It has some useful functions to fetch gene symbols.
The gene symbol appears to have been included in the description: https://www.ncbi.nlm.nih.gov/protein/1055624747/
Unfortunately, that is not true for all the entries I have. That would have saved a lot of time.
How about db2db where you would convert RefSeq Protein Accession to Gene ID? https://biodbnet-abcc.ncifcrf.gov/db/db2db.php
You could do something like:
The problem is that you are dealing with WP_* entries, which are non-redundant protein entries from multiple strains etc., so the gene symbol is not separately annotated.
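For anyone who wants to try the efetch route from the question, here is a minimal Python sketch, not a tested pipeline. It assumes the E-utilities efetch endpoint with db=protein, rettype=gp and retmode=text (GenPept flat files), and it assumes that a gene symbol, when one is annotated at all, shows up as a /gene="..." qualifier in the record. The function names (efetch_url, parse_gene_symbols, fetch_gene_symbols) are made up for illustration. For WP_* records the qualifier will often be missing, and the function then returns None for that accession.

```python
import re
import urllib.parse
import urllib.request

# E-utilities efetch endpoint (assumed to be the one you want to query).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions):
    """Build an efetch URL that requests GenPept flat files for protein accessions."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "gp",
        "retmode": "text",
    }
    return EUTILS_EFETCH + "?" + urllib.parse.urlencode(params)


def parse_gene_symbols(genpept_text):
    """Map each record's VERSION accession to its first /gene qualifier, or None."""
    symbols = {}
    # GenPept records are terminated by a line containing only "//".
    for record in genpept_text.split("\n//"):
        version = re.search(r"^VERSION\s+(\S+)", record, re.MULTILINE)
        if not version:
            continue  # trailing whitespace after the last record
        gene = re.search(r'/gene="([^"]+)"', record)
        symbols[version.group(1)] = gene.group(1) if gene else None
    return symbols


def fetch_gene_symbols(accessions):
    """Download the records (needs network access) and parse out gene symbols."""
    with urllib.request.urlopen(efetch_url(accessions)) as response:
        return parse_gene_symbols(response.read().decode("utf-8"))
```

You would call it as fetch_gene_symbols(["WP_067265422", "KZY51396"]) and get back a dict keyed by versioned accession. Keep the batches small and stay within NCBI's request limits (3 requests per second without an API key). Anything that comes back as None would still need another lookup, for example the db2db conversion suggested above.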