Question

Obtaining sequence from Bioproject IDs using biopython gives unknown sequence

9.4 years ago

Prasad ▴ 50

Hi All,

I have a list of BioProject IDs and would like to get the corresponding sequences. I am following the steps below:

1. Using the BioProject ID, I get the GI ID with elink:

from Bio import Entrez

handle = Entrez.elink(dbfrom="bioproject", db="nuccore", id=bioprojecID, linkname="bioproject_nuccore_wgsmaster")
record = Entrez.read(handle)
# LinkSetDb and Link are both lists, so take the first entry of each
GI_ID = record[0]["LinkSetDb"][0]["Link"][0]["Id"]

2. Then I try to get the sequence for GI_ID (using the Entrez.efetch and SeqIO modules in Biopython):

from Bio import SeqIO

handle = Entrez.efetch(db="nucleotide", id=GI_ID, rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")

But printing the record shows an unknown sequence.
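
The indexing in step 1 can be sketched as follows. This is a minimal sketch, assuming the parsed elink result has the usual nested shape: a list of link sets, each with a "LinkSetDb" list whose entries hold a "Link" list of {"Id": ...} items. The helper name linked_ids and the sample IDs are made up for illustration, and no network call is made.

```python
def linked_ids(linksets):
    """Collect every linked ID from a parsed elink result."""
    ids = []
    for linkset in linksets:
        # "LinkSetDb" is a list; it is empty when nothing is linked
        for linksetdb in linkset.get("LinkSetDb", []):
            for link in linksetdb.get("Link", []):
                ids.append(link["Id"])
    return ids


# Hypothetical stand-in for Entrez.read(handle); the IDs are invented.
example = [
    {
        "LinkSetDb": [
            {
                "LinkName": "bioproject_nuccore_wgsmaster",
                "Link": [{"Id": "111"}, {"Id": "222"}],
            }
        ]
    }
]
print(linked_ids(example))
```

Looping over the lists instead of indexing a single entry also covers BioProjects that link to several records, or to none.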

Can anyone advise if this is the right way to do it or is there a better way to obtain related sequences from bioproject IDs?

Thanks in advance!

efetch elink biopython eutilities • 4.3k views

updated 22 months ago by Ram 44k • written 9.4 years ago by Prasad ▴ 50

Ram · Answer 1 · 2015-06-07

9.4 years ago

Kirill Tsyganov ▴ 370

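I can help with the SeqIO part, assuming that your "handle" is a GenBank file or an open handle to one.

from Bio import SeqIO

# SeqIO.parse accepts a filename or an open handle, so no open() call is needed
for record in SeqIO.parse(handle, "genbank"):
    print(record.id, record.seq)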

For more options, put this inside the loop:

    print(dir(record))
    break

This prints the attributes and methods available on the record object, so you can see what other information you can pull out of your file (handle).
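
The dir() output also includes many underscore-prefixed names, so it can help to filter them out. A minimal sketch, not tied to Biopython: the helper public_names and the DemoRecord class are hypothetical stand-ins for a real parsed record.

```python
def public_names(obj):
    """List attribute and method names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class DemoRecord:
    """Hypothetical stand-in for a parsed record."""

    def __init__(self):
        self.id = "demo"
        self.seq = "ACGT"


print(public_names(DemoRecord()))
```

Calling public_names(record) inside the loop should give a shorter list than a bare dir(record).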

updated 22 months ago by Ram 44k • written 9.4 years ago by Kirill Tsyganov ▴ 370

Hi, thanks for replying. I tried printing record.seq, but the output is just a long run of 'N' characters.

9.4 years ago by Prasad ▴ 50

It is very common to have a run of 'N' characters at the start of a sequence. Each chromosome may begin with a stretch of Ns that can be hundreds or thousands of bases long. Scroll further down into your sequence.
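
Rather than scrolling, the Ns can be counted directly once the sequence is available as a plain string. A minimal sketch: the helper name n_summary is hypothetical and the input string is invented. If every base turns out to be 'N', the record may carry no real sequence data at all, rather than just a padded start.

```python
def n_summary(sequence):
    """Return (leading_n_run, total_n, length) for a nucleotide string."""
    seq = sequence.upper()
    # Length of the run of Ns at the very start of the sequence
    leading = len(seq) - len(seq.lstrip("N"))
    return leading, seq.count("N"), len(seq)


leading, total, length = n_summary("NNNNACGTNNAC")
print(leading, total, length)
```

Comparing the total N count with the length shows whether any called bases are present.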

updated 22 months ago by Ram 44k • written 9.4 years ago by Kirill Tsyganov ▴ 370