truncated scaffolds from gff files #70

simarilion · 2022-10-09T21:47:40Z

I'm sure I'm missing something obvious here.....

I'd like to view a long (583,249 bp long) but feature-poor scaffold (7 features). These features are described in the attached gff file which specifies the full length of the scaffold on the second line as:

##sequence-region scaffold59 1 583249

I can generate output files OK, but they don't display the entire length of the scaffold - they always stop at the end of the last feature (position 483,172), i.e. missing about 100,000 bp of straight black line (example output also attached).

Here's an example of one of my attempts to use dan_feature_viewer to get this working:

from dna_features_viewer import BiopythonTranslator
from BCBio import GFF
in_file = "scaffold59_original.gff"

in_handle = open(in_file)
gff_records = []
for rec in GFF.parse(in_handle):
#print(rec)
gff_records.append(rec)
in_handle.close()

#fig, ax = plt.subplots()

start = 0
end = 583249
record = BiopythonTranslator().translate_record(gff_records[0][start:end])
ax, _ = record.plot(figure_width=10, strand_in_label_threshold=7)
ax.figure.savefig("dna_FV_scaffold59_original_test2.png", bbox_inches="tight")

scaffold59_original_gff.txt

dna_FV_scaffold59.pdf

simarilion · 2022-10-10T06:01:54Z

Kind of figured a workaround out - if one includes the fasta formatted sequence at the end of the gff file the entire length gets plotted, so after the last gff feature annotation include the following:

##FASTA

scaffold59
CAGTAAATTTGACTGGGTCCAAACATAGCTGGTATCACATATATTTTGCGATACCTACACACGTGATGATGATTGATTTC
ACACATCTTTAGCAAAATAAGGatttagtttgttatttaaatttgtttattttcattgtgactataatttatttttttcg
ta.....

Would be nice if this wasn't necessary as this information is present in the second line of the off file (##sequence-region scaffold59 1 583249)....

Otherwise this is a great package - thanks for providing it to the community!

Zulko · 2022-10-10T14:24:02Z

Just in case: DnaFeatureViewer normally uses the length given by the biopython record. Maybe this is a case where the record imported by GFF doesn't have the right size (meaning the problem is not with DFV). What does len(records[0]) print?

In any case, another workaround to your fasta solution is to manually give the sequence length:

record = BiopythonTranslator().translate_record(gff_records[0])
record.sequence_length = 583249

simarilion · 2022-10-10T14:26:37Z

Thanks for your answer Zulko! I think the length in the gff file is correct, but I like your "record.sequence_length = 583249” solution - much more efficient! Cheers Dan

…

--------------------------------------------------------------- Professor Daniel J. Jackson Department of Geobiology Room 2.124 Georg-August University of Göttingen Goldschmidtstr.3 37077 Göttingen Germany Tel: +49 (0) 551 39 14177 Fax: +49 (0) 551 39 7918 ***@***.******@***.***> http://www.uni-goettingen.de/en/102705.html

--------------------------------------------------------------- On 10. Oct 2022, at 16:24, Zulko ***@***.******@***.***>> wrote: Just in case: DnaFeatureViewer normally uses the length given by the biopython record. Maybe this is a case where the record imported by GFF doesn't have the right size (meaning the problem is not with DFV). What does len(records[0]) print? In any case, another workaround to your fasta solution is to manually give the sequence length: record = BiopythonTranslator().translate_record(gff_records[0]) record.sequence_length = 583249 — Reply to this email directly, view it on GitHub<#70 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AENVHPM7QLVHJMS4KG3JHG3WCQRI3ANCNFSM6AAAAAARA3OANU>. You are receiving this because you authored the thread.Message ID: ***@***.***>

Zulko · 2022-10-10T14:48:41Z

Great! To be clear, I wasn't suggesting that the length in the file is incorrect, but that the record produced by BCBio.GFF has the incorrect size, which would mean the issue is with the BCBio library and DnaFeatureViewer isn't at fault.

Could you check the value returned by len(record[0])?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

truncated scaffolds from gff files #70

truncated scaffolds from gff files #70

simarilion commented Oct 9, 2022

simarilion commented Oct 10, 2022

Zulko commented Oct 10, 2022

simarilion commented Oct 10, 2022 via email

Zulko commented Oct 10, 2022

truncated scaffolds from gff files #70

truncated scaffolds from gff files #70

Comments

simarilion commented Oct 9, 2022

simarilion commented Oct 10, 2022

Zulko commented Oct 10, 2022

simarilion commented Oct 10, 2022 via email

Zulko commented Oct 10, 2022