Post-TOGA queries related to UL (Uncertain loss) and L (loss) status #183

vinitamehlawat · 2024-09-18T01:58:28Z

Greetings, I have some post-TOGA queries:

I am struggling with the "L" and "UL" classifications. Please consider the following tree, where my research interest is not a single species but a whole lineage (the branch marked with an arrow). I am interested in the common set of genes that are lost (L) on that branch.

1. If I only consider genes that are "L" in every species on that branch, I skip a lot of data. For example, a gene may be Lost in 11 of the 14 in-group species but UL in the other 3. Is it okay to say that this gene is lost?
2. If a gene is clearly "L" in the TOGA output, is it necessary to check transcriptome data to see whether the gene is transcribed? Or, given TOGA's robustness, can we say the loss is a "clear loss", meaning there is no functional protein for that gene?
3. For "UL", I went through some discussions in the TOGA GitHub issues, but their status is still not clear to me. Is it right to say that if a gene has 10 transcripts, I could get a true hit in the transcriptome for, say, 6 of them if they have inactivating mutations but do not fulfill the "loss" criteria?
4. What would be a good way to analyze the "UL" data in more detail: a transcriptome dataset or a RELAX selection test?
5. In the provided sample tree, if the gene is Intact up to Query3, and on my focal branch it is either L or UL in all species but never Intact, what would be the best status for that gene in your view?

I know this is a lot to ask, but I would really appreciate your suggestions; they will help me sort my data in a more logical way.

Looking forward to hearing from you
Best Regards
Vinita

MichaelHiller · 2024-09-28T15:33:45Z

Hi Vinita,

Not sure I fully understand all the questions, but I'll try my best.
For 3), please have a look at the TOGA supplementary materials. We have images illustrating examples.
For 2) and 4), both RELAX and transcriptomics data are a good idea. Transcriptomics data could tell you whether the inactivating mutation is potentially a base error (the RNA reads don't have the genomic mutation), or whether the exon with the mutation has shifted splice sites or is skipped, such that the mutation is actually not part of a transcript.

The picture looks to me like a gene loss (or UL) on your focal branch, provided that no species in the group has an Intact gene.
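The rule of thumb above (a loss or UL on the focal branch, provided no in-group species is Intact) can be sketched as a small per-gene check. This is only an illustrative sketch, not part of TOGA: the function name and the returned labels are hypothetical, and the input is assumed to be a mapping from in-group species to that gene's TOGA status code (I, PI, UL, L, M).

```python
def summarize_focal_branch(statuses):
    """Summarize one gene across the in-group species of the focal branch.

    statuses: dict mapping species name -> TOGA status code for this gene.
    The returned labels are hypothetical and are not TOGA output.
    """
    codes = list(statuses.values())
    if not codes:
        return "uninformative"
    # A single Intact species argues against a loss on the ancestral branch.
    if "I" in codes:
        return "intact_in_group"
    # L in every in-group species is the clearest case.
    if all(code == "L" for code in codes):
        return "L_in_all"
    # A mix of L and UL (possibly with M or PI) is only a candidate and
    # still needs the alignment-level check of shared mutations.
    if any(code in ("L", "UL") for code in codes):
        return "candidate_loss"
    # Only M / PI etc.: nothing can be said from the classification alone.
    return "uninformative"


# Example from this thread: L in 11 of 14 in-group species, UL in the other 3.
example = {f"species{i}": "L" for i in range(11)}
example.update({f"species{i}": "UL" for i in range(11, 14)})
print(summarize_focal_branch(example))
```

A "candidate_loss" label is deliberately weaker than "L_in_all": it flags the gene for a closer look rather than calling it lost.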
Here are a few more considerations. Some ingroup species with UL can have another frameshift that returns into the ancestral reading frame (the first frameshift would then be shared with other species). Some ingroups can also have M (or in principle PI), e.g. say exons 3 and 7 have the frameshifts and both exons are missing (assembly gap).
It is probably best to assess with a multiple alignment whether inactivating mutations are shared among the species in your group and whether they can be assigned to the ancestral branch you labeled.
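One way to organize that check, once the inactivating mutations have been placed on a common multiple-alignment coordinate system, is to count how many in-group species carry each mutation. The sketch below is a minimal illustration under that assumption; the `(exon, alignment_column, kind)` tuples, the species names, and the function name are all hypothetical and do not correspond to any TOGA output format.

```python
from collections import Counter


def shared_mutations(mutations_by_species, min_species=None):
    """Return mutations carried by at least `min_species` in-group species.

    mutations_by_species: dict mapping species name -> iterable of
    (exon, alignment_column, kind) tuples (a hypothetical representation).
    By default a mutation must be present in every species.
    """
    if min_species is None:
        min_species = len(mutations_by_species)
    counts = Counter(
        mutation
        for mutations in mutations_by_species.values()
        for mutation in set(mutations)  # count each species at most once
    )
    return sorted(m for m, n in counts.items() if n >= min_species)


# Made-up example: one frameshift in exon 3 is shared by all three species.
example = {
    "query4": {(3, 120, "frameshift"), (7, 45, "stop")},
    "query5": {(3, 120, "frameshift")},
    "query6": {(3, 120, "frameshift"), (5, 10, "frameshift")},
}
print(shared_mutations(example))
```

A mutation shared by all in-group species is a candidate for having arisen on the ancestral branch. Species in which the affected exon is missing (assembly gap) cannot show the mutation, so lowering `min_species` may be reasonable in such cases.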

Hope that helps
Michael

vinitamehlawat · 2024-09-30T17:25:25Z

Thank you, Dr. @MichaelHiller. This really helps a lot!
