Post-TOGA queries related to UL (Uncertain loss) and L (loss) status #183

Open
vinitamehlawat opened this issue Sep 18, 2024 · 2 comments

Comments

@vinitamehlawat

Hello Dr. @MichaelHiller

Greetings. I have some post-TOGA queries:

I am struggling with the "L" and "UL" datasets. Please consider the following tree, where my research interest is not a single species but a whole lineage (the branch marked with an arrow). I am interested in the common set of genes that are lost (L) on that branch.

TOGA_tree_image

  1. If I consider only genes that are "L" in every species on that branch, I skip a lot of data. For example, if a gene is Lost in 11 of 14 in-group species but UL in the other 3, is it okay to say that the gene is lost?
  2. If a gene is clearly "L" in the TOGA output, is it necessary to check transcriptome data to see whether the gene is transcribed? Or, given TOGA's robustness, can we say that the loss is a "clear loss", meaning there is no functional protein for that gene?
  3. For "UL", I went through some discussions in the TOGA GitHub issues, but their status is still not clear to me. Is it right to say that if one gene has 10 transcripts, I could get true hits in the transcriptome for 6 of them if they have inactivating mutations but do not fulfill the "loss" criteria?
  4. What would be a good way to analyze the "UL" data in more detail: a transcriptome dataset or a RELAX selection test?
  5. In the sample tree provided, if a gene is Intact up to Query3 and then, on my focal branch, is either L or UL in all species (but never Intact), what would be the best status for that gene in your view?

I know this is a lot to ask, but I really appreciate your thoughtful suggestions; they will help me sort my data in a more logical way.

Looking forward to hearing from you.
Best Regards
Vinita

@MichaelHiller
Collaborator

Hi Vinita,

Not sure I fully understand all the questions, but I'll try my best.
For 3), please have a look at the TOGA supplementary materials; we have images illustrating examples.
For 2) and 4), both RELAX and transcriptomics data are a good idea. Transcriptomics data could tell you whether the inactivating mutation is potentially a base error (RNA reads don't have the genomic mutation), or whether the exon with the mutation has shifted splice sites or is skipped, such that the mutation is actually not part of a transcript.

  1. The picture looks to me like a gene loss (or UL) on your focal branch, provided that no species in the group has an Intact gene.
    Here are a few more considerations. Some in-group species with UL can have a second frameshift that restores the ancestral reading frame (the first frameshift would then be shared with other species). Some in-group species can also have M (or in principle PI), e.g. if exons 3 and 7 have frameshifts and both exons are missing (assembly gap).
    It is probably best to assess with a multiple alignment whether inactivating mutations are shared among the species in your group and whether they can be assigned to the ancestral branch you labeled.

Hope that helps
Michael
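
As a rough illustration of the first point above (a loss or UL on the focal branch is plausible only if no in-group species has an Intact gene), here is a minimal Python sketch for triaging one gene from its per-species TOGA status. The function name `triage_gene`, the three output labels, and the input layout (a dict of species name to status code) are hypothetical and not part of TOGA. The sketch only sorts genes into bins; it does not replace checking in a multiple alignment whether the inactivating mutations are actually shared.

```python
from collections import Counter

def triage_gene(statuses):
    """Triage one gene given {species_name: TOGA status code}.

    Status codes follow the thread: I (Intact), PI (Partially Intact),
    UL (Uncertain Loss), L (Lost), M (Missing).
    The returned labels are hypothetical bins for follow-up work.
    """
    if not statuses:
        raise ValueError("need at least one in-group species")
    counts = Counter(statuses.values())
    # Any Intact copy in the group argues against a loss on the focal branch.
    if counts["I"] > 0:
        return "intact_in_group"
    # No Intact copy and at least one clear L: candidate loss, to be
    # confirmed by looking for shared inactivating mutations.
    if counts["L"] > 0:
        return "candidate_loss"
    # Only UL / M / PI: status stays open without further evidence.
    return "uncertain"

# Scenario from question 1: Lost in 11 of 14 in-group species, UL in 3.
example = {f"sp{i}": "L" for i in range(1, 12)}
example.update({f"sp{i}": "UL" for i in range(12, 15)})
print(triage_gene(example))  # candidate_loss
```

Genes in the "candidate_loss" and "uncertain" bins would still need the alignment, RELAX, or transcriptome checks discussed above before being called lost.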

@vinitamehlawat
Author

Thank you, Dr. @MichaelHiller. This really helps a lot!
