[Preprint]. 2024 Oct 19:2024.05.01.592119.
doi: 10.1101/2024.05.01.592119.

Selecting genes for analysis using historically contingent progress: from RNA changes to protein-protein interactions

Farhaan Lalit et al. bioRxiv.

Abstract

Progress in biology has generated numerous lists of genes that share some property. But advancing from these lists of genes to understanding their roles is slow and unsystematic. Here we use RNA silencing in C. elegans to illustrate an approach for prioritizing genes for detailed study given limited resources. The partially subjective relationships between genes forged by both deduced functional relatedness and biased progress in the field were captured as mutual information and used to cluster genes that were frequently identified yet remain understudied. Some proteins encoded by these understudied genes are predicted to physically interact with known regulators of RNA silencing, suggesting feedback regulation. Predicted interactions with proteins that act in other processes and the clustering of studied genes among the most frequently perturbed suggest regulatory links connecting RNA silencing to other processes like the cell cycle and asymmetric cell division. Thus, among the gene products altered when a process is perturbed could be regulators of that process acting to restore homeostasis, which provides a way to use RNA sequencing to identify candidate protein-protein interactions. Together, the analysis of perturbed transcripts and potential interactions of the proteins they encode could help prioritize candidate regulators of any process.

Keywords: AlphaFold; C. elegans; Mutual Information; RNA silencing; homeostasis.

Conflict of interest statement

Competing Interest Statement: The authors declare no competing interests.

Figures

Figure 1. Some genes are selectively regulated, reported as part of many lists, and yet are understudied.
(A) Schematics of possible regulatory architectures for genes found on multiple lists. (top) Gene receiving one input from a large network. (bottom) Gene receiving multiple inputs from separable networks. (B) Strategy for the identification of regulated genes. See Methods for details. (C) Relationship between Si, Ti, and g obtained using simulated data for an organism with 20,000 genes. Distributions of the probabilities of having at least one overlapping gene within the selected gene set (P(Si > 0)) for 100 runs of each parameter combination are presented as box and whisker plots. (D) Numbers of publications listed on WormBase for the top 25 regulated genes ordered using r25 in the field of RNA silencing in C. elegans. Red line marks 10 publications. (E) Domains present in proteins encoded by understudied genes among the top 25 genes that are suggestive of function. Proteins with high-confidence AlphaFold structures (12) were used to identify similar proteins as detected by Foldseek (17) or based on the literature ((18); C38D9.2, F15D4.5, and W09B7.2). (F) Heatmap showing the top 25 regulated genes. Presence (black) or absence (white) of each gene in each dataset is indicated. Relatively understudied (<10 references on WormBase) genes (red) or pseudogenes (grey) identified in (D) are indicated. (G) Hierarchical clustering of the top 25 genes based on co-occurrence in lists, where gene names are colored as in (F) and ‘distance (dJ)’ indicates Jaccard distance.
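The Jaccard distance named in panel (G) can be illustrated with a minimal sketch. It assumes each gene is represented by the set of lists (datasets) in which it appears; the gene names, list names, and function names below are hypothetical, and the linkage method used for the hierarchical clustering is not stated in this caption.

```python
from itertools import combinations


def jaccard_distance(lists_a: set, lists_b: set) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of list memberships."""
    union = lists_a | lists_b
    if not union:
        # Assumption: two genes that appear in no list are treated as identical.
        return 0.0
    return 1.0 - len(lists_a & lists_b) / len(union)


def pairwise_jaccard(memberships: dict) -> dict:
    """Compute the distance for every unordered pair of genes (sorted by name)."""
    return {
        (g1, g2): jaccard_distance(memberships[g1], memberships[g2])
        for g1, g2 in combinations(sorted(memberships), 2)
    }


# Hypothetical presence/absence data: gene -> lists that report it.
memberships = {
    "gene_a": {"list1", "list2", "list3"},
    "gene_b": {"list2", "list3"},
    "gene_c": {"list4"},
}
distances = pairwise_jaccard(memberships)
```

Genes that co-occur in most lists get a distance near 0, and genes that never co-occur get a distance of 1. A distance table like this one could then be handed to a hierarchical clustering routine of choice.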
Figure 2. Understudied regulated genes encode proteins predicted to interact with key regulators of RNA silencing.
(A) Regulators of RNA silencing in different categories examined for predicted interactions with proteins encoded by understudied genes identified in this study. See text for details. (B) Predicted interactions between proteins encoded by top 25 genes ordered by their r25 scores and known regulators of RNA silencing in C. elegans. The area of the interaction surface between partners normalized by the product of the sizes of the interactors is shown as a bubble plot (inter-protein predicted aligned error <5Å and inter-residue distance <6Å). Interactions with a low ranking score (< 0.6) and/or that constrain fewer than 20 amino acids in proteins encoded by the understudied genes are indicated in grey. Also see Fig. S1 and Movies S1 to S32. (C) Proteins encoded by understudied genes with significant interactions are predicted to impact multiple steps in RNA silencing. (D) Predicted structures for the five newly named predicted influencers of RNA-regulated expression (PIRE) proteins are shown with the per-residue confidence (pLDDT) as present in the AlphaFold protein database (83).
Figure 3. Predicted Influencer of RNA-regulated Expression (PIRE) proteins interact with regulators of RNA silencing in two general modes.
(A) Predicted interactions between the PIRE proteins (magenta) FBXB-97 (left) and PIRE-3 (right) with the known regulator RDE-8 (green) that are of high confidence (constraining more than 20 amino acid residues with an inter-Cα distance less than 6Å and PAE less than 5Å) are indicated with pseudobonds. (B) Regions of the PIRE protein sequence constrained by the interacting regulator. Markers (black, ranking score >0.6; grey, ranking score <0.6) are enlarged with respect to the X-axis for visibility (e.g., the marker denoting the interaction between RDE-1 and FBXB-97 only indicates one residue).
Figure 4. Interactions predicted by AlphaFold 2 and by the AlphaFold 3 server can differ.
(A) Comparison of the top-ranking interactions between known regulators of RNA silencing and the PIRE proteins predicted by AlphaFold 2 (AF 2 (11); 0.8*ipTM + 0.2*pTM) with the score generated by AlphaFold 3 (AF 3 (13); 0.8*ipTM + 0.2*pTM + 0.5*disorder). A high-confidence prediction by both approaches is highlighted in bold. (B) Models for the interaction of RNH-1.3 with RDE-3 generated by AF 2 and AF 3 overlaid using RDE-3. Also see Movie S33. (C) Comparison of residues of PIRE proteins constrained through interactions as predicted by AF 2 (black) or by AF 3 (grey). (D) Comparison of interactions between FBXB-97 and RDE-3 (left), and between PIRE-4 and RDE-3 (right) as predicted by AF 2 (black) and the AF 3 server (grey), respectively. Structures are shown with differential coloring of each protein and overlaid using the RDE-3 structures in both cases. Also see Movies S34 and S35. (E) Interactions between EGO-1 (magenta or red) and W09B7.1 (green or cyan) predicted by AF 2 or AF 3. Black ovals indicate interacting regions with inter-protein PAE <10Å (left) or <5Å (right). Also see Movie S36.
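The two ranking scores quoted in panel (A) can be written out as a small sketch. The weights are taken directly from the caption; the function names and the example values are hypothetical, and the caption does not define how the "disorder" term is computed, so it is passed in as a plain number here.

```python
def af2_ranking_score(iptm: float, ptm: float) -> float:
    """Score quoted for AlphaFold 2 in the caption: 0.8*ipTM + 0.2*pTM."""
    return 0.8 * iptm + 0.2 * ptm


def af3_ranking_score(iptm: float, ptm: float, disorder: float) -> float:
    """Score quoted for AlphaFold 3 in the caption: 0.8*ipTM + 0.2*pTM + 0.5*disorder."""
    return 0.8 * iptm + 0.2 * ptm + 0.5 * disorder


# Hypothetical confidence values, chosen only to show the arithmetic.
score_af2 = af2_ranking_score(iptm=0.7, ptm=0.5)
score_af3 = af3_ranking_score(iptm=0.7, ptm=0.5, disorder=0.1)
```

Because the AF 3 score carries an extra term, the two scores for the same pair are not on an identical scale, which is one reason to compare the predicted interfaces as well as the numbers.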
Figure 5. High-ranking models can be rare, and models can converge early with increasing scores.
(A) Distribution of ranking scores for the 25 models of RNH-1.3:RDE-3 generated by AF 2. (B) Multiple runs with different random seeds and resulting scores for models of RNH-1.3:RDE-3 generated by AF 3. (C) Overlay of models with the highest scores from two different runs showing similar interactions between RNH-1.3 (magenta or red) and RDE-3 (green or lime) predicted by both AF 2 and AF 3. Pseudobonds depicting the predicted aligned errors for the constrained residues are highlighted for both pairs of models. Also see Movies S37 and S38. (D) A range of scores can underlie nearly similar architectures of a predicted complex. The highest-scoring models for RNH-1.3:RDE-3, one from each of 18 AF 2 runs (different colors), were superimposed using RDE-3. Top, Superimposed models for low (less than 10Å) and high (more than 20Å) root mean square deviation (RMSD) values are shown. Bottom, Ranking scores are plotted after arranging models in increasing order of RMSD from the highest-scoring model.
Figure 6. The poly-UG polymerase RDE-3 is predicted to interact with multiple proteins.
(A) Predicted interactions of RDE-3 with known regulators of RNA silencing and the 5 proteins listed as physical interactors on WormBase (MUT-16, MUT-7, PIK-1, RDE-8, and PRG-1) identified by AlphaFold 2.3 are shown. Sizes of circles indicate normalized interaction area and shading indicates ranking score. Grey indicates ranking scores < 0.6 and/or the products of numbers of constrained residues in RDE-3 and its interactors (nbait x nprey) < 100. Also see Movies S61 to S72. (B) Regions of RDE-3 protein sequence constrained by the interacting regulators. Markers (black) are as in Fig. 3B. (C) Table summarizing interactors of RDE-3. Experimentally identified physical interactors (interactor on WormBase?), highest score of AF 2 predicted interactions that are > 0.6 (25 models from 1 run), highest score among AF 3 predicted interactions (25 models from 5 runs), and whether the AF 2 and AF 3 structures are similar (convergence of AF 2 and AF 3?) are indicated. Scores of AF 3 models that lack any interactions between the two proteins with a predicted aligned error < 5Å and a distance < 6Å are indicated in grey.
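The greying rule in panel (A) can be sketched as a simple filter. The thresholds (ranking score < 0.6, nbait x nprey < 100) come from the caption; the class, field names, and example values are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class PredictedInteraction:
    bait: str
    prey: str
    ranking_score: float
    n_bait: int  # constrained residues in the bait
    n_prey: int  # constrained residues in the prey


def is_greyed_out(
    interaction: PredictedInteraction,
    min_score: float = 0.6,
    min_product: int = 100,
) -> bool:
    """True if the score is below 0.6 and/or n_bait * n_prey is below 100."""
    low_score = interaction.ranking_score < min_score
    small_interface = interaction.n_bait * interaction.n_prey < min_product
    return low_score or small_interface


# Hypothetical predictions, used only to exercise the rule.
examples = [
    PredictedInteraction("bait_1", "prey_1", 0.75, 12, 10),  # passes both checks
    PredictedInteraction("bait_1", "prey_2", 0.75, 5, 10),   # too few constrained residues
    PredictedInteraction("bait_1", "prey_3", 0.50, 20, 20),  # score too low
]
greyed = [is_greyed_out(p) for p in examples]
```

Using the product of constrained residues rather than a per-protein cutoff means a small patch on one partner can still pass if it contacts a large patch on the other.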
Figure 7. PAR-5 is predicted to interact with the Z-granule surface protein PID-2/ZSP-1 but not with many other tested regulators of RNA silencing.
(A) Predicted interactions of PAR-5 with known regulators of RNA silencing identified by AlphaFold 2.3 are shown. Area of circles and shading are as in Fig. 6A. (B) Distribution of ranking scores for the 25 models of PAR-5:PID-2 generated by AF 2. (C) Regions of PAR-5 protein sequence constrained by interactions with PID-2. Markers (black) are as in Fig. 3B. (D) Structure of the C-terminus of PID-2 constrained by PAR-5. (E) Overlay of models predicted by AF 2 and AF 3 superimposed using PAR-5 showing similar interactions between the C-terminus of PID-2 (lime or red) and PAR-5 (magenta or green) although the rest of the PID-2 protein is positioned differently in the two models. Pseudobonds are as in Fig. 3A. Also see Movie S78.
Figure 8. Clusters formed by understudied regulated genes suggest priorities for detailed study.
(A to E) Properties of the top 100 regulated genes in the field of RNA silencing in C. elegans. (A) Clusters of genes based on their historical mutual information (HMI). Threshold for link: distance (1 - HMI) < 0.9. (B to E) Network in (A) with nodes colored to show number of publications per gene (white, 0; black, ≥100) (B), genes that have been the main subject of abstracts on RNA silencing in C. elegans (C), pseudogenes (red) (D), and genes changed in hrde-1 mutants (69) (red), a sid-1 mutant (16) (cyan), or both (orange) (E). (F) Predicted interactions of proteins encoded by genes with different r100 ranks with known regulators of RNA silencing. Sizes of circles indicate normalized interaction area and shading indicates ranking score. Grey indicates ranking scores < 0.6 and/or the products of numbers of constrained residues (nbait x nprey) < 100. Also see Movies S80 to S93. (G) All interactions (connecting lines) depicted were identified by AF 2 (grey). Some are supported by experimental evidence for physical interaction (magenta) and some are also predicted by AF 3 with either similar (green) or different (cyan) interfaces. Known regulators of RNA silencing are in red and those used as baits to look for predicted interactors (STAU-1, PID-2, and RDE-3) are in bold. Also see Table S4.
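The link threshold in panel (A) can be illustrated with a minimal sketch. It assumes pairwise HMI values are already available (how HMI is computed is described in the paper's Methods, not in this caption) and treats a cluster as a connected component of the thresholded network, which is an assumption made here for illustration. Gene names and HMI values are hypothetical.

```python
def clusters_from_hmi(genes, hmi, max_distance=0.9):
    """Link gene pairs with distance (1 - HMI) < max_distance and return
    the connected components of the resulting network."""
    neighbors = {gene: set() for gene in genes}
    for (g1, g2), value in hmi.items():
        if 1.0 - value < max_distance:
            neighbors[g1].add(g2)
            neighbors[g2].add(g1)

    seen = set()
    clusters = []
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first walk collecting every gene reachable through links.
        component = set()
        stack = [gene]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(neighbors[current] - seen)
        clusters.append(component)
    return clusters


# Hypothetical HMI values for three genes.
genes = ["gene_a", "gene_b", "gene_c"]
hmi = {("gene_a", "gene_b"): 0.3, ("gene_b", "gene_c"): 0.05}
clusters = clusters_from_hmi(genes, hmi)
```

With these values, gene_a and gene_b are linked (distance 0.7) while gene_c stays on its own (distance 0.95 to gene_b), so unlinked genes appear as singleton clusters.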

References

    1. Jose A.M. (2020) The analysis of living systems can generate both knowledge and illusions. eLife, 9, e56354.
    2. Richardson R.A.K., Navarro H.T., Nunes Amaral L.A. and Stoeger T. (2023) Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife, 12, RP93429.
    3. Rocha J.J., Jayaram S.A., Stevens T.J., Muschalik N., Shah R.D., Emran S., Robles C., Freeman M. and Munro S. (2023) Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol, 21, e3002222.
    4. Billman G.E. (2020) Homeostasis: The Underappreciated and Far Too Often Ignored Central Organizing Principle of Physiology. Front Physiol, 11, 200.
    5. Davis P., Zarowiecki M., Arnaboldi V., Becerra A., Cain S., Chan J., Chen W.J., Cho J., da Veiga Beltrame E., Diamantakis S. et al. (2022) WormBase in 2022-data, processes, and tools for analyzing Caenorhabditis elegans. Genetics, 220, iyac003.
