Bioinformatics. 2020 Feb 15;36(4):1121-1128.
doi: 10.1093/bioinformatics/btz703.

Proteome-level assessment of origin, prevalence and function of leucine-aspartic acid (LD) motifs

Tanvir Alam et al. Bioinformatics. 2020.

Abstract

Motivation: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood.

Results: To enable a proteome-wide assessment of LD motifs, we developed an active learning based framework (LD motif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogenous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known.

Availability and implementation: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; source code is available at https://github.com/tanviralambd/LD/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Overview of LD motifs. (A) Schematic representation of human paxillin family members (paxillin, leupaxin and Hic-5) and PaxB from Dictyostelium discoideum. TPR, tetratricopeptide repeat; Znf, zinc-finger; SAM, sterile α-motif; START, STAR-related lipid transfer domain. (B) Sequence alignment of selected known LD motifs. Sequence positions are numbered with respect to the first leucine of the LD motif (numbered 0). Acidic (red), basic (blue) and hydrophobic (green) residues are highlighted. PXN, paxillin. LPXN, leupaxin. All sequences are from human proteins, except for PaxB (D. discoideum). (C) Structure of LD motifs bound to FAK FAT and α-parvin. Ribbon diagrams of FAT and α-parvin are colour-ramped from blue (N-terminus) to red (C-terminus). LD motifs are shown in grey, with key residues shown as stick models in green (hydrophobic) or red (acidic). Positions −1 (D), 0 (L) and +1 (E) are labelled
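The numbering convention of panel B can be illustrated with a short sketch. It assumes a ten-residue window running from position −1 to +8 relative to the anchoring leucine (position 0); the helper names `candidate_cores` and `relative_positions` and the toy sequence are hypothetical. The sketch shows the coordinate system only and is not the LDMF prediction method.

```python
def candidate_cores(seq, before=1, after=8):
    """Yield (index_of_L0, window) for every leucine that can anchor a full window.

    Position 0 is the anchoring leucine; the window spans -before to +after.
    The defaults (-1 to +8) are an assumption, not a rule stated by the tool.
    """
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        start, end = i - before, i + after + 1
        if start < 0 or end > len(seq):
            continue  # the window would run off the sequence
        yield i, seq[start:end]


def relative_positions(before=1, after=8):
    """Return the position labels for one window: [-1, 0, 1, ..., 8] by default."""
    return list(range(-before, after + 1))


# Toy sequence, invented for illustration only.
toy = "GDLEALLADLESTA"
windows = list(candidate_cores(toy))
```

Each window can then be paired with `relative_positions()` so that, for example, the second character of a window is always the residue at position 0.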
Fig. 2.
Flowchart of the LDMF tool, and features of the predicted LD motif sequences. (A) Our learning process contains three iterations. In the first iteration, we trained a support vector machine (SVM) model using the 18 known LD motifs as the positive set and randomly drawn sequences as the negative set. Sequence, secondary structure and AAindex features of these sets were used to build an initial model. This model was expected to have poor prediction performance because the randomly drawn negative sequences are expected to be easily differentiable from the positive ones. We then applied this initial model to identify putative LD motifs in close orthologs of our six positive-set proteins, using standard protein–protein unidirectional BLAST (blastp) (Altschul et al., 1997) (see Supplementary Material for details). This step yielded an additional 40 LD motif sequences that we manually checked and added to the positive set. The initial model was then applied to the Protein Data Bank (PDB) to find sequences that satisfy some of the key features, but not all of them. These sequences are similar to the true motifs in some aspects and thus provide a much more difficult negative set for the second iteration of training. These training sets were used to build the ‘final’ first round model, with which we scanned the human proteome (20 159 sequences). All predicted novel LD motifs were synthesized as peptides and used in in vitro binding experiments. Those sequences that showed binding were included in the positive set of the final iteration of training. The final model of the second round was used to predict LD motifs in various proteomes. (B) The ten amino acids constituting the LD motif core are highlighted inside the red box. The twenty up- and down-stream residues of the flanking regions are shown. Top: amino acid sequences. Bottom: secondary structure. This figure was generated by Jalview (Waterhouse et al., 2009)
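The idea of a harder negative set, made of sequences that satisfy some key features but not all, can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the function `select_hard_negatives` and the two feature checks are hypothetical stand-ins, and the real LDMF features (sequence, secondary structure and AAindex) are not reproduced here.

```python
def select_hard_negatives(sequences, feature_checks):
    """Keep sequences that pass at least one feature check but not all of them."""
    hard = []
    for seq in sequences:
        passed = sum(1 for check in feature_checks if check(seq))
        if 0 < passed < len(feature_checks):
            hard.append(seq)
    return hard


# Hypothetical checks on a ten-residue window whose second character is
# position 0. They are invented for illustration, not taken from LDMF.
def has_leucine_at_zero(window):
    return len(window) > 1 and window[1] == "L"


def has_acidic_at_plus_one(window):
    return len(window) > 2 and window[2] in "DE"


checks = [has_leucine_at_zero, has_acidic_at_plus_one]
# Toy windows: passes both checks, passes one, passes none.
toy_windows = ["DLDALLADLE", "ALKAAAAAAA", "GGGGGGGGGG"]
hard_negatives = select_hard_negatives(toy_windows, checks)
```

Only the middle toy window is retained: it resembles a motif in one respect but fails the other check, which is what makes such sequences more informative negatives than random draws.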
Fig. 3.
LD motif containing proteins identified in the human proteome. (A) Summary of experimental binding assays between putative LD motifs and selected LDBDs. The LD motifs are coloured according to: positive controls (green), negative controls (red), highly likely (blue), less likely (orange), least likely (yellow) and the motifs discarded in round 1 (grey). Values indicate the Kd in µM for direct anisotropy (DA), microscale thermophoresis (MST) and isothermal titration calorimetry (ITC). ‘N’: no confident Kd could be derived from fitting the data. ‘-’: not determined. For anisotropy competition assay (ACA) and differential scanning fluorimetry (DSF), results are given as relative difference, or indicate Tm shifts, respectively. For ACA and DSF values, a t-test of significance was performed (n = 3), where the null hypothesis is rejected with 95% (*), 99% (**) and 99.9% (***) confidence. For Kd values, errors are indicated as SEM. (B) LD motif containing proteins identified in the human proteome. Protein length and positions of the LD motifs (residues −1 to +8) are labelled. Additional domains are indicated by their PFAM name. Background colouring as in (A)
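The star notation and the SEM used in panel A follow standard conventions, sketched below. The function names are hypothetical, and treating the 95%, 99% and 99.9% confidence levels as P-value cut-offs of 0.05, 0.01 and 0.001 is an assumption about how the thresholds were applied.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def significance_stars(p_value):
    """Map a P-value to the star notation of the legend (assumed cut-offs)."""
    if p_value < 0.001:
        return "***"  # null hypothesis rejected with 99.9% confidence
    if p_value < 0.01:
        return "**"   # 99% confidence
    if p_value < 0.05:
        return "*"    # 95% confidence
    return ""         # not significant at the 95% level


# Toy triplicate (n = 3), invented for illustration only.
toy_sem = sem([1.0, 2.0, 3.0])
```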
Fig. 4.
NMR binding site mapping of LD motifs onto the FAK FAT domain. NMR chemical shift changes introduced by titrations with LD motif peptides were mapped onto the molecular surface (grey) of the FAT structure in blue (resonances disappeared), purple (shift changes greater than 2 σ) and pink (chemical shift changes between 1 and 2 σ). Unassigned residues and prolines were coloured black. Two sides of the FAT domain are shown: the side composed of helices 1 and 4 (1/4) and the side composed of helices 2 and 3 (2/3). LD motifs are shown as stick models, with carbons coloured in green. Paxillin LD2 and LD4 peptides were taken from the crystal structures 1ow8 and 1ow7, respectively. Positions of LD motifs of LPP and CCDC158 were obtained by NMR-data guided docking. Positions ‘L0’ and ‘D+1’ of the canonical class I consensus, and positions ‘L+7’ and ‘D/E+6’ of the inverse class II are labelled
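The colour scheme of this mapping amounts to a small decision rule, sketched here. The function `classify_residue` is hypothetical; how σ was computed is not stated in the legend, so it is passed in as a parameter, and the handling of values exactly at 1 σ or 2 σ is an assumption.

```python
def classify_residue(shift_change, sigma, assigned=True,
                     is_proline=False, disappeared=False):
    """Return the surface colour for one residue, following the legend.

    shift_change and sigma must be in the same units. Boundary handling
    (>= 1 sigma counts as pink, > 2 sigma as purple) is an assumption.
    """
    if not assigned or is_proline:
        return "black"   # unassigned residues and prolines
    if disappeared:
        return "blue"    # resonance disappeared during the titration
    if shift_change > 2 * sigma:
        return "purple"  # shift change greater than 2 sigma
    if shift_change >= sigma:
        return "pink"    # shift change between 1 and 2 sigma
    return "grey"        # unperturbed surface
```

For example, with a hypothetical σ of 0.1, a shift change of 0.25 maps to purple and one of 0.15 maps to pink.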
Fig. 5.
Evolution and adaptation of the LD motif interactome. (A) LDMF-predicted LD motifs and LDBDs in stem eukaryotes. Left: Evolutionary relation of the unicellular eukaryotes analysed. Figure adapted from the Broad Institute’s Origin of Multicellularity initiative. Right: PAX refers to paxillin homologues, non-PAX to proteins not homologous to paxillin; number of LD motifs (or LD motif containing protein) and NES as identified by LDMF and NetNES (la Cour et al., 2004), respectively. If there are several paxillin homologues in one species, the corresponding numbers are separated by ‘/’. (#): species contains a paxillin homologue without LD motif. The XPO1, VINC, CCM3, PARVA, FAK and GIT columns show the presence of genes homologous to exportin, vinculin, CCM3, α-parvin, FAK and GIT, respectively. The number of ticks corresponds to the number of homologues found. The presence of a functional LDBD in these domains was assessed by sequence alignments and homology modelling. Colouring of the rows for each species matches panel A. (B) Conservation of the non-paxillin LD motifs. ‘Distant’ refers to the evolutionary distance to humans. *: with respect to the protein sequence in the most distant species. This table summarizes results of Supplementary Figure S8
Fig. 6.
Cellular effects caused by the introduction of additional LD motifs. (A) Subcellular localization and cell morphology. HeLa cells were plated on fibronectin-coated coverslips (25 000 cells), transfected and fixed after 24 h for immunofluorescence. Fixed cells incubated with the indicated antibodies and fluorescent phalloidin to reveal filamentous actin were observed with a fluorescence microscope. ‘4×’ are 2-fold enlargements of areas indicated by arrows. The enlargements show examples of the localization of eGFP-tagged proteins (GFP) in proximity of vinculin-positive FAs (Vinc.; upper panel) and actin fibres (lower panel; yellow areas). Scale bar = 50 µm. (B) Spreading assay. Projected cell areas (left), aspect ratio (middle) and roundness (right) were evaluated from 18 to 27 cells per condition. *P < 0.05, **P < 0.001. (C) Analysis of wound healing assay over 30 h: bars represent normalized mean values ± SEM of total and Euclidean distance (left), speed (µm/min) (middle) and directionality (right; persistence of migration). **P < 0.001. The tracking profile of 35–40 moving cells per condition was quantified for the analysis of the wound healing assay. The data were analysed with a two-tailed, two-sample unequal-variance Student’s t-test. Differences in values with P < 0.05 and P < 0.001 were considered statistically significant. (D) Wound healing assay using HeLa cells plated on fibronectin and transfected with eGFP-tagged constructs. Cell tracking for 30 h at 60 min/frame of eGFP control (GFP) and the indicated eGFP-tagged proteins. Black and red trajectories indicate left and right tracks, respectively
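The migration quantities named in panel C can be derived from a tracked trajectory as in the sketch below. The definitions are common conventions and are assumptions here, since the legend does not spell them out: total distance is the summed path length, Euclidean distance is the straight line from first to last position, speed is total distance over elapsed time, and directionality is the ratio of Euclidean to total distance. The function `track_metrics` and the toy track are hypothetical.

```python
import math


def track_metrics(points, minutes_per_frame=60.0):
    """Summarise one cell track given (x, y) positions in µm, one per frame."""
    # Length of each frame-to-frame step along the path.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(steps)
    euclidean = math.dist(points[0], points[-1])
    duration = minutes_per_frame * (len(points) - 1)
    return {
        "total_distance": total,                                # µm
        "euclidean_distance": euclidean,                        # µm
        "speed": total / duration if duration else 0.0,         # µm/min
        "directionality": euclidean / total if total else 0.0,  # 0 to 1
    }


# Toy track of three frames, invented for illustration only.
toy_metrics = track_metrics([(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)])
```

A directionality close to 1 indicates a nearly straight path, while values near 0 indicate a cell that moved but ended close to where it started.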

References

    1. Alam T. et al. (2014) How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs. Biochem. J., 460, 317–329.
    2. Altschul S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
    3. Arold S.T. et al. (2002) The structural basis of localization and signaling by the focal adhesion targeting domain. Structure, 10, 319–327.
    4. Astro V. et al. (2011) Liprin-alpha1 regulates breast cancer cell invasion by affecting cell motility, invadopodia and extracellular matrix degradation. Oncogene, 30, 1841–1849.
    5. Brown M.C. et al. (1996) Identification of LIM3 as the principal determinant of paxillin focal adhesion localization and characterization of a novel motif on paxillin directing vinculin and focal adhesion kinase binding. J. Cell Biol., 135, 1109–1123.
