Introduction

The oncologic relevance of the DCBLD1 gene was first described in a genome-wide association study that identified a susceptibility locus in the DCBLD1 promoter in women with never-smoking lung cancer1. Three subsequent studies have reported an association between single nucleotide polymorphisms (SNPs) in the DCBLD1 promoter region and higher risks of never-smoking cancers for human papillomavirus-negative head and neck squamous cell carcinoma (HNSCC), lung cancer, and female lung adenocarcinoma (LUAD)2,3,4. SNPs in this locus were associated with higher DCBLD1 expression in HNSCC and lung cancer2,5, and with overall survival in two other studies, one on female non-smoking patients with lung cancer, and another in non-small cell lung carcinoma (NSCLC)4,6. The prognostic value of DCBLD1 gene expression was also demonstrated for HNSCC and LUAD by multivariate and univariate analyses respectively2,5. Furthermore, NSCLC xenografts in mice using A549 cells showed that knockdown of the DCBLD1 gene greatly impaired tumor growth in vivo7.

Little is known about the function of the DCBLD1 protein, a transmembrane protein with extracellular CUB, LCCL and F5/8 type C domains8. These features are similar to the extracellular domain of neuropilins (NRPs), which are transmembrane proteins that bind to semaphorins and have established roles in neural axon guidance8. DCBLD1 may have a function similar to that of NRPs: it was shown to be upregulated in mitral cells during the olfactory learning of rat pups, where it co-localized with semaphorin 4c and correlated with extracellular matrix (ECM) remodeling, suggesting a role in neural development9. In oncology, NRPs are involved in breast, prostate, renal and pancreatic cancers, mainly through their interactions with integrins10,11,12,13. Similarly, gene ontology analysis of DCBLD1 expression in HNSCC showed a strong upregulation of the integrin signaling pathway in patients with high DCBLD1 expression, suggesting another parallel with NRPs2.

The intracellular region of DCBLD1 contains seven highly conserved YxxP motifs that are involved in phosphorylation-dependent binding to the CRKL signaling protein14. The Abl and Fyn kinases are responsible for this phosphorylation14,15. Phosphoproteomic bioinformatic analysis showed that the phosphorylation of four of these YxxP sites was altered following the inhibition of EGFR and MET and following HGF and FGF2 stimulation, suggesting an association of DCBLD1 with receptor tyrosine kinases2. Thus far, the only study on the interactome of DCBLD1 involved proteomic analysis on HEK 293 cells transfected with a FLAG-tagged DCBLD1 plasmid15. With this overexpression model, the FLAG immunoprecipitate revealed that the majority of DCBLD1 interactors were adaptor proteins and proteins associated with actin dynamics15.

Although multiple studies have now described SNPs in the DCBLD1 promoter region in different oncology contexts1,2,3,4,5,6,7, the role of DCBLD1 itself in cancer and its function in both normal and oncology settings remain poorly understood. The objective of this study was to evaluate the prognostic value of DCBLD1 in the four most frequent solid cancers16: NSCLC, breast, colorectal, and prostate cancers. Two independent cohorts were used for each cancer. We also investigated the DCBLD1 status in The Cancer Genome Atlas (TCGA) PanCancer Atlas to determine the potential oncogenic mechanism of DCBLD1.

Results

Patient cohorts

NSCLC, invasive breast carcinoma, colorectal adenocarcinoma and prostate adenocarcinoma were each represented by two cohorts for the analysis of DCBLD1 gene expression and cancer outcome. Age, sex and stage distribution of these cohorts are shown in Table 1.

Table 1 Demographics and clinical parameters of the cohorts evaluated in the outcome analysis.

DCBLD1 gene expression and cancer outcome

We previously evaluated the role of the DCBLD1 gene in association with patient outcome in HNSCC2. For NSCLC, this association was only tested in univariate analysis on one cohort5, and nothing is yet known for breast, colorectal and prostate cancers. We examined whether DCBLD1 gene expression had prognostic value in the eight cohorts of this study, using multivariate analysis with age, sex (when appropriate) and stage. The hazard ratio (HR) was based on the range of DCBLD1 expression levels, which was analyzed as a numerical variable. A higher HR indicates a higher risk for patients with high DCBLD1 gene expression. This type of analysis is less biased and more stringent than a cut-off based analysis17. It also shows the continuity of the risk increase across the distribution of the variable. Age was also analyzed as a continuous variable. Patients were not subdivided by sex for prostate and breast cancers, as there were only 12 males in the TCGA breast cohort and none in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort. Stages were grouped as stage 1 and 2 versus stage 3 and 4 to prevent bias from the low frequency of some stages for certain cancers. With these stage groupings, the smallest group had n = 36 (stage 3–4, NSCLC GSE81089 cohort).
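As an illustration of how an HR for a continuous variable can be read, the following minimal sketch assumes the standard Cox model, in which the log hazard is linear in each covariate, so that two patients whose expression differs by d units have a hazard ratio of HR raised to the power d. The function name and the HR of 1.25 are hypothetical and are not results of this study.

```python
import math


def relative_hazard(beta: float, delta: float) -> float:
    """Hazard ratio between two patients whose covariate differs by `delta`
    units, given the Cox log-hazard coefficient `beta` for that covariate."""
    return math.exp(beta * delta)


# Hypothetical value for illustration only (not a result from this study):
# an HR of 1.25 per one-unit increase in normalized DCBLD1 expression.
beta_dcbld1 = math.log(1.25)

hr_one_unit = relative_hazard(beta_dcbld1, 1.0)   # patients one unit apart
hr_two_units = relative_hazard(beta_dcbld1, 2.0)  # two units apart: 1.25 squared
```

Under this assumption the risk increases smoothly with expression rather than jumping at an arbitrary cut-off.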

For NSCLC, both TCGA and GSE81089 cohorts had reproducible significant results. High DCBLD1 gene expression and stage were significantly associated with a worse overall survival, while age and sex had no significant effect (Table 2). When the same analysis was performed without stratifying stage into two groups, and histology and smoking status were added to the model, high DCBLD1 gene expression was again associated with a worse overall survival (Supplementary Table 1). Age (Fig. 1A,B), sex (Fig. 1C,D), stage (Fig. 1E,F), smoking history (Fig. 1G,H) and histology (Fig. 1I,J) were not reproducibly associated with DCBLD1 gene expression for NSCLC.

Table 2 Multivariate Cox proportional hazards analysis of cancer outcomes.
Figure 1

DCBLD1 in the TCGA and GSE81089 cohorts of non-small cell lung carcinoma. Comparison of DCBLD1 gene expression for age (A, B), sex (C, D), stage (E, F), tobacco use (G, H) and histology (I, J).

For invasive breast carcinoma, the TCGA and METABRIC cohorts also showed reproducible significant results. High DCBLD1 gene expression, age and stage were significantly associated with a worse overall survival (Table 2). Notably, DCBLD1 expression was associated with the PAM50 molecular subtypes in both cohorts (Fig. 2A,B). Basal-like and HER2-enriched breast cancers had significantly higher DCBLD1 expression compared to the normal-like and luminal subtypes in the METABRIC cohort. In the TCGA cohort, the HER2-enriched subtype was not significantly different from the normal-like and luminal subtypes, but we observed higher DCBLD1 expression in the basal-like subtype. Since PAM50 subtypes carry prognostic value18, this may partly explain the association of DCBLD1 with survival. Indeed, adding the PAM50 parameter to the multivariate model in both cohorts weakened the association of DCBLD1 gene expression with overall survival, with P values of 0.05 for the TCGA cohort and 0.02 for the METABRIC cohort (Supplementary Table 2). Age (Fig. 2C,D) and stage (Fig. 2E,F) were not reproducibly associated with DCBLD1 gene expression for invasive breast carcinoma.

Figure 2

DCBLD1 in the TCGA and METABRIC cohorts of invasive breast cancer. Comparison of DCBLD1 gene expression for PAM50 subtypes (A, B), age (C, D) and stage (E, F).

For colorectal adenocarcinoma, no association was found between DCBLD1 and overall survival in either the TCGA or the GSE14333 cohort (Table 2). Also, no association was observed between DCBLD1 gene expression and age, sex or stage in colorectal adenocarcinoma (Supplementary Fig. 1).

For prostate adenocarcinoma, biochemical recurrence was evaluated instead of overall survival due to the low number of deceased patients: only 10 of 496 participants were deceased in the TCGA cohort (median follow-up: 30.5 months). Biochemical recurrence in prostate cancer is defined as a rise in the blood level of prostate-specific antigen following surgery, and it precedes clinical disease recurrence19. No association was found between DCBLD1 and biochemical recurrence in either the TCGA or the GSE70770 cohort (Table 2). Also, no association was observed between DCBLD1 gene expression and age or stage in prostate adenocarcinoma (Supplementary Fig. 2).

DCBLD1 expression in tumor tissues

DCBLD1 gene expression was investigated in paired tumor tissue and normal adjacent tissue in the NSCLC (n = 108), invasive breast carcinoma (n = 111), colorectal adenocarcinoma (n = 32) and prostate adenocarcinoma (n = 52) TCGA cohorts (Fig. 3). Only participants with RNAseq results for both tissues were considered for this analysis. Significantly higher DCBLD1 expression in tumor tissue was observed for all four cancers, with median fold changes of 1.47 for NSCLC, 1.54 for invasive breast carcinoma, 1.39 for colorectal adenocarcinoma and 1.25 for prostate adenocarcinoma.
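A paired comparison of this kind can be sketched as follows. The sketch assumes log2-scale expression values for each patient's normal and tumor samples and takes the median fold change as the median of per-patient tumor/normal ratios, which is an assumption about how the median was derived. The function names and the four data points are hypothetical.

```python
import math
import statistics


def paired_t_statistic(normal, tumor):
    """Paired Student's t statistic and degrees of freedom for matched samples."""
    diffs = [t - n for n, t in zip(normal, tumor)]
    n_pairs = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n_pairs)
    return statistics.mean(diffs) / standard_error, n_pairs - 1


def median_fold_change(normal, tumor):
    """Median tumor/normal ratio; a difference of log2 values is a log2 ratio."""
    return statistics.median([2 ** (t - n) for n, t in zip(normal, tumor)])


# Hypothetical log2 expression values for four matched patients.
normal_log2 = [8.0, 8.5, 9.0, 7.5]
tumor_log2 = [8.5, 9.5, 9.5, 8.5]

t_stat, dof = paired_t_statistic(normal_log2, tumor_log2)
fold = median_fold_change(normal_log2, tumor_log2)
```

The P value would then be read from a t distribution with `dof` degrees of freedom, which a statistics package provides.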

Figure 3

Evaluation of DCBLD1 expression in tumor tissue. Comparison of DCBLD1 gene expression in normal tissue adjacent to the tumor and tumor tissue for NSCLC (A), invasive breast carcinoma (B), colorectal adenocarcinoma (C) and prostate adenocarcinoma (D) TCGA cohorts.

DCBLD1 mutations and copy number alterations in cancer

DCBLD1 was investigated within the TCGA PanCancer Atlas. Occurrence of mutations was evaluated, resulting in only 109 patients of 10,953 (1.0%) harboring mutations in the DCBLD1 protein coding region, and no single mutation was present in more than four total cases (0.04%) (Supplementary Fig. 3). The only cancer with over 3% of mutations was uterine cancer with 33 cases out of 517 (6.4%) (Supplementary Fig. 4). Copy number alterations were rare with only 36 patients (0.3%) showing amplification and 77 patients (0.7%) showing a deep deletion of DCBLD1 (Supplementary Fig. 5).
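The reported frequencies can be checked with simple arithmetic. This short sketch recomputes each percentage from the counts given above, assuming that the amplification and deep deletion counts use the same denominator of 10,953 patients; the helper name is hypothetical.

```python
def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total


PANCANCER_PATIENTS = 10_953  # assumed denominator for all pan-cancer counts

frequencies = {
    "any coding mutation": percent(109, PANCANCER_PATIENTS),
    "most recurrent single mutation": percent(4, PANCANCER_PATIENTS),
    "uterine cancer mutations": percent(33, 517),
    "amplification": percent(36, PANCANCER_PATIENTS),
    "deep deletion": percent(77, PANCANCER_PATIENTS),
}

for label, value in frequencies.items():
    print(f"{label}: {value:.2f}%")
```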

Upregulated and downregulated genes and pathways in patients with high DCBLD1 expression

To understand the implications of high DCBLD1 expression, we compared the RNA-seq gene expression profiles of the 50 patients with the highest DCBLD1 expression versus the 50 patients with the lowest DCBLD1 expression in each of the four TCGA cohorts independently. We evaluated pathway enrichment in the four cohorts using the PANTHER pathway database (Table 3). Patients with high DCBLD1 expression had strong upregulation of the integrin signaling pathway in comparison to patients with low DCBLD1 expression for both NSCLC and invasive breast cancer. No pathway was upregulated in the colorectal and prostate cancer cohorts. Also, no pathway was downregulated in patients with high DCBLD1 expression in any of the four cohorts.
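The selection of the two comparison groups can be sketched as a simple ranking step. This is a minimal illustration, assuming one expression value per patient; the function name, patient identifiers and values are hypothetical.

```python
def split_extremes(expression, k=50):
    """Return the k patients with the lowest and the k with the highest expression.

    `expression` maps a patient identifier to its DCBLD1 expression value.
    """
    if len(expression) < 2 * k:
        raise ValueError("cohort too small for two non-overlapping groups")
    ranked = sorted(expression, key=expression.get)  # ascending by expression
    return ranked[:k], ranked[-k:]


# Hypothetical cohort of 200 patients with made-up expression values.
cohort = {f"P{i:03d}": float(i) for i in range(200)}
low_group, high_group = split_extremes(cohort, k=50)
```

Differential expression and pathway enrichment are then computed between `high_group` and `low_group`.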

Table 3 Significantly upregulated pathways for DCBLD1 high expression patients in the four TCGA cohorts.

There are three cancers for which high DCBLD1 expression has been associated with a worse overall survival and upregulation of the integrin signaling pathway: NSCLC, invasive breast carcinoma and HNSCC, the latter having been previously published2. Evaluating the changes common to these three cancers in cases with high DCBLD1 expression should help to better understand the role of DCBLD1 in oncology and to clarify how DCBLD1 is associated with the integrin signaling pathway. In this study, 37 genes were differentially regulated between patients with high and low DCBLD1 expression in all three of NSCLC, invasive breast carcinoma and HNSCC. All of these genes were upregulated in the high DCBLD1 expression group with the exception of STRBP, which was downregulated. Interactions between these genes were evaluated using STRING protein interaction analysis with the highest confidence threshold (0.9) (Fig. 4). This analysis builds a connectivity network of physical and functional interactions among these genes, using bioinformatics to combine publicly available sources of data20. A strong association was observed between ITGB1, ACTN1, ACTN4, VCL, PXN, TLN1, PLAU, PLAUR and SRPX2. An association between TIMP2, MMP2 and MMP14 was also observed. The other genes did not integrate into the network. DCBLD1 itself did not associate with the STRING connectivity network, which was expected as its function is still undetermined. On the other hand, since high DCBLD1 expression is the common point of this analysis, it is likely that DCBLD1 belongs in these pathways. Further in vitro experiments will be necessary to determine where DCBLD1 fits and whether this association is physical or functional. The other genes that did not associate with the connectivity network are either unrelated to that network or their association has not yet been shown.

Figure 4

Differentially expressed genes in patients with high DCBLD1 expression in NSCLC, invasive breast carcinoma and HNSCC. STRING protein interaction analysis of the 37 genes differentially regulated in patients with high DCBLD1 expression (n = 50) in comparison to those with low DCBLD1 expression (n = 50) in the TCGA cohorts for NSCLC, invasive breast carcinoma and HNSCC. The network shows results for the highest confidence threshold (0.9) for interaction scores on STRING v11 (https://string-db.org/). STRBP has lower expression, while the 36 other genes have higher expression in patients with high DCBLD1 expression.

Discussion

In this study, we showed that DCBLD1 gene expression is prognostic of overall survival in NSCLC and breast cancer. For NSCLC and HNSCC, the association of germline SNPs in the DCBLD1 promoter region with cancer risk has been clearly established, especially for patients who are non-smokers or have no classical cancer risk factors1,2,3,4,5,6,7. Moreover, DCBLD1 copy number alterations and mutations in the protein coding region are rare. This suggests that high DCBLD1 expression in tumors may arise from SNPs in the promoter region modifying gene regulation or alterations in transcription factors, or both. These SNPs may have similar implications for invasive breast carcinoma, particularly for subtypes that are more likely to harbor germline mutations. Basal-like cancers are usually triple-negative breast cancers, which harbor more germline mutations of BRCA1 and BRCA221,22. In this study, we showed that basal-like cancers had a high expression of DCBLD1 in comparison to other subtypes. Whether this involves germline SNPs in the DCBLD1 promoter region is unknown and will need further study to determine an affiliation with this subtype.

In three cancers (NSCLC, invasive breast carcinoma and HNSCC) for which DCBLD1 had prognostic value, high DCBLD1 expression showed statistically significant upregulation of the integrin signaling pathway. In contrast, high and low DCBLD1 expression showed no difference in the cancers for which DCBLD1 had no prognostic value. We hypothesized that an oncogenic role for DCBLD1 was associated with the activation of the integrin signaling pathway.

Using STRING protein interaction analysis, the upregulated genes in patients with high DCBLD1 expression revealed an important network of nine proteins that centered on ITGB1. ITGB1 is a transmembrane integrin that interacts with the ECM and stimulates cell–matrix adhesion when bound to a phosphorylated ACTN123,24. VCL, PXN and TLN1 are adapter proteins that bind to ITGB1 and ACTN1, forming a link between ITGB1 and actin filaments25,26. These five proteins are central components of focal adhesions, which allow the intracellular actin cytoskeleton to associate with the ECM27,28. The remainder of the nine identified proteins includes ACTN4, which shares 86.7% of its amino acid sequence with ACTN1 and also binds to ITGB1, but its role in regulating focal adhesions is less clear29. PLAU and its receptor PLAUR are involved in the proteolysis of the ECM and mediate cleavage of ITGA6, which forms the heterodimeric laminin receptor with ITGB130,31. SRPX2 is another PLAUR ligand32, but its association with focal adhesions is unclear. Lastly, TIMP2, MMP2 and MMP14 are mediators of ECM degradation associated with tumor metastasis33. For NSCLC, invasive breast carcinoma and HNSCC, the upregulation of all these proteins in conjunction with high DCBLD1 expression strongly suggests that DCBLD1 is involved in focal adhesions and therefore, cell migration.

Previous in vitro experiments revealed that the DCBLD1 interactome consists mainly of adaptor proteins and proteins associated with actin dynamics15, further implicating DCBLD1 in focal adhesions and supporting the idea that DCBLD1 is an NRP-like protein. Both NRP1 and NRP2 are involved in focal adhesions: NRP1 regulates focal adhesion turnover and NRP2 regulates α6β1 integrin association with the cytoskeleton11,34. Although the extracellular domains of DCBLD1 and NRP are very similar, their intracellular domains are completely different. NRP1 and NRP2 have small intracellular domains of 44 and 42 amino acids, respectively, whereas DCBLD1 has a more complex intracellular domain of 235 amino acids with multiple YxxP motifs8,14. The exact role of DCBLD1 in focal adhesion formation is yet to be discovered, but the role of focal adhesion turnover in tumor cell migration has already been established27 and may provide insight into the poor prognosis among potentially aggressive cancers with high DCBLD1 expression. Upregulation of the integrin signaling pathway was not observed for colorectal and prostate cancers, and the prognosis was similar for patients with high or low DCBLD1 expression within these cohorts, suggesting that DCBLD1 activity is cell-type dependent. On the other hand, we also showed that DCBLD1 expression is higher in tumor tissue for all four cancers. The fact that DCBLD1 was also upregulated in cancers for which it has no prognostic value suggests that factors regulating DCBLD1 might be generally regulated in cancer. Identifying how DCBLD1 gene expression is regulated might help to understand why it has prognostic value in some cancers.

The association between cancer cell migration and patient survival is well established; for breast cancer specifically, a migration transcriptomic signature was previously published and shown to predict overall survival in the cohort studied35. We hypothesize that the prognostic value of DCBLD1 expression also stems from the association of DCBLD1 with migration through upregulation of the integrin signaling pathway and, perhaps more importantly, regulation of focal adhesions. Since one study using the A549 lung adenocarcinoma cell line showed a decrease in xenograft tumor growth with a stable DCBLD1 knockdown7, it is reasonable to hypothesize that DCBLD1 has a regulatory role in those pathways. This also suggests that DCBLD1 could be a potential therapeutic target.

The retrospective design was the main limitation of this study and may have introduced bias and confounding factors. We used multivariate analysis when examining patient outcome to overcome this limitation as much as possible, although it is likely that some confounding factors were not included in the multivariate analysis. To limit censoring bias, we used overall survival and biochemical recurrence as outcomes, as they are well defined. Also, outcome was evaluated until 95% of the patients were either censored or deceased to prevent potential bias from the rare patients with very long follow-up. Immortal time bias was prevented because samples were taken at surgery (day 0) in those cohorts, on the same specimen as pathology assessment. Another limitation was that analyses focused on RNA expression data and not actual protein levels. The prognostic value of measured DCBLD1 protein levels in NSCLC, invasive breast carcinoma and HNSCC warrants further study.

Conclusion

Using multiple cancer cohorts, this study showed that DCBLD1 is associated with the integrin signaling pathway and focal adhesions, and has prognostic value for NSCLC and invasive breast carcinoma. Given that germline SNPs in DCBLD1 are associated with non-smoking lung and head and neck cancers, and demonstrate prognostic value in other cancers, further studies are needed to evaluate its potential as a therapeutic target.

Materials and methods

Study cohorts

This retrospective study included multiple independent cohorts. NSCLC was represented by the NSCLC TCGA cohort, which combined the TCGA Firehose Legacy LUAD36 and the TCGA Firehose Legacy Lung Squamous Cell Carcinoma (LSCC)37 cohorts, and by the GSE81089 NSCLC38 cohort. For invasive breast carcinoma, the TCGA Firehose Legacy Breast Invasive Carcinoma39 and the METABRIC40 cohorts were investigated. For colorectal adenocarcinoma, the TCGA Firehose Legacy Colorectal Adenocarcinoma41 and the GSE1433342 cohorts were analyzed. For prostate adenocarcinoma, the TCGA Firehose Legacy Prostate Adenocarcinoma43 and the GSE7077044,45 cohorts were examined. For HNSCC, the TCGA provisional cohort46 was used.

Clinical data from the TCGA and METABRIC cohorts were obtained using cBioPortal for Cancer Genomics (www.cbioportal.org)47,48. TCGA mRNAseq data was extracted from FirehoseR (gdac.broadinstitute.org)49. Data from the GSE14333, GSE81089 and GSE70770 cohorts were extracted from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus database50. Every participant of these cohorts with RNA expression and outcome data was included in the study. Analysis of the TCGA PanCancer Atlas51 was performed with the cBioPortal47,48. No patient was excluded for this analysis. The data for upregulated and downregulated genes for HNSCC was previously published2.

Clinical characteristics

DCBLD1 gene expression was measured by RNA-seq for all TCGA cohorts and GSE81089, and measured by array for the METABRIC, GSE14333 and GSE70770 cohorts. DCBLD1 gene expression was normalized using z-score normalization of the log expression value, except for tumor versus normal adjacent tissue comparison where it was analyzed as log2 RSEM.
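The normalization step can be sketched as follows. This is a minimal illustration assuming a log2 transform with a pseudocount of 1 and the sample standard deviation; the log base, the pseudocount and the function name are assumptions, as these details are not specified above.

```python
import math
import statistics


def zscore_log_expression(values, pseudocount=1.0):
    """Z-score of log2-transformed expression values within one cohort."""
    logs = [math.log2(v + pseudocount) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation (assumed)
    return [(x - mean) / sd for x in logs]


# Hypothetical raw expression values; their log2(v + 1) values are 0, 1, 2, 3.
raw_expression = [0.0, 1.0, 3.0, 7.0]
z_scores = zscore_log_expression(raw_expression)
```

After this step, a one-unit change in the variable corresponds to one standard deviation of log expression within the cohort, which makes values from RNA-seq and array cohorts comparable in scale.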

For multivariate analysis, DCBLD1 gene expression and age were analyzed as numerical variables. Stages were subdivided into two groups: stage 1 and 2 versus stage 3 and 4. Patients were subdivided by sex when possible except for invasive breast carcinoma, which had only 12 males in the TCGA cohort and no males in the METABRIC cohort. Outcome was evaluated until 95% of the patients were either censored or deceased to prevent potential bias from some rare patients with very long follow-up. Samples were taken at surgery (day 0) in those cohorts, on the same specimen as pathology assessment. For NSCLC, tobacco use (ever versus never users) and histology (LUAD versus LSCC) were also assessed. For invasive breast carcinoma, PAM50 subtypes were also evaluated. For prostate adenocarcinoma, biochemical recurrence was used.
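The stage grouping and the follow-up limit can be sketched as below. The truncation rule shown, censoring every patient at the follow-up time by which 95% of patients had left observation, is one plausible reading of the procedure and is an assumption; the function names and data are hypothetical.

```python
import math


def group_stage(stage: int) -> str:
    """Collapse stages 1-4 into the two groups used in the multivariate model."""
    if stage in (1, 2):
        return "1-2"
    if stage in (3, 4):
        return "3-4"
    raise ValueError(f"unexpected stage: {stage}")


def truncate_follow_up(times, events, percent=95):
    """Censor follow-up at the time by which `percent`% of patients were
    either censored or had the event (events: 1 = event, 0 = censored)."""
    ordered = sorted(times)
    index = math.ceil(percent * len(ordered) / 100) - 1
    cutoff = ordered[index]
    new_times = [min(t, cutoff) for t in times]
    # An event observed after the cutoff is no longer counted as an event.
    new_events = [e if t <= cutoff else 0 for t, e in zip(times, events)]
    return cutoff, new_times, new_events


# Hypothetical cohort: 20 patients with follow-up of 1 to 20 months, and
# only the patient with the longest follow-up having an event.
follow_up = list(range(1, 21))
event_flags = [0] * 19 + [1]
cutoff, cut_times, cut_events = truncate_follow_up(follow_up, event_flags)
```

In this example the single late event falls beyond the cutoff, so it is treated as censored at the cutoff time.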

Statistics

HR was evaluated using the multivariate Cox proportional hazards analysis for the multivariate survival prediction model. The significance of the gene expression variations was determined by Student’s t-test and Tukey’s honest significance test for nominal variables. The association between DCBLD1 gene expression and age was evaluated using linear regression. DCBLD1 gene expression comparison for paired normal and tumor tissue was done using a paired Student’s t-test. Pathway enrichment was evaluated using the PANTHER pathway database52. For the PANTHER pathways annotation data set, Fisher’s exact test was corrected using FDR. Interactions between proteins were determined using STRING v11 (https://string-db.org/) with the highest confidence threshold (0.9)20.
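The FDR correction can be illustrated with the Benjamini-Hochberg procedure. This is a minimal sketch that assumes Benjamini-Hochberg is the FDR method applied to the pathway P values, which is not stated explicitly above; the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four pathway enrichment tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
```

A pathway is then reported as significantly enriched when its adjusted P value falls below the chosen threshold.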

Tests of statistical significance were two-sided and P values less than 0.05 were considered significant with *P < 0.05, **P < 0.01 and ***P < 0.001. Statistical analysis was performed using JMP 12.0.1 statistical software (SAS Institute Inc) with the exception of the whole exome analysis, which was performed using GraphPad Prism 7.04 (GraphPad Software Inc.).