The reuse of public datasets in the life sciences: potential risks and rewards
- PMID: 33024631
- PMCID: PMC7518187
- DOI: 10.7717/peerj.9954
Abstract
The 'big data' revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define 'successful reuse' as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm have the potential to benefit all stakeholders in the life sciences.
Keywords: Bioinformatics; Computational biology; Data science; Databases; Genomics; Open science; Reuse; Sequencing data.
© 2020 Sielemann et al.
Conflict of interest statement
The authors declare that they have no competing interests.