- This code uses the DiSCoVeR algorithm (Descending from Stochastic Clustering Variance Regression) ([software], [paper]) to predict chemically novel, high-temperature superconductors. The model trains on the SuperCon data set and predicts through chunks of a curated dataset snapshot based on the NOMAD (Novel Materials Discovery) database. A chemical validity label is assigned to each composition through a modified version of SMACT (Semiconducting Materials by Analogy and Chemical Theory) ported from CDVAE.
- `dens_score.csv` and `peak_score.csv` are the expected output files after running `main.ipynb`. They contain a weighted score that combines superconductor performance (maximizing the superconducting critical temperature) and chemical novelty, where chemical novelty is defined using either a density-based proxy or a peak-based proxy. These files are reduced to 100,000 formulas due to size.
- In the post-processing file, `final_comps_withhighlights.csv` and `final_comps_nohighlights.csv` are similar to `final.csv`, but with the conditional thresholds defined in the paper applied (`is_valid == TRUE & predicted_e_above_hull <= 0.1 & is_theoretical >= 0.95`).
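The thresholds above amount to a row filter over the candidate table. A minimal sketch of that filter follows, using only the Python standard library. The three column names come from the condition quoted above; the sample rows, the placeholder formulas, the `formula` column, and the helper names `passes_thresholds` and `filter_rows` are assumptions made for illustration, not the repository's actual post-processing code. The sketch also assumes `is_valid` is stored as the text `TRUE` or `FALSE`.

```python
import csv
import io

# Hypothetical sample rows: the threshold column names follow the condition
# quoted above; the formulas and values are placeholders, not real results.
SAMPLE_CSV = """formula,is_valid,predicted_e_above_hull,is_theoretical
A2B,TRUE,0.05,0.99
AB3,TRUE,0.25,0.97
A3C,FALSE,0.02,0.99
BC2,TRUE,0.10,0.95
"""


def passes_thresholds(row, max_e_above_hull=0.1, min_is_theoretical=0.95):
    """Return True if a CSV row satisfies all three conditions."""
    return (
        # Assumption: validity is stored as the text TRUE/FALSE.
        row["is_valid"].strip().upper() == "TRUE"
        and float(row["predicted_e_above_hull"]) <= max_e_above_hull
        and float(row["is_theoretical"]) >= min_is_theoretical
    )


def filter_rows(csv_file):
    """Read rows from a file-like object and keep those passing the thresholds."""
    return [row for row in csv.DictReader(csv_file) if passes_thresholds(row)]


kept = filter_rows(io.StringIO(SAMPLE_CSV))
kept_formulas = [row["formula"] for row in kept]
print(kept_formulas)
```

Replacing the in-memory sample with `open("final.csv", newline="")` would apply the same filter to a real file, provided its columns are named as assumed here.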
Below is a flowchart that depicts the workflow:
`environment.yaml` is the file for the trained model that can be used.
To reproduce results:
- Download the CSV from here, rename it to `supercon.csv`, and add it to the directory
- Download `NOMAD-unique-reduced-formula.csv` from here and add it to the directory
- Execute the code in `main.ipynb`
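Before running the notebook, it can help to confirm that both input files from the steps above are in place. The following is a small sketch of such a check. The file names are taken from the steps; the helper name `missing_inputs` and the choice of the current working directory as the default location are assumptions for illustration.

```python
from pathlib import Path

# File names taken from the reproduction steps above.
REQUIRED_INPUTS = ("supercon.csv", "NOMAD-unique-reduced-formula.csv")


def missing_inputs(directory="."):
    """Return the required input files that are not present in `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_INPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found; main.ipynb can be run.")
```

If Jupyter's nbconvert is installed, `jupyter nbconvert --to notebook --execute main.ipynb` is one way to run the notebook non-interactively once the check passes.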
If you find this work useful, please consider citing the following works.
@article{baird_discover_2022,
title = {{DiSCoVeR}: a materials discovery screening tool for high performance, unique chemical compositions},
volume = {1},
issn = {2635-098X},
shorttitle = {{DiSCoVeR}},
url = {http://xlink.rsc.org/?DOI=D1DD00028D},
doi = {10.1039/D1DD00028D},
language = {en},
number = {3},
urldate = {2022-08-05},
journal = {Digital Discovery},
author = {Baird, Sterling G. and Diep, Tran Q. and Sparks, Taylor D.},
year = {2022},
pages = {226--240},
}
@article{stanev_machine_2018,
title = {Machine learning modeling of superconducting critical temperature},
volume = {4},
issn = {2057-3960},
url = {http://www.nature.com/articles/s41524-018-0085-8},
doi = {10.1038/s41524-018-0085-8},
language = {en},
number = {1},
urldate = {2022-08-05},
journal = {npj Computational Materials},
author = {Stanev, Valentin and Oses, Corey and Kusne, A. Gilad and Rodriguez, Efrain and Paglione, Johnpierre and Curtarolo, Stefano and Takeuchi, Ichiro},
month = dec,
year = {2018},
pages = {29}
}
@article{Baird2022,
author = {Sterling G. Baird},
title = {NOMAD Chemical Formulas and Calculation IDs},
year = {2022},
month = {3},
url = {https://figshare.com/articles/dataset/NOMAD_Chemical_Formulas_and_Calculation_IDs/19319783},
doi = {10.6084/m9.figshare.19319783.v3}
}
@article{draxl_nomad_2019,
title = {The {NOMAD} laboratory: from data sharing to artificial intelligence},
volume = {2},
issn = {2515-7639},
shorttitle = {The {NOMAD} laboratory},
url = {https://iopscience.iop.org/article/10.1088/2515-7639/ab13bb},
doi = {10.1088/2515-7639/ab13bb},
language = {en},
number = {3},
urldate = {2022-08-05},
journal = {Journal of Physics: Materials},
author = {Draxl, Claudia and Scheffler, Matthias},
month = jul,
year = {2019},
pages = {036001}
}
@article{xie2021crystal,
title={Crystal diffusion variational autoencoder for periodic material generation},
author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
journal={arXiv preprint arXiv:2110.06197},
year={2021},
url = {http://arxiv.org/abs/2110.06197},
}
@article{davies_smact_2019,
title = {{SMACT}: {Semiconducting} {Materials} by {Analogy} and {Chemical} {Theory}},
volume = {4},
issn = {2475-9066},
shorttitle = {{SMACT}},
url = {http://joss.theoj.org/papers/10.21105/joss.01361},
doi = {10.21105/joss.01361},
language = {en},
number = {38},
urldate = {2022-08-05},
journal = {Journal of Open Source Software},
author = {Davies, Daniel and Butler, Keith and Jackson, Adam and Skelton, Jonathan and Morita, Kazuki and Walsh, Aron},
month = jun,
year = {2019},
pages = {1361}
}