LLM4Mol

LLM4Mol (LLM: Large Language Model) is a curated collection of studies that apply large language models to molecular design, protein research, and materials science. It is intended as a central reference for researchers and practitioners who want to use language models in these domains. The list gathers research papers, with code links where available, covering Biomedical Text, RNA/DNA, Molecules, Peptides, Proteins, Antibodies, and Materials. Contributions are welcome.

🔔Updating ...

Recommendations and references

Generative AI and Deep Learning for molecular/drug design
https://github.com/AspirinCode/papers-for-molecular-design-using-DL

List of papers about Proteins Design using Deep Learning
https://github.com/Peldom/papers_for_protein_design_using_DL

Large Language Models in Chemistry
https://github.com/alxfgh/Large-Language-Models-in-Chemistry

LLM4Biomedical Text

Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health [2023]
Tian, Shubo, Qiao Jin, Lana Yeganova, Po-Ting Lai, Qingqing Zhu, Xiuying Chen, Yifan Yang et al.
arXiv:2306.10070 (2023)
Large language models are universal biomedical simulators [2023]
Schaefer, Moritz, Stephan Reichl, Rob ter Horst, Adele M. Nicolas, Thomas Krausgruber, Francesco Piras, Peter Stepper, Christoph Bock, and Matthias Samwald.
bioRxiv (2023) | code
Fine-tuning large neural language models for biomedical natural language processing [2023]
Tinn, Robert, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon.
Patterns 4.4 (2023) | code
A Platform for the Biomedical Application of Large Language Models [2023]
Lobentanzer, Sebastian, and Julio Saez-Rodriguez.
arXiv:2305.06488v2 | code
Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations [2023]
Chen, Qingyu, Jingcheng Du, Yan Hu, Vipina Kuttichi Keloth, Xueqing Peng, Kalpana Raja, Rui Zhang, Zhiyong Lu, and Hua Xu.
arXiv:2305.16326v1 | code
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [2023]
Zhang, K., Yu, J., Yan, Z., Liu, Y., Adhikarla, E., Fu, S., ... & Sun, L.
arXiv:2305.17100v1 | code
BioMedLM: a Domain-Specific Large Language Model for Biomedical Text [2022]
Paper | code

LLM4Small Molecule

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [2023]
Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, Qing Li
arXiv:2306.06615 (2023) | code
Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language [2023]
Philipp Seidl, Andreu Vall, Sepp Hochreiter, Günter Klambauer
arXiv:2303.03363 (2023) | code
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models [2023]
Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen
arXiv:2306.08018v1 | code
MolReGPT: Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [2023]
Li, Jiatong, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, and Qing Li.
arXiv:2306.06615v1 | code

LLM4RNA/DNA

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [2023]
Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré.
arXiv:2306.15794v1
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome [2021]
Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V. Davuluri.
Bioinformatics 37.15 (2021) | code

LLM4Peptide

LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning [2022]
Dee, William.
Bioinformatics Advances 2.1 (2022) | code

LLM4Protein

Protein-Protein Interaction Prediction is Achievable with Large Language Models [2023]
Hallee, Logan, and Jason P. Gleghorn.
bioRxiv (2023)
Prediction of virus-host association using protein language models and multiple instance learning [2023]
Liu, Dan, Francesca Young, David L. Robertson, and Ke Yuan.
bioRxiv (2023) | code
Large language models generate functional protein sequences across diverse families [2023]
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr et al.
Nat Biotechnol (2023) | code

LLM4Antibody

On Pre-training Language Model for Antibody [2023]
Wang, Danqing, Fei Ye, and Hao Zhou.
ICLR (2023) | code
Efficient evolution of human antibodies from general protein language models [2023]
Hie, Brian L., Varun R. Shanker, Duo Xu, Theodora UJ Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim.
Nat Biotechnol (2023) | code
AbLang: an antibody language model for completing antibody sequences [2022]
Olsen, Tobias H., Iain H. Moal, and Charlotte M. Deane.
Bioinformatics Advances (2022) | code

LLM4Clinical

Matching Patients to Clinical Trials with Large Language Models [2023]
Jin, Qiao, Zifeng Wang, Charalampos S. Floudas, Jimeng Sun, and Zhiyong Lu.
arXiv:2307.15051 (2023)
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [2023]
Wang, Guangyu, Guoxing Yang, Zongxin Du, Longjun Fan, and Xiaohu Li.
arXiv:2306.09968v1

LLM4Chemistry

ChemCrow: Augmenting large-language models with chemistry tools [2023]
Bran, Andres M., Sam Cox, Andrew D. White, and Philippe Schwaller.
arXiv:2304.05376 (2023) | code

LLM4Material

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [2023]
Xie, Tong, Yuwei Wan, Wei Huang, Yufei Zhou, Yixuan Liu, Qingyuan Linghu, Shaozhou Wang, Chunyu Kit, Clara Grazian, and Bram Hoex.
arXiv:2304.02213v5
MatSciBERT: A materials domain language model for text mining and information extraction [2022]
Gupta, Tanishq, Mohd Zaki, NM Anoop Krishnan, and Mausam.
npj Comput Mater 8, 102 (2022) | code

HHW-zhou/LLM4Mol
