This collection focuses on protein sequence prediction. We survey relevant models and explore strategies transferable from multimodal approaches, with the goal of identifying and adapting state-of-the-art methods that improve protein sequence prediction through efficient modeling techniques.
ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
Youhan Lee and Hasun Yu
ICLR, 2023
[Paper]
Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design
Ilia Igashov, Hannes Stärk, Clément Vignac, Arne Schneuing, Victor Garcia Satorras, Pascal Frossard, Max Welling, Michael Bronstein and Bruno Correia
Nature Machine Intelligence
[Paper] [Code]
Linker-Tuning: Optimizing Continuous Prompts for Heterodimeric Protein Prediction
Shuxian Zou, Shentong Mo, Hui Li, Xingyi Cheng, Le Song, Eric Xing
NeurIPS, 2023 (submitted)
[Paper]
Keywords: protein structure prediction, protein language models, parameter-efficient training
Abstract: This paper introduces Linker-Tuning, a method that adapts ESMFold to predict heterodimer structures efficiently, showing significant improvements over baseline models in accuracy and speed.
HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights
Authors not specified in available content
Publication Date: Not Specified
[Paper]
Abstract: HelixFold-Multimer showcases exceptional accuracy in predicting antigen-antibody complexes, making it a promising tool for advancing drug design and therapeutic development.
Pairing Interacting Protein Sequences Using Masked Language Modeling
Authors not specified in available content
Publication Date: Not Specified
[Paper]
Abstract: This study leverages MSA-based transformers for protein sequence pairing, demonstrating superior performance over traditional coevolution methods, particularly in challenging datasets with low sequence diversity.
Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction
Authors not specified in available content
Publication Date: Not Specified
[Paper]
Abstract: This work integrates machine learning techniques, including SVM and Random Forest models, to predict peptide-protein interactions using sequence-based and structure-based features, enhancing prediction accuracy.
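As a concrete illustration of the approach described above, the following sketch trains a Random Forest on sequence-derived features of a peptide–protein pair. It uses scikit-learn; the amino-acid-composition featurization and the toy data are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of sequence-feature-based interaction prediction.
# The featurization below is a toy stand-in for the paper's
# sequence- and structure-based features.
from collections import Counter

from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq: str) -> list:
    """20-dim amino-acid composition vector for one sequence."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(aa, 0) / n for aa in AMINO_ACIDS]

def pair_features(peptide: str, protein: str) -> list:
    """Concatenate compositions of both partners (toy pair encoding)."""
    return aa_composition(peptide) + aa_composition(protein)

# Toy training data: (peptide, protein, interacts?)
pairs = [
    ("ACDK", "MKTAYIAKQR", 1),
    ("GGGG", "MKTAYIAKQR", 0),
    ("ACDK", "GGGGGGGGGG", 0),
    ("KKRR", "DDEEDDEEDD", 1),
]
X = [pair_features(p, q) for p, q, _ in pairs]
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = clf.predict([pair_features("ACDK", "MKTAYIAKQR")])
```

An SVM variant would simply swap `RandomForestClassifier` for `sklearn.svm.SVC` over the same features.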
Reinforcement Learning for Sequence Design Leveraging Protein Language Models
Authors not specified in the available content
2023
[Paper]
Abstract: This paper presents a modular approach to leverage existing protein language models within a reinforcement learning framework, focusing on generating protein sequences through mutation policies.
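The mutation-policy framing above can be sketched as a simple optimization loop. Here a toy reward stands in for the protein language model's score, and greedy acceptance stands in for the learned RL policy; both are illustrative assumptions, not the paper's method.

```python
# Skeleton of mutation-based sequence optimization. A toy reward
# replaces the PLM-derived reward, and greedy acceptance replaces
# the learned mutation policy (both illustrative).
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_reward(seq: str) -> float:
    """Stand-in reward: fraction of hydrophobic residues (illustrative)."""
    return sum(aa in "AILMFWV" for aa in seq) / len(seq)

def mutate(seq: str, rng: random.Random) -> str:
    """Apply one random point mutation."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]

def optimize(seq: str, steps: int = 200, seed: int = 0) -> str:
    """Iteratively mutate, keeping candidates that don't lower the reward."""
    rng = random.Random(seed)
    best, best_r = seq, toy_reward(seq)
    for _ in range(steps):
        cand = mutate(best, rng)
        r = toy_reward(cand)
        if r >= best_r:  # greedy acceptance (policy stand-in)
            best, best_r = cand, r
    return best

improved = optimize("GGGGGGGGGG")
```

In the RL setting, the acceptance rule is replaced by a trained policy that proposes mutations, with the PLM score serving as the reward signal.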
ReLSO: A Transformer-based Model for Latent Space Optimization and Generation of Proteins
Authors not specified in the available content
2023
[Paper]
Abstract: ReLSO integrates sequence and fitness information into a jointly trained autoencoder, optimizing protein sequences by modeling the sequence-function landscape.
Diffusion Language Models Are Versatile Protein Learners
Authors not specified in the available content
2024
[Paper]
Abstract: This work blends diffusion models and language models for protein learning, utilizing discrete diffusion over sequence data for effective modeling of protein structures and interactions.
Protein Sequence Design with Batch Bayesian Optimisation
Authors not specified in the available content
2023
[Paper]
Abstract: The study introduces a Bayesian optimization approach to protein sequence design, focusing on exploring the proximal frontier of the fitness landscape to find high-fitness mutants.
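The batch Bayesian-optimisation loop described above can be sketched with a Gaussian-process surrogate and an upper-confidence-bound acquisition over candidate mutants. The composition encoding, fitness values, and candidate pool below are toy assumptions, not the paper's setup.

```python
# Minimal sketch of batch candidate selection with a GP surrogate
# and a UCB acquisition (toy encoding and fitness values).
from collections import Counter

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode(seq):
    """20-dim amino-acid composition vector (toy sequence encoding)."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]

# Observed mutants with (toy) fitness measurements.
observed = {"ACDK": 0.2, "ACDW": 0.5, "GCDK": 0.1}
X = np.array([encode(s) for s in observed])
y = np.array(list(observed.values()))

gp = GaussianProcessRegressor(random_state=0).fit(X, y)

# Candidate pool: single mutants of the current best at position 0.
candidates = [aa + "CDW" for aa in AMINO_ACIDS if aa + "CDW" not in observed]
mu, sigma = gp.predict(np.array([encode(s) for s in candidates]),
                       return_std=True)
ucb = mu + 2.0 * sigma                                 # exploration bonus
batch = [candidates[i] for i in np.argsort(-ucb)[:4]]  # next batch to assay
```

The selected batch would be assayed, the measurements added to `observed`, and the loop repeated; the paper's contribution lies in how the acquisition steers this loop toward the proximal frontier of the fitness landscape.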
Network and Sequence-Based Prediction of Protein-Protein Interactions
Authors not specified in the available content
2023
[Paper]
Abstract: The paper models protein interactions using sequence similarity and biological indices, predicting interactions based on evolutionary and functional similarities among protein sequences.
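A toy version of similarity-based interaction transfer, as described above: a query protein inherits candidate partners from its most similar annotated sequence. Ungapped identity is a simplifying stand-in for the paper's similarity and biological indices, and the data are illustrative.

```python
# Toy similarity-based partner transfer (identity over an ungapped
# alignment stands in for a real similarity measure).
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

# Known interaction partners for annotated sequences (illustrative).
known_partners = {
    "MKTAYIAKQR": ["P1", "P2"],
    "GGGGGGGGGG": ["P3"],
}

def predict_partners(query: str) -> list:
    """Transfer partners from the most similar annotated sequence."""
    best = max(known_partners, key=lambda s: identity(query, s))
    return known_partners[best]

preds = predict_partners("MKTAYIAKQA")  # one mismatch vs. the first entry
```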
PiFold: Toward effective and efficient protein inverse folding
Zhangyang Gao, Cheng Tan, and Stan Z. Li
ICLR, 2023
[Paper] [Code]
Reprogramming Pretrained Language Models for Antibody Sequence Infilling
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, and Devleena Das
arXiv Preprint
[Paper]
AlphaFold Distillation for Improved Inverse Protein Folding
Igor Melnyk, Aurelie Lozano, Payel Das, and Vijil Chenthamarakshan
arXiv Preprint
[Paper]
Protein Design and Variant Prediction Using Autoregressive Generative Models
Jung-Eun Shin, Adam Riesselman, Aaron W. Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew Kruse, and Debora Marks
Nature Communications, 2021
[Paper]
Protein Sequence Design with a Learned Potential
Namrata Anand, Raphael R. Eguchi, Alexander Derry, Russ B. Altman, Po-Ssu Huang
Preprint
[Paper]
Regression Transformer Enables Concurrent Sequence Regression and Generation for Molecular Language Modelling
Jannis Born, and Matteo Manica
Nature Machine Intelligence
[Paper]
Towards Controllable Protein Design with Conditional Transformers
Noelia Ferruz, and Birte Höcker
Preprint
[Paper]
Robust Deep Learning Based Protein Sequence Design Using ProteinMPNN
J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, A. Courbet, R. J. de Haas, N. Bethel, P. J. Y. Leung, T. F. Huddy, S. Pellock, D. Tischer, F. Chan, B. Koepnick, H. Nguyen, A. Kang, B. Sankaran, A. K. Bera, N. P. King, and D. Baker
Science, 2022
[Paper]
Accurate structure prediction of biomolecular interactions with AlphaFold 3
Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis & John M. Jumper
Nature
Keywords: Diffusion-based architecture, Protein structure modelling, Biomolecular space modelling
This paper introduces AlphaFold 3, which uses a diffusion-based architecture to accurately predict biomolecular interactions and protein structures.
[Paper]
Highly Accurate Protein Structure Prediction with AlphaFold
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021).
[Paper]
High-resolution De Novo Structure Prediction from Primary Sequence
Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. (2022).
[Paper]
A backbone-centred energy function of neural networks for protein design
B Huang, Y Xu, X Hu, Y Liu, S Liao, J Zhang, C Huang
Nature
Keywords: Energy function, MD simulation, Backbone-centred
The study presents a backbone-centred energy function that integrates neural networks and MD simulations for efficient protein design.
[Paper]
De novo protein design by deep network hallucination
Ivan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione & David Baker
Nature
Keywords: Hallucination, Inpainting, Protein design
The paper explores a novel approach for de novo protein design using deep network hallucination and inpainting techniques.
[Paper]
Design of protein-binding proteins from the target structure alone
Longxing Cao, Brian Coventry, Inna Goreshnik, Buwei Huang, William Sheffler, Joon Sung Park, Kevin M. Jude, Iva Marković, Rameshwar U. Kadam, Koen H. G. Verschueren, Kenneth Verstraete, Scott Thomas Russell Walsh, Nathaniel Bennett, Ashish Phal, Aerin Yang, Lisa Kozodoy, Michelle DeWitt, Lora Picton, Lauren Miller, Eva-Maria Strauch, Nicholas D. DeBouver, Allison Pires, Asim K. Bera, Samer Halabiya, Bradley Hammerson, Wei Yang, Steffen Bernard, Lance Stewart, Ian A. Wilson, Hannele Ruohola-Baker, Joseph Schlessinger, Sangwon Lee, Savvas N. Savvides, K. Christopher Garcia & David Baker
Nature
Keywords: Binding site
This research focuses on designing protein-binding proteins using only the target structure, enhancing binding affinity and specificity.
[Paper]
Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations
Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy P. K. Tan, James Hedrick, Jason Crain & Aleksandra Mojsilovic
Nature Biomedical Engineering
Keywords: Antimicrobials, Generative autoencoder, Molecular dynamics
The paper discusses a method for accelerated antimicrobial discovery using deep generative models coupled with molecular dynamics simulations.
[Paper]
Discovering de novo peptide substrates for enzymes using machine learning
Lorillee Tallorin, JiaLei Wang, Woojoo E. Kim, Swagat Sahu, Nicolas M. Kosa, Pu Yang, Matthew Thompson, Michael K. Gilson, Peter I. Frazier, Michael D. Burkart & Nathan C. Gianneschi
Nature Communications
Keywords: Enzymes design, Machine learning
This study leverages machine learning to discover de novo peptide substrates for enzyme design, improving enzyme efficiency.
[Paper]
ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Yunan Luo, Guangde Jiang, Tianhao Yu, Yang Liu, Lam Vo, Hantian Ding, Yufeng Su, Wesley Wei Qian, Huimin Zhao & Jian Peng
Nature Communications
Keywords: Functional fitness, Evolutionary
The research presents ECNet, a deep learning framework integrating evolutionary context for improved protein engineering and functional fitness prediction.
[Paper]
Deep generative models create new and diverse protein structures
Zeming Lin, Tom Sercu, Yann LeCun
ICML
Keywords: Diversity, Generative model, Protein design
This research demonstrates the use of deep generative models to create diverse and novel protein structures, enhancing the potential for new protein functionalities.
[Paper]
Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K Yang
arXiv
Keywords: Diffusion model, Deep generative model, Protein generation, Framework, Sequence design
The study explores the application of evolutionary diffusion models in protein generation, emphasizing sequence design.
[Paper]
A high-level programming language for generative protein design
Brian Hie, Salvatore Candido, Zeming Lin, Ori Kabeli, Roshan Rao, Nikita Smetanin, Tom Sercu, Alexander Rives
arXiv
Keywords: ESMFold, Language model, Energy-based
The paper introduces a high-level programming language tailored for generative protein design, leveraging ESMFold and energy-based models for efficient design.
[Paper]
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
arXiv
[Paper]
Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
Elnaggar, A., Essam, H., Salah-Eldin, W., Moustafa, W., Elkerdawy, M., Rochereau, C., & Rost, B.
Technical University of Munich, Proteinea, Inc., Columbia University
[Paper] [Code]
Design Proteins Using Large Language Models: Enhancements and Comparative Analyses
Zeinalipour, K., Jamshidi, N., Bianchini, M., Maggini, M., & Gori, M.
University of Siena, DIISM
[Paper]
Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX
Chen, Z., Chen, T., Xie, C., Xue, Y., Zhang, X., Zhou, J., & Fang, X.
Baidu Inc
[Paper]
Functional Protein Design with Local Domain Alignment
Yuan, C., Li, S., Ye, G., Zhang, Y., Huang, L. K., Huang, W., ... & Rong, Y.
Tencent AI Lab, Tsinghua University, Renmin University of China, Peking University
[Paper]
TooT-PLM-P2S: Incorporating Secondary Structure Information into Protein Language Models
Ghazikhani, H., & Butler, G.
Concordia University
[Paper]
Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation
Aayush Shah, Shankar Jayaratnam
Esperanto Technologies
[Paper]
ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description
Xiao-Yu Guo, Yi-Fan Li, Yuan Liu, Xiaoyong Pan, Hong-Bin Shen
Shanghai Jiao Tong University
[Paper]
InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions
Xiang Zhuang, Keyan Ding, Tianwen Lyu, Yinuo Jiang, Xiaotong Li, Zhuoyi Xiang, Zeyuan Wang, Ming Qin, Kehua Feng, Jike Wang, Qiang Zhang, and Huajun Chen
Zhejiang University
[Paper]
Conditional Enzyme Generation Using Protein Language Models with Adapters
Jason Yang, Aadyot Bhatnagar, Jeffrey A. Ruffolo, Ali Madani
California Institute of Technology
[Paper]
DPLM-2: A Multimodal Diffusion Protein Language Model
Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
Nanjing University
[Paper]
Protein Sequence and Structure Co-Design with Equivariant Translation
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, and Jian Tang
ICLR, 2023
[Paper] [Code]
Protein Discovery with Discrete Walk-Jump Sampling
Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, and Saeed Saremi
ICLR, 2024
[Paper]
De novo design of protein structure and function with RFdiffusion
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek and David Baker
Nature, 2023
[Paper] [Code]
Scaffolding protein functional sites using deep learning
Jue Wang, Sidney Lisanza, David Juergens, Doug Tischer, Joseph L. Watson, Karla M. Castro, Robert Ragotte, Amijai Saragovi, Lukas F. Milles, Minkyung Baek, Ivan Anishchenko, Wei Yang, Derrick R. Hicks, Marc Expòsit, Thomas Schlichthaerle, Jung-Ho Chun, Justas Dauparas, Nathaniel Bennett, Basile I. M. Wicky, Andrew Muenks, Frank DiMaio, Bruno Correia, Sergey Ovchinnikov, David Baker
Science
Keywords: Functional site, Deep learning, Hallucination, Inpainting
This paper highlights a deep learning approach for scaffolding protein functional sites, incorporating hallucination and inpainting techniques to enhance functionality.
[Paper]
Conditional Antibody Design as 3D Equivariant Graph Translation
Xiangzhe Kong, Wenbing Huang, Yang Liu
ICML
Keywords: Antibody design, Graph translation
The study focuses on conditional antibody design using 3D equivariant graph translation to improve antibody binding and specificity.
[Paper]
Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker
arXiv
Keywords: Diffusion, General deep learning framework, De novo binder design
The research integrates structure prediction networks with diffusion generative models for accurate and broadly applicable protein design.
[Paper]
Function-guided protein design by deep manifold sampling
Vladimir Gligorijević, Daniel Berenberg, Stephen Ra, Simon Kelow, Kyunghyun Cho
arXiv
Keywords: Sequence denoising autoencoder, Deep manifold sampling
This paper presents a function-guided approach to protein design using deep manifold sampling and sequence denoising autoencoders.
[Paper]
Deep sharpening of topological features for de novo protein design
Zander Harteveld, Joshua Southern, Michaël Defferrard, Andreas Loukas, Pierre Vandergheynst, Michael Bronstein, Bruno Correia
ICML
Keywords: Variational autoencoder, Topological features, Sharpen
[Paper]
Protein Representation Learning via Knowledge Enhanced Primary Structure Modeling
Hong-Yu Zhou, Yunxiang Fu, Zhicheng Zhang, Cheng Bian, and Yizhou Yu
ICLR, 2023
[Paper]
Protein Representation Learning by Geometric Structure Pretraining
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
ICLR, 2023
[Paper]
Multi-Level Protein Structure Pre-Training With Prompt Learning
Authors not specified in available content
ICLR, 2023
[Paper]
Language models generalize beyond natural proteins
Robert Verkuil, Ori Kabeli, Yilun Du, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives
arXiv
Keywords: ESMFold, Language model, Fixed backbone design
This research shows how language models can generalize beyond natural proteins, offering new insights into protein structure prediction and design.
[Paper]
ProtGO: Function-Guided Protein Modeling for Unified Representation Learning
Hu, B., Tan, C., Xu, Y., Gao, Z., Xia, J., Wu, L., & Li, S. Z.
[Paper]
InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions
Qiu, J., Xu, J., Hu, J., Cao, H., Hou, L., Gao, Z., ... & Chen, G.
[Paper]
ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention
Li, M., Tan, Y., Ma, X., Zhong, B., Yu, H., Zhou, Z., ... & Tan, P.
[Paper]
LucaOne: Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language
Yong He, Pan Fang, Yongtao Shan, Yuanfei Pan, Yanhong Wei, Yichang Chen, Yihao Chen, Yi Liu, Zhenyu Zeng, Zhan Zhou, Feng Zhu, Edward C. Holmes, Jieping Ye, Jun Li, Yuelong Shu, Mang Shi, and Zhaorong Li
arXiv
[Paper] [Code]
SaProt: Protein Language Modeling with Structure-aware Vocabulary
Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan
arXiv
[Paper] [Code]
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine
Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, and Zaiqing Nie
arXiv
[Paper] [Code]
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M Carrami, Sahand Sharifzadeh
arXiv
[Paper] [Code]
InstructProtein: Aligning Human and Protein Language via Knowledge Instruction
Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, and Huajun Chen
arXiv
[Paper] [Code]
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, and Tat-Seng Chua
arXiv
[Paper] [Code]
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, and Nicolo Fusi
arXiv
[Paper] [Code]
On Pre-trained Language Models for Antibody
Danqing Wang, Fei Ye, and Hao Zhou
arXiv Preprint
[Paper]
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding
Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Chang Ma, Runcheng Liu, and Jian Tang
NeurIPS, 2022
[Paper] [Project Page]
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu
Fudan University, Multimodal Art Projection Research Community, Shanghai AI Laboratory
[Paper]
Abstract: AnyGPT introduces a unified framework that integrates multimodal data into a single language model through discrete sequence modeling, facilitating seamless understanding and generation across various modalities.
SHOW-O: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou
Show Lab, National University of Singapore; ByteDance
Abstract: SHOW-O presents a unified Transformer architecture that integrates multimodal understanding and generation, enabling efficient and high-quality performance across diverse tasks involving visual and textual data.
Semantic Alignment for Multimodal Large Language Models
Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu
Zhejiang University, National University of Singapore, Alibaba Group
Abstract: This work introduces a novel approach for aligning semantics in multimodal large language models, enhancing the coherence and consistency of information across different modalities to improve understanding and generation tasks.
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Abdin, M., Jacobs, S. A., Awan, A. A., Aneja, J., Awadallah, A., Awadalla, H., ... & Zhou, X.
Microsoft
[Paper]
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Ye, J., Xu, H., Liu, H., Hu, A., Yan, M., Qian, Q., ... & Zhou, J.
Alibaba Group
[Paper] [Code]
MM-Interleaved: Interleaved Image-Text Generation via Multi-modal Feature Synchronizer
Tian, C., Zhu, X., Xiong, Y., Wang, W., Chen, Z., Wang, W., ... & Dai, J.
OpenGVLab, Shanghai AI Laboratory, MMLab, CUHK, Tsinghua University, SenseTime Research, University of Toronto, Fudan University, Nanjing University, CAIR, HKISI, CAS
[Paper] [Code]
CogAgent: A Visual Language Model for GUI Agents
Hong, W., Wang, W., Lv, Q., Xu, J., Yu, W., Ji, J., ... & Tang, J.
Tsinghua University, Zhipu AI
[Paper] [Code]
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen
ICLR 2024
[Paper]
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding
Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, Yu Guang Wang
arXiv Preprint, 2024
[Paper]
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Zhao, H., Cai, Z., Si, S., Ma, X., An, K., Chen, L., ... & Chang, B.
Peking University, BIGAI, University of Washington, Beijing Jiaotong University
[Paper] [Code]
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Li, B., Zhang, Y., Chen, L., Wang, J., Yang, J., & Liu, Z.
S-Lab
[Paper] [Code]
Lightweight In-Context Tuning for Multimodal Unified Models
Chen, Y., Zhang, S., Han, B., & Jia, J.
The Chinese University of Hong Kong, Amazon Web Services
[Paper]
Understanding Multimodal Instruction Format for In-Context Learning
Ma, Y., Li, C., & Xiao, C.
[Paper]
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Chen, Y. S., Song, Y. Z., Yeo, C. Y., Liu, B., Fu, J., & Shuai, H. H.
National Yang Ming Chiao Tung University, Microsoft Research Asia
[Paper]
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
Najdenkoska, I., Zhen, X., & Worring, M.
University of Amsterdam
[Paper]
Towards More Unified In-context Visual Understanding
Sheng, D., Chen, D., Tan, Z., Liu, Q., Chu, Q., Bao, J., ... & Yu, N.
University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Microsoft, Beijing Institute of Technology, Beijing Electronic Science and Technology Institute
[Paper]
Inpainting Protein Sequence and Structure with ProtFill
Kozlova, E., Valentin, A., & Gutierrez, D. N. Z.
[Paper]
Concept Bottleneck Language Models for Protein Design
Aya Abdelsalam Ismail, Tuomas Oikarinen, Amy Wang, Julius Adebayo, Samuel Stanton, Taylor Joren, Joseph Kleinhenz, Allen Goodman, Hector Corrada Bravo, Kyunghyun Cho, Nathan C. Frey
[Paper]
Scoring Function for Automated Assessment of Protein Structure Template Quality
Zhang, Y., & Skolnick, J. (2004).
[Paper]
lDDT: A Local Superposition-Free Score for Comparing Protein Structures and Models Using Distance Difference Tests
Mariani, V., Biasini, M., Barbato, A., & Schwede, T.
[Paper]
Local Fitness Landscape of the Green Fluorescent Protein
Sarkisyan, K. S., Bolotin, D. A., Meer, M. V., Usmanova, D. R., Mishin, A. S., Sharonov, G. V., ... & Kondrashov, F. A.
[Paper]
DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction
Khurana, S., Rawi, R., Kunji, K., Chuang, G. Y., Bensmail, H., & Mall, R.
[Paper]
DeepLoc: Prediction of Protein Subcellular Localization Using Deep Learning
Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K., Nielsen, H., & Winther, O.
[Paper]
Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data
Gray, V. E., Hause, R. J., Luebeck, J., Shendure, J., & Fowler, D. M.
[Paper]