BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias
Deep learning-based drug-target affinity (DTA) prediction models have shown high performance but suffer from dataset bias. Our study investigates this bias using comprehensive databases and demonstrates that compound-protein binding affinity can often be predicted using compound features alone, due to high similarity among target proteins. We developed bias-reduced datasets by decreasing protein similarity between training and test sets, which improved model performance and balanced feature importance.
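The idea of decreasing protein similarity between training and test sets can be illustrated with a small, hypothetical sketch. It is not the authors' pipeline. The function names (`kmer_set`, `jaccard`, `cluster_proteins`, `protein_split`), the 3-mer Jaccard similarity (a cheap stand-in for alignment-based sequence identity), the 0.5 threshold, and the toy sequences and affinities are all assumptions made for illustration. Proteins are grouped by single-linkage clustering, and whole clusters are then assigned to either the training or the test side.

```python
# Hypothetical sketch: protein-similarity-aware train/test split for DTA data.
# Similarity here is 3-mer Jaccard overlap, an assumed stand-in for the
# alignment-based sequence identity a real pipeline would more likely use.
from itertools import combinations


def kmer_set(seq: str, k: int = 3) -> set:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_proteins(seqs: dict, threshold: float, k: int = 3) -> list:
    """Single-linkage clustering via union-find: any two proteins with
    similarity >= threshold end up in the same cluster."""
    parent = {pid: pid for pid in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kmers = {pid: kmer_set(s, k) for pid, s in seqs.items()}
    for p, q in combinations(seqs, 2):
        if jaccard(kmers[p], kmers[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for pid in seqs:
        clusters.setdefault(find(pid), set()).add(pid)
    return list(clusters.values())


def protein_split(pairs, seqs, threshold=0.5, test_fraction=0.2):
    """Split (compound_id, protein_id, affinity) records so that whole
    clusters go to one side; no test protein then reaches `threshold`
    similarity with any training protein."""
    # Smallest clusters first, ties broken by protein IDs for determinism.
    clusters = sorted(cluster_proteins(seqs, threshold),
                      key=lambda c: (len(c), sorted(c)))
    target = test_fraction * len(seqs)
    test_proteins = set()
    for cluster in clusters:
        if len(test_proteins) + len(cluster) <= target:
            test_proteins |= cluster
    train = [r for r in pairs if r[1] not in test_proteins]
    test = [r for r in pairs if r[1] in test_proteins]
    return train, test


# Toy example (made-up data): P1 and P2 are near-identical, so they
# must land on the same side of the split.
seqs = {
    "P1": "MKTAYIAKQRQISFVKSHFS",
    "P2": "MKTAYIAKQRQISFVKSHFA",
    "P3": "GDVEKGKKIFVQKCAQCHTV",
    "P4": "WLLVAGHPNKRSMELDEYQT",
    "P5": "CDEFGHIKLMNPQRSTVWYA",
}
pairs = [("C1", "P1", 7.2), ("C2", "P2", 6.5), ("C3", "P3", 5.1),
         ("C4", "P4", 8.0), ("C5", "P5", 4.3)]
train, test = protein_split(pairs, seqs, threshold=0.5, test_fraction=0.4)
print("train:", train)
print("test:", test)
```

Assigning whole single-linkage clusters, rather than filtering proteins one by one, is what guarantees the cross-set constraint: any train/test protein pair at or above the threshold would have been merged into the same cluster. The trade-off is that the realized test fraction can fall below `test_fraction` when clusters are large.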
We introduce the Binding Affinity Similarity Explorer (BASE) web service, which offers bias-reduced datasets and prediction results to aid in the development of generalizable and robust DTA models. BASE is freely available at https://synbi2024.kaist.ac.kr/base.
To run the project locally, clone the repository:
```bash
git clone https://github.com/yourusername/HJ-DTA-DataBias.git
cd HJ-DTA-DataBias
```
Hyojin Son*, Sechan Lee, Jaeuk Kim, Haangik Park, Myeong-Ha Hwang and Gwan-Su Yi†
- Journal: BMC Bioinformatics
- First Author(*): hyojin0912@kaist.ac.kr
- Corresponding Author(†): gwansuyi@kaist.ac.kr
This work was supported by the BK-21 program through the National Research Foundation of Korea (NRF) under the Ministry of Education.
The code in this repository is licensed under the MIT License. See the LICENSE file for more details.
The data in the `data` folder is licensed under the CC BY 4.0 International license. See the LICENSE file for more details.