GraphCL-DTA: a graph contrastive learning with molecular semantics for drug-target binding affinity prediction

Xinxing Yang, Genke Yang, Jian Chu
2023-07-18
Abstract: Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, because it can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by two drawbacks. First, the learning of drug representations relies only on supervised data and does not take into account the information contained in the molecular graph itself. Second, most previous studies tend to design complicated representation learning modules, while uniformity, which is used to measure representation quality, is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning method with molecular semantics for drug-target binding affinity prediction. In GraphCL-DTA, we design a graph contrastive learning framework for molecular graphs to learn drug representations so that the semantics of molecular graphs are preserved. Through this framework, a more essential and effective drug representation can be learned without additional supervised data. Next, we design a new loss function that can be used directly to smoothly adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of these elements is verified on two real datasets, KIBA and Davis. The performance of GraphCL-DTA on these datasets suggests its superiority over state-of-the-art models.
Machine Learning, Information Retrieval, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses how to predict the binding affinity between drugs and targets more accurately in the early stages of drug discovery. It identifies three limitations of existing computational models:

1. **Drug representation learning depends on supervised data**: Most existing methods learn drug representations only from known drug-target binding affinities and make little use of the information in the molecular graph itself. They therefore depend on large amounts of supervised data, which increases time and cost.
2. **Complex representation learning modules**: Many studies design complex representation learning modules but overlook uniformity, an important indicator of representation quality.
3. **Loss functions do not consider the uniformity of representations**: Existing models usually optimize the mean squared error (MSE), which does not take into account the uniformity of drug and target representations and therefore limits model performance.

To overcome these limitations, the authors propose the GraphCL-DTA model, which improves drug-target binding affinity prediction through the following contributions:

1. **Graph contrastive learning framework**: A graph contrastive learning framework learns drug representations from the semantic information of molecular graphs without additional supervised data. It generates contrastive views by adding controllable random noise in the drug representation space, which preserves the semantics of the molecular graph and yields more essential and effective drug representations.
2. **Loss function for optimizing representation uniformity**: A new loss function directly adjusts the uniformity of drug and target representations. It adds a regularization term to the MSE loss so that uniformity is optimized directly, which improves representation quality and the predictive ability of the model.
3. **Experimental verification**: Extensive experiments on two publicly available datasets (KIBA and Davis) show that GraphCL-DTA outperforms existing state-of-the-art models on these datasets.

In summary, the main contribution of the paper is a graph contrastive learning framework that learns more effective drug representations without relying on additional supervised data, together with a loss function that further improves performance by optimizing the uniformity of representations.
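The two technical ideas summarized above, contrastive views created by noise in the representation space and a uniformity term added to the MSE loss, can be illustrated with a minimal sketch. The text here does not give the paper's exact formulas, so everything specific below is an assumption: the noise is a random direction scaled to L2 norm `eps`, the contrastive objective is a standard InfoNCE loss, the uniformity measure follows the commonly used log-mean-exp of negative squared pairwise distances on normalized embeddings, and the names and weights (`noisy_view`, `lam_cl`, `lam_u`, `tau`, `t`) are hypothetical. It is written in plain Python for readability, not as the authors' implementation.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def noisy_view(embedding, eps, rng):
    """Assumed view generator: add a random perturbation of L2 norm eps."""
    noise = l2_normalize([rng.gauss(0.0, 1.0) for _ in embedding])
    return [e + eps * n for e, n in zip(embedding, noise)]


def info_nce(view_a, view_b, tau=0.2):
    """InfoNCE: row i of view_a should match row i of view_b, not the others."""
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by temperature tau.
        logits = [sum(x * y for x, y in zip(ai, bj)) / tau for bj in b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def uniformity(embeddings, t=2.0):
    """Assumed uniformity measure; needs at least two embeddings.

    Lower (more negative) values mean the normalized embeddings are
    spread more evenly over the unit sphere.
    """
    z = [l2_normalize(v) for v in embeddings]
    terms = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            sq_dist = sum((x - y) ** 2 for x, y in zip(z[i], z[j]))
            terms.append(math.exp(-t * sq_dist))
    return math.log(sum(terms) / len(terms))


def mse(pred, target):
    """Mean squared error between predicted and measured affinities."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


def total_loss(pred, target, drug_emb, target_emb,
               eps=0.1, tau=0.2, lam_cl=0.1, lam_u=0.1, seed=0):
    """MSE plus a contrastive term and a uniformity regularizer (weights assumed)."""
    rng = random.Random(seed)
    # Two independently perturbed views of the same drug embeddings.
    view_a = [noisy_view(e, eps, rng) for e in drug_emb]
    view_b = [noisy_view(e, eps, rng) for e in drug_emb]
    return (mse(pred, target)
            + lam_cl * info_nce(view_a, view_b, tau)
            + lam_u * (uniformity(drug_emb) + uniformity(target_emb)))
```

In this sketch `eps` plays the role of the "controllable" noise level: a small value keeps both views close to the original embedding, which is one way to read the claim that molecular semantics are preserved. The `uniformity` value is 0 when all embeddings coincide and becomes more negative as they spread out, so minimizing it alongside the MSE pushes representations apart.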