An Exact Transformation of Convolutional Kernels Enables Accurate Identification of Sequence Motifs

Yang Ding, Jingyi Li, Meng Wang, Ge Gao
2018-01-01
Abstract:

Motivation: The powerful learning ability of a convolutional neural network (CNN) to perform functional classification of DNA/RNA sequences could provide valuable clues for the discovery of underlying biological mechanisms. Currently, however, the only way to interpret the direct application of a convolutional kernel to DNA/RNA sequences is the heuristic construction of a position weight matrix (PWM) from fragments scored highly by that kernel; whether the resulting PWM still performs the sequence classification well is unclear.

Results: We developed a novel kernel-to-PWM transformation whose result is theoretically provable. Specifically, we proved that the log-likelihood of any DNA/RNA sequence under the resulting PWM is exactly the sum of a constant and the convolution of the original kernel on the same sequence. Importantly, we further proved that the resulting PWM demonstrates the same performance, in theory, as the original kernel under popular CNN frameworks. As expected, the transformation rivaled the performance of the heuristic PWMs in terms of sequence classification, whether the discriminative motif was sequence- or structure-conserved. The transformation also faithfully reproduced the output of trained CNN models where the heuristic one failed. These results prompted us to further develop a maximum likelihood estimation of the optimal PWM for each kernel and a back-transformation of predefined PWMs into kernels. These tools can benefit the biological interpretation of kernel signals.

Availability: Python scripts for the transformation from kernel to PWM, the inverse transformation from PWM to kernel, and the maximum likelihood estimation of the optimal PWM are available through ftp://ftp.cbi.pku.edu.cn/pub/software/CBI/k2p or https://github.com/gao-lab/kernel-to-PWM
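
The central identity in the Results (PWM log-likelihood equals a constant plus the kernel's convolution score) can be illustrated with a minimal sketch. It assumes one simple transformation that has this property, a column-wise softmax with natural-log base; the abstract does not spell out the authors' construction, so this is not necessarily the transformation implemented in their scripts. The function names (`one_hot`, `kernel_to_pwm`, `log_partition`, `pwm_to_kernel`) and the random 4 x 6 kernel are hypothetical and chosen only for illustration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a 4 x L one-hot matrix (rows ordered A, C, G, T)."""
    idx = [BASES.index(b) for b in seq]
    x = np.zeros((4, len(seq)))
    x[idx, np.arange(len(seq))] = 1.0
    return x

def kernel_to_pwm(kernel):
    """Assumed transformation: column-wise softmax, so each column sums to 1."""
    shifted = kernel - kernel.max(axis=0, keepdims=True)  # numerical stability
    expk = np.exp(shifted)
    return expk / expk.sum(axis=0, keepdims=True)

def log_partition(kernel):
    """Sum over columns of log-sum-exp; it does not depend on the sequence."""
    m = kernel.max(axis=0)
    return float(np.sum(m + np.log(np.exp(kernel - m).sum(axis=0))))

def pwm_to_kernel(pwm):
    """One possible back-transformation: element-wise log of a strictly positive PWM."""
    return np.log(pwm)

# Hypothetical kernel of width 6 with random weights (illustration only).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(4, 6))
pwm = kernel_to_pwm(kernel)
const = -log_partition(kernel)

for seq in ["ACGTAC", "TTTTTT", "GATTAC"]:
    x = one_hot(seq)
    conv = float(np.sum(kernel * x))         # kernel score on this fragment
    loglik = float(np.sum(np.log(pwm) * x))  # PWM log-likelihood of the fragment
    assert np.isclose(loglik, conv + const)  # identity holds for every sequence
```

Because the constant is the same for every sequence, ranking fragments by PWM log-likelihood gives the same order as ranking them by kernel score in this sketch. Taking the log of the PWM recovers a kernel that differs from the original only by a per-column offset and maps back to the same PWM.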