Fine-grained Artificial Neurons in Audio-transformers for Disentangling Neural Auditory Encoding.
Mengyue Zhou, Xu Liu, David Liu, Zihao Wu, Zhengliang Liu, Lin Zhao, Dajiang Zhu, Lei Guo, Junwei Han, Tianming Liu, Xintao Hu
DOI: https://doi.org/10.18653/v1/2023.findings-acl.503
2023-01-01
Abstract: Wav2Vec and its variants have achieved unprecedented success in computational auditory and speech processing. Meanwhile, neural encoding studies that link representations of Wav2Vec to brain activities have provided novel insights into how auditory and speech processing unfold in the human brain. Most existing neural encoding studies treat each transformer encoding layer in Wav2Vec as a single artificial neuron (AN). That is, the layer-level embeddings are used to predict neural responses. The layer-level embedding aggregates multiple types of contextual attention captured by multi-head self-attention (MSA). Thus, the layer-level ANs lack the fine granularity needed for neural encoding. To address this limitation, we define the elementary units, i.e., each hidden dimension, as neuron-level ANs in Wav2Vec2.0, quantify their temporal responses, and couple those ANs with their biological-neuron (BN) counterparts in the human brain. Our experimental results demonstrate that: 1) the proposed neuron-level ANs carry meaningful neurolinguistic information; 2) those ANs anchor to their BN signatures; 3) the AN-BN anchoring patterns are interpretable from a neurolinguistic perspective. More importantly, our results suggest an intermediate stage in both the computational representation in Wav2Vec2.0 and the cortical representation in the brain. Our study validates the fine-grained ANs in Wav2Vec2.0, which may serve as a novel and general strategy to link transformer-based deep learning models to neural responses for probing sensory processing in the brain.
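The difference between a layer-level AN and neuron-level ANs can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how ANs and BNs are coupled, so Pearson correlation is used here purely as an assumed stand-in, and the function names, array shapes, and synthetic data are all hypothetical. The sketch also assumes that the model activations and the brain signals are already sampled on the same time grid.

```python
import numpy as np

def neuron_level_responses(hidden_states):
    """Split one layer's embedding of shape (T, D) into D per-dimension
    time series, i.e. one temporal response per neuron-level AN."""
    return np.asarray(hidden_states, dtype=float).T  # shape (D, T)

def anchor_ans_to_bns(an_responses, bn_responses, eps=1e-12):
    """Assumed coupling rule: Pearson correlation between every AN (D, T)
    and every BN signal (V, T). Returns the (D, V) correlation matrix and,
    for each AN, the index of its most correlated BN."""
    an = an_responses - an_responses.mean(axis=1, keepdims=True)
    bn = bn_responses - bn_responses.mean(axis=1, keepdims=True)
    an = an / (np.linalg.norm(an, axis=1, keepdims=True) + eps)
    bn = bn / (np.linalg.norm(bn, axis=1, keepdims=True) + eps)
    corr = an @ bn.T
    return corr, corr.argmax(axis=1)

# Synthetic stand-ins: 200 time points, 8 hidden dimensions, 4 BN signals.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((200, 8))   # placeholder for one layer's output
bn_signals = rng.standard_normal((4, 200))
# Make BN 0 follow hidden dimension 3, plus a little noise.
bn_signals[0] = hidden[:, 3] + 0.1 * rng.standard_normal(200)

ans = neuron_level_responses(hidden)
corr, best_bn = anchor_ans_to_bns(ans, bn_signals)
print(corr.shape, best_bn[3])
```

A layer-level analysis would use all 8 columns of `hidden` jointly as one predictor, whereas the sketch keeps each column as a separate unit, so an individual dimension (here dimension 3) can be matched to an individual brain signal.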