An enterprise portrait tag extraction method based on context embedding and knowledge distillation

Xiang Li, Xingshuo Ding, Quanyin Zhu, Jialin Ma, Hong Zhou, Jizhou Sun
DOI: https://doi.org/10.1007/s00500-024-09911-9
IF: 3.732
2024-07-19
Soft Computing
Abstract: Enterprise portraits are reliant on the meticulous process of tag extraction, with the accuracy and efficiency of this foundational task being pivotal for the construction of high-quality portraits and their subsequent applicability in varied real-world scenarios. However, current methodologies, chiefly semantic clustering and deep neural networks, are constrained by significant shortcomings: the former struggles with intricate text feature discernment, while the latter, although high-performing, are marked by their extensive computational requirements. In response to these challenges, we present EPCEKD, a novel approach leveraging context embedding and knowledge distillation, designed to refine the tag extraction process. EPCEKD utilizes structured enterprise context data to facilitate feature expansion and impose tag constraints, enabling the efficient vectorization and integration of enterprise context data and text. Further, EPCEKD amalgamates a large-scale BERT network and a streamlined GRU network, embedding the branch network into the backbone network to forge a unified tag extraction model. To validate the efficacy of EPCEKD, comprehensive experiments were conducted on the Ente-pku dataset and benchmarked against eight prevailing models, with EPCEKD demonstrating superior performance, achieving 95.17% in precision, recall, and F1-score, significantly surpassing models like BiLSTM-CNN (94.25%) and Transformer (91.46%). Importantly, EPCEKD not only excels in precision and F1-score but also enhances the computing speed of large-scale neural networks by 4 to 11.4 times, showcasing its substantial potential for practical enterprise portrait applications.
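The abstract names knowledge distillation between a large BERT network and a lightweight GRU network but does not give the loss used. As a rough illustration only, the sketch below shows the generic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions). The function names, the temperature value, and the example logits are assumptions made for illustration; this is not the EPCEKD implementation, and the paper's actual objective may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value, not taken from the paper
) -> float:
    """Generic distillation loss: T^2 * KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)  # teacher (e.g. a large backbone)
    q = softmax(student_logits, temperature)  # student (e.g. a small branch)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical tag logits for one enterprise text over three candidate tags.
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge, which is what pushes a small network toward the behavior of a larger one.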
computer science, artificial intelligence, interdisciplinary applications