Automatic Abstraction of Long Chinese Patent Texts Based on P-Bertsum Model

ChangSheng Zhu, Peng Qin
DOI: https://doi.org/10.1109/ICSP58490.2023.10248581
2023-01-01
Abstract: Traditional extractive summarization algorithms, when applied to long Chinese patent texts, tend to lose key information, drift from the core idea of the text, and produce excessive redundancy. To address these problems, the P-BertSum algorithm is proposed: an extractive summarization model for long Chinese patent texts that efficiently processes long inputs (over 1500 words) and generates high-quality long summaries (over 200 words). The method builds on an improved BertSum model and uses the new CLTPDS patent text dataset. It processes long texts by pooling, transforms the Chinese input representations, generates sentence vectors with a pre-trained model, and captures both the internal features of sentences and the structural features of the text to extract summaries. Experiments show that the method improves ROUGE recall and F-value by more than 13 percentage points compared with existing methods.
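To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-N recall and F-value can be computed for a candidate summary against a reference. It is illustrative only and is not the authors' evaluation code: the abstract does not say which ROUGE variant or tokenization was used, so the character-level tokenization (a common choice for Chinese text) and the function names `ngrams` and `rouge_n` are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision and F-value for two strings.

    Assumption: each character is one token, which avoids the need
    for a Chinese word segmenter in this sketch.
    """
    cand = ngrams(list(candidate), n)
    ref = ngrams(list(reference), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f": f_value}


if __name__ == "__main__":
    scores = rouge_n("一种专利文本摘要方法", "一种长专利文本的摘要方法", n=1)
    print(scores)
```

Recall measures how much of the reference summary is covered by the candidate, while the F-value balances recall against precision, so a gain in both suggests that the longer summaries cover more reference content without a matching loss in precision.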