Chinese story generation of sentence format control based on multi-channel word embedding and novel data format
Jhe-Wei Lin,Rong-Guey Chang
DOI: https://doi.org/10.1007/s00500-021-06548-w
IF: 3.732
2022-01-31
Soft Computing
Abstract:It is very difficult to generate stories in Chinese language. So far, there is no effective method to generate smooth articles. Here proposed a novel approach to improve the generation of Chinese stories in artificial intelligence, in order that it can effectively control the part-of-speech structure in sentence generation to imitate the writer’s writing style. The main proposal consists of three parts. First, the pre-processing of the sentence discards the input as the summary and the output as the text. It uses the format containing < SOS > < MOS > < EOS > for processing and the detailed method is defined in session 4. The second part is for vectorization. Traditional vectorization methods include Word2vec, Fasttext, LexVec and Glove; the different vectorization methods can help data semantic or grammatical understanding. Combining different vectorization methods improves the information of the input data. Therefore, this paper proposes the multi-channel word embedding and the details defined in session 5. The last part contains the optimization of the model architecture and how to control the process of sentence generation effectively. It also rewrites the Bert model proposed by Google to be the proposed model architecture. In addition, the Softmax function had been optimizing to reduce the search time during training and increase the training speed in the model. To make the model have better performance, the necessary training of the generative adversarial network was carried out, and the GAN architecture was revised for the data set, and the detail is defined in session 6. After the model is trained, to effectively control the structure of the generated sentence. This paper proposes a complete generation flowchart. In the process, based on the concept of FP-Tree, all sentences in the data set are built into a tree structure, and the part-of-speech structure of the next sentence is restricted through model generation combined with FP-Tree and the detail is defined in session 7. In addition, the experimental results show that our proposed method can effectively control the results of Chinese story generation and generate sentences with better performance and the detail is defined in session 8.
computer science, artificial intelligence, interdisciplinary applications