Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
2022-08-21
Abstract: This paper presents Z-Code++, a new pre-trained language model optimized for
abstractive text summarization. The model extends the state-of-the-art
encoder-decoder model using three techniques. First, we use a two-phase
pre-training process to improve the model's performance on low-resource
summarization tasks. The model is first pre-trained on text corpora for
language understanding, and is then continually pre-trained on summarization
corpora for grounded text generation. Second, we replace self-attention layers
in the encoder with disentangled attention layers, where each word is
represented using two vectors that encode its content and position,
respectively. Third, we use fusion-in-encoder, a simple yet effective method of
encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of
the art on 9 out of 13 text summarization tasks across 5 languages. Our model
is parameter-efficient in that it outperforms the 600x larger PaLM-540B on
XSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and
few-shot settings, our model substantially outperforms the competing models.
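
The hierarchical encoding idea behind fusion-in-encoder can be illustrated with a minimal sketch. This is one plausible reading of the abstract and not the authors' implementation: the long input is split into fixed-size chunks, each chunk is processed by a local layer that sees only that chunk, and the chunk outputs are concatenated and passed through a global layer that sees the whole sequence. The names `split_into_chunks`, `mean_mix`, and `fusion_in_encoder` are hypothetical, and `mean_mix` is a toy stand-in for a real attention layer.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def split_into_chunks(items: list, chunk_size: int) -> List[list]:
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def mean_mix(states: List[Vector]) -> List[Vector]:
    """Toy stand-in for an attention layer (an assumption, not the real model).

    Each position becomes the average of its own vector and the mean vector
    of the span, so information is shared only within the span passed in.
    """
    if not states:
        return []
    dim = len(states[0])
    mean = [sum(v[d] for v in states) / len(states) for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in states]


def fusion_in_encoder(
    states: List[Vector],
    chunk_size: int,
    local_layer: Layer = mean_mix,
    global_layer: Layer = mean_mix,
) -> List[Vector]:
    """Encode a long sequence hierarchically: local per chunk, then global."""
    chunks = split_into_chunks(states, chunk_size)
    # Local stage: each chunk is encoded independently of the others.
    local_states = [local_layer(chunk) for chunk in chunks]
    # Fusion: concatenate chunk outputs back into one sequence.
    fused = [vec for chunk in local_states for vec in chunk]
    # Global stage: one pass over the full fused sequence.
    return global_layer(fused)


if __name__ == "__main__":
    toy_input = [[float(i)] for i in range(6)]  # six 1-dimensional "token" vectors
    print(fusion_in_encoder(toy_input, chunk_size=3))
```

With real attention layers, the local stage costs time quadratic in the chunk size rather than in the full sequence length, which is the usual motivation for this kind of hierarchy; the sketch preserves only the data flow, not the model's actual layers or parameters.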
Artificial Intelligence, Computation and Language