GLGE: A New General Language Generation Evaluation Benchmark
Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan
DOI: https://doi.org/10.18653/v1/2021.findings-acl.36
2020-01-01
Abstract: Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks and do not consider Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we further design three subtasks of increasing difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This yields 24 subtasks for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet.