ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues
Guojun Yan, Jiahuan Pei, Pengjie Ren, Zhaochun Ren, Xin, Huasheng Liang, Maarten de Rijke, Zhumin Chen
DOI: https://doi.org/10.1145/3477495.3531809
2021-01-01
Abstract: Medical dialogue systems (MDSs) aim to assist doctors and patients with a range of professional medical services, i.e., diagnosis, treatment, and consultation. The development of MDSs is hindered by a lack of resources. In particular, (1) there is no dataset with large-scale medical dialogues that covers multiple medical services and contains fine-grained medical labels (i.e., intents, actions, slots, and values), and (2) there is no set of established benchmarks for MDSs for multi-domain, multi-service medical dialogues. In this paper, we present ReMeDi, a set of Resources for Multi-domain, Multi-service, Medical Dialogues. ReMeDi consists of two parts: the ReMeDi dataset and the ReMeDi benchmarks. The ReMeDi dataset contains 96,965 conversations between doctors and patients, including 1,557 conversations with fine-grained labels. It covers 843 types of diseases, 5,228 medical entities, and 3 specialties of medical services across 40 domains. To the best of our knowledge, the ReMeDi dataset is the only medical dialogue dataset that covers multiple domains and services and has fine-grained medical labels. The second part, the ReMeDi benchmarks, consists of a set of state-of-the-art models for (medical) dialogue generation: (1) pretrained models (i.e., BERT-WWM, BERT-MED, GPT2, and MT5) trained, validated, and tested on the ReMeDi dataset, and (2) a self-supervised contrastive learning (SCL) method to expand the ReMeDi dataset and enhance the training of the state-of-the-art pretrained models. We describe the creation of the ReMeDi dataset and the ReMeDi benchmarking methods, and establish experimental results using these methods on the ReMeDi dataset for future research to compare against. With this paper, we share the dataset, implementations of the benchmarks, and evaluation scripts.
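The abstract names four kinds of fine-grained labels: intents, actions, slots, and values. As an illustration only, a labeled conversation might be modeled as below. This is a minimal sketch under stated assumptions: the class names, field layout, helper functions, and any example label strings are hypothetical and do not describe the released ReMeDi file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Label:
    """One fine-grained label; the four fields mirror the label types named in the abstract."""
    intent: str
    action: str
    slot: str
    value: str


@dataclass
class Utterance:
    speaker: str  # hypothetical: "doctor" or "patient"
    text: str
    labels: List[Label] = field(default_factory=list)  # empty for unlabeled turns


@dataclass
class Conversation:
    domain: str  # hypothetical: one of the dataset's medical domains
    utterances: List[Utterance] = field(default_factory=list)

    def is_labeled(self) -> bool:
        # Treat a conversation as fine-grained if any utterance carries a label.
        return any(u.labels for u in self.utterances)

    def slot_values(self, slot: str) -> List[str]:
        # Collect every value annotated for the given slot, in utterance order.
        return [lab.value for u in self.utterances for lab in u.labels if lab.slot == slot]


def labeled_fraction(conversations: List[Conversation]) -> float:
    # Share of conversations carrying fine-grained labels (0.0 for an empty list).
    if not conversations:
        return 0.0
    return sum(c.is_labeled() for c in conversations) / len(conversations)
```

Applied to the counts reported above, 1,557 labeled conversations out of 96,965 is a fraction of roughly 1.6%, so most of the corpus would have empty `labels` lists under this sketch.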