Text2MDT: Extracting Medical Decision Trees from Medical Texts

Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie
2024-01-04
Abstract: Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical for building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two different methods for the Text2MDT task: (a) an end-to-end framework that relies only on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; (b) a pipeline framework that decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and outperforms the pipeline methods; (b) the chain-of-thought (COT) prompting method \cite{Wei2022ChainOT} can improve the performance of the fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipeline method based on encoder-based pretrained models can perform comparably with LLMs with model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at \url{
Computation and Language
What problem does this paper attempt to address?
The paper addresses the automatic extraction of medical decision trees (MDTs) from medical texts. Current methods for constructing MDTs depend heavily on time-consuming and laborious manual annotation, which is inefficient and makes it difficult to incorporate the latest research results in a timely manner. These problems impede the construction, dissemination, and maintenance of large-scale clinical decision support systems (CDSS). The paper therefore proposes a new task, Text2MDT, which explores how to automatically extract MDTs from medical texts such as medical guidelines and textbooks, with the aim of improving the efficiency and accuracy of this process.

To achieve this goal, the authors carried out the following work:

1. **Define the task**: The goal of the Text2MDT task is defined as automatically generating medical decision trees from given medical texts.
2. **Construct the dataset**: An annotated Chinese dataset of 500 pairs of texts and decision trees was created with the participation of medical experts.
3. **Investigate methods**: Two different methods for the Text2MDT task were explored:
   - **End-to-end framework**: Relies only on instruction tuning of large language models (LLMs) to generate all node information and tree structures.
   - **Pipeline framework**: Decomposes the Text2MDT task into three subtasks: triple extraction, node grouping, and tree assembly.
4. **Experiment and evaluation**: Systematic experiments on the constructed Text2MDT dataset showed that:
   - The end-to-end method based on LLMs (7B parameters or larger) showed promising results and outperformed the pipeline method.
   - The chain-of-thought (COT) prompting method can further improve the performance of the fine-tuned LLMs on the Text2MDT test set.
   - The lightweight pipeline method based on encoder-based pretrained models can perform comparably with LLMs while having a model complexity two orders of magnitude smaller.

Through this work, the paper not only proposes a new task and methods but also provides an open dataset and source code, laying a foundation for future research.
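The three pipeline subtasks named above (triple extraction, node grouping, tree assembly) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `Node` fields, the `yes`/`no` branch names, the tagged-tuple input format, and the rule that condition nodes have two children while decision nodes are leaves, with nodes listed in preorder. Triple extraction itself is not modeled; hand-written toy triples stand in for a model's output and carry no clinical meaning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed triple shape: (subject, relation, object).
Triple = Tuple[str, str, str]


@dataclass
class Node:
    role: str                      # "condition" or "decision" (assumed labels)
    triples: List[Triple]
    logic: Optional[str] = None    # "and", "or", or None for a single triple
    yes: Optional["Node"] = None   # branch taken when the condition holds
    no: Optional["Node"] = None    # branch taken when it does not


def group_nodes(tagged: List[Tuple[int, str, Optional[str], Triple]]) -> List[Node]:
    """Node grouping: collect triples that share a node id, in first-seen order."""
    nodes: Dict[int, Node] = {}
    for node_id, role, logic, triple in tagged:
        if node_id not in nodes:
            nodes[node_id] = Node(role=role, triples=[], logic=logic)
        nodes[node_id].triples.append(triple)
    return list(nodes.values())


def assemble(preorder: List[Node]) -> Node:
    """Tree assembly: rebuild a binary tree from a preorder node list."""
    it = iter(preorder)

    def build() -> Node:
        try:
            node = next(it)
        except StopIteration:
            raise ValueError("preorder sequence ended before the tree was complete")
        if node.role == "condition":
            node.yes = build()
            node.no = build()
        return node

    root = build()
    if next(it, None) is not None:
        raise ValueError("extra nodes left after the tree was complete")
    return root


# Toy stand-in for the output of a triple extraction model.
tagged = [
    (0, "condition", None, ("patient", "symptom", "fever")),
    (1, "decision", "and", ("patient", "treatment", "drug A")),
    (1, "decision", "and", ("patient", "treatment", "rest")),
    (2, "decision", None, ("patient", "treatment", "observation")),
]

nodes = group_nodes(tagged)
root = assemble(nodes)
print(root.role, len(root.yes.triples), root.no.triples)
```

Splitting grouping and assembly into separate functions mirrors the decomposition described in the summary: each stage can be replaced by a learned model without changing the others, as long as the intermediate formats stay the same.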