Text2MDT: Extracting Medical Decision Trees from Medical Texts

Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

2024-01-04

Abstract: Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two different methods for the Text2MDT task: (a) an end-to-end framework that relies only on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; (b) a pipeline framework that decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and successfully outperforms the pipeline methods; (b) the chain-of-thought (COT) prompting method \cite{Wei2022ChainOT} can improve the performance of the fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipelined method based on encoder-based pretrained models can perform comparably with LLMs while having a model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at \url{

Computation and Language

What problem does this paper attempt to address?

This paper addresses the automatic extraction of medical decision trees (MDTs) from medical texts. Current methods for constructing MDTs depend heavily on time-consuming and laborious manual annotation, which is inefficient and makes it hard to incorporate the latest research results in a timely manner. These problems impede the construction, dissemination, and maintenance of large-scale clinical decision support systems (CDSS). The paper therefore proposes a new task, Text2MDT, which explores how to automatically extract MDTs from medical texts such as medical guidelines and textbooks in order to improve the efficiency and accuracy of this process. To achieve this goal, the authors carried out the following work:

1. **Define the task**: The goal of the Text2MDT task is defined as automatically generating medical decision trees from given medical texts.
2. **Construct the dataset**: An annotated Chinese dataset containing 500 text-to-decision-tree pairs was created with the participation of medical experts.
3. **Research on methods**: Two different methods for the Text2MDT task were explored:
   - **End-to-end framework**: Relies only on instruction tuning of large language models (LLMs) to generate all node information and tree structures.
   - **Pipeline framework**: Decomposes the Text2MDT task into three subtasks: triple extraction, node grouping, and tree assembly.
4. **Experiment and evaluation**: Systematic experiments on the constructed Text2MDT dataset showed that:
   - The end-to-end method based on LLMs (7B parameters or larger) showed promising results and outperformed the pipeline methods.
   - The chain-of-thought (COT) prompting method can further improve the performance of the fine-tuned LLMs on the Text2MDT test set.
   - The lightweight pipeline method based on encoder-based pretrained models can perform comparably with LLMs while having a model complexity two orders of magnitude smaller.

Through this work, the paper not only proposes a new task and methods, but also provides an open dataset and source code as a basis for future research.
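
The three pipeline subtasks named above (triple extraction, node grouping, tree assembly) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the `Node` class, the `"C"`/`"D"` role labels, the `(subject, relation, object)` triple schema, the binary yes/no branching, the preorder encoding, and the toy input. Triple extraction itself is not modeled; the sketch starts from triples that are already extracted and labeled with a node index.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical triple schema: (subject, relation, object).
Triple = Tuple[str, str, str]


@dataclass
class Node:
    role: str                      # "C" = condition node, "D" = decision node (assumed labels)
    triples: List[Triple]
    yes: Optional["Node"] = None   # branch taken when the condition holds
    no: Optional["Node"] = None    # branch taken when it does not


def group_nodes(labeled: List[Tuple[int, str, Triple]]) -> List[Node]:
    """Node grouping: collect triples that share a node index into one node."""
    grouped: Dict[int, Node] = {}
    for idx, role, triple in labeled:
        node = grouped.setdefault(idx, Node(role=role, triples=[]))
        node.triples.append(triple)
    return [grouped[i] for i in sorted(grouped)]


def assemble_tree(nodes: List[Node]) -> Node:
    """Tree assembly: rebuild a binary tree from nodes given in preorder.

    Assumes every condition node has exactly two children and every
    decision node is a leaf.
    """
    it = iter(nodes)

    def build() -> Node:
        node = next(it, None)
        if node is None:
            raise ValueError("preorder sequence ended before the tree was complete")
        if node.role == "C":
            node.yes = build()
            node.no = build()
        return node

    root = build()
    if next(it, None) is not None:
        raise ValueError("extra nodes left over after the tree was complete")
    return root


# Toy, made-up input standing in for the output of a triple extractor.
labeled = [
    (0, "C", ("patient", "symptom", "fever")),
    (1, "D", ("patient", "treatment", "antipyretic")),
    (2, "D", ("patient", "treatment", "observation")),
]
root = assemble_tree(group_nodes(labeled))
```

Here `root` is the condition node, `root.yes` holds the decision taken when the condition is met, and `root.no` holds the alternative. A preorder encoding is used because it lets a flat node sequence be turned back into a tree with a single recursive pass.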

Extracting Decision Trees from Medical Texts: an Overview of the Text2DT Track in CHIP2022

MedDM:LLM-executable clinical guidance tree for clinical decision-making

An Improved Double Channel Long Short-Term Memory Model for Medical Text Classification

KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification

ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information

Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models

Combining the External Medical Knowledge Graph Embedding to Improve the Performance of Syndrome Differentiation Model

Generation of guideline-based clinical decision trees in oncology using large language models

Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models

Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification

mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Large Language Model in Medical Information Extraction from Titles and Abstracts with Prompt Engineering Strategies: A Comparative Study of GPT-3.5 and GPT-4

Research on Medical Text Parsing Method Based on BiGRU-BiLSTM Multi-Task Learning

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction

TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine

Constructing Multiple Domain Taxonomy for Text Processing Tasks

TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models