Abstract:

Background: Medical texts pose significant domain-specific challenges, and manually curating them is time-consuming and labor-intensive. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, several text-processing toolkits exist and have greatly improved the efficiency of handling unstructured text. However, each of these toolkits focuses on a different aspect of text processing, and none of them offers text generation capabilities, leaving a significant gap in current offerings.

Objective: This study aims to describe the development and preliminary evaluation of Ascle, which offers biomedical researchers and clinical staff an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.

Methods: We fine-tuned 32 domain-specific language models and evaluated them on 27 established benchmarks. For the question-answering task, we also developed a retrieval-augmented generation (RAG) framework for large language models that incorporates a medical knowledge graph with ranking techniques to improve the reliability of generated answers. Finally, we conducted a physician validation to assess the quality of generated content beyond automated metrics.

Results: The fine-tuned models and the RAG framework consistently improved performance on text generation tasks. For example, the fine-tuned models improved the BLEU score on the machine translation task by 20.27. On the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of the generated answers yielded high scores for readability (4.95/5) and relevancy (4.43/5), but lower scores for accuracy (3.90/5) and completeness (3.31/5).

Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit for medical text generation. All code is publicly available in the Ascle GitHub repository, and all fine-tuned language models are available on Hugging Face.
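The knowledge-graph retrieval and ranking step described under Methods can be illustrated with a minimal Python sketch. This is not Ascle's implementation. The following are all hypothetical assumptions chosen for illustration:

- the toy knowledge graph;
- the entity-matching rule (substring lookup of each triple's head and tail in the question);
- the Jaccard token-overlap ranking;
- the helper names `Triple`, `retrieve_triples`, `rank_triples`, and `build_prompt`.

The assembled prompt would then be passed to a large language model, a step omitted here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (head, relation, tail) fact from a hypothetical medical knowledge graph."""
    head: str
    relation: str
    tail: str

    def as_text(self) -> str:
        return f"{self.head} {self.relation} {self.tail}"


# Hypothetical toy knowledge graph; a real system would query a curated medical KG.
KG = [
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("metformin", "may cause", "lactic acidosis"),
    Triple("insulin", "treats", "type 1 diabetes"),
    Triple("aspirin", "inhibits", "platelet aggregation"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, used only for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_triples(question: str, kg: list[Triple]) -> list[Triple]:
    """Candidate retrieval: keep triples whose head or tail string occurs in the question."""
    q = question.lower()
    return [t for t in kg if t.head in q or t.tail in q]


def rank_triples(question: str, triples: list[Triple], top_k: int = 3) -> list[Triple]:
    """Rank candidates by Jaccard overlap of question and triple tokens; keep the top_k."""
    q_tokens = tokenize(question)

    def score(t: Triple) -> float:
        t_tokens = tokenize(t.as_text())
        return len(q_tokens & t_tokens) / len(q_tokens | t_tokens)

    # sorted() is stable even with reverse=True, so ties keep their knowledge-graph order.
    return sorted(triples, key=score, reverse=True)[:top_k]


def build_prompt(question: str, facts: list[Triple]) -> str:
    """Prepend the ranked facts as context for the language model."""
    context = "\n".join(f"- {f.as_text()}" for f in facts)
    return (
        "Answer the question using the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Can metformin cause lactic acidosis?"
    candidates = retrieve_triples(question, KG)
    top_facts = rank_triples(question, candidates, top_k=1)
    print(build_prompt(question, top_facts))
```

Ranking before prompting keeps only the facts most related to the question, so the model's context stays short and focused. A production system would more likely use entity linking and an embedding-based or learned ranker than substring matching and token overlap.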

Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
