GuKang, a Reliable Dataset for Multi-source Chinese Orthopedic Rehabilitation

Zelin Song, Tao Xue, Pinjie Li, Ming Zhang, Tao Zhang
DOI: https://doi.org/10.1109/icac61394.2024.10718741
2024-01-01
Abstract: The treatment of orthopaedic diseases commonly faces challenges such as prolonged hospital stays, a shortage of physician resources, and high medical costs. Large language models offer a novel solution for remote home rehabilitation by simulating doctor-patient interactions, report interpretation, professional diagnosis, and prescription issuance. The development of these consultative models relies heavily on specialized knowledge datasets. To this end, our research team has constructed a medical consultation dataset named "GuKang" (GuKang Medical Consultation DataSet, GKMCD), which stands as the most comprehensive repository of orthopaedic rehabilitation medical consultation resources to date. It comprises four main components: a medical Q&A database, a medical knowledge base, knowledge graphs, and a medical knowledge question bank. The Q&A database contains 401,568 entries covering over 95% of orthopaedic diseases. To ensure the accuracy and professionalism of the data, we carried out thorough collection, cleaning, and filtering of information, followed by manual reviews and intelligent evaluations. Furthermore, the diversity of data sources not only provides the model with a wealth of learning resources but also enhances the model’s adaptability and flexibility in addressing real-world problems. This study establishes new research directions and practical foundations for the application of large language models in specialized medical fields.