Abstract:
Background: While doctors should analyze a large amount of electronic medical record (EMR) data to conduct clinical research, the analyzing process requires information technology (IT) skills, which is difficult for most doctors in China.
Methods: In this paper, we build a novel tool, QAnalysis, where doctors enter their analytic requirements in their natural language and the tool returns charts and tables to the doctors. For a given question from a user, we first segment the sentence, and then we use a grammar parser to analyze the structure of the sentence. After linking the segments to concepts and predicates in knowledge graphs, we convert the question into a set of triples connected with different kinds of operators. These triples are converted to queries in Cypher, the query language for Neo4j. Finally, the query is executed on Neo4j, and the results, shown as tables and charts, are returned to the user.
Results: The tool supports the top 50 questions we gathered from two hospital departments with the Delphi method. We also gathered 161 questions from clinical research papers with statistical requirements on EMR data. Experimental results show that our tool can directly cover 78.20% of these statistical questions and the precision is as high as 96.36%. Such extension is easy to achieve with the help of knowledge-graph technology we have adopted. The recorded demo can be accessed from https://github.com/NLP-BigDataLab/QAnalysis-project.
Conclusion: Our tool shows great flexibility in processing different kinds of statistical questions, which provides a convenient way for doctors to get statistical results directly in natural language.
What problem does this paper attempt to address?
The paper addresses the following problem: clinical research requires doctors to analyze large amounts of electronic medical record (EMR) data, but that analysis demands information technology (IT) skills that most doctors lack, especially in China. To solve this, the authors built a tool, QAnalysis, that lets doctors state their analysis requirements in natural language and returns the query results as charts and tables.
Specifically, the paper addresses the following problems:
1. **Doctors' lack of IT skills**:
- Doctors usually do not have the ability to write Structured Query Language (SQL), so it is difficult to directly extract the required information from EMR data.
- With QAnalysis, doctors only need to ask questions in natural language; the tool automatically converts the questions into executable queries and returns easy-to-understand results.
2. **Handling complex statistical analysis problems**:
- Many clinical research questions are not simple fact-finding queries but involve statistical analysis, such as ratios, maximum values, and averages.
- The QAnalysis tool supports multiple statistical operations, including listing, counting, aggregation (summing/averaging), distribution, and ratios, and can handle complex questions containing logical operators (such as NOT, AND, OR).
3. **Understanding and parsing of medical terms**:
- Clinical research questions inevitably contain medical terms, such as disease names, examination items, and drug names.
- The tool draws on an existing Chinese clinical terminology knowledge graph so that medical terms are understood and parsed correctly.
4. **Requirements for high precision**:
- In the medical field, question-answering systems must be highly precise; the lower precision that may be acceptable in general-domain systems cannot be tolerated.
- QAnalysis achieves high precision by combining context-free grammar (CFG) with dependency parsing, and uses the patient knowledge graph for joint disambiguation.
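
To make the conversion step concrete, here is a minimal Python sketch of how a set of triples joined by AND/NOT, plus one statistical operator, might be turned into a Cypher query. It is not the authors' implementation: the `Triple` class, the `SCHEMA` mapping, the `to_cypher` function, and the node labels and relationship types (`Patient`, `Disease`, `Drug`, `HAS_DIAGNOSIS`, `TAKES`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One condition on a patient; negated=True models the NOT operator."""
    predicate: str
    obj: str
    negated: bool = False


# Hypothetical mapping from linked predicates to (relationship type, node label).
SCHEMA = {
    "diagnosed_with": ("HAS_DIAGNOSIS", "Disease"),
    "takes_drug": ("TAKES", "Drug"),
}


def to_cypher(triples, operator, prop=None):
    """Return (query, params): all triples are AND-ed, then one operator is applied."""
    conditions, params = [], {}
    for i, t in enumerate(triples):
        rel, label = SCHEMA[t.predicate]
        key = f"v{i}"
        params[key] = t.obj  # values are passed as parameters, not inlined
        pattern = f"(p)-[:{rel}]->(:{label} {{name: ${key}}})"
        conditions.append(f"NOT {pattern}" if t.negated else pattern)

    if operator == "list":
        ret = "RETURN p"
    elif operator == "count":
        ret = "RETURN count(p) AS n"
    elif operator == "avg":
        if prop is None:
            raise ValueError("avg needs a numeric patient property, e.g. 'age'")
        ret = f"RETURN avg(p.{prop}) AS mean"
    else:
        raise ValueError(f"unsupported operator: {operator}")

    query = "MATCH (p:Patient) WHERE " + " AND ".join(conditions) + " " + ret
    return query, params


# "How many patients with diabetes are not taking insulin?"
query, params = to_cypher(
    [Triple("diagnosed_with", "diabetes"),
     Triple("takes_drug", "insulin", negated=True)],
    "count",
)
print(query)
print(params)
```

The sketch covers only listing, counting, and averaging over AND/NOT conditions; OR, distributions, and ratios would need further query templates. Passing values as query parameters rather than inlining them keeps user-supplied terms out of the query text.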
In summary, this paper aims to develop a tool that can help doctors conveniently conduct complex EMR data analysis in natural language, thereby improving the efficiency and accuracy of clinical research.