Extracting Medical Knowledge from Crowdsourced Question Answering Website

Yaliang Li, Chaochun Liu, Nan Du, Wei Fan, Qi Li, Jing Gao, Chenwei Zhang, Hao Wu
DOI: https://doi.org/10.1109/tbdata.2016.2612236
2020-01-01
IEEE Transactions on Big Data
Abstract: Medical crowdsourced question answering (Q&A) websites have boomed in recent years, and an increasing number of patients and doctors are involved. The valuable information on these websites can benefit patients, doctors, and society. One key to unleashing the power of these Q&A websites is to extract medical knowledge from the noisy question-answer pairs and filter out unrelated or even incorrect information. Given the daunting scale of information generated on medical Q&A websites every day, it is unrealistic to accomplish this task with supervised methods because of the expensive annotation cost. In this paper, we propose a Medical Knowledge Extraction (MKE) system that can automatically provide high-quality knowledge triples extracted from the noisy question-answer pairs and, at the same time, estimate the expertise of the doctors who give answers on these Q&A websites. The MKE system is built upon a truth discovery framework, in which we jointly estimate the trustworthiness of answers and doctor expertise from the data without any supervision. We further tackle three unique challenges in the medical knowledge extraction task, namely representation of noisy input, multiple linked truths, and the long-tail phenomenon in the data. The MKE system is applied to real-world datasets crawled from xywy.com, one of the most popular medical crowdsourced Q&A websites. Both quantitative evaluation and case studies demonstrate that the proposed MKE system can successfully provide useful medical knowledge and accurate doctor expertise. We further demonstrate a real-world application, Ask A Doctor, which can automatically give patients suggestions in response to their questions.
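The joint estimation of answer trustworthiness and doctor expertise mentioned in the abstract can be illustrated with a generic truth discovery loop. The sketch below is not the MKE system: it assumes each question has exactly one true answer, it ignores the representation of noisy input and multiple linked truths, and it uses a simple smoothed agreement rate as the expertise score. The function name `truth_discovery`, the doctor identifiers, and the toy claims are all hypothetical.

```python
from collections import defaultdict


def truth_discovery(claims, n_iters=10):
    """Generic iterative truth discovery (illustrative only, not MKE).

    claims: list of (question, doctor, answer) tuples.
    Returns (truths, weights): the selected answer per question and an
    expertise score per doctor.
    """
    doctors = {d for _, d, _ in claims}
    weights = {d: 1.0 for d in doctors}  # start with equal expertise
    truths = {}
    for _ in range(n_iters):
        # Step 1: weighted vote. Each answer is scored by the summed
        # expertise of the doctors who gave it.
        scores = defaultdict(lambda: defaultdict(float))
        for q, d, a in claims:
            scores[q][a] += weights[d]
        # sorted() makes tie-breaking deterministic (alphabetical).
        new_truths = {
            q: max(sorted(cands), key=cands.get) for q, cands in scores.items()
        }
        # Step 2: expertise update. A doctor's score is the smoothed
        # fraction of answers that agree with the current truths.
        # The +1 / +2 smoothing (an assumption of this sketch) keeps
        # doctors with very few answers from getting extreme scores.
        agree, total = defaultdict(int), defaultdict(int)
        for q, d, a in claims:
            total[d] += 1
            agree[d] += int(new_truths[q] == a)
        weights = {d: (agree[d] + 1) / (total[d] + 2) for d in doctors}
        if new_truths == truths:  # converged: truths stopped changing
            break
        truths = new_truths
    return truths, weights


# Hypothetical toy data: (symptom question, doctor, suggested diagnosis).
claims = [
    ("fever+cough", "dr_a", "flu"),
    ("fever+cough", "dr_b", "flu"),
    ("fever+cough", "dr_c", "cold"),
    ("chest pain", "dr_a", "angina"),
    ("chest pain", "dr_c", "heartburn"),
    ("itchy rash", "dr_a", "eczema"),
    ("itchy rash", "dr_b", "eczema"),
    ("itchy rash", "dr_c", "measles"),
]
truths, weights = truth_discovery(claims)
```

In this toy run the "chest pain" question starts as a one-to-one tie, and the tie is resolved in favor of `dr_a` once the other two questions have raised that doctor's estimated expertise above `dr_c`'s. This is the basic intuition behind estimating truths and source reliability together without labels.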