Development of an Internet-based Product-related Child Injury Textual Data Platform (IPCITDP) in China

Wangxin Xiao, Peixia Cheng, David C Schwebel, Lei Yang, Min Zhao, Shuying Zhao, Guoqing Hu
DOI: https://doi.org/10.7189/jogh.14.04174
2024-08-23
Abstract:

Background: Internet-based media stories provide valuable information about emerging product-related child injury risks that can inform prevention and control, but critical methodological challenges and the high cost of data acquisition and processing restrict their practical use by stakeholders.

Methods: We constructed a data platform through literature reviews and multi-round research group discussions. Developed components included standard search strategies, filtering criteria, textual document classification, information extraction standards and a keyword dictionary. We used ten thousand manually labelled media stories to validate the textual document classification model, which was established using Bidirectional Encoder Representations from Transformers (BERT). Multiple information extraction methods based on natural language processing algorithms were adopted to extract data for 29 structured variables from media stories. These methods were evaluated through manual validation of 1000 media stories about product-related child injury. We mapped the geographic distribution of media sources and media-reported product-related child injury events.

Results: We developed an internet-based product-related child injury textual data platform, IPCITDP, for online media stories about product-related child injury in China. It consists of four layers: automatic data search and acquisition, data processing, data storage, and data application. Each layer operated daily. External validation demonstrated high performance for the BERT classification model we established (accuracy = 0.9703) and for the combined information extraction strategies (accuracy > 0.70 for 25 variables). As of 31 December 2023, IPCITDP had collected 35 275 eligible product-related child injury reports from 13 261 news media websites or social media platform accounts, which were geographically located across all 31 mainland Chinese provinces and covered over 97% of prefecture-level cities. The injury cases in IPCITDP were typically reported several months or years earlier than official announcements about the corresponding product-related child injury risks. Our data platform added data on 15 supplementary variables that the national product-related injury surveillance system lacks. Two examples, magnetic beads and electric self-balancing scooters, demonstrate the value of IPCITDP in supplementing existing data and providing early epidemiological detection of emerging signals concerning product-related child injury.

Conclusions: Our data platform provides injury data that can support early detection of new product-related child injury characteristics and supplement existing data sources to reduce the burden of product-related injury among Chinese children.
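
The per-variable accuracy reported for information extraction can be illustrated with a minimal sketch. This is not the authors' code: the function name, the variable names (`product`, `age_group`, `province`) and the toy records are hypothetical, and accuracy is assumed here to mean the share of manually validated stories in which the extracted value exactly matches the manual label.

```python
from collections import defaultdict


def per_variable_accuracy(extracted, manual):
    """Return {variable: share of stories whose extracted value equals the manual label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    # Each pair is one media story: automatic extraction vs. manual validation.
    for auto_row, gold_row in zip(extracted, manual):
        for variable, gold_value in gold_row.items():
            total[variable] += 1
            if auto_row.get(variable) == gold_value:
                correct[variable] += 1
    return {variable: correct[variable] / total[variable] for variable in total}


# Hypothetical toy records; the real validation set had 1000 stories and 29 variables.
extracted = [
    {"product": "magnetic beads", "age_group": "1-4", "province": "Hunan"},
    {"product": "scooter", "age_group": "5-9", "province": None},
]
manual = [
    {"product": "magnetic beads", "age_group": "1-4", "province": "Hunan"},
    {"product": "electric self-balancing scooter", "age_group": "5-9", "province": "Hubei"},
]

accuracy = per_variable_accuracy(extracted, manual)
# Variables that clear the 0.70 threshold mentioned in the abstract.
passing = sorted(v for v, a in accuracy.items() if a > 0.70)
print(accuracy, passing)
```

Exact matching is the strictest choice; free-text variables such as product names might instead be scored with a normalised or partial match, which would raise the measured accuracy.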