Abstract: Poor data quality directly degrades the performance of any machine learning system built on that data. Transfer learning, a demonstrated effective approach to data quality improvement, has been widely used to improve machine learning quality. However, the "quality improvement" attributed to transfer learning has rarely been rigorously validated, and some reported improvements are misleading. This article first exposes a hidden quality problem in the datasets used to build a machine learning system for normalizing medical concepts in social media text. That system was claimed to achieve the best performance among existing work on a machine learning task. However, our experiments showed that this "best performance" resulted from the poor quality of the datasets and a defective validation process. To address the data quality issue and build a high-performance medical concept normalization system, we developed a transfer-learning-based strategy for data quality enhancement and system performance improvement. The experiments showed a strong correlation between the quality of the datasets and the performance of the machine learning system. They also demonstrated that a rigorous evaluation of data quality is necessary to guide the quality improvement of machine learning. We therefore propose a data quality evaluation framework that comprises quality criteria and their corresponding evaluation approaches. Machine learning researchers and practitioners can use the data validation process, the performance improvement strategy, and the data quality evaluation framework discussed in this article to build high-performance machine learning systems. The code and datasets used in this research are available on GitHub (https://github.com/haihua0913/dataEvaluationML).

Quality Evaluation of Public NLP Dataset

Statistical Dataset Evaluation: Reliability, Difficulty, and Validity

Statistical Dataset Evaluation: A Case Study on Named Entity Recognition

Problems and Countermeasures in Natural Language Processing Evaluation

Development and Evaluation of Task-Specific NLP Framework in China

Research on the Quantity Evaluation of Speech Datasets for Model Training

Evaluation of Chinese Natural Language Processing System Based on Metamorphic Testing

Analyzing Dataset Annotation Quality Management in the Wild

A survey on dataset quality in machine learning

Evaluating Open-QA Evaluation

WYWEB: A NLP Evaluation Benchmark For Classical Chinese

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study

Research on Quality Evaluation of Chinese Spatial Semantic Understanding Evaluation Dataset

Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs

Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding

Technical Evaluations in Natural Language Processing and Implications for TEM

Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory

Quality Assessment of Image Dataset for Autonomous Driving

Data Evaluation and Enhancement for Quality Improvement of Machine Learning

LongWanjuan: Towards Systematic Measurement for Long Text Quality