ChartEye: A Deep Learning Framework for Chart Information Extraction

Osama Mustafa, Muhammad Khizer Ali, Momina Moetesum, Imran Siddiqi
DOI: https://doi.org/10.1109/DICTA60407.2023.00082
2024-08-29
Abstract: The widespread use of charts and infographics as a means of data visualization in various domains has inspired recent research in automated chart understanding. However, information extraction from chart images is a complex multitasked process due to style variations and, as a consequence, it is challenging to design an end-to-end system. In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline. The proposed framework utilizes hierarchical vision transformers for the tasks of chart-type and text-role classification, and YOLOv7 for text detection. The detected text is then enhanced using Super Resolution Generative Adversarial Networks to improve the recognition output of the OCR. Experimental results on a benchmark dataset show that our proposed framework achieves excellent performance at every stage with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenge of automatically extracting information from chart images. Specifically, it aims to develop a deep-learning framework that efficiently and accurately completes the following key tasks:

1. **Chart-type Classification**: Identify the type of the input chart (such as bar chart, line chart, or scatter plot), because different chart types have different structures and semantics.
2. **Text Detection**: Locate all text elements in the chart (such as the title, axis labels, and legend).
3. **Text-role Classification**: Determine the role each text element plays in the chart (for example, X-axis label, Y-axis label, or legend title).
4. **Text Recognition**: Convert the detected text into a machine-readable string for further analysis.

The complexity of these tasks stems from the variation in chart styles, the diversity of layouts, and the differences in text size, font, and orientation. To address these challenges, the authors propose a deep-learning-based framework that uses hierarchical vision transformers for chart-type and text-role classification, YOLOv7 for text detection, and the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) to improve recognition of low-resolution text. Through this series of steps, the research aims to provide a general and efficient solution for extracting explicit information from various types of charts, thereby laying the foundation for subsequent implicit understanding. The experimental results show that the framework achieves excellent performance at each stage, with F1-scores of 0.97 for chart-type classification and 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
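The four stages described above can be sketched as a small pipeline skeleton. This is only an illustrative outline, not the authors' implementation: the function `extract_chart_info`, the data classes, and the five stage callables (`classify_chart`, `detect_text`, `classify_role`, `enhance`, `recognize`) are hypothetical names, and the assumption that role classification and enhancement operate on the image plus a bounding box is ours, not the paper's. In practice the callables would wrap the trained models (vision transformer, YOLOv7, super-resolution GAN, OCR engine).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Axis-aligned bounding box: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextElement:
    box: Box    # where the text was detected
    role: str   # e.g. "title", "x_axis_label" (label set is an assumption)
    text: str   # OCR output for the enhanced crop


@dataclass
class ChartInfo:
    chart_type: str
    elements: List[TextElement]


def extract_chart_info(
    image: Any,
    classify_chart: Callable[[Any], str],
    detect_text: Callable[[Any], List[Box]],
    classify_role: Callable[[Any, Box], str],
    enhance: Callable[[Any, Box], Any],
    recognize: Callable[[Any], str],
) -> ChartInfo:
    """Run the stages in order; every stage is injected as a callable."""
    # Stage 1: chart-type classification on the whole image.
    chart_type = classify_chart(image)

    elements: List[TextElement] = []
    # Stage 2: text detection yields one box per text region.
    for box in detect_text(image):
        # Stage 3: text-role classification for this region.
        role = classify_role(image, box)
        # Stage 4: super-resolve the region, then run OCR on the result.
        crop = enhance(image, box)
        elements.append(TextElement(box=box, role=role, text=recognize(crop)))

    return ChartInfo(chart_type=chart_type, elements=elements)
```

Passing the stages in as callables keeps the skeleton independent of any particular model library, so each stage can be swapped or stubbed out during testing.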