Abstract: In this work, we introduce a new dataset GenChaR for an image captioning task around stock charts. The task aims to read market sentiment directly from depicted charts and generate descriptions, hopefully to provide comprehensible and useful insights for stock trading. Impressed by the success of large language models (LLMs), the study decides to pioneer itself by exploring the capabilities of large vision-language models (LVLMs) on the proposed task. This paper outlines the objectives of the stock captioning task, the dataset we built, and automatic evaluation with some representative general-purpose LVLMs.

What problem does this paper attempt to address?

The paper addresses the following problem: how to automatically generate multi-sentence descriptions of stock charts that summarize market dynamics and predict trends, so that investors can understand the charts more easily and obtain useful insights for trading decisions. To this end, the researchers propose a new dataset, GenChaR, designed specifically for image captioning of stock charts.

### Problem Background

1. **Importance of Stock Charts**
   - Stock charts are important tools for technical analysis. Analysts predict price trends from wave patterns in the charts (for example, under Elliott Wave Theory).
   - Non-professionals often find it difficult to extract useful investment advice from the charts.
2. **Limitations of Existing Image Captioning**
   - Traditional image captioning tasks usually produce only one-sentence descriptions, which do not carry enough information to support effective financial decisions.
   - Existing image captioning datasets are concentrated in the general domain and do not include stock charts.

### Proposed Solutions

1. **Reformulation of the Image Captioning Task**
   - Given an annotated stock-chart image \(I\), the goal is to generate a multi-sentence description \(C\) that covers past price trends and predicted trends, and ideally includes trading advice.
   - The generated text should be accurate, informative, concise, and easy to understand, so that ordinary users and fast traders can follow it.
2. **Creation of the New Dataset GenChaR**
   - The dataset contains 1,972 chart-caption pairs, divided into training and test sets in an 8:2 ratio.
   - The charts are sourced from articles published by ElliottWave-Forecast, which use Elliott Wave Theory for chart analysis.
3. **Automated Evaluation**
   - Five recent general-purpose large vision-language models (LVLMs), GPT-4V, mPLUG-Owl2, LLaVA, InstructBLIP, and Gemini, are evaluated in a zero-shot setting.
   - Evaluation metrics include BLEU, CIDEr, METEOR, ROUGE, BERTScore, cosine similarity (COS F), sentiment consistency (SA), and intersection-over-union (IoU).

### Conclusion

By introducing the GenChaR dataset and a preliminary evaluation, this research shows both the potential and the challenges of stock chart captioning. Although some models perform impressively, limitations remain, such as sensitivity issues and deficiencies in evaluating long texts. Future research needs to explore more suitable models and evaluation methods.
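To make the 8:2 split and the intersection-over-union metric mentioned above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the summary does not say how the split was drawn or rounded, nor what units IoU is computed over, so the shuffled floor-based split, the word-set definition of IoU, and the names `split_pairs`, `token_iou`, and the placeholder file names are all hypothetical.

```python
import random


def split_pairs(pairs, train_ratio=0.8, seed=0):
    """Shuffle chart-caption pairs and split them into train and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the cut point is floored; the source only states "8:2".
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def token_iou(candidate, reference):
    """Intersection-over-union of the lowercase word sets of two captions.

    Assumption: IoU is taken over whitespace-separated words; the paper
    may define it over a different unit.
    """
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty captions are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical placeholder data: (image file name, caption) tuples.
pairs = [(f"chart_{i:04d}.png", f"caption {i}") for i in range(1972)]
train, test = split_pairs(pairs)
print(len(train), len(test))  # 1577 395

print(token_iou("price may rise", "price may fall"))  # 0.5
```

With 1,972 pairs, an exact 8:2 ratio is not an integer (1,577.6), so the real train and test sizes depend on the rounding rule the authors used; the sketch above simply floors.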

GenChaR: A Dataset for Stock Chart Captioning

AutoChart: A Dataset for Chart-to-Text Generation Task

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

LineCap: Line Charts for Data Visualization Captioning Models

Chart Understanding with Large Language Model

NWPU-Captions Dataset and MLCA-Net for Remote Sensing Image Captioning

AutoCaption: an Approach to Generate Natural Language Description from Visualization Automatically

MemeCap: A Dataset for Captioning and Interpreting Memes

CompCap: Improving Multimodal Large Language Models with Composite Captions

D-CNN: A New model for Generating Image Captions with Text Extraction Using Deep Learning for Visually Challenged Individuals

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Towards Generating and Evaluating Iconographic Image Captions of Artworks

Satellite Captioning: Large Language Models to Augment Labeling

Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

From Captions to Visual Concepts and Back

Improving Multimodal Datasets with Image Captioning

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

ArtCap: A Dataset for Image Captioning of Fine Art Paintings

Contextual Emotion Estimation from Image Captions

Visually-Aware Context Modeling for News Image Captioning