Abstract: Sentiment analysis is a critical task that benefits various financial applications such as stock-price prediction, corporate credit rating, economic report analysis, and investment decision support. Researchers have used various methods to train pretrained language models (PLMs) for these tasks. Although most PLMs have achieved excellent performance, they can be further improved. In this study, we propose a new framework to strengthen numerical understanding, in particular for the FinBERT (Financial Bidirectional Encoder Representations from Transformers) model released in 2019, thus improving model performance in the task of sentiment analysis on financial news sentences. The method selects sentences containing numerical words from financial news articles, preferentially masks those words, and post-trains the PLM. To evaluate the proposed methodology quantitatively, we apply the same post-training to different financial language models and compare the performance before and after the application using Financial Phrasebank, a representative benchmark dataset for financial sentiment analysis. The experimental results show that the best performance is achieved when 50,000 sentences are used to post-train FinBERT, thus confirming the advantage of the proposed methodology for downstream tasks and highlighting the importance of using the correct amount of data. Additionally, we show that applying the proposed method to different language models improves the performance, particularly in low-resource environments with less training data. The findings of this study suggest that a PLM can be improved in aspects that it does not understand well, and that PLM performance can be improved by post-training with task- and domain-appropriate datasets, not only in finance but also in other domains.
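The selection and masking steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the definition of a "numerical word" (`is_numerical`, `NUMBER_WORDS`), the whitespace tokenization, the 15% default masking ratio, and all function names are assumptions made for illustration. A real pipeline would use the PLM's own tokenizer and feed the masked inputs to a masked-language-modeling objective.

```python
import random
import re

# Assumption: a token counts as "numerical" if it contains a digit or is
# one of a small set of number words. The paper's actual rule may differ.
NUMBER_WORDS = {
    "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "hundred", "thousand", "million", "billion", "percent",
}
DIGIT = re.compile(r"\d")


def is_numerical(token):
    """Return True if the token looks like a number or a number word."""
    return bool(DIGIT.search(token)) or token.lower().strip(".,;:()%") in NUMBER_WORDS


def select_numeric_sentences(sentences):
    """Keep only sentences that contain at least one numerical token."""
    return [s for s in sentences if any(is_numerical(t) for t in s.split())]


def mask_preferring_numbers(tokens, mask_ratio=0.15, mask_token="[MASK]", rng=None):
    """Mask about mask_ratio of the tokens, choosing numerical tokens first.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere.
    """
    rng = rng or random.Random()
    budget = max(1, round(len(tokens) * mask_ratio))
    numeric = [i for i, t in enumerate(tokens) if is_numerical(t)]
    other = [i for i, t in enumerate(tokens) if not is_numerical(t)]
    rng.shuffle(numeric)
    rng.shuffle(other)
    # Numerical positions are placed ahead of all others, so they use up
    # the masking budget before any non-numerical token is masked.
    chosen = set((numeric + other)[:budget])
    masked = [mask_token if i in chosen else t for i, t in enumerate(tokens)]
    labels = [t if i in chosen else None for i, t in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    news = [
        "Net sales rose 12% to EUR 4.5 million.",
        "The company declined to comment.",
    ]
    for sentence in select_numeric_sentences(news):
        masked, _ = mask_preferring_numbers(sentence.split(), mask_ratio=0.5)
        print(" ".join(masked))
```

In this sketch, numerical tokens are masked before any other token is considered, which is one simple way to realize "preferential" masking. A softer variant could instead assign numerical tokens a higher masking probability.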

FinBERT–MRC: Financial Named Entity Recognition Using BERT Under the Machine Reading Comprehension Paradigm

FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining

Improving Biomedical Named Entity Recognition with a Unified Multi-Task MRC Framework

MFF-CNER: A Multi-feature Fusion Model for Chinese Named Entity Recognition in Finance Securities

Biomedical named entity recognition using BERT in the machine reading comprehension framework

FiNER: Financial Numeric Entity Recognition for XBRL Tagging

German FinBERT: A German Pre-trained Language Model

Feeding What You Need by Understanding What You Learned

Enhancing Financial Sentiment Analysis Ability of Language Model via Targeted Numerical Change-Related Masking

IPerFEX-2023: Indonesian personal financial entity extraction using indoBERT-BiGRU-CRF model

A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts

Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech

Financial Sentiment Analysis on News and Reports Using Large Language Models and FinBERT

Evaluating Named Entity Recognition: A comparative analysis of mono- and multilingual transformer models on a novel Brazilian corporate earnings call transcripts dataset

CAT-BERT: A Context-Aware Transferable BERT Model for Multi-turn Machine Reading Comprehension

FinEntity: Entity-level Sentiment Classification for Financial Texts

SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News

MMBERT: a unified framework for biomedical named entity recognition

Chinese Named Entity Recognition Method for the Finance Domain Based on Enhanced Features and Pretrained Language Models

Financial sentiment analysis using FinBERT with application in predicting stock movement

FiNER-ORD: Financial Named Entity Recognition Open Research Dataset