Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus

Silvia García-Méndez, Milagros Fernández-Gavilanes, Jonathan Juncal-Martínez, Francisco J. González-Castaño, Oscar Barba Seara

DOI: https://doi.org/10.1109/ACCESS.2020.2983584

2024-03-29

Abstract: Short texts are omnipresent in real-time news, social network commentaries, etc. Traditional text representation methods have been successfully applied to self-contained documents of medium size. However, the information in short texts is often insufficient (due, for example, to the use of mnemonics), which makes them hard to classify. Therefore, the particularities of specific domains must be exploited. In this article we describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions for personal finance management, a problem that was not previously considered in the literature. We trained and tested that system on a labelled dataset with real customer transactions that will be available to other researchers on request. Motivated by existing solutions in spam detection, we also propose a short text similarity detector, based on the Jaccard distance, to reduce the training set size. Experimental results with a two-stage classifier combining this detector with an SVM indicate high accuracy in comparison with alternative approaches, taking into account complexity and computing time. Finally, we present a use case with a personal finance application, CoinScrap, which is available on Google Play and the App Store.
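The abstract describes a Jaccard-distance similarity detector that shrinks the training set and, in a two-stage classifier, sits in front of the SVM. The sketch below is one possible reading of that idea, not the authors' implementation. Several choices are assumptions: the character-trigram representation, the 0.3 threshold, the greedy same-label deduplication in the hypothetical `reduce_training_set`, and the nearest-match lookup in the hypothetical `nearest_label`. The transaction strings are made up.

```python
from typing import List, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of a lower-cased, whitespace-normalized text (assumed representation)."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def reduce_training_set(samples: List[Tuple[str, str]], threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Hypothetical greedy reduction: keep a (description, label) pair only if no already kept
    pair with the same label lies within `threshold` Jaccard distance of it."""
    kept: List[Tuple[str, str, Set[str]]] = []
    for text, label in samples:
        grams = char_ngrams(text)
        if all(lbl != label or jaccard_distance(grams, g) > threshold for _, lbl, g in kept):
            kept.append((text, label, grams))
    return [(text, label) for text, label, _ in kept]


def nearest_label(text: str, reference: List[Tuple[str, str]], threshold: float = 0.3) -> Optional[str]:
    """Hypothetical stage 1: return the label of the closest reference description if it is
    within `threshold`; otherwise return None so that stage 2 (the SVM) decides."""
    grams = char_ngrams(text)
    best: Optional[Tuple[float, str]] = None
    for ref_text, label in reference:
        d = jaccard_distance(grams, char_ngrams(ref_text))
        if d <= threshold and (best is None or d < best[0]):
            best = (d, label)
    return best[1] if best is not None else None
```

Take the made-up rows `CARD PAYMENT MERCADONA 1234` and `CARD PAYMENT MERCADONA 1235` (groceries) together with `TRANSFER RENT MARCH` (housing). The `1235` row is dropped because its trigram distance to the `1234` row is 2/26. A new `CARD PAYMENT MERCADONA 9999` then resolves to groceries at stage 1. A description with no close match, such as `ATM WITHDRAWAL`, returns `None` and would be passed on to the SVM.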

Information Retrieval, Artificial Intelligence, Computational Engineering, Finance, and Science, Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses the problem of automatically classifying bank transaction descriptions (BT descriptions) for personal finance management. Specifically, the authors propose a new system that combines natural language processing (NLP) techniques with machine learning (ML) algorithms to classify the short text descriptions of bank transactions, a problem that had not been considered in the previous literature.

### Main Issues

1. **Insufficient Information**: Bank transaction descriptions are usually very short and contain limited information (for example, because of mnemonics), which makes effective classification difficult.
2. **Domain-Specific Characteristics**: The terms and vocabulary in bank transaction descriptions are domain-specific, so the classifier must exploit these characteristics.
3. **Real-Time Generation**: Bank transaction descriptions are generated in real time, so the classification method must be efficient enough to handle large volumes of data.

### Solutions

1. **Feature Extraction**: Represent short texts with features such as character and word n-grams.
2. **Support Vector Machine (SVM)**: Use an SVM trained on these features as the classifier.
3. **Similarity Detection**: Introduce a similarity detector based on the Jaccard distance that reduces the size of the training set. Combined with the SVM, it forms a two-stage classifier.

### Experimental Results

- Cross-validation over different training and test set splits showed high accuracy.
- Compared with alternative approaches, the system achieves high accuracy once complexity and computing time are taken into account.

### Application Case

- The system is used in CoinScrap, a personal finance management application available on Google Play and the App Store.
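The Solutions list names character and word n-gram features and an SVM classifier. The sketch below illustrates the idea in a self-contained way; it is not the authors' configuration. It builds binary character-trigram plus word unigram/bigram features and trains one linear SVM per category with the Pegasos stochastic sub-gradient method. Prediction returns the category with the highest score (one-vs-rest). Several elements are assumptions: the feature prefixes, the hypothetical `OneVsRestSVM` class, `lam`, `epochs`, the absence of a bias term, and the toy categories. A production system would more likely use a tested library implementation such as scikit-learn's `LinearSVC`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Sparse feature vector: feature name -> value (all features here are binary).
Features = Dict[str, float]


def ngram_features(text: str, char_n: int = 3, max_word_n: int = 2) -> Features:
    """Binary character n-grams plus word 1..max_word_n-grams (assumed feature set)."""
    norm = " ".join(text.lower().split())
    feats: Features = {}
    for i in range(len(norm) - char_n + 1):
        feats["c:" + norm[i:i + char_n]] = 1.0
    words = norm.split()
    for n in range(1, max_word_n + 1):
        for i in range(len(words) - n + 1):
            feats["w:" + " ".join(words[i:i + n])] = 1.0
    return feats


def dot(w: Dict[str, float], x: Features) -> float:
    """Sparse dot product; features missing from w count as zero weight."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(data: Sequence[Tuple[Features, int]], lam: float, epochs: int, seed: int = 0) -> Dict[str, float]:
    """Linear SVM via Pegasos: SGD on lam/2*||w||^2 + mean hinge loss, labels in {-1, +1}, no bias."""
    rng = random.Random(seed)
    w: Dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            t += 1
            x, y = data[idx]
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this sample
            for k in w:
                w[k] *= 1.0 - eta * lam  # regularization shrink
            if violated:
                for k, v in x.items():
                    w[k] += eta * y * v  # hinge sub-gradient step
    return dict(w)


class OneVsRestSVM:
    """Hypothetical wrapper: one binary SVM per category, predict the highest-scoring one."""

    def __init__(self, lam: float = 0.01, epochs: int = 50) -> None:
        self.lam = lam
        self.epochs = epochs
        self.weights: Dict[str, Dict[str, float]] = {}

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "OneVsRestSVM":
        feats: List[Features] = [ngram_features(t) for t in texts]
        for category in sorted(set(labels)):
            data = [(f, 1 if y == category else -1) for f, y in zip(feats, labels)]
            self.weights[category] = train_pegasos(data, self.lam, self.epochs)
        return self

    def predict(self, text: str) -> str:
        x = ngram_features(text)
        return max(self.weights, key=lambda c: dot(self.weights[c], x))
```

The model can be trained on six made-up descriptions across the categories groceries, housing, and income (for example `CARD PAYMENT MERCADONA` and `TRANSFER RENT FLAT`). It should then assign an unseen `CARD PAYMENT LIDL` to groceries, because that description shares most of its `card payment` n-grams with the grocery examples. Character n-grams matter here: they still overlap when merchant names or reference numbers vary, which word features alone would miss.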
### Summary

The paper proposes a novel system that effectively classifies bank transaction descriptions. By turning short, mnemonic-laden descriptions into categories, it supports personal finance management applications such as CoinScrap.

Related papers

Scalable and Weakly Supervised Bank Transaction Classification

Detection of Temporality at Discourse Level on Financial News by Combining Natural Language Processing and Machine Learning

Detection of financial opportunities in micro-blogging data with a stacked classification system

Predict financial text sentiment: an empirical examination

Detection of Abuse in Financial Transaction Descriptions Using Machine Learning

Identifying Financial Institutions by Transaction Signatures

Text Classification Using Hybrid Machine Learning Algorithms on Big Data

Genre Identification Of Chinese Finance Text Using Machine Learning Method

Deep learning enhancing banking services: a hybrid transaction classification and cash flow prediction approach

A Semi-Supervised Learning Financial News Classification Algorithm

"The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering

Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach

Evaluation of transformer models for financial targeted sentiment analysis in Spanish

Combining Benford's Law and machine learning to detect money laundering. An actual Spanish court case

Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation

Sentiment Classification for Financial Texts Based on Deep Learning

SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning

Sentiment Analysis of Short Texts Using SVMs and VSMs-Based Multiclass Semantic Classification

Textual analysis and machine leaning: Crack unstructured data in finance and accounting

Binary Classification of Customer’s Online Purchasing Behavior Using Machine Learning