Abstract: This study compares the performance of three AI models, namely Finance Bidirectional Encoder Representations from Transformers (FinBERT), Generative Pre-trained Transformer 4 (GPT-4), and Logistic Regression, for sentiment analysis and stock index prediction using financial news and the NGX All-Share Index data label. By leveraging advanced natural language processing models like GPT-4 and FinBERT, alongside a traditional machine learning model, Logistic Regression, we aim to classify market sentiment, generate sentiment scores, and predict market price movements. This research highlights global AI advancements in stock markets, showcasing how state-of-the-art language models can contribute to understanding complex financial data. The models were assessed using accuracy, precision, recall, F1 score, and ROC AUC. Results indicate that Logistic Regression outperformed both the more computationally intensive FinBERT and the predefined approach of the versatile GPT-4, with an accuracy of 81.83% and a ROC AUC of 89.76%. The GPT-4 predefined approach exhibited a lower accuracy of 54.19% but demonstrated strong potential in handling complex data. FinBERT, while offering more sophisticated analysis, was resource-demanding and yielded moderate performance. Hyperparameter optimization using Optuna and cross-validation techniques ensured the robustness of the models. This study highlights the strengths and limitations of the practical applications of AI approaches in stock market prediction and presents Logistic Regression as the most efficient model for this task, with FinBERT and GPT-4 representing emerging tools with potential for future exploration and innovation in AI-driven financial analytics.
Machine Learning, Artificial Intelligence, Statistical Finance, Applications, Computation
### What problems does this paper attempt to solve?
This paper aims to solve the following problems:
1. **Improve the accuracy of stock market prediction**:
- Traditional statistical techniques struggle to capture complex patterns in stock data, especially when prices are influenced by external variables such as news and market sentiment. The researchers therefore aim to use financial news sentiment more effectively for stock market trend prediction by introducing advanced natural language processing (NLP) models, such as FinBERT and GPT-4, alongside a traditional machine learning model (logistic regression).
2. **Evaluate the performance of different AI models in financial sentiment analysis**:
- The paper compares the performance of three models (FinBERT, GPT-4, and logistic regression) on sentiment analysis and stock price prediction tasks. Specifically, the researchers want to understand how these models differ in classifying market sentiment, generating sentiment scores, and predicting market price changes.
3. **Explore the application potential of hybrid methods**:
- The research also explores combining classical models with advanced NLP models to find a more efficient and accurate method for stock market prediction. Such a combination would draw on both the computational efficiency of classical models and the strong text understanding of NLP models.
### Specific objectives
- **Verify hypotheses**:
- Hypothesis 1 (H1): Machine learning models (such as FinBERT, GPT-4, and logistic regression) can effectively classify financial news sentiment.
- Hypothesis 2 (H2): Domain-specific models (such as FinBERT and GPT-4) will outperform classical models (such as logistic regression) in capturing market sentiment due to their strong natural language understanding ability.
- Hypothesis 3 (H3): General-purpose language models (such as GPT-4) can achieve high precision in sentiment analysis, but may be inferior to fine-tuned domain-specific models (such as FinBERT) in tasks involving professional financial terms.
- **Provide a practical framework**:
- Provide a practical framework for financial analysts, investors, news platforms, data scientists, as well as organizations and AI researchers to help them better understand and apply these models for financial market prediction.
### Method overview
To achieve the above objectives, the researchers adopted the following methods:
- **Data collection and preprocessing**:
- Collect and preprocess news headline data from the Nairametrics and Proshare websites to ensure the accuracy and reliability of the data.
- **Model selection and training**:
- Use three models (FinBERT, GPT-4, and logistic regression) and preserve the temporal ordering of the data through time-series cross-validation (TSCV), so that each validation fold comes after the data used for training.
- **Performance evaluation**:
- Use metrics such as accuracy, precision, recall, F1 score, and ROC AUC to evaluate the performance of each model and determine the best one.
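As a rough illustration of the evaluation metrics listed above, the following pure-Python sketch computes accuracy, precision, recall, F1 score, and ROC AUC for a binary up/down label. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for this example; the summary does not say how the paper computed its metrics.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = index moved up, 0 = index moved down.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]        # model's predicted probability of "up"
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC AUC is computed from the raw scores, which is why the two kinds of metric can rank models differently.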
Through these methods, the researchers hope to provide more effective and scalable solutions for financial market prediction.
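The time-series cross-validation mentioned in the method overview can be sketched as an expanding-window split: every test fold lies strictly after the data used to train on it, so no future headlines leak into training. This is a minimal sketch under assumptions, not the authors' code; the function name and the fold-sizing rule are illustrative, and the sizing follows the same expanding-window idea as scikit-learn's `TimeSeriesSplit`.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for chronologically ordered samples.

    Each training set contains all samples before the test fold, so the
    training window grows with every split.
    """
    if n_splits < 1 or n_samples <= n_splits:
        raise ValueError("need n_splits >= 1 and n_samples > n_splits")
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(test_start))                         # everything earlier
        test_idx = list(range(test_start, test_start + test_size))  # the next block in time
        yield train_idx, test_idx


# Hypothetical example: 10 headlines sorted by publication date, 3 folds.
splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In practice, a model would be fitted on the rows indexed by `train_idx` and scored on those indexed by `test_idx` for each split, with the fold scores averaged afterwards.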