Abstract: This paper explores articles hosted on the arXiv preprint server with the aim of uncovering valuable insights hidden in this vast collection of research. Employing text mining techniques and natural language processing methods, we examine the contents of quantitative finance papers posted on arXiv from 1997 to 2022. We extract and analyze crucial information from the entire documents, including the references, to understand topic trends over time and to identify the most cited researchers and journals in this domain. Additionally, we compare numerous algorithms for topic modeling, including state-of-the-art approaches.

What problem does this paper attempt to address?

The paper applies text mining to the quantitative finance literature on the arXiv preprint server in order to surface information and trends that are not visible from individual documents. It has two objectives:

1. **Topic Trend Analysis**: Using natural language processing methods, the paper performs topic modeling on quantitative finance papers posted on arXiv from 1997 to 2022 to identify the research topics and how they evolved over those years. Several clustering algorithms are evaluated, and the best-performing one is used to group the papers into 30 topics, which shows which research directions were popular in different periods.
2. **Key Authors and Journals Identification**: The paper also identifies the most cited authors and journals in quantitative finance. This is done with data mining techniques applied to the papers' references, so the analysis does not require anyone to read the papers manually.

To meet these objectives, the authors first collected approximately 16,000 quantitative finance papers from arXiv and preprocessed them in detail, including text cleaning, lemmatization, and other steps. They then compared the performance of different topic modeling approaches (such as K-means, LDA, Word2Vec, Doc2Vec, Top2Vec, and BERTopic) and selected the most effective one for the topic analysis. By mining the full text and the references, the study reveals the main research trends, key contributors, and important publication venues in quantitative finance, which can help orient future work in the field.
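As a rough illustration of the preprocess-then-cluster idea described above, the sketch below cleans a few titles, builds TF-IDF vectors, and groups them with plain K-means using only the Python standard library. It is not the authors' code: the toy corpus, the stopword list, and all function names are hypothetical, lemmatization is omitted, and it uses 2 clusters rather than the 30 topics mentioned in the summary.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for paper texts; not data from the study.
DOCS = [
    "Option pricing under stochastic volatility models",
    "Stochastic volatility and option pricing with jumps",
    "Portfolio optimization with transaction costs",
    "Optimal portfolio selection under transaction costs",
]
STOPWORDS = {"and", "under", "with", "the", "of", "a"}  # assumed, minimal list


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no lemmatization)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors as dicts mapping term -> weight."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in set(u) | set(v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, init_indices, max_iter=20):
    """Lloyd's K-means; the caller picks the initial centroids, so runs are deterministic."""
    centroids = [dict(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels


labels = kmeans(tfidf_vectors(DOCS), init_indices=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

On this toy input the two option-pricing titles fall into one cluster and the two portfolio titles into the other. On a real corpus one would more likely rely on an established library and compare several models, as the summary describes, rather than hand-rolled code like this.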

Text mining arXiv: a look through quantitative finance papers

Financial Text Mining in Twitterland

Comprehensive review of text-mining applications in finance

Finding Trends in Software Research

Exploring Interdisciplinarity of Science Projects Based on the Text Mining

A Worldwide Assessment of Quantitative Finance Research through Bibliometric Analysis

Deep Learning based Topic Analysis on Financial Emerging Event Tweets

Text mining for market prediction: A systematic review

Predict financial text sentiment: an empirical examination

Explainable artificial intelligence in finance: A bibliometric review

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

Revealing Research Themes and Trends in 30 Top‐ranking Accounting Journals: A Text‐mining Approach

Textual analysis and machine leaning: Crack unstructured data in finance and accounting

Finance Research over 40 Years: What Can We Learn from Machine Learning?

From Text Representation to Financial Market Prediction: A Literature Review

On the Use of ArXiv as a Dataset

Evolution of Financial Studies over Forty Years: What Can We Learn from Machine Learning?

Machine Learning for Quantitative Finance Applications: A Survey

Artificial intelligence in Finance: a comprehensive review through bibliometric and content analysis

Text Mining in Big Data Analytics

Trends and gaps in biodiversity and ecosystem services research: A text mining approach