Abstract: Financial organisations such as brokers face a significant challenge in servicing the investment needs of thousands of their traders worldwide. This task is further compounded since individual traders will have their own risk appetite and investment goals. Traders may look to capture short-term trends in the market which last only seconds to minutes, or they may have longer-term views which last several days to months. To reduce the complexity of this task, client trades can be clustered. By examining such clusters, we would likely observe many traders following common patterns of investment, but how do these patterns vary through time? Knowledge regarding the temporal distributions of such clusters may help financial institutions manage the overall portfolio of risk that accumulates from underlying trader positions. This study contributes to the field by demonstrating that the distribution of clusters derived from the real-world trades of 20k Foreign Exchange (FX) traders (from 2015 to 2017) is described in accordance with Ewens' Sampling Distribution. Further, we show that the Aggregating Algorithm (AA), an on-line prediction with expert advice algorithm, can be applied to the aforementioned real-world data in order to improve the returns of portfolios of trader risk. However, we found that the AA 'struggles' when presented with too many trader "experts", especially when there are many trades with similar overall patterns. To help overcome this challenge, we have applied and compared the use of Statistically Validated Networks (SVN) with a hierarchical clustering approach on a subset of the data, demonstrating that both approaches can be used to significantly improve results of the AA in terms of profitability and smoothness of returns.
Statistical Finance; Artificial Intelligence; Computational Engineering, Finance, and Science; Machine Learning
### Problems Addressed by the Paper
The paper addresses the challenge that financial organizations (such as brokers) face in servicing the investment needs of thousands of traders worldwide. Each trader has their own risk appetite and investment goals, and their trading horizons range from a few seconds to several months. To reduce this complexity, client trades can be clustered. Knowing how these clusters are distributed over time can help financial institutions manage the overall portfolio of risk that accumulates from the underlying trader positions.
### Main Contributions
1. **Study of Clustering Distribution**:
- The authors show that the distribution of clusters derived from the real-world trades of 20,000 FX traders (2015 to 2017) is consistent with the Ewens sampling distribution.
- To obtain the clusters, they build Statistically Validated Networks (SVN) over a sliding window, linking traders whose trading times are synchronized, and then demonstrate that the Ewens sampling distribution fits the resulting cluster sizes well.
2. **Application of Prediction Algorithms**:
- The authors applied the Aggregating Algorithm (AA), an online prediction-with-expert-advice algorithm, to improve the returns of portfolios of trader risk.
- They found that the AA performs poorly when presented with too many trader "experts," especially when many of them follow similar overall patterns.
- To overcome this challenge, the authors applied and compared Statistically Validated Networks (SVN) and hierarchical clustering on a subset of the data, showing that both approaches significantly improve the AA's results in terms of profitability and smoothness of returns.
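
As a concrete reference for the first contribution, the standard Ewens sampling formula gives the probability of seeing a particular multiset of cluster sizes among n items, governed by a single parameter θ. The sketch below evaluates that textbook formula in log space. The function names `ewens_log_pmf` and `expected_num_clusters` are illustrative and do not come from the paper, and the conditional variant and θ estimation procedure used by the authors are not reproduced here.

```python
from collections import Counter
from math import exp, lgamma, log


def ewens_log_pmf(cluster_sizes, theta):
    """Log-probability of a partition under the standard Ewens sampling formula.

    cluster_sizes: list of positive integers, one entry per cluster.
    theta: positive concentration parameter.

    P(a_1..a_n) = n! / theta^(n) * prod_j theta^a_j / (j^a_j * a_j!)
    where a_j is the number of clusters of size j and
    theta^(n) = theta * (theta + 1) * ... * (theta + n - 1).
    """
    n = sum(cluster_sizes)
    counts = Counter(cluster_sizes)  # maps size j -> a_j
    # log n! minus the log of the rising factorial theta^(n)
    log_p = lgamma(n + 1) - (lgamma(theta + n) - lgamma(theta))
    for j, a_j in counts.items():
        log_p += a_j * log(theta) - a_j * log(j) - lgamma(a_j + 1)
    return log_p


def expected_num_clusters(n, theta):
    """Expected number of clusters among n items: sum_i theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


if __name__ == "__main__":
    # Hypothetical example: 6 traders split into clusters of sizes 3, 2 and 1.
    print(exp(ewens_log_pmf([3, 2, 1], theta=1.5)))
    print(expected_num_clusters(6, theta=1.5))
```

Larger θ shifts probability toward many small clusters, while small θ favours a few large ones, which is why a per-window estimate of θ is a compact summary of how fragmented trader behaviour is in that window.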
### Methods and Experiments
1. **Dataset Description**:
- The study uses the client trading data of a retail forex broker, covering the trading records of over 20,000 clients from 2015 to 2017.
- For each trade, the dataset records the investor's anonymized ID, the opening and closing times, the volume, the direction (buy or sell), and the currency pair traded.
2. **Experimental Protocol**:
- A sliding window was used to track the temporal evolution of clusters. Within each sample window, traders below a minimum activity threshold were filtered out, with thresholds of 100, 500, and 1000 trades considered.
- SVNs were constructed at several time resolutions (10, 15, 30, 60, 120, 180, 360, and 1440 minutes).
3. **SVN Clustering and Descriptive Statistics**:
- The Infomap clustering algorithm was used to classify traders, favored for its information-theoretic approach, scalability, high-quality clustering, flexibility, and statistical significance.
- Multiple relevant statistics were calculated to assess the impact of SVN at different time resolutions.
4. **Goodness-of-Fit Test**:
- The parameter θ was estimated for each sliding window, and the classic χ² test was used to evaluate the goodness of fit.
- Results showed that the conditional Ewens distribution is a good fit in most cases.
5. **Temporal Clustering Evolution and Consistent Group Identification**:
- The authors studied how clusters evolve over time and how to keep cluster labels consistent from one time frame to the next.
- A total consistency measure (closely related to the Jaccard index) was used to generate meaningful visualizations.
6. **Clustered Aggregating Algorithm (CAA)**:
- The evolving clusters were fed into the Aggregating Algorithm (AA), an online prediction-with-expert-advice model, so that clusters rather than individual traders act as experts, with the aim of improving predictions.
- Two decision rules were introduced: the mean rule and the weighted average rule.
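
To illustrate the idea behind the clustered approach, the following is a minimal sketch and not the authors' implementation. It assumes that the mean rule collapses each cluster into one aggregated expert whose return is the average of its members' returns, that the loss is the negative return, and that weights follow a plain exponentially weighted update of the kind the AA builds on. The AA's substitution function and the weighted average rule are not reproduced, and all names (`cluster_mean_returns`, `exp_weights_update`, `run_clustered_aggregation`, `eta`) are hypothetical.

```python
import math


def cluster_mean_returns(returns, clusters):
    """Assumed mean rule: one aggregated expert per cluster.

    returns: per-trader returns for a single time step.
    clusters: list of lists of trader indices.
    """
    return [sum(returns[i] for i in members) / len(members) for members in clusters]


def exp_weights_update(weights, losses, eta):
    """Exponentially weighted update: w_k proportional to w_k * exp(-eta * loss_k)."""
    unnormalised = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def run_clustered_aggregation(return_history, clusters, eta=1.0):
    """Run the sketch over a sequence of per-trader return vectors."""
    k = len(clusters)
    weights = [1.0 / k] * k  # uniform prior over cluster experts
    portfolio_returns = []
    for returns in return_history:
        cluster_returns = cluster_mean_returns(returns, clusters)
        # The learner commits to its weights before seeing this step's outcome.
        portfolio_returns.append(sum(w * r for w, r in zip(weights, cluster_returns)))
        # Assumption: loss is the negative of the cluster's return.
        losses = [-r for r in cluster_returns]
        weights = exp_weights_update(weights, losses, eta)
    return portfolio_returns, weights


if __name__ == "__main__":
    # Hypothetical data: traders 0 and 1 behave alike, trader 2 does the opposite.
    history = [[0.1, 0.1, -0.1], [0.1, 0.1, -0.1]]
    rets, final_weights = run_clustered_aggregation(history, clusters=[[0, 1], [2]])
    print(rets, final_weights)
```

Grouping near-identical traders first means the two similar traders in the example count as one expert rather than two, which is the intuition behind reducing the number of redundant experts presented to the AA.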
### Conclusion
By studying the temporal distribution of trader clusters, financial institutions can better manage the overall portfolio of risk. In addition, clustering traders before applying the AA makes its predictions more reliable and robust when the number of trader experts is large.