Constructing long-short stock portfolio with a new listwise learn-to-rank algorithm

Xin Zhang, Lan Wu, Zhixue Chen
DOI: https://doi.org/10.48550/arXiv.2104.12484
2021-04-26
Abstract: Factor strategies have gained growing popularity in industry with the fast development of machine learning. Usually, multi-factors are fed to an algorithm for some cross-sectional return predictions, which are further used to construct a long-short portfolio. Instead of predicting the value of the stock return, emerging studies predict a ranked stock list using the mature learn-to-rank technology. In this study, we propose a new listwise learn-to-rank loss function which aims to emphasize both the top and the bottom of a rank list. Our loss function, motivated by the long-short strategy, is endogenously shift-invariant and can be viewed as a direct generalization of ListMLE. Under different transformation functions, our loss can lead to consistency with binary classification loss or permutation level 0-1 loss. A probabilistic explanation for our model is also given as a generalized Plackett-Luce model. Based on a dataset of 68 factors in China A-share market from 2006 to 2019, our empirical study has demonstrated the strength of our method which achieves an out-of-sample annual return of 38% with the Sharpe ratio being 2.
Portfolio Management
What problem does this paper attempt to address?
The paper asks how a new learning-to-rank algorithm can improve the construction of long-short stock portfolios. Specifically, it proposes a new listwise learning-to-rank loss function (ListFold) that emphasizes both the top and the bottom of the ranked list, matching the requirements of long-short trading strategies. Unlike traditional methods that predict the absolute returns of stocks, this method directly predicts the relative ranking of stocks, which is the quantity a long-short portfolio is built from.

### Background of the Paper

1. **Factor Strategies**:
   - Factor strategies are increasingly popular in industry, especially given the rapid development of machine learning. Multiple factors are fed into an algorithm to predict cross-sectional returns, and the predictions are then used to construct long-short portfolios.
   - Traditional factor strategies usually combine multiple factors into a stronger factor through methods such as linear regression and SVM.
2. **Learning-to-Rank**:
   - Learning-to-rank is a supervised machine-learning technique widely used in information retrieval (IR), for example in web search, news push, online shopping, and advertisement recommendation.
   - Its core is a scoring function that assigns a score to each document (or stock); the items are then ranked by score.
3. **ListMLE**:
   - ListMLE is a listwise learning-to-rank algorithm based on the Plackett-Luce model; it optimizes the ranking through maximum-likelihood estimation.
   - ListMLE is theoretically consistent with the permutation-level 0-1 loss and also performs well in practical applications.

### Contributions of the Paper

1. **New Learning-to-Rank Loss Function**:
   - A new listwise learning-to-rank loss function (ListFold) is proposed. It focuses on both the top and the bottom of the ranked list to meet the requirements of long-short trading strategies.
   - The loss is shift-invariant and, under different transformation functions, is consistent with either the binary-classification loss or the permutation-level 0-1 loss.
   - A probabilistic interpretation is given: the model can be viewed as a generalized Plackett-Luce model.
2. **Empirical Research**:
   - A detailed empirical study was carried out on China's A-share market, using data on 68 factors from 2006 to 2019.
   - The out-of-sample annualized return of the ListFold method reaches 38% with a Sharpe ratio of 2, which is significantly better than MLP, ListMLE, and List2MLE.

### Model Details

1. **ListFold Loss Function**:
   - For \(2n\) documents \(X_1,\ldots,X_{2n}\), an observed ranking \(y\), and a scoring function \(f\), write \(f_i\) for the score that \(f\) assigns to the item ranked \(i\)-th under \(y\). The probability is defined as follows:
     \[ P_c(y|X,f)=\prod_{i = 1}^n\frac{\psi(f_i - f_{2n+1 - i})}{\sum_{i\leq u\neq v\leq 2n+1 - i}\psi(f_u - f_v)} \]
   - The loss function is defined as the negative log-likelihood:
     \[ L_c(f,y,X)=-\log P_c(y|X,f)=-\sum_{i = 1}^n\left(\log\psi(f_i - f_{2n+1 - i})-\log\sum_{i\leq u\neq v\leq 2n+1 - i}\psi(f_u - f_v)\right) \]
2. **Theoretical Analysis**:
   - When the transformation function \(\psi\) is the sigmoid function, the ListFold loss is proved to be consistent with the binary-classification loss.
   - When the transformation function \(\psi\) is the exponential function, the ListFold loss is proved to recover the true permutation, given the correct segmentation into top and bottom.

### Empirical Results

- **Portfolio Performance**:
  - Based on the predicted rankings, two strategies were constructed: one is to go long on the top 10% of stocks and short on the bottom 10% of stocks.
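
To make the formula in Model Details concrete, here is a minimal pure-Python sketch of the negative log-likelihood \(L_c\) and of the top-10%/bottom-10% selection step. It is an illustration under stated assumptions, not the authors' implementation: the names `listfold_loss`, `sigmoid`, and `long_short_legs` are hypothetical, the scores are assumed to be already sorted by the observed ranking (so index 0 holds \(f_1\) and the last index holds \(f_{2n}\)), and the direct double loop is written for readability rather than speed.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Sigmoid transformation, one of the two psi choices mentioned above."""
    return 1.0 / (1.0 + math.exp(-x))


def listfold_loss(scores_in_rank_order: Sequence[float],
                  psi: Callable[[float], float] = math.exp) -> float:
    """Negative log-likelihood L_c for 2n scores sorted by the observed ranking.

    Assumption: scores_in_rank_order[0] is f_1 (top-ranked item) and
    scores_in_rank_order[-1] is f_{2n} (bottom-ranked item).
    """
    s = scores_in_rank_order
    m = len(s)
    if m == 0 or m % 2 != 0:
        raise ValueError("expected a non-empty, even number of scores (2n items)")

    loss = 0.0
    for i in range(m // 2):
        # Step i pairs the current top item with the current bottom item;
        # the items still in play occupy positions lo..hi (inclusive).
        lo, hi = i, m - 1 - i
        numerator = psi(s[lo] - s[hi])
        # Sum psi over all ordered pairs (u, v), u != v, among remaining items.
        denominator = sum(
            psi(s[u] - s[v])
            for u in range(lo, hi + 1)
            for v in range(lo, hi + 1)
            if u != v
        )
        loss -= math.log(numerator) - math.log(denominator)
    return loss


def long_short_legs(scores: Dict[str, float],
                    fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Return (long leg, short leg): the top and bottom `fraction` by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # at least one name per leg
    return ranked[:k], ranked[-k:]


if __name__ == "__main__":
    well_ordered = [2.0, 1.0, 0.0, -1.0]   # scores agree with the ranking
    reversed_order = [-1.0, 0.0, 1.0, 2.0]  # scores contradict the ranking
    print(listfold_loss(well_ordered), listfold_loss(reversed_order))
    print(listfold_loss(well_ordered, psi=sigmoid))
```

Because the loss depends on the scores only through the differences \(f_u - f_v\), adding a constant to every score leaves it unchanged, which is the shift invariance noted above. For two items with equal scores, both the exponential and the sigmoid choice of \(\psi\) give a loss of \(\log 2\).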