A Fused Large Language Model for Predicting Startup Success

Abdurahman Maarouf, Stefan Feuerriegel, Nicolas Pröllochs
2024-09-06
Abstract: Investors are continuously seeking profitable investment opportunities in startups and, hence, for effective decision-making, need to predict a startup's probability of success. Nowadays, investors can use not only various fundamental information about a startup (e.g., the age of the startup, the number of founders, and the business sector) but also textual description of a startup's innovation and business model, which is widely available through online venture capital (VC) platforms such as Crunchbase. To support the decision-making of investors, we develop a machine learning approach with the aim of locating successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored, fused large language model to predict startup success. Thereby, we assess to what extent self-descriptions on VC platforms are predictive of startup success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with textual self-descriptions being responsible for a significant part of the predictive power. Our work provides a decision support tool for investors to find profitable investment opportunities.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The problem this paper addresses is how to use online profiles on venture capital (VC) platforms, especially textual self-descriptions, to predict the probability that a startup succeeds, and thereby give investors an effective decision-support tool. Specifically, the authors develop a machine learning approach built on a fused large language model that combines structured fundamental information with unstructured text descriptions to improve the accuracy of startup success prediction.

### Core issues of the paper

1. **Investors' needs**: Investors look for promising opportunities among high-risk startups, so they need an effective method to predict a startup's probability of success.
2. **Data sources**: Online VC platforms such as Crunchbase give investors access to a large amount of startup information, including structured data (such as the age of the startup, the number of founders, and the business sector) and unstructured data (such as the description of the company's innovation and business model).
3. **Limitations of existing methods**: Traditional prediction methods rely mainly on structured data and ignore the potential value of text descriptions. Some existing studies do use text descriptions, but they mostly adopt traditional representations (such as the bag-of-words model) and do not exploit deep learning and large language models.

### Goals of the paper

- Develop a machine learning method, built on a large language model, that processes structured data and unstructured text data together.
- Evaluate how much text descriptions contribute to predicting startup success, and verify the improvement over using structured data alone.
- Provide an automated tool that helps investors screen startups more efficiently, improving the investment portfolio and its return on investment.
### Method overview

The authors propose a machine learning framework that integrates a large language model (such as BERT). The steps are as follows:

1. **Data collection**: Obtain the online profiles of 20,172 startups from Crunchbase, including structured variables and text descriptions.
2. **Feature extraction**:
   - Structured variables (FV) are passed directly as input to the final classifier.
   - Textual self-descriptions (TSD) are first turned into document embedding vectors by BERT, which are then concatenated with the structured variables.
3. **Model construction**: Build a fused model that feeds the concatenated feature vectors into the final classifier, which predicts whether or not the startup is successful.
4. **Performance evaluation**: Measure the contribution of text descriptions by comparing the prediction performance of different models (structured data only, text descriptions only, and the fused model).

### Main findings

- Using structured variables alone, prediction performance is 72.00%.
- Adding text descriptions raises it to 74.33%, and the increase is statistically significant.
- Adding text descriptions increases the portfolio return on investment by 40.61 percentage points.

In conclusion, by fusing a large language model with structured variables, the paper improves the prediction of startup success and gives investors a more effective decision-support tool.
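
The fusion step from the method overview (embed the text, concatenate it with the structured variables, feed the result to a final classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation. `embed_text` is a hypothetical offline stand-in for a BERT document embedding. The logistic-regression classifier is an assumption, since the summary only mentions a "final classifier". The four example startups are invented toy data.

```python
import math


def embed_text(text, dim=8):
    """Hypothetical stand-in for a BERT document embedding (NOT BERT itself).

    Buckets tokens into a fixed-size, L2-normalised vector so the sketch
    runs offline without downloading a model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(structured, text):
    """Concatenate structured variables (FV) with the text embedding (TSD)."""
    return list(structured) + embed_text(text)


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, x):
    """Return the predicted probability of success for one fused vector."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_classifier(rows, labels, epochs=500, lr=0.1):
    """Fit logistic regression with stochastic gradient descent.

    Assumed classifier for illustration; the source does not name one.
    """
    model = ([0.0] * len(rows[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            weights, bias = model
            err = predict_proba(model, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            model = (weights, bias - lr * err)
    return model


# Invented toy data: (age in years, number of founders), description, label.
startups = [
    ((2.0, 3.0), "AI platform for automated drug discovery", 1),
    ((5.0, 1.0), "Local bakery delivery service", 0),
    ((3.0, 2.0), "Cloud software for automated logistics", 1),
    ((6.0, 1.0), "Family restaurant with delivery", 0),
]

rows = [fuse(fv, text) for fv, text, _ in startups]
labels = [label for _, _, label in startups]
model = train_classifier(rows, labels)

new_startup = fuse((1.0, 2.0), "AI software for automated diagnostics")
print(f"Predicted success probability: {predict_proba(model, new_startup):.2f}")
```

To reproduce the comparison in the performance-evaluation step, the same classifier could be trained three times: on the structured variables only, on the text embeddings only, and on the fused vectors.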