Abstract: Investors are continuously seeking profitable investment opportunities in startups and, hence, for effective decision-making, need to predict a startup's probability of success. Nowadays, investors can use not only various fundamental information about a startup (e.g., the age of the startup, the number of founders, and the business sector) but also textual description of a startup's innovation and business model, which is widely available through online venture capital (VC) platforms such as Crunchbase. To support the decision-making of investors, we develop a machine learning approach with the aim of locating successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored, fused large language model to predict startup success. Thereby, we assess to what extent self-descriptions on VC platforms are predictive of startup success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with textual self-descriptions being responsible for a significant part of the predictive power. Our work provides a decision support tool for investors to find profitable investment opportunities.

What problem does this paper attempt to address?

The problem this paper addresses is how to use online profiles on venture capital (VC) platforms, especially textual self-descriptions, to predict the probability of startup success and thereby give investors an effective decision-support tool. Specifically, the authors develop a machine learning approach built on a fused large language model, which combines structured fundamental information with unstructured text descriptions to improve the accuracy of startup success prediction.

### Core issues of the paper

1. **Investors' needs**: Investors need to find promising investment opportunities among high-risk startups, so they need an effective method to predict a startup's probability of success.
2. **Data sources**: With the growth of online VC platforms (such as Crunchbase), investors can access a large amount of startup information, including structured data (such as the age of the startup, the number of founders, and the business sector) and unstructured data (such as the description of the company's innovation and business model).
3. **Limitations of existing methods**: Traditional prediction methods rely mainly on structured data and ignore the potential value of text descriptions. Some existing studies do use text descriptions, but they mostly adopt traditional representations (such as the bag-of-words model) and do not exploit the advantages of deep learning and large language models.

### Goals of the paper

- Develop a machine learning method built on large language models that can process structured data and unstructured text data together.
- Evaluate the contribution of text descriptions to predicting startup success and verify the improvement over using structured data alone.
- Provide an automated tool that helps investors screen startups more efficiently, thereby optimizing the investment portfolio and increasing the return on investment.
### Method overview

The authors propose a machine learning framework that integrates a large language model (such as BERT). The specific steps are as follows:

1. **Data collection**: Obtain the online profiles of 20,172 startups from the Crunchbase platform, including structured variables and text descriptions.
2. **Feature extraction**:
   - Structured variables (FV) are passed directly as input to the final classifier.
   - Text descriptions (TSD) are first converted into document embedding vectors with BERT and then concatenated with the structured variables.
3. **Model construction**: Build a fusion model that feeds the concatenated feature vectors into the final classifier, which predicts whether the startup is successful.
4. **Performance evaluation**: Assess the contribution of text descriptions by comparing the prediction performance of different models (structured data only, text descriptions only, and the fusion model).

### Main findings

- Using structured variables alone, the prediction performance is 72.00%.
- After adding text descriptions, the prediction performance increases to 74.33%, and this increase is statistically significant.
- Adding text descriptions increases the return on investment of the portfolio by 40.61 percentage points.

In conclusion, by introducing a machine learning method that integrates a large language model, this paper significantly improves the ability to predict startup success and provides investors with a more effective decision-support tool.
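The fusion step in the method overview (concatenating structured variables with a document embedding and passing the result to a final classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The function `embed_text` is a hypothetical placeholder that hashes tokens into a small vector so the example runs without downloading a language model; in the paper, this role is played by BERT document embeddings. The classifier is assumed here to be a plain logistic regression, and the four profiles, their feature scaling, and their labels are made up purely for illustration.

```python
import math
import zlib

EMBED_DIM = 8  # assumption: tiny for illustration; real BERT embeddings are much larger


def embed_text(text, dim=EMBED_DIM):
    """Hypothetical stand-in for a BERT document embedding.

    Hashes tokens into a fixed-size, L2-normalized vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(structured, text):
    """Concatenate structured variables (FV) with the text embedding (TSD)."""
    return list(structured) + embed_text(text)


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression classifier with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def log_loss(model, X, y):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(model, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


# Made-up toy profiles: ([age_years / 10, n_founders / 5], self-description, label)
profiles = [
    ([0.3, 0.4], "cloud platform for machine learning analytics", 1),
    ([0.5, 0.6], "machine learning platform for fraud analytics", 1),
    ([0.2, 0.2], "local shop selling handmade candles", 0),
    ([0.4, 0.2], "handmade furniture shop", 0),
]
y = [label for _, _, label in profiles]

# Structured-only baseline versus the fused model (in-sample, illustration only)
X_fv = [list(fv) for fv, _, _ in profiles]
X = [fuse(fv, text) for fv, text, _ in profiles]
fv_model = train_logistic(X_fv, y)
model = train_logistic(X, y)

print("FV-only log-loss:", log_loss(fv_model, X_fv, y))
print("Fused log-loss:  ", log_loss(model, X, y))
```

The two printed values are in-sample losses on four invented examples, so they say nothing about real predictive performance. A faithful comparison along the lines of the paper's step 4 would use held-out data and the actual Crunchbase variables.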

A Fused Large Language Model for Predicting Startup Success

Startup success prediction and VC portfolio simulation using CrunchBase data

Improving Startup Success with Text Analysis

Automating Venture Capital: Founder assessment using LLM-powered segmentation, feature engineering and automated labeling techniques

CapitalVX: A Machine Learning Model for Startup Selection and Exit Prediction

Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods

An Automated Startup Evaluation Pipeline: Startup Success Forecasting Framework (SSFF)

Graph Neural Network Based VC Investment Success Prediction

Exploring investor-business-market interplay for business success prediction

Using Deep Learning to Find the Next Unicorn: A Practical Synthesis

Web Based Platform for Startups and Investors to Connect and Predict Investment Returns Using Deep Learning

Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method

Application of Machine Learning Techniques to Predict Entrepreneurial Firm Valuation

Using Artificial Intelligence to Unlock Crowdfunding Success for Small Businesses

Pathways to success: a machine learning approach to predicting investor dynamics in equity and lending crowdfunding campaigns

Predicting Entrepreneurial Intention of Students: Kernel Extreme Learning Machine with Boosted Crow Search Algorithm

Harnessing Business and Media Insights with Large Language Models

Supervised learning for the prediction of firm dynamics

Risk-Hedged Venture Capital Investment Recommendation

Uncovering key predictors of high-growth firms via explainable machine learning

Funding Innovation and Risk: A Grey-Based Startup Investment Decision