Automating Venture Capital: Founder assessment using LLM-powered segmentation, feature engineering and automated labeling techniques

Ekin Ozince, Yiğit Ihlamur
2024-07-06
Abstract: This study explores the application of large language models (LLMs) in venture capital (VC) decision-making, focusing on predicting startup success based on founder characteristics. We utilize LLM prompting techniques, like chain-of-thought, to generate features from limited data, then extract insights through statistics and machine learning. Our results reveal potential relationships between certain founder characteristics and success, as well as demonstrate the effectiveness of these characteristics in prediction. This framework for integrating ML techniques and LLMs has vast potential for improving startup success prediction, with important implications for VC firms seeking to optimize their investment strategies.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how large language models (LLMs) can be used to predict the probability of startup success in the venture capital (VC) decision-making process. It approaches the problem through the following components:

1. **Feature Engineering**: Using LLMs to generate features from limited data, especially founders' background information, such as educational background and work experience.
2. **Founder Grading**: Introducing a finer-grained grading method that divides founders into 10 levels, to show how founders at each level perform in terms of entrepreneurial success.
3. **Persona Segmentation**: Creating multiple "personas" from founder characteristics to further segment the founder population.
4. **Boolean Variables**: Adding 23 Boolean variables that capture further information about the founders, such as whether this is their first startup or whether they hold a PhD.
5. **Machine Learning Models**: Using three models (linear regression, random forest, and XGBoost) to predict founder success and comparing their performance.

Through these methods, the paper aims to reveal potential relationships between certain founder characteristics and entrepreneurial success, and to demonstrate how effective these features are for prediction, thereby helping VC firms optimize their investment strategies.
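To make the feature types described above concrete, the following is a minimal sketch of how a founder grade, a persona label, and Boolean flags might be combined into one numeric feature vector for a downstream model. It is an illustration under stated assumptions, not the paper's implementation: the `Founder` class, the `to_feature_vector` function, the persona names, and the assumption that grade 10 is the highest level are all hypothetical, and only two of the 23 Boolean variables are shown.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical persona labels; the paper's actual personas are not listed in this summary.
PERSONAS = ["serial_entrepreneur", "academic", "corporate_executive"]


@dataclass
class Founder:
    grade: int           # founder level from 1 to 10 (direction of the scale is an assumption)
    persona: str         # one of PERSONAS (hypothetical labels)
    first_startup: bool  # example Boolean variable mentioned in the summary
    has_phd: bool        # example Boolean variable mentioned in the summary


def to_feature_vector(founder: Founder) -> List[float]:
    """Encode one founder as a flat list of floats."""
    if not 1 <= founder.grade <= 10:
        raise ValueError("grade must be between 1 and 10")
    if founder.persona not in PERSONAS:
        raise ValueError(f"unknown persona: {founder.persona}")

    # Scale the 10-level grade to the range [0, 1].
    grade_scaled = (founder.grade - 1) / 9
    # One-hot encode the persona segment.
    persona_one_hot = [1.0 if founder.persona == p else 0.0 for p in PERSONAS]
    # Boolean variables become 0.0 / 1.0 indicators.
    flags = [float(founder.first_startup), float(founder.has_phd)]

    return [grade_scaled, *persona_one_hot, *flags]


if __name__ == "__main__":
    example = Founder(grade=10, persona="academic", first_startup=True, has_phd=False)
    print(to_feature_vector(example))
```

Vectors built this way could then be passed to any of the three model families named above. One-hot encoding the persona keeps a linear model from reading an ordering into the segment labels, while tree-based models such as random forest or XGBoost would also accept the raw integer grade.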