Psittacines of Innovation? Assessing the True Novelty of AI Creations

Anirban Mukherjee
2024-03-17
Abstract: We examine whether Artificial Intelligence (AI) systems generate truly novel ideas rather than merely regurgitating patterns learned during training. Utilizing a novel experimental design, we task an AI with generating project titles for hypothetical crowdfunding campaigns. We compare within AI-generated project titles, measuring repetition and complexity. We compare between the AI-generated titles and actual observed field data using an extension of maximum mean discrepancy--a metric derived from the application of kernel mean embeddings of statistical distributions to high-dimensional machine learning (large language) embedding vectors--yielding a structured analysis of AI output novelty. Results suggest that (1) the AI generates unique content even under increasing task complexity, and at the limits of its computational capabilities, (2) the generated content has face validity, being consistent with both inputs to other generative AI and in qualitative comparison to field data, and (3) exhibits divergence from field data, mitigating concerns relating to intellectual property rights. We discuss implications for copyright and trademark law.
Artificial Intelligence, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
This paper examines whether artificial intelligence (AI) systems can generate truly novel ideas rather than simply repeating patterns learned during training. Specifically, the author uses a new experimental design to evaluate whether AI-generated content is unique and original.

#### Main problems:

1. **Can AI generate truly novel ideas?** The author tests whether an AI can go beyond replicating learned patterns and produce genuinely new content when generating ideas.
2. **How unique and complex is AI-generated content?** The AI-generated project titles are compared with one another, measuring repetition and complexity to assess their novelty.
3. **How does AI-generated content differ from actual data?** The AI-generated project titles are compared with real-world crowdfunding project titles, using an extension of Maximum Mean Discrepancy (MMD) to quantify how far the AI output diverges from field data.

#### Methodology:

- **Experimental design**: An AI is tasked with generating titles for hypothetical crowdfunding campaigns, and these titles are compared with the titles of real-world crowdfunding projects.
- **Maximum Mean Discrepancy (MMD)**: A metric based on kernel mean embeddings (KME) that measures the difference between two distributions. Here it is applied to the embedding vectors of the titles. The standard squared MMD is:

\[ \text{MMD}^2(P, Q)=\mathbb{E}_{x, x' \sim P}[k(x, x')]+\mathbb{E}_{y, y' \sim Q}[k(y, y')]-2 \mathbb{E}_{x \sim P, y \sim Q}[k(x, y)] \]

where \(k(\cdot, \cdot)\) is the kernel function, and \(P\) and \(Q\) are the distribution of the AI-generated data and the distribution of the actual data, respectively.

#### Results:

- The AI generates unique content even as task complexity increases and at the limits of its computational capabilities.
- The generated content has face validity: it is consistent with inputs to other generative AI and with field data in qualitative comparison.
- The generated content diverges from field data, which mitigates concerns about intellectual property rights.

#### Significance:

This research helps clarify what AI can do in idea generation and has implications for copyright and trademark law. It challenges the view that AI can only repeat existing patterns and provides evidence that AI can generate novel ideas.
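
To make the MMD comparison from the methodology concrete, here is a minimal sketch of a squared-MMD estimate between two samples of embedding vectors. It is an illustration under stated assumptions, not the paper's method: the paper uses an extension of MMD on large language model embeddings, whereas this sketch uses the standard biased (V-statistic) estimator, a Gaussian (RBF) kernel with a hypothetical `bandwidth` parameter, and made-up two-dimensional vectors in place of real embeddings. The names `rbf_kernel`, `mmd_squared`, `ai_like`, and `field_like` are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    The kernel choice and bandwidth are assumptions for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(
    xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0
) -> float:
    """Biased (V-statistic) estimate of MMD^2 between samples xs ~ P and ys ~ Q.

    Each expectation in the formula is replaced by an average of kernel
    values over all pairs, including each point paired with itself.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy stand-ins for title embeddings (invented values, not data from the paper).
ai_like = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2]]
field_like = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]

same_sample = mmd_squared(ai_like, ai_like)        # identical samples
different_sample = mmd_squared(ai_like, field_like)  # well-separated samples
```

With identical samples the three terms cancel, so the estimate is zero. It grows as the two samples separate, which is the sense in which a larger value indicates that generated titles diverge from field data. The biased estimator is chosen here only for brevity. An unbiased variant would exclude the self-pairs from the first two averages.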