The Cultivated Practices of Text-to-Image Generation

Jonas Oppenlaender
DOI: https://doi.org/10.1007/978-3-031-66528-8
2024-09-02
Abstract: Humankind is entering a novel creative era in which anybody can synthesize digital information using generative artificial intelligence (AI). Text-to-image generation, in particular, has become vastly popular and millions of practitioners produce AI-generated images and AI art online. This chapter first gives an overview of the key developments that enabled a healthy co-creative online ecosystem around text-to-image generation to rapidly emerge, followed by a high-level description of key elements in this ecosystem. A particular focus is placed on prompt engineering, a creative practice that has been embraced by the AI art community. It is then argued that the emerging co-creative ecosystem constitutes an intelligent system on its own - a system that both supports human creativity, but also potentially entraps future generations and limits future development efforts in AI. The chapter discusses the potential risks and dangers of cultivating this co-creative ecosystem, such as the bias inherent in today's training data, potential quality degradation in future image generation systems due to synthetic data becoming commonplace, and the potential long-term effects of text-to-image generation on people's imagination, ambitions, and development.
Computers and Society, Artificial Intelligence
What problem does this paper attempt to address?
This paper explores the development of text-to-image generation as an emerging technology and its impact on society and culture. Specifically, it focuses on the following aspects:
1. **Technological Development**: The paper first outlines the key technological advancements that enabled the rapid development of text-to-image generation, including Generative Adversarial Networks (GANs) and Diffusion Models.
2. **Ecosystem**: The paper describes the creative online ecosystem formed around text-to-image generation, including communities, learning resources, and tool services. It particularly emphasizes the creative practice of Prompt Engineering, which involves guiding the generation of images in specific styles through carefully designed text prompts.
3. **Potential Risks and Challenges**: The paper discusses the potential risks and dangers of fostering this collaborative creative ecosystem, mainly including:
   - **Data Bias**: Current training data may be biased toward Western perspectives, which could give generated images specific cultural inclinations.
   - **Quality Degradation**: As synthetic data becomes widespread, the image quality of future generation systems may decline.
   - **Long-term Impact**: Text-to-image generation may have long-term effects on people's creativity, imagination, and development.
   - **Privacy Issues**: Diffusion models may memorize and replicate instances from the training data, leading to privacy leaks and copyright issues.
   - **Socioeconomic Impact**: This technology may threaten certain professions, such as illustrators, designers, and artists.
4. **Ethical and Legal Issues**: The paper also explores the legal and ethical controversies surrounding text-to-image generation, such as the legality of using web-scraped data for training and whether the generated content infringes on copyrights.
Overall, this paper aims to comprehensively analyze the current state of development, ecosystem, and potential social impacts of text-to-image generation technology, providing a reference for future research and applications.