Abstract: Text-to-video generation has made significant strides, but replicating the capabilities of advanced systems like OpenAI's Sora remains challenging due to their closed-source nature. Existing open-source methods struggle to achieve comparable performance, often hindered by ineffective agent collaboration and inadequate training data quality. In this paper, we introduce Mora, a novel multi-agent framework that leverages existing open-source modules to replicate Sora's functionalities. We address these fundamental limitations by proposing three key techniques: (1) multi-agent fine-tuning with a self-modulation factor to enhance inter-agent coordination, (2) a data-free training strategy that uses large models to synthesize training data, and (3) a human-in-the-loop mechanism combined with multimodal large language models for data filtering to ensure high-quality training datasets. Our comprehensive experiments on six video generation tasks demonstrate that Mora achieves performance comparable to Sora on VBench, outperforming existing open-source methods across various tasks. Specifically, in the text-to-video generation task, Mora achieved a Video Quality score of 0.800, surpassing Sora's 0.797 and outperforming all other baseline models across six key metrics. Additionally, in the image-to-video generation task, Mora achieved a perfect Dynamic Degree score of 1.00, demonstrating exceptional capability in enhancing motion realism and achieving higher Imaging Quality than Sora. These results highlight the potential of collaborative multi-agent systems and human-in-the-loop mechanisms in advancing text-to-video generation. Our code is available at https://github.com/lichao-sun/Mora.

From text to video with AI: the rise and potential of Sora in education and libraries

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

How OpenAI's text-to-video tool Sora could change science – and society

"Sora is Incredible and Scary": Emerging Governance Challenges of Text-to-Video Generative AI Models

Analysing the Public Discourse around OpenAI's Text-To-Video Model 'Sora' using Topic Modeling

Sora OpenAI's Prelude: Social Media Perspectives on Sora OpenAI and the Future of AI Video Generation

From Sora What We Can See: A Survey of Text-to-Video Generation

Text-to-video generative artificial intelligence: sora in neurosurgery

Concerns with OpenAI's Sora in Medicine

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Factors Influencing User Willingness To Use SORA

AI & robotics briefing: How AI images and videos could change science

An Overview of OpenAI's Sora and Its Potential for Physics Engine Free Games and Virtual Reality

When Does Sora Show: The Beginning of TAO to Imaginative Intelligence and Scenarios Engineering

Hey Siri, tell me a story: Digital storytelling and AI authorship

What Matters in Detecting AI-Generated Videos like Sora?

NarrationBot and InfoBot: A Hybrid System for Automated Video Description

SARD: A Human-AI Collaborative Story Generation

Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI

Short Videos on Social Media as Catalysts for English Language Learning Beyond the Classroom

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework