Sora Generates Videos with Stunning Geometrical Consistency

Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng
2024-02-27
Abstract: The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there is a lack of established metrics to evaluate its fidelity to real-world physics quantitatively. In this paper, we introduce a new benchmark that assesses the quality of the generated videos based on their adherence to real-world physics principles. We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality. From the perspective of 3D reconstruction, we use the fidelity of the geometric constraints satisfied by the constructed 3D models as a proxy to gauge the extent to which the generated videos conform to real-world physics rules. Project page:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Video Generation Quality Assessment**: Although the Sora model produces highly realistic and logically consistent videos, there is currently no quantitative method for assessing whether generated videos conform to real-world physical rules. The paper therefore introduces a new benchmark that evaluates video generation quality through 3D reconstruction.
2. **Geometric Consistency Measurement**: Traditional video quality metrics (such as Fréchet Inception Distance (FID) and Fréchet Video Distance (FVD)) mainly focus on frame-to-frame consistency, motion harmony, and text-to-video consistency, and do not cover the geometric quality of generated videos. The paper therefore proposes using the quality of 3D reconstruction as a measure of geometric consistency.
3. **Experimental Validation**: By collecting videos generated by Sora and comparing them with videos from other leading models (such as Gen2 and Pika), the paper shows that Sora has a significant advantage in 3D reconstruction, especially in maintaining geometric consistency.

In summary, this paper aims to address the lack of geometric consistency in existing evaluation methods for video generation models by introducing a new evaluation standard based on 3D reconstruction, and to demonstrate the superior performance of the Sora model under that standard.
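To make the idea of scoring geometric constraints concrete, here is a minimal sketch of one possible check. It is not the paper's method: this summary does not say which constraints the benchmark measures, so the epipolar constraint between two frames is used purely as an assumed example. The function names `sampson_error` and `inlier_ratio`, the fundamental matrix `F`, and the sample matches are all hypothetical. In practice, the matches and `F` would come from a feature matcher and a robust estimator run on frames of the generated video.

```python
# Illustrative sketch only. The epipolar constraint is an ASSUMED example of a
# geometric constraint; the summary does not specify the paper's actual metrics.


def _mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def sampson_error(F, p1, p2):
    """Sampson (first-order) approximation of the squared epipolar error.

    F  : 3x3 fundamental matrix relating frame 1 to frame 2.
    p1 : (x, y) point in frame 1.
    p2 : (x, y) matched point in frame 2.
    """
    x1 = [p1[0], p1[1], 1.0]  # homogeneous coordinates
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = _mat_vec(F, x1)
    F_t = [[F[j][i] for j in range(3)] for i in range(3)]  # transpose of F
    Ftx2 = _mat_vec(F_t, x2)
    residual = sum(x2[i] * Fx1[i] for i in range(3))  # x2^T F x1
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    if denom == 0.0:
        return float("inf")  # degenerate configuration, treat as an outlier
    return residual ** 2 / denom


def inlier_ratio(F, matches, threshold):
    """Fraction of matches whose Sampson error is below `threshold`."""
    if not matches:
        return 0.0
    inliers = sum(1 for p1, p2 in matches if sampson_error(F, p1, p2) < threshold)
    return inliers / len(matches)


if __name__ == "__main__":
    # Hypothetical setup: a purely horizontal camera translation, for which
    # matched points must share the same y coordinate in both frames.
    F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    matches = [
        ((0.1, 0.2), (0.3, 0.2)),  # consistent: same y in both frames
        ((0.0, 0.5), (0.4, 0.5)),  # consistent
        ((0.2, 0.1), (0.2, 0.4)),  # inconsistent: y changes between frames
    ]
    print(inlier_ratio(F, matches, threshold=1e-3))
```

Under this assumption, a video whose frames are geometrically consistent would yield a high inlier ratio across frame pairs, while geometric distortions between frames would push more matches above the error threshold.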