InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Y. Huang, Zun Wang, Jiashuo Yu, Yuanqing He, Hongjie Zhang, Tiecheng Lu, Yali Wang, Limin Wang, Yu Qiao
DOI: https://doi.org/10.48550/arxiv.2211.09529
2022-01-01
Abstract: In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, our video foundation model, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D surpasses both the baseline methods and the CVPR 2022 champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions