Abstract: Robust face clustering is a vital step in enabling computational understanding of visual character portrayal in media. Face clustering for long-form content is challenging because of variations in appearance and the lack of supporting large-scale labeled data. Our work in this paper focuses on two key aspects of this problem: the lack of domain-specific training or benchmark datasets, and adapting face embeddings learned on web images to long-form content, specifically movies. First, we present a dataset of over 169,000 face tracks curated from 240 Hollywood movies with weak labels on whether a pair of face tracks belongs to the same or a different character. We propose an offline algorithm based on nearest-neighbor search in the embedding space to mine hard examples from these tracks. We then investigate triplet-loss and multiview correlation-based methods for adapting face embeddings to hard examples. Our experimental results highlight the usefulness of weakly labeled data for domain-specific feature adaptation. Overall, we find that multiview correlation-based adaptation yields more discriminative and robust face embeddings. Its performance on downstream face verification and clustering tasks is comparable to state-of-the-art results in this domain. We also present the SAIL-Movie Character Benchmark corpus developed to augment existing benchmarks. It consists of racially diverse actors and provides face-quality labels for subsequent error analysis. We hope that the large-scale datasets developed in this work can further advance automatic character labeling in videos. All resources are freely available at https://sail.usc.edu/~ccmi/multiface.

Cast2Face

Semi-supervised cast indexing for feature-length films

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment

Automatic Naming of Speakers in Video via Name-Face Mapping

APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment

Real-Time Audio-Guided Multi-Face Reenactment

Face identification using reference-based features with message passing model

Cast indexing for videos by NCuts and page ranking

Learning to Name Faces

Actor Identification Via Mining Representative Actions

Context-Oriented Name-Face Association in Web Videos

Robust 3D Face Recognition by Local Shape Difference Boosting

“Who are you?” - Learning person specific classifiers from video

FANS: Face Annotation by Searching Large-scale Web Facial Images

From Benedict Cumberbatch to Sherlock Holmes: Character Identification in TV series without a Script

Robust Character Labeling in Movie Videos: Data Resources and Self-supervised Feature Adaptation

Improving Automatic Name-Face Association Using Celebrity Images on the Web

Unsupervised Manga Character Re-identification via Face-body and Spatial-temporal Associated Clustering

Dynamic Character Graph via Online Face Clustering for Movie Analysis

Community as a Connector: Associating Faces with Celebrity Names in Web Videos

CAST: Cross-Attention in Space and Time for Video Action Recognition