Abstract: The process of aligning a pair of shapes is a fundamental operation in computer graphics. Traditional approaches rely heavily on matching corresponding points or features to guide the alignment, a paradigm that falters when significant portions of a shape are missing. These techniques generally do not incorporate prior knowledge about expected shape characteristics, which can help compensate for misleading cues caused by inaccuracies in the input shapes. We present an approach based on a deep neural network, leveraging shape datasets to learn a shape-aware prior for source-to-target alignment that is robust to shape incompleteness. In the absence of ground truth alignments for supervision, we train a network on the task of shape alignment using incomplete shapes generated from full shapes for self-supervision. Our network, called ALIGNet, is trained to warp complete source shapes to incomplete targets as if the target shapes were complete, thus rendering the alignment partial-shape agnostic. We aim for the network to develop specialized expertise over the common characteristics of the shapes in each dataset, thereby achieving a higher-level understanding of the expected shape space, to which a local approach would be oblivious. We constrain ALIGNet through an anisotropic total variation identity regularization to promote piecewise smooth deformation fields, facilitating both partial-shape agnosticism and post-deformation applications. We demonstrate that ALIGNet learns to align geometrically distinct shapes and is able to infer plausible mappings even when the target shape is significantly incomplete. We show that our network learns the common expected characteristics of shape collections, without over-fitting or memorization, enabling it to produce plausible deformations on unseen data at test time.
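
To make the regularization term concrete, the following is a minimal sketch of one plausible reading of an anisotropic total variation penalty measured relative to the identity deformation. It is not taken from the ALIGNet implementation: the 2D control-grid layout and the names `identity_grid` and `anisotropic_tv_identity` are assumptions made for illustration only. The penalty sums the absolute horizontal and vertical differences of the displacement (the grid minus the identity), so constant shifts cost nothing and each sharp jump is charged in proportion to its size, which favors piecewise smooth fields.

```python
import numpy as np


def identity_grid(h, w):
    """Hypothetical helper: (h, w, 2) grid whose entry [i, j] is the point (x=j, y=i)."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([xs, ys], axis=-1).astype(float)


def anisotropic_tv_identity(grid):
    """Anisotropic TV of the displacement of `grid` from the identity grid.

    `grid` is an (h, w, 2) array of deformed control-point positions.
    This is an illustrative assumption about the form of the regularizer,
    not the paper's exact formulation.
    """
    h, w, _ = grid.shape
    disp = grid - identity_grid(h, w)            # displacement field
    tv_x = np.abs(np.diff(disp, axis=1)).sum()   # horizontal neighbor differences
    tv_y = np.abs(np.diff(disp, axis=0)).sum()   # vertical neighbor differences
    return float(tv_x + tv_y)


if __name__ == "__main__":
    g = identity_grid(4, 4)
    shifted = g + np.array([2.0, -1.0])          # uniform translation
    split = g.copy()
    split[:, 2:, 0] += 1.0                       # right half moved by one unit in x
    print(anisotropic_tv_identity(g))
    print(anisotropic_tv_identity(shifted))
    print(anisotropic_tv_identity(split))
```

In this sketch the identity grid and a uniformly translated grid both score zero, while the grid split into two rigidly moved halves is penalized only along the single seam between them.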

AlignNet: A Unifying Approach to Audio-Visual Alignment

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis

AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators

Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

Video alignment using unsupervised learning of local and global features

Video-to-Audio Generation with Hidden Alignment

Generating Visually Aligned Sound from Videos

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours

HANet: Hierarchical Alignment Networks for Video-Text Retrieval

Fine-grained Cross-modal Alignment Network for Text-Video Retrieval

Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

ALCAP: Alignment-Augmented Music Captioner

On-Line Audio-to-Lyrics Alignment Based on a Reference Performance

AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning

CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization

Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization