Abstract: Speech-driven facial animation is important for many applications including TV, film, video games, telecommunication and AR/VR. Recently, transformers have been shown to be extremely effective for this task. However, we identify two issues with the existing transformer-based models. Firstly, they are difficult to adapt to new personalised speaking styles and secondly, they are slow to run for long sentences due to the quadratic complexity of the transformer. We propose TalkLoRA to address both of these issues. TalkLoRA uses Low-Rank Adaptation to effectively and efficiently adapt to new speaking styles, even with limited data. It does this by training an adaptor with a small number of parameters for each subject. We also utilise a chunking strategy to reduce the complexity of the underlying transformer, allowing for long sentences at inference time. TalkLoRA can be applied to any transformer-based speech-driven animation method. We perform extensive experiments to show that TalkLoRA achieves state-of-the-art style adaptation and that it allows for an order-of-complexity reduction in inference times without sacrificing quality. We also investigate and provide insights into the hyperparameter selection for LoRA fine-tuning of speech-driven facial animation models.
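The abstract describes adapting to each subject by training a small adaptor while reusing the pretrained model. The sketch below shows what a per-subject LoRA adapter on a single linear layer could look like. It assumes the standard LoRA formulation \(h = W x + \frac{\alpha}{r} B A x\), with the pretrained weight \(W\) frozen, \(A\) randomly initialised and \(B\) initialised to zero. The class `LoRALinear`, its methods and the subject IDs are hypothetical illustrations, not the TalkLoRA code, and plain Python lists are used only to keep the example dependency-free.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical sketch: a frozen linear layer W plus one low-rank adapter per subject.

    For a subject with adapter (A, B), where A is r x d_in and B is d_out x r,
    the output is W x + (alpha / r) * B (A x). Only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        self.weight = weight  # pretrained weight, shared and never modified here
        self.d_out = len(weight)
        self.d_in = len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        self.rng = random.Random(seed)
        self.adapters = {}  # subject id -> (A, B)

    def add_subject(self, subject):
        # Standard LoRA initialisation: A small random values, B zeros,
        # so a freshly added adapter leaves the layer's output unchanged.
        a = [[self.rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(self.rank)]
        b = [[0.0] * self.rank for _ in range(self.d_out)]
        self.adapters[subject] = (a, b)

    def forward(self, x, subject=None):
        y = matvec(self.weight, x)
        if subject is None:
            return y  # base model behaviour
        a, b = self.adapters[subject]
        delta = matvec(b, matvec(a, x))  # low-rank update B (A x)
        return [yi + self.scale * di for yi, di in zip(y, delta)]

    def trainable_params_per_subject(self):
        return self.rank * (self.d_in + self.d_out)

    def full_params(self):
        return self.d_in * self.d_out


# Usage: a 64 x 64 layer (identity weight as a stand-in for pretrained values).
W = [[1.0 if i == j else 0.0 for j in range(64)] for i in range(64)]
layer = LoRALinear(W, rank=4, alpha=8.0)
layer.add_subject("subject_01")
x = [float(i) for i in range(64)]
print(layer.forward(x, "subject_01") == layer.forward(x))  # True
print(layer.trainable_params_per_subject(), layer.full_params())  # 512 4096
```

With rank 4, each new subject adds 512 trainable values for this layer instead of the 4,096 a full fine-tune would update, which illustrates why a per-subject adapter is cheap to train and store. The rank and \(\alpha\) shown are arbitrary placeholders, not the values the paper selects.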

What problem does this paper attempt to address?

This paper attempts to solve two main problems:

1. **Adapting to personalized speaking styles**: Existing Transformer-based speech-driven facial animation models are difficult to adapt to new, personalized speaking styles. When dealing with a new user or character, these models struggle to capture and reproduce that person's characteristic way of speaking and the facial expressions that go with it.
2. **Slow inference on long sentences**: Self-attention in a Transformer has time complexity \(O(N^2)\), where \(N\) is the length of the animation sequence, so existing models become very slow when processing long sentences. Specifically, when generating the facial expression at time \(t\), the model considers all audio information from \(0\) to \(t - 1\). This not only increases the computational burden but is also unnecessary for facial animation, because the facial motion at a given moment depends mainly on the nearby speech rather than on the full history.

To solve these problems, the authors propose the **TalkLoRA** method, which introduces two key improvements:

- **Low-Rank Adaptation (LoRA)**: By training an adapter with a small number of parameters, TalkLoRA can efficiently adapt to a new speaking style and achieve good results even with limited data. This avoids the over-fitting risk that comes with fine-tuning the entire model and allows new identities to be adapted quickly.
- **Chunking strategy**: To speed up inference on long sentences, TalkLoRA divides the input audio into fixed-size, overlapping chunks that are processed in parallel. Since each chunk has a fixed length, the attention cost grows roughly linearly rather than quadratically with sequence length, so the model can process longer audio sequences while maintaining high quality.

Through these two improvements, TalkLoRA improves both the adaptability and the inference efficiency of the model, and it can be applied to any Transformer-based speech-driven facial animation model.
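The complexity argument behind the chunking strategy can be checked with a small calculation. The sketch below splits a sequence of \(N\) frames into fixed-size, overlapping chunks and counts attention score pairs as a rough proxy for cost. The function names, the chunk size of 100 and the overlap of 20 are hypothetical choices for illustration, not the paper's settings, and the sketch does not model how the overlapping outputs are merged back together.

```python
def chunk_spans(n, chunk_size, overlap):
    """Return (start, end) pairs covering range(n) with fixed-size overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if n <= 0:
        return []
    stride = chunk_size - overlap  # consecutive chunks share `overlap` frames
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, n)  # the last chunk may be shorter
        spans.append((start, end))
        if end == n:
            break
        start += stride
    return spans


def attention_pairs_full(n):
    """Score pairs for self-attention over the whole sequence: grows as N^2."""
    return n * n


def attention_pairs_chunked(n, chunk_size, overlap):
    """Score pairs when attention is restricted to each chunk independently."""
    return sum((end - start) ** 2 for start, end in chunk_spans(n, chunk_size, overlap))


# Chunks are independent, so they could be batched and processed in parallel.
for n in (1_000, 10_000):
    full = attention_pairs_full(n)
    chunked = attention_pairs_chunked(n, chunk_size=100, overlap=20)
    print(n, full, chunked)
# 1000 1000000 121600
# 10000 100000000 1246400
```

A tenfold increase in \(N\) multiplies the full-attention cost by 100 but the chunked cost by only about 10: with a fixed chunk size \(C\) and stride \(s\), there are roughly \(N / s\) chunks, each costing at most \(C^2\), so the total is \(O(N)\) for fixed \(C\).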

TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation

HyperLoRA: Efficient Cross-task Generalization Via Constrained Low-Rank Adapters Generation

The Expressive Power of Low-Rank Adaptation

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

Batched Low-Rank Adaptation of Foundation Models

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

A Note on LoRA

Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

V-LoRA: an Efficient and Flexible System Boosts Vision Applications with LoRA LMM

LoRTA: Low Rank Tensor Adaptation of Large Language Models

LoRA: Low-Rank Adaptation of Large Language Models

Run LoRA Run: Faster and Lighter LoRA Implementations

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

A Survey on LoRA of Large Language Models