Abstract: Recent advancements in large language models (LLMs) focus on aligning to heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods depend on the policy model parameters, which requires a high-cost repetition of the alignment algorithm for each new policy model, and they cannot expand to unseen objectives because their alignment objectives are static. In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. MetaAligner models multi-objective alignment in three stages: (1) a dynamic objectives reformulation algorithm reorganizes traditional alignment datasets to supervise the model on performing flexible alignment across different objectives; (2) a conditional weak-to-strong correction paradigm aligns the weak outputs of fixed policy models to approach strong outputs with higher preferences in the corresponding alignment objectives, enabling plug-and-play inference on any policy model, which significantly reduces training costs and facilitates alignment on closed-source policy models; (3) a generalizable inference method flexibly adjusts target objectives by updating their text descriptions in the prompts, facilitating generalizable alignment to unseen objectives. Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignment on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods. The model also effectively aligns unseen objectives, marking the first step towards generalizable multi-objective preference alignment.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses two main challenges in multi-objective preference alignment for large language models (LLMs):

1. **High-cost repeated alignment**: Existing multi-objective alignment methods depend on the parameters of the policy model, so the costly alignment algorithm must be re-run every time a new policy model is introduced. This is incompatible with the rapid iteration and ever-increasing scale of current foundation models.
2. **Static alignment objectives**: Existing methods align only to pre-defined objectives and cannot extend to unseen ones, which limits how well the alignment method generalizes.

To solve these problems, the paper proposes **Meta-Objective Aligner (MetaAligner)**, a policy-agnostic and generalizable multi-objective preference alignment method. MetaAligner performs multi-objective alignment in three stages:

1. **Dynamic objectives reformulation algorithm**: Reorganizes traditional alignment datasets into dynamic-objective alignment datasets, so that MetaAligner learns to align flexibly under different combinations of objectives.
2. **Conditional weak-to-strong correction paradigm**: Aligns the weak outputs of a fixed policy model towards strong outputs that are more preferred on the target objectives. This enables plug-and-play inference on any policy model, significantly reduces training costs, and supports alignment of closed-source policy models.
3. **Generalizable inference method**: Adjusts the target objectives by updating their text descriptions in the prompt, which lets MetaAligner adapt to unseen objectives and apply new alignment strategies through in-context learning.

Experimental results show that MetaAligner achieves significant and balanced multi-objective alignment improvements on 10 state-of-the-art policy models while saving up to 93.63% of GPU training hours compared to previous alignment methods. MetaAligner also effectively aligns unseen objectives, marking the first step towards generalizable multi-objective preference alignment.
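The three stages can be illustrated with a small data-preparation sketch. This is only a rough illustration of the idea, not the paper's implementation: the prompt template, the objective descriptions, the `PreferencePair` layout, and the function names `build_prompt` and `reformulate` are all assumptions made for this example. It shows how one preference pair with per-objective labels might be turned into a (prompt, target) sample conditioned on a variable set of objectives, and how the same prompt builder could accept an objective that never appeared in training.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical objective descriptions; names and wording are illustrative only.
OBJECTIVES: Dict[str, str] = {
    "harmless": "the response should avoid unsafe or harmful content",
    "helpful": "the response should directly address the user's request",
    "humorous": "the response should be cheerful and amusing",
}


@dataclass
class PreferencePair:
    """One query with two responses and a per-objective preference label."""
    query: str
    response_a: str
    response_b: str
    preferred: Dict[str, str]  # objective name -> "a", "b", or "tie"


def build_prompt(query: str, weak_response: str, objectives: Dict[str, str]) -> str:
    """Render a correction prompt conditioned on text descriptions of objectives."""
    aspects = "; ".join(f"{name}: {desc}" for name, desc in objectives.items())
    return (
        f"Improve the answer with respect to these objectives: {aspects}\n"
        f"Question: {query}\n"
        f"Answer: {weak_response}\n"
        "Improved answer:"
    )


def reformulate(pair: PreferencePair, rng: random.Random) -> List[Tuple[str, str]]:
    """Turn one preference pair into (prompt, strong target) training samples.

    For each direction, collect the objectives on which one response wins,
    shuffle them so the model does not rely on a fixed objective order, and
    use the losing response as the weak input to be corrected.
    """
    responses = {"a": pair.response_a, "b": pair.response_b}
    samples = []
    for winner, loser in (("a", "b"), ("b", "a")):
        names = [n for n, w in pair.preferred.items() if w == winner]
        if not names:
            continue  # this response wins on no objective, so no sample
        rng.shuffle(names)
        objectives = {n: OBJECTIVES[n] for n in names}
        prompt = build_prompt(pair.query, responses[loser], objectives)
        samples.append((prompt, responses[winner]))
    return samples


# Stages 1 and 2: build dynamic-objective, weak-to-strong training samples.
pair = PreferencePair(
    query="How do I stay awake while studying?",
    response_a="Take short breaks, drink water, and sleep well the night before.",
    response_b="Just don't sleep.",
    preferred={"harmless": "a", "helpful": "a", "humorous": "tie"},
)
samples = reformulate(pair, random.Random(0))

# Stage 3: at inference time, an unseen objective is just a new description.
unseen_prompt = build_prompt(
    pair.query,
    pair.response_b,
    {"concise": "the response should be brief and to the point"},
)
```

Because the aligner in this sketch only sees the query, the weak response, and the objective descriptions, nothing depends on the policy model's parameters. That is the property that would make such a corrector reusable across policy models, including closed-source ones whose outputs are available only as text.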

MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Hybrid Alignment Training for Large Language Models

Aligner: Efficient Alignment by Learning to Correct

AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

On Diversified Preferences of Large Language Model Alignment

Panacea: Pareto Alignment via Preference Adaptation for LLMs

Parameter-Efficient Tuning Helps Language Model Alignment

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

Aligners: Decoupling LLMs and Alignment

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Decoding-Time Language Model Alignment with Multiple Objectives

ABC Align: Large Language Model Alignment for Safety & Accuracy

Aligning Large Language Models via Self-Steering Optimization

Aligner: One Global Token is Worth Millions of Parameters when Aligning Large Language Models

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Aligning LLMs with Individual Preferences via Interaction

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization