Abstract: The optimization-based meta-learning approach is gaining increased traction because of its unique ability to adapt quickly to a new task using only a small amount of data. However, existing optimization-based meta-learning approaches, such as MAML, ANIL, and their variants, generally employ backpropagation for upper-level gradient estimation, which requires historical lower-level parameters/gradients and thus increases the computational and memory overhead of each iteration. In this paper, we propose a meta-learning algorithm that avoids using historical parameters/gradients and significantly reduces the per-iteration memory cost compared to existing optimization-based meta-learning approaches. In addition to the memory reduction, we prove that our proposed algorithm converges sublinearly in the number of upper-level iterations, and that the convergence error decays sublinearly with the batch size of sampled tasks. In the special case of deterministic meta-learning, we also prove that our proposed algorithm converges to an exact solution. Moreover, we show that the computational complexity of the algorithm is on the order of $\mathcal{O}(\epsilon^{-1})$, which matches existing convergence results for meta-learning even without using any historical parameters/gradients. Experimental results on meta-learning benchmarks confirm the efficacy of our proposed algorithm.
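To make the memory argument concrete, here is a minimal NumPy sketch contrasting two ways of estimating the upper-level (meta) gradient. It is not the paper's algorithm. It assumes an iMAML-style proximal inner objective $L(\phi) + \frac{\lambda}{2}\|\phi - \theta\|^2$ with a hypothetical toy loss $L$ and a hypothetical outer loss $\frac{1}{2}\|\phi - t\|^2$; all names (`grad_L`, `hess_L`, `lam`, `K`, `t`) are illustrative. Backpropagating through $K$ inner steps keeps all $K+1$ iterates because each step's Jacobian depends on its own iterate, whereas an implicit-differentiation estimate keeps only the final iterate. For brevity the implicit version forms the small Hessian explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                   # toy parameter dimension
Q = rng.standard_normal((d, d))
A = Q @ Q.T / d + 0.5 * np.eye(d)       # symmetric positive definite
b = rng.standard_normal(d)
t = rng.standard_normal(d)              # target of the hypothetical outer loss
lam, K = 1.0, 300                       # proximal weight and number of inner steps
alpha = 1.0 / (np.linalg.eigvalsh(A).max() + 1.0 + lam)  # stable step size

def grad_L(phi):
    # toy inner loss L(phi) = 0.5 phi^T A phi - b^T phi + sum(log cosh(phi))
    return A @ phi - b + np.tanh(phi)

def hess_L(phi):
    # the Hessian depends on phi through the log-cosh term
    return A + np.diag(1.0 - np.tanh(phi) ** 2)

def inner_step(phi, theta):
    # one gradient step on L(phi) + (lam / 2) * ||phi - theta||^2
    return phi - alpha * (grad_L(phi) + lam * (phi - theta))

def unrolled_hypergrad(theta):
    traj = [theta.copy()]               # every iterate is kept for the backward pass
    for _ in range(K):
        traj.append(inner_step(traj[-1], theta))
    v = traj[-1] - t                    # gradient of the outer loss at phi_K
    hyper = np.zeros_like(theta)
    for k in range(K - 1, -1, -1):
        hyper += alpha * lam * v        # explicit dependence of step k on theta
        M_k = np.eye(d) - alpha * (hess_L(traj[k]) + lam * np.eye(d))
        v = M_k @ v                     # M_k is symmetric, so M_k^T v = M_k v
    hyper += v                          # phi_0 = theta contributes the identity
    return hyper, len(traj)

def implicit_hypergrad(theta):
    phi = theta.copy()                  # only the current iterate is kept
    for _ in range(K):
        phi = inner_step(phi, theta)
    g = phi - t
    # implicit function theorem: d phi*/d theta = lam * (H + lam I)^{-1}
    x = np.linalg.solve(hess_L(phi) + lam * np.eye(d), g)
    return lam * x, 1

theta = rng.standard_normal(d)
g_unrolled, n_unrolled = unrolled_hypergrad(theta)
g_implicit, n_implicit = implicit_hypergrad(theta)
print(np.max(np.abs(g_unrolled - g_implicit)))  # small once the inner loop has converged
print(n_unrolled, n_implicit)                    # stored iterates: K + 1 versus 1
```

Both estimates agree once the inner problem is solved accurately, but only the unrolled one needs storage that grows with $K$.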
What problem does this paper attempt to address?
The main problem this paper addresses is that existing optimization-based meta-learning methods must use historical lower-level parameters/gradients when computing upper-level gradients, which increases the computational and memory overhead of each iteration. Specifically:
1. **Problem Description**:
- Existing optimization-based meta-learning methods (such as MAML and ANIL) usually use backpropagation to estimate upper-level gradients, which requires saving historical lower-level parameters or gradients and thus increases computational and memory overhead.
- When these methods handle large-scale tasks, memory consumption grows significantly as the number of lower-level optimization iterations increases, and backpropagating through many iterations may cause vanishing or exploding gradients.
2. **Paper Objectives**:
- Propose a new meta-learning algorithm that estimates upper-level gradients without using historical lower-level parameters/gradients, thereby significantly reducing the memory cost of each iteration.
- Prove the convergence of the proposed algorithm and quantify its computational complexity in both stochastic and deterministic settings.
3. **Specific Contributions**:
- **Reducing Memory Overhead**: The proposed method does not use any historical parameters/gradients, so the memory overhead of each iteration stays nearly constant rather than growing with the number of lower-level iterations, and memory consumption is reduced by at least 50% compared with existing methods.
- **Convergence and Computational Complexity**: The paper establishes the convergence rate and computational complexity of the algorithm for both stochastic and deterministic meta-learning. In stochastic meta-learning, the algorithm converges at a sublinear rate, and the convergence error decays sublinearly as the batch size of sampled tasks increases; in deterministic meta-learning, it converges to the exact solution.
- **Improving Hyper-gradient Estimation**: By computing the product of the inverse Hessian matrix and a vector, instead of explicitly forming the full Hessian or Jacobian matrix, the computational complexity of hyper-gradient estimation is reduced from $\mathcal{O}(q^{2})$ or $\mathcal{O}(pq)$ to $\mathcal{O}(\max\{p, q\})$.
4. **Experimental Verification**:
- Experiments on multiple meta-learning benchmark datasets (such as CIFAR-FS, FC100, miniImageNet, and tieredImageNet) show that the proposed algorithm performs well in terms of both learning accuracy and memory reduction.
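The idea of computing an inverse-Hessian-vector product without forming the Hessian can be illustrated with a standard matrix-free technique: conjugate gradient (CG) that touches the Hessian only through Hessian-vector products. The sketch below is an assumption-laden illustration, not the paper's estimator: it takes $q$ to be the lower-level parameter dimension, uses a hypothetical ridge-regression lower-level loss $\frac{1}{2}\|Xy - z\|^2 + \frac{\mu}{2}\|y\|^2$ (the data vector $z$ does not affect the Hessian, so it is omitted), and introduces the illustrative names `hvp` and `conjugate_gradient`. Beyond the data itself, each CG iteration keeps only a few length-$q$ vectors, so no $q \times q$ matrix is ever built.

```python
import numpy as np

def conjugate_gradient(hvp, g, max_iter=100, tol=1e-10):
    """Approximately solve H w = g for symmetric positive definite H,
    accessing H only through the callable hvp(v) = H @ v."""
    w = np.zeros_like(g)
    r = g - hvp(w)                      # residual
    p = r.copy()                        # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Hp = hvp(p)
        step = rs_old / (p @ Hp)
        w += step * p
        r -= step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # new direction, H-conjugate to the previous ones
        rs_old = rs_new
    return w

rng = np.random.default_rng(1)
n, q = 50, 20                           # toy sizes: n samples, q lower-level parameters
X = rng.standard_normal((n, q))
mu = 0.1                                # hypothetical ridge weight keeping H positive definite

def hvp(v):
    # Hessian of the toy loss is X^T X + mu I; apply it without forming it
    return X.T @ (X @ v) + mu * v

g = rng.standard_normal(q)              # stands in for the upper-level gradient w.r.t. y
w = conjugate_gradient(hvp, g)          # w approximates H^{-1} g
```

In a full hyper-gradient computation, the vector `w` would then be multiplied by a mixed second-derivative term, which can likewise be applied as a vector product rather than formed as a $p \times q$ matrix.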
In summary, this paper addresses the high memory consumption and computational overhead of existing methods by proposing a new meta-learning algorithm, and demonstrates its effectiveness and advantages through theoretical analysis and experimental verification.