Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions.

Tao Li, Zhehao Huang, Qinghua Tao, Yingwen Wu, Xiaolin Huang
2023-01-01
Abstract: ... or EMA and manifests better adaptation to different stages of training. We further design a parallel framework for large-scale training with efficiency in memory and computation. Extensive experiments demonstrate the superior performance of TWA on benchmark computer vision tasks with various architectures.