Bounds on the price of feedback for mistake-bounded online learning

Jesse Geneson, Linus Tang
2024-01-17
Abstract: We improve several worst-case bounds for various online learning scenarios from (Auer and Long, Machine Learning, 1999). In particular, we sharpen an upper bound for delayed ambiguous reinforcement learning by a factor of 2 and an upper bound for learning compositions of families of functions by a factor of 2.41. We also improve a lower bound from the same paper for learning compositions of $k$ families of functions by a factor of $\Theta(\ln{k})$, matching the upper bound up to a constant factor. In addition, we solve a problem from (Long, Theoretical Computer Science, 2020) on the price of bandit feedback with respect to standard feedback for multiclass learning, and we improve an upper bound from (Feng et al., Theoretical Computer Science, 2023) on the price of $r$-input delayed ambiguous reinforcement learning by a factor of $r$, matching a lower bound from the same paper up to the leading term.
Machine Learning, Discrete Mathematics, Combinatorics
What problem does this paper attempt to address?
This paper studies worst-case mistake bounds for online learning under several feedback models, focusing on delayed ambiguous reinforcement learning and on learning compositions of families of functions. The authors improve several worst-case bounds from Auer and Long (1999). Specifically, they sharpen the upper bound for delayed ambiguous reinforcement learning by a factor of 2 and the upper bound for learning compositions of families of functions by a factor of about 2.41. They also improve the lower bound for learning compositions of $k$ families of functions by a factor of $\Theta(\ln k)$, so that it matches the upper bound up to a constant factor. In addition, they resolve a problem posed by Long (2020) on the price of bandit feedback relative to standard feedback in multiclass learning, and they improve the upper bound of Feng et al. (2023) on the price of $r$-input delayed ambiguous reinforcement learning by a factor of $r$, matching the lower bound from that paper up to the leading term. In certain cases, they show that the bounds for delayed ambiguous reinforcement learning and for the bandit model differ by at most a constant factor. Finally, they present new bounds for the agnostic setting and discuss directions for future research.
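To make the difference between standard and bandit feedback concrete, here is a minimal sketch (not the authors' construction) of a textbook version-space learner on a finite hypothesis class in the realizable setting. Under standard feedback the learner is told the correct label after each prediction; under bandit feedback it is only told whether its prediction was correct. The function name `run_learner`, the plurality-vote prediction rule, and the toy class of all functions from $\{0,1,2\}$ to $\{0,1,2\}$ are illustrative assumptions. For this particular learner, each standard-feedback mistake removes at least half of the surviving hypotheses, while each bandit-feedback mistake with $k$ labels is only guaranteed to remove a $1/k$ fraction of them.

```python
from collections import Counter
from itertools import product


def run_learner(hypotheses, target, xs, bandit):
    """Hypothetical plurality-vote version-space learner (illustration only).

    hypotheses: tuples h, where h[x] is the label h assigns to input x.
    target: the true function; it must be in `hypotheses` (realizable case).
    bandit: True -> learner only learns whether its guess was right;
            False -> learner is told the correct label (standard feedback).
    Returns the number of mistakes made on the input sequence xs.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x in xs:
        # Predict the label chosen by the most surviving hypotheses
        # (ties go to the label that first appears in the version space).
        votes = Counter(h[x] for h in version_space)
        guess = max(votes, key=votes.get)
        truth = target[x]
        if guess != truth:
            mistakes += 1
        if not bandit:
            # Standard feedback: keep hypotheses that agree with the true label.
            version_space = [h for h in version_space if h[x] == truth]
        elif guess == truth:
            # Bandit feedback, correct guess: the true label is now known.
            version_space = [h for h in version_space if h[x] == guess]
        else:
            # Bandit feedback, wrong guess: only "not guess" is learned.
            version_space = [h for h in version_space if h[x] != guess]
    return mistakes


# Toy class: all 27 functions from {0, 1, 2} to {0, 1, 2}.
H = list(product(range(3), repeat=3))
xs = [0, 1, 2] * 3
print(run_learner(H, (2, 2, 2), xs, bandit=False))  # 3
print(run_learner(H, (2, 2, 2), xs, bandit=True))   # 6
```

On this input sequence the bandit learner makes twice as many mistakes as the standard-feedback learner for the target `(2, 2, 2)`. That ratio comes only from this toy run; the paper's results concern worst-case bounds over all learners and adversaries, which a single simulation does not capture.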