Scaling in Deep and Shallow Learning Architectures

Ella Koresh, Tal Halevi, Yuval Meir, Dolev Dilmoney, Tamar Dror, Ronit Gross, Ofek Tevet, Shiri Hodassman, Ido Kanter
DOI: https://doi.org/10.1016/j.physa.2024.129909
IF: 3.778
2024-06-21
Physica A: Statistical Mechanics and its Applications
Abstract: The realization of classification tasks using deep learning is a primary goal of artificial intelligence; however, its possible universal behavior remains unexplored. Herein, we demonstrate a scaling behavior for the test error, ε, as a function of the number of classified labels, K. For the deepest architectures trained on CIFAR-100, ε(K) ∝ K^ρ with ρ ∼ 1; as the depth of the architecture is reduced, ρ decreases continuously until a crossover to ε(K) ∝ log(K) is observed for shallow architectures. A similar crossover is observed for shallow architectures in which the number of filters in the convolutional layers is proportionally increased. This unifies the scaling behavior of deep and shallow architectures and yields a reduced-latency method. The dependence of Δε/ΔK on the trained architecture is expected to be crucial in learning scenarios involving a dynamic number of labels.
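The two scaling forms named in the abstract can be told apart with a simple least-squares fit. The following is a minimal sketch, not the authors' procedure: the helper names (`linear_fit`, `fit_power_law`, `fit_log_law`) are hypothetical, and the error values are synthetic and noise-free, chosen only to illustrate the fits. They are not results from the paper.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_power_law(K, eps):
    """Fit eps = A * K**rho by a linear fit in log-log space; returns (A, rho)."""
    rho, log_A = linear_fit([math.log(k) for k in K], [math.log(e) for e in eps])
    return math.exp(log_A), rho

def fit_log_law(K, eps):
    """Fit eps = a + b * log(K) by a linear fit against log(K); returns (a, b)."""
    b, a = linear_fit([math.log(k) for k in K], eps)
    return a, b

# Synthetic, illustrative values only (assumed, NOT data from the paper).
K = [10, 20, 40, 60, 80, 100]
eps_deep = [0.003 * k for k in K]                    # power law with rho = 1
eps_shallow = [0.05 + 0.08 * math.log(k) for k in K]  # logarithmic law

A, rho = fit_power_law(K, eps_deep)
a, b = fit_log_law(K, eps_shallow)
print(f"power-law fit: A={A:.4f}, rho={rho:.3f}")
print(f"log fit:       a={a:.4f}, b={b:.4f}")
```

Under these assumed forms, the slope Δε/ΔK is roughly constant (equal to A) when ρ = 1, whereas for the logarithmic form it falls off as b/K. This is one way to read the abstract's remark that Δε/ΔK depends on the trained architecture.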
physics, multidisciplinary