Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class.
Yuling Jiao, Yanming Lai, Xiliang Lu, Fengru Wang, Jerry Zhijian Yang, Yuanyuan Yang
DOI: https://doi.org/10.1137/21m144431x
IF: 2.071
2023-01-01
SIAM Journal on Mathematical Analysis
Abstract: In this paper, we construct neural networks with ReLU, sine, and $2^x$ as activation functions. For a general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}\big(\omega_f(\sqrt{d})\cdot 2^{-M}+\omega_f\big(\tfrac{\sqrt{d}}{N}\big)\big)$, where $M,N\in\mathbb{N}^+$ are hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth 6 and width $\max\big\{2d\big\lceil\log_2\big(\sqrt{d}\,(3\mu/\epsilon)^{1/\alpha}\big)\big\rceil,\ 2\big\lceil\log_2\tfrac{3\mu d^{\alpha/2}}{2\epsilon}\big\rceil+2\big\}$ that approximates $f\in\mathcal{H}^{\alpha}_{\mu}([0,1]^d)$ within a given tolerance $\epsilon>0$ measured in the $L^p$ norm with $p\in[1,\infty)$, where $\mathcal{H}^{\alpha}_{\mu}([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $\alpha\in(0,1]$ and constant $\mu>0$. Therefore, ReLU-sine-$2^x$ networks overcome the curse of dimensionality in approximation on $\mathcal{H}^{\alpha}_{\mu}([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, which allows them to be trained with stochastic gradient descent.
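To make the width formula concrete, the sketch below simply evaluates the abstract's expressions; it does not build or train a network. It assumes the standard Hölder bound $\omega_f(t)\le\mu t^{\alpha}$ for $f\in\mathcal{H}^{\alpha}_{\mu}([0,1]^d)$ and drops the constants hidden in $\mathcal{O}(\cdot)$. The function names (`holder_modulus`, `rate_terms`, `width_bound`) are hypothetical and chosen only for illustration.

```python
import math


def holder_modulus(t: float, alpha: float, mu: float) -> float:
    """Upper bound on the continuity modulus of f in H^alpha_mu: omega_f(t) <= mu * t**alpha."""
    return mu * t ** alpha


def rate_terms(d: int, M: int, N: int, alpha: float, mu: float) -> tuple[float, float]:
    """The two terms of O(omega_f(sqrt d) * 2^-M + omega_f(sqrt d / N)).

    Assumption: constants hidden in the O(.) are dropped, and omega_f is
    replaced by its Hoelder upper bound.
    """
    s = math.sqrt(d)  # sqrt(d) is the Euclidean diameter of [0, 1]^d
    return holder_modulus(s, alpha, mu) * 2.0 ** (-M), holder_modulus(s / N, alpha, mu)


def width_bound(d: int, alpha: float, mu: float, eps: float) -> int:
    """Width of the depth-6 network as stated in the abstract:

    max{ 2d * ceil(log2(sqrt(d) * (3 mu / eps)^(1/alpha))),
         2 * ceil(log2(3 mu d^(alpha/2) / (2 eps))) + 2 }
    """
    term1 = 2 * d * math.ceil(math.log2(math.sqrt(d) * (3 * mu / eps) ** (1 / alpha)))
    term2 = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(term1, term2)


if __name__ == "__main__":
    # Width grows roughly like d * log(d / eps), not exponentially in d.
    for d in (1, 10, 100, 1000):
        print(d, width_bound(d, alpha=1.0, mu=1.0, eps=1e-2))
```

For fixed $\epsilon$, the first term dominates and scales like $d\log_2(\sqrt{d}/\epsilon)$, which is the sense in which the required width avoids the exponential dependence on $d$.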