Abstract: Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in computational and data science. They correspond to sum-product neural networks with a sparse connectivity associated with a dimension tree and widths given by a tuple of tensor ranks. The approximation power of these models has been proved to be (near to) optimal for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree and ranks should be selected carefully to balance estimation and approximation errors. We propose and analyze a complexity-based model selection method for tree tensor networks in an empirical risk minimization framework and we analyze its performance over a wide range of smoothness classes. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected (à la Barron, Birgé, Massart) by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class and derived from estimates of the metric entropy of tree tensor networks. This choice of penalty yields a risk bound for the selected predictor. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that our strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces (with isotropic, anisotropic or mixed dominating smoothness) and analytic functions. We discuss the role of sparsity of the tensor network for obtaining optimal performance in several regimes. In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy.
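The notions of dimension tree, tensor ranks and model-class complexity can be made concrete with a short Python sketch. It assumes the standard storage count of a dense tree-based format: a leaf {ν} stores an n_ν × r_α coefficient matrix, an interior node α stores a tensor of size r_α × ∏ r_β over its children β, and the root has rank 1. The helpers `Node`, `complexity`, `select_model` and `balanced_tree`, the penalty form λ·C_m/n (without any logarithmic factors the paper may include) and the risk values in the example are hypothetical illustrations, not the paper's code. Sparse tensor networks, where C_m counts non-zero parameters, are not covered.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    """Node of a dimension tree (hypothetical helper, dense format only)."""
    rank: int                                      # tensor rank r_alpha (1 at the root)
    children: list = field(default_factory=list)   # empty list for a leaf
    feature_dim: int = 0                           # n_nu, leaf feature-space dimension


def complexity(node: Node) -> int:
    """Number of parameters C_m of a dense tree tensor network."""
    if not node.children:
        # leaf {nu}: an n_nu x r_alpha matrix of coefficients
        return node.feature_dim * node.rank
    # interior node: a tensor of size r_alpha x (product of the children's ranks)
    own = node.rank * prod(child.rank for child in node.children)
    return own + sum(complexity(child) for child in node.children)


def select_model(empirical_risks, complexities, n, lam):
    """Index of the model minimizing R_n(f_m) + lam * C_m / n (assumed penalty form)."""
    crit = [r + lam * c / n for r, c in zip(empirical_risks, complexities)]
    return min(range(len(crit)), key=crit.__getitem__)


def balanced_tree(r, p):
    """Balanced dimension tree over 4 variables, all non-root ranks equal to r."""
    def leaf():
        return Node(rank=r, feature_dim=p)
    return Node(rank=1, children=[Node(rank=r, children=[leaf(), leaf()]),
                                  Node(rank=r, children=[leaf(), leaf()])])


# Model family indexed by the rank r, with p = 5 features per variable.
complexities = [complexity(balanced_tree(r, 5)) for r in (1, 2, 3)]
print(complexities)  # [23, 60, 123]

# Hypothetical empirical risks of the fitted predictors (made up for illustration).
risks = [0.50, 0.20, 0.18]
print(select_model(risks, complexities, n=1000, lam=1.0))  # 1, i.e. rank r = 2
```

Larger values of λ favour smaller ranks and a smaller λ favours the model with the lowest empirical risk; choosing λ from data is the role of the slope heuristics mentioned in the abstract.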

What problem does this paper attempt to address?

The paper addresses how to select a model class of tree tensor networks within an empirical risk minimization framework. Given limited observed data, the tree structure, ranks, feature spaces and sparsity pattern must be chosen to balance the estimation error against the approximation error. This question is particularly important in high-dimensional function approximation, where the "curse of dimensionality" means that the amount of data required grows exponentially with the dimension, which makes traditional methods hard to apply.

### Main contributions of the paper

1. **Complexity-based model selection strategy**:
   - A model is selected by minimizing a penalized empirical risk.
   - The penalty depends on the complexity of the model class and is roughly of the order of \( \mathrm{pen}(m) \sim \sqrt{C_m / n} \) or \( \mathrm{pen}(m) \sim C_m / n \), where \( C_m \) is the number of parameters of model class \( m \) (or the number of non-zero parameters for sparse tensor networks) and \( n \) is the number of samples.
2. **Theoretical guarantees**:
   - In a bounded least-squares setting, the paper proves that the strategy is (near to) minimax adaptive over a wide range of smoothness classes, such as Sobolev or Besov spaces, and analytic functions.
   - Concentration inequalities for the empirical process give an upper bound on the estimation error, from which a risk bound for the selected predictor under a specific penalty is derived.
3. **Practical applications**:
   - The amplitude of the penalty is calibrated with the slope heuristics method.
   - Numerical experiments on the tensorized approximation of multivariate and one-dimensional functions illustrate the performance of the model selection strategy.

### Key concepts

- **Tree Tensor Networks**: a model class for high-dimensional function approximation that can be viewed as a feed-forward (sum-product) neural network with multilinear activation functions and a sparse connectivity determined by a dimension tree.
- **Empirical Risk Minimization (ERM)**: a predictor is obtained by minimizing the empirical risk over a model class on a given data set.
- **Complexity**: a measure of the size of a model class, here the number of parameters or the number of non-zero parameters.
- **Minimax Adaptivity**: without knowing which smoothness class the target function belongs to, the selected predictor attains the optimal (or near-optimal) minimax rate of convergence for that class.

### Conclusion

The paper proposes a complexity-based model selection strategy for tree tensor network model classes in the empirical risk minimization framework. The strategy comes with theoretical risk guarantees and performs well in numerical experiments, which makes it a useful tool for high-dimensional function approximation and data analysis.
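The slope heuristics used to calibrate the penalty amplitude can be illustrated with its dimension-jump variant, chosen here only because it is the simplest to state; the paper may rely on a different variant, such as slope estimation. As the amplitude λ grows, the complexity of the selected model drops; the largest drop locates a minimal amplitude λ_min, and the heuristic sets λ̂ = 2·λ_min. The penalty form λ·C_m/n, the function names and the toy risk values below are assumptions for illustration.

```python
def select_model(empirical_risks, complexities, n, lam):
    """Index of the model minimizing R_n(f_m) + lam * C_m / n (assumed penalty form)."""
    crit = [r + lam * c / n for r, c in zip(empirical_risks, complexities)]
    return min(range(len(crit)), key=crit.__getitem__)


def dimension_jump(empirical_risks, complexities, n, lambdas):
    """Slope heuristics, dimension-jump variant: return 2 * lambda_min.

    `lambdas` must be sorted increasingly and contain at least two values.
    """
    # complexity of the selected model for each candidate amplitude
    selected = [complexities[select_model(empirical_risks, complexities, n, lam)]
                for lam in lambdas]
    # the selected complexity is non-increasing in lambda; locate its largest drop
    jumps = [selected[i] - selected[i + 1] for i in range(len(lambdas) - 1)]
    i_max = max(range(len(jumps)), key=jumps.__getitem__)
    lambda_min = lambdas[i_max + 1]  # first amplitude after the largest jump
    return 2 * lambda_min            # heuristic: optimal amplitude = 2 * minimal one


# Toy family (made-up values): beyond the second model, the empirical risk keeps
# decreasing linearly in C_m / n with slope 0.2, mimicking overfitting.
risks = [1.0, 0.3, 0.24, 0.14]
complexities = [10, 20, 50, 100]
lambdas = [0.1, 0.15, 0.25, 0.5, 1.0, 10.0]
lam_hat = dimension_jump(risks, complexities, n=100, lambdas=lambdas)
print(lam_hat)  # 0.5
print(select_model(risks, complexities, n=100, lam=lam_hat))  # 1
```

In this toy family the jump happens at λ = 0.2, the slope of the overfitting regime; with a finer λ grid, λ_min approaches that value.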

Learning with tree tensor networks: complexity estimates and model selection
