Abstract: The need to process and analyze large volumes of statistical data simultaneously calls for derivatives of matrix-to-scalar functions (scalar-valued functions of matrices) and matrix-to-matrix functions (matrix-valued functions of matrices). Although the derivative of a matrix-to-scalar function has already been defined, how to express it algebraically is not as clear as it is for scalar-to-scalar functions (scalar-valued functions of scalars). Because there is no uniform way of applying the chain rule to matrix derivatives, we classify the approaches used in existing schemes into two kinds. The first relies on index notation for the matrices involved, where the indices are eliminated when the matrices are multiplied. The second relies on vectorizing the matrices so that they can be treated as vector-to-vector functions (vector-valued functions of vectors), for which the chain rule is already settled. On the one hand, we find that the first approach generally has a much lower time complexity than the second. On the other hand, although most typical functions known so far can be differentiated with the first approach, the second approach is in theory applicable to any use of the chain rule. Under certain conditions, however, the result of the second approach can be simplified to the same order of time complexity as the first. It is therefore important to establish these conditions. In this paper, we establish a sufficient condition under which not only can the first approach be applied, but the time complexity of the results obtained from the second approach can also be reduced. This condition is stated as two equivalent conditions, one corresponding to each approach. In addition, we generalize the methods and use the two approaches to compute derivatives under the two conditions respectively.
This paper enables us to unify the framework of matrix derivatives, which would result in various applications in science and engineering.
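As a rough illustration of the two approaches, the sketch below differentiates a scalar L through the matrix-to-matrix map F(X) = A X B, once with index notation and once with vectorization. The map, the matrix sizes, and the names A, B, G, and vec are assumptions chosen for this example and are not taken from the paper; G stands for an arbitrary upstream gradient dL/dF.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, q = 3, 4, 5, 2          # illustrative sizes (assumed)
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))
G = rng.standard_normal((m, q))  # assumed upstream gradient dL/dF


def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")


# Approach 1 (index notation):
# dL/dX_ij = sum_{k,l} A_ki * G_kl * B_jl, which equals A^T G B^T.
grad_index = np.einsum("ki,kl,jl->ij", A, G, B)

# Approach 2 (vectorization):
# vec(A X B) = (B^T kron A) vec(X), so the Jacobian is an (m*q) x (n*p) matrix.
J = np.kron(B.T, A)
# Vector-to-vector chain rule: vec(dL/dX) = J^T vec(dL/dF).
grad_vec = (J.T @ vec(G)).reshape((n, p), order="F")

print(np.allclose(grad_index, grad_vec))
```

In this sketch both routes give the same gradient, but the vectorized route first builds a Jacobian with m·q·n·p entries, whereas the index-notation result reduces to the two matrix products in A^T G B^T. This is one simple case of the complexity gap the abstract describes.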

What is the gradient of a scalar function defined on a subspace of square matrices ?

What is the gradient of a scalar function of a symmetric matrix ?

Convergence Analysis of Projected Gradient Descent for Schatten-p Nonconvex Matrix Recovery

Gradient-type subspace iteration methods for the symmetric eigenvalue problem

Gradients on Matrix Manifolds and their Chain Rule

Convergence Analysis of Gradient Algorithms on Riemannian Manifolds Without Curvature Constraints and Application to Riemannian Mass

Low-complexity subspace-descent over symmetric positive definite manifold

Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation

Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot

A Nesterov-style Accelerated Gradient Descent Algorithm for the Symmetric Eigenvalue Problem

Subspace Quasi-Newton Method with Gradient Approximation

Gradients of Functions of Large Matrices

Subgradient Projection Algorithms for Convex Feasibility on Riemannian Manifolds with Lower Bounded Curvatures

A Geometric Understanding of Natural Gradient

Gradient-type Approaches to Inverse and Ill-Posed Problems of Mathematical Physics

Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

A generalized formulation for gradient schemes in unstructured finite volume method

Towards a Unified Framework of Matrix Derivatives.

Backpropagation through Back Substitution with a Backslash

Differentiating Matrix Functions