What is the gradient of a scalar function defined on a subspace of square matrices?

Srinivasan, Shriram; Panda, Nishant
DOI: https://doi.org/10.1007/s13226-024-00594-4
2024-04-25
Indian Journal of Pure and Applied Mathematics
Abstract: We illustrate a technique to calculate the gradient of scalar functions that are defined on an arbitrary matrix subspace. It generalizes our earlier work titled "What is the gradient of a scalar function of a symmetric matrix?" (Indian Journal of Pure and Applied Mathematics (2022), https://doi.org/10.1007/s13226-022-00313-x), in which we considered the special case of the subspace of symmetric matrices. Extant methods to calculate the gradient in such cases have an inherent flaw that leads to spurious results, which populate several publications as well as respected textbooks and handbooks on matrix calculus. We examine these sources and results in the rigorous and concrete mathematical setting of a finite-dimensional inner-product space, and identify both the inherent flaw and a remedy. We demonstrate two ways to calculate the derivative/gradient and the second derivative for scalar functions of matrices defined over an arbitrary matrix subspace. The first method considers any (differentiable) extension of the function to the space of square matrices and projects its gradient onto the given subspace. The second method uses an ordered basis and computes each component of the gradient by evaluating the directional derivative. All the ideas presented are illustrated by non-trivial examples, namely the subspaces of circulant and Toeplitz matrices, together with the results of gradient descent using both the spurious and the correct gradients. Moreover, our bibliography makes it clear that a rigorous approach to matrix calculus is not common in practice. Our presentation of matrix calculus in the language of inner-product spaces should therefore help applied mathematicians, engineers, and researchers working in interdisciplinary fields avoid the conceptual pitfalls that exist.
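The two methods named in the abstract can be sketched numerically. The following is a minimal illustration and not the authors' code: it assumes the Frobenius inner product, takes the circulant matrices as the subspace, and uses a hypothetical test function f(X) = <C, X> + (1/2)<X, X> whose gradient on the full space of square matrices is C + X. All function names, and the central finite difference used for the directional derivatives, are assumptions made for this example.

```python
import numpy as np

def frob(a, b):
    """Frobenius inner product <a, b> = trace(a.T @ b)."""
    return float(np.sum(a * b))

def circulant_basis(n):
    """Ordered basis of the n x n circulant matrices: powers of the cyclic shift."""
    shift = np.roll(np.eye(n), 1, axis=0)
    return [np.linalg.matrix_power(shift, k) for k in range(n)]

def from_components(rhs, basis):
    """Subspace element g satisfying <g, b_k> = rhs[k] for every basis matrix b_k."""
    gram = np.array([[frob(bi, bj) for bj in basis] for bi in basis])
    coeffs = np.linalg.solve(gram, rhs)
    return sum(c * b for c, b in zip(coeffs, basis))

def project(g, basis):
    """Method 1: orthogonal projection of a full-space gradient onto the subspace."""
    return from_components(np.array([frob(g, b) for b in basis]), basis)

def grad_by_directional_derivatives(func, x, basis, h=1e-5):
    """Method 2: one directional derivative per basis matrix (central differences)."""
    d = np.array([(func(x + h * b) - func(x - h * b)) / (2 * h) for b in basis])
    return from_components(d, basis)

n = 4
rng = np.random.default_rng(0)
basis = circulant_basis(n)

# A circulant point x and a generic (non-circulant) matrix c.
x = sum(a * b for a, b in zip(rng.standard_normal(n), basis))
c = rng.standard_normal((n, n))

def f(m):
    """Hypothetical test function; it is defined for every square matrix m."""
    return frob(c, m) + 0.5 * frob(m, m)

grad_full = c + x                      # gradient of the extension; not circulant
g_proj = project(grad_full, basis)     # method 1
g_basis = grad_by_directional_derivatives(f, x, basis)  # method 2

print("max |method 1 - method 2| =", np.max(np.abs(g_proj - g_basis)))
```

Because C is not circulant, the full-space gradient C + X lies outside the subspace, while both methods return the same circulant matrix up to finite-difference error. Solving with the Gram matrix rather than assuming an orthonormal basis keeps the sketch valid for bases that are not orthonormal; the powers of the cyclic shift used here are orthogonal but each has squared Frobenius norm n.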
mathematics