Abstract: The linear conjugate gradient method is an efficient iterative method for convex quadratic minimization problems $ \mathop {\min }\limits_{x \in { \mathbb R^n}} f(x) =\dfrac{1}{2}x^TAx+b^Tx $, where $ A \in \mathbb R^{n \times n} $ is symmetric positive definite and $ b \in \mathbb R^n $. It is generally agreed that the gradients $ g_k $ are not conjugate with respect to $ A $ in the linear conjugate gradient method (see page 111 of Numerical Optimization (2nd ed., Springer, 2006) by Nocedal and Wright). In this paper we prove the conjugacy of the gradients $ g_k $ generated by the linear conjugate gradient method, namely, $$g_k^TAg_i=0, \; i=0,1,\cdots, k-2.$$ In addition, a new way to derive the linear conjugate gradient method is presented, based on the conjugacy of the search directions and the orthogonality of the gradients rather than the conjugacy of the search directions and the exact stepsize.
What problem does this paper attempt to address?
This paper addresses a fundamental question about the linear conjugate gradient method: whether the gradients are conjugate. Specifically, it aims to prove that the gradients \( g_k \) generated by the linear conjugate gradient method are conjugate with respect to the matrix \( A \), that is, they satisfy the following condition:
\[ g_k^T A g_i = 0, \quad i = 0, 1, \cdots, k - 2 \]
This conclusion contradicts the traditional view that the gradients \( g_k \) of the linear conjugate gradient method are not conjugate with respect to the matrix \( A \). Through rigorous proofs, the paper corrects this common misunderstanding and provides a new theoretical basis.
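The conjugacy condition above can be checked numerically. The following is a minimal sketch, not the paper's own code: it runs the textbook linear conjugate gradient iteration on a small, hypothetical symmetric positive definite matrix `A` and vector `b` (both chosen only for illustration), stores each gradient \( g_k = A x_k + b \), and prints \( g_k^T A g_i \) for \( i = 0, \ldots, k - 2 \). The helper names `matvec`, `dot`, and `cg_gradients` are assumptions introduced here.

```python
def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cg_gradients(A, b, x0):
    """Run standard linear CG on f(x) = 0.5*x^T A x + b^T x.

    Returns the list of gradients g_0, g_1, ... with g_k = A x_k + b.
    """
    x = list(x0)
    g = [ax + bi for ax, bi in zip(matvec(A, x), b)]
    d = [-gi for gi in g]  # first direction is steepest descent
    grads = [g]
    for _ in range(len(b)):
        if dot(g, g) < 1e-24:  # gradient is numerically zero: stop
            break
        Ad = matvec(A, d)
        alpha = dot(g, g) / dot(d, Ad)  # exact line-search step size
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        grads.append(g)
    return grads


# Hypothetical 4x4 symmetric positive definite example (illustration only).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
b = [1.0, -2.0, 3.0, -1.0]
grads = cg_gradients(A, b, [0.0] * 4)

# Print g_k^T A g_i for i = 0, ..., k-2; each value should vanish up to rounding.
for k in range(len(grads)):
    for i in range(k - 1):
        print(k, i, dot(grads[k], matvec(A, grads[i])))
```

The index range matters: consecutive gradients are orthogonal but in general not conjugate, so \( g_k^T A g_{k-1} \) is typically nonzero, which is consistent with the condition stopping at \( i = k - 2 \).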
In addition, the paper proposes a new derivation of the linear conjugate gradient method based on gradient orthogonality and search-direction conjugacy. Unlike the traditional derivation, which relies on exact step sizes, it selects the step size \( \alpha_k \) so that the new gradient \( g_{k + 1} \) is orthogonal to the current gradient \( g_k \), that is:
\[ g_{k + 1}^T g_k = 0 \]
This step size can be expressed as:
\[ \alpha_k = -\frac{g_k^T g_k}{g_k^T A d_k} \]
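A short derivation shows where this formula comes from. It assumes the usual iterate update \( x_{k + 1} = x_k + \alpha_k d_k \) and the gradient \( g_k = A x_k + b \) of the quadratic objective; neither is written out in this summary. Under these assumptions the gradient changes linearly along the step:
\[ g_{k + 1} = A x_{k + 1} + b = g_k + \alpha_k A d_k \]
Imposing orthogonality of consecutive gradients then gives:
\[ 0 = g_{k + 1}^T g_k = g_k^T g_k + \alpha_k \, g_k^T A d_k \quad \Longrightarrow \quad \alpha_k = -\frac{g_k^T g_k}{g_k^T A d_k} \]
If, in addition, the direction has the standard form \( d_k = -g_k + \beta_{k - 1} d_{k - 1} \) with \( d_k^T A d_{k - 1} = 0 \) (again an assumption about the setup), then \( g_k^T A d_k = -d_k^T A d_k \), and the formula reduces to the familiar exact step size \( \alpha_k = g_k^T g_k / (d_k^T A d_k) \).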
The paper proves that this new method is equivalent to the traditional linear conjugate gradient method and has the same convergence property: it reaches the optimal solution \( x^* \) in at most \( n \) steps.
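The orthogonality-based variant can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's algorithm verbatim: the step size is the one obtained from \( g_{k + 1}^T g_k = 0 \), while the direction update \( d_{k + 1} = -g_{k + 1} + \beta_k d_k \), with \( \beta_k \) chosen so that \( d_{k + 1}^T A d_k = 0 \), is the standard form assumed here. The function name `cg_orthogonal_step` and the test data are hypothetical.

```python
def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cg_orthogonal_step(A, b, x0, tol=1e-12):
    """Minimize 0.5*x^T A x + b^T x with an orthogonality-based step size.

    Returns the final iterate and the number of iterations performed.
    """
    x = list(x0)
    g = [ax + bi for ax, bi in zip(matvec(A, x), b)]  # g_0 = A x_0 + b
    d = [-gi for gi in g]
    iters = 0
    for _ in range(len(b)):  # at most n steps
        gg = dot(g, g)
        if gg ** 0.5 < tol:
            break
        Ad = matvec(A, d)
        # Step size chosen so that g_{k+1}^T g_k = 0.
        alpha = -gg / dot(g, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        # beta chosen so that d_{k+1}^T A d_k = 0 (assumed direction update).
        beta = dot(g_new, Ad) / dot(d, Ad)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        iters += 1
    return x, iters


# Hypothetical 4x4 symmetric positive definite example (illustration only).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
b = [1.0, -2.0, 3.0, -1.0]
x, iters = cg_orthogonal_step(A, b, [0.0] * 4)
residual = [ax + bi for ax, bi in zip(matvec(A, x), b)]  # gradient at x
print(iters, max(abs(r) for r in residual))
```

At the minimizer the gradient \( A x^* + b \) vanishes, so the printed residual is a direct check that the iteration has reached \( x^* \) within \( n = 4 \) steps, up to floating-point rounding.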
In summary, the paper's main contributions are correcting the common misunderstanding about gradient conjugacy in the linear conjugate gradient method and providing a new derivation of the method, which offers new ideas for the design of optimization algorithms.