Syntax-Directed Derivative Code (Part I: Tangent-Linear Code)
U. Naumann
Abstract:This is the first instance in a series of papers on single-pass generation of various types of derivative code by syntax-directed translation. We consider the automatic generation of tangent-linear code by forward mode automatic differentiation implemented as the bottom-up propagation of synthesized attributes on the abstract syntax tree. A proof-of-concept implementation is presented based on a simple LALR(1) parser generated by the parser generator bison. The proposed technique can be generalized easily to provide a method for computing directional derivatives of mathematical vector functions that are implemented as computer programs in the context of computer algebra systems and compilers for scientific computing. The main advantage of the syntax-directed approach to automatic differentiation is its elegance in terms of the implementation. 1 Motivation and Summary of Results This paper presents a method for generating tangent-linear versions of numerical simulation programs1 that implement vector functions F : IR → IR, y = F (x), x = (xk)k=1,...,n, y = (yl)l=1,...,m , (1) automatically by syntax-directed translation. Such tangent-linear programs Ḟ = Ḟ (x, ẋ) compute directional derivatives ẏ, that is, products of the Jacobian matrix F ′ = (f ′ l,k) l=1,...,m k=1,...,n ≡ ( ∂yl ∂xk l=1,...,m k=1,...,n ∈ IR (2) with a direction ẋ in the input space IR. Formally, ẏ = Ḟ (x, ẋ) ≡ F ′ · ẋ . (3) To motivate the need for tangent-linear codes we consider a system of nonlinear equations F (x) = 0, F : IR → IR . (4) Given a good start estimate x0, the system can be solved by Newton’s method with quadratic convergence as follows: δx = −(F (x)) · F (x) x = x + δx for increasing integer values i. At each step the algorithm requires the Jacobian F ′ of F at the current estimate x. Finite difference quotients can be used to 1 Without loss of generality, we focus on a subset of C. approximate the entries of the Jacobian at the cost of n + 1 and 2n function evaluations when using forward (or backward) and centered differences, respectively. However, it is well-known that step size control is a problem as illustrated in Figure 1 and discussed, for example, in [13]. To avoid these problems, the tangent-linear program can be run with ẋ ranging over the Cartesian basis vectors in IR to obtain F ′ at roughly the same cost as that of (centered) finite differences but with machine accuracy. 0 2e-06 4e-06 6e-06 8e-06 1e-05 1.2e-05 1.4e-05 1.6e-05 1e-09 2e-09 3e-09 4e-09 5e-09 6e-09 7e-09 8e-09 9e-09 1e-08 "abs(tl-fd).data" "abs(tl-100*cos(100*x)).data" Fig. 1. Problems with finite differences: We show the absolute error of finite difference approximation of ∂y ∂x for y = sin(100 ∗x) at x = 0.5 and h ∈ (10, 10) (marked with ”+” symbols). The values computed by the tangent-linear code are identical with those of the hand-coded derivative (correct up to machine accuracy) ∂y ∂x = 100 ∗ cos(100 ∗ x). The ”×” points mark the vanishing absolute error for this case. Choosing a matrix-free approach, the Newton step can be obtained as the solution of the linear system F (x)δx = −F (x) (5) at each Newton iteration i. Direct methods may be prohibitive due to the potentially large size of the problem. Iterative methods are likely to be more suitable. For example, iterative refinement computes the iterates as δx = δx + B(F (x) − F (x) · δx) (6) for a suitable preconditioner B, such as an approximate inverse of F (x). Note that Equation (6), as well as matrix-free Krylov methods such as GMRES, involves the computation of the product of the Jacobian with a vector. A (forward) finite difference approximation of the Jacobian vector product in Equation (3) can be computed by perturbing the scalar input of the modified function F̃ : IR → IR, ỹ = F̃ (s) ≡ F ((s − 1) · ẋ + x) (7) 4 at s = 1.2 Hence, we compute y = F (x) and y = F (h · ẋ + x) to get ẏ ≈ y − y h . There is no need to form the whole Jacobian explicitly. Additional scaling in dependence of the value of a norm of ẋ may be required to avoid numerical instabilities as pointed out, for example, in [14]. Moreover, the usual problems with finite differences may occur. Tangent-linear codes should be used in order to circumvent both problems. The structure of this paper is as follows. In Section 2 we summarize the theoretical concepts behind forward mode automatic differentiation in the context of tangent-linear code generation by source transformation. The syntax-directed translation algorithm for straight-line programs is introduced in Section 3 – the heart of this paper. Generalizations for subroutines with intraprocedural flow of control and programs with interprocedural flow of control are discussed. A simple proof-of-concept implementation is presented in Section 4, making detailed references to the source code that is appended in Section A. We draw conclusions in Section 5 and give an outlook to part II of this paper which deals with the syntax-directed generation of adjoint code. 2 Fundamentals For a given implementation of a vector function as defined in Equation (1) as a computer program3 we use automatic differentiation (AD) [17], [9]4 by source transformation to generate code for computing directional first derivatives of F. Therefore the computation of y = F (x) is expected to decompose into a sequence of elemental assignments vj = φj(vi)i≺j (8) for j = 1, . . . , p+m, and i ≺ j if and only if vi is an argument of φj . Equation (8) is also referred to as the code list of F. We set vi−n = xi for i = 1, . . . , n and vp+j = yj for j = 1, . . . ,m. The vk, k = 1 − n, . . . , p + m, are called code list variables. Forward mode AD transforms F into the tangent-linear model Ḟ that computes a total (or directional) derivative as defined in Equation (3). The m × n Jacobian of F is defined in Equation (2). The transformation of the program semantics is achieved by applying well-known differentiation rules to the elemental functions φj ∈ {+,−, ∗, /, sin, exp, . . .} 5 2 The basic idea is the following: We need ∂xi ∂s = ẋi for i = 1, . . . , n. Therefore, we set x = ẋ · s + x̃ to determine suitable values for x̃. For s = 1 the right-hand side of x̃ ≡ x − ẋ can be substituted in F (x) to get Equation (7). 3 From now on we will use F to refer to this implementation as a computer program. 4 We follow the notation therein as closely as possible. 5 We consider a subset of the arithmetic operators and intrinsic functions provided by most programming languages. 5 followed by the exploitation of the chain rule as v̇j = ∑ i≺j cji · v̇i (9) for j = 1, . . . , p + m and total derivatives v̇k of the code list variables vk, k = 1 − n, . . . , p + m. The elemental functions φj are assumed to be continuously differentiable in a neighborhood of the current argument. The corresponding local partial derivatives are denoted by cji = ∂φj ∂vi for j = 1, . . . , p + m, i ≺ j , where the independent variables vi−n = xi, i = 1, . . . , n, are assumed to be mutually independent. The basic approach dates back to [19]. Since then, AD has been used in the context of a large number of projects in computational science and engineering providing fast and accurate derivative information for a wide variety of highly relevant applications. Many of them are documented in the proceedings of the four international conferences [7], [2], [6], and [5]. Source transformation tools for AD (see, for example, ADIFOR [3], ADIC [4], the differentiation-enabled NAGWare Fortran 95 compiler [16], TAPENADE [12], TAF [8], and OpenAD [20]) that provide a basic forward mode generate tangent-linear code as an augmentation of the code list by statements for computing directional derivatives as illustrated by the following example which shows the tangent-linear code of ”y = sin(x ∗ 2)”. v̇1 = ẋ v1 = x v̇2 = 0 v2 = 2 v̇3 = v̇1 ∗ v2 + v1 ∗ v̇2 v3 = v1 ∗ v2 v̇4 = cos(v3) ∗ v̇3 v4 = sin(v3) ẏ = v̇4 y = v4 . Less trivial examples follow after explaining the syntax-directed approach to the generation of tangent-linear code in the following section. 3 Syntax-Directed Tangent-Linear Codes The main conceptual issues of the syntax-directed construction of tangent-linear codes can be discussed in the context of simple sequences of scalar assignments as defined in Definition 1. The presence of control-flow structures – both intra(loops, branches, or arbitrary jumps in the form of goto statements) and interprocedural (subroutine calls and calls of user-defined functions) – adds little to the algorithmic details behind the proposed method as explained in Section 3.4 and in Section 3.5. Definition 1. A straight-line program (SLP) is a sequence of scalar assignments described by the context-free grammar G = (N,T, P, s) with nonterminal symbols N = { s (straight-line program) a (assignment) e (expression) }