λ ≥ 0,    (A.2.8)

∇f(x) − J₂(x)ᵀμ − J₃(x)ᵀλ = 0.    (A.2.9)

This theorem has been derived by Kuhn & Tucker (1951). Proofs are also found in Collatz & Wetterling (1971) or Fletcher (1987). Equations (A.2.5) through (A.2.9) are known as (Karush5-) Kuhn-Tucker conditions; they are also referred to as first-order (necessary) conditions. A point (x*, μ*, λ*) satisfying these conditions is called a Kuhn-Tucker point.
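The first-order conditions can be checked numerically. The following sketch (problem data invented for the illustration, not taken from the text) tests the conditions (A.2.5) through (A.2.9) for a small problem with one equality constraint F₂ and one inequality constraint F₃:

```python
import numpy as np

# Illustrative example (not from the text): verify the (Karush-)
# Kuhn-Tucker conditions (A.2.5)-(A.2.9) for the toy problem
#   min  f(x) = x1^2 + x2^2
#   s.t. F2(x) = x1 + x2 - 2 = 0   (equality constraint)
#        F3(x) = x1 >= 0           (inequality constraint)
# Candidate Kuhn-Tucker point: x* = (1, 1), mu* = 2, lambda* = 0.

def grad_f(x): return np.array([2.0 * x[0], 2.0 * x[1]])
def F2(x):     return np.array([x[0] + x[1] - 2.0])
def F3(x):     return np.array([x[0]])

J2 = np.array([[1.0, 1.0]])   # Jacobian of F2
J3 = np.array([[1.0, 0.0]])   # Jacobian of F3

def is_kkt_point(x, mu, lam, tol=1e-10):
    feasible_eq   = np.all(np.abs(F2(x)) < tol)        # F2(x) = 0
    feasible_ineq = np.all(F3(x) >= -tol)              # F3(x) >= 0
    complementary = np.all(np.abs(lam * F3(x)) < tol)  # lam_i * F3_i(x) = 0
    multipliers   = np.all(lam >= -tol)                # lam >= 0
    stationary    = np.allclose(                       # grad f - J2^T mu - J3^T lam = 0
        grad_f(x) - J2.T @ mu - J3.T @ lam, 0.0, atol=tol)
    return bool(feasible_eq and feasible_ineq and complementary
                and multipliers and stationary)

print(is_kkt_point(np.array([1.0, 1.0]), np.array([2.0]), np.array([0.0])))
```

At x* = (1, 1) the inequality is inactive, so λ* = 0 and the stationarity condition fixes μ* = 2.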

By introducing some restrictions on the functions f(x), F2(x), and F3(x), called constraint qualifications by Kuhn and Tucker, certain irregularity conditions can be excluded, particularly at a stationary point. Different constraint qualifications are discussed, for instance, by Bomze & Grossmann (1993) and by Gill et al. (1981). A rather general formulation is given by Bock (1987).

Definition A.2.2 Let I(x') denote the set of active inequality constraints at x'. Let F̂₃ be the function composed of all F₃ᵢ with i ∈ I(x'); Ĵ₃ is the associated Jacobian. Let uᵀ := (F₂ᵀ, F̂₃ᵀ) : ℝⁿ → ℝᴺ, N := m + |I(x')|. Let L(x, λ) be the Lagrangian function of NLP defined in (A.2.4).

Definition A.2.3 Let J_N = J_N(x) = ∂u/∂x be the Jacobian of u associated with the equations and active inequalities. A feasible point x' is called regular if rank(J_N(x')) = N [Bock (1987, p. 48)].
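Regularity in the sense of Definition A.2.3 is a rank test and is easy to probe numerically. In the following sketch the constraint data are invented for the example:

```python
import numpy as np

# Illustrative sketch of Definition A.2.3 (constraint data invented):
# x' is regular if the Jacobian J_N built from the equality constraints
# and the *active* inequality constraints has full rank N.

def active_set(F3_val, tol=1e-8):
    # I(x'): indices of inequality constraints holding with equality
    return [i for i, v in enumerate(F3_val) if abs(v) < tol]

def is_regular(J2, J3, F3_val):
    rows = [J2] + [J3[i:i + 1] for i in active_set(F3_val)]
    J_N = np.vstack(rows)
    return bool(np.linalg.matrix_rank(J_N) == J_N.shape[0])

# Example: one equality x1 + x2 = 2, two inequalities x1 >= 0, x2 >= 0.
J2 = np.array([[1.0, 1.0]])
J3 = np.eye(2)

print(is_regular(J2, J3, np.array([1.0, 1.0])))  # no active inequalities
print(is_regular(J2, J3, np.array([0.0, 0.0])))  # 3 gradients in R^2: not regular
```

In the second call all three constraint gradients are active, and three rows cannot be linearly independent in ℝ², so the rank condition fails.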

Theorem A.2.4 Let x* ∈ ℝⁿ be a regular point and a local minimizer of problem NLP. Then there exist vectors μ* and λ* such that x*, μ*, and λ* satisfy the (Karush-) Kuhn-Tucker conditions [Eqs. (A.2.5) through (A.2.9)].

Note that the difference between Theorems A.2.4 and A.2.1 lies in the assumptions, i.e., the constraint qualifications. If we further define the set of directions

T(x*) := { p ∈ ℝⁿ | J₂(x*)p = 0, (J₃(x*)p)ᵢ = 0 ∀ i ∈ I(x*) },    (A.2.10)

and the Hessian of the Lagrangian function

H(x, λ) := ∂²L(x, λ)/∂x²,    (A.2.11)

5 It was later discovered that Karush (1939) had proven the same result in his master's thesis at the University of Chicago. A survey paper by Kuhn (1976) gives a historical overview of the development of inequality-constrained optimization.

by extending the assumptions in Theorem A.2.1, the following results can be derived:

Theorem A.2.5 (Second-Order Necessary Conditions). If the functions f(x), F₂(x), and F₃(x) are twice differentiable, the second-order necessary condition for the Kuhn-Tucker point (x*, μ*, λ*) to be a local minimizer is

pᵀH(x*, λ*)p ≥ 0, ∀ p ∈ T(x*).    (A.2.12)

The interpretation of this theorem is that the Hessian of the Lagrangian function is positive semi-definite for all directions p e T(x*).

Theorem A.2.6 (Second-Order Sufficient Conditions). Let (x*, μ*, λ*) be a Kuhn-Tucker point of NLP, and let the Hessian of the Lagrangian function be positive definite on the directions T(x*), i.e.,

pᵀH(x*, λ*)p > 0, ∀ p ∈ T(x*), p ≠ 0.    (A.2.13)

Then x* is a strict local minimum of NLP.

A proof of this theorem, further discussion of second-order conditions, and an even less restrictive formulation of the constraint qualification (based on a local characterization by linearized constraints) are given by Fletcher (1987).
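The second-order sufficient condition can be tested by restricting the Hessian to a basis of T(x*), i.e., to the null space of the active-constraint Jacobian. The following sketch uses toy data (not from the text):

```python
import numpy as np

# Illustrative check of Theorem A.2.6 (toy data): the Hessian
# H(x*, lambda*) of the Lagrangian must be positive definite on T(x*),
# i.e., on the null space of the active-constraint Jacobian.
# Here f(x) = x1^2 - x2^2 with the equality constraint F2(x) = x2 = 0;
# at x* = (0, 0), H = diag(2, -2) is indefinite on R^2 but positive
# definite on T(x*) = {p : p2 = 0}, so x* is a strict local minimum.

def null_space(J, tol=1e-12):
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T                  # columns span {p : J p = 0}

def second_order_sufficient(H, J_active):
    Z = null_space(J_active)
    if Z.size == 0:
        return True                     # T(x*) = {0}: trivially satisfied
    H_red = Z.T @ H @ Z                 # reduced (projected) Hessian
    return bool(np.all(np.linalg.eigvalsh(H_red) > 0))

H  = np.diag([2.0, -2.0])               # Hessian of the Lagrangian
J2 = np.array([[0.0, 1.0]])             # Jacobian of F2(x) = x2
print(second_order_sufficient(H, J2))
```

This reduced-Hessian test is the standard way active-set codes verify second-order conditions; positive definiteness of H on all of ℝⁿ is not required.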

For a special class of problems, namely convex problems, the following theorem based on the formulation in Ravindran et al. (1987) is useful:

Theorem A.2.7 (Kuhn-Tucker Sufficiency Theorem). Consider the nonlinear programming problem NLP. Let the objective function f(x) be convex, let F₂(x) be linear, and let F₃(x) be concave. If there exists a solution (x*, λ*) satisfying the Kuhn-Tucker conditions [Eqs. (A.2.5) through (A.2.9)], then x* is an optimal solution to NLP.

A proof of this theorem can be found, e.g., in Kuhn & Tucker (1951), Collatz & Wetterling (1971), and Bomze & Grossmann (1993).

If the functions f(x), F₂(x), and F₃(x) satisfy the assumptions6 of Theorem A.2.7, the optimization problem is called a convex optimization problem. This class of problems has the property, cf. Papadimitriou & Steiglitz (1982), that local optimality (see Appendix A.1.1) implies global optimality, i.e., every local minimum of NLP is also a global one.

Algorithms to solve (A.2.1) are found, for instance, in Gill et al. (1981) or Fletcher (1987). Most are based on linearization techniques. Inequalities are included, for instance, by applying active-set methods. The most powerful nonlinear optimization algorithms are the Generalized Reduced Gradient (GRG) algorithm, Sequential Quadratic Programming (SQP) methods, and Interior Point Methods

6 These assumptions guarantee that the feasible region is a convex set and that the objective function is a convex function.

(IPM) [see, for instance, Bazaraa et al. (1993) or Wright (1996)] for problems involving many inequalities. The GRG algorithm was first developed by Abadie & Carpentier (1969); further information is contained in Abadie (1978), Lasdon et al. (1978), Lasdon & Waren (1978), and Gill et al. (1981, Sect. 6.3). It is frequently used to solve nonlinear constrained optimization problems but is rarely used to solve least-squares problems. A similar remark holds for IPM. A special class of these methods includes inequalities by adding logarithmic penalty terms to the objective function; the problem can then be solved as a nonlinear optimization problem with equality constraints but no inequalities. SQP methods are described in Appendix A.2.2, and they are applied to constrained least-squares problems by Schittkowski (1988). Finally, generalized Gauß-Newton methods are an efficient approach to solve constrained least-squares problems.

SQP methods belong to the most powerful and frequently used nonlinear optimization algorithms (Stoer 1985) for solving problem (A.2.1). The basic idea is to solve (A.2.1) by a sequence of quadratic programming subproblems. The subproblem in iteration k appears as

min { ½ ΔxᵀHₖΔx + ∇f(xₖ)ᵀΔx },  Δx ∈ ℝⁿ,    (A.2.14)

subject to

J₂(xₖ)Δx + F₂(xₖ) = 0,
J₃(xₖ)Δx + F₃(xₖ) ≥ 0,

where the subscript k refers to quantities known prior to iteration k, and Δx is the correction vector to be determined. This subproblem [cf. Gill et al. (1981, Sect. 6.5.3)] is obtained by linearizing the constraints and terminating the Taylor series expansion of the objective function of (A.2.1) after the quadratic term; the constant term f(xₖ) has been dropped. The necessary condition derived from the Lagrangian function associated with (A.2.14) is

HₖΔx + ∇f(xₖ) − J₂(xₖ)ᵀμₖ₊₁ − J₃(xₖ)ᵀλₖ₊₁ = 0.    (A.2.15)

If, furthermore, λₖ denotes the vector of Lagrange multipliers (for convenience we do not distinguish between λ and μ for equations and inequalities) known prior to iteration k, and Δxₖ and λ̂ₖ represent the solution of (A.2.14), then the next iterate follows as

xₖ₊₁ = xₖ + αₖΔxₖ,  λₖ₊₁ = λₖ + αₖ(λ̂ₖ − λₖ),

where αₖ > 0 is a damping factor.

For the solution of the quadratic subproblems the reader is referred to Gill et al. (1981, Sect. 5.3.2) or Fletcher (1987, Chap. 10).
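For the purely equality-constrained case the QP subproblem needs no inequality handling and reduces to a linear system in (Δx, μₖ₊₁). The following minimal sketch (toy problem, invented for the illustration) iterates that system with full steps αₖ = 1:

```python
import numpy as np

# A minimal SQP sketch for the equality-constrained case (toy problem,
# not from the text). With equality constraints only, the QP subproblem
# (A.2.14) reduces to the linear KKT system built from (A.2.15) and the
# linearized constraint:
#     H_k dx - J2(x_k)^T mu_{k+1} = -grad f(x_k)
#     J2(x_k) dx                  = -F2(x_k)
# Problem: min x1^2 + x2^2  s.t.  x1*x2 = 1; minimizer (1, 1), mu* = 2.

def grad_f(x): return np.array([2.0 * x[0], 2.0 * x[1]])
def F2(x):     return np.array([x[0] * x[1] - 1.0])
def J2(x):     return np.array([[x[1], x[0]]])

def H(mu):     # exact Hessian of the Lagrangian L(x, mu) = f(x) - mu*F2(x)
    return np.array([[2.0, -mu[0]], [-mu[0], 2.0]])

def sqp(x, mu, iterations=20, alpha=1.0):
    for _ in range(iterations):
        Hk, Jk = H(mu), J2(x)
        K = np.block([[Hk, -Jk.T],
                      [Jk, np.zeros((1, 1))]])
        rhs = np.concatenate([-grad_f(x), -F2(x)])
        sol = np.linalg.solve(K, rhs)
        dx, mu = sol[:2], sol[2:]
        x = x + alpha * dx          # alpha_k: damping factor (full steps here)
    return x, mu

x_star, mu_star = sqp(np.array([1.5, 0.8]), np.array([1.0]))
print(np.round(x_star, 6), np.round(mu_star, 6))
```

Near a regular solution this iteration is Newton's method applied to the first-order conditions and converges quadratically; practical codes add a line search on αₖ and a quasi-Newton update of Hₖ.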

A special case of unconstrained minimization arises when the objective function is of the form7

f(x) = ½ ‖F₁(x)‖₂² = ½ Σᵢ F₁ᵢ(x)²,

in which case the minimizer of f(x) is, under appropriate distributional assumptions, the maximum likelihood estimator (Brandt, 1976, Chap. 7) for the unknown parameter vector x. This objective function dates back to Gauß (1809), and in the mathematical literature the problem is synonymously called a least-squares or l₂ approximation problem.
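A small sketch of this least-squares case (data and model invented for the example): for residuals that are linear in the parameters, a plain Gauss-Newton step already yields the exact minimizer.

```python
import numpy as np

# Hedged illustration of the l2 approximation problem (toy data):
# f(x) = 1/2 * sum_i F1_i(x)^2 with residuals
# F1_i(x) = x[0]*t_i + x[1] - y_i, minimized by Gauss-Newton steps
# dx solving (J^T J) dx = -J^T F1.

def residuals(x, t, y): return x[0] * t + x[1] - y
def jacobian(t):        return np.column_stack([t, np.ones_like(t)])

t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * t + 1.0                 # exact data from slope 2, intercept 1

x = np.zeros(2)
for _ in range(3):                # linear residuals: converges in one step
    J = jacobian(t)
    r = residuals(x, t, y)
    x = x + np.linalg.solve(J.T @ J, -J.T @ r)

print(np.round(x, 6))             # recovers the parameters (2, 1)
```

For nonlinear residuals the same iteration is repeated at each new x; the generalized Gauß-Newton methods mentioned above extend this scheme to constrained least-squares problems.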
