Wikipedia

Lagrange multiplier

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints (i.e., subject to the condition that one or more equations have to be satisfied exactly by the chosen values of the variables).[1] It is named after the mathematician Joseph-Louis Lagrange.

Summary and rationale

The basic idea is to convert a constrained problem into a form such that the derivative test of an unconstrained problem can still be applied. The relationship between the gradient of the function and gradients of the constraints rather naturally leads to a reformulation of the original problem, known as the Lagrangian function.[2] In the general case, the Lagrangian is defined as

$$\mathcal{L}(x, \lambda) \equiv f(x) + \langle \lambda, g(x) \rangle$$

for functions $f(x), g(x)$; $\lambda$ is called the Lagrange multiplier.

In the simple case, this simplifies to

$$\mathcal{L}(x, \lambda) \equiv f(x) + \lambda \cdot g(x)$$

The method can be summarized as follows: in order to find the maximum or minimum of a function $f(x)$ subject to the equality constraint $g(x) = 0$, find the stationary points of $\mathcal{L}$ considered as a function of $x$ and the Lagrange multiplier $\lambda$. This means that all partial derivatives should be zero, including the partial derivative with respect to $\lambda$.[3]

$$\frac{\partial \mathcal{L}}{\partial x} = 0 \qquad \text{and} \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = 0$$

or equivalently

$$\frac{\partial f(x)}{\partial x} + \lambda \cdot \frac{\partial g(x)}{\partial x} = 0 \qquad \text{and} \qquad g(x) = 0$$

The solution corresponding to the original constrained optimization is always a saddle point of the Lagrangian function,[4][5] which can be identified among the stationary points from the definiteness of the bordered Hessian matrix.[6]

The great advantage of this method is that it allows the optimization to be solved without explicit parameterization in terms of the constraints. As a result, the method of Lagrange multipliers is widely used to solve challenging constrained optimization problems. Further, the method of Lagrange multipliers is generalized by the Karush–Kuhn–Tucker conditions, which can also take into account inequality constraints of the form $h(\mathbf{x}) \leq c$ for a given constant $c$.

Statement

The following is known as the Lagrange multiplier theorem.[7]

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be the objective function and $g \colon \mathbb{R}^n \to \mathbb{R}^c$ be the constraints function, both belonging to $C^1$ (that is, having continuous first derivatives). Let $x_\star$ be an optimal solution to the following optimization problem such that, for the matrix of partial derivatives $\bigl[\operatorname{D} g(x_\star)\bigr]_{j,k} = \frac{\partial g_j}{\partial x_k}$, $\operatorname{rank}(\operatorname{D} g(x_\star)) = c < n$:

$$\text{maximize } f(x)$$
$$\text{subject to } g(x) = 0$$

Then there exists a unique Lagrange multiplier $\lambda_\star \in \mathbb{R}^c$ such that $\operatorname{D} f(x_\star) = \lambda_\star^{\mathsf{T}} \operatorname{D} g(x_\star)$.

The Lagrange multiplier theorem states that at any local maximum (or minimum) of the function evaluated under the equality constraints, if constraint qualification applies (explained below), then the gradient of the function (at that point) can be expressed as a linear combination of the gradients of the constraints (at that point), with the Lagrange multipliers acting as coefficients.[8] This is equivalent to saying that any direction perpendicular to all gradients of the constraints is also perpendicular to the gradient of the function; in other words, the directional derivative of the function is 0 in every feasible direction.

Single constraint

 
Figure 1: The red curve shows the constraint g(x, y) = c. The blue curves are contours of f(x, y). The point where the red constraint tangentially touches a blue contour is the maximum of f(x, y) along the constraint, since d1 > d2.

For the case of only one constraint and only two choice variables (as exemplified in Figure 1), consider the optimization problem

$$\text{maximize } f(x, y)$$
$$\text{subject to } g(x, y) = 0$$

(Sometimes an additive constant is shown separately rather than being included in $g$, in which case the constraint is written $g(x, y) = c$, as in Figure 1.) We assume that both $f$ and $g$ have continuous first partial derivatives. We introduce a new variable $\lambda$ called a Lagrange multiplier (or Lagrange undetermined multiplier) and study the Lagrange function (or Lagrangian or Lagrangian expression) defined by

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y)$$

where the $\lambda$ term may be either added or subtracted. If $f(x_0, y_0)$ is a maximum of $f(x, y)$ for the original constrained problem and $\nabla g(x_0, y_0) \neq 0$, then there exists $\lambda_0$ such that $(x_0, y_0, \lambda_0)$ is a stationary point for the Lagrange function (stationary points are those points where the first partial derivatives of $\mathcal{L}$ are zero). The assumption $\nabla g \neq 0$ is called constraint qualification. However, not all stationary points yield a solution of the original problem, as the method of Lagrange multipliers yields only a necessary condition for optimality in constrained problems.[9][10][11][12][13] Sufficient conditions for a minimum or maximum also exist, but if a particular candidate solution satisfies them, it is only guaranteed to be the best solution locally, that is, better than any permissible nearby points. The global optimum can be found by comparing the values of the original objective function at the points satisfying the necessary and locally sufficient conditions.

The method of Lagrange multipliers relies on the intuition that at a maximum, f(x, y) cannot be increasing in the direction of any neighboring point that also has g = 0. If it were, we could walk along g = 0 to get higher, meaning that the starting point wasn't actually the maximum. Viewed in this way, it is an exact analogue to testing if the derivative of an unconstrained function is 0, that is, we are verifying that the directional derivative is 0 in any relevant (viable) direction.

We can visualize contours of f given by f(x, y) = d for various values of d, and the contour of g given by g(x, y) = c.

Suppose we walk along the contour line with g = c . We are interested in finding points where f almost does not change as we walk, since these points might be maxima.

There are two ways this could happen:

  1. We could touch a contour line of f, since by definition f does not change as we walk along its contour lines. This would mean that the tangents to the contour lines of f and g are parallel here.
  2. We have reached a "level" part of f, meaning that f does not change in any direction.

To check the first possibility (we touch a contour line of f), notice that since the gradient of a function is perpendicular to the contour lines, the tangents to the contour lines of f and g are parallel if and only if the gradients of f and g are parallel. Thus we want points (x, y) where g(x, y) = c and

$$\nabla_{x, y} f = \lambda\, \nabla_{x, y} g$$

for some $\lambda$,

where

$$\nabla_{x, y} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), \qquad \nabla_{x, y} g = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right)$$

are the respective gradients. The constant $\lambda$ is required because although the two gradient vectors are parallel, the magnitudes of the gradient vectors are generally not equal. This constant is called the Lagrange multiplier. (In some conventions $\lambda$ is preceded by a minus sign.)

Notice that this method also solves the second possibility, that f is level: if f is level, then its gradient is zero, and setting $\lambda = 0$ is a solution regardless of $\nabla g$.

To incorporate these conditions into one equation, we introduce an auxiliary function

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot \bigl( g(x, y) - c \bigr)$$

and solve

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = 0$$

Note that this amounts to solving three equations in three unknowns. This is the method of Lagrange multipliers.
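The three-equations-in-three-unknowns system can be set up and solved symbolically. The sketch below assumes SymPy is available and uses a hypothetical toy problem (maximize $f = xy$ subject to $x + y = 4$; the objective and constraint are illustrative choices, not taken from the article):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x * y              # objective (illustrative choice)
g = x + y - 4          # constraint in the form g(x, y) - c = 0

# Auxiliary function: the Lagrangian
L = f + lam * g

# Three equations (dL/dx, dL/dy, dL/dlam) in three unknowns
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)[0]
# sol gives x = y = 2 and lam = -2, so the constrained maximum of xy is 4
```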

Note that $\nabla_{\lambda}\, \mathcal{L}(x, y, \lambda) = 0$ implies $g(x, y) = c$, as the partial derivative of $\mathcal{L}$ with respect to $\lambda$ is $g(x, y) - c$.

To summarize

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = 0 \iff \begin{cases} \nabla_{x, y} f(x, y) = -\lambda\, \nabla_{x, y} g(x, y) \\ g(x, y) = c \end{cases}$$

The method generalizes readily to functions of $n$ variables

$$\nabla_{x_1, \ldots, x_n, \lambda}\, \mathcal{L}(x_1, \ldots, x_n, \lambda) = 0$$

which amounts to solving n + 1 equations in n + 1 unknowns.

The constrained extrema of f are critical points of the Lagrangian $\mathcal{L}$, but they are not necessarily local extrema of $\mathcal{L}$ (see Example 2 below).

One may reformulate the Lagrangian as a Hamiltonian, in which case the solutions are local minima for the Hamiltonian. This is done in optimal control theory, in the form of Pontryagin's minimum principle.

The fact that solutions of the Lagrangian are not necessarily extrema also poses difficulties for numerical optimization. This can be addressed by computing the magnitude of the gradient, as the zeros of the magnitude are necessarily local minima, as illustrated in Example 5: Numerical optimization.

Multiple constraints

 
Figure 2: A paraboloid constrained along two intersecting lines.
 
Figure 3: Contour map of Figure 2.

The method of Lagrange multipliers can be extended to solve problems with multiple constraints using a similar argument. Consider a paraboloid subject to two line constraints that intersect at a single point. As the only feasible solution, this point is obviously a constrained extremum. However, the gradient of $f$ at the intersection point is parallel to neither constraint's gradient (see Figure 3); instead, it is a linear combination of the two constraints' gradients. In the case of multiple constraints, that is what we seek in general: the method of Lagrange seeks points at which the gradient of $f$ is not necessarily a multiple of any single constraint's gradient, but rather a linear combination of all the constraints' gradients.

Concretely, suppose we have $M$ constraints and are walking along the set of points satisfying $g_i(\mathbf{x}) = 0$, $i = 1, \ldots, M$. Every point $\mathbf{x}$ on the contour of a given constraint function $g_i$ has a space of allowable directions: the space of vectors perpendicular to $\nabla g_i(\mathbf{x})$. The set of directions that are allowed by all constraints is thus the space of directions perpendicular to all of the constraints' gradients. Denote this space of allowable moves by $A$ and denote the span of the constraints' gradients by $S$. Then $A = S^{\perp}$, the space of vectors perpendicular to every element of $S$.

We are still interested in finding points where $f$ does not change as we walk, since these points might be (constrained) extrema. We therefore seek $\mathbf{x}$ such that any allowable direction of movement away from $\mathbf{x}$ is perpendicular to $\nabla f(\mathbf{x})$ (otherwise we could increase $f$ by moving along that allowable direction). In other words, $\nabla f(\mathbf{x}) \in A^{\perp} = S$. Thus there are scalars $\lambda_1, \lambda_2, \ldots, \lambda_M$ such that

$$\nabla f(\mathbf{x}) = \sum_{k=1}^{M} \lambda_k\, \nabla g_k(\mathbf{x})$$

These scalars are the Lagrange multipliers. We now have $M$ of them, one for every constraint.

As before, we introduce an auxiliary function

$$\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = f(x_1, \ldots, x_n) - \sum_{k=1}^{M} \lambda_k\, g_k(x_1, \ldots, x_n)$$

and solve

$$\nabla_{x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M}\, \mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = 0$$

which amounts to solving $n + M$ equations in $n + M$ unknowns.
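A small symbolic sketch of the $n + M$ system (objective and constraints are hypothetical choices for illustration; SymPy assumed available): minimize $x^2 + y^2 + z^2$ subject to $x + y + z = 1$ and $x = y$.

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols("x y z l1 l2", real=True)
f = x**2 + y**2 + z**2      # objective (illustrative)
g1 = x + y + z - 1          # first constraint
g2 = x - y                  # second constraint

# One multiplier per constraint
L = f - l1 * g1 - l2 * g2

# n + M = 3 + 2 = 5 equations in 5 unknowns
eqs = [sp.diff(L, v) for v in (x, y, z, l1, l2)]
sol = sp.solve(eqs, [x, y, z, l1, l2], dict=True)[0]
# constrained minimum at x = y = z = 1/3; the second multiplier vanishes
# because the minimizer already satisfies x = y "for free"
```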

The constraint qualification assumption when there are multiple constraints is that the constraint gradients at the relevant point are linearly independent.

Modern formulation via differentiable manifolds

The problem of finding the local maxima and minima subject to constraints can be generalized to finding local maxima and minima on a differentiable manifold $M$.[14] In what follows, it is not necessary that $M$ be a Euclidean space, or even a Riemannian manifold. All appearances of the gradient $\nabla$ (which depends on a choice of Riemannian metric) can be replaced with the exterior derivative $\mathrm{d}$.

Single constraint

Let $M$ be a smooth manifold of dimension $m$. Suppose that we wish to find the stationary points $x$ of a smooth function $f \colon M \to \mathbb{R}$ when restricted to the submanifold $N$ defined by $g(x) = 0$, where $g \colon M \to \mathbb{R}$ is a smooth function for which 0 is a regular value.

Let $\mathrm{d}f$ and $\mathrm{d}g$ be the exterior derivatives of $f$ and $g$. Stationarity for the restriction $f|_N$ at $x \in N$ means $\mathrm{d}(f|_N)_x = 0$. Equivalently, the kernel $\ker(\mathrm{d}f_x)$ contains $T_x N = \ker(\mathrm{d}g_x)$. In other words, $\mathrm{d}f_x$ and $\mathrm{d}g_x$ are proportional 1-forms. For this it is necessary and sufficient that the following system of $\tfrac{1}{2} m(m-1)$ equations holds:

$$\mathrm{d}f_x \wedge \mathrm{d}g_x = 0 \in \Lambda^2\left(T_x^{*} M\right)$$

where $\wedge$ denotes the exterior product. The stationary points $x$ are the solutions of the above system of equations plus the constraint $g(x) = 0$. Note that the $\tfrac{1}{2} m(m-1)$ equations are not independent, since the left-hand side of the equation belongs to the subvariety of $\Lambda^2\left(T_x^{*} M\right)$ consisting of decomposable elements.

In this formulation, it is not necessary to explicitly find the Lagrange multiplier, a number $\lambda$ such that $\mathrm{d}f_x = \lambda\, \mathrm{d}g_x$.

Multiple constraints

Let $M$ and $f$ be as in the above section regarding the case of a single constraint. Rather than the function $g$ described there, now consider a smooth function $G \colon M \to \mathbb{R}^p$, $p > 1$, with component functions $g_i \colon M \to \mathbb{R}$, for which $0 \in \mathbb{R}^p$ is a regular value. Let $N$ be the submanifold of $M$ defined by $G(x) = 0$.

$x$ is a stationary point of $f|_N$ if and only if $\ker(\mathrm{d}f_x)$ contains $\ker(\mathrm{d}G_x)$. For convenience let $L_x = \mathrm{d}f_x$ and $K_x = \mathrm{d}G_x$, where $\mathrm{d}G$ denotes the tangent map or Jacobian $TM \to T\mathbb{R}^p$. The subspace $\ker(K_x)$ has dimension smaller than that of $\ker(L_x)$, namely $\dim(\ker(L_x)) = m - 1$ and $\dim(\ker(K_x)) = m - p$. $\ker(K_x)$ belongs to $\ker(L_x)$ if and only if $L_x$ belongs to the image of $K_x^{*} \colon (\mathbb{R}^p)^{*} \to T_x^{*} M$. Computationally speaking, the condition is that $L_x$ belongs to the row space of the matrix of $K_x$, or equivalently the column space of the matrix of $K_x^{*}$ (the transpose). If $\omega_x \in \Lambda^p\left(T_x^{*} M\right)$ denotes the exterior product of the columns of the matrix of $K_x^{*}$, the stationary condition for $f|_N$ at $x$ becomes

$$L_x \wedge \omega_x = 0 \in \Lambda^{p+1}\left(T_x^{*} M\right)$$

Once again, in this formulation it is not necessary to explicitly find the Lagrange multipliers, the numbers $\lambda_1, \ldots, \lambda_p$ such that

$$\mathrm{d}f_x = \sum_{i=1}^{p} \lambda_i\, \mathrm{d}(g_i)_x$$

Interpretation of the Lagrange multipliers

In this section, we modify the constraint equations from the form $g_i(x) = 0$ to the form $g_i(x) = c_i$, where the $c_i$ are $m$ real constants that are considered to be additional arguments of the Lagrangian expression $\mathcal{L}$.

Often the Lagrange multipliers have an interpretation as some quantity of interest. For example, by parametrising the constraint's contour line, that is, if the Lagrangian expression is

$$\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) + \sum_{k=1}^{m} \lambda_k \left( c_k - g_k(x_1, \ldots, x_n) \right)$$

then

$$\frac{\partial \mathcal{L}}{\partial c_k} = \lambda_k$$

So, λk is the rate of change of the quantity being optimized as a function of the constraint parameter. As examples, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, F = −∇V, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In control theory this is formulated instead as costate equations.

Moreover, by the envelope theorem the optimal value of a Lagrange multiplier has an interpretation as the marginal effect of the corresponding constraint constant upon the optimal attainable value of the original objective function: if we denote values at the optimum with a star ($\star$), then it can be shown that

$$\frac{\mathrm{d}\, f\left( x_{1\star}(c), \ldots, x_{n\star}(c) \right)}{\mathrm{d} c_k} = \lambda_{\star k}$$

For example, in economics the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the change in the optimal value of the objective function (profit) due to the relaxation of a given constraint (e.g. through a change in income); in such a context $\lambda_{\star k}$ is the marginal cost of the constraint, and is referred to as the shadow price.[15]
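The envelope-theorem interpretation can be checked numerically. The sketch below is an illustration under assumptions (SciPy available; the problem, maximizing $x + y$ on the circle $x^2 + y^2 = c$, is a hypothetical choice): it compares a finite-difference estimate of $\mathrm{d}f_\star/\mathrm{d}c$ with the analytic value $1/\sqrt{2c}$, which equals the multiplier's magnitude.

```python
import numpy as np
from scipy.optimize import minimize

def max_f(c):
    """Maximize f(x, y) = x + y subject to x^2 + y^2 = c (SciPy minimizes, so negate f)."""
    cons = {"type": "eq", "fun": lambda v: v[0]**2 + v[1]**2 - c}
    res = minimize(lambda v: -(v[0] + v[1]), x0=[1.0, 0.0],
                   constraints=[cons], method="SLSQP",
                   options={"ftol": 1e-12})
    return -res.fun

c, dc = 1.0, 1e-3
# marginal value of relaxing the constraint constant c (central difference)
marginal = (max_f(c + dc) - max_f(c - dc)) / (2 * dc)
# analytically f_star(c) = sqrt(2 c), so d f_star / d c = 1 / sqrt(2 c)
```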

Sufficient conditions

Sufficient conditions for a constrained local maximum or minimum can be stated in terms of a sequence of principal minors (determinants of upper-left-justified sub-matrices) of the bordered Hessian matrix of second derivatives of the Lagrangian expression.[6][16]

Examples

Example 1

 
Illustration of the constrained optimization problem 1

Suppose we wish to maximize $f(x, y) = x + y$ subject to the constraint $x^2 + y^2 = 1$. The feasible set is the unit circle, and the level sets of f are diagonal lines (with slope −1), so we can see graphically that the maximum occurs at $\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2} \right)$ and that the minimum occurs at $\left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2} \right)$.

For the method of Lagrange multipliers, the constraint is

$$g(x, y) = x^2 + y^2 - 1$$

hence the Lagrangian function,

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = x + y + \lambda \left( x^2 + y^2 - 1 \right)$$

is a function that is equivalent to $f(x, y)$ when $g(x, y)$ is set to 0.

Now we can calculate the gradient:

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = \left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial y}, \frac{\partial \mathcal{L}}{\partial \lambda} \right) = \left( 1 + 2\lambda x,\ 1 + 2\lambda y,\ x^2 + y^2 - 1 \right)$$

and therefore:

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = 0 \quad \Leftrightarrow \quad \begin{cases} 1 + 2\lambda x = 0 \\ 1 + 2\lambda y = 0 \\ x^2 + y^2 - 1 = 0 \end{cases}$$

Notice that the last equation is the original constraint.

The first two equations yield

$$x = y = -\frac{1}{2\lambda}, \qquad \lambda \neq 0$$

By substituting into the last equation we have:

$$\frac{1}{4\lambda^2} + \frac{1}{4\lambda^2} - 1 = 0$$

so

$$\lambda = \pm \frac{1}{\sqrt{2}}$$

which implies that the stationary points of $\mathcal{L}$ are

$$\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}, -\tfrac{1}{\sqrt{2}} \right) \quad \text{and} \quad \left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}, \tfrac{1}{\sqrt{2}} \right)$$

Evaluating the objective function f at these points yields

$$f\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2} \right) = \sqrt{2}, \qquad f\left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2} \right) = -\sqrt{2}$$

Thus the constrained maximum is $\sqrt{2}$ and the constrained minimum is $-\sqrt{2}$.
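The computation above can be reproduced symbolically; this sketch assumes SymPy is available:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x + y                      # objective from Example 1
g = x**2 + y**2 - 1            # constraint: the unit circle
L = f + lam * g                # Lagrangian

# Stationary points: all three partial derivatives of L vanish
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

values = sorted(f.subs(s) for s in sols)
# values[0] is the constrained minimum -sqrt(2); values[-1] is the maximum sqrt(2)
```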

Example 2

 
Illustration of the constrained optimization problem 2

Now we modify the objective function of Example 1 so that we minimize $f(x, y) = (x + y)^2$ instead of $f(x, y) = x + y$, again along the circle $g(x, y) = x^2 + y^2 - 1 = 0$. Now the level sets of $f$ are still lines of slope −1, and the points on the circle tangent to these level sets are again $\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2} \right)$ and $\left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2} \right)$. These tangency points are maxima of $f$.

On the other hand, the minima occur on the level set for $f = 0$ (since by its construction $f$ cannot take negative values), at $\left( \tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2} \right)$ and $\left( -\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2} \right)$, where the level curves of $f$ are not tangent to the constraint. The condition that $\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = 0$ correctly identifies all four points as extrema; the minima are characterized by $\lambda = 0$ and the maxima by $\lambda = -2$.

Example 3

 
Illustration of constrained optimization problem 3.

This example deals with more strenuous calculations, but it is still a single constraint problem.

Suppose one wants to find the maximum values of

$$f(x, y) = x^2 y$$

with the condition that the $x$- and $y$-coordinates lie on the circle around the origin with radius $\sqrt{3}$. That is, subject to the constraint

$$g(x, y) = x^2 + y^2 - 3 = 0$$

As there is just a single constraint, there is a single multiplier, say $\lambda$.

The constraint $g(x, y)$ is identically zero on the circle of radius $\sqrt{3}$. Any multiple of $g(x, y)$ may be added to $f(x, y)$ leaving $f(x, y)$ unchanged in the region of interest (on the circle where our original constraint is satisfied).

Applying the ordinary Lagrange multiplier method yields

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = x^2 y + \lambda \left( x^2 + y^2 - 3 \right)$$

from which the gradient can be calculated:

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = \left( 2xy + 2\lambda x,\ x^2 + 2\lambda y,\ x^2 + y^2 - 3 \right)$$

And therefore:

$$\nabla_{x, y, \lambda}\, \mathcal{L}(x, y, \lambda) = 0 \quad \Leftrightarrow \quad \begin{cases} 2xy + 2\lambda x = 0 & \text{(i)} \\ x^2 + 2\lambda y = 0 & \text{(ii)} \\ x^2 + y^2 - 3 = 0 & \text{(iii)} \end{cases}$$

(iii) is just the original constraint. (i) implies $x = 0$ or $\lambda = -y$. If $x = 0$ then $y = \pm\sqrt{3}$ by (iii), and consequently $\lambda = 0$ from (ii). If $\lambda = -y$, substituting this into (ii) yields $x^2 = 2y^2$. Substituting this into (iii) and solving for $y$ gives $y = \pm 1$. Thus there are six critical points of $\mathcal{L}$:

$$(\sqrt{2}, 1, -1), \quad (-\sqrt{2}, 1, -1), \quad (\sqrt{2}, -1, 1), \quad (-\sqrt{2}, -1, 1), \quad (0, \sqrt{3}, 0), \quad (0, -\sqrt{3}, 0)$$

Evaluating the objective at these points, one finds that

$$f(\pm\sqrt{2}, 1) = 2, \qquad f(\pm\sqrt{2}, -1) = -2, \qquad f(0, \pm\sqrt{3}) = 0$$

Therefore, the objective function attains the global maximum (subject to the constraints) at $(\pm\sqrt{2}, 1)$ and the global minimum at $(\pm\sqrt{2}, -1)$. The point $(0, \sqrt{3})$ is a local minimum of $f$ and $(0, -\sqrt{3})$ is a local maximum of $f$, as may be determined by consideration of the Hessian matrix of $\mathcal{L}(x, y, 0)$.

Note that while $(0, \sqrt{3}, 0)$ is a critical point of $\mathcal{L}$, it is not a local extremum of $\mathcal{L}$. We have

$$\mathcal{L}\left( 0, \sqrt{3} + \varepsilon, \delta \right) = \delta \left( \left( \sqrt{3} + \varepsilon \right)^2 - 3 \right) = \delta\, \varepsilon \left( 2\sqrt{3} + \varepsilon \right)$$

Given any neighbourhood of $(0, \sqrt{3}, 0)$, one can choose a small positive $\varepsilon$ and a small $\delta$ of either sign to get $\mathcal{L}$ values both greater and less than $\mathcal{L}(0, \sqrt{3}, 0) = 0$. This can also be seen from the Hessian matrix of $\mathcal{L}$ evaluated at this point (or indeed at any of the critical points), which is an indefinite matrix. Each of the critical points of $\mathcal{L}$ is a saddle point of $\mathcal{L}$.[4]
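The six critical points and the objective values at them can be checked symbolically (this sketch assumes SymPy):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x**2 * y                   # objective from Example 3
g = x**2 + y**2 - 3            # constraint: circle of radius sqrt(3)
L = f + lam * g

# Solve (i), (ii), (iii): the three partial derivatives of L set to zero
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

vals = [f.subs(s) for s in sols]
# six critical points; f attains 2 (global max), -2 (global min) and 0 on them
```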

Example 4

Entropy

Suppose we wish to find the discrete probability distribution on the points $\{p_1, p_2, \ldots, p_n\}$ with maximal information entropy. This is the same as saying that we wish to find the least structured probability distribution on the points $\{p_1, p_2, \ldots, p_n\}$. In other words, we wish to maximize the Shannon entropy equation:

$$f(p_1, p_2, \ldots, p_n) = -\sum_{j=1}^{n} p_j \log_2 p_j$$

For this to be a probability distribution the sum of the probabilities $p_j$ at each point $x_j$ must equal 1, so our constraint is:

$$g(p_1, p_2, \ldots, p_n) = \sum_{j=1}^{n} p_j = 1$$

We use Lagrange multipliers to find the point of maximum entropy, $\vec{p}^{\,*}$, across all discrete probability distributions $\vec{p}$ on $\{x_1, x_2, \ldots, x_n\}$. We require that:

$$\left. \frac{\partial}{\partial \vec{p}} \left( f + \lambda (g - 1) \right) \right|_{\vec{p} = \vec{p}^{\,*}} = 0$$

which gives a system of n equations, $k = 1, \ldots, n$, such that:

$$\left. \frac{\partial}{\partial p_k} \left\{ -\left( \sum_{j=1}^{n} p_j \log_2 p_j \right) + \lambda \left( \sum_{j=1}^{n} p_j - 1 \right) \right\} \right|_{p_k = p_k^{*}} = 0$$

Carrying out the differentiation of these n equations, we get

$$-\left( \frac{1}{\ln 2} + \log_2 p_k^{*} \right) + \lambda = 0$$

This shows that all $p_k^{*}$ are equal (because they depend on λ only). By using the constraint

$$\sum_{j=1}^{n} p_j^{*} = 1$$

we find

$$p_k^{*} = \frac{1}{n}$$

Hence, the uniform distribution is the distribution with the greatest entropy, among distributions on n points.
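A numerical cross-check of this result (a sketch under assumptions: SciPy's SLSQP solver is used, and n = 5 is an arbitrary choice): minimizing the negative entropy under the normalization constraint recovers the uniform distribution.

```python
import numpy as np
from scipy.optimize import minimize

n = 5

def neg_entropy(p):
    # negative Shannon entropy (base 2); minimizing it maximizes entropy
    return np.sum(p * np.log2(p))

# equality constraint: the probabilities must sum to 1
cons = {"type": "eq", "fun": lambda p: np.sum(p) - 1.0}
bounds = [(1e-9, 1.0)] * n
p0 = np.full(n, 1.0 / n) + np.linspace(-0.05, 0.05, n)  # feasible, non-uniform start

res = minimize(neg_entropy, p0, bounds=bounds, constraints=[cons],
               method="SLSQP", options={"ftol": 1e-12})
# res.x is (numerically) the uniform distribution p_k = 1/n
```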

Example 5

Numerical optimization
 
Lagrange multipliers cause the critical points to occur at saddle points (Example 5).
 
The magnitude of the gradient can be used to force the critical points to occur at local minima (Example 5).

The critical points of Lagrangians occur at saddle points, rather than at local maxima (or minima).[4][17] Unfortunately, many numerical optimization techniques, such as hill climbing, gradient descent, some of the quasi-Newton methods, among others, are designed to find local maxima (or minima) and not saddle points. For this reason, one must either modify the formulation to ensure that it's a minimization problem (for example, by extremizing the square of the gradient of the Lagrangian as below), or else use an optimization technique that finds stationary points (such as Newton's method without an extremum seeking line search) and not necessarily extrema.

As a simple example, consider the problem of finding the value of $x$ that minimizes $f(x) = x^2$, constrained such that $x^2 = 1$. (This problem is somewhat untypical because there are only two values that satisfy this constraint, but it is useful for illustration purposes because the corresponding unconstrained function can be visualized in three dimensions.)

Using Lagrange multipliers, this problem can be converted into an unconstrained optimization problem:

$$\mathcal{L}(x, \lambda) = x^2 + \lambda \left( x^2 - 1 \right)$$

The two critical points occur at saddle points where x = 1 and x = −1.

In order to solve this problem with a numerical optimization technique, we must first transform this problem such that the critical points occur at local minima. This is done by computing the magnitude of the gradient of the unconstrained optimization problem.

First, we compute the partial derivative of the unconstrained problem with respect to each variable:

$$\frac{\partial \mathcal{L}}{\partial x} = 2x + 2\lambda x, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = x^2 - 1$$

If the target function is not easily differentiable, the differential with respect to each variable can be approximated as

$$\frac{\partial \mathcal{L}}{\partial x} \approx \frac{\mathcal{L}(x + \varepsilon, \lambda) - \mathcal{L}(x, \lambda)}{\varepsilon}, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} \approx \frac{\mathcal{L}(x, \lambda + \varepsilon) - \mathcal{L}(x, \lambda)}{\varepsilon}$$

where $\varepsilon$ is a small value.

Next, we compute the magnitude of the gradient, which is the square root of the sum of the squares of the partial derivatives:

$$h(x, \lambda) = \sqrt{\left( 2x + 2\lambda x \right)^2 + \left( x^2 - 1 \right)^2}$$

(Since magnitude is always non-negative, optimizing over the squared-magnitude is equivalent to optimizing over the magnitude. Thus, the "square root" may be omitted from these equations with no expected difference in the results of optimization.)

The critical points of $h$ occur at $x = 1$ and $x = -1$, just as in $\mathcal{L}$. Unlike the critical points in $\mathcal{L}$, however, the critical points in $h$ occur at local minima, so numerical optimization techniques can be used to find them.
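A minimal sketch of this idea in plain Python/NumPy: run gradient descent on the squared gradient magnitude $h^2$ (squaring avoids the square root, as noted above, without changing the minimizers). The starting point near $\lambda = -1$ is an assumption of the sketch: $h^2$ also has a ridge of spurious critical points at $x = 0$, and plain descent could stall there from a poorly chosen start.

```python
import numpy as np

def grad_L(x, lam):
    # partial derivatives of L(x, lam) = x^2 + lam*(x^2 - 1)
    return np.array([2*x + 2*lam*x, x**2 - 1.0])

def grad_h2(x, lam):
    # gradient of h^2 = |grad L|^2, via the chain rule
    gx, gl = grad_L(x, lam)
    return np.array([
        2*gx*(2 + 2*lam) + 2*gl*(2*x),   # d(h^2)/dx
        2*gx*(2*x),                      # d(h^2)/dlam
    ])

x, lam = 0.5, -0.9                       # start near lam = -1 (see note above)
step = 0.01
for _ in range(20000):                   # plain gradient descent on h^2
    dx, dl = grad_h2(x, lam)
    x, lam = x - step*dx, lam - step*dl
# descent settles at x = 1, lam = -1, which is a critical point of L
```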

Applications

Control theory

In optimal control theory, the Lagrange multipliers are interpreted as costate variables, and Lagrange multipliers are reformulated as the minimization of the Hamiltonian, in Pontryagin's minimum principle.

Nonlinear programming

The Lagrange multiplier method has several generalizations. In nonlinear programming there are several multiplier rules, e.g. the Carathéodory–John Multiplier Rule and the Convex Multiplier Rule, for inequality constraints.[18]

Power systems

Methods based on Lagrange multipliers have applications in power systems, e.g. in distributed-energy-resources (DER) placement and load shedding.[19]

See also

References

  1. ^ Hoffmann, Laurence D.; Bradley, Gerald L. (2004). Calculus for Business, Economics, and the Social and Life Sciences (8th ed.). pp. 575–588. ISBN 0-07-242432-X.
  2. ^ Beavis, Brian; Dobbs, Ian M. (1990). "Static Optimization". Optimization and Stability Theory for Economic Analysis. New York: Cambridge University Press. p. 40. ISBN 0-521-33605-8.
  3. ^ Protter, Murray H.; Morrey, Charles B., Jr. (1985). Intermediate Calculus (2nd ed.). New York, NY: Springer. p. 267. ISBN 0-387-96058-9.
  4. ^ a b c Walsh, G.R. (1975). "Saddle-point Property of Lagrangian Function". Methods of Optimization. New York, NY: John Wiley & Sons. pp. 39–44. ISBN 0-471-91922-5.
  5. ^ Kalman, Dan (2009). "Leveling with Lagrange: An alternate view of constrained optimization". Mathematics Magazine. 82 (3): 186–196. doi:10.1080/0025570X.2009.11953617. JSTOR 27765899. S2CID 121070192.
  6. ^ a b Silberberg, Eugene; Suen, Wing (2001). The Structure of Economics : A Mathematical Analysis (Third ed.). Boston: Irwin McGraw-Hill. pp. 134–141. ISBN 0-07-234352-4.
  7. ^ de la Fuente, Angel (2000). Mathematical Methods and Models for Economists. Cambridge: Cambridge University Press. p. 285. doi:10.1017/CBO9780511810756. ISBN 9780521585125.
  8. ^ Luenberger, David G. (1969). Optimization by Vector Space Methods. New York: John Wiley & Sons. pp. 188–189.
  9. ^ Bertsekas, Dimitri P. (1999). Nonlinear Programming (Second ed.). Cambridge, MA: Athena Scientific. ISBN 1-886529-00-0.
  10. ^ Vapnyarskii, I.B. (2001) [1994], "Lagrange multipliers", Encyclopedia of Mathematics, EMS Press.
  11. ^ Lasdon, Leon S. (2002) [1970]. Optimization Theory for Large Systems (reprint ed.). Mineola, New York, NY: Dover. ISBN 0-486-41999-1. MR 1888251.
  12. ^ Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude (1993). "Chapter XII: Abstract duality for practitioners". Convex analysis and minimization algorithms. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Vol. 306. Berlin, DE: Springer-Verlag. pp. 136–193 (and Bibliographical comments pp. 334–335). ISBN 3-540-56852-2. MR 1295240. Volume II: Advanced theory and bundle methods.
  13. ^ Lemaréchal, Claude (15–19 May 2000). "Lagrangian relaxation". In Jünger, Michael; Naddef, Denis (eds.). Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl. Spring School held in Schloß Dagstuhl, May 15–19, 2000. Lecture Notes in Computer Science. Vol. 2241. Berlin, DE: Springer-Verlag (published 2001). pp. 112–156. doi:10.1007/3-540-45586-8_4. ISBN 3-540-42877-1. MR 1900016. S2CID 9048698.
  14. ^ Lafontaine, Jacques (2015). An Introduction to Differential Manifolds. Springer. p. 70. ISBN 9783319207353.
  15. ^ Dixit, Avinash K. (1990). "Shadow Prices". Optimization in Economic Theory (2nd ed.). New York: Oxford University Press. pp. 40–54. ISBN 0-19-877210-6.
  16. ^ Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (Third ed.). McGraw-Hill. p. 386. ISBN 0-07-010813-7.
  17. ^ Heath, Michael T. (2005). Scientific Computing: An introductory survey. McGraw-Hill. p. 203. ISBN 978-0-07-124489-3.
  18. ^ Pourciau, Bruce H. (1980). "Modern multiplier rules". American Mathematical Monthly. 87 (6): 433–452. doi:10.2307/2320250. JSTOR 2320250.
  19. ^ Gautam, Mukesh; Bhusal, Narayan; Benidris, Mohammed (2020). A sensitivity-based approach to adaptive under-frequency load shedding. 2020 IEEE Texas Power and Energy Conference (TPEC). Institute of Electrical and Electronics Engineers. pp. 1–5. doi:10.1109/TPEC48276.2020.9042569.

Further reading

  • Beavis, Brian; Dobbs, Ian M. (1990). "Static Optimization". Optimization and Stability Theory for Economic Analysis. New York, NY: Cambridge University Press. pp. 32–72. ISBN 0-521-33605-8.
  • Bertsekas, Dimitri P. (1982). Constrained optimization and Lagrange multiplier methods. New York, NY: Academic Press. ISBN 0-12-093480-9.
  • Beveridge, Gordon S.G.; Schechter, Robert S. (1970). "Lagrangian multipliers". Optimization: Theory and Practice. New York, NY: McGraw-Hill. pp. 244–259. ISBN 0-07-005128-3.
  • Binger, Brian R.; Hoffman, Elizabeth (1998). "Constrained optimization". Microeconomics with Calculus (2nd ed.). Reading: Addison-Wesley. pp. 56–91. ISBN 0-321-01225-9.
  • Carter, Michael (2001). "Equality constraints". Foundations of Mathematical Economics. Cambridge, MA: MIT Press. pp. 516–549. ISBN 0-262-53192-5.
  • Hestenes, Magnus R. (1966). "Minima of functions subject to equality constraints". Calculus of Variations and Optimal Control Theory. New York, NY: Wiley. pp. 29–34.
  • Wylie, C. Ray; Barrett, Louis C. (1995). "The extrema of integrals under constraint". Advanced Engineering Mathematics (Sixth ed.). New York, NY: McGraw-Hill. pp. 1096–1103. ISBN 0-07-072206-4.

External links

Exposition
  • Steuard. "Conceptual introduction". slimy.com. — plus a brief discussion of Lagrange multipliers in the calculus of variations as used in physics.
  • Carpenter, Kenneth H. "Lagrange multipliers for quadratic forms with linear constraints" (PDF). Kansas State University.
Additional text and interactive applets
  • Resnik. "Simple explanation with an example of governments using taxes as Lagrange multipliers". umiacs.umd.edu. University of Maryland.
  • Klein, Dan. "Lagrange multipliers without permanent scarring: Explanation with focus on the intuition" (PDF). nlp.cs.berkeley.edu. University of California, Berkeley.
  • Sathyanarayana, Shashi. "Geometric representation of method of Lagrange multipliers". wolfram.com (Mathematica demonstration). Wolfram Research. Needs Internet Explorer / Firefox / Safari. — Provides compelling insight in 2 dimensions that at a minimizing point, the direction of steepest descent must be perpendicular to the tangent of the constraint curve at that point.
  • "Lagrange multipliers – two variables". MIT Open Courseware (ocw.mit.edu) (Applet). Massachusetts Institute of Technology.
  • "Lagrange multipliers". MIT Open Courseware (ocw.mit.edu) (video lecture). Mathematics 18-02: Multivariable calculus. Massachusetts Institute of Technology. Fall 2007.
  • Bertsekas. "Details on Lagrange multipliers" (PDF). athenasc.com (slides / course lecture). Non-Linear Programming. — Course slides accompanying text on nonlinear optimization
  • Wyatt, John (7 April 2004) [19 November 2002]. "Legrange multipliers, constrained optimization, and the maximum entropy principle" (PDF). www-mtl.mit.edu. Elec E & C S / Mech E 6.050 – Information, entropy, and computation. — Geometric idea behind Lagrange multipliers
  • "Using Lagrange multipliers in optimization". matlab.cheme.cmu.edu (MATLAB example). Pittsburgh, PA: Carnegie Mellon University. 24 December 2011.
lagrange, multiplier, mathematical, optimization, method, strategy, finding, local, maxima, minima, function, subject, equation, constraints, subject, condition, that, more, equations, have, satisfied, exactly, chosen, values, variables, named, after, mathemat. In mathematical optimization the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints i e subject to the condition that one or more equations have to be satisfied exactly by the chosen values of the variables 1 It is named after the mathematician Joseph Louis Lagrange Contents 1 Summary and rationale 2 Statement 3 Single constraint 4 Multiple constraints 5 Modern formulation via differentiable manifolds 5 1 Single constraint 5 2 Multiple constraints 6 Interpretation of the Lagrange multipliers 7 Sufficient conditions 8 Examples 8 1 Example 1 8 2 Example 2 8 3 Example 3 8 4 Example 4 8 5 Example 5 9 Applications 9 1 Control theory 9 2 Nonlinear programming 9 3 Power systems 10 See also 11 References 12 Further reading 13 External linksSummary and rationale EditThe basic idea is to convert a constrained problem into a form such that the derivative test of an unconstrained problem can still be applied The relationship between the gradient of the function and gradients of the constraints rather naturally leads to a reformulation of the original problem known as the Lagrangian function 2 In the general case the Lagrangian is defined as L x l f x l g x displaystyle mathcal L x lambda equiv f x langle lambda g x rangle nbsp for functions f x g x displaystyle f x g x nbsp l displaystyle lambda nbsp is called the Lagrange multiplier In the simple case this simplifies to L x l f x l g x displaystyle mathcal L x lambda equiv f x lambda cdot g x nbsp The method can be summarized as follows in order to find the maximum or minimum of a function f x displaystyle f x nbsp subjected to the equality constraint g x 0 displaystyle g x 0 nbsp find the 
stationary points of L displaystyle mathcal L nbsp considered as a function of x displaystyle x nbsp and the Lagrange multiplier l displaystyle lambda nbsp This means that all partial derivatives should be zero including the partial derivative with respect to l displaystyle lambda nbsp 3 L x 0 displaystyle frac partial mathcal L partial x 0 qquad nbsp and L l 0 displaystyle qquad frac partial mathcal L partial lambda 0 nbsp or equivalently f x x l g x x 0 displaystyle frac partial f x partial x lambda cdot frac partial g x partial x 0 qquad nbsp and g x 0 displaystyle qquad g x 0 nbsp The solution corresponding to the original constrained optimization is always a saddle point of the Lagrangian function 4 5 which can be identified among the stationary points from the definiteness of the bordered Hessian matrix 6 The great advantage of this method is that it allows the optimization to be solved without explicit parameterization in terms of the constraints As a result the method of Lagrange multipliers is widely used to solve challenging constrained optimization problems Further the method of Lagrange multipliers is generalized by the Karush Kuhn Tucker conditions which can also take into account inequality constraints of the form h x c displaystyle h mathbf x leq c nbsp for a given constant c displaystyle c nbsp Statement EditThe following is known as the Lagrange multiplier theorem 7 Let f R n R displaystyle f colon mathbb R n rightarrow mathbb R nbsp be the objective function g R n R c displaystyle g colon mathbb R n rightarrow mathbb R c nbsp be the constraints function both belonging to C 1 displaystyle C 1 nbsp that is having continuous first derivatives Let x displaystyle x star nbsp be an optimal solution to the following optimization problem such that for the matrix of partial derivatives D g x j k g j x k displaystyle Bigl operatorname D g x star Bigr j k left frac partial g j partial x k right nbsp rank D g x c lt n displaystyle operatorname rank 
$$\text{maximize } f(x) \quad \text{subject to } g(x) = 0.$$

Then there exists a unique Lagrange multiplier $\lambda_\star \in \mathbb{R}^c$ such that $\operatorname{D}f(x_\star) = \lambda_\star^{\mathsf{T}} \operatorname{D}g(x_\star)$.

The Lagrange multiplier theorem states that at any local maximum or minimum of the function evaluated under the equality constraints, if constraint qualification applies (explained below), then the gradient of the function (at that point) can be expressed as a linear combination of the gradients of the constraints (at that point), with the Lagrange multipliers acting as coefficients.[8] This is equivalent to saying that any direction perpendicular to all gradients of the constraints is also perpendicular to the gradient of the function. Or still, that the directional derivative of the function is 0 in every feasible direction.

Single constraint

Figure 1: The red curve shows the constraint g(x, y) = c. The blue curves are contours of f(x, y). The point where the red constraint tangentially touches a blue contour is the maximum of f(x, y) along the constraint, since d1 > d2.

For the case of only one constraint and only two choice variables (as exemplified in Figure 1), consider the optimization problem

$$\text{maximize } f(x, y) \quad \text{subject to } g(x, y) = 0.$$

Sometimes an additive constant is shown separately rather than being included in $g$, in which case the constraint is written $g(x, y) = c$, as in Figure 1. We assume that both $f$ and $g$ have continuous first partial derivatives. We introduce a new variable $\lambda$ called a Lagrange multiplier (or Lagrange undetermined multiplier) and study the Lagrange function (or Lagrangian, or Lagrangian expression) defined by

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y),$$

where the $\lambda$ term may be either added or subtracted. If $f(x_0, y_0)$ is a maximum of $f(x, y)$ for the original constrained problem and $\nabla g(x_0, y_0) \neq 0$, then there exists $\lambda_0$ such that $(x_0, y_0, \lambda_0)$ is a stationary point for the Lagrange function (stationary points are those points where the first partial derivatives of $\mathcal{L}$ are zero). The assumption $\nabla g \neq 0$ is called constraint qualification. However, not all stationary points yield a solution of the original problem, as the method of Lagrange multipliers yields only a necessary condition for optimality in constrained problems.[9][10][11][12][13] Sufficient conditions for a minimum or maximum also exist, but if a particular candidate solution satisfies the sufficient conditions, it is only guaranteed that that solution is the best one locally; that is, it is better than any permissible nearby points. The global optimum can be found by comparing the values of the original objective function at the points satisfying the necessary and locally sufficient conditions.

The method of Lagrange multipliers relies on the intuition that at a maximum, $f(x, y)$ cannot be increasing in the direction of any neighboring point that also has $g = 0$. If it were, we could walk along $g = 0$ to get higher, meaning that the starting point wasn't actually the maximum. Viewed in this way, it is an exact analogue to testing whether the derivative of an unconstrained function is 0: we are verifying that the directional derivative is 0 in any relevant (viable) direction.

We can visualize contours of $f$ given by $f(x, y) = d$ for various values of $d$, and the contour of $g$ given by $g(x, y) = c$. Suppose we walk along the contour line with $g = c$. We are interested in finding points where $f$ almost does not change as we walk, since these points might be maxima.
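This "walk along the constraint and watch $f$" picture can be checked numerically. The sketch below (pure Python; the particular $f$, $g$, and grid resolution are illustrative choices, anticipating Example 1 below) parameterizes the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$ and locates where $f(x, y) = x + y$ stops changing:

```python
import math

# Constraint g(x, y) = x^2 + y^2 - 1 = 0, parameterized as (cos t, sin t),
# with objective f(x, y) = x + y (the setup used in Example 1 below).
def f(x, y):
    return x + y

N = 100_000                      # grid resolution along the constraint
ts = [2 * math.pi * k / N for k in range(N)]
vals = [f(math.cos(t), math.sin(t)) for t in ts]

# Walking along the constraint, f stops changing where its value along the
# curve peaks; the discrete maximum approximates the constrained maximum.
best = max(range(N), key=lambda k: vals[k])
t_star = ts[best]

print(t_star, vals[best])        # t_star ~ pi/4, maximum value ~ sqrt(2)
```

The located angle $\pi/4$ corresponds to the point $(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2})$, where the contour of $f$ is tangent to the constraint.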
There are two ways this could happen:

1. We could touch a contour line of $f$, since by definition $f$ does not change as we walk along its contour lines. This would mean that the tangents to the contour lines of $f$ and $g$ are parallel here.
2. We have reached a "level" part of $f$, meaning that $f$ does not change in any direction.

To check the first possibility (we touch a contour line of $f$), notice that since the gradient of a function is perpendicular to the contour lines, the tangents to the contour lines of $f$ and $g$ are parallel if and only if the gradients of $f$ and $g$ are parallel. Thus we want points $(x, y)$ where $g(x, y) = c$ and

$$\nabla_{x,y} f = \lambda\, \nabla_{x,y} g$$

for some $\lambda$, where

$$\nabla_{x,y} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), \qquad \nabla_{x,y} g = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right)$$

are the respective gradients. The constant $\lambda$ is required because, although the two gradient vectors are parallel, their magnitudes are generally not equal. This constant is called the Lagrange multiplier. (In some conventions $\lambda$ is preceded by a minus sign.)

Notice that this method also solves the second possibility, that $f$ is level: if $f$ is level, then its gradient is zero, and setting $\lambda = 0$ is a solution regardless of $\nabla_{x,y} g$.

To incorporate these conditions into one equation, we introduce an auxiliary function

$$\mathcal{L}(x, y, \lambda) \equiv f(x, y) + \lambda \cdot g(x, y)$$

and solve

$$\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = 0.$$

Note that this amounts to solving three equations in three unknowns. This is the method of Lagrange multipliers. Note that $\nabla_{\lambda} \mathcal{L}(x, y, \lambda) = 0$ implies $g(x, y) = 0$, as the partial derivative of $\mathcal{L}$ with respect to $\lambda$ is $g(x, y)$.

To summarize,

$$\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = 0 \iff \begin{cases} \nabla_{x,y} f(x, y) = -\lambda\, \nabla_{x,y} g(x, y) \\ g(x, y) = 0 \end{cases}$$

The method generalizes readily to functions on $n$ variables:

$$\nabla_{x_1, \dots, x_n, \lambda}\, \mathcal{L}(x_1, \dots, x_n, \lambda) = 0,$$

which amounts to solving $n + 1$ equations in $n + 1$ unknowns.

The constrained extrema of $f$ are critical points of the Lagrangian $\mathcal{L}$, but they are not necessarily local extrema of $\mathcal{L}$ (see Example 2 below). One may reformulate the Lagrangian as a Hamiltonian, in which case the solutions are local minima for the Hamiltonian. This is done in optimal control theory, in the form of Pontryagin's minimum principle. The fact that solutions of the Lagrangian are not necessarily extrema also poses difficulties for numerical optimization. This can be addressed by computing the magnitude of the gradient, as the zeros of the magnitude are necessarily local minima, as illustrated in Example 5 (Numerical optimization).

Multiple constraints

Figure 2: A paraboloid constrained along two intersecting lines.
Figure 3: Contour map of Figure 2.

The method of Lagrange multipliers can be extended to solve problems with multiple constraints using a similar argument. Consider a paraboloid subject to two line constraints that intersect at a single point. As the only feasible solution, this point is obviously a constrained extremum. However, the level set of $f$ is clearly not parallel to either constraint at the intersection point (see Figure 3); instead, it is a linear combination of the two constraints' gradients. In the case of multiple constraints, that will be what we seek in general: the method of Lagrange seeks points not at which the gradient of $f$ is necessarily a multiple of any single constraint's gradient, but at which it is a linear combination of all the constraints' gradients.
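A concrete (hypothetical) instance of the situation in Figures 2–3: take the paraboloid $f(x, y) = x^2 + y^2$ with two line constraints $g_1(x, y) = x - 1 = 0$ and $g_2(x, y) = y - 1 = 0$ meeting at $(1, 1)$. The specific functions are illustrative choices, as the figures give no formulas. Because the constraints are linear, the stationarity system $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$ together with $g_1 = g_2 = 0$ can be solved by hand:

```python
# Hypothetical instance of Figure 2: a paraboloid with two line constraints.
# f(x, y) = x^2 + y^2;  g1 = x - 1 = 0;  g2 = y - 1 = 0, intersecting at (1, 1).
# Stationarity: grad f = lam1 * grad g1 + lam2 * grad g2, plus the constraints:
#   2x = lam1,  2y = lam2,  x = 1,  y = 1.
x, y = 1.0, 1.0                  # forced by the two constraints
lam1, lam2 = 2 * x, 2 * y        # read off from the first two equations

grad_f = (2 * x, 2 * y)
grad_g1, grad_g2 = (1.0, 0.0), (0.0, 1.0)

# grad f is NOT parallel to either single constraint gradient ...
cross = lambda u, v: u[0] * v[1] - u[1] * v[0]
not_parallel = cross(grad_f, grad_g1) != 0 and cross(grad_f, grad_g2) != 0

# ... but it IS the linear combination lam1 * grad g1 + lam2 * grad g2.
combo = (lam1 * grad_g1[0] + lam2 * grad_g2[0],
         lam1 * grad_g1[1] + lam2 * grad_g2[1])
print(lam1, lam2, not_parallel, combo == grad_f)
```

The check confirms the geometric claim: at the intersection point, $\nabla f$ lies in the span of the two constraint gradients without being a multiple of either one.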
Concretely, suppose we have $M$ constraints and are walking along the set of points satisfying $g_i(\mathbf{x}) = 0$, $i = 1, \dots, M$. Every point $\mathbf{x}$ on the contour of a given constraint function $g_i$ has a space of allowable directions: the space of vectors perpendicular to $\nabla g_i(\mathbf{x})$. The set of directions that are allowed by all constraints is thus the space of directions perpendicular to all of the constraints' gradients. Denote this space of allowable moves by $A$ and denote the span of the constraints' gradients by $S$. Then $A = S^{\perp}$, the space of vectors perpendicular to every element of $S$.

We are still interested in finding points where $f$ does not change as we walk, since these points might be constrained extrema. We therefore seek $\mathbf{x}$ such that any allowable direction of movement away from $\mathbf{x}$ is perpendicular to $\nabla f(\mathbf{x})$ (otherwise we could increase $f$ by moving along that allowable direction). In other words, $\nabla f(\mathbf{x}) \in A^{\perp} = S$. Thus there are scalars $\lambda_1, \lambda_2, \dots, \lambda_M$ such that

$$\nabla f(\mathbf{x}) = \sum_{k=1}^{M} \lambda_k\, \nabla g_k(\mathbf{x}) \quad \iff \quad \nabla f(\mathbf{x}) - \sum_{k=1}^{M} \lambda_k\, \nabla g_k(\mathbf{x}) = 0.$$

These scalars are the Lagrange multipliers. We now have $M$ of them, one for every constraint.

As before, we introduce an auxiliary function

$$\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = f(x_1, \ldots, x_n) - \sum_{k=1}^{M} \lambda_k\, g_k(x_1, \ldots, x_n)$$

and solve

$$\nabla_{x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M}\, \mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = 0 \iff \begin{cases} \nabla f(\mathbf{x}) - \displaystyle\sum_{k=1}^{M} \lambda_k\, \nabla g_k(\mathbf{x}) = 0 \\ g_1(\mathbf{x}) = \cdots = g_M(\mathbf{x}) = 0, \end{cases}$$

which amounts to solving $n + M$ equations in $n + M$ unknowns.

The constraint qualification assumption when there are multiple constraints is that the constraint gradients at the relevant point are linearly independent.

Modern formulation via differentiable manifolds

The problem of finding the local maxima and minima subject to constraints can be generalized to finding local maxima and minima on a differentiable manifold $M$.[14] In what follows, it is not necessary that $M$ be a Euclidean space, or even a Riemannian manifold. All appearances of the gradient $\nabla$ (which depends on a choice of Riemannian metric) can be replaced with the exterior derivative $\operatorname{d}$.

Single constraint

Let $M$ be a smooth manifold of dimension $m$. Suppose that we wish to find the stationary points $x$ of a smooth function $f \colon M \to \mathbb{R}$ when restricted to the submanifold $N$ defined by $g(x) = 0$, where $g \colon M \to \mathbb{R}$ is a smooth function for which 0 is a regular value.

Let $\operatorname{d}f$ and $\operatorname{d}g$ be the exterior derivatives of $f$ and $g$. Stationarity for the restriction $f|_N$ at $x \in N$ means $\operatorname{d}(f|_N)_x = 0$. Equivalently, the kernel $\ker(\operatorname{d}f_x)$ contains $T_x N = \ker(\operatorname{d}g_x)$.
In other words, $\operatorname{d}f_x$ and $\operatorname{d}g_x$ are proportional 1-forms. For this it is necessary and sufficient that the following system of $\tfrac{1}{2} m (m-1)$ equations holds:

$$\operatorname{d}f_x \wedge \operatorname{d}g_x = 0 \in \Lambda^2(T^{\ast}_x M),$$

where $\wedge$ denotes the exterior product. The stationary points $x$ are the solutions of the above system of equations, plus the constraint $g(x) = 0$. Note that the $\tfrac{1}{2} m (m-1)$ equations are not independent, since the left-hand side of the equation belongs to the subvariety of $\Lambda^2(T^{\ast}_x M)$ consisting of decomposable elements.

In this formulation, it is not necessary to explicitly find the Lagrange multiplier, a number $\lambda$ such that $\operatorname{d}f_x = \lambda \cdot \operatorname{d}g_x$.

Multiple constraints

Let $M$ and $f$ be as in the above section regarding the case of a single constraint. Rather than the function $g$ described there, now consider a smooth function $G \colon M \to \mathbb{R}^p$ ($p > 1$), with component functions $g_i \colon M \to \mathbb{R}$, for which $0 \in \mathbb{R}^p$ is a regular value. Let $N$ be the submanifold of $M$ defined by $G(x) = 0$.

$x$ is a stationary point of $f|_N$ if and only if $\ker(\operatorname{d}f_x)$ contains $\ker(\operatorname{d}G_x)$. For convenience let $L_x = \operatorname{d}f_x$ and $K_x = \operatorname{d}G_x$, where $\operatorname{d}G$ denotes the tangent map (or Jacobian) $TM \to T\mathbb{R}^p$. The subspace $\ker(K_x)$ has dimension smaller than that of $\ker(L_x)$, namely $\dim(\ker(L_x)) = n - 1$ and $\dim(\ker(K_x)) = n - p$. $\ker(K_x)$ belongs to $\ker(L_x)$ if and only if $L_x \in T^{\ast}_x M$ belongs to the image of $K^{\ast}_x \colon \mathbb{R}^{\ast p} \to T^{\ast}_x M$. Computationally speaking, the condition is that $L_x$ belongs to the row space of the matrix of $K_x$, or equivalently the column space of the matrix of $K^{\ast}_x$ (the transpose). If $\omega_x \in \Lambda^p(T^{\ast}_x M)$ denotes the exterior product of the columns of the matrix of $K^{\ast}_x$, the stationary condition for $f|_N$ at $x$ becomes

$$L_x \wedge \omega_x = 0 \in \Lambda^{p+1}(T^{\ast}_x M).$$

Once again, in this formulation it is not necessary to explicitly find the Lagrange multipliers, the numbers $\lambda_1, \ldots, \lambda_p$ such that

$$\operatorname{d}f_x = \sum_{i=1}^{p} \lambda_i\, \operatorname{d}g_{i,x}.$$

Interpretation of the Lagrange multipliers

In this section, we modify the constraint equations from the form $g_i(\mathbf{x}) = 0$ to the form $g_i(\mathbf{x}) = c_i$, where the $c_i$ are $m$ real constants that are considered to be additional arguments of the Lagrangian expression $\mathcal{L}$.

Often the Lagrange multipliers have an interpretation as some quantity of interest. For example, by parametrising the constraint's contour line, that is, if the Lagrangian expression is

$$\mathcal{L}(x_1, x_2, \ldots; \lambda_1, \lambda_2, \ldots; c_1, c_2, \ldots) = f(x_1, x_2, \ldots) + \lambda_1 (c_1 - g_1(x_1, x_2, \ldots)) + \lambda_2 (c_2 - g_2(x_1, x_2, \ldots)) + \cdots,$$
then

$$\frac{\partial \mathcal{L}}{\partial c_k} = \lambda_k.$$

So, $\lambda_k$ is the rate of change of the quantity being optimized as a function of the constraint parameter.

As examples, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, $F = -\nabla V$, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In control theory this is formulated instead as costate equations.

Moreover, by the envelope theorem the optimal value of a Lagrange multiplier has an interpretation as the marginal effect of the corresponding constraint constant upon the optimal attainable value of the original objective function: if we denote values at the optimum with a star ($\star$), then it can be shown that

$$\frac{\operatorname{d} f\left(x_{1\star}(c_1, c_2, \ldots),\, x_{2\star}(c_1, c_2, \ldots),\, \ldots\right)}{\operatorname{d} c_k} = \lambda_{\star k}.$$

For example, in economics the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the change in the optimal value of the objective function (profit) due to the relaxation of a given constraint (e.g. through a change in income); in such a context $\lambda_{\star k}$ is the marginal cost of the constraint, and is referred to as the shadow price.[15]

Sufficient conditions

Main article: Bordered Hessian

Sufficient conditions for a constrained local maximum or minimum can be stated in terms of a sequence of principal minors (determinants of upper-left-justified sub-matrices) of the bordered Hessian matrix of second derivatives of the Lagrangian expression.[6][16]
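For two variables and one constraint, this reduces to checking the sign of a single 3×3 determinant. A minimal sketch using the problem of Example 1 below (maximize $x + y$ on the unit circle, with $\mathcal{L} = x + y + \lambda(x^2 + y^2 - 1)$); the sign convention assumed here is the usual one for the $n = 2$, single-constraint case (determinant positive at a constrained maximum, negative at a constrained minimum):

```python
import math

# Bordered Hessian for L(x, y, lam) = x + y + lam * (x^2 + y^2 - 1):
#   [ 0     g_x   g_y  ]       g_x = 2x, g_y = 2y,
#   [ g_x   L_xx  L_xy ]  with L_xx = L_yy = 2*lam, L_xy = 0.
#   [ g_y   L_yx  L_yy ]
def bordered_hessian_det(x, y, lam):
    m = [[0.0,   2 * x,   2 * y],
         [2 * x, 2 * lam, 0.0],
         [2 * y, 0.0,     2 * lam]]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

s = math.sqrt(2) / 2
det_max = bordered_hessian_det(s, s, -1 / math.sqrt(2))    # candidate maximum
det_min = bordered_hessian_det(-s, -s, 1 / math.sqrt(2))   # candidate minimum
print(det_max, det_min)   # positive at the maximum, negative at the minimum
```

With more variables or constraints one examines a sequence of such principal minors rather than a single determinant, with alternating sign requirements for a maximum.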
Examples

Example 1

Illustration of constrained optimization problem 1.

Suppose we wish to maximize $f(x, y) = x + y$ subject to the constraint $x^2 + y^2 = 1$. The feasible set is the unit circle, and the level sets of $f$ are diagonal lines (with slope $-1$), so we can see graphically that the maximum occurs at $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and that the minimum occurs at $\left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$.

For the method of Lagrange multipliers, the constraint is $g(x, y) = x^2 + y^2 - 1 = 0$, hence the Lagrangian function

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = x + y + \lambda (x^2 + y^2 - 1)$$

is a function that is equivalent to $f(x, y)$ when $g(x, y)$ is set to 0.

Now we can calculate the gradient:

$$\nabla_{x,y,\lambda}\, \mathcal{L}(x, y, \lambda) = \left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial y}, \frac{\partial \mathcal{L}}{\partial \lambda} \right) = \left( 1 + 2\lambda x,\ 1 + 2\lambda y,\ x^2 + y^2 - 1 \right),$$

and therefore

$$\nabla_{x,y,\lambda}\, \mathcal{L}(x, y, \lambda) = 0 \quad \iff \quad \begin{cases} 1 + 2\lambda x = 0 \\ 1 + 2\lambda y = 0 \\ x^2 + y^2 - 1 = 0. \end{cases}$$

Notice that the last equation is the original constraint. The first two equations yield

$$x = y = -\frac{1}{2\lambda}, \qquad \lambda \neq 0.$$

By substituting into the last equation we have

$$\frac{1}{4\lambda^2} + \frac{1}{4\lambda^2} - 1 = 0,$$

so

$$\lambda = \pm \frac{1}{\sqrt{2}},$$

which implies that the stationary points of $\mathcal{L}$ are

$$\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}, -\tfrac{1}{\sqrt{2}} \right), \qquad \left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}, \tfrac{1}{\sqrt{2}} \right).$$

Evaluating the objective function $f$ at these points yields

$$f\left( \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2} \right) = \sqrt{2}, \qquad f\left( -\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2} \right) = -\sqrt{2}.$$

Thus the constrained maximum is $\sqrt{2}$ and the constrained minimum is $-\sqrt{2}$.

Example 2

Illustration of constrained optimization problem 2.

Now we modify the objective function of Example 1 so that we minimize $f(x, y) = (x + y)^2$ instead of $f(x, y) = x + y$, again along the circle $g(x, y) = x^2 + y^2 - 1 = 0$. Now the level sets of $f$ are still lines of slope $-1$, and the points on the circle tangent to these level sets are again $(\sqrt{2}/2, \sqrt{2}/2)$ and $(-\sqrt{2}/2, -\sqrt{2}/2)$. These tangency points are maxima of $f$.

On the other hand, the minima occur on the level set for $f = 0$ (since by its construction $f$ cannot take negative values), at $(\sqrt{2}/2, -\sqrt{2}/2)$ and $(-\sqrt{2}/2, \sqrt{2}/2)$, where the level curves of $f$ are not tangent to the constraint. The condition that $\nabla_{x,y,\lambda} \left( f(x, y) + \lambda \cdot g(x, y) \right) = 0$ correctly identifies all four points as extrema; the minima are characterized by $\lambda = 0$ and the maxima by $\lambda = -2$.

Example 3

Illustration of constrained optimization problem 3.

This example deals with more strenuous calculations, but it is still a single constraint problem. Suppose one wants to find the maximum values of

$$f(x, y) = x^2 y$$

with the condition that the $x$- and $y$-coordinates lie on the circle around the origin with radius $\sqrt{3}$, that is, subject to the constraint

$$g(x, y) = x^2 + y^2 - 3 = 0.$$
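Before applying the multiplier method, the answer can be estimated by brute force via the explicit parameterization $x = \sqrt{3}\cos\theta$, $y = \sqrt{3}\sin\theta$ that the method lets us avoid; a minimal sketch (the grid resolution is an arbitrary choice):

```python
import math

r = math.sqrt(3)
def f(x, y):
    return x * x * y

# Scan the constraint circle x^2 + y^2 = 3 directly.
N = 200_000
best_t = max((2 * math.pi * k / N for k in range(N)),
             key=lambda t: f(r * math.cos(t), r * math.sin(t)))
x_best, y_best = r * math.cos(best_t), r * math.sin(best_t)
f_best = f(x_best, y_best)
print(x_best, y_best, f_best)   # near (+-sqrt(2), 1) with f = 2
```

The scan agrees with the multiplier computation that follows: the constrained maximum value is 2, attained at $(\pm\sqrt{2}, 1)$.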
As there is just a single constraint, there is a single multiplier, say $\lambda$.

The constraint $g(x, y)$ is identically zero on the circle of radius $\sqrt{3}$, so any multiple of $g(x, y)$ may be added to $f(x, y)$ leaving $f(x, y)$ unchanged in the region of interest (on the circle, where our original constraint is satisfied). Applying the ordinary Lagrange multiplier method yields

$$\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = x^2 y + \lambda (x^2 + y^2 - 3),$$

from which the gradient can be calculated:

$$\nabla_{x,y,\lambda}\, \mathcal{L}(x, y, \lambda) = \left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial y}, \frac{\partial \mathcal{L}}{\partial \lambda} \right) = \left( 2xy + 2\lambda x,\ x^2 + 2\lambda y,\ x^2 + y^2 - 3 \right).$$

And therefore:

$$\nabla_{x,y,\lambda}\, \mathcal{L}(x, y, \lambda) = 0 \quad \iff \quad \begin{cases} 2xy + 2\lambda x = 0 \\ x^2 + 2\lambda y = 0 \\ x^2 + y^2 - 3 = 0 \end{cases} \quad \iff \quad \begin{cases} x(y + \lambda) = 0 & \text{(i)} \\ x^2 = -2\lambda y & \text{(ii)} \\ x^2 + y^2 = 3 & \text{(iii)} \end{cases}$$

(iii) is just the original constraint. (i) implies $x = 0$ or $\lambda = -y$. If $x = 0$, then $y = \pm\sqrt{3}$ by (iii), and consequently $\lambda = 0$ from (ii). If $\lambda = -y$, substituting this into (ii) yields $x^2 = 2y^2$. Substituting this into (iii) and solving for $y$ gives $y = \pm 1$. Thus there are six critical points of $\mathcal{L}$:

$$(\sqrt{2}, 1, -1), \quad (-\sqrt{2}, 1, -1), \quad (\sqrt{2}, -1, 1), \quad (-\sqrt{2}, -1, 1), \quad (0, \sqrt{3}, 0), \quad (0, -\sqrt{3}, 0).$$

Evaluating the objective at these points, one finds that

$$f(\pm\sqrt{2}, 1) = 2, \qquad f(\pm\sqrt{2}, -1) = -2, \qquad f(0, \pm\sqrt{3}) = 0.$$

Therefore, the objective function attains the global maximum (subject to the constraints) at $(\pm\sqrt{2}, 1)$ and the global minimum at $(\pm\sqrt{2}, -1)$. The point $(0, \sqrt{3})$ is a local minimum of $f$ and $(0, -\sqrt{3})$ is a local maximum of $f$, as may be determined by consideration of the Hessian matrix of $\mathcal{L}(x, y, 0)$.

Note that while $(\sqrt{2}, 1, -1)$ is a critical point of $\mathcal{L}$, it is not a local extremum of $\mathcal{L}$. We have

$$\mathcal{L}\left(\sqrt{2} + \varepsilon,\, 1,\, -1 + \delta\right) = 2 + \delta\left(\varepsilon^2 + \left(2\sqrt{2}\right)\varepsilon\right).$$

Given any neighbourhood of $(\sqrt{2}, 1, -1)$, one can choose a small positive $\varepsilon$ and a small $\delta$ of either sign to get $\mathcal{L}$ values both greater and less than 2. This can also be seen from the Hessian matrix of $\mathcal{L}$ evaluated at this point (or indeed at any of the critical points), which is an indefinite matrix. Each of the critical points of $\mathcal{L}$ is a saddle point of $\mathcal{L}$.[4]

Example 4: Entropy

Suppose we wish to find the discrete probability distribution on the points $\{p_1, p_2, \ldots, p_n\}$ with maximal information entropy. This is the same as saying that we wish to find the least structured probability distribution on the points $\{p_1, p_2, \ldots, p_n\}$.
In other words, we wish to maximize the Shannon entropy equation:

$$f(p_1, p_2, \ldots, p_n) = -\sum_{j=1}^{n} p_j \log_2 p_j.$$

For this to be a probability distribution, the sum of the probabilities $p_i$ at each point $x_i$ must equal 1, so our constraint is

$$g(p_1, p_2, \ldots, p_n) = \sum_{j=1}^{n} p_j = 1.$$

We use Lagrange multipliers to find the point of maximum entropy, $\vec{p}^{\,\star}$, across all discrete probability distributions $\vec{p}$ on $\{x_1, x_2, \ldots, x_n\}$. We require that

$$\left. \frac{\partial}{\partial \vec{p}} \left( f + \lambda (g - 1) \right) \right|_{\vec{p} = \vec{p}^{\,\star}} = 0,$$

which gives a system of $n$ equations, $k = 1, \ldots, n$, such that

$$\left. \frac{\partial}{\partial p_k} \left\{ -\left( \sum_{j=1}^{n} p_j \log_2 p_j \right) + \lambda \left( \sum_{j=1}^{n} p_j - 1 \right) \right\} \right|_{p_k = p_k^{\star}} = 0.$$

Carrying out the differentiation of these $n$ equations, we get

$$-\left( \frac{1}{\ln 2} + \log_2 p_k^{\star} \right) + \lambda = 0.$$

This shows that all $p_k^{\star}$ are equal (because they depend on $\lambda$ only). By using the constraint $\sum_j p_j = 1$, we find

$$p_k^{\star} = \frac{1}{n}.$$

Hence, the uniform distribution is the distribution with the greatest entropy among distributions on $n$ points.

Example 5: Numerical optimization

Figure: Lagrange multipliers cause the critical points to occur at saddle points (Example 5).
Figure: The magnitude of the gradient can be used to force the critical points to occur at local minima (Example 5).

The critical points of Lagrangians occur at saddle points, rather than at local maxima (or minima).[4][17] Unfortunately, many numerical optimization techniques, such as hill climbing, gradient descent, and some of the quasi-Newton methods, among others, are designed to find local maxima (or minima) and not saddle points.
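This saddle behaviour is easy to exhibit numerically. Using the Lagrangian of Example 1, $\mathcal{L}(x, y, \lambda) = x + y + \lambda(x^2 + y^2 - 1)$, whose stationary point $(\tfrac{\sqrt 2}{2}, \tfrac{\sqrt 2}{2}, -\tfrac{1}{\sqrt 2})$ has $\mathcal{L} = \sqrt{2}$, small perturbations in different directions give values both below and above $\sqrt{2}$ (the chosen directions are illustrative):

```python
import math

def L(x, y, lam):
    return x + y + lam * (x * x + y * y - 1)

s = math.sqrt(2) / 2
lam0 = -1 / math.sqrt(2)
L0 = L(s, s, lam0)               # = sqrt(2) at the stationary point

h = 1e-3
below = L(s + h, s, lam0)        # perturbing only x: L decreases (~ -h^2/sqrt(2))
above = L(s + h, s, lam0 + h)    # perturbing x and lambda together: L increases
print(below < L0 < above)        # a saddle point, not an extremum
```

No direction-by-direction search is needed: two hand-picked directions already show the stationary point is neither a local maximum nor a local minimum of $\mathcal{L}$.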
For this reason, one must either modify the formulation to ensure that it's a minimization problem (for example, by extremizing the square of the gradient of the Lagrangian, as below), or else use an optimization technique that finds stationary points (such as Newton's method without an extremum-seeking line search) and not necessarily extrema.

As a simple example, consider the problem of finding the value of $x$ that minimizes $f(x) = x^2$, constrained such that $x^2 = 1$. (This problem is somewhat untypical because there are only two values that satisfy this constraint, but it is useful for illustration purposes because the corresponding unconstrained function can be visualized in three dimensions.)

Using Lagrange multipliers, this problem can be converted into an unconstrained optimization problem:

$$\mathcal{L}(x, \lambda) = x^2 + \lambda (x^2 - 1).$$

The two critical points occur at saddle points where $x = 1$ and $x = -1$.

In order to solve this problem with a numerical optimization technique, we must first transform this problem such that the critical points occur at local minima. This is done by computing the magnitude of the gradient of the unconstrained optimization problem.

First, we compute the partial derivative of the unconstrained problem with respect to each variable:

$$\frac{\partial \mathcal{L}}{\partial x} = 2x + 2x\lambda, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = x^2 - 1.$$

If the target function is not easily differentiable, the differential with respect to each variable can be approximated as

$$\frac{\partial \mathcal{L}}{\partial x} \approx \frac{\mathcal{L}(x + \varepsilon, \lambda) - \mathcal{L}(x, \lambda)}{\varepsilon}, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} \approx \frac{\mathcal{L}(x, \lambda + \varepsilon) - \mathcal{L}(x, \lambda)}{\varepsilon},$$

where $\varepsilon$ is a small value.
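Putting these pieces together, the following sketch uses the finite-difference approximations above and minimizes the squared gradient magnitude (introduced next in the text) with plain gradient descent; the starting point, step size, and iteration count are arbitrary choices:

```python
def L(x, lam):
    return x * x + lam * (x * x - 1)

eps = 1e-5  # finite-difference step, as in the approximation above

def grad_L(x, lam):
    dLdx = (L(x + eps, lam) - L(x, lam)) / eps
    dLdlam = (L(x, lam + eps) - L(x, lam)) / eps
    return dLdx, dLdlam

def h2(x, lam):
    # Squared gradient magnitude; zero exactly at the Lagrangian's critical points.
    dLdx, dLdlam = grad_L(x, lam)
    return dLdx ** 2 + dLdlam ** 2

# Plain gradient descent on h2, itself differentiated by finite differences.
x, lam = 1.5, 0.0
step = 0.005
for _ in range(20_000):
    gx = (h2(x + eps, lam) - h2(x, lam)) / eps
    gl = (h2(x, lam + eps) - h2(x, lam)) / eps
    x, lam = x - step * gx, lam - step * gl

print(x, lam, h2(x, lam))   # converges near (1, -1), where h2 ~ 0
```

A descent method that would walk straight past the saddle of $\mathcal{L}$ settles instead into the local minimum of the squared gradient magnitude at the constrained solution $x = 1$ (with $\lambda = -1$).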
Next, we compute the magnitude of the gradient, which is the square root of the sum of the squares of the partial derivatives:

$$h(x, \lambda) = \sqrt{(2x + 2x\lambda)^2 + (x^2 - 1)^2} \approx \sqrt{\left( \frac{\mathcal{L}(x + \varepsilon, \lambda) - \mathcal{L}(x, \lambda)}{\varepsilon} \right)^2 + \left( \frac{\mathcal{L}(x, \lambda + \varepsilon) - \mathcal{L}(x, \lambda)}{\varepsilon} \right)^2}.$$

Since magnitude is always non-negative, optimizing over the squared magnitude is equivalent to optimizing over the magnitude. Thus, the square root may be omitted from these equations with no expected difference in the results of optimization.

The critical points of $h$ occur at $x = 1$ and $x = -1$, just as in $\mathcal{L}$. Unlike the critical points in $\mathcal{L}$, however, the critical points in $h$ occur at local minima, so numerical optimization techniques can be used to find them.

Applications

Control theory

In optimal control theory, the Lagrange multipliers are interpreted as costate variables, and Lagrange multipliers are reformulated as the minimization of the Hamiltonian, in Pontryagin's minimum principle.

Nonlinear programming

The Lagrange multiplier method has several generalizations. In nonlinear programming there are several multiplier rules, e.g. the Carathéodory–John Multiplier Rule and the Convex Multiplier Rule, for inequality constraints.[18]

Power systems

Methods based on Lagrange multipliers have applications in power systems, e.g. in distributed-energy-resources (DER) placement and load shedding.[19]

See also

Adjustment of observations
Duality
Gittins index
Karush–Kuhn–Tucker conditions: generalization of the method of Lagrange multipliers
Lagrange multipliers on Banach spaces: another generalization of the method of Lagrange multipliers
Lagrange multiplier test in maximum likelihood estimation
Lagrangian relaxation
maximum likelihood estimation
- Lagrangian relaxation

References

1. Hoffmann, Laurence D.; Bradley, Gerald L. (2004). Calculus for Business, Economics, and the Social and Life Sciences (8th ed.). pp. 575–588. ISBN 0-07-242432-X.
2. Beavis, Brian; Dobbs, Ian M. (1990). Static Optimization. Optimization and Stability Theory for Economic Analysis. New York: Cambridge University Press. p. 40. ISBN 0-521-33605-8.
3. Protter, Murray H.; Morrey, Charles B. Jr. (1985). Intermediate Calculus (2nd ed.). New York, NY: Springer. p. 267. ISBN 0-387-96058-9.
4. Walsh, G. R. (1975). "Saddle-point Property of Lagrangian Function". Methods of Optimization. New York, NY: John Wiley & Sons. pp. 39–44. ISBN 0-471-91922-5.
5. Kalman, Dan (2009). "Leveling with Lagrange: An alternate view of constrained optimization". Mathematics Magazine. 82 (3): 186–196. doi:10.1080/0025570X.2009.11953617. JSTOR 27765899. S2CID 121070192.
6. Silberberg, Eugene; Suen, Wing (2001). The Structure of Economics: A Mathematical Analysis (3rd ed.). Boston: Irwin McGraw-Hill. pp. 134–141. ISBN 0-07-234352-4.
7. de la Fuente, Angel (2000). Mathematical Methods and Models for Economists. Cambridge: Cambridge University Press. p. 285. doi:10.1017/CBO9780511810756. ISBN 978-0-521-58512-5.
8. Luenberger, David G. (1969). Optimization by Vector Space Methods. New York: John Wiley & Sons. pp. 188–189.
9. Bertsekas, Dimitri P. (1999). Nonlinear Programming (2nd ed.). Cambridge, MA: Athena Scientific. ISBN 1-886529-00-0.
10. Vapnyarskii, I. B. (2001) [1994]. "Lagrange multipliers". Encyclopedia of Mathematics. EMS Press.
11. Lasdon, Leon S. (2002) [1970]. Optimization Theory for Large Systems (reprint ed.). Mineola, NY: Dover. ISBN 0-486-41999-1. MR 1888251.
12. Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude (1993). "Chapter XII: Abstract duality for practitioners". Convex Analysis and Minimization Algorithms, Volume II: Advanced Theory and Bundle Methods. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Vol. 306. Berlin: Springer-Verlag. pp. 136–193, with "Bibliographical comments" pp. 334–335. ISBN 3-540-56852-2. MR 1295240.
13. Lemaréchal, Claude (2001). "Lagrangian relaxation". In Jünger, Michael; Naddef, Denis (eds.). Computational Combinatorial Optimization: Papers from the Spring School held in Schloss Dagstuhl, 15–19 May 2000. Lecture Notes in Computer Science. Vol. 2241. Berlin: Springer-Verlag. pp. 112–156. doi:10.1007/3-540-45586-8_4. ISBN 3-540-42877-1. MR 1900016. S2CID 9048698.
14. Lafontaine, Jacques (2015). An Introduction to Differential Manifolds. Springer. p. 70. ISBN 978-3-319-20735-3.
15. Dixit, Avinash K. (1990). "Shadow Prices". Optimization in Economic Theory (2nd ed.). New York: Oxford University Press. pp. 40–54. ISBN 0-19-877210-6.
16. Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (3rd ed.). McGraw-Hill. p. 386. ISBN 0-07-010813-7.
17. Heath, Michael T. (2005). Scientific Computing: An Introductory Survey. McGraw-Hill. p. 203. ISBN 978-0-07-124489-3.
18. Pourciau, Bruce H. (1980). "Modern multiplier rules". American Mathematical Monthly. 87 (6): 433–452. doi:10.2307/2320250. JSTOR 2320250.
19. Gautam, Mukesh; Bhusal, Narayan; Benidris, Mohammed (2020). "A sensitivity-based approach to adaptive under-frequency load shedding". 2020 IEEE Texas Power and Energy Conference (TPEC). IEEE. pp. 1–5. doi:10.1109/TPEC48276.2020.9042569.

Further reading

- Beavis, Brian; Dobbs, Ian M. (1990). "Static Optimization". Optimization and Stability Theory for Economic Analysis. New York, NY: Cambridge University Press. pp. 32–72. ISBN 0-521-33605-8.
- Bertsekas, Dimitri P. (1982). Constrained Optimization and Lagrange Multiplier Methods. New York, NY: Academic Press. ISBN 0-12-093480-9.
- Beveridge, Gordon S. G.; Schechter, Robert S. (1970). "Lagrangian multipliers". Optimization: Theory and Practice. New York, NY: McGraw-Hill. pp. 244–259. ISBN 0-07-005128-3.
- Binger, Brian R.; Hoffman, Elizabeth (1998). "Constrained optimization". Microeconomics with Calculus (2nd ed.). Reading: Addison-Wesley. pp. 56–91. ISBN 0-321-01225-9.
- Carter, Michael (2001). "Equality constraints". Foundations of Mathematical Economics. Cambridge, MA: MIT Press. pp. 516–549. ISBN 0-262-53192-5.
- Hestenes, Magnus R. (1966). "Minima of functions subject to equality constraints". Calculus of Variations and Optimal Control Theory. New York, NY: Wiley. pp. 29–34.
- Wylie, C. Ray; Barrett, Louis C. (1995). "The extrema of integrals under constraint". Advanced Engineering Mathematics (6th ed.). New York, NY: McGraw-Hill. pp. 1096–1103. ISBN 0-07-072206-4.

External links

The Wikibook Calculus optimization methods has a page on the topic of Lagrange multipliers.

Exposition

- Steuard. "Conceptual introduction". slimy.com. Includes a brief discussion of Lagrange multipliers in the calculus of variations as used in physics.
- Carpenter, Kenneth H. "Lagrange multipliers for quadratic forms with linear constraints" (PDF). Kansas State University.

Additional text and interactive applets

- Resnik. "Simple explanation with an example of governments using taxes as Lagrange multipliers". umiacs.umd.edu. University of Maryland.
- Klein, Dan. "Lagrange multipliers without permanent scarring: Explanation with focus on the intuition" (PDF). nlp.cs.berkeley.edu. University of California, Berkeley.
- Sathyanarayana, Shashi. "Geometric representation of method of Lagrange multipliers". wolfram.com (Mathematica demonstration). Wolfram Research. Needs Internet Explorer/Firefox/Safari. Provides compelling insight in two dimensions that, at a minimizing point, the direction of steepest descent must be perpendicular to the tangent of the constraint curve at that point.
- "Lagrange multipliers (two variables)". MIT OpenCourseWare (ocw.mit.edu). Applet. Massachusetts Institute of Technology.
- "Lagrange multipliers". MIT OpenCourseWare (ocw.mit.edu). Video lecture, Mathematics 18.02: Multivariable Calculus. Massachusetts Institute of Technology. Fall 2007.
- Bertsekas. "Details on Lagrange multipliers" (PDF). athenasc.com. Course slides accompanying text on nonlinear optimization, Non-Linear Programming course.
- Wyatt, John (7 April 2004) [19 November 2002]. "Legrange multipliers, constrained optimization, and the maximum entropy principle" (PDF). www-mtl.mit.edu. Elec E & C S / Mech E 6.050: Information, entropy, and computation. Geometric idea behind Lagrange multipliers.
- "Using Lagrange multipliers in optimization". matlab.cheme.cmu.edu (MATLAB example). Pittsburgh, PA: Carnegie Mellon University. 24 December 2011.
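The finite-difference scheme described in the numerical-optimization discussion above can be sketched in Python. This is a minimal illustration, not part of the article: the Lagrangian \(\mathcal{L}(x,\lambda) = x^2 + \lambda(x^2-1)\) is the example consistent with the partial derivatives \(2x + 2x\lambda\) and \(x^2 - 1\) shown there, and the function names, step size, and starting point are all assumptions.

```python
# Minimize the squared gradient magnitude of the Lagrangian numerically.
# Example Lagrangian (assumed): L(x, lam) = x^2 + lam*(x^2 - 1),
# whose partial derivatives are 2x + 2x*lam and x^2 - 1.

def lagrangian(x, lam):
    return x**2 + lam * (x**2 - 1)

def h_squared(x, lam, eps=1e-6):
    """Squared magnitude of grad L, via forward finite differences."""
    dx = (lagrangian(x + eps, lam) - lagrangian(x, lam)) / eps
    dlam = (lagrangian(x, lam + eps) - lagrangian(x, lam)) / eps
    return dx**2 + dlam**2  # square root omitted, as noted above

def minimize_h_squared(x, lam, step=0.01, eps=1e-6, iters=5000):
    """Plain gradient descent on h_squared (a bare-bones sketch)."""
    for _ in range(iters):
        gx = (h_squared(x + eps, lam) - h_squared(x, lam)) / eps
        glam = (h_squared(x, lam + eps) - h_squared(x, lam)) / eps
        x, lam = x - step * gx, lam - step * glam
    return x, lam

x_star, lam_star = minimize_h_squared(1.2, -0.9)
print(x_star, lam_star)  # converges near the critical point x = 1, lam = -1
```

Because the critical points of \(h\) are local minima rather than saddle points, this plain descent needs no special saddle-point handling; starting near \((-1, -1)\) instead would recover the other critical point by symmetry.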

