Bellman equation

A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming.[1] It writes the "value" of a decision problem at a certain point in time in terms of the payoff from some initial choices and the "value" of the remaining decision problem that results from those initial choices.[2] This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman's "principle of optimality" prescribes.[3] The equation applies to algebraic structures with a total ordering; for algebraic structures with a partial ordering, the generic Bellman's equation can be used.[4]

Bellman flow chart

The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory, though the basic concepts of dynamic programming are prefigured in John von Neumann and Oskar Morgenstern's Theory of Games and Economic Behavior and Abraham Wald's sequential analysis.[citation needed] The term 'Bellman equation' usually refers to the dynamic programming equation associated with discrete-time optimization problems.[5] In continuous-time optimization problems, the analogous equation is a partial differential equation called the Hamilton–Jacobi–Bellman equation.[6][7]

In discrete time any multi-stage optimization problem can be solved by analyzing the appropriate Bellman equation. The appropriate Bellman equation can be found by introducing new state variables (state augmentation).[8] However, the resulting augmented-state multi-stage optimization problem has a higher-dimensional state space than the original multi-stage optimization problem, an issue that can potentially render the augmented problem intractable due to the "curse of dimensionality". Alternatively, it has been shown that if the cost function of the multi-stage optimization problem satisfies a "backward separable" structure, then the appropriate Bellman equation can be found without state augmentation.[9]
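For instance (an illustrative case, not drawn from the cited sources): if the per-period cost depends on both the current decision and the previous period's decision, the current state alone no longer summarizes the problem; appending the previous decision to the state (an augmented state) restores the recursive structure, at the cost of a larger state space and hence the dimensionality penalty described above.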

Analytical concepts in dynamic programming

To understand the Bellman equation, several underlying concepts must be understood. First, any optimization problem has some objective: minimizing travel time, minimizing cost, maximizing profits, maximizing utility, etc. The mathematical function that describes this objective is called the objective function.

Dynamic programming breaks a multi-period planning problem into simpler steps at different points in time. Therefore, it requires keeping track of how the decision situation is evolving over time. The information about the current situation that is needed to make a correct decision is called the "state".[10][11] For example, to decide how much to consume and spend at each point in time, people would need to know (among other things) their initial wealth. Therefore, wealth W would be one of their state variables, but there would probably be others.

The variables chosen at any given point in time are often called the control variables. For instance, given their current wealth, people might decide how much to consume now. Choosing the control variables now may be equivalent to choosing the next state; more generally, the next state is affected by other factors in addition to the current control. For example, in the simplest case, today's wealth (the state) and consumption (the control) might exactly determine tomorrow's wealth (the new state), though typically other factors will affect tomorrow's wealth too.

The dynamic programming approach describes the optimal plan by finding a rule that tells what the controls should be, given any possible value of the state. For example, if consumption (c) depends only on wealth (W), we would seek a rule c(W) that gives consumption as a function of wealth. Such a rule, determining the controls as a function of the states, is called a policy function (See Bellman, 1957, Ch. III.2).[10]

Finally, by definition, the optimal decision rule is the one that achieves the best possible value of the objective. For example, if someone chooses consumption, given wealth, in order to maximize happiness (assuming happiness H can be represented by a mathematical function of wealth, such as a utility function), then each level of wealth will be associated with some highest possible level of happiness, H(W). The best possible value of the objective, written as a function of the state, is called the value function.

Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form known as backward induction by writing down the relationship between the value function in one period and the value function in the next period. The relationship between these two value functions is called the "Bellman equation". In this approach, the optimal policy in the last time period is specified in advance as a function of the state variable's value at that time, and the resulting optimal value of the objective function is expressed in terms of that value of the state variable. Next, the next-to-last period's optimization involves maximizing the sum of that period's period-specific objective function and the optimal value of the future objective function, which gives that period's optimal policy as a function of the state variable at the time of the next-to-last decision. This logic continues recursively back in time until the first-period decision rule is derived, as a function of the initial value of the state variable, by optimizing the sum of the first-period objective function and the second period's value function, which captures the value of all future periods. Thus, each period's decision is made by explicitly acknowledging that all future decisions will be made optimally.
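To illustrate the backward recursion concretely, the following minimal sketch (hypothetical function names; a finite-horizon problem with small discrete state and action sets is assumed) computes value and decision-rule tables by stepping backward from the final period:

    # Minimal finite-horizon backward induction over small discrete state/action sets.
    # payoff(x, a), transition(x, a), actions(x) and the horizon T are illustrative placeholders.
    def backward_induction(states, actions, payoff, transition, beta, T):
        """Return value tables V[t][x] and decision rules policy[t][x] for t = 0..T-1."""
        V = [{x: 0.0 for x in states} for _ in range(T + 1)]  # V[T] = 0: nothing after the horizon
        policy = [{} for _ in range(T)]
        for t in reversed(range(T)):                          # last period first, then back in time
            for x in states:
                best_a, best_v = None, float("-inf")
                for a in actions(x):                          # actions available in state x
                    v = payoff(x, a) + beta * V[t + 1][transition(x, a)]
                    if v > best_v:
                        best_a, best_v = a, v
                V[t][x], policy[t][x] = best_v, best_a
        return V, policy

Each pass fills in period t's values using only the already-computed values for period t + 1, which is exactly the recursion described above.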

Derivation

A dynamic decision problem

Let x_t be the state at time t. For a decision that begins at time 0, we take as given the initial state x_0. At any time, the set of possible actions depends on the current state; we express this as a_t ∈ Γ(x_t), where a particular action a_t represents particular values for one or more control variables, and Γ(x_t) is the set of actions available to be taken at state x_t. We also assume that the state changes from x to a new state T(x, a) when action a is taken, and that the current payoff from taking action a in state x is F(x, a). Finally, we assume impatience, represented by a discount factor 0 < β < 1.

Under these assumptions, an infinite-horizon decision problem takes the following form:

V(x_0) = \max_{\{a_t\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t F(x_t, a_t)

subject to the constraints

a_t \in \Gamma(x_t), \quad x_{t+1} = T(x_t, a_t), \quad \forall t = 0, 1, 2, \dots

Notice that we have defined notation V(x_0) to denote the optimal value that can be obtained by maximizing this objective function subject to the assumed constraints. This function is the value function. It is a function of the initial state variable x_0, since the best value obtainable depends on the initial situation.

Bellman's principle of optimality

The dynamic programming method breaks this decision problem into smaller subproblems. Bellman's principle of optimality describes how to do this:

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)[10][11][12]

In computer science, a problem that can be broken apart like this is said to have optimal substructure. In the context of dynamic game theory, this principle is analogous to the concept of subgame perfect equilibrium, although what constitutes an optimal policy in this case is conditioned on the decision-maker's opponents choosing similarly optimal policies from their points of view.

As suggested by the principle of optimality, we will consider the first decision separately, setting aside all future decisions (we will start afresh from time 1 with the new state x_1). Collecting the future decisions in brackets on the right, the above infinite-horizon decision problem is equivalent to:

\max_{a_0} \left\{ F(x_0, a_0) + \beta \left[ \max_{\{a_t\}_{t=1}^{\infty}} \sum_{t=1}^{\infty} \beta^{t-1} F(x_t, a_t) : a_t \in \Gamma(x_t), \; x_{t+1} = T(x_t, a_t), \; \forall t \geq 1 \right] \right\}

subject to the constraints

a_0 \in \Gamma(x_0), \quad x_1 = T(x_0, a_0)

Here we are choosing a_0, knowing that our choice will cause the time 1 state to be x_1 = T(x_0, a_0). That new state will then affect the decision problem from time 1 on. The whole future decision problem appears inside the square brackets on the right.

The Bellman equation

So far it seems we have only made the problem uglier by separating today's decision from future decisions. But we can simplify by noticing that what is inside the square brackets on the right is the value of the time 1 decision problem, starting from state x_1 = T(x_0, a_0).

Therefore, we can rewrite the problem as a recursive definition of the value function:

V(x_0) = \max_{a_0} \{ F(x_0, a_0) + \beta V(x_1) \}, subject to the constraints: a_0 \in \Gamma(x_0), \quad x_1 = T(x_0, a_0)

This is the Bellman equation. It can be simplified even further if we drop time subscripts and plug in the value of the next state:

V(x) = \max_{a \in \Gamma(x)} \{ F(x, a) + \beta V(T(x, a)) \}

The Bellman equation is classified as a functional equation, because solving it means finding the unknown function V, which is the value function. Recall that the value function describes the best possible value of the objective, as a function of the state x. By calculating the value function, we will also find the function a(x) that describes the optimal action as a function of the state; this is called the policy function.
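Since the Bellman equation characterizes V as a fixed point, one common way to compute it is value iteration: start from an arbitrary guess and repeatedly apply the right-hand side until the values converge. Below is a minimal sketch for discrete states and actions; the names states, Gamma, F and T_map are illustrative assumptions, not notation from the sources cited here.

    # Value iteration for V(x) = max_{a in Gamma(x)} { F(x, a) + beta * V(T_map(x, a)) }.
    # states (iterable), Gamma(x), F(x, a), T_map(x, a) and beta are assumed given.
    def value_iteration(states, Gamma, F, T_map, beta, tol=1e-8, max_iter=10000):
        V = {x: 0.0 for x in states}                          # arbitrary initial guess
        for _ in range(max_iter):
            V_new = {x: max(F(x, a) + beta * V[T_map(x, a)] for a in Gamma(x)) for x in states}
            done = max(abs(V_new[x] - V[x]) for x in states) < tol   # sup-norm convergence test
            V = V_new
            if done:
                break
        policy = {x: max(Gamma(x), key=lambda a: F(x, a) + beta * V[T_map(x, a)]) for x in states}
        return V, policy

Because 0 < β < 1, the right-hand side defines a contraction mapping in the sup norm, so the iteration converges to the unique value function from any initial guess.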

In a stochastic problem

See also: Markov decision process

In the deterministic setting, other techniques besides dynamic programming can be used to tackle the above optimal control problem. However, the Bellman equation is often the most convenient method of solving stochastic optimal control problems.

For a specific example from economics, consider an infinitely lived consumer with initial wealth endowment a_0 at period 0. They have an instantaneous utility function u(c), where c denotes consumption, and they discount the next period's utility at a rate of 0 < β < 1. Assume that what is not consumed in period t carries over to the next period with interest rate r. Then the consumer's utility maximization problem is to choose a consumption plan {c_t} that solves

\max_{\{c_t\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c_t)

subject to

a_{t+1} = (1 + r)(a_t - c_t), \quad c_t \geq 0

and

\lim_{t \to \infty} a_t \geq 0

The first constraint is the capital accumulation/law of motion specified by the problem, while the second constraint is a transversality condition that the consumer does not carry debt at the end of their life. The Bellman equation is

V(a) = \max_{0 \leq c \leq a} \{ u(c) + \beta V((1 + r)(a - c)) \}

Alternatively, one can treat the sequence problem directly using, for example, the Hamiltonian equations.
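Returning to the consumer's Bellman equation, it can also be solved approximately on a grid of wealth levels. The following sketch is purely illustrative: logarithmic utility, the parameter values and the grids are assumptions, not part of the original problem statement.

    import numpy as np

    # Approximate V(a) = max_{0 <= c <= a} { u(c) + beta * V((1 + r) * (a - c)) } on a wealth grid,
    # using u(c) = log(c) as an illustrative utility function.
    beta, r = 0.95, 0.04
    grid = np.linspace(1e-3, 10.0, 200)            # wealth grid
    V = np.zeros_like(grid)

    for _ in range(1000):                          # value-function iteration
        V_new = np.empty_like(V)
        for i, a in enumerate(grid):
            c = np.linspace(1e-6, a, 100)          # candidate consumption levels in (0, a]
            a_next = (1 + r) * (a - c)             # next period's wealth
            V_new[i] = np.max(np.log(c) + beta * np.interp(a_next, grid, V))
        if np.max(np.abs(V_new - V)) < 1e-6:
            V = V_new
            break
        V = V_new

The optimal consumption at each grid point can then be read off as the maximizing c, giving a numerical approximation to the policy function.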

Now, if the interest rate varies from period to period, the consumer is faced with a stochastic optimization problem. Let the interest rate r follow a Markov process with probability transition function Q(r, dμ_r), where dμ_r denotes the probability measure governing the distribution of the interest rate next period if the current interest rate is r. In this model the consumer decides their current period consumption after the current period interest rate is announced.

Rather than simply choosing a single sequence {c_t}, the consumer now must choose a sequence {c_t} for each possible realization of {r_t}, in such a way that their lifetime expected utility is maximized:

\max_{\{c_t\}_{t=0}^{\infty}} \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right]

The expectation \mathbb{E} is taken with respect to the appropriate probability measure given by Q on the sequences of r's. Because r is governed by a Markov process, dynamic programming simplifies the problem significantly. Then the Bellman equation is simply:

V(a, r) = \max_{0 \leq c \leq a} \left\{ u(c) + \beta \int V((1 + r)(a - c), r') \, Q(r, d\mu_{r'}) \right\}

Under some reasonable assumptions, the resulting optimal policy function g(a, r) is measurable.
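When the interest rate can take only finitely many values, the integral against Q reduces to a probability-weighted sum over next-period rates, and the grid-based iteration above applies with the rate as a second state variable. A minimal sketch under that assumption (the two-state rate process, transition matrix and log utility are illustrative, not from the text):

    import numpy as np

    # V(a, r) = max_{0 <= c <= a} { u(c) + beta * E[ V((1 + r)(a - c), r') | r ] }
    # with a two-state Markov chain for the interest rate; all numbers are illustrative.
    beta = 0.95
    rates = np.array([0.02, 0.06])                 # possible interest rates
    P = np.array([[0.9, 0.1],                      # P[i, j] = Pr(r' = rates[j] | r = rates[i])
                  [0.2, 0.8]])
    grid = np.linspace(1e-3, 10.0, 200)            # wealth grid
    V = np.zeros((len(grid), len(rates)))

    for _ in range(1000):
        V_new = np.empty_like(V)
        for j, r in enumerate(rates):
            EV = V @ P[j]                          # conditional expectation of V over next period's rate
            for i, a in enumerate(grid):
                c = np.linspace(1e-6, a, 100)
                a_next = (1 + r) * (a - c)
                V_new[i, j] = np.max(np.log(c) + beta * np.interp(a_next, grid, EV))
        if np.max(np.abs(V_new - V)) < 1e-6:
            V = V_new
            break
        V = V_new

Here EV plays the role of the integral in the Bellman equation above: the expected continuation value given the current rate.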

For a general stochastic sequential optimization problem with Markovian shocks, and where the agent is faced with their decision ex-post, the Bellman equation takes a very similar form:

V(x, z) = \max_{c \in \Gamma(x, z)} \left\{ F(x, c, z) + \beta \int V(T(x, c), z') \, d\mu_z(z') \right\}

Solution methods

  • The method of undetermined coefficients, also known as 'guess and verify', can be used to solve some infinite-horizon, autonomous Bellman equations (a worked illustration follows this list).[13]
  • The Bellman equation can be solved by backwards induction, either analytically in a few special cases, or numerically on a computer. Numerical backwards induction is applicable to a wide variety of problems, but may be infeasible when there are many state variables, due to the curse of dimensionality. Approximate dynamic programming was introduced by D. P. Bertsekas and J. N. Tsitsiklis, using artificial neural networks (multilayer perceptrons) to approximate the Bellman function.[14] This mitigates the impact of dimensionality by replacing the memorization of the complete function mapping over the whole state space with the memorization of the neural network parameters alone. In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iteration with neural networks was introduced.[15] In discrete time, an approach to solving the HJB equation combining value iteration and neural networks was introduced.[16]
  • By calculating the first-order conditions associated with the Bellman equation, and then using the envelope theorem to eliminate the derivatives of the value function, it is possible to obtain a system of difference equations or differential equations called the 'Euler equations'.[17] Standard techniques for the solution of difference or differential equations can then be used to calculate the dynamics of the state variables and the control variables of the optimization problem.
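A worked illustration of guess and verify (a standard textbook example, not taken from this article's sources): in the optimal growth problem max Σ_{t=0}^∞ β^t ln c_t subject to k_{t+1} = k_t^α - c_t, guess V(k) = A + B ln k. Substituting the guess into the Bellman equation V(k) = max_c { ln c + β V(k^α - c) }, the first-order condition gives c = k^α/(1 + βB), and matching the coefficients on ln k yields B = α/(1 - αβ), hence the policy function c(k) = (1 - αβ) k^α. The constant A can be recovered from the remaining terms but is not needed to determine the policy.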

Applications in economics

The first known application of a Bellman equation in economics is due to Martin Beckmann and Richard Muth.[18] Martin Beckmann also wrote extensively on consumption theory using the Bellman equation in 1959. His work influenced Edmund S. Phelps, among others.

A celebrated economic application of a Bellman equation is Robert C. Merton's seminal 1973 article on the intertemporal capital asset pricing model.[19] (See also Merton's portfolio problem.) The solution to Merton's theoretical model, one in which investors chose between income today and future income or capital gains, is a form of Bellman's equation. Because economic applications of dynamic programming usually result in a Bellman equation that is a difference equation, economists refer to dynamic programming as a "recursive method", and a subfield of recursive economics is now recognized within economics.

Nancy Stokey, Robert E. Lucas, and Edward Prescott describe stochastic and nonstochastic dynamic programming in considerable detail, and develop theorems for the existence of solutions to problems meeting certain conditions. They also describe many examples of modeling theoretical problems in economics using recursive methods.[20] This book led to dynamic programming being employed to solve a wide range of theoretical problems in economics, including optimal economic growth, resource extraction, principal–agent problems, public finance, business investment, asset pricing, factor supply, and industrial organization. Lars Ljungqvist and Thomas Sargent apply dynamic programming to study a variety of theoretical questions in monetary policy, fiscal policy, taxation, economic growth, search theory, and labor economics.[21] Avinash Dixit and Robert Pindyck showed the value of the method for thinking about capital budgeting.[22] Anderson adapted the technique to business valuation, including privately held businesses.[23]

Using dynamic programming to solve concrete problems is complicated by informational difficulties, such as choosing the unobservable discount rate. There are also computational issues, the main one being the curse of dimensionality arising from the vast number of possible actions and potential state variables that must be considered before an optimal strategy can be selected. For an extensive discussion of computational issues, see Miranda and Fackler,[24] and Meyn 2007.[25]

Example

In Markov decision processes, a Bellman equation is a recursion for expected rewards. For example, the expected reward for being in a particular state s and following some fixed policy π has the Bellman equation:

V^{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, V^{\pi}(s')

This equation describes the expected reward for taking the action prescribed by some policy π.

The equation for the optimal policy is referred to as the Bellman optimality equation:

V^{\pi^*}(s) = \max_{a} \left\{ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{\pi^*}(s') \right\}

where π* is the optimal policy and V^{π*} refers to the value function of the optimal policy. The equation above describes the reward for taking the action giving the highest expected return.
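For a small finite MDP, both recursions can be computed directly. The sketch below evaluates a fixed policy by solving the linear system implied by the first equation, and then applies the optimality equation by value iteration; the two-state MDP, its rewards and transition probabilities are made-up illustrative numbers.

    import numpy as np

    # Tiny illustrative MDP: 2 states, 2 actions; all numbers are made up.
    # P[a][s, s2] is the transition probability, R[s, a] the expected immediate reward.
    gamma = 0.9
    P = {0: np.array([[0.8, 0.2], [0.1, 0.9]]),
         1: np.array([[0.5, 0.5], [0.6, 0.4]])}
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V for a fixed policy pi.
    pi = [0, 1]                                        # action chosen in each state
    P_pi = np.array([P[pi[s]][s] for s in range(2)])   # rows of the induced transition matrix
    R_pi = np.array([R[s, pi[s]] for s in range(2)])
    V_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)

    # Bellman optimality equation by value iteration.
    V = np.zeros(2)
    for _ in range(500):
        V = np.max([R[:, a] + gamma * P[a] @ V for a in (0, 1)], axis=0)

After convergence, an optimal policy can be read off by taking, in each state, the action that attains the maximum on the right-hand side.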

See also

  • Bellman pseudospectral method
  • Dynamic programming
  • Hamilton–Jacobi–Bellman equation
  • Markov decision process
  • Optimal control theory
  • Optimal substructure
  • Recursive competitive equilibrium
  • Stochastic dynamic programming

References

  1. ^ Dixit, Avinash K. (1990). Optimization in Economic Theory (2nd ed.). Oxford University Press. p. 164. ISBN 0-19-877211-4.
  2. ^ "Bellman's principle of optimality". www.ques10.com. Retrieved 2023-08-17.
  3. ^ Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Prentice-Hall. p. 55. ISBN 0-13-638098-0.
  4. ^ Szcześniak, Ireneusz; Woźna-Szcześniak, Bożena (2023), "Generic Dijkstra: Correctness and tractability", NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, pp. 1–7, arXiv:2204.13547, doi:10.1109/NOMS56928.2023.10154322, ISBN 978-1-6654-7716-1, S2CID 248427020
  5. ^ Kirk 1970, p. 70
  6. ^ Kamien, Morton I.; Schwartz, Nancy L. (1991). Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management (Second ed.). Amsterdam: Elsevier. p. 261. ISBN 0-444-01609-0.
  7. ^ Kirk 1970, p. 88
  8. ^ Jones, Morgan; Peet, Matthew M. (2020). "Extensions of the Dynamic Programming Framework: Battery Scheduling, Demand Charges, and Renewable Integration". IEEE Transactions on Automatic Control. 66 (4): 1602–1617. arXiv:1812.00792. doi:10.1109/TAC.2020.3002235. S2CID 119622206.
  9. ^ Jones, Morgan; Peet, Matthew M. (2021). "A Generalization of Bellman's Equation with Application to Path Planning, Obstacle Avoidance and Invariant Set Estimation". Automatica. 127: 109510. arXiv:2006.08175. doi:10.1016/j.automatica.2021.109510. S2CID 222350370.
  10. ^ a b c Bellman, R.E. (2003) [1957]. Dynamic Programming. Dover. ISBN 0-486-42809-5.
  11. ^ a b Dreyfus, S. (2002). "Richard Bellman on the birth of dynamic programming". Operations Research. 50 (1): 48–51. doi:10.1287/opre.50.1.48.17791.
  12. ^ Bellman, R (August 1952). "On the Theory of Dynamic Programming". Proc Natl Acad Sci U S A. 38 (8): 716–9. Bibcode:1952PNAS...38..716B. doi:10.1073/pnas.38.8.716. PMC 1063639. PMID 16589166.
  13. ^ Ljungqvist, Lars; Sargent, Thomas J. (2004). Recursive Macroeconomic Theory (2nd ed.). MIT Press. pp. 88–90. ISBN 0-262-12274-X.
  14. ^ Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-dynamic Programming. Athena Scientific. ISBN 978-1-886529-10-6.
  15. ^ Abu-Khalaf, Murad; Lewis, Frank L. (2005). "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach". Automatica. 41 (5): 779–791. doi:10.1016/j.automatica.2004.11.034. S2CID 14757582.
  16. ^ Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad (2008). "Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 38 (4): 943–949. doi:10.1109/TSMCB.2008.926614. PMID 18632382. S2CID 14202785.
  17. ^ Miao, Jianjun (2014). Economic Dynamics in Discrete Time. MIT Press. p. 134. ISBN 978-0-262-32560-8.
  18. ^ Beckmann, Martin; Muth, Richard (1954). "On the Solution to the 'Fundamental Equation' of inventory theory" (PDF). Cowles Commission Discussion Paper 2116.
  19. ^ Merton, Robert C. (1973). "An Intertemporal Capital Asset Pricing Model". Econometrica. 41 (5): 867–887. doi:10.2307/1913811. JSTOR 1913811.
  20. ^ Stokey, Nancy; Lucas, Robert E.; Prescott, Edward (1989). Recursive Methods in Economic Dynamics. Harvard University Press. ISBN 0-674-75096-9.
  21. ^ Ljungqvist, Lars; Sargent, Thomas (2012). Recursive Macroeconomic Theory (3rd ed.). MIT Press. ISBN 978-0-262-01874-6.
  22. ^ Dixit, Avinash; Pindyck, Robert (1994). Investment Under Uncertainty. Princeton University Press. ISBN 0-691-03410-9.
  23. ^ Anderson, Patrick L. (2004). "Ch. 10". Business Economics & Finance. CRC Press. ISBN 1-58488-348-0.
    — (2009). "The Value of Private Businesses in the United States". Business Economics. 44 (2): 87–108. doi:10.1057/be.2009.4. S2CID 154743445.
    — (2013). Economics of Business Valuation. Stanford University Press. ISBN 9780804758307. Archived 2013-08-08 at the Wayback Machine.
  24. ^ Miranda, Mario J.; Fackler, Paul L. (2004). Applied Computational Economics and Finance. MIT Press. ISBN 978-0-262-29175-0.
  25. ^ Meyn, Sean (2008). Control Techniques for Complex Networks. Cambridge University Press. ISBN 978-0-521-88441-9. Appendix contains abridged Meyn & Tweedie, archived 2007-10-12 at the Wayback Machine.
