
Belief propagation

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node (or variable), conditional on any observed nodes (or variables). Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.[1]

The algorithm was first proposed by Judea Pearl in 1982,[2] who formulated it as an exact inference algorithm on trees, later extended to polytrees.[3] While the algorithm is not exact on general graphs, it has been shown to be a useful approximate algorithm.[4]

Motivation

Given a finite set of discrete random variables $X_1, \ldots, X_n$ with joint probability mass function $p$, a common task is to compute the marginal distributions of the $X_i$. The marginal of a single $X_i$ is defined to be

$$p_{X_i}(x_i) = \sum_{\mathbf{x}' : x'_i = x_i} p(\mathbf{x}')$$

where $\mathbf{x}' = (x'_1, \ldots, x'_n)$ is a vector of possible values for the $X_i$, and the notation $\mathbf{x}' : x'_i = x_i$ means that the sum is taken over those $\mathbf{x}'$ whose $i$-th coordinate is equal to $x_i$.

Computing marginal distributions using this formula quickly becomes computationally prohibitive as the number of variables grows. For example, given 100 binary variables $X_1, \ldots, X_{100}$, computing a single marginal $X_i$ using $p$ and the above formula would involve summing over $2^{99} \approx 6.34 \times 10^{29}$ possible values for $\mathbf{x}'$. If it is known that the probability mass function $p$ factors in a convenient way, belief propagation allows the marginals to be computed much more efficiently.
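
To make the cost concrete, the following sketch marginalizes a toy joint table by brute force; the table, the variable count, and the function names are illustrative assumptions, not part of the article.

```python
import itertools

n = 3  # three binary variables; a joint table over 2**n assignments

def p(x):
    """Toy joint pmf over (x1, x2, x3); any non-negative table summing to 1 works."""
    table = {
        (0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.05, (0, 1, 1): 0.15,
        (1, 0, 0): 0.10, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.25,
    }
    return table[x]

def marginal(i, xi):
    """p_{X_i}(x_i): sum p over all assignments whose i-th coordinate equals x_i."""
    return sum(p(x) for x in itertools.product((0, 1), repeat=n) if x[i] == xi)

print(marginal(0, 1))  # 0.5; with n = 100 the same loop would need 2**99 terms
```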

Description of the sum-product algorithm

Variants of the belief propagation algorithm exist for several types of graphical models (Bayesian networks and Markov random fields[5] in particular). We describe here the variant that operates on a factor graph. A factor graph is a bipartite graph containing nodes corresponding to variables $V$ and factors $F$, with edges between variables and the factors in which they appear. We can write the joint mass function:

$$p(\mathbf{x}) = \prod_{a \in F} f_a(\mathbf{x}_a)$$

where $\mathbf{x}_a$ is the vector of neighboring variable nodes to the factor node $a$. Any Bayesian network or Markov random field can be represented as a factor graph by using a factor for each node with its parents or a factor for each node with its neighborhood, respectively.[6]
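
As a minimal illustration, a factor graph can be stored as a map from each factor to its ordered variable scope and a numeric table; the two-factor model and all numbers below are hypothetical.

```python
import numpy as np

variables = {"x1": 2, "x2": 2, "x3": 2}   # variable -> |Dom(v)|
factors = {
    # factor -> (ordered neighboring variables, table indexed by their values)
    "f_a": (("x1", "x2"), np.array([[1.0, 0.5], [0.5, 2.0]])),
    "f_b": (("x2", "x3"), np.array([[2.0, 1.0], [1.0, 3.0]])),
}

def neighbors_of_variable(v):
    """N(v): the factors in which variable v appears (the bipartite edges)."""
    return [a for a, (scope, _) in factors.items() if v in scope]

print(neighbors_of_variable("x2"))  # ['f_a', 'f_b']
```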

The algorithm works by passing real-valued functions called messages along the edges between the hidden nodes. More precisely, if $v$ is a variable node and $a$ is a factor node connected to $v$ in the factor graph, then the messages $\mu_{v \to a}$ from $v$ to $a$ and the messages $\mu_{a \to v}$ from $a$ to $v$ are real-valued functions $\mu_{v \to a}, \mu_{a \to v} : \operatorname{Dom}(v) \to \mathbb{R}$, whose domain is the set of values that can be taken by the random variable associated with $v$, denoted $\operatorname{Dom}(v)$. These messages contain the "influence" that one variable exerts on another. The messages are computed differently depending on whether the node receiving the message is a variable node or a factor node. Keeping the same notation:

  • A message $\mu_{v \to a} : \operatorname{Dom}(v) \to \mathbb{R}$ from a variable node $v$ to a factor node $a$ is defined by
    $$\mu_{v \to a}(x_v) = \prod_{a^* \in N(v) \setminus \{a\}} \mu_{a^* \to v}(x_v)$$
    for $x_v \in \operatorname{Dom}(v)$, where $N(v)$ is the set of neighboring factor nodes of $v$. If $N(v) \setminus \{a\}$ is empty, then $\mu_{v \to a}(x_v)$ is set to the uniform distribution over $\operatorname{Dom}(v)$.
  • A message $\mu_{a \to v} : \operatorname{Dom}(v) \to \mathbb{R}$ from a factor node $a$ to a variable node $v$ is defined to be the product of the factor with messages from all other nodes, marginalized over all variables except the one associated with $v$:
    $$\mu_{a \to v}(x_v) = \sum_{\mathbf{x}'_a : x'_v = x_v} \left( f_a(\mathbf{x}'_a) \prod_{v^* \in N(a) \setminus \{v\}} \mu_{v^* \to a}(x'_{v^*}) \right)$$
    for $x_v \in \operatorname{Dom}(v)$, where $N(a)$ is the set of neighboring (variable) nodes to $a$. If $N(a) \setminus \{v\}$ is empty, then $\mu_{a \to v}(x_v) = f_a(x_v)$, since in this case $x_v = \mathbf{x}_a$.

As the previous formulas show, the complete marginalization is reduced to a sum of products of simpler terms than the ones appearing in the full joint distribution. This is the reason that belief propagation is sometimes called sum-product message passing, or the sum-product algorithm.
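
The following minimal sketch implements both update rules for discrete variables with numpy; the two-factor model (the same hypothetical one as above, redefined so the snippet runs on its own) and the function names are illustrative assumptions.

```python
import numpy as np

domain = {"x1": 2, "x2": 2, "x3": 2}
factors = {
    "f_a": (("x1", "x2"), np.array([[1.0, 0.5], [0.5, 2.0]])),
    "f_b": (("x2", "x3"), np.array([[2.0, 1.0], [1.0, 3.0]])),
}

def msg_var_to_fac(v, a, m_fv):
    """mu_{v->a}(x_v): product of incoming factor messages, excluding factor a."""
    out = np.ones(domain[v])
    for b, (scope, _) in factors.items():
        if b != a and v in scope:
            out = out * m_fv[(b, v)]
    return out  # stays all ones (the uniform function) if N(v)\{a} is empty

def msg_fac_to_var(a, v, m_vf):
    """mu_{a->v}(x_v): multiply the factor table by the messages from the other
    variables in its scope, then sum out every variable except x_v."""
    scope, table = factors[a]
    t = table.astype(float)
    for axis, u in enumerate(scope):
        if u != v:
            shape = [1] * t.ndim
            shape[axis] = domain[u]
            t = t * m_vf[(u, a)].reshape(shape)  # broadcast along u's axis
    keep = scope.index(v)
    return t.sum(axis=tuple(i for i in range(t.ndim) if i != keep))

# leaf variables x1 and x3 send the uniform (all ones) message first
m_vf = {("x1", "f_a"): np.ones(2), ("x3", "f_b"): np.ones(2)}
print(msg_fac_to_var("f_a", "x2", m_vf))  # [1.5 2.5]
```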

In a typical run, each message will be updated iteratively from the previous value of the neighboring messages. Different schedules can be used for updating the messages. In the case where the graphical model is a tree, an optimal schedule converges after computing each message exactly once (see next sub-section). When the factor graph has cycles, such an optimal schedule does not exist, and a typical choice is to update all messages simultaneously at each iteration.

Upon convergence (if convergence happened), the estimated marginal distribution of each node is proportional to the product of all messages from adjoining factors (missing the normalization constant):

$$p_{X_v}(x_v) \propto \prod_{a \in N(v)} \mu_{a \to v}(x_v)$$

Likewise, the estimated joint marginal distribution of the set of variables belonging to one factor is proportional to the product of the factor and the messages from the variables:

$$p_{X_a}(\mathbf{x}_a) \propto f_a(\mathbf{x}_a) \prod_{v \in N(a)} \mu_{v \to a}(x_v)$$

In the case where the factor graph is acyclic (i.e. it is a tree or a forest), these estimated marginals converge to the true marginals in a finite number of iterations. This can be shown by mathematical induction.
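
As an illustration of the belief computation, the following sketch normalizes the product of incoming factor messages; the message values are hypothetical.

```python
import numpy as np

# illustrative converged messages mu_{a->v} into one variable from its factors
incoming = {"f_a": np.array([1.5, 2.5]), "f_b": np.array([3.0, 4.0])}

belief = np.ones(2)
for m in incoming.values():
    belief = belief * m
belief = belief / belief.sum()  # supply the missing normalization constant
print(belief)                   # estimated marginal, about [0.31, 0.69]
```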

Exact algorithm for trees

When the factor graph is a tree, the belief propagation algorithm computes the exact marginals. Furthermore, with proper scheduling of the message updates, it terminates after two full passes through the tree. This optimal scheduling can be described as follows:

Before starting, the graph is oriented by designating one node as the root; any non-root node which is connected to only one other node is called a leaf.

In the first step, messages are passed inwards: starting at the leaves, each node passes a message along the (unique) edge towards the root node. The tree structure guarantees that it is possible to obtain messages from all other adjoining nodes before passing the message on. This continues until the root has obtained messages from all of its adjoining nodes.

The second step involves passing the messages back out: starting at the root, messages are passed in the reverse direction. The algorithm is completed when all leaves have received their messages.
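
One way to realize this two-pass schedule, sketched under the assumption that the tree is given as an adjacency list of variable and factor nodes (the graph below is hypothetical), is to orient the tree by depth-first search and emit each directed edge once per pass:

```python
# variable-factor tree: x1 - f_a - x2 - f_b - x3
adjacency = {
    "x1": ["f_a"], "x2": ["f_a", "f_b"], "x3": ["f_b"],
    "f_a": ["x1", "x2"], "f_b": ["x2", "x3"],
}

def two_pass_schedule(root):
    order, parent, stack, seen = [], {root: None}, [root], {root}
    while stack:  # iterative depth-first search orients the tree at the root
        node = stack.pop()
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                parent[nb] = node
                stack.append(nb)
                order.append(nb)
    inward = [(n, parent[n]) for n in reversed(order)]  # leaves towards root
    outward = [(p, n) for n, p in inward[::-1]]         # root back outwards
    return inward + outward

for edge in two_pass_schedule("x2"):
    print(edge)  # each directed edge appears exactly once, in a valid order
```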

Approximate algorithm for general graphs

Although it was originally designed for acyclic graphical models, the belief propagation algorithm can be used in general graphs. The algorithm is then sometimes called loopy belief propagation, because graphs typically contain cycles, or loops. The initialization and scheduling of message updates must be adjusted slightly (compared with the previously described schedule for acyclic graphs) because graphs might not contain any leaves. Instead, one initializes all variable messages to 1 and uses the same message definitions above, updating all messages at every iteration (although messages coming from known leaves or tree-structured subgraphs may no longer need updating after sufficient iterations). It is easy to show that in a tree, the message definitions of this modified procedure will converge to the set of message definitions given above within a number of iterations equal to the diameter of the tree.
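
The following self-contained sketch runs this flooding schedule on a factor graph with a single cycle; the three pairwise factor tables, the iteration cap, and the convergence tolerance are illustrative choices.

```python
import numpy as np

# three binary variables on one cycle: x1 -f_ab- x2 -f_bc- x3 -f_ca- x1
factors = {
    "f_ab": (("x1", "x2"), np.array([[3.0, 1.0], [1.0, 2.0]])),
    "f_bc": (("x2", "x3"), np.array([[2.0, 1.0], [1.0, 2.0]])),
    "f_ca": (("x3", "x1"), np.array([[2.0, 1.0], [1.0, 2.0]])),
}

# initialize every message to the all-ones (uniform) function
m_vf = {(v, a): np.ones(2) for a, (s, _) in factors.items() for v in s}
m_fv = {(a, v): np.ones(2) for a, (s, _) in factors.items() for v in s}

for it in range(100):
    new_vf, new_fv = {}, {}
    for (v, a) in m_vf:  # variable-to-factor updates from the old messages
        msg = np.ones(2)
        for b, (s, _) in factors.items():
            if b != a and v in s:
                msg = msg * m_fv[(b, v)]
        new_vf[(v, a)] = msg / msg.sum()
    for (a, v) in m_fv:  # factor-to-variable updates (pairwise factors here)
        scope, table = factors[a]
        other = scope[1 - scope.index(v)]
        t = table if scope.index(v) == 0 else table.T
        msg = t @ m_vf[(other, a)]  # sum out the other variable
        new_fv[(a, v)] = msg / msg.sum()
    delta = max(np.abs(new_fv[k] - m_fv[k]).max() for k in m_fv)
    m_vf, m_fv = new_vf, new_fv
    if delta < 1e-9:
        break

for v in ("x1", "x2", "x3"):  # approximate marginals (beliefs)
    belief = np.ones(2)
    for a, (s, _) in factors.items():
        if v in s:
            belief = belief * m_fv[(a, v)]
    print(v, belief / belief.sum(), f"({it + 1} iterations)")
```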

The precise conditions under which loopy belief propagation will converge are still not well understood; it is known that on graphs containing a single loop it converges in most cases, but the probabilities obtained might be incorrect.[7] Several sufficient (but not necessary) conditions for convergence of loopy belief propagation to a unique fixed point exist.[8] There exist graphs which will fail to converge, or which will oscillate between multiple states over repeated iterations. Techniques like EXIT charts can provide an approximate visualization of the progress of belief propagation and an approximate test for convergence.

There are other approximate methods for marginalization including variational methods and Monte Carlo methods.

One method of exact marginalization in general graphs is called the junction tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree. The basic premise is to eliminate cycles by clustering them into single nodes.

Related algorithm and complexity issues

A similar algorithm is commonly referred to as the Viterbi algorithm, but is also known as a special case of the max-product or min-sum algorithm, which solves the related problem of maximization, or most probable explanation. Instead of attempting to solve the marginal, the goal here is to find the values $\mathbf{x}$ that maximize the global function (i.e. the most probable values in a probabilistic setting), and it can be defined using the arg max:

$$\arg\max_{\mathbf{x}} g(\mathbf{x})$$

An algorithm that solves this problem is nearly identical to belief propagation, with the sums replaced by maxima in the definitions.[9]
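
For a single pairwise factor, the only change is the reduction used in the factor-to-variable update, as this sketch with hypothetical numbers shows:

```python
import numpy as np

f = np.array([[2.0, 1.0], [1.0, 3.0]])  # pairwise factor f(x_u, x_v)
mu_u = np.array([0.6, 0.4])             # incoming message mu_{u->f}

sum_msg = f.T @ mu_u                       # sum-product: sum x_u out
max_msg = (f * mu_u[:, None]).max(axis=0)  # max-product: maximize over x_u
print(sum_msg, max_msg)                    # [1.6 1.8] versus [1.2 1.2]
```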

It is worth noting that inference problems like marginalization and maximization are NP-hard to solve exactly and approximately (at least for relative error) in a graphical model. More precisely, the marginalization problem defined above is #P-complete and maximization is NP-complete.

The memory usage of belief propagation can be reduced through the use of the Island algorithm (at a small cost in time complexity).

Relation to free energy

The sum-product algorithm is related to the calculation of free energy in thermodynamics. Let $Z$ be the partition function. A probability distribution

$$P(\mathbf{X}) = \frac{1}{Z} \prod_{f_j} f_j(x_j)$$

(as per the factor graph representation) can be viewed as a measure of the internal energy present in a system, computed as

$$E(\mathbf{X}) = -\log \prod_{f_j} f_j(x_j)$$

The free energy of the system is then

$$F = U - H = \sum_{\mathbf{X}} P(\mathbf{X}) E(\mathbf{X}) + \sum_{\mathbf{X}} P(\mathbf{X}) \log P(\mathbf{X})$$

It can then be shown that the points of convergence of the sum-product algorithm represent the points where the free energy in such a system is minimized. Similarly, it can be shown that a fixed point of the iterative belief propagation algorithm in graphs with cycles is a stationary point of a free energy approximation.[10]
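
As a quick consistency check using only the definitions above (an illustrative addition, not a sourced derivation): the exact distribution satisfies $\log P(\mathbf{X}) = -E(\mathbf{X}) - \log Z$, so

$$F = \sum_{\mathbf{X}} P(\mathbf{X}) E(\mathbf{X}) + \sum_{\mathbf{X}} P(\mathbf{X}) \left( -E(\mathbf{X}) - \log Z \right) = -\log Z$$

recovering the usual identity for the exact free energy; the approximation minimized by loopy BP replaces this intractable sum with local terms.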

Generalized belief propagation (GBP)

Belief propagation algorithms are normally presented as message update equations on a factor graph, involving messages between variable nodes and their neighboring factor nodes and vice versa. Considering messages between regions in a graph is one way of generalizing the belief propagation algorithm.[10] There are several ways of defining the set of regions in a graph that can exchange messages. One method uses ideas introduced by Kikuchi in the physics literature,[11][12][13] and is known as Kikuchi's cluster variation method.[14]

Improvements in the performance of belief propagation algorithms are also achievable by breaking the replica symmetry in the distributions of the fields (messages). This generalization leads to a new kind of algorithm called survey propagation (SP), which has proved to be very efficient in NP-complete problems like satisfiability[1] and graph coloring.

The cluster variational method and the survey propagation algorithms are two different improvements to belief propagation. The name generalized survey propagation (GSP) has been proposed for the algorithm that would merge both generalizations.

Gaussian belief propagation (GaBP)

Gaussian belief propagation is a variant of the belief propagation algorithm when the underlying distributions are Gaussian. The first work analyzing this special model was the seminal work of Weiss and Freeman.[15]

The GaBP algorithm solves the following marginalization problem:

$$P(x_i) = \frac{1}{Z} \int_{j \neq i} \exp \left( -\tfrac{1}{2} x^T A x + b^T x \right) \, dx_j$$

where $Z$ is a normalization constant, $A$ is a symmetric positive definite matrix (the inverse covariance matrix, a.k.a. the precision matrix) and $b$ is the shift vector.

Equivalently, it can be shown that using the Gaussian model, the solution of the marginalization problem is equivalent to the MAP assignment problem:

$$\underset{x}{\operatorname{argmax}}\ P(x) = \underset{x}{\operatorname{argmax}}\ \frac{1}{Z} \exp \left( -\tfrac{1}{2} x^T A x + b^T x \right)$$

This problem is also equivalent to the following minimization problem of the quadratic form:

$$\underset{x}{\min}\ \tfrac{1}{2} x^T A x - b^T x$$

which is in turn equivalent to the linear system of equations

$$Ax = b$$

Convergence of the GaBP algorithm is easier to analyze (relative to the general BP case) and there are two known sufficient convergence conditions. The first one was formulated by Weiss et al. in the year 2000: GaBP converges when the information matrix $A$ is diagonally dominant. The second was formulated by Johnson et al.[16] in 2006: GaBP converges when the spectral radius of the matrix satisfies

$$\rho \left( \left| I - D^{-1/2} A D^{-1/2} \right| \right) < 1$$

where $D = \operatorname{diag}(A)$. Later, Su and Wu established the necessary and sufficient convergence conditions for synchronous GaBP and damped GaBP, as well as another sufficient convergence condition for asynchronous GaBP. For each case, the convergence condition involves verifying that 1) a set (determined by $A$) is non-empty, 2) the spectral radius of a certain matrix is smaller than one, and 3) the singularity issue (when converting a BP message into a belief) does not occur.[17]

The GaBP algorithm was linked to the linear algebra domain,[18] and it was shown that the GaBP algorithm can be viewed as an iterative algorithm for solving the linear system of equations Ax = b where A is the information matrix and b is the shift vector. Empirically, the GaBP algorithm is shown to converge faster than classical iterative methods like the Jacobi method, the Gauss–Seidel method, successive over-relaxation, and others.[19] Additionally, the GaBP algorithm is shown to be immune to numerical problems of the preconditioned conjugate gradient method.[20]
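
The following sketch runs a scalar GaBP sweep for $Ax = b$, with each directed message parameterized by a (precision, potential) pair; this parameterization is one standard way to realize GaBP (assumed here rather than taken from the cited papers), and the matrix, vector, and iteration count are illustrative.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])  # symmetric and diagonally dominant
b = np.array([1.0, 2.0, 3.0])
n = len(b)
edges = [(i, j) for i in range(n) for j in range(n) if i != j and A[i, j] != 0]

p = {e: 0.0 for e in edges}  # message precisions p_{i->j}
h = {e: 0.0 for e in edges}  # message potentials h_{i->j}

for _ in range(100):  # fixed sweep count; a tolerance test could stop earlier
    new_p, new_h = {}, {}
    for (i, j) in edges:
        # combine node i's own term with all incoming messages except j's
        alpha = A[i, i] + sum(p[(k, l)] for (k, l) in edges if l == i and k != j)
        beta = b[i] + sum(h[(k, l)] for (k, l) in edges if l == i and k != j)
        # integrate x_i out of exp(-A_ij x_i x_j) * N(precision alpha, potential beta)
        new_p[(i, j)] = -A[i, j] ** 2 / alpha
        new_h[(i, j)] = -A[i, j] * beta / alpha
    p, h = new_p, new_h

prec = np.array([A[i, i] + sum(p[(k, l)] for (k, l) in edges if l == i) for i in range(n)])
pot = np.array([b[i] + sum(h[(k, l)] for (k, l) in edges if l == i) for i in range(n)])
print(pot / prec)             # GaBP means
print(np.linalg.solve(A, b))  # agree when GaBP converges (e.g. diagonally dominant A)
```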

Syndrome-based BP decoding

The previous description of the BP algorithm is called codeword-based decoding, which calculates the approximate marginal probability $P(x|X)$, given the received codeword $X$. There is an equivalent form[21] which calculates $P(e|s)$, where $s$ is the syndrome of the received codeword $X$ and $e$ is the decoded error. The decoded input vector is $x = X + e$. This variation only changes the interpretation of the mass function $f_a(X_a)$. Explicitly, the messages are

$$\forall x_v \in \operatorname{Dom}(v), \quad \mu_{v \to a}(x_v) = P(X_v) \prod_{a^* \in N(v) \setminus \{a\}} \mu_{a^* \to v}(x_v)$$

where $P(X_v)$ is the prior error probability on variable $v$, and

$$\forall x_v \in \operatorname{Dom}(v), \quad \mu_{a \to v}(x_v) = \sum_{\mathbf{x}'_a : x'_v = x_v} \delta(\operatorname{syndrome}(\mathbf{x}'_v) = \mathbf{s}) \prod_{v^* \in N(a) \setminus \{v\}} \mu_{v^* \to a}(x'_{v^*})$$

This syndrome-based decoder does not require information on the received bits, and can therefore be adapted to quantum codes, where the only information is the measurement syndrome.

In the binary case, $x_i \in \{0, 1\}$, those messages can be simplified to cause an exponential reduction of $2^{|\{v\}| + |N(v)|}$ in the complexity.[22][23]

Define the log-likelihood ratios $l_v = \log \frac{u_{v \to a}(x_v = 0)}{u_{v \to a}(x_v = 1)}$ and $L_a = \log \frac{u_{a \to v}(x_v = 0)}{u_{a \to v}(x_v = 1)}$; then

$$v \to a: \quad l_v = l_v^{(0)} + \sum_{a^* \in N(v) \setminus \{a\}} L_{a^*}$$

$$a \to v: \quad L_a = (-1)^{s_a} \, 2 \tanh^{-1} \prod_{v^* \in N(a) \setminus \{v\}} \tanh(l_{v^*}/2)$$

where $l_v^{(0)} = \log(P(x_v = 0)/P(x_v = 1))$ is a constant determined by the prior error probability.

The posterior log-likelihood ratio can be estimated as

$$l_v = l_v^{(0)} + \sum_{a \in N(v)} L_a$$
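
A minimal sketch of these two LLR updates; the check-node structure, the prior, and the numbers are illustrative.

```python
import numpy as np

def check_to_var(l_others, s_a):
    """L_a = (-1)^{s_a} * 2 * atanh(prod tanh(l_v / 2)) over the other variables."""
    return (-1) ** s_a * 2 * np.arctanh(np.prod(np.tanh(np.asarray(l_others) / 2)))

def var_to_check(l0, L_others):
    """l_v = l_v^(0) + sum of incoming check LLRs, excluding the target check."""
    return l0 + sum(L_others)

l0 = np.log(0.9 / 0.1)                   # prior LLR for a 10% error probability
print(check_to_var([1.2, -0.7], s_a=1))  # the sign flips when the syndrome bit is 1
print(var_to_check(l0, [0.8, -0.3]))
```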

References

  1. ^ a b Braunstein, A.; Mézard, M.; Zecchina, R. (2005). "Survey propagation: An algorithm for satisfiability". Random Structures & Algorithms. 27 (2): 201–226. arXiv:cs/0212002. doi:10.1002/rsa.20057. S2CID 6601396.
  2. ^ Pearl, Judea (1982). "Reverend Bayes on inference engines: A distributed hierarchical approach" (PDF). Proceedings of the Second National Conference on Artificial Intelligence. AAAI-82: Pittsburgh, PA. Menlo Park, California: AAAI Press. pp. 133–136. Retrieved 28 March 2009.
  3. ^ Kim, Jin H.; Pearl, Judea (1983). "A computational model for combined causal and diagnostic reasoning in inference systems" (PDF). Proceedings of the Eighth International Joint Conference on Artificial Intelligence. IJCAI-83: Karlsruhe, Germany. Vol. 1. pp. 190–193. Retrieved 20 March 2016.
  4. ^ Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (2nd ed.). San Francisco, CA: Morgan Kaufmann. ISBN 978-1-55860-479-7.
  5. ^ Yedidia, J.S.; Freeman, W.T.; Weiss, Y. (January 2003). "Understanding Belief Propagation and Its Generalizations". In Lakemeyer, Gerhard; Nebel, Bernhard (eds.). Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann. pp. 239–269. ISBN 978-1-55860-811-5. Retrieved 30 March 2009.
  6. ^ Wainwright, M. J.; Jordan, M. I. (2007). "2.1 Probability Distributions on Graphs". Graphical Models, Exponential Families, and Variational Inference. Vol. 1. pp. 5–9. doi:10.1561/2200000001.
  7. ^ Weiss, Yair (2000). "Correctness of Local Probability Propagation in Graphical Models with Loops". Neural Computation. 12 (1): 1–41. doi:10.1162/089976600300015880. PMID 10636932. S2CID 15402308.
  8. ^ Mooij, J; Kappen, H (2007). "Sufficient Conditions for Convergence of the Sum–Product Algorithm". IEEE Transactions on Information Theory. 53 (12): 4422–4437. arXiv:cs/0504030. doi:10.1109/TIT.2007.909166. S2CID 57228.
  9. ^ Loeliger, Hans-Andrea (2004). "An Introduction to Factor Graphs". IEEE Signal Processing Magazine. 21 (1): 28–41. Bibcode:2004ISPM...21...28L. doi:10.1109/msp.2004.1267047. S2CID 7722934.
  10. ^ a b Yedidia, J.S.; Freeman, W.T.; Weiss, Y. (July 2005). "Constructing free-energy approximations and generalized belief propagation algorithms". IEEE Transactions on Information Theory. 51 (7): 2282–2312. CiteSeerX 10.1.1.3.5650. doi:10.1109/TIT.2005.850085. S2CID 52835993. Retrieved 28 March 2009.
  11. ^ Kikuchi, Ryoichi (15 March 1951). "A Theory of Cooperative Phenomena". Physical Review. 81 (6): 988–1003. Bibcode:1951PhRv...81..988K. doi:10.1103/PhysRev.81.988.
  12. ^ Kurata, Michio; Kikuchi, Ryoichi; Watari, Tatsuro (1953). "A Theory of Cooperative Phenomena. III. Detailed Discussions of the Cluster Variation Method". The Journal of Chemical Physics. 21 (3): 434–448. Bibcode:1953JChPh..21..434K. doi:10.1063/1.1698926.
  13. ^ Kikuchi, Ryoichi; Brush, Stephen G. (1967). "Improvement of the Cluster‐Variation Method". The Journal of Chemical Physics. 47 (1): 195–203. Bibcode:1967JChPh..47..195K. doi:10.1063/1.1711845.
  14. ^ Pelizzola, Alessandro (2005). "Cluster variation method in statistical physics and probabilistic graphical models". Journal of Physics A: Mathematical and General. 38 (33): R309–R339. arXiv:cond-mat/0508216. Bibcode:2005JPhA...38R.309P. doi:10.1088/0305-4470/38/33/R01. ISSN 0305-4470. S2CID 942.
  15. ^ Weiss, Yair; Freeman, William T. (October 2001). "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology". Neural Computation. 13 (10): 2173–2200. CiteSeerX 10.1.1.44.794. doi:10.1162/089976601750541769. PMID 11570995. S2CID 10624764.
  16. ^ Malioutov, Dmitry M.; Johnson, Jason K.; Willsky, Alan S. (October 2006). "Walk-sums and belief propagation in Gaussian graphical models". Journal of Machine Learning Research. 7: 2031–2064. Retrieved 28 March 2009.
  17. ^ Su, Qinliang; Wu, Yik-Chung (March 2015). "On convergence conditions of Gaussian belief propagation". IEEE Trans. Signal Process. 63 (5): 1144–1155. Bibcode:2015ITSP...63.1144S. doi:10.1109/TSP.2015.2389755. S2CID 12055229.
  18. ^ Gaussian belief propagation solver for systems of linear equations. By O. Shental, D. Bickson, P. H. Siegel, J. K. Wolf, and D. Dolev, IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/ Archived 14 June 2011 at the Wayback Machine
  19. ^ Linear Detection via Belief Propagation. Danny Bickson, Danny Dolev, Ori Shental, Paul H. Siegel and Jack K. Wolf. In the 45th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Illinois, 7 Sept. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/ Archived 14 June 2011 at the Wayback Machine
  20. ^ Distributed large scale network utility maximization. D. Bickson, Y. Tock, A. Zymnis, S. Boyd and D. Dolev. In the International symposium on information theory (ISIT), July 2009. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/ Archived 14 June 2011 at the Wayback Machine
  21. ^ Dave, Maulik A. (1 December 2006). "Review of "Information Theory, Inference, and Learning Algorithms by David J. C. MacKay", Cambridge University Press, 2003". ACM SIGACT News. 37 (4): 34–36. doi:10.1145/1189056.1189063. ISSN 0163-5700. S2CID 10570465.
  22. ^ Filler, Tomas (17 November 2009). "Simplification of the Belief propagation algorithm" (PDF).
  23. ^ Liu, Ye-Hua; Poulin, David (22 May 2019). "Neural Belief-Propagation Decoders for Quantum Error-Correcting Codes". Physical Review Letters. 122 (20): 200501. arXiv:1811.07835. doi:10.1103/physrevlett.122.200501. ISSN 0031-9007. PMID 31172756. S2CID 53959182.

Further reading

  • Bickson, Danny. (2009). Gaussian Belief Propagation Resource Page. Webpage containing recent publications as well as Matlab source code.
  • Bishop, Christopher M. (2006). "Chapter 8: Graphical models" (PDF). Pattern Recognition and Machine Learning. Springer. pp. 359–418. ISBN 978-0-387-31073-2. Retrieved 2 December 2023.
  • Coughlan, James. (2009). A Tutorial Introduction to Belief Propagation.
  • Loeliger, Hans-Andrea (2004). "An Introduction to Factor Graphs". IEEE Signal Processing Magazine. 21 (1): 28–41. Bibcode:2004ISPM...21...28L. doi:10.1109/MSP.2004.1267047. S2CID 7722934.
  • Mackenzie, Dana (2005). "Communication Speed Nears Terminal Velocity", New Scientist. 9 July 2005. Issue 2507 (Registration required)
  • Wymeersch, Henk (2007). Iterative Receiver Design. Cambridge University Press. ISBN 978-0-521-87315-4.
  • Yedidia, J.S.; Freeman, W.T.; Weiss, Y. (January 2003). "Understanding Belief Propagation and Its Generalizations". In Lakemeyer, Gerhard; Nebel, Bernhard (eds.). Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann. pp. 239–269. ISBN 978-1-55860-811-5. Retrieved 30 March 2009.
  • Yedidia, J.S.; Freeman, W.T.; Weiss, Y. (July 2005). "Constructing free-energy approximations and generalized belief propagation algorithms". IEEE Transactions on Information Theory. 51 (7): 2282–2312. CiteSeerX 10.1.1.3.5650. doi:10.1109/TIT.2005.850085. S2CID 52835993.
