fbpx
Wikipedia

Dead-end elimination

The dead-end elimination algorithm (DEE) is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends", i.e., combinations of variables that are not necessary to define a global minimum because there is always a way of replacing such combination by a better or equivalent one. Then we can refrain from searching such combinations further. Hence, dead-end elimination is a mirror image of dynamic programming, in which "good" combinations are identified and explored further.

Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins. It closely related to the notion of dominance in optimization also known as substitutability in a Constraint Satisfaction Problem. The original description and proof of the dead-end elimination theorem can be found in [1].

Basic requirements edit

An effective DEE implementation requires four pieces of information:

  1. A well-defined finite set of discrete independent variables
  2. A precomputed numerical value (considered the "energy") associated with each element in the set of variables (and possibly with their pairs, triples, etc.)
  3. A criterion or criteria for determining when an element is a "dead end", that is, when it cannot possibly be a member of the solution set
  4. An objective function (considered the "energy function") to be minimized

Note that the criteria can easily be reversed to identify the maximum of a given function as well.

Applications to protein structure prediction edit

Dead-end elimination has been used effectively to predict the structure of side chains on a given protein backbone structure by minimizing an energy function  . The dihedral angle search space of the side chains is restricted to a discrete set of rotamers for each amino acid position in the protein (which is, obviously, of fixed length). The original DEE description included criteria for the elimination of single rotamers and of rotamer pairs, although this can be expanded.

In the following discussion, let   be the length of the protein and let   represent the rotamer of the   side chain. Since atoms in proteins are assumed to interact only by two-body potentials, the energy may be written

 

Where   represents the "self-energy" of a particular rotamer  , and   represents the "pair energy" of the rotamers  .

Also note that   (that is, the pair energy between a rotamer and itself) is taken to be zero, and thus does not affect the summations. This notation simplifies the description of the pairs criterion below.

Singles elimination criterion edit

If a particular rotamer   of sidechain   cannot possibly give a better energy than another rotamer   of the same sidechain, then rotamer A can be eliminated from further consideration, which reduces the search space. Mathematically, this condition is expressed by the inequality

 

where   is the minimum (best) energy possible between rotamer   of sidechain   and any rotamer X of side chain  . Similarly,   is the maximum (worst) energy possible between rotamer   of sidechain   and any rotamer X of side chain  .

Pairs elimination criterion edit

The pairs criterion is more difficult to describe and to implement, but it adds significant eliminating power. For brevity, we define the shorthand variable   that is the intrinsic energy of a pair of rotamers   and   at positions   and  , respectively

 

A given pair of rotamers   and   at positions   and  , respectively, cannot both be in the final solution (although one or the other may be) if there is another pair   and   that always gives a better energy. Expressed mathematically,

 

where  ,   and  .

Energy matrices edit

For large  , the matrices of precomputed energies can become costly to store. Let   be the number of amino acid positions, as above, and let   be the number of rotamers at each position (this is usually, but not necessarily, constant over all positions). Each self-energy matrix for a given position requires   entries, so the total number of self-energies to store is  . Each pair energy matrix between two positions   and  , for   discrete rotamers at each position, requires a   matrix. This makes the total number of entries in an unreduced pair matrix  . This can be trimmed somewhat, at the cost of additional complexity in implementation, because pair energies are symmetrical and the pair energy between a rotamer and itself is zero.

Implementation and efficiency edit

The above two criteria are normally applied iteratively until convergence, defined as the point at which no more rotamers or pairs can be eliminated. Since this is normally a reduction in the sample space by many orders of magnitude, simple enumeration will suffice to determine the minimum within this pared-down set.

Given this model, it is clear that the DEE algorithm is guaranteed to find the optimal solution; that is, it is a global optimization process. The single-rotamer search scales quadratically in time with total number of rotamers. The pair search scales cubically and is the slowest part of the algorithm (aside from energy calculations). This is a dramatic improvement over the brute-force enumeration which scales as  .

A large-scale benchmark of DEE compared with alternative methods of protein structure prediction and design finds that DEE reliably converges to the optimal solution for protein lengths for which it runs in a reasonable amount of time[2]. It significantly outperforms the alternatives under consideration, which involved techniques derived from mean field theory, genetic algorithms, and the Monte Carlo method. However, the other algorithms are appreciably faster than DEE and thus can be applied to larger and more complex problems; their relative accuracy can be extrapolated from a comparison to the DEE solution within the scope of problems accessible to DEE.

Protein design edit

The preceding discussion implicitly assumed that the rotamers   are all different orientations of the same amino acid side chain. That is, the sequence of the protein was assumed to be fixed. It is also possible to allow multiple side chains to "compete" over a position   by including both types of side chains in the set of rotamers for that position. This allows a novel sequence to be designed onto a given protein backbone. A short zinc finger protein fold has been redesigned this way[3]. However, this greatly increases the number of rotamers per position and still requires a fixed protein length.

Generalizations edit

More powerful and more general criteria have been introduced that improve both the efficiency and the eliminating power of the method for both prediction and design applications. One example is a refinement of the singles elimination criterion known as the Goldstein criterion[4], which arises from fairly straightforward algebraic manipulation before applying the minimization:

 

Thus rotamer   can be eliminated if any alternative rotamer from the set at   contributes less to the total energy than  . This is an improvement over the original criterion, which requires comparison of the best possible (that is, the smallest) energy contribution from   with the worst possible contribution from an alternative rotamer.

An extended discussion of elaborate DEE criteria and a benchmark of their relative performance can be found in [5].

References edit

  1. ^ Desmet J, de Maeyer M, Hazes B, Lasters I. (1992). The dead-end elimination theorem and its use in protein side-chain positioning. Nature, 356, 539-542. PMID 21488406.
  2. ^ Voigt CA, Gordon DB, Mayo SL. (2000). Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design. J Mol Biol 299(3):789-803.
  3. ^ Dahiyat BI, Mayo SL. (1997). De novo protein design: fully automated sequence selection. Science 278(5335):82-7.
  4. ^ Goldstein RF. (1994). Efficient rotamer elimination applied to protein side-chains and related spin glasses. Biophys J 66(5):1335-40.
  5. ^ Pierce NA, Spriet JA, Desmet J, Mayo SL. (2000). Conformational splitting: a more powerful criterion for dead-end elimination. J Comput Chem 21: 999-1009.

dead, elimination, dead, elimination, algorithm, method, minimizing, function, over, discrete, independent, variables, basic, idea, identify, dead, ends, combinations, variables, that, necessary, define, global, minimum, because, there, always, replacing, such. The dead end elimination algorithm DEE is a method for minimizing a function over a discrete set of independent variables The basic idea is to identify dead ends i e combinations of variables that are not necessary to define a global minimum because there is always a way of replacing such combination by a better or equivalent one Then we can refrain from searching such combinations further Hence dead end elimination is a mirror image of dynamic programming in which good combinations are identified and explored further Although the method itself is general it has been developed and applied mainly to the problems of predicting and designing the structures of proteins It closely related to the notion of dominance in optimization also known as substitutability in a Constraint Satisfaction Problem The original description and proof of the dead end elimination theorem can be found in 1 Contents 1 Basic requirements 2 Applications to protein structure prediction 2 1 Singles elimination criterion 2 2 Pairs elimination criterion 2 3 Energy matrices 3 Implementation and efficiency 4 Protein design 5 Generalizations 6 ReferencesBasic requirements editAn effective DEE implementation requires four pieces of information A well defined finite set of discrete independent variables A precomputed numerical value considered the energy associated with each element in the set of variables and possibly with their pairs triples etc A criterion or criteria for determining when an element is a dead end that is when it cannot possibly be a member of the solution set An objective function considered the energy function to be minimized Note that the criteria can easily be reversed to identify the maximum of a given function as well Applications to protein structure prediction editDead end elimination has been used effectively to predict the structure of side chains on a given protein backbone structure by minimizing an energy function E displaystyle E nbsp The dihedral angle search space of the side chains is restricted to a discrete set of rotamers for each amino acid position in the protein which is obviously of fixed length The original DEE description included criteria for the elimination of single rotamers and of rotamer pairs although this can be expanded In the following discussion let N displaystyle N nbsp be the length of the protein and let r k displaystyle r k nbsp represent the rotamer of the k t h displaystyle mathrm k th nbsp side chain Since atoms in proteins are assumed to interact only by two body potentials the energy may be written E T O T k E k r k k l E k l r k r l displaystyle E TOT sum k E k r k sum k neq l E kl r k r l nbsp Where E k r k displaystyle E k r k nbsp represents the self energy of a particular rotamer r k displaystyle r k nbsp and E k l r k r l displaystyle E kl r k r l nbsp represents the pair energy of the rotamers r k r j displaystyle r k r j nbsp Also note that E k k r k A r k A displaystyle E kk r k A r k A nbsp that is the pair energy between a rotamer and itself is taken to be zero and thus does not affect the summations This notation simplifies the description of the pairs criterion below Singles elimination criterion edit If a particular rotamer r k A displaystyle r k A nbsp of sidechain k displaystyle k nbsp cannot possibly give a better energy than another rotamer r k B displaystyle r k B nbsp of the same sidechain then rotamer A can be eliminated from further consideration which reduces the search space Mathematically this condition is expressed by the inequality E k r k A l 1 N min X E k l r k A r l X gt E k r k B l 1 N max X E k l r k B r l X displaystyle E k r k A sum l 1 N min X E kl r k A r l X gt E k r k B sum l 1 N max X E kl r k B r l X nbsp where min X E k l r k A r l X displaystyle min X E kl r k A r l X nbsp is the minimum best energy possible between rotamer r k A displaystyle r k A nbsp of sidechain k displaystyle k nbsp and any rotamer X of side chain l displaystyle l nbsp Similarly max X E k l r k B r l X displaystyle max X E kl r k B r l X nbsp is the maximum worst energy possible between rotamer r k B displaystyle r k B nbsp of sidechain k displaystyle k nbsp and any rotamer X of side chain l displaystyle l nbsp Pairs elimination criterion edit The pairs criterion is more difficult to describe and to implement but it adds significant eliminating power For brevity we define the shorthand variable U k l A B displaystyle U kl AB nbsp that is the intrinsic energy of a pair of rotamers A displaystyle A nbsp and B displaystyle B nbsp at positions k displaystyle k nbsp and l displaystyle l nbsp respectively U k l A B d e f E k r k A E l r l B E k l r k A r l B displaystyle U kl AB stackrel mathrm def E k r k A E l r l B E kl r k A r l B nbsp A given pair of rotamers A displaystyle A nbsp and B displaystyle B nbsp at positions k displaystyle k nbsp and l displaystyle l nbsp respectively cannot both be in the final solution although one or the other may be if there is another pair C displaystyle C nbsp and D displaystyle D nbsp that always gives a better energy Expressed mathematically U k l A B i 1 N min X E k i r k A r i X E l j r l B r j X gt U k l C D i 1 N max X E k i r k C r i X E l j r l D r j X displaystyle U kl AB sum i 1 N min X left E ki r k A r i X E lj r l B r j X right gt U kl CD sum i 1 N max X left E ki r k C r i X E lj r l D r j X right nbsp where A C displaystyle A neq C nbsp B D displaystyle B neq D nbsp and k l displaystyle k neq l nbsp Energy matrices edit For large N displaystyle N nbsp the matrices of precomputed energies can become costly to store Let N displaystyle N nbsp be the number of amino acid positions as above and let p displaystyle p nbsp be the number of rotamers at each position this is usually but not necessarily constant over all positions Each self energy matrix for a given position requires p displaystyle p nbsp entries so the total number of self energies to store is N p displaystyle Np nbsp Each pair energy matrix between two positions r k displaystyle r k nbsp and r l displaystyle r l nbsp for p displaystyle p nbsp discrete rotamers at each position requires a p p displaystyle p times p nbsp matrix This makes the total number of entries in an unreduced pair matrix N 2 p 2 displaystyle N 2 p 2 nbsp This can be trimmed somewhat at the cost of additional complexity in implementation because pair energies are symmetrical and the pair energy between a rotamer and itself is zero Implementation and efficiency editThe above two criteria are normally applied iteratively until convergence defined as the point at which no more rotamers or pairs can be eliminated Since this is normally a reduction in the sample space by many orders of magnitude simple enumeration will suffice to determine the minimum within this pared down set Given this model it is clear that the DEE algorithm is guaranteed to find the optimal solution that is it is a global optimization process The single rotamer search scales quadratically in time with total number of rotamers The pair search scales cubically and is the slowest part of the algorithm aside from energy calculations This is a dramatic improvement over the brute force enumeration which scales as O p N displaystyle O p N nbsp A large scale benchmark of DEE compared with alternative methods of protein structure prediction and design finds that DEE reliably converges to the optimal solution for protein lengths for which it runs in a reasonable amount of time 2 It significantly outperforms the alternatives under consideration which involved techniques derived from mean field theory genetic algorithms and the Monte Carlo method However the other algorithms are appreciably faster than DEE and thus can be applied to larger and more complex problems their relative accuracy can be extrapolated from a comparison to the DEE solution within the scope of problems accessible to DEE Protein design editMain article Protein design The preceding discussion implicitly assumed that the rotamers r k displaystyle r k nbsp are all different orientations of the same amino acid side chain That is the sequence of the protein was assumed to be fixed It is also possible to allow multiple side chains to compete over a position k displaystyle k nbsp by including both types of side chains in the set of rotamers for that position This allows a novel sequence to be designed onto a given protein backbone A short zinc finger protein fold has been redesigned this way 3 However this greatly increases the number of rotamers per position and still requires a fixed protein length Generalizations editMore powerful and more general criteria have been introduced that improve both the efficiency and the eliminating power of the method for both prediction and design applications One example is a refinement of the singles elimination criterion known as the Goldstein criterion 4 which arises from fairly straightforward algebraic manipulation before applying the minimization E k r k A E k r k B l 1 N min X E k l r k A r l X E k l r k B r l X gt 0 displaystyle E k r k A E k r k B sum l 1 N min X left E kl r k A r l X E kl r k B r l X right gt 0 nbsp Thus rotamer r k A displaystyle r k A nbsp can be eliminated if any alternative rotamer from the set at r k displaystyle r k nbsp contributes less to the total energy than r k A displaystyle r k A nbsp This is an improvement over the original criterion which requires comparison of the best possible that is the smallest energy contribution from r k A displaystyle r k A nbsp with the worst possible contribution from an alternative rotamer An extended discussion of elaborate DEE criteria and a benchmark of their relative performance can be found in 5 References edit Desmet J de Maeyer M Hazes B Lasters I 1992 The dead end elimination theorem and its use in protein side chain positioning Nature 356 539 542 PMID 21488406 Voigt CA Gordon DB Mayo SL 2000 Trading accuracy for speed A quantitative comparison of search algorithms in protein sequence design J Mol Biol 299 3 789 803 Dahiyat BI Mayo SL 1997 De novo protein design fully automated sequence selection Science 278 5335 82 7 Goldstein RF 1994 Efficient rotamer elimination applied to protein side chains and related spin glasses Biophys J 66 5 1335 40 Pierce NA Spriet JA Desmet J Mayo SL 2000 Conformational splitting a more powerful criterion for dead end elimination J Comput Chem 21 999 1009 Retrieved from https en wikipedia org w index php title Dead end elimination amp oldid 1108404907, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.