fbpx
Wikipedia

Protein design

Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function.[1] Proteins can be designed from scratch (de novo design) or by making calculated variants of a known protein structure and its sequence (termed protein redesign). Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.

Rational protein design dates back to the mid-1970s.[2] Recently, however, there were numerous examples of successful rational design of water-soluble and even transmembrane peptides and proteins, in part due to a better understanding of different factors contributing to protein structure stability and development of better computational methods.

Overview and history edit

The goal in rational protein design is to predict amino acid sequences that will fold to a specific protein structure. Although the number of possible protein sequences is vast, growing exponentially with the size of the protein chain, only a subset of them will fold reliably and quickly to one native state. Protein design involves identifying novel sequences within this subset. The native state of a protein is the conformational free energy minimum for the chain. Thus, protein design is the search for sequences that have the chosen structure as a free energy minimum. In a sense, it is the reverse of protein structure prediction. In design, a tertiary structure is specified, and a sequence that will fold to it is identified. Hence, it is also termed inverse folding. Protein design is then an optimization problem: using some scoring criteria, an optimized sequence that will fold to the desired structure is chosen.

When the first proteins were rationally designed during the 1970s and 1980s, the sequence for these was optimized manually based on analyses of other known proteins, the sequence composition, amino acid charges, and the geometry of the desired structure.[2] The first designed proteins are attributed to Bernd Gutte, who designed a reduced version of a known catalyst, bovine ribonuclease, and tertiary structures consisting of beta-sheets and alpha-helices, including a binder of DDT. Urry and colleagues later designed elastin-like fibrous peptides based on rules on sequence composition. Richardson and coworkers designed a 79-residue protein with no sequence homology to a known protein.[2] In the 1990s, the advent of powerful computers, libraries of amino acid conformations, and force fields developed mainly for molecular dynamics simulations enabled the development of structure-based computational protein design tools. Following the development of these computational tools, great success has been achieved over the last 30 years in protein design. The first protein successfully designed completely de novo was done by Stephen Mayo and coworkers in 1997,[3] and, shortly after, in 1999 Peter S. Kim and coworkers designed dimers, trimers, and tetramers of unnatural right-handed coiled coils.[4][5] In 2003, David Baker's laboratory designed a full protein to a fold never seen before in nature.[6] Later, in 2008, Baker's group computationally designed enzymes for two different reactions.[7] In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.[8] Due to these and other successes (e.g., see examples below), protein design has become one of the most important tools available for protein engineering. There is great hope that the design of new proteins, small and large, will have uses in biomedicine and bioengineering.

Underlying models of protein structure and function edit

Protein design programs use computer models of the molecular forces that drive proteins in in vivo environments. In order to make the problem tractable, these forces are simplified by protein design models. Although protein design programs vary greatly, they have to address four main modeling questions: What is the target structure of the design, what flexibility is allowed on the target structure, which sequences are included in the search, and which force field will be used to score sequences and structures.

Target structure edit

 
The Top7 protein was one of the first proteins designed for a fold that had never been seen before in nature[6]

Protein function is heavily dependent on protein structure, and rational protein design uses this relationship to design function by designing proteins that have a target structure or fold. Thus, by definition, in rational protein design the target structure or ensemble of structures must be known beforehand. This contrasts with other forms of protein engineering, such as directed evolution, where a variety of methods are used to find proteins that achieve a specific function, and with protein structure prediction where the sequence is known, but the structure is unknown.

Most often, the target structure is based on a known structure of another protein. However, novel folds not seen in nature have been made increasingly possible. Peter S. Kim and coworkers designed trimers and tetramers of unnatural coiled coils, which had not been seen before in nature.[4][5] The protein Top7, developed in David Baker's lab, was designed completely using protein design algorithms, to a completely novel fold.[6] More recently, Baker and coworkers developed a series of principles to design ideal globular-protein structures based on protein folding funnels that bridge between secondary structure prediction and tertiary structures. These principles, which build on both protein structure prediction and protein design, were used to design five different novel protein topologies.[9]

Sequence space edit

 
FSD-1 (shown in blue, PDB id: 1FSV) was the first de novo computational design of a full protein.[3] The target fold was that of the zinc finger in residues 33–60 of the structure of protein Zif268 (shown in red, PDB id: 1ZAA). The designed sequence had very little sequence identity with any known protein sequence.

In rational protein design, proteins can be redesigned from the sequence and structure of a known protein, or completely from scratch in de novo protein design. In protein redesign, most of the residues in the sequence are maintained as their wild-type amino-acid while a few are allowed to mutate. In de novo design, the entire sequence is designed anew, based on no prior sequence.

Both de novo designs and protein redesigns can establish rules on the sequence space: the specific amino acids that are allowed at each mutable residue position. For example, the composition of the surface of the RSC3 probe to select HIV-broadly neutralizing antibodies was restricted based on evolutionary data and charge balancing. Many of the earliest attempts on protein design were heavily based on empiric rules on the sequence space.[2] Moreover, the design of fibrous proteins usually follows strict rules on the sequence space. Collagen-based designed proteins, for example, are often composed of Gly-Pro-X repeating patterns.[2] The advent of computational techniques allows designing proteins with no human intervention in sequence selection.[3]

Structural flexibility edit

 
Common protein design programs use rotamer libraries to simplify the conformational space of protein side chains. This animation loops through all the rotamers of the isoleucine amino acid based on the Penultimate Rotamer Library (total of 7 rotamers).[10]

In protein design, the target structure (or structures) of the protein are known. However, a rational protein design approach must model some flexibility on the target structure in order to increase the number of sequences that can be designed for that structure and to minimize the chance of a sequence folding to a different structure. For example, in a protein redesign of one small amino acid (such as alanine) in the tightly packed core of a protein, very few mutants would be predicted by a rational design approach to fold to the target structure, if the surrounding side-chains are not allowed to be repacked.

Thus, an essential parameter of any design process is the amount of flexibility allowed for both the side-chains and the backbone. In the simplest models, the protein backbone is kept rigid while some of the protein side-chains are allowed to change conformations. However, side-chains can have many degrees of freedom in their bond lengths, bond angles, and χ dihedral angles. To simplify this space, protein design methods use rotamer libraries that assume ideal values for bond lengths and bond angles, while restricting χ dihedral angles to a few frequently observed low-energy conformations termed rotamers.

Rotamer libraries are derived from the statistical analysis of many protein structures. Backbone-independent rotamer libraries describe all rotamers.[10] Backbone-dependent rotamer libraries, in contrast, describe the rotamers as how likely they are to appear depending on the protein backbone arrangement around the side chain.[11] Most protein design programs use one conformation (e.g., the modal value for rotamer dihedrals in space) or several points in the region described by the rotamer; the OSPREY protein design program, in contrast, models the entire continuous region.[12]

Although rational protein design must preserve the general backbone fold a protein, allowing some backbone flexibility can significantly increase the number of sequences that fold to the structure while maintaining the general fold of the protein.[13] Backbone flexibility is especially important in protein redesign because sequence mutations often result in small changes to the backbone structure. Moreover, backbone flexibility can be essential for more advanced applications of protein design, such as binding prediction and enzyme design. Some models of protein design backbone flexibility include small and continuous global backbone movements, discrete backbone samples around the target fold, backrub motions, and protein loop flexibility.[13][14]

Energy function edit

 
Comparison of various potential energy functions. The most accurate energy are those that use quantum mechanical calculations, but these are too slow for protein design. On the other extreme, heuristic energy functions are based on statistical terms and are very fast. In the middle are molecular mechanics energy functions that are physically-based but are not as computationally expensive as quantum mechanical simulations.[15]

Rational protein design techniques must be able to discriminate sequences that will be stable under the target fold from those that would prefer other low-energy competing states. Thus, protein design requires accurate energy functions that can rank and score sequences by how well they fold to the target structure. At the same time, however, these energy functions must consider the computational challenges behind protein design. One of the most challenging requirements for successful design is an energy function that is both accurate and simple for computational calculations.

The most accurate energy functions are those based on quantum mechanical simulations. However, such simulations are too slow and typically impractical for protein design. Instead, many protein design algorithms use either physics-based energy functions adapted from molecular mechanics simulation programs, knowledge based energy-functions, or a hybrid mix of both. The trend has been toward using more physics-based potential energy functions.[15]

Physics-based energy functions, such as AMBER and CHARMM, are typically derived from quantum mechanical simulations, and experimental data from thermodynamics, crystallography, and spectroscopy.[16] These energy functions typically simplify physical energy function and make them pairwise decomposable, meaning that the total energy of a protein conformation can be calculated by adding the pairwise energy between each atom pair, which makes them attractive for optimization algorithms. Physics-based energy functions typically model an attractive-repulsive Lennard-Jones term between atoms and a pairwise electrostatics coulombic term[17] between non-bonded atoms.

 
Water-mediated hydrogen bonds play a key role in protein–protein binding. One such interaction is shown between residues D457, S365 in the heavy chain of the HIV-broadly-neutralizing antibody VRC01 (green) and residues N58 and Y59 in the HIV envelope protein GP120 (purple).[18]

Statistical potentials, in contrast to physics-based potentials, have the advantage of being fast to compute, of accounting implicitly of complex effects and being less sensitive to small changes in the protein structure.[19] These energy functions are based on deriving energy values from frequency of appearance on a structural database.

Protein design, however, has requirements that can sometimes be limited in molecular mechanics force-fields. Molecular mechanics force-fields, which have been used mostly in molecular dynamics simulations, are optimized for the simulation of single sequences, but protein design searches through many conformations of many sequences. Thus, molecular mechanics force-fields must be tailored for protein design. In practice, protein design energy functions often incorporate both statistical terms and physics-based terms. For example, the Rosetta energy function, one of the most-used energy functions, incorporates physics-based energy terms originating in the CHARMM energy function, and statistical energy terms, such as rotamer probability and knowledge-based electrostatics. Typically, energy functions are highly customized between laboratories, and specifically tailored for every design.[16]

Challenges for effective design energy functions edit

Water makes up most of the molecules surrounding proteins and is the main driver of protein structure. Thus, modeling the interaction between water and protein is vital in protein design. The number of water molecules that interact with a protein at any given time is huge and each one has a large number of degrees of freedom and interaction partners. Instead, protein design programs model most of such water molecules as a continuum, modeling both the hydrophobic effect and solvation polarization.[16]

Individual water molecules can sometimes have a crucial structural role in the core of proteins, and in protein–protein or protein–ligand interactions. Failing to model such waters can result in mispredictions of the optimal sequence of a protein–protein interface. As an alternative, water molecules can be added to rotamers. [16]


As an optimization problem edit

 
This animation illustrates the complexity of a protein design search, which typically compares all the rotamer-conformations from all possible mutations at all residues. In this example, the residues Phe36 and His 106 are allowed to mutate to, respectively, the amino acids Tyr and Asn. Phe and Tyr have 4 rotamers each in the rotamer library, while Asn and His have 7 and 8 rotamers, respectively, in the rotamer library (from the Richardson's penultimate rotamer library[10]). The animation loops through all (4 + 4) x (7 + 8) = 120 possibilities. The structure shown is that of myoglobin, PDB id: 1mbn.

The goal of protein design is to find a protein sequence that will fold to a target structure. A protein design algorithm must, thus, search all the conformations of each sequence, with respect to the target fold, and rank sequences according to the lowest-energy conformation of each one, as determined by the protein design energy function. Thus, a typical input to the protein design algorithm is the target fold, the sequence space, the structural flexibility, and the energy function, while the output is one or more sequences that are predicted to fold stably to the target structure.

The number of candidate protein sequences, however, grows exponentially with the number of protein residues; for example, there are 20100 protein sequences of length 100. Furthermore, even if amino acid side-chain conformations are limited to a few rotamers (see Structural flexibility), this results in an exponential number of conformations for each sequence. Thus, in our 100 residue protein, and assuming that each amino acid has exactly 10 rotamers, a search algorithm that searches this space will have to search over 200100 protein conformations.

The most common energy functions can be decomposed into pairwise terms between rotamers and amino acid types, which casts the problem as a combinatorial one, and powerful optimization algorithms can be used to solve it. In those cases, the total energy of each conformation belonging to each sequence can be formulated as a sum of individual and pairwise terms between residue positions. If a designer is interested only in the best sequence, the protein design algorithm only requires the lowest-energy conformation of the lowest-energy sequence. In these cases, the amino acid identity of each rotamer can be ignored and all rotamers belonging to different amino acids can be treated the same. Let ri be a rotamer at residue position i in the protein chain, and E(ri) the potential energy between the internal atoms of the rotamer. Let E(ri, rj) be the potential energy between ri and rotamer rj at residue position j. Then, we define the optimization problem as one of finding the conformation of minimum energy (ET):

 

(1)

The problem of minimizing ET is an NP-hard problem.[14][20][21] Even though the class of problems is NP-hard, in practice many instances of protein design can be solved exactly or optimized satisfactorily through heuristic methods.

Algorithms edit

Several algorithms have been developed specifically for the protein design problem. These algorithms can be divided into two broad classes: exact algorithms, such as dead-end elimination, that lack runtime guarantees but guarantee the quality of the solution; and heuristic algorithms, such as Monte Carlo, that are faster than exact algorithms but have no guarantees on the optimality of the results. Exact algorithms guarantee that the optimization process produced the optimal according to the protein design model. Thus, if the predictions of exact algorithms fail when these are experimentally validated, then the source of error can be attributed to the energy function, the allowed flexibility, the sequence space or the target structure (e.g., if it cannot be designed for).[22]

Some protein design algorithms are listed below. Although these algorithms address only the most basic formulation of the protein design problem, Equation (1), when the optimization goal changes because designers introduce improvements and extensions to the protein design model, such as improvements to the structural flexibility allowed (e.g., protein backbone flexibility) or including sophisticated energy terms, many of the extensions on protein design that improve modeling are built atop these algorithms. For example, Rosetta Design incorporates sophisticated energy terms, and backbone flexibility using Monte Carlo as the underlying optimizing algorithm. OSPREY's algorithms build on the dead-end elimination algorithm and A* to incorporate continuous backbone and side-chain movements. Thus, these algorithms provide a good perspective on the different kinds of algorithms available for protein design.

In 2020 scientists reported the development of an AI-based process using genome databases for evolution-based designing of novel proteins. They used deep learning to identify design-rules.[23][24] In 2022, a study reported deep learning software that can design proteins that contain prespecified functional sites.[25][26]

With mathematical guarantees edit

Dead-end elimination edit

The dead-end elimination (DEE) algorithm reduces the search space of the problem iteratively by removing rotamers that can be provably shown to be not part of the global lowest energy conformation (GMEC). On each iteration, the dead-end elimination algorithm compares all possible pairs of rotamers at each residue position, and removes each rotamer r′i that can be shown to always be of higher energy than another rotamer ri and is thus not part of the GMEC:

 

Other powerful extensions to the dead-end elimination algorithm include the pairs elimination criterion, and the generalized dead-end elimination criterion. This algorithm has also been extended to handle continuous rotamers with provable guarantees.

Although the Dead-end elimination algorithm runs in polynomial time on each iteration, it cannot guarantee convergence. If, after a certain number of iterations, the dead-end elimination algorithm does not prune any more rotamers, then either rotamers have to be merged or another search algorithm must be used to search the remaining search space. In such cases, the dead-end elimination acts as a pre-filtering algorithm to reduce the search space, while other algorithms, such as A*, Monte Carlo, Linear Programming, or FASTER are used to search the remaining search space.[14]

Branch and bound edit

The protein design conformational space can be represented as a tree, where the protein residues are ordered in an arbitrary way, and the tree branches at each of the rotamers in a residue. Branch and bound algorithms use this representation to efficiently explore the conformation tree: At each branching, branch and bound algorithms bound the conformation space and explore only the promising branches.[14][27][28]

A popular search algorithm for protein design is the A* search algorithm.[14][28] A* computes a lower-bound score on each partial tree path that lower bounds (with guarantees) the energy of each of the expanded rotamers. Each partial conformation is added to a priority queue and at each iteration the partial path with the lowest lower bound is popped from the queue and expanded. The algorithm stops once a full conformation has been enumerated and guarantees that the conformation is the optimal.

The A* score f in protein design consists of two parts, f=g+h. g is the exact energy of the rotamers that have already been assigned in the partial conformation. h is a lower bound on the energy of the rotamers that have not yet been assigned. Each is designed as follows, where d is the index of the last assigned residue in the partial conformation.

 
 

Integer linear programming edit

The problem of optimizing ET (Equation (1)) can be easily formulated as an integer linear program (ILP).[29] One of the most powerful formulations uses binary variables to represent the presence of a rotamer and edges in the final solution, and constraints the solution to have exactly one rotamer for each residue and one pairwise interaction for each pair of residues:

 

s.t.

 
 
 

ILP solvers, such as CPLEX, can compute the exact optimal solution for large instances of protein design problems. These solvers use a linear programming relaxation of the problem, where qi and qij are allowed to take continuous values, in combination with a branch and cut algorithm to search only a small portion of the conformation space for the optimal solution. ILP solvers have been shown to solve many instances of the side-chain placement problem.[29]

Message-passing based approximations to the linear programming dual edit

ILP solvers depend on linear programming (LP) algorithms, such as the Simplex or barrier-based methods to perform the LP relaxation at each branch. These LP algorithms were developed as general-purpose optimization methods and are not optimized for the protein design problem (Equation (1)). In consequence, the LP relaxation becomes the bottleneck of ILP solvers when the problem size is large.[30] Recently, several alternatives based on message-passing algorithms have been designed specifically for the optimization of the LP relaxation of the protein design problem. These algorithms can approximate both the dual or the primal instances of the integer programming, but in order to maintain guarantees on optimality, they are most useful when used to approximate the dual of the protein design problem, because approximating the dual guarantees that no solutions are missed. Message-passing based approximations include the tree reweighted max-product message passing algorithm,[31][32] and the message passing linear programming algorithm.[33]

Optimization algorithms without guarantees edit

Monte Carlo and simulated annealing edit

Monte Carlo is one of the most widely used algorithms for protein design. In its simplest form, a Monte Carlo algorithm selects a residue at random, and in that residue a randomly chosen rotamer (of any amino acid) is evaluated.[21] The new energy of the protein, Enew is compared against the old energy Eold and the new rotamer is accepted with a probability of:

 

where β is the Boltzmann constant and the temperature T can be chosen such that in the initial rounds it is high and it is slowly annealed to overcome local minima.[12]

FASTER edit

The FASTER algorithm uses a combination of deterministic and stochastic criteria to optimize amino acid sequences. FASTER first uses DEE to eliminate rotamers that are not part of the optimal solution. Then, a series of iterative steps optimize the rotamer assignment.[34][35]

Belief propagation edit

In belief propagation for protein design, the algorithm exchanges messages that describe the belief that each residue has about the probability of each rotamer in neighboring residues. The algorithm updates messages on every iteration and iterates until convergence or until a fixed number of iterations. Convergence is not guaranteed in protein design. The message mi→ j(rj that a residue i sends to every rotamer (rj at neighboring residue j is defined as:

 

Both max-product and sum-product belief propagation have been used to optimize protein design.

Applications and examples of designed proteins edit

Enzyme design edit

The design of new enzymes is a use of protein design with huge bioengineering and biomedical applications. In general, designing a protein structure can be different from designing an enzyme, because the design of enzymes must consider many states involved in the catalytic mechanism. However protein design is a prerequisite of de novo enzyme design because, at the very least, the design of catalysts requires a scaffold in which the catalytic mechanism can be inserted.[36]

Great progress in de novo enzyme design, and redesign, was made in the first decade of the 21st century. In three major studies, David Baker and coworkers de novo designed enzymes for the retro-aldol reaction,[37] a Kemp-elimination reaction,[38] and for the Diels-Alder reaction.[39] Furthermore, Stephen Mayo and coworkers developed an iterative method to design the most efficient known enzyme for the Kemp-elimination reaction.[40] Also, in the laboratory of Bruce Donald, computational protein design was used to switch the specificity of one of the protein domains of the nonribosomal peptide synthetase that produces Gramicidin S, from its natural substrate phenylalanine to other noncognate substrates including charged amino acids; the redesigned enzymes had activities close to those of the wild-type.[41]

Design for affinity edit

Protein–protein interactions are involved in most biotic processes. Many of the hardest-to-treat diseases, such as Alzheimer's, many forms of cancer (e.g., TP53), and human immunodeficiency virus (HIV) infection involve protein–protein interactions. Thus, to treat such diseases, it is desirable to design protein or protein-like therapeutics that bind one of the partners of the interaction and, thus, disrupt the disease-causing interaction. This requires designing protein-therapeutics for affinity toward its partner.

Protein–protein interactions can be designed using protein design algorithms because the principles that rule protein stability also rule protein–protein binding. Protein–protein interaction design, however, presents challenges not commonly present in protein design. One of the most important challenges is that, in general, the interfaces between proteins are more polar than protein cores, and binding involves a tradeoff between desolvation and hydrogen bond formation.[42] To overcome this challenge, Bruce Tidor and coworkers developed a method to improve the affinity of antibodies by focusing on electrostatic contributions. They found that, for the antibodies designed in the study, reducing the desolvation costs of the residues in the interface increased the affinity of the binding pair.[42][43][44]

Scoring binding predictions edit

Protein design energy functions must be adapted to score binding predictions because binding involves a trade-off between the lowest-energy conformations of the free proteins (EP and EL) and the lowest-energy conformation of the bound complex (EPL):

 .

The K* algorithm approximates the binding constant of the algorithm by including conformational entropy into the free energy calculation. The K* algorithm considers only the lowest-energy conformations of the free and bound complexes (denoted by the sets P, L, and PL) to approximate the partition functions of each complex:[14]

 

Design for specificity edit

The design of protein–protein interactions must be highly specific because proteins can interact with a large number of proteins; successful design requires selective binders. Thus, protein design algorithms must be able to distinguish between on-target (or positive design) and off-target binding (or negative design).[2][42] One of the most prominent examples of design for specificity is the design of specific bZIP-binding peptides by Amy Keating and coworkers for 19 out of the 20 bZIP families; 8 of these peptides were specific for their intended partner over competing peptides.[42][45][46] Further, positive and negative design was also used by Anderson and coworkers to predict mutations in the active site of a drug target that conferred resistance to a new drug; positive design was used to maintain wild-type activity, while negative design was used to disrupt binding of the drug.[47] Recent computational redesign by Costas Maranas and coworkers was also capable of experimentally switching the cofactor specificity of Candida boidinii xylose reductase from NADPH to NADH.[48]

Protein resurfacing edit

Protein resurfacing consists of designing a protein's surface while preserving the overall fold, core, and boundary regions of the protein intact. Protein resurfacing is especially useful to alter the binding of a protein to other proteins. One of the most important applications of protein resurfacing was the design of the RSC3 probe to select broadly neutralizing HIV antibodies at the NIH Vaccine Research Center. First, residues outside of the binding interface between the gp120 HIV envelope protein and the formerly discovered b12-antibody were selected to be designed. Then, the sequence spaced was selected based on evolutionary information, solubility, similarity with the wild-type, and other considerations. Then the RosettaDesign software was used to find optimal sequences in the selected sequence space. RSC3 was later used to discover the broadly neutralizing antibody VRC01 in the serum of a long-term HIV-infected non-progressor individual.[49]

Design of globular proteins edit

Globular proteins are proteins that contain a hydrophobic core and a hydrophilic surface. Globular proteins often assume a stable structure, unlike fibrous proteins, which have multiple conformations. The three-dimensional structure of globular proteins is typically easier to determine through X-ray crystallography and nuclear magnetic resonance than both fibrous proteins and membrane proteins, which makes globular proteins more attractive for protein design than the other types of proteins. Most successful protein designs have involved globular proteins. Both RSD-1, and Top7 were de novo designs of globular proteins. Five more protein structures were designed, synthesized, and verified in 2012 by the Baker group. These new proteins serve no biotic function, but the structures are intended to act as building-blocks that can be expanded to incorporate functional active sites. The structures were found computationally by using new heuristics based on analyzing the connecting loops between parts of the sequence that specify secondary structures.[50]

Design of membrane proteins edit

Several transmembrane proteins have been successfully designed,[51] along with many other membrane-associated peptides and proteins.[52] Recently, Costas Maranas and his coworkers developed an automated tool[53] to redesign the pore size of Outer Membrane Porin Type-F (OmpF) from E.coli to any desired sub-nm size and assembled them in membranes to perform precise angstrom scale separation.

Other applications edit

One of the most desirable uses for protein design is for biosensors, proteins that will sense the presence of specific compounds. Some attempts in the design of biosensors include sensors for unnatural molecules including TNT.[54] More recently, Kuhlman and coworkers designed a biosensor of the PAK1.[55]

In a sense, protein design is a subset of battery design.[further explanation needed]

See also edit

References edit

  1. ^ Korendovych, Ivan (March 19, 2018). "Minimalist design of peptide and protein catalysts". American Chemical Society. Retrieved March 22, 2018.
  2. ^ a b c d e f Richardson, JS; Richardson, DC (July 1989). "The de novo design of protein structures". Trends in Biochemical Sciences. 14 (7): 304–9. doi:10.1016/0968-0004(89)90070-4. PMID 2672455.
  3. ^ a b c Dahiyat, BI; Mayo, SL (October 3, 1997). "De novo protein design: fully automated sequence selection". Science. 278 (5335): 82–7. CiteSeerX 10.1.1.72.7304. doi:10.1126/science.278.5335.82. PMID 9311930.
  4. ^ a b Gordon, DB; Marshall, SA; Mayo, SL (August 1999). "Energy functions for protein design". Current Opinion in Structural Biology. 9 (4): 509–13. doi:10.1016/s0959-440x(99)80072-4. PMID 10449371.
  5. ^ a b Harbury, PB; Plecs, JJ; Tidor, B; Alber, T; Kim, PS (November 20, 1998). "High-resolution protein design with backbone freedom". Science. 282 (5393): 1462–7. doi:10.1126/science.282.5393.1462. PMID 9822371.
  6. ^ a b c Kuhlman, B; Dantas, G; Ireton, GC; Varani, G; Stoddard, BL; Baker, D (November 21, 2003). "Design of a novel globular protein fold with atomic-level accuracy". Science. 302 (5649): 1364–8. Bibcode:2003Sci...302.1364K. doi:10.1126/science.1089427. PMID 14631033. S2CID 1939390.
  7. ^ Sterner, R; Merkl, R; Raushel, FM (May 2008). "Computational design of enzymes". Chemistry & Biology. 15 (5): 421–3. doi:10.1016/j.chembiol.2008.04.007. PMID 18482694.
  8. ^ Wu, X; Yang, ZY; Li, Y; Hogerkorp, CM; Schief, WR; Seaman, MS; Zhou, T; Schmidt, SD; Wu, L; Xu, L; Longo, NS; McKee, K; O'Dell, S; Louder, MK; Wycuff, DL; Feng, Y; Nason, M; Doria-Rose, N; Connors, M; Kwong, PD; Roederer, M; Wyatt, RT; Nabel, GJ; Mascola, JR (August 13, 2010). "Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1". Science. 329 (5993): 856–61. Bibcode:2010Sci...329..856W. doi:10.1126/science.1187659. PMC 2965066. PMID 20616233.
  9. ^ Höcker, B (November 8, 2012). "Structural biology: A toolbox for protein design". Nature. 491 (7423): 204–5. Bibcode:2012Natur.491..204H. doi:10.1038/491204a. PMID 23135466. S2CID 4426247.
  10. ^ a b c Lovell, SC; Word, JM; Richardson, JS; Richardson, DC (August 15, 2000). "The penultimate rotamer library". Proteins. 40 (3): 389–408. CiteSeerX 10.1.1.555.4071. doi:10.1002/1097-0134(20000815)40:3<389::AID-PROT50>3.0.CO;2-2. PMID 10861930. S2CID 3055173.
  11. ^ Shapovalov, MV; Dunbrack RL, Jr (June 8, 2011). "A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions". Structure. 19 (6): 844–58. doi:10.1016/j.str.2011.03.019. PMC 3118414. PMID 21645855.
  12. ^ a b Samish, I; MacDermaid, CM; Perez-Aguilar, JM; Saven, JG (2011). "Theoretical and computational protein design". Annual Review of Physical Chemistry. 62: 129–49. Bibcode:2011ARPC...62..129S. doi:10.1146/annurev-physchem-032210-103509. PMID 21128762.
  13. ^ a b Mandell, DJ; Kortemme, T (August 2009). "Backbone flexibility in computational protein design" (PDF). Current Opinion in Biotechnology. 20 (4): 420–8. doi:10.1016/j.copbio.2009.07.006. PMID 19709874.
  14. ^ a b c d e f Donald, Bruce R. (2011). Algorithms in Structural Molecular Biology. Cambridge, MA: MIT Press.
  15. ^ a b Boas, F. E. & Harbury, P. B. (2007). "Potential energy functions for protein design". Current Opinion in Structural Biology. 17 (2): 199–204. doi:10.1016/j.sbi.2007.03.006. PMID 17387014.
  16. ^ a b c d Boas, FE; Harbury, PB (April 2007). "Potential energy functions for protein design". Current Opinion in Structural Biology. 17 (2): 199–204. doi:10.1016/j.sbi.2007.03.006. PMID 17387014.
  17. ^ Vizcarra, CL; Mayo, SL (December 2005). "Electrostatics in computational protein design". Current Opinion in Chemical Biology. 9 (6): 622–6. doi:10.1016/j.cbpa.2005.10.014. PMID 16257567.
  18. ^ Zhou, T; Georgiev, I; Wu, X; Yang, ZY; Dai, K; Finzi, A; Kwon, YD; Scheid, JF; Shi, W; Xu, L; Yang, Y; Zhu, J; Nussenzweig, MC; Sodroski, J; Shapiro, L; Nabel, GJ; Mascola, JR; Kwong, PD (August 13, 2010). "Structural basis for broad and potent neutralization of HIV-1 by antibody VRC01". Science. 329 (5993): 811–7. Bibcode:2010Sci...329..811Z. doi:10.1126/science.1192819. PMC 2981354. PMID 20616231.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  19. ^ Mendes, J; Guerois, R; Serrano, L (August 2002). "Energy estimation in protein design". Current Opinion in Structural Biology. 12 (4): 441–6. doi:10.1016/s0959-440x(02)00345-7. PMID 12163065.
  20. ^ Pierce, NA; Winfree, E (October 2002). "Protein design is NP-hard". Protein Engineering. 15 (10): 779–82. doi:10.1093/protein/15.10.779. PMID 12468711.
  21. ^ a b Voigt, CA; Gordon, DB; Mayo, SL (June 9, 2000). "Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design". Journal of Molecular Biology. 299 (3): 789–803. CiteSeerX 10.1.1.138.2023. doi:10.1006/jmbi.2000.3758. PMID 10835284.
  22. ^ Hong, EJ; Lippow, SM; Tidor, B; Lozano-Pérez, T (September 2009). "Rotamer optimization for protein design through MAP estimation and problem-size reduction". Journal of Computational Chemistry. 30 (12): 1923–45. doi:10.1002/jcc.21188. PMC 3495010. PMID 19123203.
  23. ^ "Machine learning reveals recipe for building artificial proteins". phys.org. Retrieved August 17, 2020.
  24. ^ Russ, William P.; Figliuzzi, Matteo; Stocker, Christian; Barrat-Charlaix, Pierre; Socolich, Michael; Kast, Peter; Hilvert, Donald; Monasson, Remi; Cocco, Simona; Weigt, Martin; Ranganathan, Rama (2020). "An evolution-based model for designing chorismatemutase enzymes". Science. 369 (6502): 440–445. Bibcode:2020Sci...369..440R. doi:10.1126/science.aba3304. PMID 32703877. S2CID 220714458.
  25. ^ "Biologists train AI to generate medicines and vaccines". University of Washington-Harborview Medical Center.
  26. ^ Wang, Jue; Lisanza, Sidney; Juergens, David; Tischer, Doug; Watson, Joseph L.; Castro, Karla M.; Ragotte, Robert; Saragovi, Amijai; Milles, Lukas F.; Baek, Minkyung; Anishchenko, Ivan; Yang, Wei; Hicks, Derrick R.; Expòsit, Marc; Schlichthaerle, Thomas; Chun, Jung-Ho; Dauparas, Justas; Bennett, Nathaniel; Wicky, Basile I. M.; Muenks, Andrew; DiMaio, Frank; Correia, Bruno; Ovchinnikov, Sergey; Baker, David (July 22, 2022). "Scaffolding protein functional sites using deep learning" (PDF). Science. 377 (6604): 387–394. Bibcode:2022Sci...377..387W. doi:10.1126/science.abn2100. ISSN 0036-8075. PMC 9621694. PMID 35862514.
  27. ^ Gordon, DB; Mayo, SL (September 15, 1999). "Branch-and-terminate: a combinatorial optimization algorithm for protein design". Structure. 7 (9): 1089–98. doi:10.1016/s0969-2126(99)80176-2. PMID 10508778.
  28. ^ a b Leach, AR; Lemon, AP (November 1, 1998). "Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm". Proteins. 33 (2): 227–39. CiteSeerX 10.1.1.133.7986. doi:10.1002/(sici)1097-0134(19981101)33:2<227::aid-prot7>3.0.co;2-f. PMID 9779790. S2CID 12872539.
  29. ^ a b Kingsford, CL; Chazelle, B; Singh, M (April 1, 2005). "Solving and analyzing side-chain positioning problems using linear and integer programming". Bioinformatics. 21 (7): 1028–36. doi:10.1093/bioinformatics/bti144. PMID 15546935.
  30. ^ Yanover, Chen; Talya Meltzer; Yair Weiss (2006). "Linear Programming Relaxations and Belief Propagation – An Empirical Study". Journal of Machine Learning Research. 7: 1887–1907.
  31. ^ Wainwright, Martin J; Tommi S. Jaakkola; Alan S. Willsky (2005). "MAP estimation via agreement on trees: message-passing and linear programming". IEEE Transactions on Information Theory. 51 (11): 3697–3717. CiteSeerX 10.1.1.71.9565. doi:10.1109/tit.2005.856938. S2CID 10007532.
  32. ^ Kolmogorov, Vladimir (October 28, 2006). "Convergent tree-reweighted message passing for energy minimization". IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (10): 1568–1583. doi:10.1109/TPAMI.2006.200. PMID 16986540. S2CID 8616813.
  33. ^ Globerson, Amir; Tommi S. Jaakkola (2007). "Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations". Advances in Neural Information Processing Systems.
  34. ^ Allen, BD; Mayo, SL (July 30, 2006). "Dramatic performance enhancements for the FASTER optimization algorithm". Journal of Computational Chemistry. 27 (10): 1071–5. CiteSeerX 10.1.1.425.5418. doi:10.1002/jcc.20420. PMID 16685715. S2CID 769053.
  35. ^ Desmet, J; Spriet, J; Lasters, I (July 1, 2002). "Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization". Proteins. 48 (1): 31–43. doi:10.1002/prot.10131. PMID 12012335. S2CID 21524437.
  36. ^ Baker, D (October 2010). "An exciting but challenging road ahead for computational enzyme design". Protein Science. 19 (10): 1817–9. doi:10.1002/pro.481. PMC 2998717. PMID 20717908.
  37. ^ Jiang, Lin; Althoff, Eric A.; Clemente, Fernando R.; Doyle, Lindsey; Rothlisberger, Daniela; Zanghellini, Alexandre; Gallaher, Jasmine L.; Betker, Jamie L.; Tanaka, Fujie (2008). "De Novo Computational Design of Retro-Aldol Enzymes". Science. 319 (5868): 1387–91. Bibcode:2008Sci...319.1387J. doi:10.1126/science.1152692. PMC 3431203. PMID 18323453.
  38. ^ Röthlisberger, Daniela; Khersonsky, Olga; Wollacott, Andrew M.; Jiang, Lin; Dechancie, Jason; Betker, Jamie; Gallaher, Jasmine L.; Althoff, Eric A.; Zanghellini, Alexandre (2008). "Kemp elimination catalysts by computational enzyme design". Nature. 453 (7192): 190–5. Bibcode:2008Natur.453..190R. doi:10.1038/nature06879. PMID 18354394.
  39. ^ Siegel, JB; Zanghellini, A; Lovick, HM; Kiss, G; Lambert, AR; St Clair, JL; Gallaher, JL; Hilvert, D; Gelb, MH; Stoddard, BL; Houk, KN; Michael, FE; Baker, D (July 16, 2010). "Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction". Science. 329 (5989): 309–13. Bibcode:2010Sci...329..309S. doi:10.1126/science.1190239. PMC 3241958. PMID 20647463.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  40. ^ Privett, HK; Kiss, G; Lee, TM; Blomberg, R; Chica, RA; Thomas, LM; Hilvert, D; Houk, KN; Mayo, SL (March 6, 2012). "Iterative approach to computational enzyme design". Proceedings of the National Academy of Sciences of the United States of America. 109 (10): 3790–5. Bibcode:2012PNAS..109.3790P. doi:10.1073/pnas.1118082108. PMC 3309769. PMID 22357762.
  41. ^ Chen, CY; Georgiev, I; Anderson, AC; Donald, BR (March 10, 2009). "Computational structure-based redesign of enzyme activity". Proceedings of the National Academy of Sciences of the United States of America. 106 (10): 3764–9. Bibcode:2009PNAS..106.3764C. doi:10.1073/pnas.0900266106. PMC 2645347. PMID 19228942.
  42. ^ a b c d Karanicolas, J; Kuhlman, B (August 2009). "Computational design of affinity and specificity at protein–protein interfaces". Current Opinion in Structural Biology. 19 (4): 458–63. doi:10.1016/j.sbi.2009.07.005. PMC 2882636. PMID 19646858.
  43. ^ Shoichet, BK (October 2007). "No free energy lunch". Nature Biotechnology. 25 (10): 1109–10. doi:10.1038/nbt1007-1109. PMID 17921992. S2CID 5527226.
  44. ^ Lippow, SM; Wittrup, KD; Tidor, B (October 2007). "Computational design of antibody-affinity improvement beyond in vivo maturation". Nature Biotechnology. 25 (10): 1171–6. doi:10.1038/nbt1336. PMC 2803018. PMID 17891135.
  45. ^ Schreiber, G; Keating, AE (February 2011). "Protein binding specificity versus promiscuity". Current Opinion in Structural Biology. 21 (1): 50–61. doi:10.1016/j.sbi.2010.10.002. PMC 3053118. PMID 21071205.
  46. ^ Grigoryan, G; Reinke, AW; Keating, AE (April 16, 2009). "Design of protein-interaction specificity gives selective bZIP-binding peptides". Nature. 458 (7240): 859–64. Bibcode:2009Natur.458..859G. doi:10.1038/nature07885. PMC 2748673. PMID 19370028.
  47. ^ Frey, KM; Georgiev, I; Donald, BR; Anderson, AC (August 3, 2010). "Predicting resistance mutations using protein design algorithms". Proceedings of the National Academy of Sciences of the United States of America. 107 (31): 13707–12. Bibcode:2010PNAS..10713707F. doi:10.1073/pnas.1002162107. PMC 2922245. PMID 20643959.
  48. ^ Khoury, GA; Fazelinia, H; Chin, JW; Pantazes, RJ; Cirino, PC; Maranas, CD (October 2009). "Computational design of Candida boidinii xylose reductase for altered cofactor specificity". Protein Science. 18 (10): 2125–38. doi:10.1002/pro.227. PMC 2786976. PMID 19693930.
  49. ^ Burton, DR; Weiss, RA (August 13, 2010). "AIDS/HIV. A boost for HIV vaccine design". Science. 329 (5993): 770–3. Bibcode:2010Sci...329..770B. doi:10.1126/science.1194693. PMID 20705840. S2CID 206528638.
  50. ^ Jessica Marshall (November 7, 2012). "Proteins made to order". Nature News. Retrieved November 17, 2012.
  51. ^ Designed transmembrane alpha-hairpin proteins in OPM database
  52. ^ Designed membrane-associated peptides and proteins in OPM database
  53. ^ Chowdhury, Ratul; Kumar, Manish; Maranas, Costas D.; Golbeck, John H.; Baker, Carol; Prabhakar, Jeevan; Grisewood, Matthew; Decker, Karl; Shankla, Manish (September 10, 2018). "PoreDesigner for tuning solute selectivity in a robust and highly permeable outer membrane pore". Nature Communications. 9 (1): 3661. Bibcode:2018NatCo...9.3661C. doi:10.1038/s41467-018-06097-1. ISSN 2041-1723. PMC 6131167. PMID 30202038.
  54. ^ Looger, Loren L.; Dwyer, Mary A.; Smith, James J. & Hellinga, Homme W. (2003). "Computational design of receptor and sensor proteins with novel functions". Nature. 423 (6936): 185–190. Bibcode:2003Natur.423..185L. doi:10.1038/nature01556. PMID 12736688. S2CID 4387641.
  55. ^ Jha, RK; Wu, YI; Zawistowski, JS; MacNevin, C; Hahn, KM; Kuhlman, B (October 21, 2011). "Redesign of the PAK1 autoinhibitory domain for enhanced stability and affinity in biosensor applications". Journal of Molecular Biology. 413 (2): 513–22. doi:10.1016/j.jmb.2011.08.022. PMC 3202338. PMID 21888918.

Further reading edit

  • Donald, Bruce R. (2011). Algorithms in Structural Molecular Biology. Cambridge, MA: MIT Press.
  • Sander, Chris; Vriend, Gerrit; Bazan, Fernando; Horovitz, Amnon; Nakamura, Haruki; Ribas, Luis; Finkelstein, Alexei V.; Lockhart, Andrew; Merkl, Rainer; et al. (1992). "Protein Design on computers. Five new proteins: Shpilka, Grendel, Fingerclasp, Leather and Aida". Proteins: Structure, Function, and Bioinformatics. 12 (2): 105–110. doi:10.1002/prot.340120203. PMID 1603799. S2CID 38986245.
  • Jin, Wenzhen; Kambara, Ohki; Sasakawa, Hiroaki; Tamura, Atsuo & Takada, Shoji (2003). "De Novo Design of Foldable Proteins with Smooth Folding Funnel: Automated Negative Design and Experimental Verification". Structure. 11 (5): 581–590. doi:10.1016/S0969-2126(03)00075-3. PMID 12737823.
  • Pokala, Navin & Handel, Tracy M. (2005). "Energy Functions for Protein Design: Adjustment with Protein–Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity". Journal of Molecular Biology. 347 (1): 203–227. doi:10.1016/j.jmb.2004.12.019. PMID 15733929.

protein, design, this, article, about, rational, protein, design, broader, engineering, proteins, protein, engineering, rational, design, protein, molecules, design, novel, activity, behavior, purpose, advance, basic, understanding, protein, function, proteins. This article is about rational protein design For the broader engineering of proteins see Protein engineering Protein design is the rational design of new protein molecules to design novel activity behavior or purpose and to advance basic understanding of protein function 1 Proteins can be designed from scratch de novo design or by making calculated variants of a known protein structure and its sequence termed protein redesign Rational protein design approaches make protein sequence predictions that will fold to specific structures These predicted sequences can then be validated experimentally through methods such as peptide synthesis site directed mutagenesis or artificial gene synthesis Rational protein design dates back to the mid 1970s 2 Recently however there were numerous examples of successful rational design of water soluble and even transmembrane peptides and proteins in part due to a better understanding of different factors contributing to protein structure stability and development of better computational methods Contents 1 Overview and history 2 Underlying models of protein structure and function 2 1 Target structure 2 2 Sequence space 2 3 Structural flexibility 2 4 Energy function 2 4 1 Challenges for effective design energy functions 3 As an optimization problem 4 Algorithms 4 1 With mathematical guarantees 4 1 1 Dead end elimination 4 1 2 Branch and bound 4 1 3 Integer linear programming 4 1 4 Message passing based approximations to the linear programming dual 4 2 Optimization algorithms without guarantees 4 2 1 Monte Carlo and simulated annealing 4 2 2 FASTER 4 2 3 Belief propagation 5 Applications and examples of designed proteins 5 1 Enzyme design 5 2 Design for affinity 5 2 1 Scoring binding predictions 5 3 Design for specificity 5 4 Protein resurfacing 5 5 Design of globular proteins 5 6 Design of membrane proteins 5 7 Other applications 6 See also 7 References 8 Further readingOverview and history editThe goal in rational protein design is to predict amino acid sequences that will fold to a specific protein structure Although the number of possible protein sequences is vast growing exponentially with the size of the protein chain only a subset of them will fold reliably and quickly to one native state Protein design involves identifying novel sequences within this subset The native state of a protein is the conformational free energy minimum for the chain Thus protein design is the search for sequences that have the chosen structure as a free energy minimum In a sense it is the reverse of protein structure prediction In design a tertiary structure is specified and a sequence that will fold to it is identified Hence it is also termed inverse folding Protein design is then an optimization problem using some scoring criteria an optimized sequence that will fold to the desired structure is chosen When the first proteins were rationally designed during the 1970s and 1980s the sequence for these was optimized manually based on analyses of other known proteins the sequence composition amino acid charges and the geometry of the desired structure 2 The first designed proteins are attributed to Bernd Gutte who designed a reduced version of a known catalyst bovine ribonuclease and tertiary structures consisting of beta sheets and alpha helices including a binder of DDT Urry and colleagues later designed elastin like fibrous peptides based on rules on sequence composition Richardson and coworkers designed a 79 residue protein with no sequence homology to a known protein 2 In the 1990s the advent of powerful computers libraries of amino acid conformations and force fields developed mainly for molecular dynamics simulations enabled the development of structure based computational protein design tools Following the development of these computational tools great success has been achieved over the last 30 years in protein design The first protein successfully designed completely de novo was done by Stephen Mayo and coworkers in 1997 3 and shortly after in 1999 Peter S Kim and coworkers designed dimers trimers and tetramers of unnatural right handed coiled coils 4 5 In 2003 David Baker s laboratory designed a full protein to a fold never seen before in nature 6 Later in 2008 Baker s group computationally designed enzymes for two different reactions 7 In 2010 one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe 8 Due to these and other successes e g see examples below protein design has become one of the most important tools available for protein engineering There is great hope that the design of new proteins small and large will have uses in biomedicine and bioengineering Underlying models of protein structure and function editProtein design programs use computer models of the molecular forces that drive proteins in in vivo environments In order to make the problem tractable these forces are simplified by protein design models Although protein design programs vary greatly they have to address four main modeling questions What is the target structure of the design what flexibility is allowed on the target structure which sequences are included in the search and which force field will be used to score sequences and structures Target structure edit nbsp The Top7 protein was one of the first proteins designed for a fold that had never been seen before in nature 6 Protein function is heavily dependent on protein structure and rational protein design uses this relationship to design function by designing proteins that have a target structure or fold Thus by definition in rational protein design the target structure or ensemble of structures must be known beforehand This contrasts with other forms of protein engineering such as directed evolution where a variety of methods are used to find proteins that achieve a specific function and with protein structure prediction where the sequence is known but the structure is unknown Most often the target structure is based on a known structure of another protein However novel folds not seen in nature have been made increasingly possible Peter S Kim and coworkers designed trimers and tetramers of unnatural coiled coils which had not been seen before in nature 4 5 The protein Top7 developed in David Baker s lab was designed completely using protein design algorithms to a completely novel fold 6 More recently Baker and coworkers developed a series of principles to design ideal globular protein structures based on protein folding funnels that bridge between secondary structure prediction and tertiary structures These principles which build on both protein structure prediction and protein design were used to design five different novel protein topologies 9 Sequence space edit nbsp FSD 1 shown in blue PDB id 1FSV was the first de novo computational design of a full protein 3 The target fold was that of the zinc finger in residues 33 60 of the structure of protein Zif268 shown in red PDB id 1ZAA The designed sequence had very little sequence identity with any known protein sequence In rational protein design proteins can be redesigned from the sequence and structure of a known protein or completely from scratch in de novo protein design In protein redesign most of the residues in the sequence are maintained as their wild type amino acid while a few are allowed to mutate In de novo design the entire sequence is designed anew based on no prior sequence Both de novo designs and protein redesigns can establish rules on the sequence space the specific amino acids that are allowed at each mutable residue position For example the composition of the surface of the RSC3 probe to select HIV broadly neutralizing antibodies was restricted based on evolutionary data and charge balancing Many of the earliest attempts on protein design were heavily based on empiric rules on the sequence space 2 Moreover the design of fibrous proteins usually follows strict rules on the sequence space Collagen based designed proteins for example are often composed of Gly Pro X repeating patterns 2 The advent of computational techniques allows designing proteins with no human intervention in sequence selection 3 Structural flexibility edit nbsp Common protein design programs use rotamer libraries to simplify the conformational space of protein side chains This animation loops through all the rotamers of the isoleucine amino acid based on the Penultimate Rotamer Library total of 7 rotamers 10 In protein design the target structure or structures of the protein are known However a rational protein design approach must model some flexibility on the target structure in order to increase the number of sequences that can be designed for that structure and to minimize the chance of a sequence folding to a different structure For example in a protein redesign of one small amino acid such as alanine in the tightly packed core of a protein very few mutants would be predicted by a rational design approach to fold to the target structure if the surrounding side chains are not allowed to be repacked Thus an essential parameter of any design process is the amount of flexibility allowed for both the side chains and the backbone In the simplest models the protein backbone is kept rigid while some of the protein side chains are allowed to change conformations However side chains can have many degrees of freedom in their bond lengths bond angles and x dihedral angles To simplify this space protein design methods use rotamer libraries that assume ideal values for bond lengths and bond angles while restricting x dihedral angles to a few frequently observed low energy conformations termed rotamers Rotamer libraries are derived from the statistical analysis of many protein structures Backbone independent rotamer libraries describe all rotamers 10 Backbone dependent rotamer libraries in contrast describe the rotamers as how likely they are to appear depending on the protein backbone arrangement around the side chain 11 Most protein design programs use one conformation e g the modal value for rotamer dihedrals in space or several points in the region described by the rotamer the OSPREY protein design program in contrast models the entire continuous region 12 Although rational protein design must preserve the general backbone fold a protein allowing some backbone flexibility can significantly increase the number of sequences that fold to the structure while maintaining the general fold of the protein 13 Backbone flexibility is especially important in protein redesign because sequence mutations often result in small changes to the backbone structure Moreover backbone flexibility can be essential for more advanced applications of protein design such as binding prediction and enzyme design Some models of protein design backbone flexibility include small and continuous global backbone movements discrete backbone samples around the target fold backrub motions and protein loop flexibility 13 14 Energy function edit nbsp Comparison of various potential energy functions The most accurate energy are those that use quantum mechanical calculations but these are too slow for protein design On the other extreme heuristic energy functions are based on statistical terms and are very fast In the middle are molecular mechanics energy functions that are physically based but are not as computationally expensive as quantum mechanical simulations 15 Rational protein design techniques must be able to discriminate sequences that will be stable under the target fold from those that would prefer other low energy competing states Thus protein design requires accurate energy functions that can rank and score sequences by how well they fold to the target structure At the same time however these energy functions must consider the computational challenges behind protein design One of the most challenging requirements for successful design is an energy function that is both accurate and simple for computational calculations The most accurate energy functions are those based on quantum mechanical simulations However such simulations are too slow and typically impractical for protein design Instead many protein design algorithms use either physics based energy functions adapted from molecular mechanics simulation programs knowledge based energy functions or a hybrid mix of both The trend has been toward using more physics based potential energy functions 15 Physics based energy functions such as AMBER and CHARMM are typically derived from quantum mechanical simulations and experimental data from thermodynamics crystallography and spectroscopy 16 These energy functions typically simplify physical energy function and make them pairwise decomposable meaning that the total energy of a protein conformation can be calculated by adding the pairwise energy between each atom pair which makes them attractive for optimization algorithms Physics based energy functions typically model an attractive repulsive Lennard Jones term between atoms and a pairwise electrostatics coulombic term 17 between non bonded atoms nbsp Water mediated hydrogen bonds play a key role in protein protein binding One such interaction is shown between residues D457 S365 in the heavy chain of the HIV broadly neutralizing antibody VRC01 green and residues N58 and Y59 in the HIV envelope protein GP120 purple 18 Statistical potentials in contrast to physics based potentials have the advantage of being fast to compute of accounting implicitly of complex effects and being less sensitive to small changes in the protein structure 19 These energy functions are based on deriving energy values from frequency of appearance on a structural database Protein design however has requirements that can sometimes be limited in molecular mechanics force fields Molecular mechanics force fields which have been used mostly in molecular dynamics simulations are optimized for the simulation of single sequences but protein design searches through many conformations of many sequences Thus molecular mechanics force fields must be tailored for protein design In practice protein design energy functions often incorporate both statistical terms and physics based terms For example the Rosetta energy function one of the most used energy functions incorporates physics based energy terms originating in the CHARMM energy function and statistical energy terms such as rotamer probability and knowledge based electrostatics Typically energy functions are highly customized between laboratories and specifically tailored for every design 16 Challenges for effective design energy functions edit Water makes up most of the molecules surrounding proteins and is the main driver of protein structure Thus modeling the interaction between water and protein is vital in protein design The number of water molecules that interact with a protein at any given time is huge and each one has a large number of degrees of freedom and interaction partners Instead protein design programs model most of such water molecules as a continuum modeling both the hydrophobic effect and solvation polarization 16 Individual water molecules can sometimes have a crucial structural role in the core of proteins and in protein protein or protein ligand interactions Failing to model such waters can result in mispredictions of the optimal sequence of a protein protein interface As an alternative water molecules can be added to rotamers 16 As an optimization problem edit nbsp This animation illustrates the complexity of a protein design search which typically compares all the rotamer conformations from all possible mutations at all residues In this example the residues Phe36 and His 106 are allowed to mutate to respectively the amino acids Tyr and Asn Phe and Tyr have 4 rotamers each in the rotamer library while Asn and His have 7 and 8 rotamers respectively in the rotamer library from the Richardson s penultimate rotamer library 10 The animation loops through all 4 4 x 7 8 120 possibilities The structure shown is that of myoglobin PDB id 1mbn The goal of protein design is to find a protein sequence that will fold to a target structure A protein design algorithm must thus search all the conformations of each sequence with respect to the target fold and rank sequences according to the lowest energy conformation of each one as determined by the protein design energy function Thus a typical input to the protein design algorithm is the target fold the sequence space the structural flexibility and the energy function while the output is one or more sequences that are predicted to fold stably to the target structure The number of candidate protein sequences however grows exponentially with the number of protein residues for example there are 20100 protein sequences of length 100 Furthermore even if amino acid side chain conformations are limited to a few rotamers see Structural flexibility this results in an exponential number of conformations for each sequence Thus in our 100 residue protein and assuming that each amino acid has exactly 10 rotamers a search algorithm that searches this space will have to search over 200100 protein conformations The most common energy functions can be decomposed into pairwise terms between rotamers and amino acid types which casts the problem as a combinatorial one and powerful optimization algorithms can be used to solve it In those cases the total energy of each conformation belonging to each sequence can be formulated as a sum of individual and pairwise terms between residue positions If a designer is interested only in the best sequence the protein design algorithm only requires the lowest energy conformation of the lowest energy sequence In these cases the amino acid identity of each rotamer can be ignored and all rotamers belonging to different amino acids can be treated the same Let ri be a rotamer at residue position i in the protein chain and E ri the potential energy between the internal atoms of the rotamer Let E ri rj be the potential energy between ri and rotamer rj at residue position j Then we define the optimization problem as one of finding the conformation of minimum energy ET min E T i E i r i i j E i j r i r j displaystyle min E T sum i Big E i r i sum i neq j E ij r i r j Big nbsp 1 The problem of minimizing ET is an NP hard problem 14 20 21 Even though the class of problems is NP hard in practice many instances of protein design can be solved exactly or optimized satisfactorily through heuristic methods Algorithms editSeveral algorithms have been developed specifically for the protein design problem These algorithms can be divided into two broad classes exact algorithms such as dead end elimination that lack runtime guarantees but guarantee the quality of the solution and heuristic algorithms such as Monte Carlo that are faster than exact algorithms but have no guarantees on the optimality of the results Exact algorithms guarantee that the optimization process produced the optimal according to the protein design model Thus if the predictions of exact algorithms fail when these are experimentally validated then the source of error can be attributed to the energy function the allowed flexibility the sequence space or the target structure e g if it cannot be designed for 22 Some protein design algorithms are listed below Although these algorithms address only the most basic formulation of the protein design problem Equation 1 when the optimization goal changes because designers introduce improvements and extensions to the protein design model such as improvements to the structural flexibility allowed e g protein backbone flexibility or including sophisticated energy terms many of the extensions on protein design that improve modeling are built atop these algorithms For example Rosetta Design incorporates sophisticated energy terms and backbone flexibility using Monte Carlo as the underlying optimizing algorithm OSPREY s algorithms build on the dead end elimination algorithm and A to incorporate continuous backbone and side chain movements Thus these algorithms provide a good perspective on the different kinds of algorithms available for protein design In 2020 scientists reported the development of an AI based process using genome databases for evolution based designing of novel proteins They used deep learning to identify design rules 23 24 In 2022 a study reported deep learning software that can design proteins that contain prespecified functional sites 25 26 With mathematical guarantees edit Dead end elimination edit Main article Dead end elimination The dead end elimination DEE algorithm reduces the search space of the problem iteratively by removing rotamers that can be provably shown to be not part of the global lowest energy conformation GMEC On each iteration the dead end elimination algorithm compares all possible pairs of rotamers at each residue position and removes each rotamer r i that can be shown to always be of higher energy than another rotamer ri and is thus not part of the GMEC E r i j i min r j E r i r j gt E r i j i max r j E r i r j displaystyle E r i prime sum j neq i min r j E r i prime r j gt E r i sum j neq i max r j E r i r j nbsp Other powerful extensions to the dead end elimination algorithm include the pairs elimination criterion and the generalized dead end elimination criterion This algorithm has also been extended to handle continuous rotamers with provable guarantees Although the Dead end elimination algorithm runs in polynomial time on each iteration it cannot guarantee convergence If after a certain number of iterations the dead end elimination algorithm does not prune any more rotamers then either rotamers have to be merged or another search algorithm must be used to search the remaining search space In such cases the dead end elimination acts as a pre filtering algorithm to reduce the search space while other algorithms such as A Monte Carlo Linear Programming or FASTER are used to search the remaining search space 14 Branch and bound edit Main article Branch and bound The protein design conformational space can be represented as a tree where the protein residues are ordered in an arbitrary way and the tree branches at each of the rotamers in a residue Branch and bound algorithms use this representation to efficiently explore the conformation tree At each branching branch and bound algorithms bound the conformation space and explore only the promising branches 14 27 28 A popular search algorithm for protein design is the A search algorithm 14 28 A computes a lower bound score on each partial tree path that lower bounds with guarantees the energy of each of the expanded rotamers Each partial conformation is added to a priority queue and at each iteration the partial path with the lowest lower bound is popped from the queue and expanded The algorithm stops once a full conformation has been enumerated and guarantees that the conformation is the optimal The A score f in protein design consists of two parts f g h g is the exact energy of the rotamers that have already been assigned in the partial conformation h is a lower bound on the energy of the rotamers that have not yet been assigned Each is designed as follows where d is the index of the last assigned residue in the partial conformation g i 1 d E r i j i 1 d E r i r j displaystyle g sum i 1 d E r i sum j i 1 d E r i r j nbsp h j d 1 n min r j E r j i 1 d E r i r j k j 1 n min r k E r j r k displaystyle h sum j d 1 n min r j E r j sum i 1 d E r i r j sum k j 1 n min r k E r j r k nbsp Integer linear programming edit Further information Linear programming Integer unknowns and Integer programming The problem of optimizing ET Equation 1 can be easily formulated as an integer linear program ILP 29 One of the most powerful formulations uses binary variables to represent the presence of a rotamer and edges in the final solution and constraints the solution to have exactly one rotamer for each residue and one pairwise interaction for each pair of residues min i r i E i r i q i r i j i r j E i j r i r j q i j r i r j displaystyle min sum i sum r i E i r i q i r i sum j neq i sum r j E ij r i r j q ij r i r j nbsp s t r i q i r i 1 i displaystyle sum r i q i r i 1 forall i nbsp r j q i j r i r j q i r i i r i j displaystyle sum r j q ij r i r j q i r i forall i r i j nbsp q i q i j 0 1 displaystyle q i q ij in 0 1 nbsp ILP solvers such as CPLEX can compute the exact optimal solution for large instances of protein design problems These solvers use a linear programming relaxation of the problem where qi and qij are allowed to take continuous values in combination with a branch and cut algorithm to search only a small portion of the conformation space for the optimal solution ILP solvers have been shown to solve many instances of the side chain placement problem 29 Message passing based approximations to the linear programming dual edit ILP solvers depend on linear programming LP algorithms such as the Simplex or barrier based methods to perform the LP relaxation at each branch These LP algorithms were developed as general purpose optimization methods and are not optimized for the protein design problem Equation 1 In consequence the LP relaxation becomes the bottleneck of ILP solvers when the problem size is large 30 Recently several alternatives based on message passing algorithms have been designed specifically for the optimization of the LP relaxation of the protein design problem These algorithms can approximate both the dual or the primal instances of the integer programming but in order to maintain guarantees on optimality they are most useful when used to approximate the dual of the protein design problem because approximating the dual guarantees that no solutions are missed Message passing based approximations include the tree reweighted max product message passing algorithm 31 32 and the message passing linear programming algorithm 33 Optimization algorithms without guarantees edit Monte Carlo and simulated annealing edit Monte Carlo is one of the most widely used algorithms for protein design In its simplest form a Monte Carlo algorithm selects a residue at random and in that residue a randomly chosen rotamer of any amino acid is evaluated 21 The new energy of the protein Enew is compared against the old energy Eold and the new rotamer is accepted with a probability of p e b E new E old displaystyle p e beta E text new E text old nbsp where b is the Boltzmann constant and the temperature T can be chosen such that in the initial rounds it is high and it is slowly annealed to overcome local minima 12 FASTER edit The FASTER algorithm uses a combination of deterministic and stochastic criteria to optimize amino acid sequences FASTER first uses DEE to eliminate rotamers that are not part of the optimal solution Then a series of iterative steps optimize the rotamer assignment 34 35 Belief propagation edit In belief propagation for protein design the algorithm exchanges messages that describe the belief that each residue has about the probability of each rotamer in neighboring residues The algorithm updates messages on every iteration and iterates until convergence or until a fixed number of iterations Convergence is not guaranteed in protein design The message mi j rj that a residue i sends to every rotamer rj at neighboring residue j is defined as m i j r j max r i e E i r i E i j r i r j T k N i j m k i r i displaystyle m i to j r j max r i Big e frac E i r i E ij r i r j T Big prod k in N i backslash j m k to i r i nbsp Both max product and sum product belief propagation have been used to optimize protein design Applications and examples of designed proteins editEnzyme design edit The design of new enzymes is a use of protein design with huge bioengineering and biomedical applications In general designing a protein structure can be different from designing an enzyme because the design of enzymes must consider many states involved in the catalytic mechanism However protein design is a prerequisite of de novo enzyme design because at the very least the design of catalysts requires a scaffold in which the catalytic mechanism can be inserted 36 Great progress in de novo enzyme design and redesign was made in the first decade of the 21st century In three major studies David Baker and coworkers de novo designed enzymes for the retro aldol reaction 37 a Kemp elimination reaction 38 and for the Diels Alder reaction 39 Furthermore Stephen Mayo and coworkers developed an iterative method to design the most efficient known enzyme for the Kemp elimination reaction 40 Also in the laboratory of Bruce Donald computational protein design was used to switch the specificity of one of the protein domains of the nonribosomal peptide synthetase that produces Gramicidin S from its natural substrate phenylalanine to other noncognate substrates including charged amino acids the redesigned enzymes had activities close to those of the wild type 41 Design for affinity edit Protein protein interactions are involved in most biotic processes Many of the hardest to treat diseases such as Alzheimer s many forms of cancer e g TP53 and human immunodeficiency virus HIV infection involve protein protein interactions Thus to treat such diseases it is desirable to design protein or protein like therapeutics that bind one of the partners of the interaction and thus disrupt the disease causing interaction This requires designing protein therapeutics for affinity toward its partner Protein protein interactions can be designed using protein design algorithms because the principles that rule protein stability also rule protein protein binding Protein protein interaction design however presents challenges not commonly present in protein design One of the most important challenges is that in general the interfaces between proteins are more polar than protein cores and binding involves a tradeoff between desolvation and hydrogen bond formation 42 To overcome this challenge Bruce Tidor and coworkers developed a method to improve the affinity of antibodies by focusing on electrostatic contributions They found that for the antibodies designed in the study reducing the desolvation costs of the residues in the interface increased the affinity of the binding pair 42 43 44 Scoring binding predictions edit Protein design energy functions must be adapted to score binding predictions because binding involves a trade off between the lowest energy conformations of the free proteins EP and EL and the lowest energy conformation of the bound complex EPL D G E P L E P E L displaystyle Delta G E PL E P E L nbsp The K algorithm approximates the binding constant of the algorithm by including conformational entropy into the free energy calculation The K algorithm considers only the lowest energy conformations of the free and bound complexes denoted by the sets P L and PL to approximate the partition functions of each complex 14 K x P L e E x R T x P e E x R T x L e E x R T displaystyle K frac sum limits x in PL e E x RT sum limits x in P e E x RT sum limits x in L e E x RT nbsp Design for specificity edit The design of protein protein interactions must be highly specific because proteins can interact with a large number of proteins successful design requires selective binders Thus protein design algorithms must be able to distinguish between on target or positive design and off target binding or negative design 2 42 One of the most prominent examples of design for specificity is the design of specific bZIP binding peptides by Amy Keating and coworkers for 19 out of the 20 bZIP families 8 of these peptides were specific for their intended partner over competing peptides 42 45 46 Further positive and negative design was also used by Anderson and coworkers to predict mutations in the active site of a drug target that conferred resistance to a new drug positive design was used to maintain wild type activity while negative design was used to disrupt binding of the drug 47 Recent computational redesign by Costas Maranas and coworkers was also capable of experimentally switching the cofactor specificity of Candida boidinii xylose reductase from NADPH to NADH 48 Protein resurfacing edit Protein resurfacing consists of designing a protein s surface while preserving the overall fold core and boundary regions of the protein intact Protein resurfacing is especially useful to alter the binding of a protein to other proteins One of the most important applications of protein resurfacing was the design of the RSC3 probe to select broadly neutralizing HIV antibodies at the NIH Vaccine Research Center First residues outside of the binding interface between the gp120 HIV envelope protein and the formerly discovered b12 antibody were selected to be designed Then the sequence spaced was selected based on evolutionary information solubility similarity with the wild type and other considerations Then the RosettaDesign software was used to find optimal sequences in the selected sequence space RSC3 was later used to discover the broadly neutralizing antibody VRC01 in the serum of a long term HIV infected non progressor individual 49 Design of globular proteins edit Globular proteins are proteins that contain a hydrophobic core and a hydrophilic surface Globular proteins often assume a stable structure unlike fibrous proteins which have multiple conformations The three dimensional structure of globular proteins is typically easier to determine through X ray crystallography and nuclear magnetic resonance than both fibrous proteins and membrane proteins which makes globular proteins more attractive for protein design than the other types of proteins Most successful protein designs have involved globular proteins Both RSD 1 and Top7 were de novo designs of globular proteins Five more protein structures were designed synthesized and verified in 2012 by the Baker group These new proteins serve no biotic function but the structures are intended to act as building blocks that can be expanded to incorporate functional active sites The structures were found computationally by using new heuristics based on analyzing the connecting loops between parts of the sequence that specify secondary structures 50 Design of membrane proteins edit Several transmembrane proteins have been successfully designed 51 along with many other membrane associated peptides and proteins 52 Recently Costas Maranas and his coworkers developed an automated tool 53 to redesign the pore size of Outer Membrane Porin Type F OmpF from E coli to any desired sub nm size and assembled them in membranes to perform precise angstrom scale separation Other applications edit One of the most desirable uses for protein design is for biosensors proteins that will sense the presence of specific compounds Some attempts in the design of biosensors include sensors for unnatural molecules including TNT 54 More recently Kuhlman and coworkers designed a biosensor of the PAK1 55 In a sense protein design is a subset of battery design further explanation needed See also editMolecular design software Protein engineering Protein structure prediction software Comparison of software for molecular mechanics modelingReferences edit Korendovych Ivan March 19 2018 Minimalist design of peptide and protein catalysts American Chemical Society Retrieved March 22 2018 a b c d e f Richardson JS Richardson DC July 1989 The de novo design of protein structures Trends in Biochemical Sciences 14 7 304 9 doi 10 1016 0968 0004 89 90070 4 PMID 2672455 a b c Dahiyat BI Mayo SL October 3 1997 De novo protein design fully automated sequence selection Science 278 5335 82 7 CiteSeerX 10 1 1 72 7304 doi 10 1126 science 278 5335 82 PMID 9311930 a b Gordon DB Marshall SA Mayo SL August 1999 Energy functions for protein design Current Opinion in Structural Biology 9 4 509 13 doi 10 1016 s0959 440x 99 80072 4 PMID 10449371 a b Harbury PB Plecs JJ Tidor B Alber T Kim PS November 20 1998 High resolution protein design with backbone freedom Science 282 5393 1462 7 doi 10 1126 science 282 5393 1462 PMID 9822371 a b c Kuhlman B Dantas G Ireton GC Varani G Stoddard BL Baker D November 21 2003 Design of a novel globular protein fold with atomic level accuracy Science 302 5649 1364 8 Bibcode 2003Sci 302 1364K doi 10 1126 science 1089427 PMID 14631033 S2CID 1939390 Sterner R Merkl R Raushel FM May 2008 Computational design of enzymes Chemistry amp Biology 15 5 421 3 doi 10 1016 j chembiol 2008 04 007 PMID 18482694 Wu X Yang ZY Li Y Hogerkorp CM Schief WR Seaman MS Zhou T Schmidt SD Wu L Xu L Longo NS McKee K O Dell S Louder MK Wycuff DL Feng Y Nason M Doria Rose N Connors M Kwong PD Roederer M Wyatt RT Nabel GJ Mascola JR August 13 2010 Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV 1 Science 329 5993 856 61 Bibcode 2010Sci 329 856W doi 10 1126 science 1187659 PMC 2965066 PMID 20616233 Hocker B November 8 2012 Structural biology A toolbox for protein design Nature 491 7423 204 5 Bibcode 2012Natur 491 204H doi 10 1038 491204a PMID 23135466 S2CID 4426247 a b c Lovell SC Word JM Richardson JS Richardson DC August 15 2000 The penultimate rotamer library Proteins 40 3 389 408 CiteSeerX 10 1 1 555 4071 doi 10 1002 1097 0134 20000815 40 3 lt 389 AID PROT50 gt 3 0 CO 2 2 PMID 10861930 S2CID 3055173 Shapovalov MV Dunbrack RL Jr June 8 2011 A smoothed backbone dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions Structure 19 6 844 58 doi 10 1016 j str 2011 03 019 PMC 3118414 PMID 21645855 a b Samish I MacDermaid CM Perez Aguilar JM Saven JG 2011 Theoretical and computational protein design Annual Review of Physical Chemistry 62 129 49 Bibcode 2011ARPC 62 129S doi 10 1146 annurev physchem 032210 103509 PMID 21128762 a b Mandell DJ Kortemme T August 2009 Backbone flexibility in computational protein design PDF Current Opinion in Biotechnology 20 4 420 8 doi 10 1016 j copbio 2009 07 006 PMID 19709874 a b c d e f Donald Bruce R 2011 Algorithms in Structural Molecular Biology Cambridge MA MIT Press a b Boas F E amp Harbury P B 2007 Potential energy functions for protein design Current Opinion in Structural Biology 17 2 199 204 doi 10 1016 j sbi 2007 03 006 PMID 17387014 a b c d Boas FE Harbury PB April 2007 Potential energy functions for protein design Current Opinion in Structural Biology 17 2 199 204 doi 10 1016 j sbi 2007 03 006 PMID 17387014 Vizcarra CL Mayo SL December 2005 Electrostatics in computational protein design Current Opinion in Chemical Biology 9 6 622 6 doi 10 1016 j cbpa 2005 10 014 PMID 16257567 Zhou T Georgiev I Wu X Yang ZY Dai K Finzi A Kwon YD Scheid JF Shi W Xu L Yang Y Zhu J Nussenzweig MC Sodroski J Shapiro L Nabel GJ Mascola JR Kwong PD August 13 2010 Structural basis for broad and potent neutralization of HIV 1 by antibody VRC01 Science 329 5993 811 7 Bibcode 2010Sci 329 811Z doi 10 1126 science 1192819 PMC 2981354 PMID 20616231 a href Template Cite journal html title Template Cite journal cite journal a CS1 maint multiple names authors list link Mendes J Guerois R Serrano L August 2002 Energy estimation in protein design Current Opinion in Structural Biology 12 4 441 6 doi 10 1016 s0959 440x 02 00345 7 PMID 12163065 Pierce NA Winfree E October 2002 Protein design is NP hard Protein Engineering 15 10 779 82 doi 10 1093 protein 15 10 779 PMID 12468711 a b Voigt CA Gordon DB Mayo SL June 9 2000 Trading accuracy for speed A quantitative comparison of search algorithms in protein sequence design Journal of Molecular Biology 299 3 789 803 CiteSeerX 10 1 1 138 2023 doi 10 1006 jmbi 2000 3758 PMID 10835284 Hong EJ Lippow SM Tidor B Lozano Perez T September 2009 Rotamer optimization for protein design through MAP estimation and problem size reduction Journal of Computational Chemistry 30 12 1923 45 doi 10 1002 jcc 21188 PMC 3495010 PMID 19123203 Machine learning reveals recipe for building artificial proteins phys org Retrieved August 17 2020 Russ William P Figliuzzi Matteo Stocker Christian Barrat Charlaix Pierre Socolich Michael Kast Peter Hilvert Donald Monasson Remi Cocco Simona Weigt Martin Ranganathan Rama 2020 An evolution based model for designing chorismatemutase enzymes Science 369 6502 440 445 Bibcode 2020Sci 369 440R doi 10 1126 science aba3304 PMID 32703877 S2CID 220714458 Biologists train AI to generate medicines and vaccines University of Washington Harborview Medical Center Wang Jue Lisanza Sidney Juergens David Tischer Doug Watson Joseph L Castro Karla M Ragotte Robert Saragovi Amijai Milles Lukas F Baek Minkyung Anishchenko Ivan Yang Wei Hicks Derrick R Exposit Marc Schlichthaerle Thomas Chun Jung Ho Dauparas Justas Bennett Nathaniel Wicky Basile I M Muenks Andrew DiMaio Frank Correia Bruno Ovchinnikov Sergey Baker David July 22 2022 Scaffolding protein functional sites using deep learning PDF Science 377 6604 387 394 Bibcode 2022Sci 377 387W doi 10 1126 science abn2100 ISSN 0036 8075 PMC 9621694 PMID 35862514 Gordon DB Mayo SL September 15 1999 Branch and terminate a combinatorial optimization algorithm for protein design Structure 7 9 1089 98 doi 10 1016 s0969 2126 99 80176 2 PMID 10508778 a b Leach AR Lemon AP November 1 1998 Exploring the conformational space of protein side chains using dead end elimination and the A algorithm Proteins 33 2 227 39 CiteSeerX 10 1 1 133 7986 doi 10 1002 sici 1097 0134 19981101 33 2 lt 227 aid prot7 gt 3 0 co 2 f PMID 9779790 S2CID 12872539 a b Kingsford CL Chazelle B Singh M April 1 2005 Solving and analyzing side chain positioning problems using linear and integer programming Bioinformatics 21 7 1028 36 doi 10 1093 bioinformatics bti144 PMID 15546935 Yanover Chen Talya Meltzer Yair Weiss 2006 Linear Programming Relaxations and Belief Propagation An Empirical Study Journal of Machine Learning Research 7 1887 1907 Wainwright Martin J Tommi S Jaakkola Alan S Willsky 2005 MAP estimation via agreement on trees message passing and linear programming IEEE Transactions on Information Theory 51 11 3697 3717 CiteSeerX 10 1 1 71 9565 doi 10 1109 tit 2005 856938 S2CID 10007532 Kolmogorov Vladimir October 28 2006 Convergent tree reweighted message passing for energy minimization IEEE Transactions on Pattern Analysis and Machine Intelligence 28 10 1568 1583 doi 10 1109 TPAMI 2006 200 PMID 16986540 S2CID 8616813 Globerson Amir Tommi S Jaakkola 2007 Fixing max product Convergent message passing algorithms for MAP LP relaxations Advances in Neural Information Processing Systems Allen BD Mayo SL July 30 2006 Dramatic performance enhancements for the FASTER optimization algorithm Journal of Computational Chemistry 27 10 1071 5 CiteSeerX 10 1 1 425 5418 doi 10 1002 jcc 20420 PMID 16685715 S2CID 769053 Desmet J Spriet J Lasters I July 1 2002 Fast and accurate side chain topology and energy refinement FASTER as a new method for protein structure optimization Proteins 48 1 31 43 doi 10 1002 prot 10131 PMID 12012335 S2CID 21524437 Baker D October 2010 An exciting but challenging road ahead for computational enzyme design Protein Science 19 10 1817 9 doi 10 1002 pro 481 PMC 2998717 PMID 20717908 Jiang Lin Althoff Eric A Clemente Fernando R Doyle Lindsey Rothlisberger Daniela Zanghellini Alexandre Gallaher Jasmine L Betker Jamie L Tanaka Fujie 2008 De Novo Computational Design of Retro Aldol Enzymes Science 319 5868 1387 91 Bibcode 2008Sci 319 1387J doi 10 1126 science 1152692 PMC 3431203 PMID 18323453 Rothlisberger Daniela Khersonsky Olga Wollacott Andrew M Jiang Lin Dechancie Jason Betker Jamie Gallaher Jasmine L Althoff Eric A Zanghellini Alexandre 2008 Kemp elimination catalysts by computational enzyme design Nature 453 7192 190 5 Bibcode 2008Natur 453 190R doi 10 1038 nature06879 PMID 18354394 Siegel JB Zanghellini A Lovick HM Kiss G Lambert AR St Clair JL Gallaher JL Hilvert D Gelb MH Stoddard BL Houk KN Michael FE Baker D July 16 2010 Computational design of an enzyme catalyst for a stereoselective bimolecular Diels Alder reaction Science 329 5989 309 13 Bibcode 2010Sci 329 309S doi 10 1126 science 1190239 PMC 3241958 PMID 20647463 a href Template Cite journal html title Template Cite journal cite journal a CS1 maint multiple names authors list link Privett HK Kiss G Lee TM Blomberg R Chica RA Thomas LM Hilvert D Houk KN Mayo SL March 6 2012 Iterative approach to computational enzyme design Proceedings of the National Academy of Sciences of the United States of America 109 10 3790 5 Bibcode 2012PNAS 109 3790P doi 10 1073 pnas 1118082108 PMC 3309769 PMID 22357762 Chen CY Georgiev I Anderson AC Donald BR March 10 2009 Computational structure based redesign of enzyme activity Proceedings of the National Academy of Sciences of the United States of America 106 10 3764 9 Bibcode 2009PNAS 106 3764C doi 10 1073 pnas 0900266106 PMC 2645347 PMID 19228942 a b c d Karanicolas J Kuhlman B August 2009 Computational design of affinity and specificity at protein protein interfaces Current Opinion in Structural Biology 19 4 458 63 doi 10 1016 j sbi 2009 07 005 PMC 2882636 PMID 19646858 Shoichet BK October 2007 No free energy lunch Nature Biotechnology 25 10 1109 10 doi 10 1038 nbt1007 1109 PMID 17921992 S2CID 5527226 Lippow SM Wittrup KD Tidor B October 2007 Computational design of antibody affinity improvement beyond in vivo maturation Nature Biotechnology 25 10 1171 6 doi 10 1038 nbt1336 PMC 2803018 PMID 17891135 Schreiber G Keating AE February 2011 Protein binding specificity versus promiscuity Current Opinion in Structural Biology 21 1 50 61 doi 10 1016 j sbi 2010 10 002 PMC 3053118 PMID 21071205 Grigoryan G Reinke AW Keating AE April 16 2009 Design of protein interaction specificity gives selective bZIP binding peptides Nature 458 7240 859 64 Bibcode 2009Natur 458 859G doi 10 1038 nature07885 PMC 2748673 PMID 19370028 Frey KM Georgiev I Donald BR Anderson AC August 3 2010 Predicting resistance mutations using protein design algorithms Proceedings of the National Academy of Sciences of the United States of America 107 31 13707 12 Bibcode 2010PNAS 10713707F doi 10 1073 pnas 1002162107 PMC 2922245 PMID 20643959 Khoury GA Fazelinia H Chin JW Pantazes RJ Cirino PC Maranas CD October 2009 Computational design of Candida boidinii xylose reductase for altered cofactor specificity Protein Science 18 10 2125 38 doi 10 1002 pro 227 PMC 2786976 PMID 19693930 Burton DR Weiss RA August 13 2010 AIDS HIV A boost for HIV vaccine design Science 329 5993 770 3 Bibcode 2010Sci 329 770B doi 10 1126 science 1194693 PMID 20705840 S2CID 206528638 Jessica Marshall November 7 2012 Proteins made to order Nature News Retrieved November 17 2012 Designed transmembrane alpha hairpin proteins in OPM database Designed membrane associated peptides and proteins in OPM database Chowdhury Ratul Kumar Manish Maranas Costas D Golbeck John H Baker Carol Prabhakar Jeevan Grisewood Matthew Decker Karl Shankla Manish September 10 2018 PoreDesigner for tuning solute selectivity in a robust and highly permeable outer membrane pore Nature Communications 9 1 3661 Bibcode 2018NatCo 9 3661C doi 10 1038 s41467 018 06097 1 ISSN 2041 1723 PMC 6131167 PMID 30202038 Looger Loren L Dwyer Mary A Smith James J amp Hellinga Homme W 2003 Computational design of receptor and sensor proteins with novel functions Nature 423 6936 185 190 Bibcode 2003Natur 423 185L doi 10 1038 nature01556 PMID 12736688 S2CID 4387641 Jha RK Wu YI Zawistowski JS MacNevin C Hahn KM Kuhlman B October 21 2011 Redesign of the PAK1 autoinhibitory domain for enhanced stability and affinity in biosensor applications Journal of Molecular Biology 413 2 513 22 doi 10 1016 j jmb 2011 08 022 PMC 3202338 PMID 21888918 Further reading editDonald Bruce R 2011 Algorithms in Structural Molecular Biology Cambridge MA MIT Press Sander Chris Vriend Gerrit Bazan Fernando Horovitz Amnon Nakamura Haruki Ribas Luis Finkelstein Alexei V Lockhart Andrew Merkl Rainer et al 1992 Protein Design on computers Five new proteins Shpilka Grendel Fingerclasp Leather and Aida Proteins Structure Function and Bioinformatics 12 2 105 110 doi 10 1002 prot 340120203 PMID 1603799 S2CID 38986245 Jin Wenzhen Kambara Ohki Sasakawa Hiroaki Tamura Atsuo amp Takada Shoji 2003 De Novo Design of Foldable Proteins with Smooth Folding Funnel Automated Negative Design and Experimental Verification Structure 11 5 581 590 doi 10 1016 S0969 2126 03 00075 3 PMID 12737823 Pokala Navin amp Handel Tracy M 2005 Energy Functions for Protein Design Adjustment with Protein Protein Complex Affinities Models for the Unfolded State and Negative Design of Solubility and Specificity Journal of Molecular Biology 347 1 203 227 doi 10 1016 j jmb 2004 12 019 PMID 15733929 Retrieved from https en wikipedia org w index php title Protein design amp oldid 1164580948, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.