|
Author
|
Topic: Coupled Mutations and Quantization of Functionality
|
John Bracht
Member
Member # 5
|
posted 25. February 2002 15:02
In "Proposed Mechanism for Stability of Proteins to Evolutionary Mutations," Erik Nelson and Jose Onuchic (PNAS, vol. 95, issue 18, 10682-10686, Sept 1, 1998) outline a mathematical model to understand protein folding based upon the free energy of folding: quote:
It is shown that the sequence-ordering tendencies induced by design into different fast-folding, thermodynamically stable native structures interfere. This interference results in a type of quasiorthogonality between optimal native structures, which divides sequence space into fast-folding, thermally stable families surrounded by slow-folding, low stability shells.
The authors describe the folding of proteins into stable configurations defined by their hydrophobic core residues. These configurations are separated or "orthogonalized" by high-energy frustration barriers. quote:
This results in a picture of sequence space as being populated by families, each folding to a particular coarse grained structure and each surrounded by a shell of increasingly frustrated sequences.
quote:
Specifically, because the fastest-folding, most stable sequences are those that minimize the energy of one highly-connected compact structure against all the others, the energy of a minimally frustrated sequence placed into the folded structure of the wrong sequence family will have one of the worst possible energies. Hence, the sequences and structures of the minimally frustrated modes tend to be mutually dissimilar.
The implication: quote:
Any stepwise mutational path between one minimally frustrated sequence family and another must then visit a region of slow or nonfolding sequences.
In other words, moving from one family to another requires crossing a high-energy frustration barrier along which the protein does not fold into a stable structure and is therefore incapable of performing biological functions. The implication, then, is that moving from one family to another will require more than just one small change at a time (a gradual walk). The evolutionary process must leap over the nonfunctional frustration barriers in order to produce new protein families. This requires that multiple coordinated mutations occur: quote:
Again, because the two minimal frustration sequence families are dissimilar in the way that H [hydrophobic] residues are distributed in sequence, a substantial number of exchange mutations (two or three) are required to change a sequence folding to v0 [one hypothetical protein family] into a sequence folding to v1 [another hypothetical protein family]. If we take a stepwise mutational trajectory between v0 and v1 along the least frustrated path, we must pass through a region where the sequences fold ~10 times slower, whereas if we do not take this path, the situation is much worse...If these were real proteins, this would mean that the sequences could not continuously evolve from one structure into the other, i.e., we would always encounter a region of sequences that do not fold on the order of physiological timescales.
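To make the idea of a "stepwise mutational trajectory" concrete, here is a minimal sketch in Python. It is not Nelson and Onuchic's model: the sequences, the two families and the set of "folding" sequences are all made up for illustration, and single H/P point flips stand in for the exchange mutations the authors describe. The function simply asks whether one folding sequence can be reached from another by single mutations without ever leaving the set of folding sequences.

```python
from collections import deque

def neighbors(seq):
    """Yield every sequence that is one H<->P point mutation away from seq."""
    for i, residue in enumerate(seq):
        flipped = "P" if residue == "H" else "H"
        yield seq[:i] + flipped + seq[i + 1:]

def stepwise_path(start, goal, folds):
    """Breadth-first search for the shortest chain of single mutations from
    start to goal that only visits sequences in `folds`.
    Returns the path as a list, or None if no such path exists."""
    if start not in folds or goal not in folds:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbors(current):
            if nxt in folds and nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None

# Hypothetical toy families (assumptions, not data from the paper).
family_v0 = {"HHHPPP", "HHHPPH"}
family_v1 = {"PPPHHH", "HPPHHH"}

# Case 1: the families are islands; every intermediate is nonfolding.
isolated = family_v0 | family_v1
print(stepwise_path("HHHPPP", "PPPHHH", isolated))  # None

# Case 2: suppose (hypothetically) a ridge of folding intermediates exists.
bridge = {"HPHPPH", "HPPPPH", "HPPHPH"}
print(stepwise_path("HHHPPP", "PPPHHH", isolated | bridge))
```

In the first case the search returns None, which is the situation the quote describes: any stepwise route must pass through nonfolding sequences. The second case only shows what a continuous path would look like if folding intermediates did exist.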
The reason for the "coarse-graining" of sequence space is that the hydrophobic core residues of a protein must be able to achieve a low-energy configuration in which they are buried inside the structure while hydrophilic residues are exposed. It seems that moving gradually from one core configuration to another will inevitably reach a point where the core residues are not able to form the old stable configuration but are not yet able to form the new stable configuration either. Hence, a frustration barrier.
A study by Blanco et al (J. Mol. Biol. (1999) 285, 741-753) supports this conclusion. This study involved two small, dissimilar proteins: the alpha-spectrin SH3 domain (SH3) and the B1 domain of streptococcal protein G (PG). The researchers tried creating hybrid proteins to gradually move through sequence space from the SH3 protein to PG. They found that the intermediate sequences did not fold into stable structures. Even when they added most of the residues from PG to the SH3 sequence, while maintaining the SH3 core residues, the protein did not fold into a stable shape. Only when the core SH3 residues were removed was the sequence able to fold into the PG shape. Furthermore, the SH3 structure was found to be nonfolding even when only two non-core residues were mutated. This implies that folding is a holistic feature, requiring the cooperation of the hydrophobic core residues and the hydrophilic non-core residues to specify the unique low-frustration structure of a given protein family. In addition, it appears that the core residues from different protein families can actually counteract each other, producing an unstable, nonfolded protein when combined. This is what Nelson and Onuchic were referring to as the frustration barrier. Blanco et al noted that quote:
The set of sequences analyzed here are hybrids of the sequences of SH3 and PG and represent a more or less uniform sampling of sequence identities between 100 and ~10% with each protein, but only those sequences very similar to the wild-type proteins have unique folds. [emphasis added]
This agrees well with Nelson and Onuchic: quote:
Implicit in the concept of sequence design is the idea that proteins must exceed a certain level of fast folding and stability to function in biochemical processes. The frustration function [lambda] (which measures this ability) separates sequences, independent of length, into two distinct regimes. In the frustrated regime, lambda less than 1, the energy gap delta E between native and non-native (misfolded) configurations cannot be distinguished from the characteristic energy barriers between misfolded structures. This means that below the collapse (coil-globule) temperature, the chain exists in a superposition of long-lived, misfolded traps.
The implication of these two studies is that proteins exhibit a type of irreducible complexity which derives from the need to coordinate the core and non-core residues to achieve a stable fold. Moving in sequence space from one stable fold to another requires crossing a frustration barrier in which the protein is unfolded and nonfunctional. I want to call this "coarse-graining" of sequence space functional quantization. Just as irreducible complexity of biomolecular machines defines a step too large for chance alone to take, this functional quantization refers to the smallest step (functional quanta) which will move from one protein to another while maintaining selectable function. This functional quantization means that the fitness function of proteins is step-like rather than smooth. The more coordinated, coupled changes must occur, the greater the size of the step that chance alone must take (because selection is by definition incapable of operating upon a change which confers no functional advantage). As in Behe's multi-part version of irreducible complexity, if we find that the degree of coupled, coordinated mutation required to generate a new protein fold is too great to be reasonably attributed to chance, we may reasonably reject chance as the cause of that protein family. We know from experience that designing intelligences are eminently capable of coordinating multiple parts toward a desired goal, and intelligent agency may be a more reasonable hypothesis for the origination of these highly coupled changes. This quantization of function space has profound implications for current science. Many Darwinian scenarios assume a smooth gradient with finely graded steps for any adaptation. However, if function space is quantized as the literature suggests, those finely graduated steps are nonexistent. Consider what Richard Dawkins writes in The Blind Watchmaker: (chapter 4, page 77) quote:
1. Could the human eye have arisen directly from no eye at all, in a single step? 2. Could the human eye have arisen directly from something slightly different from itself, something that we may call X? The answer to Question 1 is clearly a decisive no. The odds against a 'yes' answer for questions like Question 1 are many billions of times greater than the number of atoms in the universe. It would need a gigantic and vanishingly improbable leap across genetic hyperspace. The answer to Question 2 is equally clearly yes, provided only that the difference between the modern eye and its immediate predecessor X is sufficiently small. Provided, in other words, that they are sufficiently close to one another in the space of all possible structures. If the answer to Question 2 for any particular degree of difference is no, all we have to do is repeat the question for a smaller degree of difference. Carry on doing this until we find a degree of difference sufficiently small to give us a 'yes' answer to Question 2.
In other words, the vast improbabilities are broken down into small, more probable steps. Then, we simply assume that those small steps could realistically be achieved in any given organism. However, biological reality doesn't cooperate with this vision. In reality the steps cannot be broken down indefinitely--you reach a functional quantum limit on that step size. That step size seems to rely upon the number of changes that must be coordinated, or coupled, to generate a new protein fold.
Consider the eye evolution scenario. A flat sheet of cells gradually becomes invaginated because invagination gives a selective advantage of direction sensing. Eventually a cup is formed and some sort of lens material happens to be placed over the opening. Gradually, step by step, the ability to adjust the focus of the lens, the ability to move the eye about, and the ability to adjust the size of the aperture (pupil) are acquired. Now, it is certain that this scenario will require the development of new proteins that can perform these tasks. Since these proteins will have new functions, they must have novel shapes, novel folds. And here is the problem. The new folds only exist when multiple parts are coordinated to the attainment of the functional structure. Dawkins's scenario fails because he assumes that every step can be broken down into smaller steps. The challenge now is to determine precisely the size of the steps that biological systems require. Notice that we are layering on another level of requirement for these new proteins, because not only must they fold stably, but they must also fold into shapes capable of performing the needed cellular tasks. In other words, stable protein folds are a necessary but not sufficient condition for protein function. Here are two recent examples of the type of biological function that I am talking about, and that appear to embody quantized function.
The first comes from the latest issue of Science, an article titled "Scaffolding Proteins--More than Meets the Eye" by Gary Johnson, Science, vol 295, 15 Feb 2002. In this review of the latest research, Johnson describes a new signalling pathway by which a mitogen-activated protein kinase (MAPK) is activated by a scaffolding protein in the cell. Apparently some sort of protein-protein interaction between scaffolding protein and kinase allows the MAPK to become activated in an entirely novel way. This protein-protein interaction is precisely the sort of interaction that is going to be quantized, because only when the two proteins achieve the proper folds relative to each other will they be able to interact to produce a signal (in this case, activation of the MAPK). The function is holistic; it's an all-or-none phenomenon.
Another recent example is from the latest issue of Nature, titled "Mechanism of force generation by myosin heads in skeletal muscle" (Nature, February 7, 2002, Volume 415, Issue 6872, pp. 659-662). In this study the researchers examined the mechanism of myosin-actin force-generating interactions in muscle fibers. They found that specific conformational alterations in the myosin head are essential to the generation of force. Here again the function will be quantized; imagine a myosin protein evolving to fit a (pre-existing) actin filament. There is no way to gradually approach the function. Multiple parts of the protein must be coordinated to achieve stable folding capable of binding specifically to actin and capable of transducing energy from ATP to a pulling motion. As a biology student, I have seen this sort of complex, functionally quantized interaction many times in the biological systems I've studied. I suspect that the standard Darwinian models of evolution for such systems presuppose a smooth functional gradient which simply does not exist.
The future emphasis of biology should concentrate on quantifying the size of the functional quanta or steps, and understanding how these coordinated changes can be effected. This seems a very promising field for design theory. John Bracht [ 25 February 2002: Message edited by: John Bracht ]
|
|
RalphW
Member
Member # 116
|
posted 25. February 2002 19:43
Listening in on the ARN discussions, I've seen a lot of people talk about "replaying the tape" of evolution as though it was pretty much a foregone conclusion that something useful would pop out. There is an underlying assumption here of a very favorable fitness function which I find highly implausible. Also, there is a lot of argument about how we might hope to determine what relevant chance hypotheses need to be eliminated when using Dembski's explanatory filter to argue for design. If some definitive work could be done on the likelihood of the formation of usefully folded proteins, this would not only go a long way towards characterizing the fitness functions, but might help to establish something like a consensus on the relevant chance hypotheses for protein formation/design.
|
|
Sasquatch
Member
Member # 147
|
posted 27. February 2002 21:24
John Bracht’s post is an excellent introduction to the complexities of protein folding and what they potentially mean for evolution. The Nelson and Onuchic paper shows that stable folded native structures are often surrounded in sequence space by regions of low stability, thus limiting the scope for small, stepwise change from one stable region of space to another. John writes of the Nelson and Onuchic paper: quote: In other words, moving from one family to another requires crossing a high-energy frustration barrier along which the protein does not fold into a stable structure and is therefore incapable of performing biological functions. The implication, then, is that moving from one family to another will require more than just one small change at a time (a gradual walk). The evolutionary process must leap over the nonfunctional frustration barriers in order to produce new protein families. This requires that multiple coordinated mutations occur:
The implication within the implication is that such barriers between folded native structures can’t be crossed by the point mutations that are the mainstay of Darwinian evolution. But of course Darwinian evolution does not rely just on point mutations. A just-published paper in PNAS shows that these barriers can be “tunneled” through by a combination of point mutation and crossing over (Cui et al., Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes, PNAS, Vol. 99, Issue 2, 809-814, January 22, 2002). Cui et al. show this by using a highly coarse-grained two-dimensional hydrophobic-polar model, similar to that of Nelson and Onuchic, with recombination. This allows the often overlooked mechanism of recombination to complement point mutations in such a way as to promote diversity of structure that might otherwise be rare. The results are interesting. By studying crossovers alone, Cui et al. show with their model that “overall, 13.0% of all nonhomologous crossover events and 6.91% of all nonhomologous crossover offspring lead to new folds. (Table 2).” This is in contrast to the homologous crossovers, of which only 0.346% lead to viable “offspring” with new folds. Cui et al. liken this ability to “tunneling” between regions of high fitness sequence space. They remark: quote: Evolutionary explorations by point mutations may be likened to diffusion. Their extent is limited on a fragmented mortality landscape, because sequences belonging to different networks (Table 1) are beyond reach. On the other hand, crossovers can "tunnel" through the infinitely high mortality barriers between networks. Fig. 2 shows that with point mutations alone, even if one starts in the largest network a substantial fraction of viable sequences and structures cannot be explored (dotted curves).
In contrast, with both point mutations and crossovers, practically all viable sequences (6,347 of 6,349) can be reached from the dominant network (see Fig. 7, which is published as supporting information on the PNAS web site, www.pnas.org).
{emphasis added}
The figure below, shown with caption, gives an idea of how this may occur. (Note: the valleys correspond to high fitness, the peaks to low fitness.)
“Fig. 3. Tunneling underneath a mortality landscape. (A) shows a 25- and a 15-step optimal point-mutation path that lead from sequences B and D to sequence C (shown in their unique native conformations), respectively. The graduations on the horizontal scale correspond to 41 different unique sequences that collectively encode 19 different structures (marked by vertical shadings, two of the regions encode for the same structure.) The 0 vs. sequence profile is the mortality landscape along these point-mutation paths. The dotted arrows in A depict the "tunneling effect" of a crossover between sequences B and D that leads directly to sequence C. In this crossover, the 11-monomer segment (enclosed by dotted boxes) inherited by sequence C from parent sequence B acts similar to an "autonomous folding unit."”
And just below in the text, the authors point out that “Evolutionarily, the crossover in Fig. 3 has the important advantage of preserving the autonomous folding unit. In contrast, if the point-mutation-only route from B to C in Fig. 3 were taken, this unit would be dismantled first along the path before it could be reassembled.” In their concluding remarks, Cui et al. state: quote: We have presented a simple structure-based study of evolution. Notwithstanding the present model's extreme simplicity, protein-like features such as autonomous folding units arise naturally from its minimalist construct. Segment analyses suggest that crossovers can be a much more effective means to explore new viable sequences than one might have hitherto posited, and nonhomologous recombination is seen as an efficient tool of structural innovation. These theoretical predictions may help elaborate the schema idea (7) of modular evolution (49) as well as the foldon concept (47) and are testable by experiments. The present results also bear on the evolution of sexual reproduction (50, 51). It is hoped that the insight gained from this effort would shed light not only on in vivo evolution but would facilitate the development of in vitro evolution technology as well (52, 53).
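The difference between the two kinds of move can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are invented H/P strings, the crossover is a simple single-point splice of two equal-length parents, and nothing here computes whether a sequence actually folds (Cui et al. do that with their lattice model, which is not reproduced). The sketch only shows how far each operation moves in sequence space in one event.

```python
def point_mutants(seq):
    """All sequences reachable from seq by one H<->P point mutation."""
    return [
        seq[:i] + ("P" if residue == "H" else "H") + seq[i + 1:]
        for i, residue in enumerate(seq)
    ]

def crossover(parent_a, parent_b, cut):
    """Single-point crossover: the first `cut` residues come from parent_a,
    the remainder from parent_b (both parents assumed the same length)."""
    return parent_a[:cut] + parent_b[cut:]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical parent sequences (assumptions, not taken from the paper).
parent_a = "HPHPHPHPHPHP"
parent_b = "PPHHPPHHPPHH"

child = crossover(parent_a, parent_b, cut=6)
print(child)                      # HPHPHPHHPPHH
print(hamming(child, parent_a))   # 3
print(hamming(child, parent_b))   # 3

# Every point mutant sits exactly one step from its parent.
print({hamming(m, parent_a) for m in point_mutants(parent_a)})  # {1}
```

A single crossover lands the child three mutations away from either parent while carrying over an intact six-residue segment from each, which is the sense in which the inherited segment can be preserved rather than dismantled step by step.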
These results are perhaps not too surprising. Bogarad and Deem (A hierarchical approach to protein molecular evolution, PNAS Vol. 96, Issue 6, 2591-2595, March 16, 1999) had shown earlier that recombination greatly increases the ability to search sequence space compared with point mutation alone: quote: Abstract: Biological diversity has evolved despite the essentially infinite complexity of protein sequence space. We present a hierarchical approach to the efficient searching of this space and quantify the evolutionary potential of our approach with Monte Carlo simulations. These simulations demonstrate that nonhomologous juxtaposition of encoded structure is the rate-limiting step in the production of new tertiary protein folds. Nonhomologous "swapping" of low-energy secondary structures increased the binding constant of a simulated protein by [approx.]10^7 relative to base substitution alone. Applications of our approach include the generation of new protein folds and modeling the molecular evolution of disease.
Another demonstration of the ability of recombination to generate structural and functional diversity comes from artificial evolution studies. In particular, in vitro evolution using DNA swapping (analogous to recombination) has been shown to improve evolutionary rates by orders of magnitude (see for example, Harayama S., Artificial evolution by DNA shuffling. Trends Biotechnol 1998 Feb;16(2):76-82.) Furthermore, merely by juxtaposing various folding subdomains through exon shuffling or gene fusion, many novel protein functions can be generated without the necessity of new folds (see for example, Hopfner et al, New enzyme lineages by subdomain shuffling, PNAS USA 1998 Aug 18;95(17):9813-8.; and for a review of exon shuffling, see Patthy L., Genome evolution and the evolution of exon-shuffling--a review., Gene 1999 Sep 30;238(1):103-14; the literature is full of examples of exon shuffling.)
And finally, gene and/or domain duplication is worth considering as well. This is especially relevant since a newly arisen duplicate is under less (or no) selective pressure, and can therefore explore sequence space more easily, possibly traversing regions that are otherwise of very low fitness. The available references on that are extremely numerous and none will be given here, but they too provide excellent examples of novel gene evolution. All in all, there is a vast array of known natural mechanisms for generating novel protein function for selection to work on. Further studies and new technology will undoubtedly shed more light on this.
In conclusion, John Bracht’s idea of quantifying functional steps between related proteins is intriguing and perhaps very useful, but it remains to be seen whether or not such steps can indeed be easily quantified, and more importantly, whether or not they would represent a barrier to natural evolution. Love, The Sasquatch [ 27 February 2002: Message edited by: Sasquatch ]
|
|
RalphW
Member
Member # 116
|
posted 28. February 2002 16:56
I'd be interested in seeing this discussion applied to OOL as well, where the focus would be not on the transitions between protein groups, but the origin of the first groups. Given the sparsity of soluble proteins in the space of possible strings of amino acids, how do we account for the origin of the first biologically useful proteins? How did early proto-life forms prevent their precious stock of free amino acids from being soaked up in useless insoluble proteins?
|
|
RalphW
Member
Member # 116
|
posted 28. February 2002 17:39
A discussion on ARN may provide an interesting testbed for this concept (DNA sliding rings). Is there a path between the different and very dissimilar groups of DNA sliding rings, or are they separated by a void?
|
|
John Bracht
Member
Member # 5
|
posted 01. March 2002 12:16
Sasquatch's post raises some very good points in response to my contention that protein folding space is "coarse-grained" and thus functionality is quantized. While not disagreeing with this basic premise, Sasquatch proposed a mechanism that can take steps larger than the frustration barriers between protein families. Therefore, the barriers are not a problem for evolving new protein families.
Sasquatch may be right on this. However, I would have to say that the jury is definitely still out, since even the papers he cites give some theoretical reason for wondering whether nonhomologous recombination is effective on rugged (quantized) evolutionary landscapes, and whether these mechanisms are even used widely in biology for generating novel protein folds. Perhaps most importantly, he neglects to cite the end of the Cui et al paper where they show that the benefit provided by nonhomologous recombination decreases as the ruggedness of the fitness landscape increases, and in a very rugged landscape provides little or no benefit. Since my original argument was that the fitness landscape is highly rugged (quantized), the problem of generating novel protein folds remains just as stark as ever.
First, a quick explanation of homologous versus nonhomologous recombination. Homologous recombination refers to crossover between genes that encode similar proteins. For instance, in diploid reproduction the crossover occurs between the two members of a pair of homologous chromosomes. This sort of crossover, often referred to as "DNA shuffling," tends to combine similar with similar proteins. Nonhomologous recombination, on the other hand, is the recombination of different stretches of DNA. An example of this would be two different genes (not two alleles of the same gene) recombining. Nonhomologous recombination is the type which is supposed to generate novelty. It's easy to see why: if you recombine two genes that are similar, the result won't be much different.
But recombination of two very different genes might give rise to a structure vastly different from either "parent" sequence. Let's imagine working backwards. Imagine two parent sequences have recombined to give a daughter sequence with stable, unique structure. Remember what I posted earlier about the folding of proteins: quote:
[referring to the Blanco et al study]: The researchers found that even when they added most of the residues from PG to the SH3 sequence, while maintaining the SH3 core residues, the protein did not fold into a stable shape. Only when the core SH3 residues were removed was the sequence able to fold into the PG shape. Furthermore, the SH3 structure was found to be nonfolding even when only two non-core residues were mutated. This implies that folding is a holistic feature, requiring the cooperation of the hydrophobic core residues and the hydrophilic non-core residues to specify the unique low-frustration structure of a given protein family.
The implication is that the new daughter protein, in order to fold stably, must have precisely coordinated hydrophobic and hydrophilic residues that work together to produce a stable structure. This, in turn, means that not just any parents will do--only certain ones which contain just the right complementary sequences which, when combined (at the correct position in the sequence), give a unique stable daughter fold. Two questions arise: (1) what are the probabilities associated with getting this correct combination, and (2) where did the parent sequences come from?
If the probabilities in (1) are too low, nonhomologous recombination won't buy you very much. Indeed, it's interesting that the only recombination that is "programmed" into meiosis is homologous recombination. Why is this? I suspect the answer is that nonhomologous recombination would be simply too disruptive and devastating to the cell. Imagine if your reproductive cells went through a stage of nonhomologous recombination in which various genes were spliced together randomly. Imagine the genetic chaos that would result. I suspect that very few viable offspring would be produced by those reproductive cells that engaged in nonhomologous recombination. Instead, cells go through a very orderly phase of homologous recombination in which similar genes cross over. This maintains the order and structure of genetic material necessary for life. Listen to what Bogarad and Deem (PNAS 1999, cited by Sasquatch) had to say: quote:
We address here, from a theoretical point of view, the question of how protein space can be searched efficiently and thoroughly, either in the laboratory or in Nature. We demonstrate that point mutation alone is incapable of evolving systems with substantially new protein folds. We demonstrate further that even the DNA shuffling approach is incapable of evolving substantially new protein folds. Our Monte Carlo simulations demonstrate that nonhomologous DNA "swapping" of low-energy structures is a key step in searching protein space.
Thus, the curious thing is that the only technique which generates novel protein folds is not regularly utilized by biological systems, while point mutations and homologous recombination are normal parts of ordinary reproduction but are incapable of generating new folds. I know that such events as rearrangements, deletions, horizontal transfers and transpositions are events that occur and can cause nonhomologous recombination. But those events occur with much lower frequency relative to homologous recombination and point mutation. If anyone has some citations on the frequencies of these events in biological systems, I would be very interested in reading them. Question (2) (above) asks where the parent sequences came from. It seems that we can only regress back a certain distance with nonhomologous recombination before we have to rely upon point mutation to generate enough diversity for nonhomologous recombination to produce novel folds. Remember, only certain parents, when combined, produce novel folds. Therefore, it is reasonable to assume that some base level of diversity of protein folds must be present before nonhomologous recombination can take over. What is this base level? I don't know and I'm not sure anyone does. Again, if anyone has citations on this I would love to look at them. In support of the above argument, Cui et al even comment: quote:
The power of [nonhomologous] recombination is in amplifying existing diversity, not in generating a high degree of diversity from a very small number of starting sequences.
The question is, where did this "existing diversity" come from? It seems that it must have come from point mutations operating alone. Let's look at Cui et al, where a model similar to that used by Nelson and Onuchic (emphasizing hydrophobic/hydrophilic interactions) was used with recombination. Interestingly, they found that quote:
Only 3,428 (3.0% of all point mutations) lead to new structures. This is significantly lower than the structural innovation rates of crossovers. Overall, 6.90% of all [nonhomologous] crossover offspring encode for new structures, corresponding to a structural innovation rate of 7.67% among crossover offspring that are different from either parent.
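The rates in this quote can be compared directly. The short calculation below uses only the three percentages quoted above; the variable names are mine, and it says nothing about statistical significance, only about the size of the ratio.

```python
# Structural innovation rates as quoted from Cui et al. (percent).
point_mutation_rate = 3.0          # point mutations leading to new structures
crossover_offspring_rate = 6.90    # all nonhomologous crossover offspring
crossover_distinct_rate = 7.67     # offspring that differ from both parents

ratio_all = crossover_offspring_rate / point_mutation_rate
ratio_distinct = crossover_distinct_rate / point_mutation_rate

print(f"crossover vs. point mutation (all offspring): {ratio_all:.2f}x")       # 2.30x
print(f"crossover vs. point mutation (distinct only): {ratio_distinct:.2f}x")  # 2.56x
```

So on these figures a nonhomologous crossover is roughly 2.3 to 2.6 times as likely as a point mutation to produce a new structure.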
While Cui et al say there is a "significant" difference between point mutations and crossovers, it seems that crossovers are only about twice as good as point mutations for finding novel structures. Furthermore, homologous crossovers performed abysmally with only 0.688% producing new structures. Looking further in the paper, we find the following: quote:
When the landscape is rugged, the number of sequences explored by point mutations alone is comparable to that explored by point mutations plus [nonhomologous] crossovers. This is because point mutations are more effective in finding a low-mortality area from an already well populated spot nearby, whereas when the landscape is rugged many crossover offspring are likely to end up at high-mortality spots.
Here is the data used to come to this conclusion: Figure 6 (alpha is a measure of landscape ruggedness):
 Caption: Fig. 6. The number of sequences (A) and structures (B) explored after 5,000 generations (as defined in the Fig. 4 legend, GN = 100) with point mutations plus crossovers (µm = 0.09, µc = 0.01; solid curves) and with point mutations alone (µm = 0.1, µc = 0; dashed curves), both plotted as functions of [alpha ]. The smallest [alpha ] value considered is 0.05. click here for larger version Notice how as the landscape ruggedness increases, the effectiveness of nonhomologous recombinations versus point mutations decreases. Thus, it seems that rugged (functionally quantized) landscapes are crippling for both point mutation and nonhomologous crossover. The reason is that a large step, like nonhomologous crossover, far more likely to get you to a nonfolding, nonfunctional sequence than to a stably folding, functional one. What I suspect is happening is that requirements for stable folding are becoming so stringent that the probabilities for question (1) above are just too low for recombination to do you any good. In other words, perhaps one or more of the needed parental sequences are not present, or the probability is just too small that the correct two will recombine in precisely the right way to produce a functional daughter sequence. The fact that the folding requirements on proteins are indeed very stringent is supported by the evidence from Nelson et al and Blanco et al cited in my original post. Perhaps most importantly, my original point was that the landscape is extremely rugged; the function space is quantized and thus cannot be easily traversed by point mutation. This study shows that the problem isn't solved by nonhomologous recombination, either. John Bracht P.S. I will be leaving this afternoon for a spring break trip and I won't have access to a computer. So I will be absent from this thread (and board) for the next week. [ 01 March 2002: Message edited by: John Bracht ]
|
|
John Bracht
Member
Member # 5
|
posted 02. April 2002 18:11
The Role of Core Packing in Achieving Folding Specificity
I have been doing some more research on protein folding and how it relates to my original thesis: that proteins are, in some sense, irreducibly complex in terms of the relationship between amino acid sequence and stable, useful 3D (tertiary) structure. Put another way, the protein sequence space is extremely rugged, or equivalently, the islands of functionality are very small in the enormous sea of possibilities.
I will focus on three papers:
1. a study by Bassil I. Dahiyat and Stephen L. Mayo in PNAS titled "Probing the Role of packing specificity in protein design" (PNAS, vol 94, pp 10172-7, Sept 1997)
2. a commentary by H.W. Hellinga titled "Rational protein design: Combining theory and experiment" (PNAS, vol 94, pp 10015-7)
3. a minireview by James R. Beasley and Michael H. Hecht titled "Protein Design: The Choice of de Novo Sequences" (Journal of Biological Chemistry, vol. 272(4), pp 2031-4, Jan 24 1997)
These papers all highlight the importance of hydrophobic side chain packing in achieving specificity and stability of protein fold.
It is well known that the primary driving force in protein folding is the drive to bury hydrophobic residues in the interior while exposing hydrophilic residues to the aqueous solvent. Indeed, Hellinga notes that by simply incorporating the correct pattern (called a "binary pattern") of hydrophobic and hydrophilic residues, the desired basic structure can be produced quite reliably and easily. But there seems to be a fundamental problem:
From Hellinga: quote:
However, the local details were found to be difficult to get correct. The interiors of these designed proteins show a high degree of disorder, which does not resemble the tightly packed, unique arrangement of natural systems. Global correctness in these designs apparently resulted from incorporation of the correct "binary pattern" of hydrophobic and hydrophilic residues, which sets up the geometric specification of the protein interior and exterior for the hydrophobic effect to act on. The difficulty in designing well-ordered cores can be viewed as a problem in specificity. The side chains in a disordered core adopt many alternative conformations of approximately equal energy, instead of assuming a single, specific arrangement.
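The "binary pattern" idea in the quote above can be made concrete with a small sketch. This is only an illustration: the set of residues treated as hydrophobic and the two example sequences are my own assumptions, not anything taken from Hellinga's commentary.

```python
# Assumed two-class alphabet: which residues count as "hydrophobic" here is an
# illustrative choice, not a definition taken from the cited papers.
HYDROPHOBIC = set("AVLIMFWC")


def binary_pattern(sequence):
    """Return an H/P string: 'H' for hydrophobic residues, 'P' for all others."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def same_binary_pattern(seq_a, seq_b):
    """True if two sequences share the same hydrophobic/polar pattern."""
    return binary_pattern(seq_a) == binary_pattern(seq_b)


# Two hypothetical sequences with different residues but the same H/P pattern.
seq_1 = "LKELLKKLLEELLKK"
seq_2 = "IKEVLRKFIEDLMKR"

print(binary_pattern(seq_1))              # HPPHHPPHHPPHHPP
print(same_binary_pattern(seq_1, seq_2))  # True
```

The two sequences differ at most positions yet are indistinguishable at the binary level, which is one way to see why the pattern alone can set the overall architecture without saying anything about how the core side chains actually pack.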
Beasley and Hecht comment,
quote:
The importance of good packing in natural proteins is evident both from analyzing their structures and mutating their sequences. The hydrophobic cores of wild-type structures are invariably well packed with densities approaching those seen in crystals of small organic molecules. Mutations that reduce packing density typically render a protein less stable, while those that improve packing yield proteins with enhanced stability. Packing also plays an important role in the structures of de novo proteins. By using different arrangements of nonpolar residues to redesign the hydrophobic core of Rop, Munson et al demonstrated that size, shape, and relative location of side chains can specify both the stability and the "native-like" properties of a protein. Underpacking yielded proteins that were not stable, whereas overpacking yielded structures that were stable but not native-like.
[bold emphasis added]
This last sentence is key; stable proteins may not be specific enough to be functional (or native-like). Indeed, Davidson and Sauer (PNAS vol 91, pp 2146-50, March 1994) constructed highly hydrophobic "QLR" proteins that were extremely stable--nearly indestructible--and it is proposed that they take on a "molten" structure in which many different structures are possible. (These proteins were so hydrophobic that they were insoluble in aqueous solution and had to be kept in 6 M Gdn*HCl, a denaturing agent!)
So how do we ensure that the protein only takes on one structure (in other words, how do we imbue a protein with specificity)? There are two ways to do this:
1. Adjust the hydrophobic residues till they can pack in an orderly fashion
2. Alter the sequence in such a way that alternative conformations are energetically disfavored (the negative design approach)
The study by Dahiyat and Mayo applies directly to option (1) above. In this study, they took the sequence of streptococcal protein G beta-1, and held the hydrophilic residues constant while varying the hydrophobic (core) residues. They only used 7 hydrophobic residues as the pool of options from which to construct various randomized core sequences. These different proteins were evaluated via a computer algorithm and the ones that scored well were synthesized to compare theoretical and experimental results. The result was that only certain highly specific sequences of hydrophobic residues were able to fold in a specific, native-like manner. The authors' main argument in the paper was that hydrophobic packing should be included in computer models of protein folding in order to design more native-like proteins (exhibiting structural specificity).
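To get a feel for the size of the search described above, here is a rough sketch of "hold the surface fixed, vary the core." Everything specific in it is an assumption for illustration: the particular seven-letter pool, the toy template, the core positions, and the choice of ten positions in the final count are hypothetical and are not taken from the paper.

```python
from itertools import product

# Assumed seven-residue hydrophobic pool. The study used a pool of seven,
# but this particular set of letters is an illustrative guess.
CORE_POOL = "AVLIFYW"


def core_variants(template, core_positions, pool=CORE_POOL):
    """Yield every sequence that substitutes pool residues at the core
    positions while leaving all other (surface) positions unchanged."""
    for combo in product(pool, repeat=len(core_positions)):
        seq = list(template)
        for pos, aa in zip(core_positions, combo):
            seq[pos] = aa
        yield "".join(seq)


def count_core_variants(n_core, pool_size=len(CORE_POOL)):
    """Number of distinct core sequences for n_core variable positions."""
    return pool_size ** n_core


# Hypothetical toy template with two core positions (0-based indices 1 and 4).
template = "KTEKTE"
variants = list(core_variants(template, [1, 4]))

print(len(variants))            # 49
print(variants[0])              # KAEKAE
print(count_core_variants(10))  # 282475249
```

Even with the surface held constant and only seven choices per site, a hypothetical ten core positions already give hundreds of millions of candidate cores, which is why a scoring algorithm was needed to pick the few worth synthesizing.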
One particularly interesting result was a sequence (called alpha85) that seemed "overpacked" in the sense that it seemed to fluctuate between several different packing arrangements, though it was fairly stable overall. A variant was constructed, labelled alpha85W43V, in which the large, bulky hydrophobic tryptophan was replaced by a smaller (still hydrophobic) valine. Interestingly, the new sequence was not as stable as the original, but it exhibited greater specificity (by various measures like NMR spectra and amide exchange). The authors note "Alpha85W43V appears to have improved structural specificity at the expense of stability, a phenomenon observed previously in coiled coils."
This is a common theme: structural stability is not enough. The protein also needs to have specificity, the ability to fold into only one preferred configuration. Hellinga comments "Another approach [to achieving specificity] is to lower the free energy of the desired state by searching for a core-forming sequence with the lowest possible free energy that can be located in the entire space of sequences and their conformations ("target state optimization"). [after a reference to the Dahiyat and Mayo study]...it is necessary to predict sequences that fit exquisitely." The bottom line is that precise packing details matter.
Beasley and Hecht comment on negative design (option 2 above):
quote:
Merely designing favorable interactions in the folded state of a protein is not sufficient to generate a unique structure. It is equally important to design against competing alternatives. This is sometimes described as "negative design". Some examples of negative design are fairly simple, such as the incorporation of a glycine or proline to disfavor continuation of a helix and thereby enhance the likelihood of a desired turn. Others are more subtle. For example, inclusion of polar residues at key position throughout a sequence can disfavor "wrong" hydrophobic cores and thereby favor the formation of the desired unique structure. Experimental support for this suggestion comes from the work of Raleigh et al., who showed that incorporation of polar residues at the interface between buried and exposed alpha-helical surfaces enhances the native-like properties of their alpha-2 peptide.
The way to achieve specificity, then, is to design for the optimum set of core residues, and to design against alternative, competing conformations. As we saw with the Dahiyat/Mayo paper, achieving specificity may require trading off some stability.
This all seems reasonable, but one question worth pondering is: what is ultimately responsible for the overall protein fold? The hydrophobic core residues, or the hydrophilic residues? The answer is: both--they are dependent upon each other and neither alone is sufficient to specify the fold. In a sense, the information required to specify the protein fold is distributed throughout the entire sequence and not localized in one type of residue. The overall fold is a collective, emergent property which cannot be reduced to any particular amino acid or subset of amino acids.
For instance, Blanco et al. (cited in my first post on this thread) noted
quote:
...the hydrophobic core is a global feature of all folded proteins but a particular core does not define a set of interactions as exclusive, with the ability to specify the fold. The core is necessary to stabilize a structure, but it is not sufficient to specify it.
And, as we've already seen in the Dahiyat and Mayo study, holding the hydrophilic surface residues constant is not enough to specify the fold either; the fold only emerges when the correct set of hydrophobic residues is added. The specific protein fold is a cooperative effort that is only achieved through numerous subtle interactions which are difficult to measure.
What does this imply for an evolutionary process? As I stated earlier, the specific requirements for protein folding amount to a sort of irreducible complexity of the protein in which multiple coordinated amino acids (both hydrophilic and hydrophobic) must be assembled. Unlike most machines in which certain subcategories of parts can take on their own form and subfunction apart from the whole, amino acid subsequences don't take on the correct configuration unless they are in context with other parts of the protein sequence. The interdependence is striking and profound--far beyond what is found in man-made machines. It is akin to a machine part, say a camshaft or a piston, being just a blob of metal until the engine block is also present.
This remarkable emergent functional holism is just another reason protein space is functionally quantized; why the protein folding landscape is extremely rugged.
John Bracht [ 02 April 2002, 18:13: Message edited by: John Bracht ]
|
|
|