Member # 5
posted 08. April 2002 14:50
The question everyone seems to be asking (after we unveiled MESA, the Monotonic Evolutionary Simulation Algorithm) is, "So? What's the point?"
In this post I hope to make that a bit clearer. (If you are unfamiliar with MESA, I suggest you go here to learn more.)
The MESA program is a generic model of an evolutionary process, and grants the evolutionary process the best possible fitness function: one that is smooth, and gradually slopes up to a single, unique, global optimum. The model then introduces what we call "coupling." Coupling simply means that some bits cannot mutate independently but rather must change in coordinated ways to give selective advantage. This is an accurate picture of both protein folding and binding site requirements. In any case, this coupling adds "steps" on the sides of this mountain, effectively making it look more like a rice paddy or Mayan temple. These steps have large flat areas where no selective advantage is gained, then sudden jumps when all the bits change to the correct state and the organism reaches a new level. As Micah Sparacio has pointed out, this coupling is akin to Richard Dawkins's METHINKS IT IS LIKE A WEASEL example but with the requirement that no selective advantage is gained until an entire word is produced, while mutations only change individual letters.
It is our goal with the program to experiment with how much coupled change can be produced via Darwinian mechanisms, and at what point the coupling poses a real problem for an evolutionary scenario. This is where you can help. Run the program. Play with it. And let us know any interesting findings.
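For anyone who wants to see the coupling idea in miniature, here is a rough Python sketch. To be clear, this is not the MESA source code: the function names (coupled_fitness, evolve), the single-parent mutate-and-accept loop, and the parameter choices are all my own assumptions for illustration. The only thing it is meant to show is a fitness function that pays out per fully correct block of bits, so that a block size of 1 gives the smooth slope and larger blocks give the flat steps.

```python
import random

def coupled_fitness(genome, target, block_size):
    """Count blocks that match the target completely.

    A partially correct block earns nothing, which is what produces the
    flat "steps". With block_size == 1 this reduces to counting matching bits.
    """
    score = 0
    for start in range(0, len(target), block_size):
        if genome[start:start + block_size] == target[start:start + block_size]:
            score += 1
    return score

def evolve(target, block_size, mutation_rate, max_generations, seed=0):
    """Hypothetical mutate-and-select loop (an assumption, not MESA itself).

    Returns the generation at which the optimum was reached, or None if it
    was not reached within max_generations.
    """
    rng = random.Random(seed)
    n_blocks = len(range(0, len(target), block_size))
    genome = [1 - bit for bit in target]  # start with every bit wrong
    best = coupled_fitness(genome, target, block_size)
    for generation in range(1, max_generations + 1):
        # Each bit flips independently with probability mutation_rate.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in genome]
        child_fitness = coupled_fitness(child, target, block_size)
        # Fitness never decreases; neutral changes are allowed to drift.
        if child_fitness >= best:
            genome, best = child, child_fitness
        if best == n_blocks:
            return generation
    return None

if __name__ == "__main__":
    target = [1] * 24
    for size in (1, 2, 4, 8):
        print(size, evolve(target, size, mutation_rate=0.05, max_generations=2_000_000))
```

Running it with a fixed genome length and increasing block sizes gives a feel for how the waiting time changes as the coupling grows. The actual numbers depend heavily on the mutation rate and the selection scheme assumed here, so treat them as illustrative only.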
Coupling can be thought of as a sort of irreducible complexity occurring at the level of an individual protein. There are many examples of this from the primary literature, but I'll just cite a few. A paper by Nelson and Onuchic (http://www.pnas.org/cgi/content/full/95/18/10682) explores a simple model of protein folding which involves the arrangement of two types of amino acids: hydrophobic (water-hating) and hydrophilic (water-loving). They found that proteins are expected to fold into distinct, dissimilar classes, and that they will be "frustrated" or unstable and unfolding in the areas between them. They comment:
Again, because the two minimal frustration sequence families are dissimilar in the way that H [hydrophobic] residues [amino acids] are distributed in sequence, a substantial number of exchange mutations (two or three) are required to change a sequence folding to v0 into a sequence folding to v1. If we take a stepwise mutational trajectory between v0 and v1 [as in a Darwinian, gradualistic route] along the least frustrated path, we must pass through a region where the sequences fold approx. 10 times slower, whereas if we do not take this path, the situation is much worse. If sequences are required to fold faster, and be more stable than those at the cusp lambda(2/3) in the frustration function, i.e., if lambda0 (a frustration threshold), exceeds lambda(2/3), then all the sequences between the two families folding to v0 and v1 are excluded. If these were real proteins, this would mean that the sequences could not continuously evolve from one structure into the other, i.e., we would always encounter a region of sequences that do not fold on the order of physiological timescales.
And a picture says a thousand words:
Fig. 4. Bar graph schematic of the model sequence space. The shaded areas correspond to sequence families folding to the v0 and v1 core geometries. The height of each shaded bar is the logarithm of the number of sequences log N(x) folding to the structure type at x. The unshaded (excluded) region contains only frustrated FRUST sequences. When lambda 0 passes below the frustration maxima in Fig. 3, this unshaded region is physiologically excluded. A spontaneous double or triple exchange mutation is required to mutate across the gap.
It is these gaps (excluded regions) in protein folding space that we are modelling with the coupling feature of the MESA algorithm. The gaps between stable protein folds require multiple coordinated changes to occur in the transition from one protein fold to another. And, as I argued in this thread, the complexity of real protein folding (as opposed to the simple model used by Nelson and Onuchic) induces extreme ruggedness and many frustrated, nonfolding gaps in the protein folding landscape. Indeed, experiments (such as those by Blanco et al.) suggest that most of protein sequence space is nonfolding.
Furthermore, I just got through reading an article by Doug Axe (JMB 2000, 301:585-95. "Extreme Functional Sensitivity to Conservative Amino Acid Changes on Enzyme Exteriors") which suggests that even the hydrophilic amino acids on protein exteriors are under some surprisingly stringent constraints (even to conservative mutations) and contribute in essential ways to the protein fold and stability. This means that both the hydrophobic and hydrophilic residues are key in determining the fold, and that protein folding space is even more divided up and rugged than I had thought.
For some concrete examples of biological coupling, here are some quotes from the thread "Coupled Mutations and the Quantization of Functionality":
quote: For an even more recent example of biological coupling, take a look at the March 8 issue of Science (295:1863-8), an article by Jormakka et al. titled "Molecular Basis of Proton Motive Force Generation: Structure of Formate Dehydrogenase-N." These researchers determined the structure of a new enzyme complex to a resolution of 1.6 angstroms, and found that the system contains four iron-sulphur clusters that form an "electric wire" (they actually use the word) to transport electrons from formate to a quinone molecule. Now, certainly the mutations that formed this electric wire (the iron-sulphur groups) will be highly coupled--they have to all be present in order for the system to achieve functionality.
Here are two recent examples of the type of biological function that I am talking about, and that appear to embody quantized function. The first comes from the latest issue of Science, an article titled "Scaffolding Proteins--More than Meets the Eye" by Gary Johnson, Science, vol 295, 15 Feb 2002. In this review of the latest research, Johnson describes a new signalling pathway by which a mitogen-activated protein kinase (MAPK) is activated by a scaffolding protein in the cell. Apparently some sort of protein-protein interaction between scaffolding protein and kinase allows the MAPK to become activated in an entirely novel way. This protein-protein interaction is precisely the sort of interaction that is going to be quantized, because only when the two proteins achieve the proper folds relative to each other will they be able to interact to produce a signal (in this case, activation of the MAPK). The function is holistic; it's an all-or-none phenomenon. [And, I might add, the changes that produce that function are highly coupled.]
Another recent example is from the latest issue of Nature, titled "Mechanism of force generation by myosin heads in skeletal muscle" (Nature, February 7, 2002, Volume 415, Issue 6872, pp. 659-662). In this study the researchers examined the mechanism of myosin-actin force-generating interactions in muscle fibers. They found that specific conformation alterations in the myosin head are essential to the generation of force. Here again the function will be quantized; imagine a myosin protein evolving to fit a (pre-existing) actin filament. There is no way to gradually approach the function. Multiple parts of the protein must be coordinated to achieve stable folding capable of binding specifically to actin and capable of transducing energy from ATP to a pulling motion. [Thus, in this case too the function will require coupled, coordinated changes that provide no selective advantage until they are all present.]
As a final note, we would like to link the MESA program explicitly with biological reality by estimating how much information is in those coupled changes and how much information is contained in biologically coupled mutations. If we find that the amount of coupling in biology is far beyond what is reasonable to achieve in the MESA algorithm (by requiring more generations than could reasonably have occurred in the history of the earth), we will have posed a serious challenge to a Darwinian explanation of these structures.
So, how much information is contained in a bit versus an amino acid? Well, information is defined in terms of what is ruled out, or excluded, by that piece of information. So a bit is a choice between two options. Likewise, an amino acid is a choice among 20 options (there are 20 amino acids used in proteins), which works out to log2(20), or about 4.3 bits, if all 20 are treated as equally likely. So, it seems to me that an amino acid contains roughly 4 times the amount of information in a single bit. So if we find that an evolutionary algorithm can "solve" a coupling of 20 bits but not 30 bits, that corresponds to roughly 5 or 7 amino acids, respectively. This seems to be what we've found just in informal tests: it is difficult but possible to solve a 20 bit coupling (with many millions of generations) and 30 bits has proved extremely difficult to solve; one simulation found it in several million generations but another computer has been running for 3 days straight and hasn't come to a solution yet. Now, many active sites on proteins comprise at least 5-10 amino acids that are required for function and are highly conserved. Thus, it seems that we may be finding the limits of what the Darwinian mechanism can do--and it seems to be falling short of the more highly coupled sites we see in biology.
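As a check on the bit-to-amino-acid conversion, here is a small Python calculation. It assumes the simplest possible measure: every one of the 20 amino acids is equally likely and each coupled position must be one specific amino acid, so a position carries log2(20) bits. Real sites tolerate some substitutions, so this is an upper bound per position, and the helper names are mine.

```python
import math

# Assumption: a uniform choice among 20 amino acids at a fully specified site.
BITS_PER_AMINO_ACID = math.log2(20)  # about 4.32 bits

def bits_to_amino_acids(bits):
    """Number of fully specified amino-acid positions equivalent to `bits`."""
    return bits / BITS_PER_AMINO_ACID

def amino_acids_to_bits(count):
    """Bits needed to specify `count` amino-acid positions exactly."""
    return count * BITS_PER_AMINO_ACID

if __name__ == "__main__":
    for bits in (20, 30):
        print(f"{bits} coupled bits ~ {bits_to_amino_acids(bits):.1f} amino acids")
    for count in (5, 10):
        print(f"{count} amino acids ~ {amino_acids_to_bits(count):.1f} bits")
```

Under these assumptions a 20 bit coupling is a little under 5 amino acids, a 30 bit coupling is a little under 7, and a 10 amino acid site would be a bit over 43 bits.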
[ 08 April 2002, 15:51: Message edited by: John Bracht ]