|
Author
|
Topic: Ka/Ks and positive selection
|
Kirk Durston
Member
Member # 174
|
posted 13. March 2002 16:39
If, as Janitor suggests, to account for high Ka/Ks ratios we credit the natural system with being constrained by some sort of program, actively selecting, searching out a least path in design space and performing induction, then I'd say that the natural system had something in 'mind' and we are right back to ID. It still seems to me that Darwinists are sneaking ID into natural selection and calling it 'positive selection.' What would allay my fears would be a rigorous explanation of how natural selection predicts high Ka/Ks ratios, without glossing over the details. That explanation may be out there, but I still haven't seen it.
IP: Logged
|
|
Drosera
Member
Member # 139
|
posted 13. March 2002 17:12
quote:
If, as Janitor suggests, to account for high Ka/Ks ratios we credit the natural system with being constrained by some sort of program, actively selecting, searching out a least path in design space and performing induction, then I'd say that the natural system had something in 'mind' and we are right back to ID. It still seems to me that Darwinists are sneaking ID into natural selection and calling it 'positive selection.' What would allay my fears would be a rigorous explanation of how natural selection predicts high Ka/Ks ratios, without glossing over the details. That explanation may be out there, but I still haven't seen it.
Your wish is my command. Amazing what is on google these days...
Evolutionary Mechanisms Web Page http://www.biology.duke.edu/rausher/evmech/EMhand.html
Ka/Ks Ratios (KAKS) http://www.biology.duke.edu/rausher/evmech/kaks.pdf
...some good introductory stuff on the underlying population genetics is there.
Drosera
|
|
Kirk Durston
Member
Member # 174
|
posted 13. March 2002 18:13
Regarding Charlie D's recent post, I have no problem with the rapid divergence of duplicated genes given that the duplicated gene will have fewer constraints as long as the 'original' gene can continue to perform its required function. It seems to me that if the duplicated gene is not needed to perform the original function, and it has not yet evolved to perform the new function, then there will be virtually no constraints on what sort of substitutions occur per site, be they synonymous or non-synonymous. The situation would be similar to the pseudogene scenario that Johnson discusses, which would yield an expected Ka/Ks of approximately unity. Once the evolution of the paralogue has gone far enough to begin to achieve a new, positive function, albeit poorly, then natural selection will cut in and begin to restrict the non-synonymous substitutions while letting the synonymous substitutions proceed, as the paralogue is optimized to the new function through natural selection. The effect this will have (so far as I can see) is to reduce the number of options for Sa but not Ss, which will result in a Ka/Ks ratio that drops below unity. In other words, the most rapid evolution occurs when no constraints are placed on the evolving gene by natural selection, but under this scenario, Ka is roughly equal to Ks. What am I missing in describing the evolutionary trajectory of the paralogue and its effect on Ka/Ks?
Regarding Charlie D's suggestion that ID might involve a higher mutation rate in potentially adaptive sites, I think this is relevant to this topic. First, if I were designing a genome that would be capable of optimizing for variations that may occur within function demands, then one way of doing it would be to ensure that the adaptive sites would have a higher mutation rate. However, there is the danger that too high a mutation rate would result in loss of function for a single gene. If this were the case, and especially if it were the case that a protein needed to have multiple, fine-tuned roles within a cell, then I would construct the genome such that that region was more prone to duplication in order to have the necessary redundancy to ensure both functionality for all required roles, as well as the ability to have paralogues free to be fine-tuned to suit the various functions which require the same general 3-D structure. Given this level of redundancy, high mutation rates in adaptive sites would be useful and, if high enough, could produce a higher Ka than Ks. In other words, under ID, if I see multiple paralogues within a particular genome, I would immediately suspect that the protein has different roles within the cell and these roles are essential enough for the intelligent agent to build in a propensity for duplication. If I, as the agent, chose to be more directly involved in the fine tuning, I would simply make the necessary non-synonymous changes, producing high Ka/Ks ratios as a consequence.
Something else. Blanco et al., have shown that folding sequence space for a protein is bounded by non-folding sequence space, like an island in an ocean. If functional proteins require folding (which seems to be the case for most enzymes) then natural selection can only work within the island of folding sequence space for a given protein. If fine tuning for multiple functions requires relatively minor sequence changes, then natural selection can be expected to 'find' the optimal solution, given duplicated genes to work with. However, if optimization requires sequences that are at opposite sides of the island, then the probability of random mutations throwing up the distant orthologue within a reasonable amount of time might be too low for an intelligent agent to accept. If this is the case, and if I were in charge, I might arrange for necessary non-synonymous tweaking to get there sooner, which would have the effect of a high Ka/Ks ratio. This is why I suggested at the outset to see if there was a correlation between widely divergent paralogues and a high Ka/Ks ratio. In other words, for multiple paralogues within a given genome, if the paralogues show little divergence, then I would predict a low Ka/Ks ratio under ID, but if the divergence is very large, I would expect a point to be reached after which the Ka/Ks ratios suddenly increase above 1 (given the same amount of evolutionary time). I can think of reasons why this might not be the case, even under ID, but I'll stop here.
|
|
charlie d.
Member
Member # 159
|
posted 13. March 2002 20:16
Kirk: what I think you are missing is the fact that the chances of fixation for a neutral mutation (such as a synonymous substitution) are actually very small. I am not an evolutionary biologist, and I think you should not just take my word, but as far as I remember, the probability of fixation for a neutral mutation is inversely proportional to the population size, something like P=1/(2N) (where N is the number of individuals in a population that are reproductively active at any given time). Thus, for any given synonymous substitution that becomes fixed, there are at least hundreds, probably many thousands, which just disappear without a trace.
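A rough way to check the P=1/(2N) figure is a small simulation. The sketch below assumes a Wright-Fisher style resampling model (an assumption of this example, not something stated in the post), with a hypothetical function name and a deliberately tiny population so it runs quickly.

```python
import random


def neutral_fixation_fraction(n_diploid, trials, seed=0):
    """Estimate how often a single new neutral allele reaches fixation.

    Assumes a Wright-Fisher model: a diploid population of n_diploid
    individuals carries 2 * n_diploid gene copies, and each generation
    is a random resampling of the previous one.
    """
    rng = random.Random(seed)
    copies_total = 2 * n_diploid
    fixed = 0
    for _ in range(trials):
        count = 1  # the mutation starts as one copy
        while 0 < count < copies_total:
            freq = count / copies_total
            # Each copy in the next generation is the mutant allele with
            # probability equal to the allele's current frequency.
            count = sum(rng.random() < freq for _ in range(copies_total))
        if count == copies_total:
            fixed += 1
    return fixed / trials


estimate = neutral_fixation_fraction(n_diploid=10, trials=20000)
print(f"simulated: {estimate:.4f}   predicted 1/(2N): {1 / 20:.4f}")
```

With N=10 the predicted value is 1/20, and the simulated fraction should land close to it; larger populations follow the same pattern but take longer to simulate.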
On the other hand, for a mutation that gives a selective advantage, the probability of fixation is something like P=2s (with s the selective advantage). That is, a mutation that confers even just a 2 percent selective advantage has a chance of fixation of about 4%. (Please don't ask me how these equations are derived, I learned them too long ago. However, I am sure they are in any molecular evolutionary genetics textbook, like Nei's). Thus, advantageous mutations have a chance of fixation orders of magnitude higher than neutral mutations (or, to put it the other way around, in reasonably large populations, for 1 neutral mutation that becomes fixed, hundreds of advantageous mutations can).
In a nutshell, I think that's how Ka can largely overcome Ks if selective pressure (i.e. reproductive advantage for certain mutants) is strong enough.
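To put numbers on the comparison, here is a minimal sketch that takes the two approximations quoted above at face value: P=1/(2N) for a neutral mutation and P=2s for an advantageous one (the latter is usually stated for small s in a large population). The function names and the population sizes are illustrative choices, not values from the thread.

```python
def p_fix_neutral(n):
    """Fixation probability of a new neutral mutation, P = 1/(2N)."""
    return 1.0 / (2.0 * n)


def p_fix_advantageous(s):
    """Approximate fixation probability of an advantageous mutation, P = 2s."""
    return 2.0 * s


s = 0.02  # the 2 percent selective advantage used in the post
for n in (100, 1_000, 10_000):
    ratio = p_fix_advantageous(s) / p_fix_neutral(n)
    print(f"N={n}: neutral={p_fix_neutral(n):.6f}, "
          f"advantageous={p_fix_advantageous(s):.2f}, ratio={ratio:.0f}")
```

Under these approximations the ratio is simply 4Ns, so the advantage of a selected mutation over a neutral one grows in direct proportion to population size.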
As for your folding discussion, most proteins are highly sensitive to folding constraints. That just means that negative selection is going to act on sites that encode for aa critical for structure. However, since stochastically all mutations behave independently, that does not affect the rate of substitution at neutral or adaptive sites in the same gene.
Finally, if I understand correctly, your ID-based prediction is that most duplicated genes should end up being functional and selectively advantageous. I am not sure what the data out there are, or whether there is a way to extract this info from what is published. In fact, I am not even sure what the prediction, if any, would be in terms of Darwinian biology. I'll have to think about that, and read some more (this is getting WAY over my head, it may take a while).
|
|
Kirk Durston
Member
Member # 174
|
posted 14. March 2002 08:30
Thanks to Drosera and the site he suggested, things are much clearer now (an equation is worth a thousand words). However, I do see a problem: a case where a high Ka/Ks ratio would not be predicted under natural selection but would be more likely under ID, and the Johnson case falls into it.
The Eqn for Ka/Ks is:
Ka/Ks = alpha + 2N(beta)s
where
alpha = proportion of non-synonymous mutations that are neutral
beta = proportion of non-synonymous mutations that are advantageous
N = population size
s = selection coefficient
To get a Ka/Ks ratio that is larger than one, N must be sufficiently large and beta must be larger than zero, since alpha is always less than or equal to 1.
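The equation above is easy to play with numerically. The following is a minimal sketch that uses the equation exactly as quoted in this post; the function name and the sample values are hypothetical, and the check that alpha + beta cannot exceed 1 is an added assumption (both are proportions of the same set of mutations).

```python
def ka_ks_ratio(alpha, beta, n, s):
    """Ka/Ks = alpha + 2*N*beta*s, as quoted in the post above.

    alpha: proportion of non-synonymous mutations that are neutral
    beta:  proportion of non-synonymous mutations that are advantageous
    n:     population size
    s:     selection coefficient
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must be proportions in [0, 1]")
    if alpha + beta > 1.0:
        raise ValueError("alpha + beta cannot exceed 1")
    return alpha + 2.0 * n * beta * s


# With no advantageous mutations, the ratio can never exceed alpha (<= 1).
print(ka_ks_ratio(alpha=0.3, beta=0.0, n=100_000, s=0.01))
# A small advantageous fraction in a large population pushes it above 1.
print(ka_ks_ratio(alpha=0.3, beta=0.01, n=100_000, s=0.01))
```

The first call shows the point made in the text: when beta is zero the ratio is just alpha, however large N is.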
If a cell has two functions which need two paralogues, each of which when optimized still lie very close together in sequence space, then it is likely that certain single mutations will take the paralogue a good percentage of the way to optimization and, therefore, be advantageous and beta will be larger than zero. If N is large enough, we would expect a Ka/Ks larger than 1. I haven't read the Zhang paper mentioned earlier, but I would like to know if that case falls into this category.
However, if the two paralogues, when optimized lie very far apart within the folding sequence space for that protein, then a different situation seems more likely. Let us imagine that the original gene is duplicated and the duplicated gene experiences the first one or two or three mutations. Since the second, unfulfilled function of the cell requires a paralogue that is greatly divergent in sequence from the original, a single mutation in the duplicate of the original only takes the paralogue a minute percentage of the way there and is, therefore, unlikely to have any advantage at all (beta would equal zero). Thus, for greatly divergent paralogues that have optimized functions, we would predict a Ka/Ks ratio less than 1. I've made an assumption here, which I think is valid. The assumption is that for regions of folding sequence space that are very far from the optimized well, the fitness landscape is relatively flat. As the random walk approaches the optimized location, the landscape begins a negative slope (which itself increases exponentially) down into the optimized well and natural selection can operate at that point to speedily direct the trajectory to the lowest point in the well. In regions of flat, folding sequence space, beta is zero, natural selection has nothing to work with, and the trajectory is essentially a random walk. For that reason, for organisms requiring optimized paralogues that are greatly divergent, Ka/Ks ratios for those paralogues should be less than 1. But for organisms requiring paralogues that are relatively close together within the same folding sequence space, Ka/Ks ratios can be significantly larger than 1.
For reasons that I put forward in my previous post, an intelligent designer might do the selecting for optimization of hugely divergent paralogues. This would have the effect beta = 1 and, given even modest population numbers, would produce a Ka/Ks ratio significantly larger than 1. In Johnson et al.'s case, the copies bore little similarity to their ancestral precursors, indicating that the paralogues were hugely divergent within the folding sequence space for that protein. This is a case where we would expect Ka/Ks ratios less than 1, given the reasoning I put forward in the previous paragraph, but would be an ideal opportunity for ID to provide the necessary optimization, which should be detectable as a high Ka/Ks ratio. Bottom line: for paralogues that are close in sequence space and have a Ka/Ks ratio greater than 1, that is to be expected under natural selection. But for paralogues that are hugely divergent and still have a high Ka/Ks ratio, that is not predicted under natural selection but is under ID.
Make sense? [ 14 March 2002, 11:47: Message edited by: Kirk Durston ]
|
|
charlie d.
Member
Member # 159
|
posted 14. March 2002 11:13
Just looking at it, I think Kirk's discussion does make sense: high Ka/Ks ratios are going to be obtained whenever i) the proportion of advantageous mutations is large, ii) the selection coefficient (advantage for each mutation) is large, iii) the population is large, or iv) any reasonable combination of the above. Note also how dependent the function is on the population size: even genes with a very low beta (0.01, that is, 1% of possible substitutions advantageous) and low selective advantage (s=0.01) would have a Ka/Ks>1 as long as the effective population is at least 5,000 (certainly quite common for most species).
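The 5,000 figure can be recovered by solving Kirk's equation, Ka/Ks = alpha + 2N(beta)s, for the population size at which the ratio reaches 1. This is only a sketch under that equation; the function name is hypothetical.

```python
def population_threshold(alpha, beta, s):
    """Population size N at which alpha + 2*N*beta*s equals exactly 1.

    Above this N the ratio Ka/Ks exceeds 1 under the quoted equation.
    """
    if beta <= 0.0 or s <= 0.0:
        raise ValueError("beta and s must be positive for Ka/Ks to exceed alpha")
    return (1.0 - alpha) / (2.0 * beta * s)


# beta = 0.01 and s = 0.01 as in the post, with no neutral fraction at all.
print(population_threshold(alpha=0.0, beta=0.01, s=0.01))
# Any neutral fraction lowers the population size needed.
print(population_threshold(alpha=0.2, beta=0.01, s=0.01))
```

With alpha set to zero the threshold comes out at about 5,000, which matches the figure in the post; at exactly that size the advantageous term alone equals 1, so any neutral fraction tips the ratio above 1.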
As for Kirk's discussion of fitness landscapes and evolutionary paths, it also makes sense, although I think it is strongly dependent on what the function of the proteins in question is. For instance, diversity per se is in some cases advantageous. Indeed, some of the most numerous and most divergent gene families are found in the immune system (e.g.: antigen receptors, transplantation antigens), in which the ability to recognize and respond to the multitude of potential environmental antigens is highly adaptive. In this case, every new allele confers a clear advantage to its carrier, just because it's different (no flat landscape before you reach a new well, as Kirk calls it).
The RNase I example does not fall in this category, but is conceptually similar: every substitution changes the optimal catalytic pH of the enzyme just a bit in the right direction, and is therefore selectable. As I mentioned in my previous post, it takes very little selective advantage to make a big difference in terms of probability of fixation.
Unfortunately, at this stage we do not know what the function of the morpheus gene family is, but I wouldn't be surprised, given their amazing diversity (not only in sequence, but in alternative splicing isoforms etc), if high variability for these proteins was an advantageous trait in itself (rather than, say, highly restricted functional specialization).
As usual, the problem in all evolutionary studies of adaptation is to reconstruct the mutation path and the individual selection constraints at every step. Unfortunately, very rarely - if ever - can we get the entire picture with any accuracy.
I am checking out for a few days - nice talking to y'all. This was fun.
|
|
Janitor@MIT
Member
Member # 125
|
posted 15. March 2002 16:28
Some thoughts on synonymous and non-synonymous mutations:
Biologists commonly refer to “degeneracy” of the code as an error-proofing feature. As usual half the story is told in the biology textbooks. Biologists do understand that the “design” of the code works, e.g., to largely restrict transitions to permitted states. But the full implications of that are not really understood or explored. They refer to such mutations as “synonymous,” justifying a “never mind” or “makes no difference” position. Synonymous mutations don’t affect the course of evolution because they are synonymous; they are not a “difference that makes a difference.” They don’t make a difference in the IMM demos (which is all natural selection is, an actuarial accounting); they therefore make no difference in evolution.
Research into neutral landscapes belies the neglect of the importance of “forcing” searches along “neutral networks,” minimizing risk and maximizing adaptive potential while maintaining functional integrity. (Perfectly ingenious!… uh, I mean, perfectly coincidental!) “Neutrality” is not inconsequential, as the term implies, it is an effective adaptive search strategy. It falsifies the notion that evolutionary searches are driven only by non-synonymous mutations. (But “neutrality” was given a “Darwinian spin” long before anyone ever suspected that it was an adaptive search strategy. So it can’t really be said to falsify the Darwinian premise. No observation can possibly falsify it.)
These kinds of adaptive strategies, which are universal and effectively “hardwired” into the genome raise all sorts of interesting and difficult questions about the origin of the code. Descriptions of the DNA code found in textbooks don’t appear to have been updated since the one I studied in college 25 years ago. Even then it looked like a description that might have been penned by Samuel Morse! Obviously, biologists don’t take the code/program metaphor very seriously. (But if we do, it leads somewhere interesting, I believe.)
Biologists obviously think of the code as a string of alphabetic characters, as somehow encoding the organism/population. It is a blueprint, a set of operating instructions, etc. This is what I call the “bit string,” “flippy-flop,” model. It is a central theme in the Darwinian “dumbing down” of the genome: Living things are the product of forces (RM&NS) over which they have no effective control. This is the central theme of Darwinian dysteleology and profoundly (and disturbingly) informs the whole modern ethos, from biology, to psychology, to political ideology. (But its broader implications are another subject.)
Biologists do not understand the code as an adaptation and as implementing in its very “design” an adaptive strategy for evolutionary search! They don’t understand the genetic code/program at all! It’s not simply a string of bits, it’s a search engine!
Consider the error-proofing systems mentioned above: the very existence of error proofing of the genome belies the “all-sufficiency” of natural selection. But biologists don’t see it that way. That error proofing effects greater negative control over the course of evolution than the negative control effected by natural selection is reported in no biology textbook that I’m aware of. I suspect that it just doesn’t occur to biologists to interpret the result in that way, because it never occurs to them that the genome can control aspects of its own evolution or that natural selection is not effectively omnipotent.
(Kirk Durston and John Bracht have noticed that terms like “positive” selection and selection “pressure” are misleading. The terms strongly suggest that environmental parameters perform a positive, a priori conditioning on the definitively random origin of mutations. The ideas suggested by the terms, “positive” and “pressure,” couldn’t be understood literally, although it’s obvious that biologists do take them very literally. It is not only self-contradictory of the theory to do so; it is a reversion to the “quasi-Lamarckian” thinking of Darwin. The conditioning is done by the genome itself, not by any disembodied anthropomorphic (“God-like”?!) force (natural selection) that lies outside of it. On the doctrine of the “all-sufficiency” of natural selection it is impossible to see the genome/cell as sufficient unto itself.)
But the fact that living things can correct errors raises many (embarrassing?) questions: Are the “errors” truly “random”? Are “uncorrected errors” really errors? Do “errors” ever have any significant effect on evolution? Taking the observations rather more literally than biologists are wont to do, “errors” would appear to be a statistically insignificant factor in evolution. (Hence the “creative bookkeeping.”) The conclusion is warranted on the view that the code is optimal, in the Shannon sense. (But biologists don’t mean “optimal” in any sense that makes much sense.) If the code is “Shannon optimal” we have no reason for assuming that the ineliminable residue of error is ever a significant factor in evolution and we can accept the observations at face value and not have to “juggle the numbers.”
Taking a slightly different perspective leads to all sorts of new deductions and predictions: If the code is Shannon optimal and the code implements an adaptive search strategy, then I cannot say definitively that “mutations are random wrt fitness,” or the corollary that mutations are minimized wrt fitness, since I have no basis for assuming that mutations are “random” or equivalently “errors.” Instead I might just as well conclude that mutations are maximized wrt fitness as part of an optimizing/adapting evolutionary search strategy and that “errors” may not be “accidents” so much as they are “mistakes,” if you know what I mean. So the idea that mutations occur independently of fitness no longer figures in my perspective on things. (Originally I abandoned it because it was a painfully illogical brand of special pleading.)
Simply eliminating that “dysteleological spin” from my thinking leads to a far more interesting biology, at least for me.
|
|
Drosera
Member
Member # 139
|
posted 15. March 2002 22:51
quote:
If a cell has two functions which need two paralogues, each of which when optimized still lie very close together in sequence space, then it is likely that certain single mutations will take the paralogue a good percentage of the way to optimization and, therefore, be advantageous and beta will be larger than zero. If N is large enough, we would expect a Ka/Ks larger than 1. I haven't read the Zhang paper mentioned earlier, but I would like to know if that case falls into this category.
OK so far...
quote:
However, if the two paralogues, when optimized lie very far apart within the folding sequence space for that protein, then a different situation seems more likely. Let us imagine that the original gene is duplicated and the duplicated gene experiences the first one or two or three mutations. Since the second, unfulfilled function of the cell requires a paralogue that is greatly divergent in sequence from the original,
Full stop! Where do you get this? You seem to be using the following logic:
1) Paralogue is highly divergent
2) Therefore the originally unfulfilled function requires a highly divergent gene.
But of course, if the unfulfilled function wasn't required in the first place, but was only helpful to have, then any protein that does the function, even crudely, will on average be selected. This is where things start, and divergence via selection of further beneficial mutations can continue until the new function is optimized.
quote:
a single mutation in the duplicate of the original only takes the paralogue a minute percentage of the way there and is, therefore, unlikely to have any advantage at all (beta would equal zero).
This is another unjustified assumption, sorry to say. One point mutation can have a large phenotypic effect; there are innumerable examples of this. And we're not even talking about deletions, frameshifts, chimerization, recombination, etc., yet.
quote:
Thus, for greatly divergent paralogues that have optimized functions, we would predict a Ka/Ks ratio less than 1. I've made an assumption here, which I think is valid. The assumption is that for regions of folding sequence space that are very far from the optimized well, the fitness landscape is relatively flat. As the random walk approaches the optimized location, the landscape begins a negative slope (which itself increases exponentially) down into the optimized well and natural selection can operate at that point to speedily direct the trajectory to the lowest point in the well. In regions of flat, folding sequence space, beta is zero, natural selection has nothing to work with, and the trajectory is essentially a random walk. For that reason, for organisms requiring optimized paralogues that are greatly divergent, Ka/Ks ratios for those paralogues should be less than 1. But for organisms requiring paralogues that are relatively close together within the same folding sequence space, Ka/Ks ratios can be significantly larger than 1.
I don't think that this is going to save your original idea. It is commonly observed, for example, that a given protein will do one thing well, and many other things at low efficiency. Ergo, a duplicated gene already has some potential alternative functions, and any mutations which help improve that alternative function will tend to get selected. This can be repeated over and over. You're basically asserting that there is some sort of limit to this process, but I see no reason to postulate such a thing.
Here's an example I saw on ARN a while ago, although ARN is apparently down at the moment so I'll have to look elsewhere...
...well this isn't quite the same quote, but it will do:
quote:
Biochemistry, 39 (18), 5303 -5311, 2000. Web Release Date: April 14, 2000
Copyright © 2000 American Chemical Society
Recruitment of a Double Bond Isomerase To Serve as a Reductive Dehalogenase during Biodegradation of Pentachlorophenol
Kandiah Anandarajah, Philip M. Kiefer, Jr., Bryon S. Donohoe, and Shelley D. Copley*
Abstract:
Tetrachlorohydroquinone dehalogenase catalyzes the replacement of chlorine atoms on tetrachlorohydroquinone and trichlorohydroquinone with hydrogen atoms during the biodegradation of pentachlorophenol by Sphingomonas chlorophenolica. The sequence of the active site region of tetrachlorohydroquinone dehalogenase is very similar to those of the corresponding regions of maleylacetoacetate isomerases, enzymes that catalyze the glutathione-dependent isomerization of a cis double bond in maleylacetoacetate to the trans configuration during the catabolism of phenylalanine and tyrosine. Furthermore, tetrachlorohydroquinone dehalogenase catalyzes the isomerization of maleylacetone (an analogue of maleylacetoacetate) at a rate nearly comparable to that of a bona fide bacterial maleylacetoacetate isomerase. Since maleylacetoacetate isomerase is involved in a common and presumably ancient pathway for catabolism of tyrosine, while tetrachlorohydroquinone dehalogenase catalyzes a more specialized reaction, it is likely that tetrachlorohydroquinone dehalogenase arose from a maleylacetoacetate isomerase. The substrates and overall transformations involved in the dehalogenation and isomerization reactions are strikingly different. This enzyme provides a remarkable example of Nature's ability to recruit an enzyme with a useful structural scaffold and elaborate upon its basic catalytic capabilities to generate a catalyst for a newly needed reaction.
--------------------------------------------------
Many xenobiotic pesticides, polymers, textile dyes, and munitions are resistant to biodegradation because microorganisms lack metabolic pathways to accomplish their breakdown. However, some xenobiotics can be biodegraded. In such cases, microorganisms have assembled new metabolic pathways, probably primarily by recruiting existing enzymes to perform new roles (1-3). Subsequent mutations may improve the fitness of these recruited enzymes for their new roles. Here we report a dramatic example of an enzyme that has apparently been recruited from an unexpected source to provide a reductive dehalogenase required for biodegradation of pentachlorophenol (PCP) by Sphingomonas chlorophenolica.
PCP was first introduced as a wood preservative in 1936 (4), and has been used in large quantities since that time. PCP would be expected to be recalcitrant to biodegradation for two reasons. First, it is highly chlorinated, and the resistance of aromatic xenobiotics to biodegradation generally increases with the number of chlorine substituents. Second, it is very toxic because it uncouples oxidative phosphorylation and perturbs membrane properties. Surprisingly, however, PCP can be degraded by some microorganisms. In the Gram-negative soil bacterium S. chlorophenolica, PCP degradation begins with an initial hydroxylation reaction which forms tetrachlorohydroquinone (TCHQ) (see Figure 1) (5). Subsequently, TCHQ is converted to trichlorohydroquinone (TriCHQ) and then 2,6-dichlorohydroquinone (DCHQ) by two successive reductive dehalogenation reactions catalyzed by TCHQ dehalogenase (6). Each of these reactions results in the conversion of 2 equiv of glutathione to glutathione disulfide. The final known step is the ring cleavage of DCHQ (7).
TCHQ dehalogenase has been the focus of several years of study in our laboratory (8-10). Our working model for its mechanism is shown in Figure 2. TCHQ dehalogenase has low but significant sequence identity to members of the theta and zeta classes (8, 11) of the glutathione S-transferase (GST) superfamily. Although most members of the GST superfamily catalyze a simple nucleophilic attack of glutathione upon an electrophilic substrate to form a glutathione conjugate, in several cases this basic chemical strategy has been incorporated into a more complex transformation (12-14) (see Figure 3). TCHQ dehalogenase appears to be related to one of these enzymes, a maleylacetoacetate (MAA) isomerase that catalyzes the glutathione-dependent conversion of a cis double bond in MAA to the trans configuration during the catabolism of phenylalanine and tyrosine in mammals and some bacteria and fungi (see Figure 4). We show here that the sequence of TCHQ dehalogenase is quite similar to those of human, mouse, and fungal MAA isomerases in a region that we know contains active site residues. Furthermore, TCHQ dehalogenase has isomerase activity similar to that of a bacterial MAA isomerase, and Cys13 is required for both the dehalogenase and isomerase activities. The functional and evolutionary implications of these findings will be discussed below.
quote:
For reasons that I put forward in my previous post, an intelligent designer might do the selecting for optimization of hugely divergent paralogues. This would have the effect beta = 1 and, given even modest population numbers, would produce a Ka/Ks ratio significantly larger than 1. In Johnson et al.'s case, the copies bore little similarity to their ancestral precursors, indicating that the paralogues were hugely divergent within the folding sequence space for that protein. This is a case where we would expect Ka/Ks ratios less than 1, given the reasoning I put forward in the previous paragraph, but would be an ideal opportunity for ID to provide the necessary optimization, which should be detectable as a high Ka/Ks ratio. Bottom line: for paralogues that are close in sequence space and have a Ka/Ks ratio greater than 1, that is to be expected under natural selection. But for paralogues that are hugely divergent and still have a high Ka/Ks ratio, that is not predicted under natural selection but is under ID.
Again, this depends on your assumption which seems unjustified. Is there a certain percentage of sequence divergence where you would draw the line? Is there a certain Ka/Ks ratio where you would draw the line? What is to prevent a protein that is 'near the line' (e.g. 50% to pick an arbitrary example), to have another substitution and cross the line? Which side of the line is the above-mentioned enzyme on?
Drosera
|
|
Kirk Durston
Member
Member # 174
|
posted 18. March 2002 11:45
My previous post outlined a case where high Ka/Ks ratios would be more likely under ID than under natural selection:
"Bottom line: for paralogues that are close in sequence space and have a Ka/Ks ratio greater than 1, that is to be expected under natural selection. But for paralogues that are hugely divergent and still have a high Ka/Ks ratio, that is not predicted under natural selection but is under ID."
What follows is a response to Drosera's preceding post.
First, the word 'optimized' is important in discussing two widely divergent paralogues. 'Optimized' entails that there is a function that is most efficiently fulfilled by the paralogue in question, rather than by some other paralogue that is less divergent from the original protein which, itself, is continuing to fulfill its original function. It follows from this that if the optimized paralogue is widely divergent from the original protein, then it is widely divergent because that is what is most efficient.
Second, it is true that a single mutation can have a dramatic effect on the functionality of a protein, and I suggested that this is more likely to be the case if the second function requires a paralogue that lies very close in sequence space to the original protein, which is fulfilling the original function. It is also possible that if the second function is best fulfilled by a paralogue very much divergent from the original, then even one mutation from the original could be significantly beneficial as far as fulfilling the second function is concerned.
But surely it should be evident that this is not very probable. A single mutation can have very significant phenotypic effects, but in most cases we are talking about a fully functioning protein becoming less functional or non-functional. I would expect us all to agree that if the second function is best fulfilled by an optimized paralogue that is widely divergent from the original, then the likelihood that the first few mutations will increase fitness is small. Yes, we can certainly postulate a fitness landscape that slopes away steeply from the original enzyme sequence, down to a well that is a great distance away, and we may even find the occasional example, but from what we are learning about the tight constraints on functional sequence space, that is the exception rather than the norm. Furthermore, the steeper the fitness landscape, the stronger the implication that the second function is not just nice to have but greatly beneficial. So it seems that we can test the slope of the fitness landscape by knocking out the divergent paralogue to see if doing so has a significant effect. If not, there goes the theory that the function makes a large enough difference in fitness to make possible a fitness landscape that slopes steeply from the distant original gene.
There are exceptions, such as the immune system, where the fitness landscape is cratered by wells lying very close to each other and each well represents a different sequence. But in general, recent work has shown that the sequence context imposes severe sequence constraints on a functional protein (Axe, 2000). Substitutions that are tolerated singly may not be tolerated within the context of other, normally tolerated, single substitutions. All this to say that Drosera's response invokes the less probable rather than the more probable.
Drosera also doubted that there is a limit to the process. Work by Blanco et al. has shown that folding sequence space for a protein is surrounded by non-folding sequence space. Since most functional proteins require some 3-D structure, non-folding sequence space will also be, for most proteins, non-functional sequence space. Since it is non-functional, there is no phenotypic expression and, hence, nothing for natural selection to work with. So there is a limit, so far as natural selection is concerned, as to how much guidance can be given to the evolving paralogue before it falls into non-folding, non-functional sequence space.
Finally, it seems that there is a way to test for a high Ka/Ks ratio that has been produced by ID. First, we look for widely divergent paralogues that have high ratios, given the reasoning I've put forward in this post and the previous one. Second, we test to see the phenotypic effect of knocking out the paralogue. If it can easily be knocked out with only one or two mutations, then the implication is that the functional sequence space for that protein is highly constrained and the fitness landscape even a short distance away in sequence space is flat for that function. Of course, we do not change the conserved regions of the paralogue that it shares with the original protein. Rather we would induce mutations that might lie on a proposed evolutionary path between the original enzyme and the paralogue. Highly constrained paralogue sequences, coupled with high Ka/Ks ratios would not be predicted for an evolutionary path that has random walked for most of the way across a flat fitness landscape. Furthermore, the probability of finding that tightly constrained fitness well, via a random walk, may render a natural selection explanation absurd. The specificity of the optimized paralogue, coupled with its distance from the original in sequence space, may make ID necessary, which may be evident by the high Ka/Ks ratio.
On the other hand, if each step in the trajectory from the original protein produces an advantageous phenotypic effect, then the high Ka/Ks ratio could readily be explained by natural selection, but it seems that we could test for that also. It should be noted, however, that ID would predict that as well, as there is no need for a designer to fine tune a paralogue when it can be done relatively quickly by the natural processes set up by ID in the first place. Since both Darwinism and ID would predict high Ka/Ks ratios in this sort of case, neither would be useful in distinguishing between ID and Darwinism.
Drosera's suggestion that, for widely divergent optimized paralogues, the fitness landscape slopes significantly all the way down from the original enzyme, thus explaining a high Ka/Ks ratio, is only an assumption. It would have to be tested.
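Since the whole argument turns on whether a measured Ka/Ks ratio is above or below 1, it may help to spell out how such a ratio is obtained from a pairwise comparison. What follows is only a minimal sketch in the style of a simple counting approach with a Jukes-Cantor correction for multiple hits. The function names and the example counts are hypothetical, chosen purely for illustration, and are not taken from Johnson et al. or from any paper cited in this thread. It also assumes the numbers of synonymous and non-synonymous sites and differences have already been counted from a codon alignment.

```python
import math


def jukes_cantor(p):
    """Convert an observed proportion of differing sites p into an
    estimated number of substitutions per site (Jukes-Cantor correction)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites.

    All four inputs are assumed to come from an existing codon alignment;
    counting them is outside the scope of this sketch.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka, ks, ka / ks


# Hypothetical counts: equal proportions of change at both kinds of site,
# as in the unconstrained (pseudogene-like) scenario discussed above.
ka, ks, ratio = ka_ks(10, 200, 10, 200)
print(ratio)
```

With equal proportions of change at synonymous and non-synonymous sites the ratio comes out at unity, which is the neutral expectation discussed earlier in the thread. A ratio above 1 requires proportionally more change at non-synonymous sites than at synonymous ones. Note that as the paralogues become hugely divergent, the proportions approach the 0.75 limit of the correction and the estimate becomes unreliable, which is one practical reason to be cautious about ratios computed for very distant sequences.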
Regarding:
"Since maleylacetoacetate isomerase is involved in a common and presumably ancient pathway for catabolism of tyrosine, while tetrachlorohydroquinone dehalogenase catalyzes a more specialized reaction, it is likely that tetrachlorohydroquinone dehalogenase arose from a maleylacetoacetate isomerase."
And
" This enzyme provides a remarkable example of Nature's ability to recruit an enzyme with a useful structural scaffold and elaborate upon its basic catalytic capabilities to generate a catalyst for a newly needed reaction."
Phrases like 'it is likely' and 'Nature's ability' are not scientific explanations. Perhaps the paper expands on this with probability calculations to determine if the process is likely. If the enzyme was merely recruited and then optimized, as the summary suggests, that is not particularly challenging for natural selection to do. If, however, the optimized enzyme is widely divergent from the original, then we need to look at the fitness landscape to see if a high Ka/Ks ratio is predicted (I don't know what the Ka/Ks ratio is in this case). What we need to do (and what is of particular interest to me) is to map functional sequence space for the two enzymes, see how far apart in sequence space they are (it sounds like they may actually be linked), and whether there is a plausible evolutionary trajectory between the two enzymes. I would also be especially interested in the conserved and non-conserved regions, as well as the Ka/Ks ratios in conserved and non-conserved regions. I would then have a better idea as to whether ID is required or if the natural system produced by ID will produce the desired effect within the constraints of history. [ 18 March 2002, 11:53: Message edited by: Kirk Durston ]
IP: Logged
|
|
|