|
Author
|
Topic: Protein Evolution
|
Josh
Member
Member # 405
|
posted 09. July 2003 17:51
The following work identifies residues within allosteric proteins that are important for transmitting binding energy between proteins. It identifies multiple residues that are involved at the same time.
quote:
Evolutionarily conserved networks of residues mediate allosteric communication in proteins Gürol M. Süel 1, 2, Steve W. Lockless1, 2, Mark A. Wall2 & Rama Ranganathan2 1. These authors contributed equally to this work. 2. Howard Hughes Medical Institute and Department of Pharmacology, The University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9050, USA. Correspondence should be addressed to R Ranganathan. e-mail: rama@chop.swmed.edu
A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Here, we describe a sequence-based statistical method for quantitatively mapping the global network of amino acid interactions in a protein. Application of this method for three structurally and functionally distinct protein families (G protein–coupled receptors, the chymotrypsin class of serine proteases and hemoglobins) reveals a surprisingly simple architecture for amino acid interactions in each protein family: a small subset of residues forms physically connected networks that link distant functional sites in the tertiary structure. Although small in number, residues comprising the network show excellent correlation with the large body of mechanistic data available for each family. The data suggest that evolutionarily conserved sparse networks of amino acid interactions represent structural motifs for allosteric communication in proteins.
Communication between distant sites in allosteric proteins is fundamental to their function and often defines the biological role of a protein family. In signaling proteins, it represents information transfer — the transmission of signals initiated at one functional surface to a distinct surface mediating downstream signaling. For example, ligand binding at an externally accessible site in G protein–coupled receptors (GPCRs) reliably triggers structural changes at distant cytoplasmic domains that mediate interaction with heterotrimeric G proteins1, 2. Studies in many other protein systems indicate that long-range interactions of amino acids also are important in binding (and catalytic) specificity. Substrate recognition in the chymotrypsin family of serine proteases3, 4, the tuning of antibody specificity through B-cell maturation5 and the cooperativity of oxygen binding in hemoglobin6-9 all depend not only on residues directly contacting substrate, but also on distant residues located in supporting loops and other secondary structural elements. Crystallographic studies in all of these systems5, 9-11 indicate that the distant residues participating in substrate recognition do so by acting through intervening positions to control the structure of the substrate-binding site. These long-range interactions are remarkable because many other sites, even if closer to active site residues, show little contribution to function. Taken together, these studies indicate that proteins are complex materials in which perturbations at sites — for example, substrate binding, covalent modification or mutation — may cause conformational change to happen in a fracture-like manner that is not obvious in atomic structures. From a biological point of view, these fractures represent the energy transduction mechanisms that mediate signal flow, allosteric regulation and specificity in molecular recognition.
This work raises a very relevant question for protein evolution. If allosteric proteins in general function by engaging conduits of energetically coupled residues to transmit binding energies and convey signals, how can these proteins evolve in a stepwise fashion? If a mutation disrupts the conductivity of these allosteric conduits, then multiple mutations must occur simultaneously in order for the protein to maintain allosteric function. Also, in signaling networks that contain multiple allosteric proteins, the pathways may need to be modified through several proteins, not just one. Any comments on this? (Just thought about this after the limits to generating disulfide bonds issue was discussed.)
Note that these conduits are recognized because the amino acids represented in the sequence database appear together with higher frequency (i.e., positions x and y always appear as these two amino acids because they are transmitting allosteric information through the protein). In order to modify the signal transduction, perhaps multiple residues in the conduit must be modified before the appropriate function is realized? This creates some barriers to evolving in a step-wise fashion.
I'm not sure if this is a literature review post or a brainstorm, but I know the moderators can move it wherever they wish (including the trash.) [ 16. December 2004, 12:46: Message edited by: Josh ]
IP: Logged
|
|
Pim van Meurs
Member
Member # 541
|
posted 10. July 2003 01:17
Interesting hypothesis, which I would like to see worked out in further detail to determine whether stepwise evolution is indeed impossible. Looking forward to your further contributions here.
|
|
Grape Ape
Member
Member # 399
|
posted 10. July 2003 17:10
This is a very interesting paper. Thanks Josh for bringing it to my attention.
You write:
quote: This work raises a very relevant question for protein evolution. If allosteric proteins in general function by engaging conduits of energetically coupled residues to transmit binding energies and convey signals, how can these proteins evolve in a stepwise fashion? If a mutation disrupts the conductivity of these allosteric conduits, then multiple mutations must occur simultaneously in order for the protein to maintain allosteric function.
But the paper didn't show that these networks can't tolerate mutation. Quite the opposite; the patterns of mutations that they do tolerate -- what the authors refer to as perturbation coupling -- allow us to identify the network on the basis of conservation between couplets of residues. In other words, if you find a mutation to glutamate (just pulling this as an example out of thin air here) at one position, you should expect to see a compensatory mutation (perhaps to a basic amino acid) at another position more frequently than what's expected by chance alone. By using a statistical analysis, they've identified residues of high coupling in a large set of sequence data from many different proteins of the same family. When they add these couplets together, they end up with a network consisting of around 10-15% of the protein's residues. And lo and behold, this network correlates nicely to what's been established (by other means) as being the responsible party for the protein's function. But these networks do not have invariant sequences. Nor for that matter is any given residue within the network itself always going to be highly conserved. If this were the case, then the authors would not have had to do the analysis that they did -- a simple sequence alignment would have sufficed, and they could have just picked out the conserved residues from there. Instead they had to account for the fact that the residues within the network are constantly mutating, but that this should constrain the mutations that should be selected for in other parts of the network.
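To make "more frequently than what's expected by chance alone" concrete, here is a minimal Python sketch. It is not the statistical coupling analysis used in the paper; it is a simpler observed-versus-expected ratio for two alignment columns, and the function name `pair_enrichment` and the three-residue toy alignment are made up for illustration.

```python
from collections import Counter


def pair_enrichment(alignment, i, j):
    """Return {(a, b): observed / expected} for residue pairs at columns i and j.

    'Expected' assumes the two columns vary independently, so a ratio well
    above 1 hints at covariation. This is an illustrative statistic only,
    not the perturbation-based measure described in the paper.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): (count / n) / ((col_i[a] / n) * (col_j[b] / n))
        for (a, b), count in pairs.items()
    }


# Hypothetical aligned sequences (equal length, no gaps).
toy_alignment = ["EAK", "EGK", "EAK", "DAR", "QAS", "QGS"]

coupled = pair_enrichment(toy_alignment, 0, 2)    # columns built to covary
uncoupled = pair_enrichment(toy_alignment, 0, 1)  # columns built to be independent
```

In this toy set the glutamate at column 0 always travels with a lysine at column 2, so that pair shows up at twice the frequency independence would predict, while the glutamate/alanine pair across columns 0 and 1 sits at a ratio of about 1. A real analysis would need many sequences and some correction for shared ancestry.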
I just wish I could do this kind of analysis on my protein. No more guess work in trying to find residues to mutate! But alas, I don't have mountains of sequence data. Maybe some day.
|
|
Art
Member
Member # 179
|
posted 10. July 2003 23:26
Hi Josh,
A couple of snippets from the excerpt you posted add to a theme I (and others) have been developing on ISCID for some time now.
"Application of this method for three structurally and functionally distinct protein families (G protein–coupled receptors, the chymotrypsin class of serine proteases and hemoglobins) reveals a surprisingly simple architecture for amino acid interactions in each protein family: a small subset of residues forms physically connected networks that link distant functional sites in the tertiary structure."
and
"Taken together, these studies indicate that proteins are complex materials in which perturbations at sites — for example, substrate binding, covalent modification or mutation — may cause conformational change to happen in a fracture-like manner that is not obvious in atomic structures."
I'd say that these statements add to the list of data and conclusions that support a low-CSI view of biology. The involvement of small numbers of amino acid residues in allosteric mechanisms is a pretty obvious addition to the list. The "fracture-like manner" in which allosteric effects may come about also reflects a low-CSI "state" - this follows from the limited fraction of any given protein that may be involved in the "fracture".
You said (among other things):
"If a mutation disrupts the conductivity of these allosteric conduits, then multiple mutations must occur simultaneously in order for the protein to maintain allosteric function."
I am not sure what the point of this statement is - it seems to be implying a greater susceptibility of allosteric proteins to mutation, but I don't think you can support this with reasonable estimates of mutation frequency.
The more interesting point vis-a-vis mutations derives from the low-CSI implications mentioned above. It turns out that, in all likelihood, networks that function by transducing conformational changes can evolve relatively easily, owing to the small numbers of amino acid residues that need to change to add to or alter any given network. This is a manifestation of the low-CSI state of proteins.
|
|
Josh
Member
Member # 405
|
posted 11. July 2003 01:06
quote: "Interesting hypothesis which I would like to see further worked out in detail to determine if indeed a stepwise fashion evolution is impossible. Looking forward to your further contributions here."
Josh: I've thought about submitting some kind of developed thesis on this, but like many who complain about the arguments made from "the sidelines" I feel that the argument could only be properly supported with actual experiments, and I have other experiments planned out to finish my own thesis. Perhaps when I finish my thesis I'll apply for federal funding to test the idea. (I don't think the authors of the paper will be too interested in my hypothesis, I already ran it by them and they weren't too worried about it.)
quote: "This is a very interesting paper. Thanks Josh for bringing it to my attention."
I've been thinking about this for some time, it's good to have feedback.
quote: You write:
quote: This work raises a very relevant question for protein evolution. If allosteric proteins in general function by engaging conduits of energetically coupled residues to transmit binding energies and convey signals, how can these proteins evolve in a stepwise fashion? If a mutation disrupts the conductivity of these allosteric conduits, then multiple mutations must occur simultaneously in order for the protein to maintain allosteric function. But the paper didn't show that these networks can't tolerate mutation.
This is why I said, "If…" I think the hypothesis must be directly tested. Degree of tolerance would be a great question to pursue (aim 1 of our federal grant takes form...)
quote: "Quite the opposite; the patterns of mutations that they do tolerate -- what the authors refer to as perturbation coupling -- allow us to identify the network on the basis of conservation between couplets of residues. In other words, if you find a mutation to glutamate (just pulling this as an example out of thin air here) at one position, you should expect to see a compensatory mutation (perhaps to a basic amino acid) at another position more frequently than what's expected by chance alone."
Indeed, and they don't study the intermediates, just functional sequences found in current organisms. If you change this particular glutamate before making the compensatory mutations, is the allosteric function maintained? Even though we have an extensive database for the sequence families they chose, it does not necessarily reveal the status of intermediates. However, this is the point by itself: if two residues must be altered together, then this is an obstacle for RM&NS. I am not proving that this is the case, simply offering that this study suggests restraints are inevitable during the evolution of any given protein, and that there are theoretical ways to CALCULATE such things (as opposed to universally stating how possible/impossible it is.) It is a matter of note-taking to simply examine the sequences and determine the number of different kinds of amino acids that are currently occupying residue X, classify them hydrophobic etc. and then show that this residue is limited in its variation wrt functional allosteric conduits. If they have not done it here, I believe they do this kind of analysis elsewhere.
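The "note-taking" exercise described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name `column_profile`, the toy alignment, and especially the four-way grouping of side chains, which is one common textbook classification (glycine and proline, for instance, are placed differently by different authors).

```python
from collections import Counter

# Coarse side-chain classes. This grouping is an assumption for the sketch;
# other classifications are equally defensible.
RESIDUE_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "hydrophobic"),
    **dict.fromkeys("STNQYC", "polar"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
}


def column_profile(alignment, position):
    """Tally which residues, and which residue classes, occupy one alignment column."""
    residues = Counter(seq[position] for seq in alignment)
    # elements() repeats each residue by its count, so classes are weighted by sequence.
    classes = Counter(RESIDUE_CLASS.get(r, "other") for r in residues.elements())
    return residues, classes


# Hypothetical aligned sequences (equal length, no gaps).
toy_alignment = ["EAK", "EGK", "EAK", "DAR", "QAS", "QGS"]
residues, classes = column_profile(toy_alignment, 0)
```

For column 0 of the toy alignment this reports three distinct residues (E, D, Q) falling into two classes (acidic and polar). A tally like this only describes which residues are observed in surviving sequences; it says nothing by itself about intermediates that are absent from the database.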
quote: But these networks do not have invariant sequences. Nor for that matter is any given residue within the network itself always going to be highly conserved.
I am not advocating either scenario, simply noting that residues co-vary, which suggests that variance at only one position may compromise allostery. If so, then there are obstacles for RM&NS. Surely this isn't so dramatic; prolines can't be stuck just anywhere. Again, this statistical approach is perhaps a general way to approach malleability of protein sequences, and say something quantitative.
quote: If this were the case, then the authors would not have had to have done the analysis that they did -- a simple sequence alignment would have sufficed, and they could have just picked out the conserved residues from there.
This addresses Art's CSI point. The reason they found only a few residues to be involved is that they weren't simply doing sequence alignments with proteins containing highly conserved residues.
quote: Instead they had to account for the fact that the residues within the network are constantly mutating, but that this should constrain the mutations that should be selected for in other parts of the network.
The question is whether the time period between mutation 1 and the constrained mutation 2 involves a protein of compromised allosteric function. Certainly some mutations compromise function before the second occurs to restore it. And by the way, did you mean that the authors were cataloguing the complexity of design as we see it, and arraying it in such a way that we can detect allosteric interactions? Who said mutations produced all of this? (tongue in cheek) Are we assuming the answer before asking the question?
quote: I just wish I could do this kind of analysis on my protein. No more guess work in trying to find residues to mutate! But alas, I don't have mountains of sequence data. Maybe some day.
Even if you get enough sequences, they have to vary enough such that the whole protein doesn't look like it covaries with each and every other residue! It is also not trivial to generate these alignments, judging what stretch of sequence is similar on several different proteins of low homology is tough.
This, by the way, is one way to approach the accessibility of any given protein forming from non-functional sequence by RM&NS. By statistically analyzing smaller components within proteins we may be able to determine limits to variation that maintain functionality. If protein X only tolerates two mutations, you're going to have a pretty hard time deriving it by RM&NS. If allosteric channel Y in protein X requires four simultaneous mutations to convey binding energy of protein Z to protein A instead of protein B (and allosteric communication from Z to A is disrupted during the process), we have another problem for RM&NS.
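The arithmetic behind the "simultaneous mutations" worry can be sketched as follows. This is a back-of-envelope Python illustration under loudly stated assumptions: the per-site rate and population size are placeholder values, not measurements; the substitutions are treated as independent; and the calculation only covers the case in which every intermediate is non-functional. If single-step intermediates are tolerated, the changes can accumulate one at a time and this calculation does not apply.

```python
def simultaneous_probability(per_site_rate, k):
    """Chance that k specific substitutions all arise in one replication.

    Assumes the k events are independent and each has probability per_site_rate.
    """
    return per_site_rate ** k


def expected_generations(per_site_rate, k, population_size):
    """Rough expected wait, in generations, for the k-fold mutant to appear once.

    Assumes a constant population and ignores selection, drift and recombination.
    """
    return 1.0 / (population_size * simultaneous_probability(per_site_rate, k))


# Placeholder values chosen only to show the scaling, not taken from any dataset.
RATE = 1e-9        # hypothetical per-site, per-replication probability
POPULATION = 1e9   # hypothetical number of replicating individuals

waits = {k: expected_generations(RATE, k, POPULATION) for k in (1, 2, 4)}
```

With these placeholder numbers a single substitution is expected about once per generation, a simultaneous pair about once in a billion generations, and four at once far less often. The steep scaling in k is why the question of whether intermediates retain function matters so much to both sides of this thread.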
quote: I'd say that these statements add to the list of data and conclusions that support a low-CSI view of biology. The involvement of small numbers of amino acid residues in allosteric mechanisms is a pretty obvious addition to the list. The "fracture-like manner" in which allosteric effects may come about also reflects a low-CSI "state" - this follows from the limited fraction of any given protein that may be involved in the "fracture"
Does the above point about the sequences they chose sufficiently address the issues you raised? If they chose cytochrome C, nearly every single residue would co-vary with their analysis, and the CSI would be very high. I guess I disagree about the general CSI content of proteins if you see it as low.
quote: If a mutation disrupts the conductivity of these allosteric conduits, then multiple mutations must occur simultaneously in order for the protein to maintain allosteric function. I am not sure what the point of this statement is - it seems to be implying a greater susceptibility of allosteric proteins to mutation, but I don't think you can support this with reasonable estimates of mutation frequency.
I mean to say, if you mutate a residue involved in allosteric interactions within a protein, you may need to mutate another residue simultaneously in order to maintain the allosteric interaction. This means that RM&NS have a steeper hill to climb.
quote: The more interesting point vis-a-vis mutations derives from the low-CSI implications mentioned above. It turns out that, in all likelihood, networks that function by transducing conformational changes can evolve relatively easily, owing to the small numbers of amino acid residues that need to change to add to or alter any given network. This is a manifestation of the low-CSI state of proteins.
This statement would be much more credible with experiments to prove it. Would you like to write for a federal grant and work with me on the problem? : )
|
|
charlie d.
Member
Member # 159
|
posted 11. July 2003 09:35
The problem of a meaningful analysis of paired mutations is complicated by the fact that all possible transitions have to be considered, i.e. if mutating an aa residue in the context of a currently existing protein seems to have deleterious effects on its function (eg, in this case, allostery), the same mutation may be tolerated in the presence of compensatory substitutions elsewhere. There is a vast literature on this kind of issue (see here or here, for instance). Such compensatory substitutions can act as "bridges" between two apparently functionally isolated forms, and I am afraid that sampling of the entire structural/functional space for even a simple protein would be a rather daunting task from both an experimental and theoretical point of view.
As for Art's comment on the low "CSI" of proteins, I think he may be referring to the observed relative phylogenetic plasticity of most sequences, which would suggest that most transitions involving a few amino acid residues should not be that unlikely. But if one wants to formally avoid any reference to phylogenesis (and accusations of circular reasoning and such) there is quite a bit of relevant data on the ease of in vitro evolution of allosteric ribozymes (here for instance, and this review). Of course, ribozymes are not proteins, but they offer good insights and a vastly more manageable system to study the complexity of structural/functional relationships and constraints. [ 11. July 2003, 09:37: Message edited by: charlie d. ]
|
|
Grape Ape
Member
Member # 399
|
posted 11. July 2003 16:15
Josh--
Sorry, I thought you were making a stronger claim than you actually were. As far as using this approach to further study the evolution of proteins, I think it's a great idea, but the subject is extremely difficult to tease apart for a variety of reasons. That's why I get irritated with people like Dembski who use a "tornado in a junkyard" approach, which is simplistic to the point of being totally useless. But anyway, about some of your specific comments:
quote: Indeed, and they don't study the intermediates, just functional sequences found in current organisms. If you change this particular glutamate before making the compensatory mutations, is the allosteric function maintained? Even though we have an extensive database for the sequence families they chose, it does not necessarily reveal the status of intermediates. However, this is the point by itself: if two residues must be altered together, then this is an obstacle for RM&NS.
As you note, the claim that two residues must be altered together can't be justified by this paper. But I also think we can make a good case for the opposite based on this paper: Two residues don't always need to be altered together. Whether or not they do depends on the context (in this case the specific residues within the network) that one pair of interacting residues finds itself in. For example, a glutamate at site 1 may require a basic amino acid at site 2 in one particular context. If you change one, you've got to change the other. But if you change a third site, which may now interact differently with sites 1 or 2, now the glutamate can interact with any polar residue at site 2. That's why the analysis looked for perturbation coupling and not simply "if you have X, do you always have Y?". It's hard to tell from the paper the degree to which there is variation among couplets, because I can't quite figure out what the absolute significance of ΔΔGstat is, but the fact that it varies from couplet to couplet at all shows us that there is some variation. And this tells us that two given residues don't always need to be altered together, though in many cases they probably do.
To me this is why the concept of the network is so important here. Networks seem to be more capable of evolving into complex states than independent, modular systems are. The proteome as a whole for example is one big scale-free network, and there are several good papers out about how this improves its evolvability, by allowing preferential attachment at certain nodes, or by duplicating certain nodes for example. Now that the network concept is extended to the individual protein, I think it will help us to better understand how proteins evolve.
quote: This addresses Art's CSI point. The reason they found only a few residues to be involved is that they weren't simply doing sequence alignments with proteins containing highly conserved residues.
I think there's more to it than this. What they found is that only a small percent (7-14%, depending on the protein) are coevolving. The rest they conclude are evolutionarily independent, which means that their evolution is not constrained by the context of the protein in which these residues reside. Now some of them may be conserved because they play other important roles, but they're not conserved because they have to interact with certain residues within the protein. Like the authors say, this gives us a good indication of why proteins are so plastic when it comes to mutation. So their conclusion is that the interdependent portion of the protein is in fact very limited.
quote: The question is whether the time period between mutation 1 and the constrained mutation 2 involves a protein of compromised allosteric function. Certainly some mutations compromise function before the second occurs to restore it.
That's probably true in some cases, but again I think it's highly context dependent. The fact that these residues vary as to which ones they interact with (again, it's hard to tell how much) shows that there is probably an interconnected pathway through sequence space which can be traversed to reach any of the sequences used in the analysis. In fact, it can probably be traversed to reach a large number of protein sequences with highly diverse functions. But I guess that's the question we're asking, isn't it?
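The idea of an "interconnected pathway through sequence space" can be illustrated with a small graph search. This is a hedged Python sketch, not something done in the paper: the set of "functional" sequences is invented, the function names are hypothetical, and "functional" is treated as a yes/no label, which real proteins do not obey.

```python
from collections import deque


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_path(functional, start, goal):
    """Breadth-first search for a route from start to goal.

    Every step changes exactly one residue and must land on a sequence in
    `functional`. Returns the path as a list, or None if no such route exists.
    """
    if start not in functional or goal not in functional:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for candidate in functional:
            if candidate not in seen and hamming(path[-1], candidate) == 1:
                seen.add(candidate)
                queue.append(path + [candidate])
    return None


# Hypothetical sets of sequences labelled functional.
with_bridge = {"EAK", "EAR", "DAR"}
without_bridge = {"EAK", "DAR"}

bridged = single_step_path(with_bridge, "EAK", "DAR")
isolated = single_step_path(without_bridge, "EAK", "DAR")
```

With the intermediate "EAR" labelled functional, the search finds the two-step route EAK, EAR, DAR; without it, the two end sequences differ at two positions and no single-step route exists. Which of these two situations holds for real protein families is exactly the open question here.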
quote: And by the way, did you mean that the authors were cataloguing the complexity of design as we see it, and arraying it in such a way that we can detect allosteric interactions? Who said mutations produced all of this? (tongue in cheek) Are we assuming the answer before asking the question?
If we were to assume that these sequences are all designed, rather than derived from a common ancestor (assume a designed common ancestor if you want), why should we expect to see the level of variation that we do see? Why not make all of the independent amino acids identical? That would seem to be easier. There certainly isn't any reason to make their distribution the same as that for all natural proteins, mimicking an equilibrium process based on random mutation. And furthermore, I would add that even if we assume that there was design at some time in the past, mutation and evolution of these sequences is going to be inevitable. So the analysis which assumes their evolution (and works quite nicely, surprise, surprise), would still be valid.
quote: Even if you get enough sequences, they have to vary enough such that the whole protein doesn't look like it covaries with each and every other residue!
True 'dat. But of course the idea is that the more proteins you get, the more variety you'll have. My protein is at least 600MY old, and can be found in all animalia which have so-far been sequenced. (Which means a whopping 6 or so ) So if I had more sequences, I should have lots of variation to study. Or at least this is what you think if you're an evolutionist.
|
|
Josh
Member
Member # 405
|
posted 11. July 2003 18:40
I'll reply more later, but I can't help myself...
quote: True 'dat. But of course the idea is that the more proteins you get, the more variety you'll have. My protein is at least 600MY old, and can be found in all animalia which have so-far been sequenced. (Which means a whopping 6 or so [Mad] ) So if I had more sequences, I should have lots of variation to study. Or at least this is what you think if you're an evolutionist.
Cdc14 just got itself a nice crystal structure, oh happy day!!! Good thing sequence tells us something by consensus without these fancy statistical methods, anyway...
|
|
Josh
Member
Member # 405
|
posted 17. July 2003 12:49
Grape Ape- Just finished my committee meeting yesterday, so was swamped. Here’s a collection of responses.
quote: Sorry, I thought you were making a stronger claim than you actually were. As far as using this approach to further study the evolution of proteins, I think it's a great idea, but the subject is extremely difficult to tease apart for a variety of reasons. That's why I get irritated with people like Dembski who use a "tornado in a junkyard" approach, which is simplistic to the point of being totally useless. But anyway, about some of your specific comments:
Isn’t Lenski’s paper a bit simplistic in terms of analyzing the evolution of IC systems? Is EQU really anything like a flagellum? You have to start somewhere, and applying mathematical models almost always requires simplification before they are ready to go. Give it some time, a statistical methodology to detect complexity is useful in itself even if it fails to prove the necessity of design in biology.
quote: The question is whether the time period between mutation 1 and the constrained mutation 2 involves a protein of compromised allosteric function. Certainly some mutations compromise function before the second occurs to restore it. That's probably true in some cases, but again I think it's highly context dependent. The fact that these residues vary as to which ones they interact with (again, it's hard to tell how much) shows that there is probably an interconnected pathway through sequence space which can be traversed to reach any of the sequences used in the analysis. In fact, it can probably be traversed to reach a large number of protein sequences with highly diverse functions. But I guess that's the question we're asking, isn't it?
Yes, that is exactly the question. Bold claims about im/possibility of such and such occurring ultimately will require more refined tests and data to support them. Just how many cases, and how context dependent all this is will be much better understood in the context of actual results and data, instead of the use of personal intuitions (which depend more on worldview commitments than anything else) that are commonly appealed to in the current debate.
quote: If we were to assume that these sequences are all designed, rather than derived from a common ancestor (assume a designed common ancestor if you want), why we should expect to see the level of variation that we do see?
Are you asking me to offer wildly speculative models? I’ll give it a go. Firstly, proteins are highly suited to perform the task they have been “designed” for. This means that they must both maximize fruitful interactions while minimizing deleterious ones. An off the wall example is that if some transcription factor began binding microtubules more often than DNA, functionality would be compromised (and this would be a very poor route for bridging one functional class of proteins to another.) My alternate explanation for level of variation is that each protein is highly suited for the native environment of the cell that it is found within. There are big differences between a human cancer cell and Xenopus oocytes for example. Most experiments testing the essential nature of protein function look at viability as a readout. IMO viability is not a very good assay, certainly not very sensitive. There may be subtle interactions that a protein performs that cannot be done in another slightly different environment. Thus the protein can perform the function that is “essential” in most environments (heck, most biochemistry is a very poor mimic of the environment within a cell) but may lose some degree of optimal functionality that we cannot test because our output of “essentiality” is very insensitive. Most proteins have not been studied to the degree required to test such subtle, and perhaps important functions that depend upon environmental factors for a protein. A very good example of important subtle interactions is the genetic diseases found in humans. Many genetic diseases in humans are very subtle mutations affecting processes that are not essential (otherwise you get embryonic lethality and no viable organism.)
For example many disorders of deteriorating brain function like Huntington’s or Parkinson’s seem to be caused by expansion of polyglutamine tracts within certain proteins that leads to their accumulation/inefficient processing/blockage of vesicle transport pathways. These proteins may be “functional” for viability, but are quite suboptimal toward maintaining proper brain function throughout the life of human individuals. Most experiments can’t test such removed possibilities using the assays we have developed.
quote: Why not make all of the independent amino acids identical? That would seem to be easier. There certainly isn't any reason to make their distribution the same as that for all natural proteins, mimicking an equilibrium process based on random mutation.
Thus the statement that the variability of proteins matches their distribution based upon random mutation is a matter of interpretation. It may very well be that all proteins are exquisitely optimized for the function that they perform in the organism that they are found within. Thus, amino acids X,Y,Z are such and such because they avoid all possible deleterious interactions with every other protein in the cell, while concurrently maximizing all the fruitful interactions within the same cell. All other proteins within that cell are also matched in amino acid composition in a likewise fashion. This hypothesis is far beyond testing, and I would bet that regardless if evolution or design is ultimately completely responsible (or some combination therein) this will turn out to be the case. Considering the tree-branching shape of evolution, where all organisms are at the tips of the branches, proteins aren’t simply compared tip to tip, but along the trajectory of the branches as well. Exquisite optimization is very likely.
quote: And furthermore, I would add that even if we assume that there was design at some time in the past, mutation and evolution of these sequences is going to be inevitable. So the analysis which assumes their evolution (and works quite nicely, surprise, surprise), would still be valid.
Yes, although the question I want to ask is how malleable these sequences are, and whether trajectories through sequence space exist that bridge functionality gaps. Assuming that evolution produced them by easily navigating through random sequence space and that bridges of functional gaps are abundant is avoiding the question.
|
|
Art
Member
Member # 179
|
posted 19. July 2003 09:40
Hi Josh,
Sorry about the slow response. I haven't a lot to add to what's been said, but I thought I'd clear up a couple of things about my claims about CSI and the like.
But first, one comment: you said, most recently, "Yes, although the question I want to ask is how malleable these sequences are, and whether trajectories through sequence space exist that bridge functionality gaps. Assuming that evolution produced them by easily navigating through random sequence space and that bridges of functional gaps are abundant is avoiding the question."
After reading the paper you cite, I'd have to say: 1. For the proteins examined, the combinations that are involved in allosteric effects are indeed quite malleable. This follows directly from the data, which shows a degree of variability that takes some pretty clever tweaking to tease apart. 2. The matter of accessibility and bridges in the instances discussed in the paper is not an assumption, but a reasonable and logical conclusion that flows from the data and analysis in the paper you have pointed us to. (A useful exercise would be to examine all of the dataset for one of the proteins of interest, focus on a few of the residues suggested to be important for allosteric effects, and catalogue the variability in the putative interacting amino acid partners. I haven't done this, but I suspect that one would see sufficient flexibility to easily accommodate "Darwinian" processes.)
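For anyone who wants to try the exercise suggested above, here is a minimal Python sketch of the cataloguing step. It is only an illustration: the function name `partner_catalogue`, the 0-based column indices, and the toy alignment are all made up for the example, and nothing here is taken from the paper's actual dataset or code.

```python
from collections import defaultdict


def partner_catalogue(msa, site_i, site_j):
    """For each residue seen at column site_i, count the residues seen at column site_j.

    msa is a list of equal-length aligned sequences (hypothetical input format);
    site_i and site_j are 0-based column indices. Gapped positions are skipped.
    """
    catalogue = defaultdict(lambda: defaultdict(int))
    for seq in msa:
        a, b = seq[site_i], seq[site_j]
        if a == "-" or b == "-":
            continue
        catalogue[a][b] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {a: dict(partners) for a, partners in catalogue.items()}


# Toy alignment, invented purely for illustration (3 columns, 7 sequences).
toy_msa = ["YIA", "YIA", "YLA", "YVG", "FLA", "FLG", "FMA"]

cat = partner_catalogue(toy_msa, 0, 1)
for residue, partners in sorted(cat.items()):
    print(residue, partners)
```

If a residue at one site tolerates several different partners at the other site, that shows up directly as more than one key in the inner dictionary.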
Now for the low-CSI state of things:
A high-CSI scenario in the examples of interest in this paper would be pretty easy to see (indeed, the paper and its most clever approach would not be needed, and the study probably would never have been done) - there would be little or no variation in functionally-important amino acid residues. Moreover, this limited variability would extend over most or all of the protein(s) in question. Neither of these constraints is seen, hence my assertion in my previous post.
These data can be added to the list (which is much longer than I can provide here) of evidence indicating the low-CSI nature of proteins. A sampling from these and related discussions (I haven't paid attention to whether you participated in any of these previous threads; if so, apologies for the repetition. Consider this a brief summation that helps put my overall "model" into greater relief):
This thread has buried in it many references to experiments that point (decisively, IMO) to a low-CSI nature for protein/enzyme functionality. These experiments are the sort that are needed - direct measurements of the "volume" of sequence space that is occupied by functional members. Pay close attention to the discussion of the very low correlation between sequence size and frequency of occurrence.
This thread deals with an example of the "creation", by natural and random processes, of a multifunctional enzyme from, essentially, "nothing". Follow-up discussion that focuses on Dembski's treatment of the subject can be found here.
ARN has lost the third thread that comes immediately to mind - it dealt with a rather involved discussion of an example whereby functionality was added to an IC complex in a decidedly low-CSI manner. Maybe Mike Gene has saved the entire thread (I have an edited version in which I deleted the tangential posts that did not pertain directly to DnaK etc.). The bottom line, however, can be re-stated here: the evolution of IC systems need not be (and likely is not, if recent studies concerning interactions between flagellar components are any indication) a high-CSI proposition.
For these and many other reasons, it is safe to say that experimental results point inexorably towards a low-CSI model for protein (and RNA, for that matter) functionality. The study that is the focus of this thread adds a new dimension to this model - allostery doesn't seem to be a high-CSI proposition either.
|
|
Josh
Member
Member # 405
|
posted 29. July 2003 10:26
Art-
quote: After reading the paper you cite, I'd have to say: 1. For the proteins examined, the combinations that are involved in allosteric effects are indeed quite malleable. This follows directly from the data, which shows a degree of variability that takes some pretty clever tweaking to tease apart.
-I agree completely. There appear to be many amino acid combinations that can transmit allosteric information; I would bet that all amino acids could be used in an allosteric network. Allosteric networks may not be all that CSI. However, their malleability is another issue. The point of citing this paper is to show that there are ways to identify limits of variation for proteins using statistical analysis. The fact that one amino acid in an allosteric conduit always shows up with one of three other amino acids at position X indicates that these mutations co-vary as found in life today, indicating the likelihood that during their mutagenesis, they must also be covaried to maintain function of the allosteric conduit. This forces limits to variation that can occur when trying to maintain protein function and implies that multiple mutations must occur together to generate an altered allosteric conduit that retains function. This creates a steeper hill for RM&NS to climb.
quote: 2. The matter of accessibility and bridges in the instances discussed in the paper is not an assumption, but a reasonable and logical conclusion that flows from the data and analysis in the paper you have pointed us to. (A useful exercise would be to examine all of the dataset for one of the proteins of interest, focus on a few of the residues suggested to be important for allosteric effects, and catalogue the variability in the putative interacting amino acid partners. I haven't done this, but I suspect that one would see sufficient flexibility to easily accommodate "Darwinian" processes.)
-It would be better to demonstrate this before boldly asserting it; that's the utility of this sort of probabilistic approach. Calculations are easily made.
quote: Now for the low-CSI state of things:
The CSI content of proteins is an entirely different subject. Perusing the links you cited leaves a weak impression of the exact argument you wish to make (there are many linked articles, and which ones are critical or peripheral to your point isn't clear), and I haven't the time to sift through every last post to try to determine the exact case you are offering and respond to all of it. My suggestion is to either cite an article showing low-CSI proofs or write a novel article and submit it to ISCID explaining a proof of why you think proteins are low CSI. This way you can place all the relevant citations within the clear prose used to support your argument, and we can all follow much more easily. Then everyone can see your hypothesis clearly laid out, follow the strengths and weaknesses of your claim, and offer their evaluations. CSI is tangential to this thread, so I'll engage the topic at a more appropriate time when the arguments you wish to support are clearer. Looking forward to future interactions,
Josh
|
|
Grape Ape
Member
Member # 399
|
posted 30. July 2003 18:26
Sorry for the delay in getting back to you, Josh. I know about the "perturbation constraints" of grad school.
I wrote:
"If we were to assume that these sequences are all designed, rather than derived from a common ancestor (assume a designed common ancestor if you want), why we should expect to see the level of variation that we do see?"
And you replied:
"Are you asking me to offer wildly speculative models? "
No! I was just asking a rhetorical question, in the hopes of pointing out why the authors, along with everyone else, find the evolutionary point of view useful. In fact, the very nature of the study depends on the idea that random mutation is randomizing the covariance between two residues that are not coevolving. If random mutation had not been at work, they would have had lots of false positives. The authors explain their reasoning, for example...
quote: First, if site l contributes nothing to either the folding or function of the protein, the corresponding amino acid frequencies in the MSA should be unconstrained and, therefore, should approach their mean values in all proteins.
And then showing a real-life example...
quote: We made a perturbation at this site by extracting the subalignment containing only tyrosine at this site (Tyr296), a manipulation that retains 34.6% of sequences from the parent alignment. Both the full MSA and the Tyr296 subalignment are sufficiently diverse and over-represented so that unconserved sites, such as position 19, show amino acid frequencies near to their mean values found in all proteins (Fig. 1d).
Now, the fact that there are residues whose amino acid distributions are near the mean value for all proteins tells you that they're near equilibrium after having randomly mutated. Either that, or you're saying that they were designed that way for some reason, which aside from being highly unparsimonious, doesn't make much sense. That's why I wrote this:
"Why not make all of the independent amino acids identical? That would seem to be easier. There certainly isn't any reason to make their distribution the same as that for all natural proteins, mimicking an equilibrium process based on random mutation."
To which you responded...
"Thus the statement that the variability of proteins matches their distribution based upon random mutation is a matter of interpretation. It may very well be that all proteins are exquisitely optimized for the function that they perform in the organism that they are found within."
The notion that the amino acid distribution got this way because of mutation might be an interpretation, but the fact that it matches said distribution is not. They do match that distribution. If you want to claim that they were somehow designed to mimic this distribution, you'd have to give a reason why a designer would desire such a distribution. What you're arguing here is that an unseen function is being specified by a residue that is randomly distributed! That's a mighty strange way to specify a function. I suppose it could happen that different organisms had different (subtle) needs, which would account for a particular distribution, but why is it the same distribution for many different residues?
As an aside, given that your 'wildly speculative' model assumes exquisite optimality for all protein residues, I'd say you'd have a really hard time squaring that with the available evidence. Even if you discount all in vitro mutagenesis, there are way too many polymorphisms to believe that every residue is exquisitely optimized for something.
Anyway, given neutral theory, which assumes that in the absence of selection, a residue should tend towards a random distribution, researchers are now predicting both protein function and structure accurately using evolutionary theory. Here are a couple of recent examples:
Combining inference from evolution and geometric probability in protein structure evaluation.
Functional divergence in protein (family) sequence evolution.
Concerning your most recent post, you wrote:
"Allosteric networks may not be all that CSI. However, their malleability is another issue. "
Just a minor point, but I believe that malleability and CSI can be considered the inverse of each other here. A protein or allosteric network which is highly malleable would not be very "specified", and hence low in CSI. At least if we're using the non-probabilistic definition. You continue...
"The point of citing this paper is to show that there are ways to identify limits of variation for proteins using statistical analysis. The fact that one amino acid in an allosteric conduit always shows up with one of three other amino acids at position X indicates that these mutations co-vary as found in life today, indicating the likelihood that during their mutagenesis, they must also be covaried to maintain function of the allosteric conduit. This forces limits to variation that can occur when trying to maintain protein function and implies that multiple mutations must occur together to generate an altered allosteric conduit that retains function. This creates a steeper hill for RM&NS to climb."
Josh, this is exactly what I think you cannot infer from this paper. Like I said before, I do think you can infer the opposite. I haven't pored through their sequence data, but as far as I can see, there are no examples where multiple mutations have to occur at once, or where a given residue at position X means that there is always a given residue at position Y. If you can show where this happens in their data, please point it out -- I could be mistaken. But given my reading, what the authors found is a shift in the distribution of amino acids away from its usual (unperturbed) distribution in cases where there is perturbation coupling. Consider one of their examples:
quote: In contrast, position 125, a moderately conserved site in the third transmembrane helix of GPCRs, shows several changes in its distribution in response to the Tyr296 perturbation (Fig. 1e) corresponding to a larger coupling energy (ΔΔG^stat_{125,296Y} = 1.86 kT*).
Now if you look at figure 1e, you'll see that there are about 7 different amino acids that position 125 can be if position 296 is a tyrosine. Given that this is a moderately conserved residue, it's only three fewer than you'll find in the entire MSA, but it's enough for them to assign it a large coupling energy. If 296 is tyrosine, then 125 is more likely to be isoleucine, less likely to be leucine or methionine (but it still happens), and about equally likely to be serine, threonine, valine, or alanine. Cysteine, glycine, and asparagine are residues found at that position that are not found in conjunction with Y296, but they're at extremely low frequency anyway.
In this case, the residues are really not that specified. You're mostly talking about a situation where if 296 is a tyrosine, then 125 has its distribution shifted in favor of isoleucine at the expense of leucine. Extrapolate this to the network as a whole, and we can see that it's really not that specified, and there is lots of variation, which leads us to believe that there is plenty of malleability. Keep in mind that we're talking about networks here, and that there's more than one way to send a signal across a network.
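To make the "shifted distribution" idea concrete, here is a rough Python sketch of the perturbation described above: take the subalignment with a fixed residue at one site and compare the amino acid frequencies at another site against the full alignment. Note the assumptions: the shift is scored with a plain Kullback-Leibler divergence, which is a stand-in chosen for simplicity and is not the paper's statistical coupling energy; the function names, 0-based indices, and toy alignment are invented for illustration.

```python
import math
from collections import Counter


def column_freqs(msa, site):
    """Amino acid frequencies at one alignment column, ignoring gaps."""
    counts = Counter(seq[site] for seq in msa if seq[site] != "-")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def perturb(msa, site, residue):
    """Subalignment containing only sequences with `residue` at `site`."""
    return [seq for seq in msa if seq[site] == residue]


def distribution_shift(msa, target_site, perturbed_site, residue):
    """KL divergence of the perturbed distribution from the full one.

    This is a simple stand-in measure, NOT the coupling energy from the paper.
    Zero means the target site's distribution is unchanged by the perturbation.
    """
    full = column_freqs(msa, target_site)
    sub = column_freqs(perturb(msa, perturbed_site, residue), target_site)
    # Every residue in the subalignment also occurs in the full alignment,
    # so full[aa] is always defined and nonzero here.
    return sum(p * math.log(p / full[aa]) for aa, p in sub.items())


# Toy alignment, made up for illustration: column 0 is the perturbed site,
# column 1 shifts with it, column 2 never varies.
toy_msa = ["YIA", "YIA", "YLA", "FLA", "FLA", "FMA"]

coupled = distribution_shift(toy_msa, 1, 0, "Y")
uncoupled = distribution_shift(toy_msa, 2, 0, "Y")
print(column_freqs(toy_msa, 1))
print(column_freqs(perturb(toy_msa, 0, "Y"), 1))
print(coupled, uncoupled)
```

In the toy data, fixing Y at the first column raises the frequency of I at the second column at the expense of L, without making any single residue mandatory, which is the kind of soft shift being described for positions 296 and 125.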
|
|
Art
Member
Member # 179
|
posted 31. July 2003 23:00
Hi Josh,
Grape Ape is hitting on the same issues I would like to raise - in the interest of keeping things simple and focused (nothing like two people saying the same thing in their own idiosyncratic ways - and goodness knows I've lots of idiosyncrasies), I'll let Grape Ape continue to explain things.
As far as the "zero-CSI model" goes, I'd ask you to read through this thread and perhaps revive the discussion there. It would probably help readers if we (they?) could avoid the same issues that have been raised in this discussion. (Needless to say, the evidence is more than simply intriguing - it's compelling enough to warrant a serious overhaul of the application of notions of information to the subject of ID and origins.)
|
|
|