|
Author
|
Topic: The arbitrariness of the genetic code
|
Jules
Member
Member # 181
|
posted 01. February 2006 16:52
I've been away from ISCID for a while, so this topic may have already been covered. If so, dear Moderator, please delete it, since there is no need to beat a dead horse.
A passage in Franklin Harold's book, The Way of the Cell, can lead one to believe that the genetic code is arbitrary: quote: ...no one has discovered a persuasive chemical connection between any particular triplet of nucleotides and the amino acid that this triplet codes for.
(p.58)
(I've found the same idea in Crick's writings, though I don't have them handy.) This seems to suggest that the genetic code is arbitrary: No triplet of nucleotides is chemically (or physically?) connected to a particular amino acid. Any triplet could have been matched to a completely different amino acid. Nor does there seem to be a reason why it need be a triplet, instead of a singlet, duplet, quadruplet, etc. As far as I can tell, the only reason they are matched is because there are enzymes that are specific to each amino acid and to the tRNA that codes for them.
This raises a number of interesting issues and questions.
First, if the arbitrariness is real, then the genetic code functions as a language in just the same way that human languages function. We arbitrarily match words to objects. For example, I call this thing in front of me a "computer", though I could just as easily call it a "flabber." Of course, if I did, nobody else would know what I was talking about. Because I want to be understood by other people, I obey convention and call it a "computer." But getting back to the genetic code, if it is arbitrary in the same way as human words are arbitrary, then it strongly suggests that the genetic code was designed by someone. They arbitrarily decided that certain triplets of nucleotides should stand for certain amino acids. They could have decided to have the triplets stand for different amino acids, if they had wanted to. Or they could have used singlets, duplets, quadruplets, etc. (For singlets and duplets, they would have had to use more nucleotides, if they still wanted to use 20 amino acids). Then they designed the enzymes that match the amino acids to the correct tRNAs.
If we reject the design explanation, then we are left with what looks like an awful conundrum. For even if we had DNA, RNA, and proteins, without the enzymes that match the amino acids to the tRNAs, protein synthesis doesn't happen. Perhaps there is a way to explain how this all could have happened without design, but I bet the story would be rather complicated and implausible.
But if we are willing to accept the design explanation (at least hypothetically), it leaves us with some interesting questions that an engineer might ask, such as: Why a triplet, instead of another number? (My own guess is that a triplet gives enough room for the amino acid chain to form.) And why four nucleotides, instead of three, since with three we would still have 27 combinations, which would be enough for our 20 amino acids and stop codons? Another question might be why the genetic code matches unequally, with some amino acids getting 6 different triplets, and some getting only 1.
I think that going with a design explanation opens up thinking about the genetic code in ways we might not have thought about it, otherwise.
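For anyone who wants to check the "6 triplets versus 1" observation, here is a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) written in the usual compact TCAG ordering; the names `CODON_TABLE` and `codon_counts` are made up for this illustration and do not come from any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA alphabet.
# Codons are enumerated with the third base varying fastest: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical helper table: codon string -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(table=CODON_TABLE):
    """Return how many codons are assigned to each amino acid (or stop)."""
    return Counter(table.values())


if __name__ == "__main__":
    # List amino acids from most to fewest codons.
    for aa, n in sorted(codon_counts().items(), key=lambda kv: (-kv[1], kv[0])):
        print(aa, n)
```

Under that assumption, leucine, serine and arginine each get six codons, while methionine and tryptophan get one each. The sketch only counts assignments; it says nothing about why the distribution is uneven.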
IP: Logged
|
|
KBC1963
Member
Member # 1868
|
posted 02. February 2006 12:22
Jules you said:
"I think that going with a design explanation opens up thinking about the genetic code in ways we might not have thought about it, otherwise."
Scientists at this point have indeed made some correlations about what makes what, but your statement indeed has merit. As an engineer, I see observations about the forms that are generated from the genetic code that go beyond just what defines an amino acid or other chemical combination. Take for instance the form, fit and function of each part of each organism. How do the cells in your bones know how to form into each specific shape? Why is your femur of a specific length, width and height? The truth is that without architectural constraints defined within the DNA, form would have no control and, as a rule of thumb, would at best be globular or at worst horribly deformed. So my assumption is that the coding within the genome constitutes material specifications and location parameters so that the components can be built the same over and over, just as the blueprints I make define building materials along with specific locations and shapes in a 3-dimensional matrix. There is orchestration that defines Intelligent Design, and this shows another facet of the design inference. [ 02. February 2006, 12:27: Message edited by: KBC1963 ]
IP: Logged
|
|
Eric Anderson
Member
Member # 1431
|
posted 02. February 2006 17:03
It is my understanding that the code, at least in terms of the sequence of the nucleotides, is arbitrary in the way you describe. I propose that a 3x4 structure is the minimal requirement for a system that codes for at least 20 amino acids, while still preserving the matching helical structure of DNA.
I posted a few thoughts here
IP: Logged
|
|
|
|
Jules
Member
Member # 181
|
posted 03. February 2006 16:20
Eric Anderson writes: quote: I propose that a 3x4 structure is the minimal requirement for a system that codes for at least 20 amino acids, while still preserving the matching helical structure of DNA.
Interesting. It seems we could still make pairs of nucleotides with a 3x3 system, and still preserve the helical structure of DNA. But I'm guessing at this and could be wrong. My further guess is that if there is a design reason for the 3x4 system, it has something to do with the more than one triplet coding for a certain kind of amino acid. I'm guessing there might be some need for this. Perhaps if someone knows more about amino acids, they could answer this.
Allen, I looked at your pdf. Way over my head, I'm afraid. Are you proposing that the code came about randomly or by design? I couldn't tell. I am curious about the graphs for the different kinds of amino acids. I'm wondering if it might explain why we would want more than one triplet to code for certain amino acids.
Whoops, Eric, I just tried pairing the 3 nucleotides, and realized I couldn't do it. I think you're right. We need a 3x4 system to pair the nucleotides and probably need that to keep the helical structure of DNA. Cool. I'm still wondering why some amino acids have so many more triplets coding for them. [ 03. February 2006, 16:24: Message edited by: Jules ]
IP: Logged
|
|
Jules
Member
Member # 181
|
posted 04. February 2006 16:45
I did a little more reading on the enzymes that connect the amino acids to the tRNAs. They are specific for the amino acids, but are not specific for the tRNAs, which allows different codons for the same amino acids. I'll have to read more, and see if that explains why some amino acids have more codons than others. It might, but then we would want to be able to understand why we would want that, from a design perspective. And that may be over my head. I'm hoping someone with more knowledge of biochemistry will chime in.
So thanks to Eric Anderson, we understand why we have to have an even number of nucleotides involved: so they can pair up, and DNA can attain its helical structure, and the process of DNA replication can occur.
But we don't necessarily need to have 4 nucleotides. We could have 6 nucleotides, and the codon could be a duplet. This would still give us 36 possible codons, enough for 20 amino acids and a stop signal, and the DNA strand would only have to be 2/3 the present length. Would that be an engineering advantage? And would it offset having to have extra nucleotides? My guess, as I said before, is that we need a triplet for the codon, in order to allow room (inside the ribosome) for formation of the amino acid chain. So the savings in space of a duplet would be offset by the disadvantage of "crowding" the space in the ribosome.
I'm also guessing that Allen is trying to figure out how the genetic code came to be without design. More power to him. I think it's implausible, but doubtless he thinks an unidentified designer is implausible. So we're even.
And not to ignore KBC1963, yes, we can see all sorts of design reasons for having a genetic code. But if its origin can be explained by some sort of chemical determinism, then it increases the probability of a non-design explanation, and decreases the probability of a design explanation. The fact (if it is a fact) that there isn't a chemical connection between the genetic code and amino acids seems (to me, at least) to severely undercut the plausibility of a non-design explanation for the genetic code.
IP: Logged
|
|
Sebastian Ibstedt
Member
Member # 1659
|
posted 05. February 2006 17:52
I think Eric makes some important points. I think the genetic code is far from arbitrary. As mentioned, for replication to work, it must consist of an even number of letters, that is 2, 4 or 6. According to Gitt, In the Beginning was Information, we need at least 21 different codons for 20 amino acids and 1 stop codon, and because the average Shannon information content per amino acid would be ln21/ln2=4.39 bits (modified from Gitt because he doesn't count the stop codon, but the argument is still the same), we need a system that could produce at least this. So if we have a binary code, we must use quintets, which would produce 2^5=32 different codons and 5ln2/ln2=5 bits/codon. But then the DNA code would be 5/3-1=67% longer than it is today, and that would not be effective from a storage handling perspective. A quaternary code then would need triplets, producing 4^3=64 different codons and 3ln4/ln2=6 bits/codon, which clearly is enough to produce 21 codons and 4.39 bits/aa. For a senary code (6 different bases), doublets would be enough, producing 6^2=36 different codons and 2ln6/ln2=5.17 bits/codon. Of course a senary DNA code would be 33% shorter than a quaternary one (2/3 the length), but it would on the other hand be much more complex, require more enzymes, more energy to build up, etc.
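The arithmetic above can be checked with a short sketch. It assumes only what the post states: 21 symbols are needed (20 amino acids plus one stop), and the information per codon is the codon length times log2 of the alphabet size. The function names are hypothetical, chosen for this illustration.

```python
import math


def min_codon_length(alphabet_size, symbols_needed=21):
    """Smallest codon length L such that alphabet_size**L >= symbols_needed."""
    length = 1
    while alphabet_size ** length < symbols_needed:
        length += 1
    return length


def bits_per_codon(alphabet_size, length):
    """Shannon capacity of one codon, assuming equiprobable bases."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print("bits needed per amino acid:", round(math.log2(21), 2))
    for bases in (2, 4, 6):
        length = min_codon_length(bases)
        print(
            bases, "bases:",
            "codon length", length,
            "| codons", bases ** length,
            "| bits/codon", round(bits_per_codon(bases, length), 2),
            "| strand length vs. triplets", round(length / 3, 2),
        )
```

This only compares coding capacity and strand length. The costs of synthesizing and proofreading extra base types, which the post also mentions, are not modeled.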
Therefore, a quaternary triplet code is far from arbitrary - it is the optimal one. It also has the highest redundancy in codon number and information content, which is useful for minimizing mutational effects. I do not think it is clearly known why some amino acids have more codons than others, but my guess is that it is a kind of security mechanism. For example, deamination of C to U is quite common, and as can be seen on http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1, the six Leu codons have both C and T as first bases. In mRNA it would be good, but in DNA I don't know how necessary it is. Anyway, this reveals another reason that the genetic code is not arbitrary. If DNA had contained U naturally like RNA, the repairing mechanism might not have known if it was wildtype or mutated. Now it can separate C->U from wt-T, according to Watson, Molecular Biology of the Gene.
How organisms have changed from one genetic system to another is very interesting from an evolutionary perspective. It seems to me that a sudden change in tRNAs or in the mechanisms of the aminoacyl-tRNA-synthetases that connect the right amino acid with the right tRNA would be fatal.
Sebastian Ibstedt [ 06. February 2006, 07:14: Message edited by: Sebastian Ibstedt ]
IP: Logged
|
|
Allen Lints
Member
Member # 1453
|
posted 05. February 2006 18:06
Jules said: "Allen, I looked at your pdf. Way over my head, I'm afraid. Are you proposing that the code came about randomly or by design? I couldn't tell. I am curious about the graphs for the different kinds of amino acids. I'm wondering if it might explain why we would want more than one triplet to code for certain amino acids." I am a fervent anti-Darwinist. But my perspective is not as severe as Florence Nightingale's when she said, "To understand God's thoughts we must study statistics, for these are the measure of His purpose." I do hope I am not the same as Lang's unsophisticated forecaster: "An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts - for support rather than for illumination." (Andrew Lang) I don't believe this is over your head, but do remember it is a work in progress, and I notice I didn't provide you with the links to the others. I have added them and I will let you know when I have a more complete paper. Thanks for looking at it.
IP: Logged
|
|
KBC1963
Member
Member # 1868
|
posted 06. February 2006 14:34
"And not to ignore KBC1963, yes, we can see all sorts of design reasons for having a genetic code. But if its origin can be explained by some sort of chemical determinism, then it increases the probability of a non-design explanation, and decreases the probability of a design explanation. The fact (if it is a fact) that there isn't a chemical connection between the genetic code and amino acids seems (to me, at least) to severely undercut the plausibility of a non-design explanation for the genetic code."
Yes Jules, that would be a great help. I wonder, though, whether we can determine what codes for the parameters of the individual component positioning and size.
IP: Logged
|
|
Jules
Member
Member # 181
|
posted 06. February 2006 20:04
Thanks for everyone's reply. Sebastian, by "arbitrary" I meant "not chemically determined." Which would seem to indicate that it is determined by factors an engineer would take into account, as you amply demonstrated. I'll have to re-read all this when I have more time.
IP: Logged
|
|
Art
Member
Member # 179
|
posted 06. February 2006 22:09
The assertion that “…no one has discovered a persuasive chemical connection between any particular triplet of nucleotides and the amino acid that this triplet codes for” isn’t really true. There is in fact an excellent chemical basis for at least some of the genetic code – and it lies in the abilities of some amino acids to selectively recognize RNAs that possess their cognate codons and/or anticodons. As Yarus (Annu. Rev. Genet. 2002. 36:125–51) puts it:
quote: To summarize: on the basis of newly selected site structures, there presently appear to be three classes of amino acids. Arginine behaves straightforwardly; it follows the natural example of the group I active center and strongly overrepresents its codons (but not anticodons) within its binding sites. A second anticipated type includes glutamine and phenylalanine; with due allowance for the fact that these are negative results, these amino acids have no detectable relation to their coding sequences. The comment about negative results is not trivial; the glutamine codon CAG turns up in an RNA recognition domain for gln-cyanomethyl ester (61), adjacent to a metal site required for a 5′ self-aminoacylation reaction. This association is a reminder that negative results can always be superseded. However, these amino acids might have been assigned their codons on another basis (such as preservation of protein structures) during the later evolutionary history of the code (27). Finally, the data suggest an unanticipated class of amino acids—isoleucine and tyrosine are bound in sites containing disproportionate numbers of both codon and anticodon sequences (though the potentially complementary triplets are not paired). The genetic code therefore seems, in part, stereochemical. It is based on primordial RNA-like amino acid binding sites whose parts seem to have been adapted to become the mRNAs and tRNAs of a later, more evolved translation apparatus. To return to the argument about RNA activities, the need for primitive RNA-based coding could be met with ordered amino acid binding sites as peptide templates (114), as well as in other ways (97).
(pp. 134-135).
The notion that the genetic code has an historical hierarchy of sorts is fascinating. That it is based in part on simple RNA-amino acid interactions is also interesting, if somewhat unsurprising. In any case, the claim that the genetic code is entirely arbitrary, without a physico-chemical basis, doesn’t really pass muster. There is a chemical basis for the code, at least in part. Discussion, speculation, and hypothesis-building needs to start at this point.
IP: Logged
|
|
Jules
Member
Member # 181
|
posted 28. February 2006 17:46
Thanks for the input, Art. I was afraid this would be just a thread with everyone agreeing. Now you've made it interesting.
Let's see if I understand any of what you wrote. First, arginine binds to its codons, but not to its anticodons? And without an enzyme? I would be curious to know how it came to bind to its anticodons, with the help of an enzyme. Why the need for the "middlemen" of the enzyme and the tRNA?
Then it talks about glutamine and phenylalanine, and it seems to say that they do not bind at all, which would mean there is no physico-chemico connection? Or do I misunderstand?
Finally, isoleucine and tyrosine bind to both their codons and anticodons? Or do I misunderstand, again?
So are we talking about a total of 5 amino acids? And of these 3 show some sort of physico-chemico connection? And the other 15? Or have I revealed my almost complete ignorance of this subject?
IP: Logged
|
|
Art
Member
Member # 179
|
posted 11. March 2006 20:27
Hi Jules,
I’m a bit busy to explore your questions at length at the moment, so I thought I’d add another passage from a later review on this subject. Just to keep the thread moving at at least a snail's pace. The paper is in the 2005 volume of Annual Reviews of Biochemistry (ORIGINS OF THE GENETIC CODE: The Escaped Triplet Theory, Michael Yarus, J. Gregory Caporaso, and Rob Knight, Annu. Rev. Biochem. 2005. 74:179–98). Enjoy.
quote:
For this discussion, we revisit the occurrence of codons and anticodons in amino acid–binding sites isolated by selection amplification (SELEX), combining all data in our possession. We now know of 43 recently selected and characterized binding sites for 8 amino acids in RNAs with 2791 total initially randomized nucleotides. Our intent here is to use these data conservatively to arrive at minimal estimates for the probability of the calculated associations (26). To estimate the probability of observed associations between coding triplets and binding sites, we considered only those sequences for which direct experimental information about the binding sites was available (chemical protection/modification/interference mapping, conservation, or NMR). We considered only independently derived structures in which the binding site occurred in backgrounds with no significant sequence identity. For example, the minimal isoleucine site has been isolated at least 63 times, and the minimal histidine site has been isolated at least 54 times. However, only a minority of individually characterized examples are used for calculations in Table 1. Inclusion of other independent isolates would increase the statistical significance of the results (diminish the probabilities) by many orders of magnitude.
To calculate probabilities, we assigned each aptamer nucleotide to one of four categories: both triplet and binding-site nucleotide, triplet and not a binding-site nucleotide, not a triplet nucleotide but a binding-site nucleotide, or neither triplet nor binding-site nucleotide, depending on whether it was in a coding triplet (either a codon or an anticodon for the cognate amino acid) and whether it was in the experimentally determined binding site. These four counts were pooled across sequences for the same amino acid, and subjected to the G test for independence (with the Williams correction), which tests whether the proportion of binding site nucleotides that participate in coding triplets exceeds the proportion of nucleotides in flanking sequences that participate in coding triplets. The G test is similar to but more accurate for small samples than the chi-squared test traditionally used in genetics. Thus, the nonbinding parts of the aptamers act as an internal control for all manipulations and for the net nucleotide composition. Counts are pooled across codons or anticodons and across all aptamers for one amino acid to increase sensitivity.
To compare results across multiple analyses (different amino acids, codons, and anticodons), we used Fisher’s method for combining independent tests of a hypothesis. We combined the tests for codons and anticodons for each amino acid and combined the tests for all amino acids to get overall estimates of the probabilities bearing on the escaped triplet hypothesis (Table 1). We emphasize a few aspects of these results. 1. Codons and cognate binding sites are not independent sequences, and the association between anticodons and newly selected aptamer binding sites is even more unlikely. 2. Not all amino acids show codons or anticodons concentrated in their binding sites, and codons and anticodons are not necessarily associated. Arginine sites improbably contain only codons; tryptophan and histidine sites contain only an excess of anticodons. 3. For both codons and anticodons, there are significant arguments for more than one single amino acid. The argument for codons could rest on arginine or isoleucine alone, whereas the argument for anticodons could be based only on isoleucine, tyrosine, or tryptophan. Thus, the evidence for triplet association with binding sites does not rely completely on any particular selection, any particular amino acid, or indeed on experimentation by any one laboratory. 4. Conversely, the argument can be made from the most significant single amino acids without reference to pooled data—arginine for codons and isoleucine for anticodons would serve well. Thus, the major conclusions can be supported using homogeneous subsets of the data and do not emerge only by combining disparate experiments. 5. There are true negatives, such as glutamine anticodons and histidine codons, indicating that this set of techniques does not unexpectedly force a positive result. 6. Only one amino acid, glutamine, shows no significant associations with either codons or anticodons. Stereochemical associations seem the rule rather than the exception. 7. The amino acid sites showing strong associations with cognate triplets are chemically varied; basic, polar, aliphatic, and aromatic side chains are detected. 8. Positive triplets do not have any obvious sequence or compositional properties, being AU-rich, GC-rich, and mixed. 9. Controls using arbitrarily derived triplets with the same compositions, such as reversed codons (25), do not concentrate in amino acid–binding sites.
In summary, current evidence, construed in a way that minimizes its significance, supports significantly elevated codon concentration in the binding sites for isoleucine, and arginine as well as elevated anticodon concentration for six of the eight amino acids, with the exceptions of glutamine and arginine. It is intriguing that the one exception to stereochemical origin is glutamine, an amino acid for which the evidence of coevolution (see above) is among the strongest. The overall probability that codons and anticodons for eight amino acids are independent of their selected cognate binding sites (counting favorable and unfavorable cases together) is exceedingly low, 5.4 x 10^-11.
Although the numerical details may change with addition of data, the overall trends to association of codons with a few kinds of amino acids and anticodons with the sites for many or most amino acids appear secure by normal standards (Table 1). Because these sequences were selected requiring only affinity for free amino acids, there seems no simple or otherwise plausible alternative to the idea that progenitor structures that bound amino acids gave rise to the coding triplets of the present genetic code.
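To make the statistical procedure in the quoted passage more concrete, here is a rough sketch of a G test of independence with the Williams correction on a 2x2 table, followed by Fisher's method for combining p-values. It is not the authors' code and uses none of their data: the function names and any counts you feed it are hypothetical. It also assumes all row and column totals are nonzero and uses the ordinary chi-square approximation (1 degree of freedom for the G test), whereas the review describes a directional comparison.

```python
import math


def g_test_2x2(a, b, c, d):
    """G test of independence (Williams-corrected) for a 2x2 table.

    a: triplet and binding-site nucleotides
    b: triplet, not binding-site
    c: not triplet, binding-site
    d: neither
    Returns (corrected G, p-value from chi-square with 1 df).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Williams correction factor for a 2x2 table.
    q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
               * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
    g_adj = g / q

    # Chi-square survival function for 1 degree of freedom.
    p = math.erfc(math.sqrt(g_adj / 2.0))
    return g_adj, p


def fisher_combined(p_values):
    """Fisher's method: combine independent p-values into one.

    Returns (chi-square statistic with 2k df, combined p-value).
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Closed-form chi-square survival function for even df = 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

For example, with made-up counts, `g_test_2x2(30, 10, 10, 30)` gives a large corrected G and a very small p-value, while a perfectly balanced table such as `g_test_2x2(10, 10, 10, 10)` gives G = 0 and p = 1. Pooling the per-amino-acid p-values through `fisher_combined` mirrors, in outline only, the combining step the review describes.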
IP: Logged
|
|
Jules
Member
Member # 181
|
posted 16. March 2006 11:08
Thanks Art, for the update. I'll have to go look up "aptamer" so I can better understand the review. However, it seems to be saying that there is significant binding for 8 amino acids, two for their codons, 6 for their anticodons. And that since these were the only amino acids tested for, it is reasonable to assume the other 12 amino acids would show similar results. Is that about right?
IP: Logged
|
|
Bruce Fast
Member
Member # 924
|
posted 16. March 2006 14:07
So far as I have read in this thread, there is some sort of assumption that 20 aminos is not arbitrary. However, let me suggest that 20 aminos is also totally arbitrary. I suspect that life could very well be pulled off with 12, and certainly life could have incorporated 30. So why 20, other than "arbitrary"?
IP: Logged
|
|
|