Topic: Wells' Molecular Phylogenies
Cornelius G. Hunter
Member
Member # 81
posted 12. November 2003 21:30
RBH:
The first reference (please see the final section) discusses the non-convergence between mitochondrial and nuclear genes. It reads in part:
quote: The result, Naylor told the meeting, gave "really quite impressive" statistical support--for what was clearly the wrong answer. For example, the molecular tree clustered frogs and chickens in a clade with fish, even though these three species do not derive from a common ancestor. To make matters worse, echinoderms (which include the sea urchin and the starfish) branched closer to the vertebrates than did amphioxus, a primitive marine chordate that is well established as the closest living relative of the vertebrates. "I think this talk was fairly distressing for many people," says de Jong. And Diethard Tautz, a developmental biologist at the University of Munich, comments, "It has become clear that the analysis of molecular data is not as straightforward as many would have wished."
They were able to improve matters not by adding genes but by biasing the sample on certain amino acids (yet another mechanism that has not been observed). Again quoting:
quote: Naylor concluded that rather than trying to build better trees by sequencing more and more genes--an approach common among molecular phylogeneticists--"our efforts are probably better spent investigating which kinds of sites best reflect actual historical, phylogenetic signals." Michael Nedbal, an evolutionary biologist at the Field Museum in Chicago, says Naylor's talk was "an especially important message to those in molecular phylogenetics. Just like morphologists, molecular systematists must investigate how their characters are evolving before subjecting them to phylogenetic reconstruction."
So this brings us full circle, back to the paper at hand, which advocates using more genes. And like the paper at hand, the second reference is also a "whole genome" analysis (in fact, it uses more orthologues than the Rokas paper). It demonstrates that the five photosynthetic prokaryote groups do not converge. They found no significant support for any of the 15 possible phylogenies and had to resort to massive HGT as an explanatory mechanism:
quote: These results bolster the idea that the evolution of photosynthetic genes has been disconnected from divergence and speciation in these organisms, confirming the extensive role that horizontal gene flow has played in prokaryote evolution.
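The "15 possible phylogenies" figure is the number of distinct unrooted, fully resolved trees for five groups, which is given by the double factorial (2n - 5)!!. A minimal Python sketch of that count (the function name is just for illustration):

```python
def count_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees on n labelled taxa.

    Computes (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    for k in range(3, n_taxa + 1):
        # A tree on k - 1 taxa has 2k - 5 branches, and taxon k can attach to any of them.
        count *= 2 * k - 5
    return count

for n in range(3, 8):
    print(n, count_unrooted_binary_trees(n))
```

For n = 3 through 7 this prints 1, 3, 15, 105 and 945 trees, so five groups give exactly the 15 candidate topologies mentioned above.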
Not only is the required level of HGT rather high, but the transfer of sequences such as photosynthesis genes was not thought to be possible. The third reference is simply a summary of animal and plant morphological similarities (e.g., placentals and marsupials) which require massive convergence at the macro level. This has not changed since 1990. All three references substantiate my point. Finally, the physicist in my analogy is not meant to represent evolutionists. He is meant to represent people who think unexplained anomalies are not important in evaluating the evidence for a theory.
RBH
Member
Member # 380
posted 12. November 2003 22:46
Of the news article in Science, Hunter wrote: quote: They were able to improve matters not by adding genes but by biasing the sample on certain amino acids (yet another mechanism that has not been observed).
However, adding a bit more context to the material Hunter quoted, we find quote: To figure out what was going wrong, Naylor and Brown looked more closely at the 12,234 nucleotides, to see which ones were providing accurate information about the expected phylogenetic tree and which were causing the problems. "We wanted to see what makes a good site good and a poor site poor," Naylor says. The results were very instructive. For example, when they grouped the nucleotides into codons--nucleotide triplets that code for specific amino acids--they found that codons corresponding to the hydrophobic (water-hating) amino acids gave an "absolutely rotten" fit to the tree. On the other hand, codons for amino acids that are hydrophilic (water-loving) or carry an electric charge provided a much better fit. But the best fit of all came from amino acids that seemed to be critical for determining the proteins' three-dimensional structure.
When the analysis was rerun using only the nucleotide sites corresponding to these amino acids, the expected phylogenetic tree reemerged with considerable statistical support. Naylor concluded that rather than trying to build better trees by sequencing more and more genes--an approach common among molecular phylogeneticists--"our efforts are probably better spent investigating which kinds of sites best reflect actual historical, phylogenetic signals." Michael Nedbal, an evolutionary biologist at the Field Museum in Chicago, says Naylor's talk was "an especially important message to those in molecular phylogenetics. Just like morphologists, molecular systematists must investigate how their characters are evolving before subjecting them to phylogenetic reconstruction."(Emphasis added)
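The reanalysis described above amounts to partitioning alignment sites by the kind of amino acid they encode and then rebuilding the tree from one partition. A minimal sketch of the partitioning step only, assuming an amino-acid alignment and a simple three-way grouping of residues. The class sets are one common textbook grouping and not necessarily the one Naylor and Brown used, the toy sequences are invented, and the tree-building step is left out:

```python
from collections import Counter

# Hypothetical residue classes; published groupings differ.
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKRH")

def residue_class(aa: str) -> str:
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in CHARGED:
        return "charged"
    return "polar_other"

def partition_columns(alignment: dict[str, str]) -> dict[str, list[int]]:
    """Group alignment column indices by the class of each column's most common residue."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    partitions: dict[str, list[int]] = {}
    for i in range(length):
        residues = [s[i] for s in seqs if s[i] != "-"]
        if not residues:
            continue  # an all-gap column carries no information
        majority, _ = Counter(residues).most_common(1)[0]
        partitions.setdefault(residue_class(majority), []).append(i)
    return partitions

def subset_alignment(alignment: dict[str, str], columns: list[int]) -> dict[str, str]:
    """Keep only the chosen columns, e.g. to rerun tree inference on one partition."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

toy = {"taxon1": "MKDLAE", "taxon2": "MKDIAE", "taxon3": "LRELSD"}  # invented data
parts = partition_columns(toy)
print(parts)
print(subset_alignment(toy, parts["charged"]))
```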
Now, what is it that determines a protein's biological activity? Its shape. And the amino acids that are critical for determining the shape of proteins turn out to produce a phylogenetic tree that corresponds to the tree derived from other data. That's an interesting finding from an evolutionary point of view. But to say that using amino acids relevant to proteins' biological activity is "biasing the sample" and calling it "yet another mechanism that has not been observed" is simply incoherent. Should they use data from biologically inert molecules? And what does "mechanism" mean in Hunter's remark "yet another mechanism that has not been observed"? What has not been observed? That some amino acids play a role in determining protein function, kinetics, and folding? They certainly do. See here for a recent review.
Clearly the Naylor and Brown findings based on hydrophobic, hydrophilic, and charge-carrying amino acids need explanation. I don't know what that explanation is, but I don't doubt that someone - a real biological scientist - is working on it.
RBH
Cornelius G. Hunter
Member
Member # 81
posted 13. November 2003 00:09
RBH:
I'm happy to talk proteins with you, and I will in a moment (the bottom line is things are more complicated than you are making them out to be). But setting that aside for a moment, your charge that my point is incoherent needs some justification. Why is my point incoherent?
Attempting to elaborate, you ask: "Should they use data from biologically inert molecules?" First of all, none of the amino acids are "biologically inert." (One could make arguments that glycine is chemically inert, depending on what you mean by that phrase, but glycine certainly is not biologically inert since it has important structural features). But in any case, to answer your question, yes, it is standard to use all the amino acids, not just a subset. That is what is done in the vast majority of molecular phylogenies, including the Rokas paper that you support. In the case of these mitochondrial genes studied by Naylor and Brown, to make them look anywhere close (in terms of convergence) to nuclear genes, they had to select out only certain types of amino acids. This is called, as I said, biasing the sample. This is not "incoherent" as you charge. It is a description of what they did.
Furthermore, there is no empirical evidence of such mutational conservation. This is why I said this was yet another explanatory mechanism that has not been observed. There is nothing "incoherent" here.
Now on to proteins. You say that protein shape determines biological activity. While shape is certainly a factor, and quite important in some cases, there are plenty of other cases where you have a different function with the same shape (and also cases where you have the same function with proteins of different shape). You could have learned that by reading into the second paragraph of the Shakhnovich paper you cited. Reading further, still in the second paragraph, you could have learned that stability and folding kinetics appear to be what is most important in amino acid conservation.
This raises questions, which you seem to be unaware of, about Naylor and Brown's conclusion that the structurally-relevant amino acids are the important ones. Note that they do not justify this. They simply looked for the subset of amino acids that would provide a better fit, and found what they refer to as the structurally-relevant ones. First off, I'd be curious how they came up with that category, as this remains an open research question. I wonder which amino acids they consider to be structurally-relevant. One thing that is obvious is they don't consider the hydrophobics to be in that group (since the hydrophobics lacked convergence). That's curious since the hydrophobics, in fact, drive the folding collapse. In fact, I would argue that the hydrophobics are indeed structurally-relevant.
--Cornelius [ 13. November 2003, 12:08: Message edited by: Cornelius G. Hunter ]
RBH
Member
Member # 380
posted 13. November 2003 14:06
As Hunter (and other readers) are no doubt aware, I'm operating well outside my domain of professional competence here. I'm going to have to take a break to familiarize myself with some of this stuff. That won't be immediately - I have a quarterly board meeting late next week to get prepared for, so I'll not have much spare time.
I do want to explore these issues, though, since they seem to be at the base of some of the critique of evolutionary theory that I should become conversant with.
RBH
Matthew J. Brauer
Member
Member # 819
posted 13. November 2003 15:10
Cornelius Hunter wrote: quote:
Evolution predicts convergent phylogenies.
Of course, this is only true to a first approximation. To understand why, it's important to distinguish between gene trees and species trees.
Because of a number of well-understood genetic phenomena, a pair of loci may have very different population genetics parameters. In the case of mitochondria and human Y chromosomes, there is a big difference in effective population size relative to the autosomes. Population size has a significant effect on the "coalescent time" of a gene, that is, the time since any two haplotypes last shared a common ancestor. Selection also affects the coalescent time, as does recombination.
Since genes can have different coalescent times, their topologies can vary, depending on the relationship between the coalescent time and the time since a speciation event. This means that gene trees do not always correspond to species trees; in other words, gene trees are only estimates of species trees.
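To put a number on this: for three species, the standard neutral-coalescent result (assuming no recombination within the locus, no gene flow, and a randomly mating ancestral population of constant size) is that the gene tree matches the species tree with probability 1 - (2/3)e^(-T/N_g), where T is the number of generations between the two speciation events and N_g is the effective number of gene copies of the locus. With an equal sex ratio, a maternally inherited haploid mitochondrial genome (like the Y chromosome on the paternal side) has about N/2 effective copies versus 2N for an autosomal locus, so its lineages sort about four times faster. A sketch; the values of N and T are arbitrary illustrations:

```python
import math

def p_gene_tree_matches(t_generations: float, gene_copies: float) -> float:
    """P(gene tree topology == species tree) for three species under the neutral coalescent.

    t_generations: generations between the two speciation events (internal branch).
    gene_copies: effective number of gene copies in the ancestral population
                 (2N for a diploid autosomal locus, about N/2 for mtDNA with equal sex ratio).
    """
    t = t_generations / gene_copies  # internal branch length in coalescent units
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

N = 10_000  # hypothetical diploid effective population size
T = 10_000  # hypothetical generations between speciation events

autosomal = p_gene_tree_matches(T, 2 * N)
mitochondrial = p_gene_tree_matches(T, N / 2)
print(f"autosomal: {autosomal:.3f}  mitochondrial: {mitochondrial:.3f}")
# prints: autosomal: 0.596  mitochondrial: 0.910
```

With these numbers, roughly four autosomal gene trees in ten would disagree with the species tree even though every locus evolved on the same species history.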
Finally, I refer you to the simulation literature (e.g., Hillis, Huelsenbeck, et al.). If you start with an assumed tree and simulate sequence evolution along the branches, what do you see? Well, you get some non-convergent characters, i.e., homoplasy. That is to say, you expect to see homoplasy under any reasonable model of sequence evolution. These homoplastic characters don't necessarily help us resolve the structure of the underlying tree, but they are there (and they can help with the estimation of rates of sequence evolution). They are, for the narrow purposes of tree reconstruction, "noise".
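The kind of simulation referred to here can be sketched in a few lines. This toy version, assuming the Jukes-Cantor (JC69) substitution model and an arbitrary four-taxon tree ((A,B),(C,D)) with invented branch lengths, evolves independent sites down the known tree and counts the parsimony-informative sites that support each possible split. Sites supporting either wrong split can only arise through homoplasy:

```python
import math
import random
from collections import Counter

BASES = "ACGT"

def evolve(base: str, branch_length: float, rng: random.Random) -> str:
    """Evolve one nucleotide along a branch under JC69 (branch_length in expected substitutions/site)."""
    p_different = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    if rng.random() < p_different:
        return rng.choice([b for b in BASES if b != base])  # each other base equally likely
    return base

def simulate_split_support(n_sites: int, external: float, internal: float, seed: int = 1) -> Counter:
    """Simulate sites on the true tree ((A,B),(C,D)) and tally informative site patterns."""
    rng = random.Random(seed)
    support = Counter()
    for _ in range(n_sites):
        root = rng.choice(BASES)              # root placed mid-way along the internal branch
        x = evolve(root, internal / 2, rng)   # ancestor of A and B
        y = evolve(root, internal / 2, rng)   # ancestor of C and D
        a, b = evolve(x, external, rng), evolve(x, external, rng)
        c, d = evolve(y, external, rng), evolve(y, external, rng)
        if a == b and c == d and a != c:
            support["AB|CD (true)"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD (homoplasy)"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC (homoplasy)"] += 1
    return support

print(simulate_split_support(n_sites=10_000, external=0.1, internal=0.05))
```

Even though the data are generated on a single known tree, some sites support each wrong split; the true split simply receives most of the support.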
If I understand you correctly, you seem to be saying that this homoplasy is a problem. Your criterion for the usefulness of these methods is that there be zero variance in the estimators of tree shape. Otherwise (you seem to insist) the noise needs to be "explained".
I suggest another analogy: you are told to measure the December temperature of your back yard. You get 31 different results. Should you throw them all away? Or can you get useful information from them? Do you need to explain each individual deviation from the mean?
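The analogy can be made concrete: 31 scattered readings still yield a useful estimate of the monthly mean, along with a measure of its uncertainty, without any need to explain each day's deviation. A sketch with synthetic readings; the "true" mean of 2.0 °C and day-to-day spread of 4.0 °C are invented for illustration:

```python
import math
import random
import statistics

rng = random.Random(42)
TRUE_MEAN, SPREAD, DAYS = 2.0, 4.0, 31  # hypothetical values, degrees Celsius

readings = [rng.gauss(TRUE_MEAN, SPREAD) for _ in range(DAYS)]

mean = statistics.mean(readings)
std_error = statistics.stdev(readings) / math.sqrt(len(readings))
print(f"mean = {mean:.2f} C, standard error = {std_error:.2f} C")
# Individual days can sit several degrees from the mean, yet the mean itself
# is pinned down to within roughly one standard error (about SPREAD / sqrt(31)).
```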
[typo edit] [ 13. November 2003, 15:39: Message edited by: Matthew J. Brauer ]
Art
Member
Member # 179
posted 13. November 2003 15:41
quote: I suggest another analogy: you are told to measure the December temperature of your back yard. You get 31 different results. Should you throw them all away? Or can you get useful information from them? Do you need to explain each individual deviation from the mean?
With respect to the OP, the analogy that popped into my mind when the word "physics" appeared was Fourier transform NMR. One "pulse" of a dilute sample transforms into a tenuous spectrum: not wrong, but probably in equivocal agreement with other spectra obtained with single pulses. However, accumulate or combine data from many, many pulses and the result is a strong, unambiguous and robust spectrum.
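Accumulating pulses is signal averaging: the coherent signal adds linearly over N scans while uncorrelated noise grows only as the square root of N, so the signal-to-noise ratio improves roughly as √N. A toy sketch with a synthetic Gaussian peak and white noise (all parameters are invented, and this is not a model of a real FT-NMR experiment):

```python
import math
import random

N_POINTS = 200  # points per synthetic spectrum

def scan(rng: random.Random, peak_height: float = 0.5, noise_sd: float = 1.0) -> list[float]:
    """One synthetic scan: a weak Gaussian peak at the centre plus white noise."""
    centre, width = N_POINTS / 2, 5.0
    return [
        peak_height * math.exp(-((i - centre) ** 2) / (2 * width**2)) + rng.gauss(0.0, noise_sd)
        for i in range(N_POINTS)
    ]

def averaged_spectrum(n_scans: int, seed: int = 0) -> list[float]:
    """Co-add n_scans independent scans and divide by n_scans."""
    rng = random.Random(seed)
    total = [0.0] * N_POINTS
    for _ in range(n_scans):
        for i, value in enumerate(scan(rng)):
            total[i] += value
    return [value / n_scans for value in total]

def snr(spectrum: list[float]) -> float:
    """Peak value at the centre divided by the noise SD of a signal-free baseline (first 50 points)."""
    baseline = spectrum[:50]
    mean = sum(baseline) / len(baseline)
    sd = math.sqrt(sum((v - mean) ** 2 for v in baseline) / (len(baseline) - 1))
    return spectrum[N_POINTS // 2] / sd

for n in (1, 16, 256):
    print(n, round(snr(averaged_spectrum(n)), 1))
```

Going from 1 to 256 scans should raise the ratio by roughly a factor of 16, apart from random scatter in any single run.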
Matthew J. Brauer
Member
Member # 819
posted 13. November 2003 15:47
I'd add, in agreement with Hunter, that partitioning the data so as to find the desired result is more than a little fishy.
Furthermore, it is surprising (if it's true) that it's the "structurally relevant" residues that give the best signal. The ideal sample should be, rather, one comprising sites that are not under strong selection. Strong selection on a trait would tend to reinforce convergent characters, leading to data that are positively misleading. That's why most phylogenetic methods look explicitly at silent (or at least conservative) substitutions, with the assumption that they are neutral.
I guess I ought to read the paper...
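The distinction between silent and replacement substitutions comes down to whether a codon change alters the encoded amino acid. A minimal sketch using the standard nuclear genetic code (NCBI translation table 1). Note that vertebrate mitochondrial genes, the kind of data Naylor and Brown analyzed, use a modified code (for example, TGA codes for Trp rather than stop), so a real analysis of those genes would need that table instead:

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def is_synonymous(codon_before: str, codon_after: str) -> bool:
    """True if both codons encode the same amino acid (or both are stop codons)."""
    return CODON_TABLE[codon_before.upper()] == CODON_TABLE[codon_after.upper()]

print(is_synonymous("CTT", "CTC"))  # True: Leu -> Leu, a silent change
print(is_synonymous("GAT", "GAA"))  # False: Asp -> Glu, a replacement
```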
Cornelius G. Hunter
Member
Member # 81
posted 13. November 2003 17:55
Matthew:
Thank you for the comments. However, you are overstating my point such that my criticism of the evidence appears to be meaningless. I am not saying that evolution predicts zero variance trees and that there should be zero phylogenetic "noise." The sample references I supplied illustrate cases that are nowhere close to being in the noise. They illustrate data for which the well understood, and empirically observed, mechanisms are not sufficient.
Perhaps you (or others) have not read through this thread, so I'll repeat my point. Evolution predicts convergent phylogenies and the extent of the convergence that we observe is claimed by some to be strong evidence for evolution. The problem with this claim is that there are non-convergent data which cannot be explained by empirically observed mechanisms. So, in fact, the data do not support evolution as claimed. It is not strong evidence, and argues against evolution. To give an evolutionary explanation of the data, we must employ unobserved phenomena. People who claim the evidence strongly supports evolution are like the physicist in my analogy who ignores the anomalous data. Your temperature in December analogy does not work.
Imagine a historian of antiquity claiming that his interpretation of an ancient event, which occurred in the month of December, requires that the temperature was highly variable throughout the month (like your example of 31 different temperatures in the month). This would be reasonable since we sometimes observe this level of variability. But this is not analogous to the case of evolutionists claiming that various, unobserved, mechanisms are required to explain the data, yet the data are strong evidence for the theory.
Now imagine another historian analyzing the same event. He feels he has strong evidence for a different interpretation. There is one problem though. His interpretation requires that it was 400 degrees on one day in that month of December. Nonetheless, this historian claims the data are strong evidence for this interpretation.
--Cornelius [ 13. November 2003, 18:25: Message edited by: Cornelius G. Hunter ]
Cornelius G. Hunter
Member
Member # 81
posted 13. November 2003 18:10
Art:
Your NMR example is not analogous with the non-convergent character data we are discussing. This is not a case of a weak signal or a noisy signal. This is a strong signal that conflicts with the expected result and requires unobserved mechanisms to explain it.
--Cornelius
Cornelius G. Hunter
Member
Member # 81
posted 13. November 2003 18:19
Matthew:
By the way, I did not intend to imply that Naylor and Brown's sample biasing was fishy. I would have done the same thing. They found highly conflicting phylogenies, and then asked the question: "What method adjustments are required to make the results look more like what we expect?" Once we find out what adjustments will do the job, then we can think about what this all means.
I do agree with you that the issue is more subtle than the article's discussion. As you say, we expect unconstrained residues to reveal phylogenetic relationships. Also, it would have been nice for Naylor and Brown, instead of touting their results as a mandate for molecular phylogeneticists to search for higher order effects, to discuss the implications their findings have for the theory of evolution (but of course, they didn't write the article, they were merely being quoted). [ 13. November 2003, 18:29: Message edited by: Cornelius G. Hunter ]
Art
Member
Member # 179
posted 13. November 2003 22:05
Cornelius,
A couple of things. First, I think the NMR analogy is pretty apt for the Rokas paper (which in fact is trying to deal with weak or noisy signals).
Second, I am somewhat taken aback by some of your statements. What I am reading is that you are claiming that several occurrences or mechanisms, processes that explain non-convergence (whatever that is), have not been observed empirically. However, this cannot be said, at least in general terms, for any of the items on the list that I am seeing here - HGT, convergence (I presume, at the level of amino acid sequence, although this is fuzzy in this thread), “massive” gene loss, high evolutionary rates, non-uniform evolutionary rates. Each of these processes can be studied in real time (as well as with more conventional sequence comparison-based approaches). I might add, I don’t see how any of them are inherently “anti-Darwinian” (another odd term, IMO).
Matt,
I haven’t been able to get the Naylor and Brown paper that was cited in the news piece, but another study by Naylor (Grundy and Naylor, “Phylogenetic inference from conserved sites alignments,” J. Exp. Zool. 285, 128-139, 1999) sheds light on what this group is doing. Basically (or, more correctly, as far as I can tell - I’m not really a “phylogeneticist”), the use of conserved sites seems to be an approach for teasing useful information out of collections of sequences (such as mt DNAs from distantly-related organisms) that have diverged greatly, probably too much so to otherwise permit constructive phylogenetic analysis. In such cases, circumstances other than what they call history can create “false” phylogenies. Focusing on conserved sites seems to eliminate the misleading signals, probably because the conserved sites have not experienced the same extent of variation.
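The general idea of restricting an analysis to conserved sites can be sketched as a column filter on an alignment. This is a hypothetical illustration with an arbitrary conservation threshold and invented sequences, not the actual procedure of Grundy and Naylor. Note that completely invariant columns also pass such a filter even though they carry no information about tree topology:

```python
from collections import Counter

def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of non-gap residues in each column that match the column's most common residue."""
    scores = []
    for i in range(len(alignment[0])):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / len(residues))
    return scores

def conserved_sites(alignment: list[str], threshold: float = 0.75) -> list[int]:
    """Indices of columns whose conservation meets a (hypothetical) threshold."""
    return [i for i, score in enumerate(column_conservation(alignment)) if score >= threshold]

toy = ["MKVLAW", "MKILGW", "MRVLSW", "MKVFTW"]  # invented protein alignment
print(conserved_sites(toy))  # [0, 1, 2, 3, 5]: the highly variable column 4 is dropped
```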
I’d agree that this approach seems to fly in the face of what is usually preached - unselected positions are the best fodder for phylogenetic analyses. I can see the potential utility, but I would worry that a general use of the conserved sites approach would be as likely as not to unveil a non-historical as a historical signal. (For example, it may well be that conserved sites in mitochondrial genomes in metazoans are subject to very similar selective pressures, and thus can be used to tease out historical information. This is less likely, IMO, to be true for other gene sets. But of course, I could well be wrong on this.)
Art
Cornelius G. Hunter
Member
Member # 81
posted 13. November 2003 23:15
Art:
I did not say all those mechanisms are not observed. I said some are not. For the photosynthetic prokaryotes, for instance, what is required is massive HGT of many genes, including key genes such as photosynthesis genes. Both the magnitude, and the type of gene, are not observed. And for the Naylor and Brown mitochondria results, what is required are mutations across a large number of proteins that differ markedly between different amino acid types. This too is not observed.
--Cornelius [ 13. November 2003, 23:16: Message edited by: Cornelius G. Hunter ]
Art
Member
Member # 179
posted 13. November 2003 23:56
Hi Cornelius,
I think it's useful to recall that individual occurrences of HGT in bacteria are expected to involve blocks of genes - tens of kbp or more. If bacterial photosynthetic genes are arranged in operons (I'm not familiar with the details here), then what you are describing is more expected than not.
Your point regarding the Naylor and Brown study isn't clear to me. Are you claiming that mitochondrial genes are not known to evolve rapidly?
Cornelius G. Hunter
Member
Member # 81
posted 14. November 2003 00:25
Art:
What I am saying is not particularly controversial. The massive HGT in the photosynthetic prokaryotes was a surprise, as was the fact that it involved photosynthesis genes. And the mitochondrial results were another complete surprise. Widespread mutations (across many proteins) are not observed to correlate so significantly with amino acid type.
--Cornelius
Art
Member
Member # 179
posted 14. November 2003 20:27
Hi Cornelius,
While it may not be controversial to state that there was surprise regarding indications of the extent of HGT, and that the results of Naylor and Brown are rather unexpected, it is quite incorrect to assert that "massive" HGT has been neither observed nor explained, or that mitochondrial sequences do not evolve rapidly.
The statements in this thread regarding HGT are especially puzzling, as geneticists were using HGT to construct rather detailed (and accurate) bacterial genetic maps for decades before we ever knew just what was going on.