|
Author
|
Topic: IC, CSI, and fitness landscape topology
|
Mark Elkington
Member
Member # 120
|
posted 17. March 2002 07:30
One landscape consists of sparse needle-like columns jutting out of an endless death valley; another is made up of smoothly rising ridges and ranges. The expression "fitness landscape" when applied to biological evolution tends to conflate physical terra firma with abstract mathematical topology, with the effect of making the latter seem as obvious and convincing as walking up a hill. Yet when thinking about GAs, fitness landscapes are real and fundamental. Can the fitness landscapes of problems which can be solved by generalised GAs (or by any GA) be shown to be significantly different to biological fitness landscapes, which are claimed to be insoluble by such algorithms?
Behe defines IC like this:
"By irreducibly complex I mean a single system composed of several well-matched, interacting parts that contribute to the basic function, wherein the removal of any one of the parts causes the system to effectively cease functioning. An irreducibly complex system cannot be produced directly (that is, by continuously improving the initial function, which continues to work by the same mechanism) by slight, successive modifications of a precursor system, because any precursor to an irreducibly complex system that is missing a part is by definition nonfunctional." (DBB p. 39)
Taking this definition, but allowing degrees of IC-ness, can we say that:
- the C in CSI = the C in IC
- the S in CSI = f(IC pathway difficulty or improbability)
In another topic here I give an example of a (very approximately) biological GA which was used to generate 1,800 bits to configure a programmable electronic chip to discriminate between two tones, and even the spoken words 'stop' and 'go'. We could construct a fitness landscape for this problem by making the lower 900 bits a (very large) number representing an x-axis coordinate, the upper 900 bits the y-axis, and some relative measure of tone discrimination the z-axis, such that z = f(x, y), the fitness function. This would allow us to plot a three-dimensional landscape, and thereby visualise and analyse the nature of the terrain the GA actually traversed. What is the density of local optima which constitute solutions? For a random starting point, what is the probability that an algorithmically traversable pathway to a solution exists?
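As a rough illustration of the coordinate construction just described, here is a minimal Python sketch. The function names are hypothetical (they are not from the original experiment), and the fitness measure itself is left out, since the post only describes it as "some relative measure of tone discrimination".

```python
GENOME_BITS = 1800        # size of the chip configuration quoted in the post
HALF = GENOME_BITS // 2   # 900 bits per axis

def genome_to_xy(genome):
    """Split an 1800-bit genome into (x, y): lower 900 bits -> x, upper 900 bits -> y."""
    if not 0 <= genome < (1 << GENOME_BITS):
        raise ValueError("genome must fit in 1800 bits")
    mask = (1 << HALF) - 1
    return genome & mask, genome >> HALF

def xy_to_genome(x, y):
    """Inverse of genome_to_xy: reassemble the 1800-bit genome from its coordinates."""
    return (y << HALF) | x

# Example: a genome whose upper half encodes 3 and whose lower half encodes 5.
example = (3 << HALF) | 5
print(genome_to_xy(example))
```

One caveat about this kind of plot: a single-bit mutation changes x or y by a power of two, so genomes that are one mutation apart are generally not adjacent on the (x, y) plane. The plotted terrain is therefore not the same neighbourhood structure that the GA actually moves through.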
We could perform a blind search to determine solution density. However, given Dembski's proposed "universal probability bound" of 10^-150, the best we could do in this universe is map only a tiny fraction F of the total landscape:
F = 10^150 / 2^1800 ~ 10^150 / 10^542 ~ 10^-392
More realistically, using a computer which could calculate 10^6 points per second, we could sample less than 10^14 points in a year, so even if we found no solutions, we could only claim that the density was less than or equal to about 10^-14. The example GA which found a solution explored only something like 5000 generations, each of 50 varied offspring (about 250,000 evaluations). Can we therefore deduce that the fitness landscape in this case must be highly favourable to solution convergence - either the solution density is extremely high, or the probability of hitting a traversable solution pathway is extremely high?
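The orders of magnitude in this part of the argument can be checked with a few lines of Python. This is only a sketch of the arithmetic: it works in log10 because 2^1800 is far too large to handle as a floating-point number, and the variable names are illustrative.

```python
import math

GENOME_BITS = 1800
LOG10_SPACE = GENOME_BITS * math.log10(2)   # log10 of 2^1800, roughly 541.9

# Fraction of the landscape covered by 10^150 trials (the bound quoted in the post).
log10_fraction_universe = 150 - LOG10_SPACE

# A computer evaluating 10^6 points per second for one year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
samples_per_year = 1e6 * SECONDS_PER_YEAR   # a little over 3 x 10^13
log10_fraction_year = math.log10(samples_per_year) - LOG10_SPACE

# Evaluations used by the GA run: 5000 generations of 50 offspring each.
ga_evaluations = 5000 * 50
log10_fraction_ga = math.log10(ga_evaluations) - LOG10_SPACE

print(f"log10(2^1800)               = {LOG10_SPACE:.1f}")
print(f"log10 fraction, 10^150 tries = {log10_fraction_universe:.1f}")
print(f"samples in one year          = {samples_per_year:.2e}")
print(f"log10 fraction, one year     = {log10_fraction_year:.1f}")
print(f"GA evaluations               = {ga_evaluations}")
print(f"log10 fraction, GA run       = {log10_fraction_ga:.1f}")
```

Whichever budget is used, the fraction of the space actually visited is vanishingly small, which is the point the paragraph above is making about the GA's success.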
Even if there was only one solution (i.e. 1 in 2^1800) we can still defy the universal probability bound of 10^-150 if the landscape has the shape of a single smooth mountain (globally monotonic with a minimum slope); likewise for similar sparse topologies.
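The "single smooth mountain" case can be made concrete with a toy hill climber. In this sketch (an assumption for illustration, not the GA from the experiment) fitness is simply the number of bits that match one hidden 1800-bit target, so there is exactly one solution in 2^1800, yet every other point has an uphill neighbour.

```python
import random

def climb_single_peak(n_bits=1800, seed=0, max_evals=200_000):
    """Single-bit-flip hill climber on a landscape with exactly one solution.

    Fitness = number of bits matching a hidden target. The climber never reads
    the target directly; it only keeps a mutation if the fitness goes up.
    Returns (fitness evaluations used, whether the peak was reached).
    """
    rng = random.Random(seed)
    target = [rng.randrange(2) for _ in range(n_bits)]
    genome = [rng.randrange(2) for _ in range(n_bits)]
    fitness = sum(g == t for g, t in zip(genome, target))
    evals = 1
    while fitness < n_bits and evals < max_evals:
        i = rng.randrange(n_bits)            # propose one random point mutation
        # Flipping bit i moves fitness by exactly +1 or -1 on this landscape,
        # so the mutant's fitness can be computed incrementally.
        new_fitness = fitness + (1 if genome[i] != target[i] else -1)
        evals += 1
        if new_fitness > fitness:            # selection: keep only improvements
            genome[i] ^= 1
            fitness = new_fitness
    return evals, fitness == n_bits

evals, solved = climb_single_peak()
print(f"peak reached: {solved}, evaluations used: {evals}")
```

On this landscape the climber needs only on the order of tens of thousands of evaluations, a number utterly unrelated to 2^1800. The same code pointed at a "needle" landscape (fitness zero everywhere except the target) would have no gradient to follow and would reduce to blind search.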
Even for this relatively simple example, the phase space is vast compared with search resources. For the human genome, the raw phase space is 4^3,200,000,000 possible sequences (about 6.4 x 10^9 bits). What seems to be unavoidable is that whenever the landscape is either not super-dense or is lacking in continuous upward gradients, Darwinian searching is probabilistically impossible*. One question then is, To what extent can we quantify the topology of real biological fitness landscapes? If this cannot be done, how can IC arguments like Behe's lead to an acceptable disconfirmation of Darwinism, no matter how compelling they may be qualitatively?
A comment on biological fitness landscapes. From the above figures, we know we lack the universal resources to map anything but tiny portions. And how do we determine the fitness function anyway? Given all this, perhaps the way forward is to examine specific instances, for example, Coupled Mutations and Quantization of Functionality. Or perhaps there are other mathematical ways of side-stepping these problems.
The diversity of life on earth can give an impression of seemingly endless "niches" -- an abundance of viable "solutions" to the fitness function. But when measured against the total phase space, the opposite is true -- life as we know it is a sparse feature of the landscape. A common counter is that there are many more unrealised potential solutions; even the belief that life is almost a physical inevitability.
Evolution with common descent implies an expanding, dynamic fitness landscape, but one that must be traversed along continuous pathways. Evolution cannot freely sample the whole phase space, it can only build off existing branches, like a game of Scrabble. To what extent may we deduce that these pathways must be continuously ascending? Presumably some chance is allowable, some scope to venture along level or even downward stretches. Estimating organism "optimality" in a certain lineage would reveal the sign and gradient of the slope, but to do that requires either knowledge of the fitness function, or some other subjective standard. Another less circular approach might be to measure randomness in organisms. It is generally believed that genomes contain coding, noncoding but functional, and nonfunctional DNA, the last category being proportionally the largest (though shrinking it seems as understanding grows). How much would nonfunctional regions be free to mutate and randomise, unchecked by natural selection? With limited knowledge here I suspect not with complete freedom: e.g. certain combinations would signal commencement of protein coding, which could be harmful and therefore disallowed. However, do we in fact observe the degree of randomness permitted by such constraints in these sequences, e.g. incompressibility?
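One crude way to put a number on "incompressibility" is to run a sequence through a general-purpose compressor and count compressed bits per base. The sketch below uses Python's zlib on synthetic sequences only (no real genomic data). A general-purpose compressor gives only a rough upper bound on the information content, so this shows the kind of measurement meant, not a serious method.

```python
import random
import zlib

def bits_per_base(seq):
    """Compressed size in bits divided by sequence length (DNA letters A, C, G, T)."""
    compressed = zlib.compress(seq.encode("ascii"), 9)
    return 8 * len(compressed) / len(seq)

rng = random.Random(42)
n = 100_000

random_seq = "".join(rng.choice("ACGT") for _ in range(n))  # unconstrained drift
repeat_seq = "ACGT" * (n // 4)                              # highly ordered

print(f"random sequence:   {bits_per_base(random_seq):.2f} bits/base")
print(f"repeated sequence: {bits_per_base(repeat_seq):.2f} bits/base")
```

A fully randomised four-letter sequence cannot be squeezed below about 2 bits per base, while ordered or constrained sequences fall below that. Where real nonfunctional regions sit between those extremes is the empirical question the paragraph above raises.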
Thus far I have grossly oversimplified organisms as merely genotypes. (The central dogma of biology (DNA -> proteins, never the reverse) has been accused of fostering an overly gene-centric and reductionistic model; see Brian Goodwin's "How the Leopard Changed Its Spots".) In fact, the fitness function acts upon the phenotype, not the genotype. And curiously, phenotypes seem far less tolerant of carrying random or obviously nonfunctional structures (a subjective judgement on my part). If we were able to measure phenotype randomness (I stress, as opposed to optimisation), and we find it to be very low, then we have effectively shown that evolution must be proceeding mostly up fitness gradients. How else would we get such a complete filtering of randomness?
What's my point here? Simply the conclusion that for Darwinian evolution to proceed, the fitness landscape must feature a large number of continuous, interconnected, steadily upward pathways. This seems to be a severe constraint, and intuitively an unlikely topology for real landscapes.
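The dependence on topology can be illustrated on a landscape small enough to enumerate completely. The sketch below compares two toy 10-bit landscapes, one smooth and one with uncorrelated random heights, and asks from what fraction of starting genotypes a strict steepest-ascent walk reaches the global optimum. Both landscapes are assumptions chosen as extremes; neither is claimed to resemble a real biological landscape.

```python
import random

def neighbours(g, n_bits):
    """All genotypes one point mutation (single bit flip) away from g."""
    return [g ^ (1 << i) for i in range(n_bits)]

def fraction_reaching_peak(fitness, n_bits):
    """Fraction of all genotypes from which steepest ascent ends at the global optimum."""
    size = 1 << n_bits
    best = max(range(size), key=fitness.__getitem__)
    reached = 0
    for start in range(size):
        g = start
        while True:
            top = max(neighbours(g, n_bits), key=fitness.__getitem__)
            if fitness[top] <= fitness[g]:   # no strictly uphill neighbour: stuck
                break
            g = top
        reached += (g == best)
    return reached / size

N = 10
rng = random.Random(1)
smooth = [bin(g).count("1") for g in range(1 << N)]  # one mountain: count of 1 bits
rugged = [rng.random() for _ in range(1 << N)]       # uncorrelated random heights

smooth_frac = fraction_reaching_peak(smooth, N)
rugged_frac = fraction_reaching_peak(rugged, N)
print(f"smooth landscape: {smooth_frac:.3f} of starts reach the global peak")
print(f"rugged landscape: {rugged_frac:.3f} of starts reach the global peak")
```

On the smooth landscape every start reaches the peak; on the uncorrelated one most walks stall on a local optimum. Real landscapes presumably lie somewhere between these two cases, and where exactly is the open question of this thread.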
Empirically determined rates of mutation lethality, neutrality etc are commonly cited as either demonstrations of tractable evolutionary pathways, or as proof that sheer cliffs surround existing species on all sides. Evidence can be in the eye of the beholder. Nevertheless, it also seems that methods such as this are going to be essential in making the inferences ID needs to make about landscapes which are by direct means inscrutable.
______________
* Assuming that we accept the notion of a universal probability bound (however optimistic the estimate). In practice I think everyone does, because no-one seriously suggests the spontaneous formation of Homo sapiens. The assumption is that natural selection dramatically shortens the odds to some plausible amount - i.e. an implicit universal probability bound!
[ 17 March 2002, 07:39: Message edited by: Mark Elkington ]
|
|
Jack Foster
Member
Member # 79
|
posted 19. March 2002 00:46
Hi Mark:
Mark: One question then is, To what extent can we quantify the topology of real biological fitness landscapes? If this cannot be done, how can IC arguments like Behe's lead to an acceptable disconfirmation of Darwinism, no matter how compelling they may be qualitatively?
This of course assumes that Behe bears the burden of proof. It seems to me that if Darwinists assert evolution for some primary IC feature (ribosome, flagellum, F-ATP Synthase, replisome, or other molecular machines), which by Behe's "unselected steps" definition would be isolated in phase space, then they should be expected to elucidate a potential pathway, which can then be empirically verified or falsified. I know that others don't agree that Darwinism need shoulder this burden. I believe that's because Darwinism is the only theory truly compatible with the accepted epistemological assumption of evolutionary biology: (methodological) naturalism.
Mark: Thus far I have grossly oversimplified organisms as merely genotypes.
I was going to point that out myself! Most with basic knowledge of Evolutionary Algorithms understand that "evolvability" is not a quality easily attained, and certainly the basic Darwinian algorithm is not sufficient to provide it. Wagner and Altenberg conclude that a sufficient genotype to phenotype map is a requirement for evolvability. Here, from their landmark paper, "Complex Adaptations and the Evolution of Evolvability":
quote: Hence, the Darwinian solution of optimization problems is possible if and only if the problem is "coded" in a way that makes the mutation-recombination-selection procedure an effective one. The "representation problem" is how to code a problem such that random variation and selection can lead to a solution. The representation problem underlies the issue of whether selection, mutation, and/or recombination can produce adaptation.
For biology the "representation problem" has some unsettling implications. If, as evolutionary biology asserts, all adaptations are the result of mutation and selection, organisms have to be evolvable. But once one calls into question the inevitability of organisms being evolvable, one can ask, how and why did an evolvable genome originate in the first place? Is it a fortuitous consequence of physics, or of biochemistry, or a "frozen accident" from life's origin? Are the genetic representations of the phenotype a product of evolution? What, if any, are the evolutionary forces that shape the genotype-phenotype map?
In fairness to W & A, they conclude that evolvability is evolvable, since the genetic code itself is subject to genetic control. Still, for me, I can't help but see a chicken and egg problem. If a genotype to phenotype map is required to provide evolvability, how did the genetic code itself ever evolve into being?
Mark: What's my point here? Simply the conclusion that for Darwinian evolution to proceed, the fitness landscape must feature a large number of continuous, interconnected, steadily upward pathways. This seems to be a severe constraint, and intuitively an unlikely topology for real landscapes.
This is exactly right. Where does the fitness landscape come from? It is the "creation" of the self-replicating organism itself in relationship to its environment. A genotype to phenotype map with evolvability effectively constrains search space in an evolutionarily friendly way. There is significant evidence that a pure "unmapped" self-replicator simply cannot have evolvability because search space has not been constrained. What will the fitness function for a simple self-replicator look like? Certainly efficiency will be rewarded. It's possible that this topological feature is so great that no evolutionary pathway away from this "efficiency well" can succeed in providing evolutionary advance. This could be one of the reasons that evolvability requires a mapping. An organism with a g-p map has an IC "base camp" which prevents a fall into the efficiency well, and provides pathways for evolution to follow by appropriate constraint of infinite phase space.
The most interesting examples of silicon-based EAs have very strong genotype to phenotype maps. Sims' "Virtual Creatures" comes to mind. Most genotypes provide viable phenotypes, just as with Thompson's evolutionary system, where many genotypes provide a starting output tone (also a friendly mapping).
[ 19 March 2002, 01:29: Message edited by: Jack Foster ]
|
|
Drosera
Member
Member # 139
|
posted 19. March 2002 22:11
quote:
What's my point here? Simply the conclusion that for Darwinian evolution to proceed, the fitness landscape must feature a large number of continuous, interconnected, steadily upward pathways. This seems to be a severe constraint, and intuitively an unlikely topology for real landscapes.
Wait a minute...
...aren't "smooth fitness landscapes" (it would be good to keep in mind that all of these things are simplified models of reality rather than reality itself) primarily a product of the fact that the physical laws of nature are basically continuous except at the quantum scale?
E.g., the following are the products of chemical and physical laws:
-gradual temperature gradients
-gradual chemical gradients (e.g. a diffusion of pesticide in soil, to pick a random example)
-gradual light gradients (e.g. in water)
...and I think one could argue that there is excellent evidence that populations adapt up and down these with relative ease.
The real question in my opinion is what are the capabilities of organisms to vary. The smooth environmental gradients are everywhere. But how tough is it, really, to get a light-sensitive spot, and then to make it gradually into a cup, etc.? Is the biochemistry of primitive light-sensitive spots really as complicated as the biochemical cascade that Behe describes for the mammalian eye? This is what he implies but I highly doubt it.
(...although I don't have any real knowledge of the topic...but someone should be consulting the lancelets, planaria, protozoans, etc.)
Drosera
|
|
John Bracht
Member
Member # 5
|
posted 20. March 2002 00:27
Drosera,
The shape of the fitness landscape for proteins is anything but smooth, and is determined by the details of protein folding. See the thread "Coupled Mutations and the Quantization of Functionality" for a discussion of how the literature shows protein folding landscapes to be very rugged.
It is this protein folding landscape that evolution must maneuver around on to produce new protein systems and pathways.
John Bracht
|
|
Mark Elkington
Member
Member # 120
|
posted 20. March 2002 07:20
Jack,
Jack: This of course assumes that Behe bears the burden of proof. It seems to me that if Darwinists assert evolution for some primary IC feature (ribosome, flagellum, F-ATP Synthase, replisome, or other molecular machines), which by Behe's "unselected steps" definition would be isolated in phase space, then they should be expected to elucidate a potential pathway, which can then be empirically verified or falsified. I know that others don't agree that Darwinism need shoulder this burden. I believe that's because Darwinism is the only theory truly compatible with the accepted epistemological assumption of evolutionary biology: (methodological) naturalism.
I should point out that despite my asking questions of Behe's approach, I personally find the IC argument very convincing. I'm wondering if/how this approach can be progressed from being more confirming data for the converted, to a non-dismissable challenge for evolutionists. Irrespective of what may be fair and reasonable, in reality the burden of proof falls largely on any challenger to the ruling paradigm.
Even worse, the IC proponent is required to prove a negative -- i.e. that no pathway exists, which is difficult if not impossible to do exhaustively. This means the evolutionist has a kind of "free out" defense, which can be used as a matter of argument formality, independent of evidence. Perhaps it's not quite that black and white...
Thanks for the excellent reference; I'll give it some more thought.
Jack: What will the fitness function for a simple self-replicator look like? Certainly efficiency will be rewarded. It's possible that this topological feature is so great that no evolutionary pathway away from this "efficiency well" can succeed in providing evolutionary advance.
Interesting point. From memory, experiments with replicating RNA and AI confirm this: you tend to get evolution in the direction of smaller, faster, simpler, more efficient, not more complex.
|
|
Janitor@MIT
Member
Member # 125
|
posted 21. March 2002 10:54
Very interesting alternative view:
Stadler, Barbel M.R., Peter F. Stadler, Gunther P. Wagner, and Walter Fontana. 2001. The topology of the possible: formal spaces underlying evolutionary change. (PDF Online)
Comparing traditional metrical spaces with non-metrical spaces exemplified by a minimization of free energy equivalence classes of RNA structures defining "neutral networks"... "distance" not Euclidean vectors or strongly correlated near-neighbor structure... space has non-local properties and lacks symmetry... adjacency and distance lose familiar meanings... topology dissolves into "pretopology"... Moving across this landscape is very different from hill-climbing and gradient descent...
(Still digesting, but very interesting. Michael Behe might take a look at it and I think James Barham has an interest in topologies.)
|
|
Paul A. Nelson
Member
Member # 26
|
posted 21. March 2002 12:24
Anyone who consults the Stadler et al. paper in JTB (thanks for the citation, Janitor) will also want to look at Gunther Wagner's commentary, which is based in part on the JTB paper:
Gunther P. Wagner, "What is the Promise of Developmental Evolution? Part II: A Causal Explanation of Evolutionary Innovations May Be Impossible," Journal of Experimental Zoology (Mol Dev Evol) 291 (2001): 305-309.
|
|
|