ISCID Forums


Post New Topic  Post A Reply
my profile | search | faq | forum home
  next oldest topic   next newest topic
» ISCID Forums   » General   » Brainstorms   » Genetic Algorithms and fitness functions with coupled variables (Page 1)

 
This topic is comprised of pages:  1  2 
 
Author Topic: Genetic Algorithms and fitness functions with coupled variables
Iain Strachan
Member
Member # 96

Icon 1 posted 26. February 2002 16:32      Profile for Iain Strachan     Send New Private Message       Edit/Delete Post 
I'd like to explore the idea of demonstrating Irreducible Complexity using
Genetic Algorithms. The phenomenon of irreducible complexity, as I understand
it from Prof. Behe's book, occurs when two or more coincident changes are
required to provide an improvement to an organism resulting in selective
advantage. As Behe observes in the article on his sight answering the critics
of "Darwin's Black Box", when this happens in experiments (e.g. with bacteria
in Petri dishes), the evolution gets stuck and nothing happens. However, a
skeptic might argue that we don't have sufficiently long to observe it
happening. One always has the burden of proof; if something doesn't happen
then it could always be argued that it will happen given enough time.

Looking at it mathematically, from a very simplistic point of view, it would
seem that the expected time to wait for the right coincidence of mutations to
occur will scale exponentially with the complexity. Thus for N coincident
mutations to arise with a mutation probability p, we expect the probability for
all N to happen to be p^N, and hence the length of time to be p^(-N). I
sometimes don't think people realise just how pernicious an exponential scaling
law is. Mutation rates are of course very small (~10^9 or less?). But if we
were to make p=1/2, a simple example shows how quickly the time gets
ridiculous. The old story of the grains of corn on the chessboard illustrates
this. The story goes that the inventor of chess asked for a reward to have 1
grain of corn on the first square, 2 on the second, 4 on the third and so
forth. According to the Oxford Companion to Chess, the amount of corn required
would be the entire planet's corn production for 5,000 years continuously. But
to see how it scales, consider if Chess had been on a 9x9 board instead of 8x8.
A simple calculation now shows that it will take 655 million years to produce
all the corn. For a 10x10 board it rises to 344 trillion years.

Such times can not be demonstrated in the lab or on a computer, but it is my
hypothesis that a genetic algorithm can be shown to scale exponentially with
some measure of "complexity". My idea is that a GA cannot solve problems where
the variables are coupled in the fitness function, because the interdependence
of the variables gives rise to the phenomenon of irreducible complexity.

If we consider Dawkins's famous "METHINKS IT IS LIKE A WEASEL" example, it is
clear that this is not irreducibly complex. There is no constraint on the
order in which the correct letters fall into place, and so there are an
astronomical number of plausible paths from the start random string to the
desired result. However, by a simple generalisation of the Weasel example, we
can (I believe) show exponential scaling with complexity. I have carried out
some preliminary simulations and I believe I have demonstrated this.

To start with the standard "Weasel", I believe that Dawkins probably used just
the count of correct letters as his fitness function. However, I propose to
use the Euclidean distance in "letter space" as a fitness function. Therefore,
I take the difference in the alphabetic letter position from the target letter
(with a space counting as zero), and sum the squares of these over the 28
letters in the string. The minimum value of this function is zero, when all
the letters are the same as the targets. It can be seen that by using this
fitness function, I have reduced the Dawkins Weasel problem into a minimization
over a quadratic function - just about the simplest minimization problem you
can get. For "mutations", I allow an adjustment to the next or previous letter
in the alphabet (wrapping round at Z to "space"). A genetic algorithm will
solve this problem reasonably well, and it's pretty well equivalent to the
original Weasel. But as I said earlier, this allows astronomically many paths
to the target string. We need to simulate something like the constraint that
the sequence of intermediates are all viable. An example of this would be that
they would all be meaningful sentences, but this would clearly be impossible to
program. So I propose a simpler constraint, namely that all the letter indices
should add up to the same (or as close as possible) as the target string. Thus
we would introduce an extra term into the fitness function that was the
absolute value of the sum of the letters in the current string minus the sum of
the letters in the target string. To grade the severity of the constraint, we
multiply this by a "complexity factor", lambda. For lambda=0, we reduce the
problem to the original case, and for lambda very large, we have a "hard
constraint", where the trajectory in the 28 dimensional phase space is
constrained to lie on a 27 dimensional manifold, thus inducing coupling between
the variables, and requiring coincident mutations to take place. Varying
values of lambda vary the hardness of this constraint.

I have run extended simulations with multiple runs and varying values of
lambda, as an initial experiment, and found that the number of function
evaluations required scaled exponentially with lambda (i.e. produced a linear
plot on a log-scale). However, I'm a little puzzled; I frankly did not expect
the GA to perform as poorly as that, because even in the case of a "hard
constraint", one only needs to adjust two variables in the right direction
(i.e. two coincident mutations).

I should say at this stage that I dispensed with all the weasel mystique and
just used floating point numbers, with negative values allowed, and the target
being all zeros.

Anyway, I'm sure there is much more research to be done on this, and I would
welcome any suggestions for a fitness function to optimise where I can have a
scalable complexity measure, in order to demonstrate the exponential scaling
law in a reasonable amount of CPU time


IP: Logged
William A. Dembski
Member
Member # 7

Icon 1 posted 26. February 2002 18:25      Profile for William A. Dembski   Email William A. Dembski   Send New Private Message       Edit/Delete Post 
Iain's intuitions are exactly my own. I would like to encourage critics of ID who think that GAs can indeed generate specified complexity to weigh in on this thread.

Iain's "coupling problem" (as we may call it) promises to provide a computational justification why Dawkins's Mount Improbable can be a plateau with sheer cliffs on all sides, thus barring any gradual Darwinian ascent.

It seems that this possibility needs to be taken seriously. Moreover, it cannot be dismissed by presupposing Darwinism as true. Rather, this possibility itself throws the Darwinian mechanism into question.


IP: Logged
Iain Strachan
Member
Member # 96

Icon 1 posted 27. February 2002 08:44      Profile for Iain Strachan     Send New Private Message       Edit/Delete Post 
quote:

Iain's "coupling problem" (as we may call it) promises to provide a computational justification why Dawkins's Mount Improbable can be a plateau with sheer cliffs on all sides, thus barring any gradual Darwinian ascent.

Bill's mention of Dawkins's Mount Improbable having cliffs prompts me to remember another common practice used by GA programmers. There is in fact a technical term using the word cliff, namely the "Hamming Cliff". This is used to describe the situation in a GA where a bit string is used to encode a numeric parameter in the solution space. If a simple binary coding is used, we get the phenomenon of the "Hamming Cliff" which is a special case of my "coupling problem". For example, if we use a 10 bit binary code for each number, then the code for 511 is 0111111111 and the code for 512 is 1000000000. In other words the ten bits are coupled, and to progress from one to the other requires ten coincident mutations - and with a mutation rate of 0.01, for example, this would require 10^20 attempts. The problem would recur, to a lesser extent at any point where we flip over to a multiple of a power of two. It is known as the "Hamming Cliff" because the distance in "Hamming space" is large.

However, there is a simple way round this, and it involves using a smarter coding scheme. If we know that the function we are optimising is a smooth one, then we also believe that smooth trajectories in phase space are desirable. So the normal thing to do is to use a "Gray code" to translate from the bit string to the number. Successive numbers in the Gray code only differ by one bit, so the smooth jump can be achieved by a single point mutation (in the right spot). For example a 3-bit gray code would be:

0: 000
1: 001
2: 011
3: 010
4: 110
5: 111
6: 101
7: 100

This is an example of how we encode our prior knowledge of the problem space into the design of the algorithm (and could be argued, I guess, as a way of smuggling in CSI that is distinct from just the design of the fitness function).

It should also be pointed out that while the coupling problem for the encoding of numeric parameters is amenable to solution, clearly if there is coupling between the problem variables for the problem being solved, then there clearly isn't going to be an easy solution, unless the problem can be recast into an uncoupled form (e.g. in diagonalising a matrix).

[ 27 February 2002: Message edited by: Iain Strachan ]


IP: Logged
Erik
Member
Member # 160

Icon 1 posted 01. March 2002 19:09      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
quote:
William A. Dembski: Iain's intuitions are exactly my own. I would like to encourage critics of ID who think that GAs can indeed generate specified complexity to weigh in on this thread.

Specified complexity relative to which probability distribution, Dr. Dembski? I am a critic of ID, but I do not know if I should also count myself among the "critics of ID who think that GAs can indeed generate specified complexity". AFAIK, an event that is specifed in the formal sense of "The Design Inference" has by definition the specified complexity -log2(p), where p is the probability of the event. But which probability distribution should I use to determine the probability? Always a uniform probability distribution? The probability that the GA will produce the event? Please clarify this so that I can categorize myself properly.

Erik


IP: Logged
William A. Dembski
Member
Member # 7

Icon 1 posted 01. March 2002 21:09      Profile for William A. Dembski   Email William A. Dembski   Send New Private Message       Edit/Delete Post 
Eric, specified complexity is not a term original with me -- it's been around now almost 30 years. Paul Davies will write that the Darwinian mechanism is able to generate specified complexity. Nonetheless, he is never very clear what exactly he means by specified complexity.

When I use it in the context of GAs, I usually intend the complexity to correspond to the probability of the target induced by the natural topology on the search space in question. Typically this is a uniform probability, but it needn't be. Placing a fitness function over the search space now induces a new probability distribution for the target relative to a GA, which can be translated into a waiting time for how long (how many candidates) on average need to be checked by the GA before landing in the target.

With respect to the natural probability measure (usually a uniform probability) on the space, the target represents an instance of specified complexity. With respect to the new probability induced by the fitness function, however, it does not. The point of natural mechanisms is not to generate specified complexity but to dissolve the very notion (cf. Dawkins's metaphor of climbing Mount Improbable -- Mount Improbable really isn't all that improbable to climb once one figures out a gradual Darwinian route). My point in invoking NFL is that the search for the target in the original search space gets displaced by a GA to searching the space of fitness functions over that space, thus intensifying the search problem rather than resolving it.


IP: Logged
Erik
Member
Member # 160

Icon 1 posted 02. March 2002 14:07      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
quote:
William A. Dembski: When I use it in the context of GAs, I usually intend the complexity to correspond to the probability of the target induced by the natural topology on the search space in question. Typically this is a uniform probability, but it needn't be. Placing a fitness function over the search space now induces a new probability distribution for the target relative to a GA, which can be translated into a waiting time for how long (how many candidates) on average need to be checked by the GA before landing in the target.

Thank you for your answer, Dr. Dembski. Now that we have narrowed down the probability distributions to those having something to do with the particular genetic algorithm in question, let's be even more specific:

Let X be the search space, Y the set of possible fitness values and f:X-->Y the fitness function. In what follows I will write "GA" to mean a specific, but arbitrary, genetic algorithm. Now, suppose that a genetic algorithm on X finds a specified point x at the m:th step. Let x* be the set of all points that fit the same specification as x. Given that x is specified, how do I calculate its specified complexity? I can imagine several ways, but the two most obvious involve these probabilities:

P(x | GA, f, m) = "the probability that x is the m:th point visited by GA"
P(x* | GA, f, m) = "the probability that the m:th point visited by GA is in the set x*"

and the specified complexity D(x) would be given by either

(1) D(x) = -log2(P(x | GA, f, m))

or

(2) D(x) = -log2(P(x* | GA, f, m))

My question is: Which of (1) or (2), if any, is the correct way to calculate the specified complexity generated by GA? And if I got both my guesses wrong, then please state the proper mathematical definition. Without precise mathematical directives for calculating the probability that is to be logarithmically rescaled into specified complexity, it is impossible to decide whether to think that GAs can generate specified complexity or not.

Erik


IP: Logged
William A. Dembski
Member
Member # 7

Icon 1 posted 02. March 2002 16:27      Profile for William A. Dembski   Email William A. Dembski   Send New Private Message       Edit/Delete Post 
Eric,

There's no place in my writings where one "calculates the specified complexity." Complexity as a logarithmic recalibration of probability does get calculated. As I stress throughout NFL, the relevant probability is that of the target and not that of the elementary event, so you would want to calculate a probability for x* rather than x (by the way, your notation was ill chosen: both the search space and the GA are denoted by capital X).

As for doing the actual calculation for determining the probability of hitting the target relative to the GA, that will depend on the particulars of the fitness function and the way the search algorithm makes use of the fitness function (in practice Monte Carlo simulations will be the way to go). Frankly, though, the burden is not on the design theorist to do that calculation. Nor is it of interest to the Darwinian if that probability of hitting the target is very small. The point of the GA is to hit the target with reasonable probability (in which case the specified complexity of the target dissolves).

My argument in NFL is that if that probability is reasonably high (as it should be for a useful GA), then the search on the original phase space was transferred to the space of fitness functions. There's no magic to evolutionary computation. This is being admitted even by fans of evolutionary computation like Geoffrey Miller:

"Genetic algorithms are rather robust search methods for [simple problems] and small design spaces. But for hard problems and very large design spaces, designing a good genetic algorithm is very, very difficult. All the expertise that human engineers would use in confronting a design problem -- their knowledge base, engineering principles, analysis tools, invention heuristics and common sense -- must be built into the genetic algorithm. Just as there is no general-purpose engineer, there as no general-purpose genetic algorithm."


IP: Logged
Jesse
Member
Member # 112

Icon 1 posted 02. March 2002 18:26      Profile for Jesse   Email Jesse   Send New Private Message       Edit/Delete Post 
If Dr. Dembski has time to respond, I have some questions on this issue as well:

William Dembski:
My argument in NFL is that if that probability is reasonably high (as it should be for a useful GA), then the search on the original phase space was transferred to the space of fitness functions.

Wouldn't this just push back the question of what the appropriate probability distribution is for calculating the amount of CSI in an event? What is the rationale for using a uniform probability distribution on all possible fitness landscapes?

The NFL theorem shows that if you use such a uniform distribution, a darwinian algorithm is not likely to do any better than any other algorithm, on average. However, the relevance of this to biology is questionable. A fitness function is just some way of assigning fitness values to the set of all possible genotypes; the vast majority of all possible fitness functions will thus be totally chaotic and random-looking, with no correlation between genetic similarity and fitness levels.

In the real world, though, the "fitness" of a genotype represents the probability that it will survive and reproduce in a given environment. We thus expect that, in general, similar genotypes will have similar fitness levels, leading to the kind of smooth fitness landscapes one usually sees in textbooks, with hills, valleys, and so forth. Some graphics that show the difference between "smooth" and "chaotic" landscapes can be found here:

http://astronomy.swin.edu.au/~pbourke/terrain/freqland/

Relative to the set of all possible fitness landscapes, such smooth landscapes are astronomically rare, but the mere fact that we live in a universe with regular laws of nature seems to insure that real-world fitness landscapes will have this character. And smooth fitness landscapes are just the ones that we'd expect Darwinian algorithms to perform much better than average on.

Is Dr. Dembski claiming that if the laws of nature give a reasonable probability that life will emerge and evolve without intelligent intervention, then that just demonstrates that the laws of nature are themselves designed? How should we calculate the "true" CSI of living organisms in this case--relative to the probability distribution induced by the laws of nature, relative to a uniform probability distribution on all possible configurations of matter, relative to a uniform probability distribution on all possible fitness landscapes, relative to a uniform probability distribution on all possible laws of physics (which might lead to a non-uniform distribution on the set of all fitness functions induced by these laws, depending on how we choose the phase space of different possible laws), or something else?

[ 02 March 2002: Message edited by: Jesse ]


IP: Logged
William A. Dembski
Member
Member # 7

Icon 1 posted 02. March 2002 22:36      Profile for William A. Dembski   Email William A. Dembski   Send New Private Message       Edit/Delete Post 
Jesse asks, "What is the rationale for using a uniform probability distribution on all possible fitness landscapes?" That's not really the relevant question. For the Darwinist to circumvent the force of NFL, s/he must provide some rationale why the "smooth" fitness functions do not need explanation. With respect to any sort of principle of indifference (uniform probabilities providing a concrete instance), "smooth" fitness functions are in the minority (Jesse agrees: "the vast majority of all possible fitness functions will thus be totally chaotic and random-looking, with no correlation between genetic similarity and fitness levels"). So the Darwinist needs some rationale for why they should be smooth, smoothness having the effect of washing away the problem of specified complexity in the original search space. Probabilistically, this means Jesse should be asking, "What is the rationale for using a probability distribution that renders smooth fitness landscapes not vastly improbable?"

So how is this justified? To justify it simply by presupposing Darwinism and noting that the Darwinian mechanism couldn't work without smoothness is of course not enough -- we can't argue in a circle; we need some independent justification. Jesse finds it here: "The mere fact that we live in a universe with regular laws of nature seems to insure that real-world fitness landscapes will have this character." That's highly disputable. To be sure, Darwinists often claim that, but what's needed to make this claim work is to show that "regular laws" (whatever these might be) enforce some sort of continuity requirement (however that may be defined) between the topology of the search space and gradations of fitness. But I would claim that's falsified empirically (Michael Behe's irreducibly complex systems providing a case in point). Moreover, in many other contexts a continuous coordination between search space and fitness does not obtain (cf. the case of technological evolution and the case of written language).

But let's grant the Darwinist that "regular laws" somehow have the effect of continuously coordinating search space and fitness. Those regular laws then have the effect of concentrating probability over the space of fitness functions on the smooth fitness functions (unlike the uniform probability over the space of fitness functions, which does not privilege the smooth fitness functions). But what law ensures that this probability on the space of fitness functions will privilege the smooth fitness functions. Is there a probability on the space of such probabilities that does the trick? Probabilists deal with probability measures on spaces of probability measures all the time, so there's no mathematical incoherence here. But there is a regress. Jesse wants me to specify where the relevant uniform probability distribution falls. But again, that's not my job. The burden is on the Darwinist to show that a nonuniform probability distribution that makes Darwinism work falls on a relevant space. The problem is that any such nonuniform probability distrbution will be arbitrary, designed to make Darwinism work.

The Darwinist's job is not to establish specified complexity but to dissolve it. And dissolving it always requires some arbitrary assumption.


IP: Logged
Micah Sparacio
Member
Member # 6

Icon 1 posted 03. March 2002 11:17      Profile for Micah Sparacio   Email Micah Sparacio   Send New Private Message       Edit/Delete Post 
Iain, great post.

I was wondering if you could comment on the following regarding GAs.

Say we have a population of bit strings. Each bit string is a certain length, say 100 bits. Ok, now assume that our target is simply the bit string of all 1's. Let's grant the smooth fitness function which assigns fitness values based on the amount of 1's. Given this fitness function, we would arrive at the target in very little time. OK. Say we now decide to "couple" or coordinate portions of these bit strings. So, in order for any increase in fitness value to occur groups of bits in the bit string must all be 1's. For example, say positions 50-60 in our bit string were coupled together. In order for the fitness value of the coupling to be assigned (fitness value of 10 1's) the ga must come upon all 10 1's by random. The fitness function is not going to help overcome this coupling, because until all 10 bits are 1's, the fitness value assigned is 0.

In any case, I was wondering what degree of coupling (in this case it was 10 but we could go higher or lower), if any, do you think would prove to be unmanageable by a GA? Also, what effect do you think the following features/variables of the GA would have on the system:

1. Crossover
2. Rate of Mutation
3. Population Size


IP: Logged
Jesse
Member
Member # 112

Icon 1 posted 03. March 2002 12:19      Profile for Jesse   Email Jesse   Send New Private Message       Edit/Delete Post 
Thank you for the reply Dr. Dembski. You said:

Jesse asks, "What is the rationale for using a uniform probability distribution on all possible fitness landscapes?" That's not really the relevant question. For the Darwinist to circumvent the force of NFL, s/he must provide some rationale why the "smooth" fitness functions do not need explanation. With respect to any sort of principle of indifference (uniform probabilities providing a concrete instance), "smooth" fitness functions are in the minority (Jesse agrees: "the vast majority of all possible fitness functions will thus be totally chaotic and random-looking, with no correlation between genetic similarity and fitness levels"). So the Darwinist needs some rationale for why they should be smooth, smoothness having the effect of washing away the problem of specified complexity in the original search space.

What do you mean by "original search space"? Are you just talking about the phase space itself, with no probability distribution defined on it at all, or are you talking about a uniform probability distribution on this phase space? Since the amount of CSI is, as far as I know, always to be calculated by finding the probability of getting an event matching the chosen specification, it seems you always need some sort of probability distribution to talk about specified complexity, so my guess would be that the "original search space" must refer to a uniform probability distribution defined on the phase space. But nowhere in NFL did I see any attempt to argue that the uniform probability distribution should be priveleged over other possible distributions, or that it should be a sort of "default probability distribution" when we lack more specific knowledge of the probabilities. You refer to a "principle of indifference," is this the basis for your argument?

Probabilistically, this means Jesse should be asking, "What is the rationale for using a probability distribution that renders smooth fitness landscapes not vastly improbable?"

So how is this justified? To justify it simply by presupposing Darwinism and noting that the Darwinian mechanism couldn't work without smoothness is of course not enough -- we can't argue in a circle; we need some independent justification.

Yes, I agree, although I don’t think there are any Darwinists who actually argue this way.

Jesse finds it here: "The mere fact that we live in a universe with regular laws of nature seems to insure that real-world fitness landscapes will have this character." That's highly disputable. To be sure, Darwinists often claim that, but what's needed to make this claim work is to show that "regular laws" (whatever these might be) enforce some sort of continuity requirement (however that may be defined) between the topology of the search space and gradations of fitness. But I would claim that's falsified empirically (Michael Behe's irreducibly complex systems providing a case in point). Moreover, in many other contexts a continuous coordination between search space and fitness does not obtain (cf. the case of technological evolution and the case of written language).

Perhaps I should clarify what I mean by "smooth fitness landscape." I don’t mean to imply that there can be no jagged edges, no sharp changes in fitness from one point to another (in the neighborhood of an IC system, perhaps); I just mean that for most genotypes, nearby genotypes will tend to have more similar fitness levels than genotypes randomly selected from the entire phase space. Take any organism, change a few random nucleotides here and there, and what would you expect to see? I think it’s clear that in most cases, such changes will not lead to a radically different phenotype, and thus in similar environments both organisms would be expected to have similar probabilities of surviving and reproducing. This would probably be the case regardless of what genetic code life used, or what molecules it was based on (since the regular laws of nature insure that changing a single atom in a complex molecule will not tend to radically alter the way this molecule interacts with other molecules).

I’m not sure exactly what the analogue of the phase space and fitness landscape would be for technological evolution or language, but it seems to me that a fair amount of smoothness would be found in these cases as well—for example, changing a single letter in a sentence will not typically lead to a radical change in meaning, and most readers will be able to understand what was meant just fine.

But let's grant the Darwinist that "regular laws" somehow have the effect of continuously coordinating search space and fitness. Those regular laws then have the effect of concentrating probability over the space of fitness functions on the smooth fitness functions (unlike the uniform probability over the space of fitness functions, which does not privilege the smooth fitness functions). But what law ensures that this probability on the space of fitness functions will privilege the smooth fitness functions. Is there a probability on the space of such probabilities that does the trick? Probabilists deal with probability measures on spaces of probability measures all the time, so there's no mathematical incoherence here. But there is a regress. Jesse wants me to specify where the relevant uniform probability distribution falls. But again, that's not my job. The burden is on the Darwinist to show that a nonuniform probability distribution that makes Darwinism work falls on a relevant space. The problem is that any such nonuniform probability distrbution will be arbitrary, designed to make Darwinism work.

To be clear on what we’re talking about here, are you imagining a scenario where the "regular laws" ultimately reduce to the basic laws of physics themselves? As a thought-experiment, imagine we have access to the PC Of The Gods, which can run the program "sim cosmos" which solves the Schrodinger equation for the entire universe, and thus if you plug in any set of initial conditions at time t0 it can give you a precise probability distribution on the set of possible states at some later time t1. So, we plug in a dense particle "soup" similar to what we think the universe was like shortly after the Big Bang, and then calculate the probability distribution on possible states 12 billion years later. Suppose we discover that the formation of stars and planets, the occurrence of abiogenesis on suitable planets, and the evolution from simple self-replicators to complicated multicellular organisms is reasonably likely to happen at least once in a volume the size of the observable universe; it may not occur with probability 1, but let’s imagine the probability is at least much greater than 1 in 10^150.

What would this lead you to conclude about the CSI inherent in multicellular organisms? It seems that, relative to the chance hypothesis "unguided laws of physics operating for 12 billion years", multicellular organisms would not really contain much CSI in this scenario. In fact, presuming that intelligent beings were reasonably likely to arise as well, this would imply that the sort of things we ascribe to intelligence—art, novels, technology—also do not exhibit much CSI relative to this chance hypothesis. This shows why scientists who believe in the sufficiency of naturalistic explanations for all the various interesting phenomena we see in our universe will tend not to be very interested in the concept of "specified complexity," if they fully understand it—if their intuitions are correct,then specified complexity (CSI over 500 bits) probably does not exist at all, anywhere in the entire universe! Of course one could still talk about specified complexity relative to more narrow chance hypotheses, as in the Caputo example, but relative to the "ultimate chance hypothesis" taking into account all the relevant laws of nature, specified complexity may not exist in the first place.

For this reason, when Paul Davies or Manfred Eigen talk about "specified complexity" it seems fairly certain that they are not using the words in the same sense as you are—by "complexity" they probably means something closer to the ordinary definition of the word, something along the lines of "systems composed of a large number of differentiated parts," rather than your definition of complexity as the logarithm of probability. It wouldn’t make sense to look for a "law" to explain the origin of CSI, since by definition any event that has a reasonable probability of occurring due to the laws of nature does not exhibit much CSI relative to those laws.

What you seem to be saying in the paragraph above is that even if life could be shown to have low CSI relative to the laws of nature, that would just push the problem back to where the laws of nature themselves come from, why we have laws that give life a high probability as opposed to a low probability. You argue that it’s "not your job" to say what the meta-probaility-distribution on different probability distributions induced by different laws of physics should look like; instead, "the burden is on the Darwinist to show that a nonuniform probability distribution that makes Darwinism work falls on a relevant space. The problem is that any such nonuniform probability distribution will be arbitrary, designed to make Darwinism work." But it seems rather odd to me to say that the burden is on "Darwinists" to explain the origin of the laws of physics! I don’t think it’s "arbitrary" to just take the laws of nature as a given, since this is what is done in every other field of science besides biology as well.

By the way, why should specified events in biology be treated differently than specified events in other fields? Why doesn’t the regress you describe apply in these fields as well? For example, relative to a uniform probability distribution on all possible arrangements of water molecules in a small volume, the probability of finding them arranged in a nice symmetrical snowflake-shape is surely quite tiny, probably less than 1 in 10^150. Of course, we know about laws governing the interaction of water molecules that make the formation of snowflakes fairly probable. Would you be unsatisfied with this answer, and ask that scientists explain the origin of the probability distribution induced by the laws of nature by referring to some meta-distribution on all possible laws of physics? This question could be asked of any phenomena that matches any specification whatsoever—the spherical shape of stars and planets, the spiral shape of the galaxy, the regular interference patterns observed in quantum mechanics, and so forth. If explaining such specified phenomena with reference to the laws of nature is indeed unsatisfactory, then it seems the category of "regularity" will have to be eliminated from the explanatory filter altogether.

The Darwinist's job is not to establish specified complexity but to dissolve it. And dissolving it always requires some arbitrary assumption.

Only if it is "arbitrary" to refer to the laws of nature as the ultimate chance hypothesis. In any case, deciding that specified complexity exists at all would seem also to require some sort of arbitrary assumption, like the assumption of a uniform probability distribution on the set of all possible probability distributions, or all possible laws of physics, or something similar. Why should it be the job of Darwinists to "dissolve it" if you have not provided any clear reason to think that "it" exists in the first place?

[ 03 March 2002: Message edited by: Jesse ]


IP: Logged
Erik
Member
Member # 160

Icon 1 posted 03. March 2002 18:35      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
Iain Strachan, just for fun implemented something similar to your simulation. The description of your simulation lacks many details (mutation rate, population size, range of lambda, etc.), so I made no attempt to replicate it, but simply implemented a little toy simulation. Instead of using a full GA with an evolving population, I used what Stuart Kauffman would call an adaptive walker. An adaptive walk works like this: (i) Generate an initial point x. (ii) Generate a new point x', by taking a small step starting from x in a random direction. (iii) If x' is "better" (in some sense), then update x = x', else do nothing. (iv) Go back to step (ii). An adaptive walker, then, is of course something performing an adaptive walk (see, e.g., Wilke & Martinetz). There are a number of advantages with this approach, most of them pragmatic. It is trivial to implement, for one thing, and it's easy to communicate in limited fora such as ISCID Brainstorming, for another. It also captures the concept of natural selection quite well (but not genetic drift, recombination, etc.). To be sure, there are also disadvantages. For instance, an adaptive walker is more likely to get stuck in a local optimum and is therefore less mobile. A population undergoing Darwinian evolution can only accurately be approximated by an adaptive walker when the spread of the population in the genome space is very small. It is intuitively obvious that a low mutation rate is a necessary requirement for this and Campos et al have shown that to be the case for NK landscapes. Nonetheless, your model is so idealized that it can only be regarded as either a mere pedagogical tool (as Dawkins intended for his Weasel GA) or a toy model. Therefore I don't think it matters whether this additional idealization is introduced.

My implementation of the adaptive walker requires only 14 lines of MATLAB code:

code:
function num_steps = adaptive_walker(N, lambda, mutsiz, cost_thres)

P = rand(1,N); % this is the location of the global minimum
X = rand(1, N);
cost = sqrt((X - P) * (X - P)') + lambda * abs(sum(X - P));
num_steps = 1;

while cost > cost_thres & num_steps < 50000 % you may want to change the cut-off
% generate a new "test point"
Xtest = X + (rand(1, N) - 0.5) * mutsiz;
test_cost = sqrt((Xtest - P) * (Xtest - P)') + lambda * abs(sum(Xtest - P));
% replace the old point if the test point is better
if cost > test_cost
X = Xtest;
cost = test_cost;
end

num_steps = num_steps + 1;
end



Inputs are the number of components of the vector representing the genome (N), your lambda parameter, a parameter that determines the size of the mutations (mutsiz) and a threshold value that determines how close to the minimum the adaptive walker must get (cost_thres). The output is the number of steps the adaptive walker needed to get sufficiently close (i.e. within the threshold value) to the minimum. In order to verify the exponential scaling of the number of steps with lambda, I ran simulations for all integer lambda from 0 to 100. The exact script is as follows:
code:
% steps plotted against lambda
tic

N = 10;
lambda = (0:150);
mutsiz = 0.05;
cost_thres = 0.1;

steps = zeros(1,length(lambda));
for k = 1:length(lambda)
steps(k) = adaptive_walker(N, lambda(k), mutsiz, cost_thres);
end

plot(lambda, steps, '+');
xlabel('lambda');
ylabel('number of steps');

toc



Despite that lambda = 3000 is ridiculously high I could see no significant indication of exponential scaling. The data are scattered into something that looks like a linear cone. I have tried a modest number of other values too, e.g.

N = 30; lambda = [0 10 30 100 300 1000 3000]; mutsiz = 0.025; cost_thres = 0.2; (you'll need a very high cut-off if you're going try these inputs--I used 10^7)

The corresponding data falls on a straight line (and the lin-log plot looks like a logarithm). This does not prove that you are incorrect about the exponential scaling for your simulation, but I think it does prove that your discovery is much more restricted than you thought--an artifact of the particular form of your toy model.

This brings me to my next concern: Are the predictions from your model robust against likely deviations between your idealizations and the real world? What known or hypothesized phenomenon are you trying to model, anyway?

Erik

References

Campos P., Adami C., Wilke C. (2002) "Optimal adaptive performance and delocalization in NK fitness landscapes", Physica A, 304 : 495-506

Wilke C., Martinetz T. (1999) "Adaptive walks on time-dependent fitness landscapes", Physical Review E, 60 : 2154-2159


[Edited out the point about neutral mutations.]

[ 03 March 2002: Message edited by: Erik ]


IP: Logged
Tom Stalnaker
Member
Member # 114

Icon 1 posted 03. March 2002 22:29      Profile for Tom Stalnaker   Email Tom Stalnaker   Send New Private Message       Edit/Delete Post 
If I could just comment on the discussion between Jesse and Dr. Dembski: It seems to me that ID theorizing becomes considerably less interesting if it is forced to push the origin of CSI back to the laws of physics themselves. If Darwinists could show that given the laws of physics as they are, the probability distribution of fitness functions over genetic space favors smooth ones, I think they've done a good day's work. When Jesse says "... it seems rather odd to me to say that the burden is on "Darwinists" to explain the origin of the laws of physics!" I think he is right on target.

Why don't ID'ers make a stand at a point later than the big bang, and acknowledge that the really interesting part of ID comes when it proposes that fitness functions, in the world as we know it, do not tend to be smooth enough to allow genetic algorithms to work. This has the added advantage that it renders ID falsifiable, which is good for everyone involved. If Darwinists can show how apparently IC structures are really not IC, they will have taken steps towards doing so. If ID people won't make this stand, then it begins to seem as though ID'ers are saying that by definition life is improbable, and since everyone agrees life is specified, life is then by definition designed. Well, sure, I guess so, but if what you mean by life being designed is that the laws of physics were designed, you certainly aren't saying anything of any interest to biologists, evolutionists, or most creationists for that matter.


IP: Logged
Iain Strachan
Member
Member # 96

Icon 1 posted 04. March 2002 08:19      Profile for Iain Strachan     Send New Private Message       Edit/Delete Post 
quote:
Originally posted by Erik:
[QB]Iain Strachan, just for fun implemented something similar to your simulation.
... snip description of adaptive walker ...
The corresponding data falls on a straight line (and the lin-log plot looks like a logarithm). This does not prove that you are incorrect about the exponential scaling for your simulation, but I think it does prove that your discovery is much more restricted than you thought--an artifact of the particular form of your toy model.

Erik,

Thanks for this post, which has helped to clarify my thoughts on this
issue. In the spirit of "Brainstorms", I don't claim that my little
demo is complete, and one of the reasons for posting it is to get some
ideas for a scalable complexity measure.

Firstly, I believe I can explain your linear scaling from your Matlab
demo (which I have taken and run myself and confirmed your results).
The reason you see a linear scaling is that you have taken lambda
outside the region where increasing it represents an increase in the
coupling of the variables. My results were obtained for much smaller
values of lambda (0-0.3), (I also used the square of the Euclidean
distance). What I was attempting to simulate is the gradual
introduction of a "ravine" into the error surface. But it is clear that
if we scale lambda to values in excess of unity, that the Euclidean-
squared term becomes negligible, and the problem becomes a minimization
of the second term. I therefore expect a linear scaling, because you
are trying to achieve a constant error threshold (0.1 in your Matlab
code), with a function that scales linearly in magnitude with lambda).
I would like to develop a better example of scaled complexity (e.g. one
could add a second penalty term requiring the sum of the squares of the
letter values in the Weasel string to be near a constant), and so forth.

However, your "adaptive walker" example serves admirably to illustrate
the point I am trying to make about exponential scaling with degree of
coupling. Let us suppose we have a search space in N dimensions. Let
us further suppose we introduce coupling between the variables that is
in the form of M constraints (each with some scaled penalty according to
the deviation from the constraint).

Then suppose we use the adaptive walker and guess a random direction of
search. What is the probability that we choose a direction that
improves the fitness? What our M constraints do is to reduce the
region of feasible solutions to an N-M dimensional manifold in N
dimensional search space. Therefore I would expect the probability of a
random guess in N dimensions to be scaled according to the ratio of the
surface area of an N-dimensional hypersphere to that of an (N-M)-
dimensional hypersphere. In other words, like K^{-M}. Thus the
expected wait between improvements scales exponentially with M.

Some years back, as a benchmark for looking at stochastic methods for
training neural networks, I implemented a version of "adaptive walker"
for training a Multi-Layer Perceptron neural network. The network was
to learn a mapping from 2-d to 2-d space, and had an intermediate
processing layer of 6 units. The number of adjustable parameters in the
model was 34. My version of "adaptive walker" was to choose a search
direction at random (by generating 34 random numbers from a Gaussian
distribution, to make it spherically symmetric), and then to perform a
deterministic line search along the chosen direction to find the minimum
in that direction. It was found that this algorithm failed to make the
slightest progress in solving the problem (whereas gradient based
search methods work just fine). The conclusion has to be that the
neural network error surface is characterised by steep ravines that
place many constraints on the allowed search direction. However, the
analysis of a neural network error surface is complex and hard to
visualize - I am seeking a simpler example than this to illustrate the
point.

Incidentally, random methods can be used effectively in training
Neural networks, when used in conjunction with an analytic formulation
of the gradient calculation - what I am questioning is the efficiency of
GA's using Darwinian evolutionary principles, as a method of "blind" gradient
search in high dimensional phase space.


IP: Logged
Iain Strachan
Member
Member # 96

Icon 1 posted 04. March 2002 08:48      Profile for Iain Strachan     Send New Private Message       Edit/Delete Post 
quote:
Originally posted by Micah Sparacio:
Iain, great post.

I was wondering if you could comment on the following regarding GAs.

Say we have a population of bit strings. Each bit string is a certain length, say 100 bits. Ok, now assume that our target is simply the bit string of all 1's. Let's grant the smooth fitness function which assigns fitness values based on the amount of 1's. Given this fitness function, we would arrive at the target in very little time. OK. Say we now decide to "couple" or coordinate portions of these bit strings. So, in order for any increase in fitness value to occur groups of bits in the bit string must all be 1's. For example, say positions 50-60 in our bit string were coupled together. In order for the fitness value of the coupling to be assigned (fitness value of 10 1's) the ga must come upon all 10 1's by random. The fitness function is not going to help overcome this coupling, because until all 10 bits are 1's, the fitness value assigned is 0.

In any case, I was wondering what degree of coupling (in this case it was 10 but we could go higher or lower), if any, do you think would prove to be unmanageable by a GA? Also, what effect do you think the following features/variables of the GA would have on the system:

1. Crossover
2. Rate of Mutation
3. Population Size


I think this is a very good example of the coupling problem, and probably better than my original one, as it's simpler. What you are proposing is very similar to the "Hamming Cliff" that I mentioned in another post here. You have to get all 10 bits before it improves; before that, you have "neutral" mutations. If instead we have, say a binary encoding for a smooth function (say sum of squares of the numbers the 10-bit genes using binary decoding), then you must get all 10 to move from 511 to 512. In this case, however, you might get beneficial mutations (e.g. taking you to 513), but also harmful mutations. But the roadblock will occur at every point where several bits change (like having 2 or more digits flicking over on the odometer). This problem is well-known in Genetic Algorithms, which is why the "Gray code" is introduced for smooth functions (see another post of mine on this thread).

In terms of how many "coupled bits" to put in, I guess the answer is to start with 2 and increase, doing multiple runs at each time, say 10, and record the average number of function evaluations. However, if you have a mutation rate of, say 0.01, then you will quickly get to prohibitively long run times at around N=3 or 4, I guess.

As far as population size is concerned, that is a big question. I have found in general that a larger population does not improve the speed of convergence (even if we had in principle a parallel computer cabable of evaluating all the individuals at the same time). I believe the reason for this is that in a large population, an individual that is slightly fitter than the others only gains a relatively small selective advantage. e.g. if one is 10% better than the others, if there are 99 others, then its advantage is pretty small. If, on the other hand, there are only 9 others, then its selective advantage is greater. This motivates the idea of sub-population based Genetic Algorithms. Here, instead of having, say, one population of 100 individuals, you have 10 populations of 10 individuals. These are allowed to evolve separately from each other for a fixed number of generations, and then a small number of individuals are swapped between populations, and the process is repeated. The results obtained in my original post were obtained using a sub-population algorithm, and it produced a huge speed up over the conventional algorithm (i.e. it made the simulation possible in a reasonable amount of time).

Crossover is a complicated issue, and not one I understand too well. It seems clear that crossover is beneficial, in allowing a more efficient exploration of the search space. However, check out the following paper:


Removing the Genetics from the Standard Genetic Algorithm

which calls into question the efficacy of the Crossover operator, on a problem tailor made for the crossover to work well.

Concerning rate of mutation, there is a trade-off. If the mutation rate is too high, then the problem will not solve, because as you close in on the solution you will get to the stage where there are many more harmful than beneficial mutations. This means that you have to get several coincidental beneficial mutations to succeed. If, however, you make the mutation rate too small, and you need several coincidental mutations to flip over your 10 bits, again you are in for a long wait. You might get lucky and find that the neutral mutations build up, as the coupled bits flip over one by one. However, while this happens, it is just as likely that a harmful mutation occurs somewhere else in the string. There is no real way to determine the optimum mutation rate for any given problem other than an exhaustive search (IMO).

[ 04 March 2002: Message edited by: Iain Strachan ]


IP: Logged


All times are East Coast
This topic is comprised of pages:  1  2 
 
Post New Topic  Post A Reply Close Topic    Move Topic    Delete Topic    Top Topic next oldest topic   next newest topic
 - Printer-friendly view of this topic
Hop To:

Contact Us | ISCID

All content © ISCID and content contributor 2001-2003

The ISCID Forums are aimed at generating insight into the nature of complex systems (e.g. biological complexity, organizational complexity, etc.) and the ontological status of purpose, especially from the vantage point of various information- and design-theoretic models.

Indexed by UBB Spider Hack  |  Powered by Infopop Corporation UBB.classicTM 6.3.1.1

PCID | Encyclopedia | Brainstorms | The Archive | News | Essay Contests | Chat Events | Membership