Topic: I.G.D. Strachan: An Evaluation of "Ev"
|
John Bracht
Member
Member # 5
|
posted 20. July 2003 11:06
Pim,
The Schneider program explicitly compares each organism with the idealized target when it calculates whether each binding site is "above threshold" or not. It has to know precisely where the binding sites are, and whether they "ought" to be above or below the threshold (and it has to know precisely where in the genome the threshold value is encoded, etc). Based on the number of sitelocations above threshold, the program calculates a fitness value for each organism. Just as the VETE program has an explicit target in "the answer is forty-two", the Schneider program has an explicit target in the boolean array specifying the binding site locations (which is set up in the sub-function "creation"). Of course, the program also has to "know" that the binding site locations should all be "above threshold" (which is also fixed into the encoding of the program and does not co-evolve).
The fact that the sitelocations change from run to run is absolutely irrelevant--they are fixed within a run, and NEVER co-evolve with the rest of the organism's genome.
How is this NOT a fixed target in precisely the VETE sense?
John
|
|
Pim van Meurs
Member
Member # 541
|
posted 20. July 2003 15:07
John asks some interesting questions
quote: Just as the VETE program has an explicit target in "the answer is forty-two", the Schneider program has an explicit target in the boolean array specifying the binding site locations (which is set up in the sub-function "creation").
IMHO this similarity begs the question: how is the location of the binding sites going to help increase the information in the genome? Through mutation and selection, it seems. Is the information pre-loaded? No, the information is contingent on mutation and selection, it seems. Is the outcome fixed in runs with the same initial conditions? Again the answer is no.
quote:
The fact that the sitelocations change from run to run is absolutely irrelevant--they are fixed within a run, and NEVER co-evolve with the rest of the organism's genome.
It seems to me that John is fighting a red herring here. I don't disagree with John that sitelocations are randomly chosen at the beginning of the run, but I disagree with John that this has any relevance to the findings that the information increases in the genome, that this increase requires both mutation and selection, and that the recognizer/binding sites coevolve to an unpredictable combination.
Thus when John claims "The Schneider program explicitly compares each organism with the idealized target", I would ask John to take any run and explain to us what this idealized target is. The sitelocations? Is identifying the sitelocations the source of the preloaded information? How can that be if the locations are randomly chosen (no intelligent design here) and if the final outcome is unpredictable, that is, contingent? Most importantly, if there was some preloading of information, how come the information content of the genome is still close to zero at the beginning and increases during the run?
quote:
How is this NOT a fixed target in precisely the VETE sense?
I am somewhat confused about where this argument is going. The argument is that given the randomly chosen sitelocations, information increases in a contingent manner. Can Vete do this? Or is the outcome "the answer is forty two" contingent?
Micah : Two points. First, many of its properties are predefined and predictable, and you've agreed with this in the past.
Sure, any simulation has parameters which can be set and changed. The question really is what the relevance of these predefined parameters is on the simulation. Arguing that the binding sites are randomly fixed as the reason for the increase in information in these locations requires some logical reason why the increase of this information is argued to be preloaded.
I am not convinced in other words that the similarity between Vete and the _location_ of the binding sites is relevant to the issue of the increase of information at these binding sites. [ 20. July 2003, 18:32: Message edited by: Pim van Meurs ]
|
|
Erik
Member
Member # 160
|
posted 21. July 2003 12:00
RBH, the point of a simulation is not to instantiate a particular hypothesis about the ontology of a phenomenon, but rather to explore the predictions of a complicated model. When we can't explore the consequences of a model analytically, we use simulations to find out.
When we discuss specific simulations it is important to understand what it is they simulate. You see significance in the way the fitness evaluation is done only because you have not adequately distinguished between simulating the life of an individual and simulating the evolution of a population. Sometimes a simulation of the evolution of a population is achieved as a by-product of a simulation of lots of individuals' lives (e.g. AVIDA), and sometimes one simplifies the simulation by using an idealized black-box model of individual lives (e.g. WEASEL and VETE) or by using statistically aggregated data (e.g. computer-aided solving of difference and differential equations in population genetics). Schneider's Ev is an intermediate case, since it simulates one part of individuals' lives (binding-site recognition), but black-boxes all the other things that individuals do. quote: RBH: That "All fitness functions can be written as sums of distances to fixed points" does not imply that the various different operations by which a simulation might evaluate the fitness of a genotype are all equivalent in the eyes of theory.
The operations are indeed equivalent in the eyes of a theory which only describes the genotype frequency dynamics in a population. They are not equivalent in the eyes of a theory that describes how individuals live, develop, and interact with an environment. quote: RBH: 4. Evaluating fitness as the count of the individual bits (1s) in a genotype (on the assumption that there is a theory-based reason for more 1s being reproductively advantageous) is not the same evaluation operation (in a simulation) as calculating 100 minus the Hamming distance to the sequence 11 ... 1. In the first case the simulation is using only local information, the number of bits in the genotype together with a counting operation and comparisons of the counts for existing genotypes to determine relative fitness in the population. In the second case the simulation is using information about some distant state (11...1) -- the simulation has and uses information about the distance to the remote state 11...1.
In fact, the second case simulates a teleological evolutionary system while the first simulates a blind evolutionary system.
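For concreteness, the two evaluation operations in the quoted point 4 can be sketched as below. This is only an illustration under stated assumptions: genotypes are taken to be Python lists of 100 bits, and the function names are invented for this sketch rather than taken from any of the programs discussed. Whatever one thinks of the local versus non-local reading, the two functions return the same number for every genotype.

```python
GENOME_LENGTH = 100  # assumed length, matching the "100 minus" in the quote
ALL_ONES = [1] * GENOME_LENGTH  # the "distant state" 11...1


def fitness_by_counting(genotype):
    """First case: count the 1 bits in the genotype itself."""
    return sum(genotype)


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def fitness_by_distance(genotype):
    """Second case: 100 minus the Hamming distance to 11...1."""
    return GENOME_LENGTH - hamming_distance(genotype, ALL_ONES)
```

Since every 0 bit contributes exactly one unit of Hamming distance to 11...1, the distance is 100 minus the number of 1s, and the two fitness values coincide.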
Suppose I want to simulate a horizontal spring attached to a wall at one end and an otherwise free mass at the other end, as shown in the picture below. To do this, I use Hooke's law to write the force exerted by the spring on the mass in terms of the (signed) distance to the equilibrium position. That is,
F = -k(x - x0),
where x is the position of the mass, x0 is the equilibrium position, and k is a constant that characterizes the spring (Hooke's law black-boxes all the interactions between individual atoms in the spring that collectively generate the force towards the equilibrium position). Newton's law then gives the differential equation
m x" = -k(x - x0)
and my simulation consists of a program that solves this equation numerically using, say, Euler's method. My use of the (signed) distance to the equilibrium position to determine the force is analogous to the use of the Hamming distance for fitness determination.
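A minimal sketch of such a program, assuming explicit (forward) Euler steps and arbitrary illustrative values for k, m, x0 and the step size, might look like this. The function and parameter names are made up for the sketch. Note that the force in each step is computed from nothing but the current position and the fixed equilibrium position x0.

```python
import math


def simulate_spring(x_init, v_init, k=1.0, m=1.0, x0=0.0, dt=0.001, steps=1000):
    """Integrate m x'' = -k (x - x0) with the explicit Euler method."""
    x, v = x_init, v_init
    positions = [x]
    for _ in range(steps):
        force = -k * (x - x0)      # Hooke's law: signed distance to equilibrium
        acceleration = force / m   # Newton's second law
        # Explicit Euler: both updates use the values from the start of the step.
        x, v = x + v * dt, v + acceleration * dt
        positions.append(x)
    return positions


def exact_position(t, x_init, k=1.0, m=1.0, x0=0.0):
    """Analytic solution for a mass released at rest from x_init."""
    omega = math.sqrt(k / m)
    return x0 + (x_init - x0) * math.cos(omega * t)
```

With a small step size the numerical trajectory stays close to the analytic cosine solution over short times. Explicit Euler slowly gains energy for this system, so a long run would need a smaller step or a better integrator.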
I claim (for the sake of the discussion) that the simulation is an accurate simulation of the dynamics of the position of the mass. Does my simulation make use of "non-local information"? Is it teleological?
I add (for the sake of the discussion) to my above claim the claim that the simulation is an accurate simulation of the microscopic interactions inside the spring. Does my simulation, viewed with this extra claim in mind, make use of "non-local information"? Is it teleological?
Erik
|
|
RBH
Member
Member # 380
|
posted 21. July 2003 16:10
Erik,
quote: RBH, the point of a simulation is not to instantiate a particular hypothesis about the ontology of a phenomenon, but rather to explore the predictions of a complicated model. When we can't explore the consequences of a model analytically, we use simulations to find out.
Sure.
You wrote quote: You see significance in the way the fitness evaluation is done only because you have not adequately distingished between simulating the life of an individual and simulating the evolution of a population.
On the contrary, it is because I distinguish between them that I see significance in the way the fitness evaluations are performed. The questions I am interested in asking are about the sorts of population dynamics that emerge from behaviors and local interactions of digital organisms in Avida (and other ALife simulations). The questions are of the form 'Given these local processes, what kinds of population dynamics emerge from the concatenation of individuals?' and 'What kind of local processes operating on individuals give rise to the population dynamics we observe in the world?' That is, the questions have to do with the kinds of population dynamical models that are induced by local processes operating at the level of individuals.
Given that, I have no reservations about focusing on local processes and local interactions as being of primary interest. The population dynamics constitute the dependent variable, not the model. In fact, one cannot do this kind of research with a purely populational model; that's the outcome, not the input to the research. You summarized it well: quote: The operations are indeed equivalent in the eyes of a theory which only describes the genotype frequency dynamics in a population. They are not equivalent in the eyes of a theory that describes how individuals live, develop, and interact with an environment.
One need add only "...or in the eyes of a theory of the generation of population dynamics by aggregations of individuals living, developing, and interacting with an environment."
RBH
|
|
John Bracht
Member
Member # 5
|
posted 03. August 2003 03:06
Sorry for my absence from this thread; somehow grad student life has a way of sucking up all your free time!
Pim is trying to squirm out of a tight spot as I try to pin him down on one specific question: how are the fixed sitelocations in the Ev simulation NOT a fixed target, precisely like the one utilized by VETE?
Pim brings up the point that mutation and selection are necessary to produce the information at the sitelocations. This is just a red herring, since nobody is arguing this point. It's true of the VETE program as well--mutation and selection take the information pre-given by the fixed target and output that information.
Pim has tried to change the subject but I won't let him. He made the assertion that the Ev program utilizes no fixed target, while VETE does. Please answer me, Pim--how is the Ev program's fixed sitelocations array NOT precisely the same sort of fixed target we find in VETE? Continued dodging of the issue will be taken by me and others reading this as evidence that you simply have no response.
Let me point out that the fact that the binding site locations change between runs is absolutely irrelevant. Within a run (i.e., within a given simulation), the program has to know how to evaluate each position along the "genome". It's got to know which areas are supposed to be above the threshold value, and which are not. In turn, the number of binding sites that are above threshold is used to calculate a fitness value for each organism. Hence, the binding sites need to stay consistently located in order for the program to gradually evolve up the slope given by the fitness function. Furthermore, the binding site locations constitute a fixed target defining exactly what is going to evolve.
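To make the evaluation step concrete, here is a toy sketch of the kind of comparison described above. It is not Schneider's code: the names `count_mistakes`, `scores`, `site_locations` and `threshold` are hypothetical, and how Ev actually derives a score for each position from the evolving recognizer is left out. The sketch only assumes what is described in this thread, namely a boolean array of site locations that stays fixed within a run and a threshold test at each position.

```python
def count_mistakes(scores, site_locations, threshold):
    """Count positions whose threshold test disagrees with the fixed site map.

    scores         -- one recognizer score per genome position (assumed given)
    site_locations -- list of booleans, True where a binding site is defined
    threshold      -- value a score must exceed to count as "above threshold"
    """
    mistakes = 0
    for score, is_site in zip(scores, site_locations):
        above = score > threshold
        # A mistake is a site that is not above threshold,
        # or a non-site that is above threshold.
        if above != is_site:
            mistakes += 1
    return mistakes
```

For example, with scores `[5, 1, 7, 2]`, sites at the first and last positions and a threshold of 3, the third position (a non-site above threshold) and the fourth (a site below threshold) each count as one mistake.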
The only way the VETE program could better simulate Ev would be if it chose a random string of letters as a target sequence every time it ran, instead of using "the answer is forty-two" as a consistent target. This would make the example perhaps closer to the Ev program simply because the target would change from run to run and have no semantic value, but it would do absolutely nothing to alter the accuracy of the VETE simulation as a model of the Ev program. It simply DOES NOT MATTER what target you pick, as long as it's consistent within a run. It can vary from run to run (as with Ev) or it can always be fixed with "the answer is forty-two"--any string of characters is as good as any other, and so the model is completely general.
Furthermore--consider the sort of information that Ev evolves. Schneider says that the program evolves enough information to locate the binding sites within the genome. But the "sitelocations" array contains precisely this knowledge--the positions of each binding site in the genome!!! In other words, the fixed target contains all the information that ever gets output, and the program just utilizes variation and selection as its mechanism of getting that information output. As Dembski and others have noted, variation and selection can be good conduits of information, but they never generate information. By the way, the Ev program sometimes fails to generate the desired solution--Pim takes this as an example of contingency in the program--but this just shows that random variation and selection is often a rather POOR conduit of information and it can FAIL to transmit the information that was pre-loaded. It would be better for Schneider's program to just output the information in the "sitelocations" array directly--then it would have a 100% success rate, and the same information would get output! In fact, it takes quite a lot of intelligent design to get the inputted information into a form that can be exploited by a variation/selection mechanism and transmitted to the output.
Enough for now. Pim, I expect an answer to my question. Thanks.
John
|
|
Pim van Meurs
Member
Member # 541
|
posted 03. August 2003 05:46
Dear John,
What you seem to consider to be 'squirming' is in fact nothing more than me elaborating on the equivocation between two different types of information: One, the information needed to define the sitelocations and two, the information increase in these sites.
I am not sure why John wants to distract from the simple observation that information in these sites increased and that there was no fixed target to the information generated at these sites. Instead John seems to be interested in focusing on a non sequitur by confusing the information needed to select the site locations with the content of the site locations. Thus while we may know where these sites may arise, there is no fixed target to the information content at these sites.
There are many 'fixed targets' if we are to use John's 'logic' here, since parameters in the program define a variety of concepts. But that is not the real issue here, unless we want to confuse by referring to different information concepts.
So the binding sites are a fixed target but the information increase shows clearly that Ev does not present a fixed target for the content of the binding sites. It's that simple.
John then seems to build another strawman argument when stating that "By the way, the Ev program sometimes fails to generate the desired solution--Pim takes this as an example of contingency in the program--but this just shows that random variation and selection is often a rather POOR conduit of information and it can FAIL to transmit the information that was pre-loaded"
First of all, what is the 'desired solution', I wonder? John is using terminology here which obfuscates my argument that the exact amount of information as well as the exact nature of the binding sites is contingent on mutation and selection. Dembski and others may have suggested that 'variation and selection can be good conduits of information, but they never generate information' but this seems irrelevant to the argument. Conduits of information are exactly what mutation and selection are all about. Mutation generates the variation and the environment through selection (mutual information) injects information into the genome.
Thus John seems to miss the point when he comments, I assume somewhat tongue in cheek, that "It would be better for Schneider's program to just output the information in the "sitelocations" array directly--then it would have a 100% success rate, and the same information would get output!"
John still seems to be ignoring that the information 'location of the sites' is different from the 'information at these sites'.
I am not sure what John wants to argue with his final sentence "In fact, it takes quite a lot of intelligent design to get the inputted information into a form that can be exploited by a variation/selection mechanism and transmitted to the output. "
Certainly John is not arguing that our modeling of mutation and selection in nature is in any form evidence of the requirement for intelligent design or even a necessity?
I believe that I have answered your question namely that your question is at most a red herring, distracting from the information increase at the binding sites by focusing on information needed to specify the binding sites.
Let me finally comment on mutation and selection being a poor conduit of information as an explanation for the contingency found in Ev.
Could John explain why, despite this, Vete seems to have found a solution to the problem in all instances I tried? And guess what: the solution was always the same.
John's claim that "The only way the VETE program could better simulate Ev would be if it chose a random string of letters as a target sequence every time it ran, instead of using "the answer is forty-two" as a consistent target. This would make the example perhaps closer to the Ev program simply because the target would change from run to run and have no semantic value, but it would do absolutely nothing to alter the accuracy of the VETE simulation as a model of the Ev program. "
is illogical for the following reasons:
1. Vete would still zoom in on the solution with 100% accuracy.
2. Ev does not preselect the information content but instead allows the sites to co-evolve to a solution which is truly contingent on the mutations and selection involved.
My conclusion is that Ev and Vete are very different in this aspect in that Vete requires a goal (a particular string) to be pre-specified while in Ev mutation and selection determine the "string (sequence)"
Additionally it seems that the information in the location of the binding sites is Rfrequency and the (evolved) information content of the binding sites is Rsequence. There is no reason to presume in advance that Rfreq=Rseq and indeed that was the purpose of Schneider's Ev. It may be easy to confuse the two information concepts especially when they are argued to converge.
As Kim et al state it:
quote:
Within the context of protein-DNA interaction, two central aspects of information can be quantified. On the one hand, positional information is characterized by
Rfreq = -log(n/N)
and
quote:
Sequence information, on the other hand, is the amount of information about the sequence gained after observing binding
Rseq = Hbefore - Hafter
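As a rough numerical illustration of the two quantities, here is a sketch under several assumptions that are not spelled out in the quote: logarithms are taken base 2 so the results are in bits, n is read as the number of sites and N as the number of possible positions, Hbefore is taken as 2 bits per position (four equally likely bases), and no small-sample correction is applied. The function names are invented for the sketch.

```python
import math


def r_frequency(num_sites, genome_size):
    """Rfreq = -log2(n/N): bits needed to pick n sites out of N positions."""
    return -math.log2(num_sites / genome_size)


def r_sequence(aligned_sites):
    """Rseq = Hbefore - Hafter, summed over the columns of aligned sites.

    aligned_sites -- list of equal-length strings over the alphabet ACGT.
    Hbefore is assumed to be 2 bits per column (equiprobable bases).
    """
    total = 0.0
    for column in zip(*aligned_sites):
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        n = len(column)
        h_after = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h_after
    return total
```

With 16 sites in a genome of 256 positions, `r_frequency(16, 256)` gives 4 bits per site. A fully conserved column contributes 2 bits to Rseq and a column in which all four bases are equally frequent contributes nothing, so how close Rseq comes to Rfreq depends entirely on the sequences found at the sites.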
[ 03. August 2003, 06:21: Message edited by: Pim van Meurs ]
|
|
Pim van Meurs
Member
Member # 541
|
posted 13. August 2003 00:10
Just a keep-alive/bump message. I would like to see some of these issues resolved, particularly what I see as a confusion between Rseq and Rfreq.
|
|
John Bracht
Member
Member # 5
|
posted 13. August 2003 12:24
Pim,
I simply don't have enough time to be involved with this thread, particularly since I think the confusion is with you, not with me. The points I want to make have all been made before and yet the discussion never seems to go anywhere (at least with you). I'm sorry if this seems harsh, but I just don't have time to continually correct what I see as a simple refusal to see the obvious ways the VETE program is an accurate simulation of Ev. If you're unable to see the conceptual linkages, it's not my responsibility to enlighten you.
As for the rseq vs rfreq distinction, I agree with you that the exact values (rseq) that will evolve are unknown. That's a given. But we DO know 1. where the binding sites will be, and 2. that they will be above threshold (when evaluated by the recognizer system). This is what the program explicitly selects for. This is what the program explicitly outputs. Always! It's a perfectly fixed target, because the location of the binding sites is fixed, and they will be above threshold (never below threshold or right at threshold).
Furthermore, Schneider explicitly claims to be trying to explain why rseq comes to match rfreq. However, he's being very underhanded about the fact that rfreq is being input directly into the program and used directly as a fixed target for selection. The only other piece of information needed to specify what rseq will be is the fact that the binding sites need to be above threshold (instead of below or equivalent to threshold). This is also explicitly programmed into the system. So yes, it's an absolutely fixed target. As Iain Strachan pointed out, what you're evolving towards is not an explicit text string, but a mathematical/logical function that can operate upon bit strings. It's a fixed target, one step removed.
Precisely the same is true of VETE. Pim, I can take your argument and say that rseq for VETE is completely unspecified. Different text strings evolve every time you run it (try it if you doubt this). Yet the overall FUNCTION that evolves is fixed---such that it will give "the answer is forty-two" when it operates upon the resulting string. The fact that many text strings are possible solutions makes no difference to the fact that the program selects directly for the desired outcome. Ev is precisely the same way--it evolves a function which, when applied to the genome, shows the binding sites to be above threshold and the non-binding sites to be below threshold. That's an explicit target that the evolutionary process explicitly works towards. I don't know how to make the points any more clear, so I will now cease responding on this point if Pim continues to deny it. The fact that Pim simply dodged my question from earlier (how the Ev program does NOT have an explicit target like VETE) is further reason for me to cease wasting my time with this discussion.
Sincerely, John
|
|
Pim van Meurs
Member
Member # 541
|
posted 13. August 2003 12:50
Hi John,
You mention that "However, he's being very underhanded about the fact that rfreq is being input directly into the program and used directly as a fixed target for selection."
I do not see how you could reach this conclusion other than through a confusion about Rseq and Rfreq. Yes, the location is a fixed target but the information at the binding sites is NOT a fixed target and there is no a priori reason why the information at the binding site should evolve to be similar to Rfreq. Furthermore the suggestion of 'underhanded' seems unwarranted and unnecessary, especially when reading Schneider's paper. Let me quote
quote:
Even if the genome were to double in length (while keeping the number of sites constant), Rfrequency would only change by 1 bit, so the measure is quite insensitive. Likewise, the number of sites is approximately fixed by the physiological functions that have to be controlled by the recognizer. So Rfrequency is essentially fixed during long periods of evolution. On the other hand, Rsequence can change rapidly and could have any value, as it depends on the details of how the recognizer contacts the nucleic acid binding sites and these numerous small contacts can mutate quickly. So how does Rsequence come to equal Rfrequency? It must be that Rsequence can start from zero and evolve up to Rfrequency. That is, the necessary information should be able to evolve from scratch.
The purpose of this paper is to demonstrate that Rsequence can indeed evolve to match Rfrequency [12]. To simulate the biology, suppose we have a population of organisms each with a given length of DNA. That fixes the genome size, as in the biological situation. Then we need to specify a set of locations that a recognizer protein has to bind to. That fixes the number of sites, again as in nature. We need to code the recognizer into the genome so that it can co-evolve with the binding sites. Then we need to apply random mutations and selection for finding the sites and against finding non-sites. Given these conditions, the simulation will match the biology at every point.
and
quote:
To test the hypothesis that Rsequence can evolve to match Rfrequency, the evolutionary process was simulated by a simple computer program, ev, for which I will describe one evolutionary run. This paper demonstrates that a set of 16 binding sites in a genome size of 256 bases, which would theoretically be expected to have an average of Rfrequency = 4 bits of information per site, can evolve to this value given only these minimal numerical and size constraints. Although many parameter variations are possible, they give similar results as long as extremes are avoided (data not shown).
Underhanded? I'd say Schneider is quite clear about Rfreq and Rseq.
And unlike Vete, the information at the binding sites is not a fixed target but rather it is contingent on mutation and selection. In case of Vete Rseq is not only always exactly the same but the "binding sites" are exactly pre-specified.
When John thus states that "The fact that Pim simply dodged my question from earlier (how the Ev program does NOT have an explicit target like VETE) is further reason for me to cease wasting my time with this discussion." he seems to be ignoring the interactions in this thread in which I 1) accepted the prespecification of Rfreq, something he claims Schneider is 'underhanded' about but in fact this is well explained in the paper. I 2) showed that Rseq is not a priori required to evolve to be Rfreq and that unlike in Vete Rseq is contingent and that the binding sites are also contingent.
It would be helpful if John were to address these observations since they seem to go against the claims that Ev and Vete are somehow similar in any relevant manner.
As I said before, it is important to recognize the two very different information concepts which are argued to evolve to be similar. One is the pre-specified Rfreq, determined completely by the genome and the binding sites, and the other one is the information at the binding sites. What Schneider found in nature, Rfreq~Rseq, is elegantly explained by mutation and selection in which Rseq is not only contingent but the actual binding sites are also contingent. This differs significantly from Vete.
John or anyone else may want to help me understand the following
quote:
Let me finally comment on mutation and selection being a poor conduit of information as an explanation for the contingency found in Ev.
Could John explain why, despite this, Vete seems to have found a solution to the problem in all instances I tried? And guess what: the solution was always the same.
John's claim that "The only way the VETE program could better simulate Ev would be if it chose a random string of letters as a target sequence every time it ran, instead of using "the answer is forty-two" as a consistent target. This would make the example perhaps closer to the Ev program simply because the target would change from run to run and have no semantic value, but it would do absolutely nothing to alter the accuracy of the VETE simulation as a model of the Ev program. "
is illogical for the following reasons:
1. Vete would still zoom in on the solution with 100% accuracy.
2. Ev does not preselect the information content but instead allows the sites to co-evolve to a solution which is truly contingent on the mutations and selection involved.
On the one hand John seems to argue that mutation and selection are 'poor conduits for information', yet on the other hand we see Vete, in which mutation and selection seem to be a very good conduit for information since the same solution is found with 100% accuracy, run after run. [ 13. August 2003, 12:59: Message edited by: Pim van Meurs ]
|
|
John Bracht
Member
Member # 5
|
posted 14. August 2003 01:09
Pim,
Maybe you're just misunderstanding VETE. Here are my results from running the simulation a few times. (Note: I'm only showing the last line of each simulation here, which gives the final evolved solution. Go here to play with it:
http://www.iscid.org/vignere/vete.php
Micah has set up quite a nice interface!)
Here's the meaning of the notation: (Generation count) (decryption Key) (Encoded text) ---> (Decoded Text)
run 1: 47 ( PLOWRYSNY) (TXQOXEQOSP YDOBFPLLYTL ) -->THE ANSWER IS FORTY TWO
run 2: 101 (ZVBFKECNPZ) (SCGFLSVJUQZDUFQTUGNZSRQ) -->THE ANSWER IS FORTY TWO
run 3: 94 (HGKEN OQYT) (AOPEONGMCKHPCETOFJWTACZ) -->THE ANSWER IS FORTY TWO
Notice, contra Pim, that the encrypted text and the decryption key both are radically different each time. It takes a different number of generations to evolve in each "run". Yes, the fixed target is always the same--but the actual values in the genome are fully contingent.
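The claim that different genomes decode to the same target can be checked directly. The sketch below assumes a decoding rule inferred from the posted runs rather than taken from the VETE source: a 27-symbol alphabet with space as 0 and A to Z as 1 to 26, with each key symbol subtracted from the corresponding text symbol modulo 27 and the key repeated as needed. The names `ALPHABET` and `decode` are made up for the sketch. Only runs 2 and 3 are used, since the exact spacing in run 1 may not have survived posting.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed: space = 0, A = 1, ..., Z = 26


def decode(key, encoded_text):
    """Vigenere-style decoding: subtract the repeating key modulo 27."""
    decoded = []
    for i, symbol in enumerate(encoded_text):
        shift = ALPHABET.index(key[i % len(key)])
        value = (ALPHABET.index(symbol) - shift) % len(ALPHABET)
        decoded.append(ALPHABET[value])
    return "".join(decoded)


# Key and encoded text from runs 2 and 3 above.
run2 = decode("ZVBFKECNPZ", "SCGFLSVJUQZDUFQTUGNZSRQ")
run3 = decode("HGKEN OQYT", "AOPEONGMCKHPCETOFJWTACZ")
```

Under this rule both runs decode to THE ANSWER IS FORTY TWO even though the two key and text pairs share almost nothing position by position.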
Now, the parallels. The encrypted text is analogous to Schneider's binding sites. It's contingent, because both the decryption key and the encrypted text co-evolve. Thus, there is a large number of solutions the program can find. In this sense, rseq is not pre-specified. It's fully contingent (it's just as contingent as rseq in Schneider's Ev program). Just as VETE always converges upon the same output (the answer is forty two), the Ev program always converges on the same output (the binding sites are above threshold and the nonbinding sites are below threshold). As I said before, you're just evolving a mapping function to operate on the text string to produce a desired outcome. It's a fixed target, one step removed.
Does this help?
John
|
|
Pim van Meurs
Member
Member # 541
|
posted 14. August 2003 23:36
I assume that John has dropped the 'underhanded' assertion about Schneider?
Could John help us understand the amount of Rseq that evolved in Vete? The claim of similarity so far seems superficial at most.
My guess is that it is exactly equal to Rfreq.
Why? Because in Vete there is a global goal namely the information in the code/key has to match the global string.
Which is why Vete is very different from Ev, where Rseq is freely allowed to evolve. That its value happens to come close to Rfreq as observed in nature is the outcome of the experiment, not the goal, as I'd argue it is in Vete.
When I argued that John was confusing Rfreq and Rseq, John stated that he was not, but in the Vete example the goal is explicitly for Rseq to evolve to be equal to Rfreq and in fact, other than a trivial linear operation, the information is exactly the target string "the answer is forty two". Limit the size of the encoder to 1 and you will see what I mean. [ 14. August 2003, 23:40: Message edited by: Pim van Meurs ]
|
|
Rex Kerr
Member
Member # 632
|
posted 15. August 2003 02:10
It's unclear to me what the point has been of the extended discussion on the differences between Ev and Vete.
To me, WEASEL, Vete, and Ev each produce exactly the expected results--indeed, it is difficult for me to imagine how any of them could go wrong as long as you set your mutation rates low enough so that replication noise didn't overwhelm the information gained per generation.
The insight that seems to be missing here is that sometimes targets are pre-specified in biology. Of course, the pre-specification presumably isn't intentional, but it's pretty hard to avoid the consequences if a calcium binding protein just happens to bind the regulatory sequence of a certain gene (for example).
In WEASEL, there is no distinction between the target and the genomic sequence; there is exactly one target, and it is exactly the genome of the desired organism.
In Vete, the genomic sequence is passed through a simple algorithm to generate a result string, which is the exactly specified target.
In Ev, the genomic sequence is passed through a more complex algorithm to generate a binding affinity at a certain location, which is the exactly specified target.
The basic process for all three is the same--use selection to find a target. The key difference with Ev--and, I presume, the reason it was judged worthy of publication--is that the author went out of his way to make the algorithm and target biologically relevant in ways that are not true for WEASEL and Vete. Strachan points out a number of errors in the information calculations used in Ev, but that doesn't change the observation that selection incorporates information into the genome from the environment (including other parts of the genome).
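A minimal sketch of the shared process described above, assuming a simple mutate-and-select loop. The `express` argument is the genome-to-result mapping: `identity` gives the WEASEL-style case where the genome itself is the target, and `shift_decode` is a hypothetical stand-in for a Vete-like "simple algorithm" (it is not Vete's actual code). All function names, the population size, and the mutation rate are illustrative assumptions.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "
TARGET = "the answer is forty two"  # target string quoted in this thread


def identity(genome):
    """WEASEL-style mapping: the genome itself is compared with the target."""
    return genome


def shift_decode(genome, key=3):
    """Hypothetical Vete-like mapping: decode the genome with a fixed shift."""
    n = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(c) - key) % n] for c in genome)


def fitness(genome, express):
    """Number of positions where the expressed result matches the target."""
    return sum(a == b for a, b in zip(express(genome), TARGET))


def mutate(genome, rate, rng):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in genome)


def evolve(express, pop_size=100, rate=0.05, seed=0, max_generations=10000):
    """Return (final genome, generations used) for the given mapping."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if express(parent) == TARGET:
            return parent, generation
        offspring = [mutate(parent, rate, rng) for _ in range(pop_size)]
        # Keep the parent in the pool so fitness never decreases.
        parent = max(offspring + [parent], key=lambda g: fitness(g, express))
    return parent, max_generations


if __name__ == "__main__":
    for express in (identity, shift_decode):
        genome, generations = evolve(express)
        print(express.__name__, repr(genome), generations)
```

In both cases selection is scored against the same fixed string; with `shift_decode` the evolved genome differs from the target, but its decoded result does not.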
IP: Logged
|
|
Micah Sparacio
Member
Member # 6
|
posted 15. August 2003 07:50
Hey Rex, could you clarify something for me? I was wondering if you'd be willing to go into some detail/specifics regarding why you think Ev is more biologically relevant than Vete or Weasel. I'm not contesting that statement, just asking for clarification...
More specifically, how is it biologically relevant to intentionally pre-specify what you said in the previous post nature is capable of specifying without intention? What exactly do you see selection "specifying" in Ev that hasn't been intentionally specified before the simulation was run?
Having a list of what you see selection specifying in Ev would be helpful in evaluating how biologically relevant Ev actually is. [ 15. August 2003, 07:57: Message edited by: Micah Sparacio ]
IP: Logged
|
|
Pim van Meurs
Member
Member # 541
|
posted 15. August 2003 11:01
Micah: More specifically, how is it biologically relevant to intentionally pre-specify what you said in the previous post nature is capable of specifying without intention? What exactly do you see selection "specifying" in Ev that hasn't been intentionally specified before the simulation was run?
In Ev, mutation and selection result in Rseq approaching Rfreq without specifying a global goal such as the one in Vete ("The answer is forty two"). In fact the resulting Vete genome (code/key), although encoded, is from an information perspective nothing more than the global string. In other words, in Vete the explicit goal is for the genome to match the target string.
I believe that this simulation shows that Dembski's "law of conservation of complex information" applies only to closed systems.
quote:
In both this model and in natural binding sites, random mutations tend to increase both Hbefore and Hafter since equiprobable distributions maximize the uncertainty and entropy [1]. Because there are only 4 symbols (or states), nucleotides can form a closed system and this tendency to increase appears to be a form of the Second Law of Thermodynamics [23,12], where H is proportional to the entropy for molecular systems [24]. Effective closure occurs because selections have little effect on the overall frequencies of bases in the genome, so without external influence Hbefore maximizes at 2L bits per base ( bits for the entire simulation). In contrast, by biasing the binding site base frequencies, f(b,l), selection simultaneously provides an open process whereby Hafter can be decreased, increasing the information content of the binding sites according to equation
[ 15. August 2003, 11:11: Message edited by: Pim van Meurs ]
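The quantities in the quoted passage can be sketched as follows, assuming the usual Shannon definition in which the information at each site position is Hbefore minus Hafter and Hafter is computed from the base frequencies f(b,l). This is a simplified illustration: the function names are made up for this sketch, Hbefore is fixed at 2 bits per base (equiprobable bases), and any small-sample correction is omitted.

```python
import math
from collections import Counter

BASES = "ACGT"


def uncertainty(freqs):
    """Shannon uncertainty in bits: H = -sum(p * log2(p)), skipping p == 0."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def r_sequence(sites, h_before=2.0):
    """Sum over positions l of (Hbefore - Hafter(l)) for aligned sites.

    `sites` is a list of equal-length strings over A, C, G, T.
    Assumes Hbefore = 2 bits per base; no small-sample correction.
    """
    length = len(sites[0])
    total = 0.0
    for l in range(length):
        counts = Counter(site[l] for site in sites)
        freqs = [counts[b] / len(sites) for b in BASES]  # f(b, l)
        total += h_before - uncertainty(freqs)
    return total


if __name__ == "__main__":
    # Fully conserved positions carry 2 bits each; random ones carry 0.
    print(r_sequence(["ACGT", "ACGT"]))
    print(r_sequence(["A", "C", "G", "T"]))
```

Selection that biases f(b,l) away from equiprobable lowers Hafter and so raises this sum, which is the "open process" the quote refers to.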
IP: Logged
|
|
John Bracht
Member
Member # 5
|
posted 15. August 2003 12:01
Pim,
I believe you're quite mistaken: Iain Strachan showed that Schneider incorrectly calculated rseq and that it actually does NOT evolve to match rfreq as claimed in his paper. The two information measures may or may not evolve to match in real biology, but to the extent that they do, it highlights a real disconnect between the Ev simulation and real biology.
I think you're narrowly focusing on this whole supposed "rseq evolves to match rfreq" claim without realizing that this is just an aftereffect of the program directly targeting a solution in which the binding sites were recognizable. The direct target was to have binding sites above threshold, nonbinding below threshold. Obviously, this involves at least enough information to recognize the binding sites (since they will now stand out or be marked by the simple fact of being above threshold). So obviously, the binding sites will automatically evolve enough information to locate them in the genome, which is precisely what rfreq tells you--where the binding sites are located. (As noted above, Schneider evidently calculated rseq wrong, but I'm just saying that I can see how rseq might evolve to rfreq anyway).
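For reference, a small sketch of the "information needed to locate the sites" idea, assuming the standard definition Rfrequency = log2(G / gamma), where G is the number of possible positions in the genome and gamma is the number of binding sites. The function name and the numbers in the usage line are illustrative assumptions, not values taken from this thread.

```python
import math


def r_frequency(genome_positions, num_sites):
    """Bits needed to pick out `num_sites` sites among `genome_positions`.

    Assumes the definition Rfrequency = log2(G / gamma).
    """
    if genome_positions <= 0 or num_sites <= 0:
        raise ValueError("both arguments must be positive")
    return math.log2(genome_positions / num_sites)


if __name__ == "__main__":
    # Illustrative values only: 256 positions, 16 sites.
    print(r_frequency(256, 16))  # 4.0
```

Note that this number depends only on the genome size and the site count, both of which are fixed before a run starts.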
BTW, on the "underhanded" comment. I was surprised to learn that Ev has a boolean string which specifies where the binding sites are located genomically. Intuitively, I knew the information must be there for the program to evolve its solution, but I thought Schneider might be a little more subtle about it. I think he should have been forthright in his paper that he injected this information and that it constituted an explicit, external target toward which the program evolved.
Ok, gotta run. I'm really busy right now so I'll probably be offline awhile.
John
IP: Logged
|
|
|