ISCID Forums



 
 
Author Topic: What Sort of Property is Specified Complexity?
peroxisome
unregistered


posted 05. September 2002 18:37
Dear Kirk
quote:
I'm afraid that you are misrepresenting their work
quote:
You would have more accurately represented the work of Taylor et al. if you had presented your selected quote within its context, ....
I think what you are trying to say is that my paraphrase (not a quote) was entirely accurate.

quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes
I think we are agreed that is your quote, and that I am not misrepresenting you. To support this very strong statement, you adduce two parts of Taylor.
No.1. Taylor et al. calculate that you cannot make more than 3.3x10(47) 100-mer peptides with the mass of the earth.

That's not a problem. Even you are talking about getting a hit at ~10(24) variants, which is ~10(23) fold less than the mass of the earth. I am not sure of my arithmetic, but I think that is less than 200 kg of protein. That is feasible for a random library.

No.2. You cite a sentence which says "estimate", "indicates" and qualifies the whole lot by saying "in a single step". You then say "In the second quote, they simply say it will be impossible". First of all, it is literally untrue; they do not use the word "impossible". Secondly, they have qualified their sentence in three separate ways; by removing those qualifications, you are misrepresenting the original phrase.

I am glad you agree that Taylor et al obtained 10(4) clones out of a library of ~10(8). It seems evident to me that their calculation is to estimate the size of the random library needed to achieve the clones that they got, based on the fact that they factor in the experimental design that they used to achieve ~10(4) positives. Hence 10(4) clones out of ~10(24).

But this omits the possibility that they will achieve additional positive clones out of their 10(24) random proteins- a factor you have omitted.
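
On the library-size arithmetic in No.1, a rough check is easy to sketch. It assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a figure from Taylor et al.; the function name is only illustrative.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
AVG_RESIDUE_MASS_DA = 110.0    # ASSUMPTION: average amino-acid residue mass, g/mol


def library_mass_kg(n_molecules: float, residues: int = 100) -> float:
    """Mass in kilograms of n_molecules peptides of the given length."""
    grams_per_mole = residues * AVG_RESIDUE_MASS_DA
    moles = n_molecules / AVOGADRO
    return moles * grams_per_mole / 1000.0


# A library of 10^24 distinct 100-mers, one molecule of each.
print(f"1e24 100-mers:   {library_mass_kg(1e24):.1f} kg")

# The upper bound attributed to Taylor et al.: 3.3 x 10^47 100-mers.
print(f"3.3e47 100-mers: {library_mass_kg(3.3e47):.2e} kg")
```

Under that assumption 10(24) 100-mers come to roughly 18 kg, well under the 200 kg guessed above, and 3.3x10(47) of them come to about 6x10(24) kg, which is close to the mass of the earth.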

quote:
I see that you have also failed to carefully read Keefe & Szostaks' paper, with the result that you are under the impression that I have made a mistake.
That is a personal comment, and I don't see how you can justify it.
Secondly, I have not suggested that you have made a mistake re: Keefe & Szostak's paper.

You have chosen to avoid using K&S's direct measurement of enzyme activity in random proteins- which is 4 in 6x10(12), and instead you have opted for an extrapolated estimate of 1 in 5x10(23), which is not directly experimentally based. This happens to favour your argument, and I think that is bad practice.

yours
per

warren_bergerson
Member
Member # 262

posted 06. September 2002 11:26
Kirk.

Quote: So physical laws cannot act as a non-random selection system when it comes to organic life, which has a very high specified complexity.

I stated yesterday that, IMO, one of the problems with the SC debate is that it was being framed as a choice between ‘RM&NS’ and ‘an external designer’. This framing, I suggest, ignores a vast array of possible (to use your terminology) ‘non-random selection systems’. You respond, if I understand correctly, by suggesting that non-random selection systems are not compatible with known physical laws. [It is nice to have a clearly defined issue to discuss.]

Your conclusion, it can be demonstrated (I suggest), is factually inaccurate because it, like Darwinian and neo-Darwinian ‘theory’, ignores the fact that ‘evolutionary processes evolve’. [Biologists are well aware of the ‘fact’ that evolutionary processes evolve. Knowing a ‘fact’ is not, however, the same as incorporating the fact into a theory. If you recognize the evolution of evolutionary processes in an evolutionary model or theory, then you will automatically recognize both non-random or directed forms of generating diversity and forms of selection other than natural selection. As early theorists recognized, though it has subsequently been ignored, ‘evolution of evolutionary processes’ is not logically compatible with an RM&NS theory or model.]

SC defines the improbability of a ‘change process’ finding a functional feature with Nf possible forms from a set of N possible forms. If N is very large relative to Nf, if the options to be considered (mutations) are taken randomly from N, and if the selection process is based on the ‘survival benefit of the feature evolving’, then, as Dembski correctly concludes, the improbable phenomenon would not have evolved.

However, there are all sorts of highly effective, non-random techniques for finding a rare or improbable phenomenon (a needle in a haystack) other than random search. Recognizing that evolutionary processes evolve, it is relatively easy to visualize the evolution of mechanisms to produce systematic non-random variations. It is similarly very easy to visualize the evolution of mechanisms that would select-out or eliminate non-viable options based on criteria other than functionality.

We know, or should know, from comparing selective breeding results to known rates of mutation, that there exist mechanisms in biological systems for systematically creating non-random variation. We also should know from comparing point mutation rates with the range of existing alleles, that non-Darwinian mechanisms exist for selecting out or eliminating most variation that is unlikely to produce functional or adaptive results.

Dembski, it appears to me, has done a good job of defining the complexity or improbability of features such as the flagellum. What his analysis, IMO, overlooks, is the possibility or likelihood that prior to emergence of the flagellum, biological systems may have evolved processes for ‘solving the flagellum problem very quickly’. The ‘evolution of the flagellum’ is not only an issue of ‘how complex the result of evolutionary change’, but also an issue of what processes had previously evolved to ‘speedily solve the type of survival problem which led to the flagellum’.

To return to my original point, the problem with SC is that it frames the evolution issue in terms of a choice between ‘RM&NS’ and ‘an external designer’. There are lots of reasonable alternatives to RM&NS. For example, as discussed above, a theory based on the ‘evolution of evolutionary processes’ would be an alternative to RM&NS that could potentially explain how evolutionary processes are capable of ‘finding’ or ‘evolving’ highly complex or improbable features. Such evolved ‘quick find’ processes are easily explainable in terms of known physical laws.

[I am not claiming to know how the flagellum evolved, or even that it evolved by materialistic mechanisms. I am simply claiming there are a whole range of possible explanations other than RM&NS.]

Dembski, IMO, is doing a good job demonstrating that RM&NS cannot explain the evolution of the level and type of complexity observed in biological systems. Again IMO, the issue Dembski raises also points out the failure of biologists and geneticists to formulate viable ‘theories and models’ which fit the existing body of knowledge. The lack of such theories and models is, again IMO, due to serious inadequacies in the techniques available to formulate valid models and theories of complex causal relationships.

Frances
Member
Member # 169

posted 06. September 2002 12:37
Kirk,

In an effort to better understand your argument I have consulted a handout.

Let me first point out to you that your specificity is not the same as Dembski's specified complexity.

Your argument is that proteins could not have arisen by chance. But Dembski's filter includes regularities as well. In fact Dembski's argument is that if it can be shown that the probability of an event GIVEN all chance hypotheses (and Dembski includes regularities in his term 'chance') is less than some universal bound of 10^-150, then this is evidence of design. Your argument wrt proteins is that for the larger proteins the chance-only pathway can be eliminated.
But what about chance and selection? Interestingly enough Tom Schneider has done exactly this

Tom shows that selection and chance evolved 64 bits in 704 generations. The probability for this to happen by chance only is 5x10^-20. This translates into 1 bit every 11 generations or 4400 generations for 400 bits.
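
The arithmetic behind these figures can be sketched as follows. It assumes that the gain is linear in generations and that the chance-only probability of 64 bits is 2^-64; both are assumptions of this extrapolation rather than claims taken from the paper, and the function names are only illustrative.

```python
BITS_GAINED = 64    # bits evolved in the run, as quoted above
GENERATIONS = 704   # generations taken, as quoted above


def generations_per_bit(bits: int, generations: int) -> float:
    """Average number of generations needed per bit gained."""
    return generations / bits


def chance_probability(bits: int) -> float:
    """ASSUMPTION: chance-only probability of hitting `bits` bits is 2^-bits."""
    return 2.0 ** -bits


def generations_needed(target_bits: int, rate: float) -> float:
    """ASSUMPTION: linear extrapolation at a constant rate of gain."""
    return target_bits * rate


rate = generations_per_bit(BITS_GAINED, GENERATIONS)
print(f"generations per bit:      {rate:.0f}")
print(f"chance-only probability:  {chance_probability(BITS_GAINED):.1e}")
print(f"generations for 400 bits: {generations_needed(400, rate):.0f}")
```

Whether a constant rate of gain would actually hold out to 400 bits is exactly the kind of question the extrapolation leaves open.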

This argument should not be unique to you since it seems to have been raised in your debate with Sarkar

In Christ

charlie d.
Member
Member # 159

posted 06. September 2002 14:11
I have a very naive question. Let's assume, as Kirk concedes, that simple functional peptides and ribozymes can be generated spontaneously, as long as their information is below 70 bits (I am pleased to note in this regard that Kirk agrees that, if not "magical", this threshold is "not written in stone", i.e. it's arbitrary, but anyway...).

Now, among the small, random ribozymes that have been selected for (I gather, all <70 bits), some can catalyze peptide bonds. If one such randomly generated ribozyme happens to randomly join together 2 randomly generated peptides each with 40 bits of information, what is the information in the product? If the information content of the product is 80, why can't this reaction happen by chance alone?

[ 06 September 2002, 14:58: Message edited by: charlie d. ]

Kirk Durston
Member
Member # 174

posted 06. September 2002 16:15
Resp. to Peroxisome: You are a confusing person to deal with, Peroxisome. I did not say you were misrepresenting me. It was Taylor et al. that you appeared to be misrepresenting in that the context of the quote you selected contained a very important qualifier which you left out. However, a discussion on who is properly representing who is a rabbit trail. Let's get down to the specifics:

1. I say that the paper by Taylor et al. indicates that for the moderately active enzyme AroQ, a library of about 10^24 members will be needed. You say, that they obtained roughly 1 hit in 10^4. If you will look at the point in the paper where you got your 1 in 10^4 figure, you will note that it was for a sample of 'partially randomized proteins.' Of course, what we need is a value for a fully randomized library in order to calculate the specificity Nf/N, which Taylor et al. proceed to estimate as requiring about 10^24 members. You cannot take a value obtained by sampling a library of partially randomized proteins and calculate a realistic value for Nf/N unless you bring in the additional factors, which is exactly what Taylor et al. do to get 10^24.

2. 10^24 actually works out to be about 18 kg of 100-mer proteins. It doesn't sound like a lot, but in explaining the origin of novel proteins we are concerned about how they obtain in the real world. Because of this, we must also concern ourselves with the real world problem of finding one functional, folded protein in a batch of proteins by some sort of natural sampling method. In reality, there would be a very large amount of sequence repetition, not all sequences would be of the same length, the proteins we did have would continuously be broken down and new ones (supposedly) generated which would only rarely be a completely new sequence, and we wouldn't have a well-equipped lab with intelligent scientists carefully supervising the operation. My point is that with ID, sifting through a ready-made hypothetical fully randomized library of 10^24 100-mer proteins is not an insurmountable problem, but it is unlikely to occur in nature. Let's not even think about a 300-mer protein. Even if nature miraculously produced a fully randomized sample of 10^24 100-mer proteins and was able to sample one million proteins/sec, it would have only about a 50-50 chance of finding the jackpot after sampling steadily at that rate for 1.5 x 10^10 years. The universe isn't even that old. Bottom line: when I advance my falsifiable hypothesis that natural processes cannot produce any sequence or configuration with a specificity tighter than 10^-22, it is for the real world, which is where science operates. In the real world, we have to concern ourselves with plausible scenarios that will produce and assemble proteins and sample the results for possible functionality in the future minimal organism. This includes plausible sampling rates and natural selection methods that select for, by a massive coincidence, functionality in the yet-to-be evolved minimal organism.

3. It is true that Taylor et al. do not use the word 'impossible'. Rather they use the phrase 'not be possible'. You are right on that count, although I should inform you that 'impossible' and 'not be possible' are equivalent in meaning.

4. Re. Keefe & Szostak's figure. In your last post you wrote, " the taylor paper found ~10(4) hits in that estimated 5x10(23), cited a paper which found 1 in 2x10(12), and actually found hits at 1 in 10(4)." The way your sentence was constructed, I thought you were claiming that the cited paper which found 1 in 2 x 10^12, actually found 1 in 10^4. This is bad grammar on your part. For my failure to see that this was just bad grammar, I apologize.

Having worked through your objections and accusations, Peroxisome, I hope that you can see that, according to Taylor, the probability of obtaining a functional AroQ protein is still 1 chance in 10^24. There is not enough time to sort through a library that size, even if it only weighs 18 kg, and even if one examines one million proteins per second. Furthermore, I hope you now agree that the specificity of that enzyme is as I originally stated.
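
The sampling estimate can be sketched as follows. It assumes a library of 10^24 distinct sequences, a fixed rate of one million samples per second, and a 365.25-day year; the function names and the `hits` parameter are illustrative, and the number of functional sequences in the library is precisely the point under dispute in this thread.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # ASSUMPTION: Julian year


def fraction_sampled(library_size: float, rate_per_sec: float, years: float) -> float:
    """Fraction of the library examined, sampling without repetition."""
    samples = rate_per_sec * SECONDS_PER_YEAR * years
    return min(samples / library_size, 1.0)


def p_found(library_size: float, hits: float, rate_per_sec: float, years: float) -> float:
    """Chance of at least one hit when sampling WITH repetition.

    Uses the Poisson approximation 1 - exp(-hits * samples / library_size).
    """
    samples = rate_per_sec * SECONDS_PER_YEAR * years
    return -math.expm1(-hits * samples / library_size)


# One functional sequence, no repeated samples: chance equals fraction examined.
print(f"no repetition, 1 hit:   {fraction_sampled(1e24, 1e6, 1.5e10):.2f}")

# One functional sequence, samples drawn with repetition.
print(f"with repetition, 1 hit: {p_found(1e24, 1, 1e6, 1.5e10):.2f}")

# Sensitivity check: the same library if it held 10^4 functional sequences.
print(f"with repetition, 1e4:   {p_found(1e24, 1e4, 1e6, 1.5e10):.2f}")
```

With a single functional sequence the result is a little under one half without repetition and lower with repetition; the outcome is very sensitive to how many functional sequences the library is assumed to contain.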

Resp to Warren:

Warren, I should clarify something. I did not argue that non-random selection systems are not compatible with known physical laws. The physical laws themselves act as non-random selection systems (e.g. the assembly of natural crystal lattices). What I was arguing is that outside of organic life and human designed systems, selection systems governed only by natural laws produce ordered configurations with extremely low information content. If the selection is relaxed, complex configurations can be formed but with extremely low specificity. But in neither case do we see any sort of non-random selection system that will be remotely capable of producing and selecting proteins that will, by some coincidence, be the exact 150 essential proteins needed for a minimal life form. I should emphasize that results from the minimal genome work indicate that not just any combination of proteins will do. Rather there appear to be about 300 that are indispensable. If you still want to perform brutal cuts, you might be able to get it down to 150, but those 150 are absolutely essential. I'm arguing that natural processes won't even be successful in producing 1.

I don't think Dembski is ignoring the possibility of non-random selection processes. His filter certainly allows for such. It is just that if the non-random selection process is due to natural laws, the result filters out as 'necessity' with very low, or non-existent complexity and specificity. Same goes for me. My falsifiable hypothesis, that all configurations and sequences requiring more than 70 bits of information must be produced by ID, applies to non-random natural processes as well. If a non-random natural process is ever found that can produce configurations with a specificity finer than 10^-22, then my hypothesis will be falsified. A recurring problem with materialists is that they continually want to use the systems within organic life to explain organic life. I'm arguing that the organic system, because it has an extremely tight specificity that vastly exceeds 70 bits of information, is produced by ID. So the materialist cannot use an ID-produced system to explain how its products came about through completely natural processes until they can show that the organic system can be produced by non-organic, natural processes.

Resp. to Frances:

Frances, the handout that you refer to was written some years ago and needs to be updated. Unfortunately, it is now circulating in cyberspace until the end of time. If you want to understand my argument, I would direct you to the derivation I provided some posts back. I have a technical paper that should help as well, but it is currently being reviewed by a biology journal and I do not want to release the paper prior to publication. The section in the handout dealing with information theory was my approach over 10 years ago. I found a problem with starting with Brillouin's work in that biologists were familiar with Shannon, but not Brillouin. So I eventually derived the same equation, but starting with Shannon's work. My general approach has not changed, but details have, including the upper limit for naturally produced functional information. I'm more confident of a lower value.

Regarding Dembski's approach to specificity and my approach: I do not speak for Bill, but from my understanding of Bill's concept of specificity, I think that my definition of specificity is compatible with his descriptions of same. I have defined specificity as Nf/N, which can only be determined relative to some objective function. This satisfies Dembski's two conditions of conditional independence and tractability. I should point out that Nf/N is an approximation, based on the assumption (which I believe I stated with my derivation some posts back) that all options have roughly the same probability. When it comes to proteins, I've been using experimentally based estimates, which indicate a specificity so tight that variations between individual sequence probabilities become irrelevant.

Regarding Tom Schneider's paper. I've downloaded the paper and wish to read through it. I will retire from this discussion for several days until I've had an opportunity to read the paper. I will then post my response. The question I will consider is, 'Does Tom's paper show that high specified complexity can be obtained without ID, or that my 70 bit upper limit hypothesis has been empirically falsified?' I'm always suspicious of computer simulations, since all the ones I've seen thus far used a ridiculously high functionality rate for novel configurations, which guarantee success. Some also use Shannon's general notion of information, with no concern as to whether or not it is functional. For organic life, functionality is essential. I don't know if Tom does this. We shall see. I'll plan to resume this discussion no later than the middle of next week.

Resp. to Charlie:

My 70 bit upper limit is an attempt to be realistic and to recognize as many ID-produced configurations as possible while allowing for no false positives. This is negotiable, of course, but one must be careful about jacking up this limit solely for the purpose of letting proteins slide in under the bar. The problem with doing this is that we increase the false negatives: configurations produced by ID that are not identified as such. Of course the absolute upper limit is 400 bits due to the constraints imposed by the energy, mass and age of the universe.

Regarding your suggestion for assembling proteins from smaller 40-bit polypeptides, this is similar to a scenario Chris de Duve proposed some years ago. First, in order to carry 40 bits of information, the randomly assembled protein must be functional in order to have an Nf that is not equal to N. I'm allowing that this is possible given my 70 bit hypothesis. However, simply joining two 40 bit proteins together does not necessarily produce an 80 bit protein. The new protein must have a new function. If it has no function at all, then Nf = N and the functional information collapses to zero. If it has the same function as one of the original proteins, then its total functional information remains at 40 bits; adding useless sequence to the end of a functional sequence does not increase its information. But it can certainly destroy the information if the overall sequence becomes non-functional. So the question we must answer is, 'if we join two proteins together, through natural processes, each of which carries 40 bits of functional information, how much information does the new protein carry?' First we determine its function. Then we must come up with a reasonable estimate for Nf/N for the new protein, then we can use eqn (4) to see how much functional information the novel protein carries. In order to carry 80 bits of information, Nf/N would have to be about 10^-26. That is an awfully small target in sequence space to hit, simply by randomly combining 40 bit proteins. The bottom line is that functional information is quantized into minuscule regions of functional sequence space. Randomly combining 40 bit polypeptides will land you all over sequence space, depending upon the sequence of each 40 bit protein, but the minuscule size of 80 bit sequence space makes it unreasonable to expect that the random combinations will ever hit the jackpot under any realistic natural scenario in the history of the universe to date.
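
The relation between bits and Nf/N used here can be sketched as follows. Eqn (4) is not reproduced on this page, so the sketch assumes it has the form I = -log2(Nf/N), with all options roughly equally probable; that form and the function names are assumptions, and the numbers below hold only under them.

```python
import math


def functional_bits(specificity: float) -> float:
    """Bits of functional information for a given Nf/N.

    ASSUMPTION: eqn (4) has the form I = -log2(Nf/N).
    """
    return -math.log2(specificity)


def specificity_for_bits(bits: float) -> float:
    """Nf/N corresponding to a given number of bits (inverse of the above)."""
    return 2.0 ** -bits


# Thresholds mentioned in the thread, converted to Nf/N.
for bits in (40, 70, 80, 400):
    print(f"{bits:>3} bits -> Nf/N ~ {specificity_for_bits(bits):.1e}")

# Specificities mentioned in the thread, converted to bits.
for exponent in (22, 26):
    print(f"Nf/N = 1e-{exponent} -> {functional_bits(10.0 ** -exponent):.1f} bits")
```

Under this assumed form, 70 bits corresponds to Nf/N of roughly 10^-21 and 80 bits to roughly 10^-24, so the 10^-22 and 10^-26 figures quoted in this thread would correspond to about 73 and 86 bits. If eqn (4) differs from the assumed form, these conversions do not apply.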

Once again, I'll leave this discussion for a few days until I've gone over the paper that Tom wrote. I'll plan to post a response no later than next Wednesday.

Frances
Member
Member # 169

posted 06. September 2002 21:41
Kirk,

You seem to claim yourself that your approach only looks at random chance hypotheses. That the universe rules out chance hypotheses > 120 bits is merely restating Dembski's argument, but Dembski does not merely look at pure chance; he also includes regularities.

quote:

the ratio Nf/N is the specificity, which can be defined as the probability of obtaining a functional configuration in a single recombination

It thus does not consider the many other chance/regularity hypotheses proposed by Dembski.

Your 'specificity' is what is also more commonly referred to as the R_frequency (R_f)

For a variety of binding sites, table 1 shows the Rf to be in the range of 10-20 bits.

Btw your equation (3) in your previous posting is very similar to Shannon's theory. But you seem to be using some of the terms in an inappropriate manner. I realize that information theory is quite confusing, and I am still struggling with its concepts, but it would be helpful to accurately define the concepts:

H is not the Shannon information but the Shannon entropy. For M possible symbols H is defined as:

H = - Sum(i=1 to M) P_i log2 P_i bits per symbol. H is at its maximum when all probabilities are equal and approaches zero as one symbol becomes dominant. When all probabilities are equal the formula simplifies to H = log2 M.
We then observe the actual probabilities; the decrease in entropy (before minus after) is the information.
For example, a sequence of 400 binary symbols has a before entropy of 400 bits. If the after distribution is still equiprobable, the after entropy is also 400 bits and the information generated is 400 - 400, or zero. If instead every position is fixed (all zeros or all ones), the after entropy is 0 and the information is 400 - 0, or 400 bits.
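
A minimal sketch of these definitions, using the 400-position binary example above. The function names are illustrative, and the positions are assumed to be independent so that per-position values simply add.

```python
import math


def shannon_entropy(probs):
    """H = -sum(P_i * log2 P_i), in bits per symbol; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information(before, after):
    """Information as the decrease in entropy from `before` to `after`."""
    return shannon_entropy(before) - shannon_entropy(after)


uniform = [0.5, 0.5]   # both symbols equally likely at a position
fixed = [1.0, 0.0]     # one symbol always observed at a position
positions = 400        # ASSUMPTION: independent positions, so bits add up

# Equal probabilities give the maximum, H = log2 M.
print(shannon_entropy([0.25] * 4), math.log2(4))

# After distribution still equiprobable: no information generated.
print(positions * information(uniform, uniform))

# Every position fixed: the full 400 bits.
print(positions * information(uniform, fixed))
```

The same two functions work for a four-letter alphabet by passing four probabilities per position instead of two.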

A good primer might be helpful to ensure that we use the correct terms.

As far as models for (origins and) the evolution of the genetic code, some interesting papers can be found at:

Freeland lab

Rob Knight et al

Computation models of the genetic code evolution based on empirical potentials

[ 06 September 2002, 22:19: Message edited by: Frances ]

peroxisome
unregistered


posted 06. September 2002 23:14
quote:
You are a confusing person to deal with, Peroxisome.
The confusion is not mine.
quote:
I'm afraid that you are misrepresenting their work
I made a statement of fact. My statement was of the form "They said 'paraphrase'", and it was accurate. There can be no misrepresentation, because my statement is either accurate or inaccurate. That is the fact, and the reason why you are retreating from your original comment is because it is self-evidently untrue.

your points
1. I am glad you agree that their data was 1 hit in ~10(4). Taylor et al at no point say 1 in 10(24). They estimate that ~10(24) random mutants would be needed to obtain the AroQ mutases (and they use the plural), and they obtained 10(4) AroQ mutases. You are misrepresenting their data by using a figure of 1 in 5x10(23), or 1 in 10(24), because they do not use that figure at any point. They obtain 10(4) hits in that estimated 10(24).

2. You started out with this contention.
quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes
and you supported this by reference to Taylor et al's calculation that only ~10(47) 100mers would comprise the mass of the earth.
I am intrigued to see that you work out that 10(24) 100 mer proteins is only 18 kg, and so you accept that it is theoretically possible to make a library this size, and that there could be an enzyme in this size of library. The distance from your original position is intriguing.

You are now forced to resort to some concept, which is "sampling" sequences at 10(6) per second. This is unreferenced, unjustified, and apparently plucked from thin air. It is good to see your argument has evolved, or perhaps, been incrementally designed [Wink] .

3. I am glad you agree that your original phrase "they simply say it will be impossible" is literally incorrect. They said "Our estimate of the low frequency of protein catalysts in sequence space indicates that it will not be possible to isolate enzymes from unbiased random libraries in a single step "; there are three qualifiers on "will not be possible", and it is fundamentally dishonest to misrepresent that statement without qualification as "will be impossible".

4. In my sentence, the "Taylor paper" was the subject, and the grammar was fine. It was you that chose to write "I see that you have also failed to carefully read Keefe & Szostaks' paper"; you had no knowledge of whether I had carefully read the paper, and the fact that you made such a statement in the absence of knowledge of the facts shows a reckless disregard of the truth.

quote:
according to Taylor, the probability of obtaining a functional AroQ protein is still 1 chance in 10^24
Let me be absolutely clear; this is untrue. At no point do Taylor et al say 1 in 10(24) or 1 in 5x10(23). They estimate that a random library of ~10(24) would be needed to obtain the 10(4) mutases they found experimentally.
quote:
There is not enough time to sort through a library that size, even if it only weighs 18 kg, and even if one examines one million proteins per second.
So far, this is unsupported assertion.

Finally, you have chosen to avoid using a direct measurement of enzyme activity in random proteins- which is 4 in 6x10(12), and instead you have opted for an extrapolated estimate of 1 in 5x10(23), which is not factually or directly experimentally based. This happens to favour your argument, and appears to be bias.
yours
per

Art
Member
Member # 179

posted 06. September 2002 23:31
Sorry for being so slow in responding, Kirk. Hopefully, the following remarks are not too redundant (with respect to other commentary in this thread).

quote:
Regarding your calculations: Given your system in isolation, it doesn't seem like a large problem, but my upper limit of 70 bits is after taking practical considerations into account. In reality we don't have a library of 95-residue, ready-made proteins to draw from. If such a pool were to occur in nature, the proteins would be constructed in a step-wise fashion via a random walk. This would do two things. First, there would be repetition (in other words, there would not be 10^23 different sequences to draw from).
I don’t agree with any of this - this seems to be an arbitrary construct to try and push a very accessible probability into a zone that is friendly to ID. In fact, I think it far more likely that an original “pool” would consist of nothing but random oligomers, and not a much smaller number of repetitive units.

quote:
I would not agree with you that what Taylor et al. showed is that "functional proteins are not complex (in the Dembskian sense)". I can't speak for Bill, but I infer that when he uses 'specified complexity' he means to refer to a complex system that is specified beyond the upper limit to which nature might possibly achieve. I put that limit at 70 bits. The absolute upper limit, including quantum events is about 400 bits, given 10^120 operations in the universe.
I believe I have made a good case that natural phenomena possessing millions of bits of information can arise spontaneously via purely natural processes.

But that’s somewhat beside the point, serving to note my continuing quibble with the use of informational “bits” to make inferences about complexity (as defined by Dembski).

Taylor et al. argue that the specific sequence family that they are studying should occur once every 10^24 random 100-mers. Complexity is determined by the size of the pool of sequences that we can sample. Since the pool sizes of interest here are clearly large enough to contain hundreds of functional sequences of this sort (as I showed previously in this thread), the protein is clearly not complex.

One can argue that AroQ is a special case, but that is, IMO, just a ploy to minimize the impact that this study has on the utility of information-based approaches for anti-evolutionary arguments. (My POV, IMO, opens new vistas for ID theory - opportunities that can be explored only if one frees oneself from anti-evolutionary prejudices.) I can assert, with more justification, that the upper limit of information content is going to converge, with time, on the 10^-10 to 10^-14 range.

quote:
Regarding revising downward Yockey's estimate for cytochrome c, I don't think there is warrant for doing this, mainly because they were not working with the same protein. Yes, if we use .83 bits/residue, we would get a significant decrease, but keep in mind that .83 is a conservative estimate. What we would need to do is to compare the structural constraints of the two proteins before we would be justified in maintaining the conservative estimate. Furthermore, I had another paper by Sauer that confirmed Yockey's estimate to within one order of magnitude. Be that as it may, I'm saying that 70 bits is the cutoff, Taylor's protein requires more than 70 bits, therefore it is not just going to need ID in the lab, it is going to need ID in nature as well, even if we do downgrade Cytochrome c.
I think it wise to reflect on what Yockey and Sauer did, and how much better the approach of Taylor et al. is. Yockey, in essence, aligned a number of previously-identified cytochrome C sequences and used the position-by-position variation to estimate the total number of cytochrome C sequences that might retain function. This approach has obvious limitations (as one would find if one used this approach on P450’s). The most pertinent of these for this thread is that Yockey had no way of identifying, let alone estimating the numbers of, alternative, completely unrelated sequences that can function as does cytochrome C. Thus, his result is properly viewed as an upper limit of the information content of cyt. C. The lower limit, something that is needed to make the exercise productive, is completely unknown.

Sauer’s group did the same thing in the lab that Yockey did on paper - they systematically varied specific positions in the Arc repressor (and maybe the lambda repressor - I forget), and used the results to estimate the number of Arc-related sequences that could retain function. This approach suffers from the same fundamental flaw as does that of Yockey - only Arc-related sequences are “counted”. Since unrelated sequences that function as the Arc repressor are not considered, the results give us an upper limit of information content. Again, no hint of the lower limit can be obtained from the work of Sauer’s lab.

Taylor et al. improved on Sauer’s approach by simultaneously varying large numbers of residues in known structural units. This approach helps to “count” variants that have clusters of compensating mutations that restore a modicum of activity (by maintaining an overall structural fold). It is hard to appreciate the contribution that compensating mutations can make to estimates derived from studies such as those with the Arc repressor, but the work of Taylor et al. hints at the magnitude of the effect - it increases the fraction of sequence space that is taken up by this one family of sequences from about 1 in 10^60 (pardon the use of very round numbers) to 1 in 10^24. This latter number, IMO, is a better estimate of the upper limit of the informational content of a protein.

quote:
Regarding your point, " Until we can estimate the true magnitude of the numbers of different, unrelated sequences that can satisfy a specification, we can only take results such as this as an upper bound," I should point out that estimates provided by Keefe & Szostak, and by Taylor et al. do not assume that only one sequence will do the job (1/N).
I agree as far as the work of Keefe and Szostak, but disagree strongly as far as Taylor et al. are concerned. These latter authors systematically varied one sequence, using a few invariant residues as anchors. Until they figure out how to completely randomize their test populations, their numbers are at best upper limits.

But things are not as bad now as they were a few years ago. Thanks to a number of combinatorial studies (including that of Keefe and Szostak), we can estimate the lower limit of the information content of functional proteins as on the order of 1 in 10^10 to 1 in 10^14. This range - 10^-10 to 10^-24 - is the one that ID theorists should work within. At least until data that tell us that these results are not generally applicable become available.

quote:
Rather they provide estimates of Nf/N. Their estimates result from a sampling of sequence space, albeit small. As a related aside, if you carefully go over Blanco's paper and another paper by Hagihara & Kim (Hagihara, Y. & Kim, P.S. (2002). Toward development of a screen to identify randomly encoded, foldable sequences. PNAS, 99, no. 10, 6619-6624. doi:10.1073/pnas.102172099) you will notice that folding sequence space for those proteins appears to be very close to the parent protein. In other words, we have additional data that indicates that folding sequence space for a given protein is highly constrained.
This has nothing to do with the existence of completely unrelated (at the sequence level) functional proteins. And it is not relevant to the more general theme of protein evolution, for the reasons I have given earlier in the thread.
IP: Logged
warren_bergerson
Member
Member # 262

posted 07. September 2002 09:37
Kirk,

I don’t disagree that if you are going to formulate a general materialistic theory or model of biological design, then you must be able to explain how very complex, high information content systems can be formed or produced from very simple systems with very low information content. I happen to believe that is possible and in fact I have a model or theory which I believe can explain how this could have occurred. The issue addressed by SC is not, however, complexity from nothing, but a complex biological system being capable of finding one of the Nf functional forms in a set of N possible forms.

The main point of Dembski’s argument is that a RM&NS system could not find Nf. A couple of ‘technical points’.

1) Given enough time and no other physical constraints, an RM&NS system can logically find one of the Nf functional solutions no matter how large N or small Nf. RM&NS is falsified not by the size of N/Nf, but by the real world constraints imposed on biological design processes such as evolution. [Specifically the life form has to be able to survive the evolutionary processes.]

2) The ‘miracle of chance’ or ‘dumb luck’ argument that RM&NS ‘could’ explain any change if the right mutation occurred, may be logically ‘possible’ but it is not ‘scientifically’ acceptable. Scientific explanations must be reproducible and testable.

IMO, the SC argument is weak, because there are much simpler and more rigorous techniques for showing that RM&NS systems or even NS systems cannot possibly explain what is known about the real world constraints on evolutionary change processes. In simple terms, any individual organism or species relying on an RM&NS evolutionary system would die or become extinct long before ‘evolving’ solutions to even the most elementary survival problems.

To return to my original point, there are lots of non-random variation, non-natural selection processes which 1) can very quickly and efficiently find solutions to problems with very high complexity (large N/Nf), 2) are compatible with physical laws, and 3) can be observed and analyzed with standard scientific methods.

To begin, it is not difficult to identify biological systems that routinely find ‘solutions’ to very complex problems (i.e. find one of Nf functional or adaptive solutions from N possible options). The ‘program’ controlling information processing in a neuron has a complexity or information content of at least 2 to the 1000th power. These programs are routinely reprogrammed or readapted in milliseconds.

If a complex life form has 30,000 genes, and only two states (active and inactive) for each gene, then the ‘point in time complexity’ for an individual cell is certainly greater than 2 to the 20,000th power (at least 20,000 of the 30,000 genes need to be in an appropriate state for the cell to function in a manner compatible with the survival of the organism). Cells apparently solve this ‘adaptive problem’ in seconds.

The point I am making is that biological systems have all sorts of processes and mechanisms that routinely solve 70 bit, 500 bit, and 30,000 bit adaptive problems in very short time frames. From what we know, the ‘physical’ mechanisms responsible for these problem solving capabilities are relatively simple, and certainly compatible with what is known of physical laws.
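The counting behind the gene-state figure can be sketched as follows. This is only an illustration under simplifying assumptions that the post does not state: every gene is treated as an independent two-state switch, each constrained gene has exactly one acceptable state, and the remaining genes are free. The function name is hypothetical.

```python
def gene_state_bits(total_genes: int, constrained_genes: int) -> int:
    """Bits needed to single out the acceptable on/off patterns.

    Assumption (not from the post): N = 2**total_genes patterns in all,
    and Nf = 2**(total_genes - constrained_genes) acceptable patterns,
    because each constrained gene must sit in one specific state.
    """
    log2_n = total_genes                        # log2(2**total_genes)
    log2_nf = total_genes - constrained_genes   # log2 of acceptable patterns
    return log2_n - log2_nf                     # log2(N / Nf)


# 30,000 two-state genes, 20,000 of which must be in an appropriate state.
cell_bits = gene_state_bits(30_000, 20_000)

# A "program" with 2**1000 possible settings and a single acceptable one.
neuron_bits = gene_state_bits(1_000, 1_000)
```

Under these assumptions the two examples come out at 20,000 bits and 1,000 bits, which is the sense in which the post speaks of very large adaptive problems.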

Quote: I'm arguing that the organic system, because it has an extremely tight specificity that vastly exceeds 70 bits of information, is produced by ID.

And I would argue that a 70 bit problem is relatively trivial for a biological system. SC demonstrates the existence of a serious problem. The problem demonstrated, however, is not the inability of materialistic systems to explain SC, but the lack of adequate techniques within the biological sciences to formulate scientific models and theories of biological design which can explain SC. Dembski, whether he intended to or not, has demonstrated the inadequacy of the theorists as well as the inadequacy of the theory.

SEAMLESSNESS
Quote: So the materialist cannot use an ID produced system to explain how its products came about through completely natural processes until they can show that the organic system can be produced by non-organic, natural processes.

The issue of materialist versus non-materialistic explanations raises an interesting philosophical question. Many proponents of ID argue that biological design or the complexity (and beauty) of biological design suggests the work of a highly intelligent external designer. A view which, I believe, the majority of people accept.

Many of these same individuals then go on to argue that the complexity of biological design suggests that there must be non-material or discontinuous explanations for this complex biological design. They are, in effect, arguing that there is an external designer who was smart enough to create these beautiful complex designs, but she wasn’t smart enough to hide her handiwork. She was smart enough to create all the beauty and complexity we see in the world, but she wasn’t smart enough to come up with a simple logical materialistic process that could explain the complexity produced. She was, these individuals seem to be arguing, smart enough to hide all the seams relating to the production and operation of the vast physical universe, but in designing a phenomenon covering the surface of one tiny planet, she had to use non-materialistic miracles.

Personally, and this is obviously only a personal opinion, I think there is a rational basis for accepting the existence of an external designer. IMPO, it is not particularly rational to assume that there is one small area in the universe where the designer was unable or unwilling to hide the seams of an otherwise seamless production.

IP: Logged
Kirk Durston
Member
Member # 174

posted 11. September 2002 17:04
I've just finished constructing this response, which is 10 pages long in Word. I'm bug-eyed and can't bear the thought of trying to focus my eyes enough to read through all this looking for typos, bad grammar and nonsensical statements. So please accept my apologies in advance for any nonsense you may find in my response (I can already hear the comments about the 'nonsense' bit).

My response seems to be too long to paste into this box, so I will post my response in two parts. The first part will explain more fully my approach to detecting ID and respond to some of your comments. The second part will focus only on the Schneider paper, which claims to solve the information problem.

First, I think I need to back up a bit and go through my method for detecting ID. This will hopefully clear up some of the questions that have been raised. It will also provide a good context within which to respond to Schneider's 'Evolution of biological information' paper, which has some serious problems and fails to explain the origin of functional information in organic life. But first, an explanation of my approach.

Although we have no basis for confidence regarding how other intelligent agents might think, there are two criteria that, if present, would force intelligent agents to utilize ID:

1. an intelligent agent desires a particular function and,

2. the unconstrained physical system is not likely to produce the desired function.

In order to achieve the desired function, the intelligent agent must impose certain constraints, using ID, upon the physical system. The configuration that results will be functional and, since constraints were imposed upon the physical system to achieve the function, the functional configuration will represent an anomaly within the system. Because of this, whenever we observe functional anomalies within a physical system, red flags ought to go up, whether we are archeologists, SETI scientists, biologists, or just a hiker out in the remote wilderness.

Fortunately, there is an objective method to quantify functional anomalies.

Let us refer to the state of a physical system that is not constrained as H. If H cannot reasonably be expected to produce the desired function, then constraints will have to be imposed upon the physical system. Let us refer to the state of a physical system that has been constrained to perform a particular function as Hf. The difference between H and Hf will be a measure of the constraints that have been imposed. Shannon has given us a method for quantifying the difference between two states, in terms of information. (Note to Francis: If the equation for H is given in base 10, then I think it ought to be referred to as 'Shannon entropy', as you have suggested. But if it is given in base 2, then it is in base 2 for the purpose of expressing Shannon entropy in units of information called 'bits'. Strictly speaking, you are probably right in insisting that H should always be referred to as entropy. Once we start taking the difference between the Shannon entropy of two states and expressing that difference in 'bits', then I don't see a problem in calling it 'information'.)

To quantify information using Shannon's approach, we simply take the difference between the two states, usually the 'before' and 'after' states. Shannon information, however, makes no distinction between meaningful or functional information, and meaningless or functionless information. Organic life, however, makes a very large distinction between functional and non-functional information. If the functional information contained in a gene becomes non-functional, it can have disastrous results. Failing to distinguish between Shannon's very broad definition of information (which does not distinguish at all between meaningless and meaningful, or functionless and functional) and functional information is an extremely common mistake within biological circles (Schneider included). In order to filter out meaningless information, one must be very careful with regard to 'before' and 'after' states. The 'before' state must be the state where no constraints are placed on the physical system, and the 'after' state must be the state where constraints are placed on the physical system to produce whatever function is under investigation. Usually in biology, the 'before' state is before some process 'x' and the 'after' state is after some process 'x'. This will yield the quantity of Shannon information sure enough, but will not necessarily distinguish between functional and non-functional Shannon information.

If the probability of each symbol or configuration is equal, then H = log2(N) for each state, and for the general calculation of Shannon information (I) = H1-H2 = -log2(N2/N1) in bits, where N1 is the number of different possible configurations before, and N2 is the number of different possible configurations after.

If we are concerned about functional information (If), then If = H-Hf = -log2(Nf/N) where Nf is the number of functional configurations in the physical system with constraints and N is the total number of possible configurations permitted by the unconstrained physical system. (Frances, re. R frequency: R frequency is the amount of information needed to find a given location or site within a genome. Functional information as I have defined it is the amount of information required to find a protein with a given function within the sequence space of proteins of that number of amino acids. If the number of amino acids is an essential variable, then sequence space expands accordingly.)
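The functional-information formula above can be written as a small helper. This is a minimal sketch rather than anything from the original post; the function name and the input checks are my own additions, and it assumes every configuration is equally likely, as the text does.

```python
import math


def functional_information_bits(n_functional, n_total):
    """Return If = -log2(Nf / N) in bits.

    n_functional: number of configurations that perform the function (Nf)
    n_total: number of configurations the unconstrained system permits (N)
    """
    if n_functional <= 0 or n_functional > n_total:
        raise ValueError("need 0 < Nf <= N")
    # log2(N) - log2(Nf) equals -log2(Nf / N) and also copes with
    # Python integers too large to divide into a float.
    return math.log2(n_total) - math.log2(n_functional)


# One functional configuration among 2**70 possibilities.
example_bits = functional_information_bits(1, 2**70)
```

When Nf equals N the result is zero bits, and each halving of Nf/N adds one bit.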

Now for an example, using Dembski's notion of contingency. A solution of NaCl has any number of possible configurations for the arrangement of the NaCl molecules. As the solution evaporates, the unconstrained physical system will produce a highly ordered crystal lattice. If we simply take 'before' to be the solution of NaCl and the 'after' to be after the solution evaporates, leaving behind a nice NaCl crystal lattice, then the computed Shannon information is large. However, if we are concerned about functional information, then the 'before' represents the outcome of the physical system without constraints. The unconstrained physical system permits only one possible outcome (ignoring dislocations, impurities, etc.), so N = 1. When N = 1, no further constraints to produce some other configuration are possible, so Nf = N = 1 and the functional information is zero. This is an example of 'contingency', the outcome is dictated by the laws of nature. No constraints have to be placed upon the physical system to produce a NaCl crystal, nature will do it all by herself. In other words, no functional information is required to produce the observed outcome. Tornados fall into this category. Given the right conditions, which occur routinely within the unconstrained physical system, a tornado will form, taking a highly ordered general configuration. They may represent a large change in the Shannon entropy of the local atmosphere (in the broad sense of Shannon information, which does not distinguish between functional and non-functional configurations), but they require zero functional information to produce, given their frequent appearance in the unconstrained physical system, and they carry zero bits of functional information.

Now consider an irregular-shaped stone, which is an example of Dembski's notion of complexity. The number of possible shapes for irregular-shaped stones is very high; the unconstrained physical system permits a huge number of outcomes (large N). Once the laws of nature have gone to work on a particular stone, the Shannon information to produce that exact shape would be very large, but what about the functional information? To calculate the functional information of a particular irregular-shaped stone, we have to look at it and ask the question, 'what function does this shape have that other irregular-shaped stones don't?' Chances are that it does not have any function that distinguishes it from any other irregular-shaped stone. The only function it seems to have is the low-level function of being an irregular-shaped stone. But a very large number of irregular-shaped stones have that low-level function, so Nf is very high, about equal to N. So no constraints have to be placed upon the physical system to produce an irregular-shaped stone and the functional information is zero, by equation (4). This is an example of complexity without specificity; the unconstrained physical system permits a wide variety of outcomes (unlike NaCl crystal lattices), but any outcome will do the job of producing an irregular-shaped stone.

Now we look at proteins, which are an example of Dembski's specified complexity. If we have an unconstrained physical system polymerizing amino acids, we should expect a huge number of possible outcomes (as Art argues when talking about a pool of oligomers. I think you are right Art, if we're talking about random assembly rather than the production of proteins via some process that involved replication or primitive RNA, which would tend to produce a lot of repeats.) Given the large number of possible outcomes, N is very large (N = 20^R, where 'R' = number of sites). Since organic life needs proteins that are functional, we must get some idea of how many different proteins will actually produce the function. So the number of functional proteins, for a specified function, is Nf. The functional information in this case is very large. This is a case of specified complexity; nature permits a wide variety of outcomes when assembling amino acids, but only a small specified subset will produce the particular function in question.
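The three cases (crystal, irregular stone, protein) can be put side by side numerically. The sketch below works with log2 values so that N = 20^R never has to be formed explicitly. The stone count is an arbitrary placeholder, R = 100 is an illustrative length, and the protein fraction of 10^-24 is the Taylor et al. figure quoted elsewhere in this thread; none of these choices is prescribed by the post.

```python
import math


def bits_from_log2(log2_n, log2_nf):
    """Functional information log2(N) - log2(Nf), given the two logarithms."""
    return log2_n - log2_nf


# Contingency: the crystal lattice, N = Nf = 1.
crystal_bits = bits_from_log2(math.log2(1), math.log2(1))

# Complexity without specificity: any irregular stone will do, so Nf = N.
# The count 10**6 is a placeholder; only the equality Nf = N matters.
stone_bits = bits_from_log2(math.log2(10**6), math.log2(10**6))

# Specified complexity: N = 20**R sequences of R residues.
R = 100                                   # illustrative length (assumption)
log2_n = R * math.log2(20)                # log2(20**R)
log2_nf = log2_n + math.log2(1e-24)       # Nf = N * 1e-24 (quoted fraction)
protein_bits = bits_from_log2(log2_n, log2_nf)
```

With these inputs the crystal and the stone both give zero bits, while the protein case gives roughly 80 bits.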

While it is true that Nf/N represents the probability of obtaining a functional protein, in a single recombination, out of a fully randomized library of proteins of the same length, it is also true that Nf/N represents the fraction of sequence space that will be functional. It literally represents the fraction of polypeptides of that length that will be functional. For this reason, a person might be tempted to blow off Nf/N because it represents a probability, but they cannot simply do that because it also represents an objective description of the frequency of functional proteins for that particular number of amino acids and that particular function. In other words, even for non-random selection systems, Nf/N represents how much work will need to be done to find the target. If Nf/N = 1, then no work at all will have to be done. The unconstrained physical system will do it for you. If Nf/N = some minuscule number, then a lot of work will have to be done by the non-random selection system in order to constrain the physical system sufficiently to select out all of sequence space except for that which represents Nf. In other words, Nf/N tells the person who endorses non-random selection exactly how much selection is going to have to be done. This can be expressed in units of information called 'bits' and calculated using equation (4). My approach is meant to be general, that is, for both random and non-random systems.

The hypothesis that I have advanced is:

HYPOTHESIS: all configurations requiring more than 70 bits of functional information will require ID.

The reason that some sort of limit must be established is that there is room for a certain amount of functional information to be produced by chance within the 'noise' of natural outcomes where the noise is the outcomes of random events in nature. A good example of this is the variety of radio signals we receive from deep space. Because of random events, there can be anomalies within the background noise which are probabilistically expected. To hold the attention of a SETI scientist, the anomaly must be sufficiently large to stand out against the background noise. So, for example, they might look for an anomalous signal that has a bandwidth of less than 300 Hz, which would have all the makings of being a functional radio signal from an ID source.

So, in general, I must set an upper limit for the magnitude of functional anomalies that we could reasonably expect to observe within the background noise of random natural processes. That limit can be expressed in the form Nf/N, which for the universe, would not likely be finer than 10^-120. It can also be expressed in terms of 'bits'. For biological processes, I have chosen to work with 'bits' because of Shannon's method to quantify functional anomalies, and I have set the upper limit at 70 bits. True, this figure is not carved in stone, but one is not free to pick any figure at all out of thin air. If it is too high, one begins to rule out too many human artifacts that have been obviously designed. If it is too low, it will begin to indicate false positives. So the upper limit for naturally produced anomalies must be a balance between keeping false negatives to a minimum, while at the same time not permitting any false positives. Dembski's filter does this, although his cut off is different from 70 bits. He has suggested an Nf/N of 10^-50, which works out to 166 bits of functional information.
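The thresholds in this paragraph can be checked by converting between a fraction Nf/N and bits. This is a minimal sketch; the helper names are my own, and the only inputs are the three figures mentioned above (70 bits, 10^-50 and 10^-120).

```python
import math


def bits_from_fraction(fraction):
    """Convert a fraction Nf/N into bits: -log2(Nf/N)."""
    return -math.log2(fraction)


def fraction_from_bits(bits):
    """Convert bits back into the corresponding fraction Nf/N."""
    return 2.0 ** (-bits)


dembski_bits = bits_from_fraction(1e-50)     # Dembski's suggested cutoff
universe_bits = bits_from_fraction(1e-120)   # the 10^-120 figure for the universe
cutoff_fraction = fraction_from_bits(70)     # the 70-bit cutoff as a fraction
```

The 10^-50 cutoff comes to about 166 bits, as stated; 10^-120 comes to just under 400 bits; and 70 bits corresponds to an Nf/N of a little under 10^-21.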

Although 70 bits is chosen by myself as the upper limit for naturally-produced anomalies within the background noise of random natural events, it is theoretically possible that non-random, natural systems could produce a functional anomaly that was greater than 70 bits. To avoid circular reasoning, we can't simply assume that organic life is a result of non-random natural events, and therefore it is a result of non-random natural events, for organic life is the subject up for discussion. I think it highly unlikely that there is a non-random, natural system that can produce huge functional anomalies for the following reason. When the laws of nature rigidly determine the outcome of some processes, we get an effect that can be described by a simple equation. The effect is simple and carries zero functional information. In order to carry large amounts of functional information, an effect must be complex. To achieve a complex effect, we must relax the constraints that the laws of nature impose on the outcome. What results is a complex but highly unspecified outcome which still carries zero functional information. So from one extreme to the other, in the continuum of natural effects, we go from highly ordered to highly complex, but zero functional information throughout. In order to achieve a complex but highly specified event via the laws of nature, nature would have to relax its constraints on the outcome (to produce a complex event) while at the same time vastly increasing its constraints on the complex outcome to highly specify it to perform some function. I can say, on the basis of an undergraduate degree in physics, that I have not seen anything remotely like this within the laws of nature. Even chaotic interactions start off as a highly complex interaction of simple physical laws, but eventually fall out into a highly ordered outcome, lacking the complexity necessary to carry any functional information, similar to a crystal lattice.

As Leon Brillouin has pointed out, the total amount of information required to produce an effect is additive. A functional protein (in this case) will be the sum of the information required to construct the non-random selection system, plus any additional information that must be provided to achieve the functional protein. There are ways for organic life to produce novel proteins that, given the systems built into organic life, require very little, if any, additional information (e.g., antigen receptor proteins).

What this means for non-random selection systems that can produce functional proteins with no additional information, if my hypothesis is correct, is that the non-random selection system will have to be produced by ID, if what it produces carries more than 70 bits of information.

This arises out of the Information Transfer Theorem which essentially states that information flows downhill for redundant systems. A special case of this is the Central Dogma in biology. Information doesn't just 'happen'. It always takes functional information to make functional information, and there will be a net decrease in information over time. The only way to solve this continual deterioration in information is to have it maintained or 'topped up' by intelligent agents, who can produce unlimited amounts of functional information.

Because of this, ID can make a prediction regarding genetic information; it will deteriorate with time. Thus, we should predict the discovery of pseudogenes that, with a little tweaking, may once again be functional. We can also predict the slow accumulation of 'noise' within the genome of organisms. We can also predict the existence of a regulatory system that, necessarily, must be based outside the protein coding genes. That is, part of the regulatory system will have to be based in the non-coding regions of the genome. This poses a challenge. On the one hand we predict the slow accumulation of 'noise', but on the other hand, we predict that not all non-coding DNA is 'noise', some of it has to be regulatory in function. We can also predict, on the basis of the information transfer theorem, that without ID, nature will never be able to produce an organism with a higher functional information content, from an organism with a significantly lower information content.

Warren, I don't disagree with you when you claim that biological systems can produce functional information. But the Information Transfer Theorem suggests that the functional information produced by any biological system will always be less than the total functional information contained already in that system. In other words, to go from a minimal life form with an information content of at least 42,000 bits to an organism with an information content in the seven figures, will require a non-random selection system that contains even more functional information than 7 figures. That can be provided by ID or it could be provided by an even more complex natural system (an extremely complex fitness landscape that has a topography that will essentially direct the production of all organic life and provide the necessary information for their genomes). Such a fitness landscape would contain all the information for all the organic life forms in advance of their emergence. The functional information contained in such a fitness landscape would be a very large number, not approached by any other physical process we've ever observed. Empirically, however, I don't think we can support the notion that the fitness landscape can do it. There doesn't seem to be any empirically verified driving force within nature that ever steadily pushes simple life forms (42,000 bits of functional information) into higher functional information states (Schneider's paper notwithstanding).

Art asserts that he believes that the upper limit of information content is going to converge, with time, on the 10^-10 to 10^-14 range (you must be referring to Nf/N Art, given your small numbers). I think there is a problem with that. First, it will require that the average functional information content per amino acid will decrease with larger proteins. Large proteins with a very complex topology would have a much lower information content per amino acid than a very simple, short protein. Of course, this would mean that genes that code for short proteins would contain much more information per codon than genes that code for large, complex proteins. In fact this flies in the face of even the more general Shannon information (with no regard for functionality) for complex and simple proteins. It would clearly require much more information to describe a very large protein with a very complex 3-D structure, than it would to describe a simple, short protein. A highly conserved protein such as ubiquitin clearly has a lower Nf/N (that is, a higher functional information content) than a simple protein that can bind ATP. Furthermore, Axe has pointed out that sequence constraints are likely to increase with more complex requirements. Sequence constraints directly affect the functional information per amino acid. Loose constraints result in lower functional information per residue, while tighter constraints result in a higher functional information content per amino acid. That is why we should expect the total functional information for shorter proteins with a simple function to be smaller than longer proteins with a more complex function. So when Keefe & Szostak find a Nf/N that is larger than the estimated (from actual sampling of a partially randomized pool) Nf/N for a larger, moderately active enzyme (Taylor et al.), it is merely verifying what has already been predicted. It would be simplistic to take the Nf/N for an 80 amino acid protein and use that value for the average 300 residue protein.
That is why I estimate the functional information content of a protein by multiplying an estimated functional information content/residue by the number of residues. For small proteins around 80 residues, I would use the value given by Keefe & Szostak. For larger proteins, around 100 residues, I would use the value suggested by Taylor et al. For even larger proteins with very tight sequence constraints, I am still using the Taylor et al. figure with the assumption that I am now being very conservative (recall that the Taylor figure of 10^-24 was for only a moderately active enzyme). I can understand a committed materialist wanting to use the Nf/N supplied by Keefe & Szostak as indicative of 300 amino acid proteins, but in light of the fact that binding to ATP is a pretty simple function, requiring a pretty simple protein, I don't think the committed materialist is being intellectually honest (Note that I'm not saying this of you, but as a general statement). If I were to use an unwarranted figure to support ID, the same would apply. But using a bits/residue for a moderately active enzyme, that is only 100 amino acids long, and applying it to longer, 300 amino acid enzymes that in some cases have very complex topologies and function, seems to be erring on the side of caution, rather than inflation. (By the way, Art, for various reasons, some of which you mention, I do not use Yockey or Sauer's papers, preferring to use work that is more current and more based on experimental data (Keefe & Szostak, and Taylor et al.).) It is true that Taylor et al. sample only the local area of sequence space to the point of non-functionality (as do Hagihara & Kim), but that gives us a reasonable idea of the size of the local functional sequence space. The big question is whether there might be other areas of sequence space that would produce the same fold. I very much doubt this, given that the complex folds are a result of both the sequence and the energy well the sequence has.
However, in the unlikely event that a totally different sequence (by that I mean that even the normally invariant residues are different) does produce the same fold, if Nf for that totally different sequence was about the same size as Nf for the AroQ sequence, it would result in an Nf/N of 2 x 10^-24 instead of 1 x 10^-24, which makes no difference to the functional information due to the logarithmic nature of the equation. Even if there were 10 such islands in sequence space that yielded the same fold, it would only drop the functional information content by 3 bits.
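
The arithmetic behind the "islands" remark can be checked with a short sketch. It assumes that "the equation" is the usual logarithmic measure, functional information = -log2(Nf/N); the function name and the island counts are illustrative only and are not taken from any of the papers discussed.

```python
import math

def functional_bits(nf_over_n: float) -> float:
    """Functional information in bits, assuming I = -log2(Nf/N)."""
    return -math.log2(nf_over_n)

# Nf/N for one, two and ten equally sized functional islands (illustrative).
one_island = functional_bits(1e-24)
two_islands = functional_bits(2e-24)
ten_islands = functional_bits(10e-24)

print(f"1 island:   {one_island:.1f} bits")
print(f"2 islands:  {two_islands:.1f} bits")
print(f"10 islands: {ten_islands:.1f} bits")
print(f"drop for 10 islands: {one_island - ten_islands:.2f} bits")
```

Under that assumption, doubling Nf costs exactly one bit and a tenfold increase costs about 3.3 bits, which is the "3 bits" mentioned above.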

For me the discussion isn't about ID vs evolution; that is a false dichotomy. I'm prepared to grant you any process you wish to explain the origin and diversity of life. What I am arguing is that, regardless of the process, ID will be required or it just isn’t going to happen.

I probably haven't covered every point raised by each person (it was 10 pages when I printed it off just prior to taking my car for service, which is where I sit as I type this out), but I must address the Schneider paper.

IP: Logged
Kirk Durston
Member
Member # 174

posted 11. September 2002 17:06
Schneider states that his model is "representative of the situation in which a functional species can survive without a particular genetic control system but which would do better to gain control ab initio." His system uses a population of 64 individuals and focuses on a section of their genome that is 256 bases long. He refers to this as the organism's genome, but in light of what he states above to be realistic, I regard it as only part of a genome. The genetic sequence has two parts. First, "a section is set aside by the program to encode the gene for a sequence recognizing 'protein'," represented by a weight matrix. Then 16 non-overlapping binding site locations "were placed at random in the remaining portion of the genome." To start off with, each of the 64 sequences is "chosen randomly, with equal probabilities, from an alphabet of four characters (a,c,g,t, Fig.1)." I should note that the random sequence also applies to the protein-coding gene. In each generation, the sequence-recognizing protein surveys the 16 binding sites and measures the degree of mistakenness of each site. The 32 most mistaken organisms are wiped out and replaced with duplicates of the 32 less mistaken organisms. The net effect is to bias the population towards binding sites that more accurately match what is required by the sequence-recognizing protein. In each generation, one random mutation occurs per organism. This mutation can occur in any of the binding sites as well as in the protein-coding portion of the genome.
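
The loop described above can be sketched as a toy program. This is not Schneider's code: the scoring function `toy_mistakes` is a hypothetical stand-in for his weight matrix, and the site width (6), the gene-region length (125), the grid placement of sites, the shared site locations, and the mutate-then-select order are all assumptions made for illustration.

```python
import random

BASES = "acgt"
POP_SIZE = 64      # population size given in the post
GENOME_LEN = 256   # genome length given in the post
N_SITES = 16       # number of binding sites given in the post
SITE_WIDTH = 6     # assumption: 6 bases x 2 bits = the 12-bit maximum per site
GENE_LEN = 125     # assumption: length of the recognizer-gene region

def random_genome(rng):
    """Random sequence with equal probabilities over a, c, g, t."""
    return [rng.choice(BASES) for _ in range(GENOME_LEN)]

def place_sites(rng):
    """Pick 16 non-overlapping site starts after the gene region.
    Simplification: sites are drawn from a grid of 6-base slots."""
    n_slots = (GENOME_LEN - GENE_LEN) // SITE_WIDTH
    slots = rng.sample(range(n_slots), N_SITES)
    return sorted(GENE_LEN + s * SITE_WIDTH for s in slots)

def toy_mistakes(genome, sites):
    """Hypothetical stand-in for the weight-matrix score: mismatches between
    each site and a consensus read from the first 6 bases of the gene."""
    consensus = genome[:SITE_WIDTH]
    return sum(genome[start + i] != consensus[i]
               for start in sites for i in range(SITE_WIDTH))

def mutate(genome, rng):
    """Copy with one random point mutation (it may redraw the same base)."""
    child = list(genome)
    child[rng.randrange(GENOME_LEN)] = rng.choice(BASES)
    return child

def generation(population, sites, rng):
    """Mutate every organism once, then replace the 32 most mistaken
    organisms with duplicates of the 32 least mistaken."""
    mutated = [mutate(g, rng) for g in population]
    ranked = sorted(mutated, key=lambda g: toy_mistakes(g, sites))
    survivors = ranked[:POP_SIZE // 2]
    return survivors + [list(g) for g in survivors]

def mean_mistakes(population, sites):
    return sum(toy_mistakes(g, sites) for g in population) / len(population)

rng = random.Random(0)
sites = place_sites(rng)  # assumption: same site locations for all organisms
population = [random_genome(rng) for _ in range(POP_SIZE)]
initial_mean = mean_mistakes(population, sites)
for _ in range(200):
    population = generation(population, sites, rng)
final_mean = mean_mistakes(population, sites)
print("mean mistakes before:", initial_mean, "after:", final_mean)
```

Even in this toy form, the truncation step (keep the best half, duplicate it) is what drives the mean mistake count down; the comments that follow are about how realistic that step and its starting conditions are.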

Here are some comments:

1. The gene that encodes the sequence-recognizing protein starts off fully functional (it can immediately recognize the degree of mistakenness of each of the 16 binding sites), in spite of the fact that it can start off with a completely random sequence. Simply put, Nf/N = 1 and the functional information content of that gene (and translated protein) is zero. A fully functional gene or protein that has an Nf/N = 1 and carries zero functional information is unknown in biology. So right off the bat, Schneider's program fails to be a realistic model of the real world. To be realistic, we need to make an adjustment to the program. The gene has 125 bases, but 5 of those are for a 'tolerance value', so the protein it encodes is 39 amino acids long (keeping in mind the 'stop' codon does not translate into an amino acid). Since this is a short protein, but seems to have a function that is more complex than simply binding to ATP, let us use Keefe & Szostak's figure for bits/residue, yielding .45 x 39 = 17.6 bits, or an Nf/N of 5 x 10^-6. What Schneider should have done is specify a set of functional sequences for his protein that had an Nf/N of about 5 x 10^-6. The program would have two parts: the first part would randomly mutate the 64 organisms, 1 base per generation, until one of the organisms hit upon the right combination. Only then would the selection phase begin. During phase one, there would not yet be any advantage for the non-functional protein, so 100 percent of the population would survive to mutate and replicate. I should point out that because of the random mutation criterion that Schneider uses in his program, the generation of the functional protein would be a random walk, with a probability of success of 64 x R!/R^R x Nf/N (with R = 39, the number of residues), which yields 5.8 x 10^-20. So chances are, it would take an average of about 10^20 generations before we could reasonably expect the evolving pseudogene to land in functional sequence space for that population.
Just as an aside, if the organisms replicated once per minute (I'm being generous when we consider what is realistic), then we could expect it to take roughly 3 x 10^13 years before we could reasonably expect the gene to become functional. Oops, I forgot that the gene cannot be expected to mutate every generation. There is only a .5 chance that the single random mutation will occur within the protein encoding region, which makes things even worse. Of course, Schneider, to be realistic, could have had the population expand up to some reasonable level of equilibrium. This would make it harder, of course, for the trait to spread in the population, but the dramatic increase in fitness that Schneider postulates will guarantee that the trait becomes dominant.
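
The figures in comment 1 can be reproduced with a few lines. The sketch simply follows the post's own formula, 64 x R!/R^R x Nf/N, and assumes R is the 39 residues of the protein and that the expected waiting time is 1/p generations; it does not vouch for the formula itself. Small differences from the quoted numbers come from rounding Nf/N to 5 x 10^-6.

```python
import math

RESIDUES = 39            # protein length from the post (assumed to be R)
BITS_PER_RESIDUE = 0.45  # per-residue value the post takes from Keefe & Szostak
POPULATION = 64

bits = BITS_PER_RESIDUE * RESIDUES       # about 17.6 bits
nf_over_n = 2.0 ** -bits                 # about 5 x 10^-6
walk_factor = math.factorial(RESIDUES) / RESIDUES ** RESIDUES  # R!/R^R
p_success = POPULATION * walk_factor * nf_over_n

generations = 1.0 / p_success            # assumed expected waiting time
minutes_per_year = 60 * 24 * 365.25      # one generation per minute
years = generations / minutes_per_year

print(f"bits = {bits:.2f}, Nf/N = {nf_over_n:.2e}")
print(f"p(success per generation) = {p_success:.2e}")
print(f"generations = {generations:.2e}, years = {years:.2e}")
```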

2. In spite of the fact that each of the 16 randomly placed, but non-overlapping, binding sites starts off as a random sequence, they seem to have an immediate function in the phenotype, in that the less badly mistaken binding sites confer such a huge fitness advantage that they replace the 32 more badly mistaken ones in a single generation. In other words, no matter how badly mistaken the binding sites are, they can still be read and be bound to in varying degrees such as to confer a regulatory advantage, thereby dramatically increasing the fitness of the organism. Simply recognizing how mistaken binding sites are is not going to confer a selective advantage to the organism. The purpose, as Schneider indicates, is regulatory in nature, so right from the start, there must be some degree of binding, even if it is very low. In other words, the binding sites all start off with a certain degree of functionality. Not 100% of course, but not 0% either. This complicates the calculation of the amount of functional information that each binding site contains. It would begin with zero bits, even though it has a certain degree of functionality. Again, this is very odd. It is unusual to have any degree of functionality if the sequence actually contains zero bits of information. However, for Schneider's program to work, he must confer a certain degree of functionality upon the binding sites even though they contain zero information. Again, I see this as unrealistic. He should have had some minimum sequence constraints that a binding site would have to achieve (via random mutation) before it could begin to confer some selective advantage to the phenotype. The maximum amount of functional information that could be achieved at 100% correct by any one binding site is 12 bits, and that would only be the case if there was only one specific functional sequence for the sequence-recognizing protein.

3. I have a problem with Schneider's fitness landscape. I can see the sudden achievement of a novel regulatory function as giving a significant fitness advantage to an organism, but this is not sudden. It is an incremental process, but each increment confers such a huge selective advantage upon the 32 least mistaken organisms that they immediately replace the other 32 in just one generation, and they do this for each increment! In Schneider's model, we do not just see this as the organism gets close to the proper sequence for the binding sites; we see it no matter how badly messed up the binding sites are. As long as half the population is ever so slightly more messed up, there is a huge selective advantage. This speaks of a fitness landscape that has a linear slope (the selective advantage is the same each time); the slope extends linearly all the way from 'completely mistaken' to 'perfect' (unlike the relatively flat fitness landscape that surrounds functional sequence space in reality), and the slope is very steep, conferring an astonishing fitness advantage to the organism with every incremental move. To be realistic, Schneider should have constructed a fitness landscape that was flat up to a certain sequence distance from the correct sequence; then, as the evolutionary trajectory entered a more realistic semi-functional sequence space, the landscape would slope increasingly steeply toward perfectly functional sequence space.

4. I note that the locations of the binding sites are non-overlapping, even at the start of evolution. To be realistic, this should not have been specified by the creator, Schneider. Let the locations evolve. Of course, this would complicate things in that evolving binding sites might interfere with one another, with the result that you would probably wind up with fewer than 16 sites.

5. Nowhere does Schneider distinguish between information in the very broad sense in which Shannon uses it, and functional information. His probability calculation at the end is badly mistaken. The probability of obtaining 16 sites containing some degree of information is exactly 1, given that they are randomly programmed in from the start in some non-overlapping series of locations. The probability that each site will achieve 4 bits of functional information is .06. So the probability that you will obtain 16 sites each of which contains 4 bits of functional information is 1 x .06 = .06 (not 5 x 10^-20, as Schneider believes, forgetting that it was he that put the 16 sites there in the first place, by ID). Generating the same information 16 times does not equal 16 times the functional information. A 4-bit anomaly is well within the natural 'noise' that natural processes can generate, even if Schneider hadn't used the highly unrealistic parameters pointed out above. As well, keep in mind that every 1 bit increase in functional information requires an exponential decrease in the frequency Nf/N. In other words, the generation of functional information becomes exponentially harder as it is increased. It is nothing to generate 1 bit of functional information. It is 512 times harder to generate 10 bits of information and 6 x 10^29 times harder to generate 100 bits of information. Schneider should not assume that it is therefore possible to evolve functional information at the linear rate of 1 bit per 11 generations.
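
The "exponentially harder" figures at the end of comment 5 follow from halving Nf/N for every added bit. A minimal check, assuming difficulty is measured relative to a 1-bit target (the function name is illustrative):

```python
def relative_difficulty(bits: int, baseline_bits: int = 1) -> int:
    """How many times rarer a target of `bits` bits is than a baseline target,
    assuming each extra bit halves the frequency Nf/N."""
    return 2 ** (bits - baseline_bits)

ten_bits = relative_difficulty(10)
hundred_bits = relative_difficulty(100)

print(ten_bits)                 # 2**9
print(f"{hundred_bits:.1e}")    # 2**99 in scientific notation
```

This reproduces the 512 and roughly 6 x 10^29 quoted above; it says nothing about how many generations a selective process would actually need.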

6. Finally, the function of the protein and binding sites is to regulate something, say the expression of some gene. If its function is to turn on a gene by binding to a sequence that normally encodes for a protein that turns it on, then the regulated gene must quietly sit there, having no function in the organism, until it can be turned on. Normally, we could expect such a non-functional gene to mutate into non-functional sequence space. Fortunately, Schneider builds functionality of the evolving regulatory system into the model right from the start. If the binding is meant to turn off a normally functioning gene, one wonders why the gene evolved in the first place, if turning it off confers such a huge fitness advantage. Of course, maybe sometimes it is good to have it turned off and other times to have it turned on, but if that is the case, then something is going to have to regulate the expression of Schneider's protein, and whatever it is, it is going to have to fortuitously evolve into existence at the right time as well. The bottom line is that normally, it is not good enough just to evolve a novel protein if there is no function that has evolved simultaneously with it. Schneider just grants the existence of the function, just ready and waiting for the right protein to come along and activate it.

In summary, Schneider has used a fair amount of ID to generate only 4 bits of functional information. His scenario has serious flaws, if it is to represent the real world. The serious flaws aside, however, he has still utterly failed to show how large amounts of functional information can be generated. Furthermore, the scaffolding for the Roman Arch that he speaks about was put there by the creator, Schneider in this case, in the form of unrealistic parameters that guaranteed the outcome. However, in my view, one does not need a scaffold to produce 4 bits of information. As for Behe's bacterial flagellum, the functional information to produce the proteins and the regulatory system to bring together the right proteins into the right configuration would need an impressive scaffold. One which is so complex that I doubt that even Schneider himself has the intellectual capabilities to program, by ID, into his model. In general, every computer model of evolution I've seen to try to verify a materialistic model, suffers from either a Nf/N that is many orders of magnitude too loose and/or a complete failure to distinguish between the broad form of Shannon information, for which function is irrelevant, and the special case of functional information. It is the functional information in biology that needs explaining.

With the start of the new semester, I am finding it increasingly difficult to set aside time to give substantive responses to this discussion. With that in mind, I'll try to continue until Friday, which will be my last day to contribute.

IP: Logged
Kirk Durston
Member
Member # 174

posted 11. September 2002 18:52
Correction: shortly after I sent off my second post I realized there was a mistake in (5). The probability of finding 16 sites is 1, since Schneider has programmed the 16 non-overlapping sites into each organism. But the probability that each site has about 4 bits of information would be .06^16, which yields Schneider's figure (allowing for rounding off the .06 probability). Generating the same information 16 times does not increase the functional information unless there is an additional function served by the number of times the information is generated, which in this case, there is. He's right in showing that an event with a probability of 5 x 10^-20 occurred within about 10^3 generations given the parameters he built into the program. What he has not done, however, is show how such an event could occur under realistic natural conditions as I have pointed out in (1) to (4). The inclusion of a fully functional gene encoding protein with an information content of zero was an especially large cheat, although (2) through (4) are pretty bad as well. Given those cheats, the probability that he would get an average of 4 bits per site is close to 1. So the question is, is it reasonable to posit such a non-random selection system in nature? It all depends upon whether the (1) through (4) and (6) are reasonable to posit and all occur for the same organism. I don't think so.
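
The corrected probability can be verified directly. The sketch assumes, as the post does, that a site carrying 4 bits corresponds to a per-site probability of 2^-4 (rounded to .06 above) and that the 16 sites are independent.

```python
BITS_PER_SITE = 4
N_SITES = 16

p_site = 2.0 ** -BITS_PER_SITE       # 0.0625, quoted as .06
p_all_sites = p_site ** N_SITES      # same as 2**-64
total_bits = BITS_PER_SITE * N_SITES

print(f"per-site probability: {p_site}")
print(f"all 16 sites: {p_all_sites:.2e} ({total_bits} bits)")
```

With the unrounded per-site value this comes to about 5.4 x 10^-20, in line with Schneider's 5 x 10^-20.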
IP: Logged
Art
Member
Member # 179

posted 11. September 2002 23:37
Hi Kirk,

I can commiserate with your struggles with long posts, so I'll try and make a few short points for you to think about.

1. My own opinion that the "true" informational content of functional polypeptides is going to converge on the values seen in random combinatorial studies is founded on some pretty intriguing experimental results and one more general observation. The experimental results are those that show that the frequency of occurrence of functional polymers (polypeptide as well as RNA) is fairly independent of the size of the polymer: within a few orders of magnitude, an 80-mer is about as abundant as a 14-mer, a 90-mer as abundant as a 40-mer. It's pretty hard to explain these observations, unless one realizes that the true functional unit is probably very small.

This explains the general observation - that we can identify probable functions in otherwise unknown proteins by searching for very short motifs that are highly conserved in proteins of known functions. If large swaths of polypeptide were essential (this is the "high-information" scenario), the protein motif databases would look very, very different than they do.

This model also introduces a new way to think of proteins - not in an amino acid-by-amino acid manner, but in terms of small functional units. Information content becomes rather a different matter.

2. I think the rules of chemistry and physics are as much a constraint on things biological as they are on things like salt crystals and tornadoes. This means that the matter of contingency, as defined by Dembski, is as relevant to living things as it is to other examples. This dramatically affects the math that we see in these discussions, IMO.

3. I would suggest that the matter of unrelated functional sequences is much greater than one of finding one or two unrelated ones. Again, random combinatorial studies suggest that the scope of the "problem" is one of many, many orders of magnitude. This is why I am comfortable with the notion that the numbers obtained by Taylor et al. will in all likelihood converge on those obtained with other combinatorial studies.

IP: Logged
Kirk Durston
Member
Member # 174

posted 12. September 2002 09:29
Art, are you saying that Nf/N is about the same for proteins of different length, or that functional proteins of different length occur in roughly equal frequencies within organic life? If experimental results are indeed showing that Nf/N is fairly independent of the size of the polymer, within a few orders of magnitude, then it would follow that the functional information content of proteins would be relatively independent of the number of residues. This seems unlikely to me, given some highly conserved sequences, but I would certainly defer to hard data. Can you direct me to one or two papers that show these results? For my own work, I would be very interested in going over them.

Re. the rules of physics and chemistry constraining the outcome. Strictly speaking, I believe that true random events do not happen. In other words, nature is completely deterministic, even at the quantum level. So in a sense, if organic life is a result of completely materialistic forces, then it is determined by the laws of nature. However, we know that the laws of nature don't always dictate just one possible outcome. Rather, they dictate the environment within which the outcome takes place, but that environment permits a wide variety of outcomes, each of which is completely determined by a complex interaction of numerous events. That is why we get such a wide variety of irregularly shaped rocks. Similarly, I do not see anything in nature that dictates the sequence of proteins or genes. Rather, natural processes would more likely provide an environment for a wide variety of sequences. I don't see how this will affect the math that I have proposed. Equation (4) can be applied to outcomes that are highly restricted by nature (e.g., a crystal lattice) and outcomes of which nature permits a huge variety (shapes of stones). When we see a functional anomaly, it could be a coincidence that the normally huge variety of outcomes has been constrained to produce just the right outcome to perform a positive function within a larger system. However, if the frequency of functional outcomes becomes so high that it becomes more reasonable to suspect that someone is tampering with the system, then it is time to take a look at ID. The question is, do we have a scientific method to identify 'tampering with the system'? If we do not, then science is not in a position to assert, on scientific grounds, that no one is tampering with the system.
My approach is an attempt to put forward a method for identifying such tampering, based on the falsifiable hypothesis that all functional anomalies that require more than 70 bits of information will be produced by ID (will have involved some tampering with the system).

IP: Logged
peroxisome
unregistered


posted 12. September 2002 13:22
Hi Kirk
some easy questions for you, as I appreciate you are busy.
quote:
according to Taylor, the probability of obtaining a functional AroQ protein is still 1 chance in 10^24
Taylor et al at no point use the figure 1 in 10(24) or 1 in 5x10(23). Will you accept that they do not use this phrase in their paper ? YES/NO

quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes

...they simply say it will be impossible...

Both statements are literally untrue. Do you still argue that these quotes fairly represent the Taylor paper ? YES/NO

quote:
I can understand a committed materialist wanting to use the Nf/N supplied by Keefe & Szostak as indicative of 300 amino acid proteins, but in light of the fact that binding to ATP is a pretty simple function, requiring a pretty simple protein, I don't think the committed materialist is been intellectually honest.
In fact, there is no reliable data here that enables us to answer this question, and the plain fact is that no-one knows what Nf/N is for anything above Keefe & Szostak's figures.

You have a guess as to what Nf/N may be in big proteins; can I disagree with your guess without being intellectually dishonest ? YES/NO

quote:
If I were to use an unwarranted figure to support ID, the same would apply.
'nuff said.

per

IP: Logged


All times are East Coast
All content © ISCID and content contributor 2001-2003

The ISCID Forums are aimed at generating insight into the nature of complex systems (e.g. biological complexity, organizational complexity, etc.) and the ontological status of purpose, especially from the vantage point of various information- and design-theoretic models.
