ISCID Forums


General » Brainstorms » What Sort of Property is Specified Complexity? (page 2 of 5)
warren_bergerson
Member
Member # 262

posted 03. September 2002 08:52
Kirk,

I think in this discussion it might be useful to distinguish between ‘point in time design complexity’ and ‘design process complexity’. If, at a point in time, a phenomenon can take N possible forms, and only Nf of these forms meet some requirement for functionality or adaptiveness, then we can say the ‘point in time complexity’ or ‘design complexity’ is N/Nf.

There is, to my knowledge, no disagreement as to either the existence of ‘point in time design complexity’ or the general ability to quantify point in time design complexity. As far as I know, no scientist would seriously question that biological systems exhibit very high measurable levels of ‘point in time adaptive design complexity’ or ‘point in time functional design complexity’. To address a point raised on another thread, there is no doubt that biological systems contain examples of ‘non-optimal point in time adaptive designs’.

Point in time adaptive or functional design is a clearly observable, analyzable phenomenon. Once ‘possible’ and ‘adaptive’ or ‘functional’ have been defined, there is no particular ‘theoretical’ difficulty in quantifying the complexity of a point in time adaptive or functional design. I don’t want to put words into the mouths of either Dembski or his opponents, but I find it difficult to believe that the ‘design debate’ is a debate over ‘point in time adaptive or functional design’. I am also reluctant to accept that Dembski meant specified complexity as a measure of the complexity of ‘point in time functional design’.

To my knowledge, there is one and only one logical process or paradigm capable of producing or creating ‘complex point in time functional form or design’ from a ‘non-functional or non-adaptive form’. This is the Greek teleological causation paradigm, involving 1) the introduction of diverse forms from the set of possible designs and 2) the elimination of non-adaptive or non-functioning forms. This is the general paradigm used in Darwinian and neo-Darwinian theory, and it is the basic process associated with all forms of ID. (Adaptive designs produced by an all-knowing designer are described or explained by this same paradigm. Such a designer, it is assumed, would require only one trial to select an adaptive form from a set of non-adaptive possibilities.)

In terms of the Greek paradigm, there are many different possibilities for getting from a non-adaptive or non-functional form to an adaptive or functional form. Design process complexity is, or can be defined as, the average number of trials needed to move from a non-adaptive form to an adaptive form using a specified design process. For phenomena with a point in time complexity of N/Nf, the design process complexity can vary from 1 trial for a system involving ‘pre-knowledge’, to 0.5N/Nf for a random search process, to ‘infinite design process complexity’ for search processes that are incapable of finding an adaptive form.
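To make these trial counts concrete, here is a minimal Python sketch. It assumes blind, uniform sampling from the N possible forms, either without replacement (a form is never retried) or with replacement (a memoryless search), and uses the standard expectations for those two cases: (N + 1)/(Nf + 1) trials without replacement, which is about 0.5N when Nf = 1, and N/Nf trials with replacement. The function names are illustrative only.

```python
from fractions import Fraction


def expected_trials_without_replacement(n: int, n_f: int) -> Fraction:
    """Expected draws until the first adaptive form when no form is retried.

    With n forms, n_f of them adaptive, examined in uniformly random order,
    the first adaptive form appears on average at position (n + 1) / (n_f + 1).
    """
    return Fraction(n + 1, n_f + 1)


def expected_trials_with_replacement(n: int, n_f: int) -> Fraction:
    """Expected draws for a memoryless search: geometric with success rate n_f / n."""
    return Fraction(n, n_f)


# A toy space of one million forms containing a single adaptive form.
print(expected_trials_without_replacement(1_000_000, 1))  # 1000001/2
print(expected_trials_with_replacement(1_000_000, 1))     # 1000000
```

Exact fractions are used so that the roughly twofold gap between the two search models is not blurred by rounding.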

It was my impression that Dembski and Behe have been attempting to demonstrate that "there exist in biological systems phenomena whose design process complexity, calculated based on Darwinian ‘theory’, is so great as to be logically and mathematically impossible". I have therefore assumed that specified complexity and irreducible complexity were attempts to measure or estimate ‘the design process complexity of a random search process’. Such measures, IMO, demonstrate the logical/mathematical impossibility of the neo-Darwinian random search (random mutation) hypothesis or theory.

Specified complexity, as a measure or benchmark for random adaptive design processes, is, IMO, also useful in evaluating the design process complexity of the various ‘directed design’ alternatives to Darwinian theory.

Quote: The question we wrestle with is what process can produce the specified complexity of a given protein.

Again, at least IMO, you are not formulating the question correctly. We know, in a broad sense, both the ‘process’ and the ‘point in time complexity’. The substantive issues to be addressed are 1) identifying the specific design process involved and 2) measuring ‘design process complexity’. Specified complexity, if it is to have any scientific or analytical value, IMO, needs to be defined or interpreted as ‘a measure of the random search design process complexity’.

Frances
Member
Member # 169

posted 03. September 2002 12:01
A quick comment. Warren talks about the neo-Darwinian random search, but it is important to realize that neo-Darwinism is NOT a random search.
The mutations may be random with respect to fitness, but that does not make the search random. To see the difference, one has but to look at genetic algorithms.
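The distinction can be illustrated with a toy Python sketch. It is a deliberate oversimplification: it assumes a fixed 40-bit target and a fitness equal to the number of matching bits (a textbook "OneMax"-style setup, not a model of real biology), and all names and parameters are hypothetical. Both searches use equally random variation; only the second adds selection by keeping a mutant whose fitness does not drop.

```python
import random


def fitness(candidate, target):
    """Number of positions where the candidate matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def random_search(target, budget, rng):
    """Draw fresh random strings; return the trial that hits the target, or None."""
    for trial in range(1, budget + 1):
        guess = [rng.randint(0, 1) for _ in target]
        if guess == target:
            return trial
    return None


def mutation_with_selection(target, budget, rng):
    """Randomly mutate a single parent; keep the mutant only if fitness does not drop."""
    rate = 1.0 / len(target)  # on average one flipped bit per offspring
    parent = [rng.randint(0, 1) for _ in target]
    best = fitness(parent, target)
    for trial in range(1, budget + 1):
        child = [1 - b if rng.random() < rate else b for b in parent]
        score = fitness(child, target)
        if score >= best:  # selection: the non-random step
            parent, best = child, score
        if best == len(target):
            return trial
    return None


rng = random.Random(2002)
target = [rng.randint(0, 1) for _ in range(40)]
print("random search:", random_search(target, 10_000, rng))
print("mutation + selection:", mutation_with_selection(target, 10_000, rng))
```

With 2^40 possible strings, blind search is very unlikely to hit the target in 10,000 trials, while the selective search typically finishes within a few thousand trials even though its mutations are just as random.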

Kirk Durston
Member
Member # 174

posted 03. September 2002 14:09
The amount I have to respond to seems to be increasing exponentially. I will only be able to address a fraction of the comments raised. My apologies if I miss something that was of particular importance to someone.

Re. Frances' question on specificity: I gather from Dembski's paper on this topic that specified complexity represents a particular probability. I cannot speak for Bill, but I take specified complexity as equivalent to 'specificity', which I will now define more rigorously than in my first post.

In 1948 Claude Shannon published his landmark paper, ‘A Mathematical Theory of Communication’ (Shannon, 1948). Shannon’s expression for information in bits per symbol, H, which has also become known as Shannon entropy, is

$$H = -K \sum_i p_i \log_2 p_i \qquad (1)$$

where K is an arbitrary positive constant and p_i is the probability of the i-th symbol or configuration. For the purpose of this discussion I shall set K = 1 and assume that the probability of each of the N possible symbols or configurations is about the same (all p_i = 1/N). This may not always be the case, but for our purposes the variation will not be significant. Given this assumption, H reduces to

$$H = \log_2 N \qquad (2)$$
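Eqs. (1) and (2) can be checked numerically with a short Python sketch, setting K = 1 as in the text. The helper name is illustrative; the point is that a uniform distribution over N symbols gives log2 N bits, while any non-uniform distribution over the same symbols gives less.

```python
import math


def shannon_entropy(probs):
    """Eq. (1) with K = 1: H = -sum_i p_i log2 p_i, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


n = 20  # e.g. the 20 standard amino acids
uniform = [1 / n] * n
skewed = [0.5] + [0.5 / (n - 1)] * (n - 1)

print(shannon_entropy(uniform), math.log2(n))  # both about 4.32 bits, as Eq. (2) says
print(shannon_entropy(skewed))                 # smaller than log2(20)
```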

Shannon's equation is a useful tool to objectively quantify the amount of information in a symbol or configuration. However, information as defined by Shannon's equation is used in the broadest sense of the word and does not concern itself with whether or not the information is meaningful or functional. Because of the first condition stated earlier, we are interested in the amount of functional information. In its broadest sense, function can be defined as the ability to produce an effect that is useful within a larger system. For example, the apoptosis inducing factor (AIF) gene plays an essential part in the first step of programmed cell death necessary for embryonic morphogenesis and cavitation (Joza et al., 2001). This is its function within the larger system of the cell. In order to be functional, the AIF gene must contain not just information in the broadest sense, but functional information.
To quantify the amount of functional information I_F required for a configuration, the difference must be taken between the Shannon information H of the unconstrained physical system and the Shannon information H_F of a physical system that is constrained to produce the function under investigation. This can be represented by

$$I_F = H - H_F \qquad (3)$$

which, using Eq. (2), yields

$$I_F = \log_2 N - \log_2 N_F = -\log_2 \left( \frac{N_F}{N} \right) \qquad (4)$$

where N_F represents the number of functional configurations or symbols and the ratio N_F/N is the specificity, which can be defined as the probability of obtaining a functional configuration in a single recombination. It is this probability that I am assuming to be equivalent to Dembski's use of 'specified complexity' (although I could be wrong and stand to be corrected).

For proteins, N = 20^R (assuming only 20 amino acids), where R = # of sites. But, as Peroxisome points out, how can we calculate Nf? Nf can be approximated either experimentally or computationally. It can only be approximated, since the size of sequence space is such that even if the entire universe were a quantum computer, it could only have performed 10^120 operations since time zero. Some work has been done on simple proteins. The most recent study that I am aware of is Taylor, S.V., Walter, K.U., Kast, P., Hilvert, D. (2001). Searching sequence space for protein catalysts. PNAS 98, no. 19, 10596-10601. There they found that to obtain a moderately active enzyme of 95 amino acids, one would require a library of about 5 x 10^23 members. The inverse of this is 2 x 10^-24, which is the probability of obtaining a moderately active enzyme of 95 amino acids in a single recombination which, as has already been shown, is the specificity Nf/N of this particular enzyme. Given that N = 20^95, we can solve for Nf to find that Nf = 8 x 10^99. What that means is that there are about 8 x 10^99 different sequences of 95 residues that will have the function of that particular enzyme. So at first glance, it might appear that the "sequence area which can produce a functional protein is not trivial," to use the words of Peroxisome. However, 8 x 10^99 is minuscule in comparison to 20^95 (4 x 10^123). One must not be fooled by the size of functional sequence space for a given protein. That is why specified complexity (or 'specificity' as I'm defining it) is the key datum.

If we insert the specificity of the Taylor et al. enzyme into Eq. (4), we can calculate the amount of functional information in this enzyme, which works out to be 79 bits. I highly recommend reading Taylor's paper to get a feel for how difficult it is to generate an active enzyme. In fact, they specifically conclude that ID (human, of course) will likely be necessary to generate novel enzymes in the lab.
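The arithmetic in the last two paragraphs can be reproduced with a small Python sketch of Eq. (4). It takes the library size of about 5 x 10^23 quoted above as given and, like the text, treats its inverse as the specificity N_F/N; the variable and function names are illustrative.

```python
import math


def functional_information(specificity):
    """Eq. (4): I_F = -log2(N_F / N), in bits."""
    return -math.log2(specificity)


residues = 95
n = 20 ** residues              # sequence space, assuming only 20 amino acids (about 4e123)
library_size = 5e23             # library size quoted from Taylor et al. (2001) above
specificity = 1 / library_size  # N_F / N, about 2e-24
n_f = specificity * n           # functional sequences, about 8e99
bits = functional_information(specificity)

print(f"N   = {n:.1e}")
print(f"N_F = {n_f:.1e}")
print(f"I_F = {bits:.1f} bits ({bits / residues:.2f} bits per residue)")
```

The result is about 79 bits, or roughly 0.83 bits per residue, which is the per-residue figure used in the next paragraph.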

Now, how do we approximate the Nf/N of proteins in general? I suggest we start with what data we have. Using Taylor et al., their moderately active enzyme contains about 0.83 bits per residue. This is well below the maximum information content of an amino acid, which is 4.3 bits (log2 20). Furthermore, sequence constraints are likely to increase as structural demands are increased (Axe, D.D. (2000). Extreme functional sensitivity to conservative amino acid changes on enzyme exteriors. J. Mol. Biol., 301, 585-595. doi: 10.1006/jmbi.2000.3997). With this in mind, 0.83 bits per residue can be used as a very conservative estimate when calculating the information content and specificity of larger proteins. Using this approach, the average 300-residue protein would contain 249 bits of information and have a specificity Nf/N of about 10^-75. I want to remind the reader that the empirical data we have suggest that this is very conservative for larger proteins.
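The extrapolation to a 300-residue protein is simple scaling. The Python sketch below follows the text's stated assumption that functional information is roughly additive at about 0.83 bits per residue; the function name is hypothetical. It works in log10 so that much longer chains do not underflow a floating-point number.

```python
import math


def specificity_log10(bits_per_residue, residues):
    """log10 of N_F/N, assuming a constant information density per residue."""
    total_bits = bits_per_residue * residues
    return -total_bits * math.log10(2)


bits = 0.83 * 300                        # 249 bits
exponent = specificity_log10(0.83, 300)
print(f"{bits:.0f} bits, N_F/N ~ 10^{exponent:.1f}")  # about 10^-75
```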

Art states, "I'd also add that the repeated references to work such as that by Blanco et al. are a bit confusing, since no one (that I am aware of, at least) has ever proposed that all extant proteins arose by progressive, step-by-step means from a single hypothetical ancestor." That would make things nice and easy. If functional proteins have multiple origins, the probabilities become orders of magnitude smaller. However, let us set that aside. How often have we read accounts that start with the duplication of a gene? One gene remains unchanged to maintain the home fires, so to speak, while the duplicate is free to launch off on some evolutionary pathway. Eventually, so the story goes, a completely novel protein appears (by 'novel' I mean a completely novel topology) that, by coincidence, performs a function that the cell just happens to be ready and waiting for.

Here is where the work of Blanco et al. comes into play. If the evolutionary trajectory stays within the folding sequence space of the original protein, it is quite possible, indeed an empirical fact, that a novel function can sometimes be obtained. But once the evolutionary trajectory departs the folding sequence space for that particular complex 3-D topology, the resulting protein either cannot fold fast enough before it is degraded by the Golgi body, or it produces a fold which is unstable. In either case it is no longer functional and therefore not expressed in the phenotype, and hence natural selection can no longer guide the trajectory. This is a serious problem because the probability of obtaining a functional protein via a random walk is $P = \frac{R!}{R^R} \cdot \frac{N_F}{N}$, where R = # residues, which is orders of magnitude smaller. Furthermore, due to the specified complexity of functional proteins, the regions of functional protein sequence space are so minuscule that we can never expect even one average protein to come to pass in the entire history of the universe.
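The post does not derive the R!/R^R factor, so the following Python sketch only evaluates the expression as written, to show its size for a few chain lengths. It works in log10 using math.lgamma (since ln R! = lgamma(R + 1)) so that large R does not overflow; the function name is illustrative.

```python
import math


def log10_random_walk_factor(r):
    """log10 of R!/R^R, computed as (lgamma(R + 1) - R ln R) / ln 10."""
    return (math.lgamma(r + 1) - r * math.log(r)) / math.log(10)


for r in (2, 95, 300):
    print(r, round(log10_random_walk_factor(r), 1))
```

Because the factor multiplies N_F/N, its log10 simply adds to log10(N_F/N) when estimating P.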

Finally, a few words about Warren's 'point in time design complexity.' First, I want to point out that the equation for specificity (which I am taking to be equivalent to specified complexity) has no time variable. It is simply an objective factor that emerges from Shannon's equation for functional information. Nf/N appears to be what you refer to as 'the design process complexity of a random search process.' That being said, there is something to point out. Nf is set by physical laws if folding sequence space and functional sequence space are approximately congruent (functional sequence space may be slightly smaller than folding sequence space for a given protein, but the most recent paper I read on detecting protein folding leads me to believe that they are essentially the same). Under natural selection, however, N can be reduced. In theory, natural selection can cut off vast regions of sequence space, significantly increasing the probability of achieving a completely novel fold. However, natural selection can only work within folding sequence space for a given protein, which is only a minuscule fraction of non-folding sequence space. So natural selection has a negligible effect in reducing the size of N. Darwinian evolution of a protein can proceed freely within folding sequence space, but Darwinian evolution will not be possible outside of folding sequence space. Outside, evolution becomes a random walk, which, due to the specificity of protein folds, will never achieve a completely novel topology.

A word about models. Darwinian evolution has utterly failed to produce a model for the origin of the minimal genome, or the origin of completely novel topologies. For example, how was the minimal genome generated? In general, it has failed to provide an explanation for organic life. In fact, Darwinian theory has introduced a huge problem for science: the origin of the first life form and the generation of completely novel proteins, not to mention the origin of the almost inconceivably complex regulatory system in eukaryotes. A good scientific theory should solve problems, not create them. Everything else that nature does can be described relatively simply in the form of equations that seldom run to more than a few lines. Organic life is just the opposite. Instead of starting with first principles and being able to derive organic life, we find ourselves undertaking a massive reverse engineering project. In a reverse engineering project you don't start with a model of how the other civilization did it. Rather, you start with the finished product and try to eventually come up with a model of how it was built. One thing we do know is that highly specified complexity is a mark of intelligent design. We also know that nature is not capable of high degrees of specified complexity. There simply is not enough time and matter in the universe. So when faced with the empirical fact that ID can produce artifacts, sequences and configurations which have extremely high specificity, and mounting empirical evidence that nature cannot, good science says to go where the empirical data lead you, in this case, to ID.

charlie d.
Member
Member # 159

posted 03. September 2002 15:32
quote:
If the evolutionary trajectory stays within the folding sequence space of the original protein, it is quite possible, indeed an empirical fact, that a novel function can sometimes be obtained. But once the evolutionary trajectory departs the folding sequence space for that particular complex 3-D topology, the resulting protein either cannot fold fast enough before it is degraded by the Golgi body, or it produces a fold which is unstable. In either case it is no longer functional and therefore not expressed in the phenotype, and hence natural selection can no longer guide the trajectory. This is a serious problem because the probability of obtaining a functional protein via a random walk is $P = \frac{R!}{R^R} \cdot \frac{N_F}{N}$, where R = # residues, which is orders of magnitude smaller. Furthermore, due to the specified complexity of functional proteins, the regions of functional protein sequence space are so minuscule that we can never expect even one average protein to come to pass in the entire history of the universe.

Kirk, please read the review I quoted. It gives several examples of proteins for which both homology and structural transitions between folds are very solidly supported by data. One case, for instance, is that of bacterial luciferase and the nonfluorescent flavoprotein (NFP) encoded by the luxF gene of Photobacterium, the latter having lost a large part of the luciferase family's TIM-like (βα)8 barrel, replacing it with a (βα)5β fold.

I understand that theoretical considerations make any successful protein fold transition seem practically impossible, but in the end, if the evidence suggests that this is not the case, it is likely the theoretical considerations that are faulty, and not the empirical evidence.

Indeed, the thrust of Grishin's review is exactly the opposite: he thinks that evolutionary changes in fold structure (although usually much less pronounced than those we are dealing with here) are sufficiently common to invalidate classification of proteins solely based on structural properties, and he suggests phylogenetic considerations should take precedence over structural similarities. He concludes:
quote:
In summary, analysis of available protein spatial structures revealed that there is no strict correlation between homology and fold similarity. Homologous proteins can have different folds, and mechanisms such as insertions/deletions/substitutions, circular permutations, strand invasions/withdrawals, and hairpin flips/swaps emerge as leading causes for globally different protein structures within homologous families.

peroxisome
unregistered


posted 03. September 2002 21:49
quote:
PNAS 98, no. 19, 10596-10601. There they found that to obtain a moderately active enzyme of 95 amino acids, one would require a library of about 5 x 10^23 members.
That is not their data, and not what they said. You are referring to an "estimate" they made, and omitting the explicit and implicit caveats.

In fact, their data show that they found complementing mutants at ~1 in 10^4, and screened fewer than 2 x 10^8 mutants.

They also suggest that their data support the idea that modern enzymes evolved from simple precursors!

They also cite a true search of random sequence space (Keefe, A. D. & Szostak, J. W. (2001) Nature (London) 410, 715–718), which found 4 ATP-binding proteins per 6 x 10^12 proteins; very different from your estimate.
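To put the frequencies in this exchange on a common scale, the Python sketch below converts each into bits with -log2(p), the same form as Eq. (4) in Kirk's post. It only restates numbers quoted in this thread. The experiments measured different functions with different library designs, so the figures are not directly comparable, and the labels are illustrative.

```python
import math

# Frequencies as quoted in this thread (p = fraction of library members with the activity).
quoted = {
    "Taylor et al., complementing mutants (~1 in 10^4)": 1e-4,
    "Keefe & Szostak, ATP binders (4 per 6 x 10^12)": 4 / 6e12,
    "Taylor et al. estimate cited by Kirk (1 in 5 x 10^23)": 1 / 5e23,
}

for label, p in quoted.items():
    print(f"{label}: {-math.log2(p):.1f} bits")
```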

quote:
In fact, they specifically conclude that ID (human, of course) will be likely necessary to generate novel enzymes in the lab.
It is interesting to read the last paragraph of this paper and contrast it with Kirk's version; there appears to be a discrepancy or two.
quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes.
This is a flat out untruth.

yours
per

Art
Member
Member # 179

posted 03. September 2002 22:51
quote:
Art states, "I'd also add that the repeated references to work such as that by Blanco et al. are a bit confusing, since no one (that I am aware of, at least) has ever proposed that all extant proteins arose by progressive, step-by-step means from a single hypothetical ancestor." That would make things nice and easy. If functional proteins have multiple origins, the probabilities become orders of magnitude smaller. However, let us set that aside. How often have we read accounts that start with the duplication of a gene? One gene remains unchanged to maintain the home fires, so to speak, while the duplicate is free to launch off on some evolutionary pathway. Eventually, so the story goes, a completely novel protein appears (by 'novel' I mean a completely novel topology) that, by coincidence, performs a function that the cell just happens to be ready and waiting for.

Here is where the work of Blanco et al. comes into play. If the evolutionary trajectory stays within the folding sequence space of the original protein, it is quite possible, indeed an empirical fact, that a novel function can sometimes be obtained. But once the evolutionary trajectory departs the folding sequence space for that particular complex 3-D topology, the resulting protein either cannot fold fast enough before it is degraded by the Golgi body, or it produces a fold which is unstable. In either case it is no longer functional and therefore not expressed in the phenotype, and hence natural selection can no longer guide the trajectory. This is a serious problem because the probability of obtaining a functional protein via a random walk is $P = \frac{R!}{R^R} \cdot \frac{N_F}{N}$, where R = # residues, which is orders of magnitude smaller. Furthermore, due to the specified complexity of functional proteins, the regions of functional protein sequence space are so minuscule that we can never expect even one average protein to come to pass in the entire history of the universe.

Thanks, Kirk, for the comments. Please consider the following:

1. I don’t think it wise to “set aside” the matter of multiple origins of proteins just to give this particular manifestation of anti-evolutionary thought (as I alluded to before, I don’t think these ideas can be seriously construed as part of any ID theory) some credence. The reference to duplicated genes, while interesting in its own right, simply does not apply to the issue here. The products of duplicated genes are, by and large, going to diverge in modest enough ways structurally, but very possibly in ways that have disproportionate evolutionary impact. This has nothing to do with protein evolution per se. (Of course, as charlie d. notes, there are more ways than the route examined by Blanco et al. for deriving new folds from extant structures, which makes your “random walk” suggestion somewhat irrelevant. Again, the demonstration of random elongation mutagenesis is most illuminating in this regard.)

2. Let’s try to put the number 10^23 in some perspective (allowing, if you will, a bit of rounding to make the typing a little easier). If one needs to sort through 10^23 polypeptides, on average, to find one of reasonable activity, it follows that a collection of 10^25 is almost certain to contain such a functionality. Moreover, if this number is representative of any functionality, then a collection of 10^25 molecules will contain virtually all conceivable functionalities one can imagine. Just how big is this number? Well, it’s roughly the number of polypeptides in a 100 nM solution in a cube about 100 meters on a side. That’s a pretty modest size for a hypothetical “prebiotic soup”, and sounds quite within any physical limitations one might impose on such models.

It’s also the number of bacteria in a dilute (in terms of cfu’s) body of water 1000 miles by 1000 miles by 1 mile deep. This means that if each cell, in the course of a day’s growth (say, 6 generations), generates one novel genomic rearrangement (not that unlikely, IMO), then we generate all possible functionalities each and every day in the biosphere. This does not sound like a paradigm of improbability to me.

Now, I am sure one can try to change parameters to move towards a lower probability than these back-of-the-napkin estimates indicate. But I am very skeptical that we can come remotely close to Dembski’s “line in the sand”. Basically, the number that was pulled from the paper by Taylor et al. indicates that, contrary to the ID party line, functional proteins are not complex (in the Dembskian sense), but rather are devoid of CSI.

3. From the perspective of ID and ev/cre, what strikes me about the Taylor et al. paper is that it revises, downward and in very dramatic ways, the informational estimates for proteins that were put forth by Yockey in the 1970s. And the work by Taylor et al. remains an upper bound on the information content. These recent authors are, as Yockey was years ago, “stuck with” only one of an untold number of possible different folds that might possess a similar function. Until we can estimate the true magnitude of the numbers of different, unrelated sequences that can satisfy a specification, we can only take results such as this as an upper bound. The lower bound may be much, much smaller (perhaps approaching the limit observed in combinatorial studies involving small peptides). (I think this thought has been conveyed by others in this thread.)
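The back-of-the-napkin figures in point 2 can be rechecked with a short Python sketch. It assumes independent draws with a per-molecule success probability of 10^-23, Avogadro's number of about 6.022 x 10^23, and 1 mile = 1609.344 m; the cell density in the last step is derived here from the stated volume and was not given in the post.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
MILE_M = 1609.344     # metres per statute mile


def prob_at_least_one(p, n):
    """P(at least one hit in n independent draws) = 1 - (1 - p)^n, computed stably."""
    return -math.expm1(n * math.log1p(-p))


# A library of 10^25 members when 1 in 10^23 is active (about 100 hits expected).
print(prob_at_least_one(1e-23, 1e25))  # essentially 1

# Molecules in a 100 nM solution filling a cube 100 m on a side.
litres = 100 ** 3 * 1000               # 10^6 m^3 = 10^9 L
molecules = 100e-9 * AVOGADRO * litres
print(f"{molecules:.1e} molecules")    # of order 10^25

# Cell density implied by 10^25 bacteria in 1000 x 1000 x 1 cubic miles of water.
ocean_litres = 1000 * 1000 * MILE_M ** 3 * 1000
print(f"{1e25 / ocean_litres:.1e} cells per litre")
```

The implied density is a few million cells per litre, or a few thousand per millilitre, consistent with describing that body of water as dilute.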

Frances
Member
Member # 169

posted 04. September 2002 00:02
The term specificity as used here does not seem to be comparable to Dembski's definition of specified complexity. In Dembski's case, one has to show that the probability for a particular protein to arise through all chance hypotheses is smaller than a certain limit. In this case you have shown that the probability for a protein to arise in one swoop is smaller than this limit, but this is not representative of proposed evolutionary pathways for proteins.

For instance:

From: Riechmann, L. and Winter, G. (2000). Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. PNAS 97, no. 18, 10068–10073 (August 29, 2000).

quote:

There is considerable evidence that proteins may have evolved by the assembly of nonhomologous genes; thus, in multidomain proteins, contiguous domains often have different architectures homologous to those from other proteins (1). Individual protein domains also may have evolved in the same manner, by assembly and/or exchange of small gene segments (2), leading to diversification of the domain architecture and even the generation of entirely new folds (3).

From: Cordes, M.H.J., Walsh, N.P., McKnight, C.J., Sauer, R.T. (1999). Evolution of a protein fold in vitro. Science 284 (9 April 1999).

quote:

A “switch” mutant of the Arc repressor homodimer was constructed by interchanging the sequence positions of a hydrophobic core residue, leucine 12, and an adjacent surface polar residue, asparagine 11, in each strand of an intersubunit β sheet. The mutant protein adopts a fold in which each β strand is replaced by a right-handed helix and side chains in this region undergo significant repacking. The observed structural changes allow the protein to maintain solvent exposure of polar side chains and optimal burial of hydrophobic side chains. These results suggest that new protein folds can evolve from existing folds without drastic or large-scale mutagenesis.

From: Cui, Y., Wong, W.H., Bornberg-Bauer, E., Chan, H.S. Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes.

quote:

Point mutations lead to diffusive walks on the evolutionary landscape, whereas crossovers can “tunnel” through barriers of diminished fitness. The degree to which crossovers allow for more efficient sequence and structural exploration depends on the relative rates of point mutations versus that of crossovers and the dispersion in fitness that characterizes the ruggedness of the evolutionary landscape.


Grape Ape
Member
Member # 399

posted 04. September 2002 12:27
Hello all.

I thought I would add a couple of things that are relevant to the topic of protein folding, function, and evolution.

The first is the functionality of disordered regions. Many of the preceding posts here have relied on the idea that a polypeptide must be compactly folded in order to acquire function or prevent aggregation/degradation. But while this is a general tendency, it is not a necessity; lots of exceptions exist. Here is a recent paper discussing intrinsically disordered proteins from an evolutionary perspective:

J Mol Evol 2002 Jul;55(1):104-10

Evolutionary rate heterogeneity in proteins with long disordered regions.

Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Keith Dunker A.
quote:
The dominant view in protein science is that a three-dimensional (3-D) structure is a prerequisite for protein function. In contrast to this dominant view, there are many counterexample proteins that fail to fold into a 3-D structure, or that have local regions that fail to fold, and yet carry out function. Protein without fixed 3-D structure is called intrinsically disordered. Motivated by anecdotal accounts of higher rates of sequence evolution in disordered protein than in ordered protein we are exploring the molecular evolution of disordered proteins. To test whether disordered protein evolves more rapidly than ordered protein, pairwise genetic distances were compared between the ordered and the disordered regions of 26 protein families having at least one member with a structurally characterized region of disorder of 30 or more consecutive residues. For five families, there were no significant differences in pairwise genetic distances between ordered and disordered sequences. The disordered region evolved significantly more rapidly than the ordered region for 19 of the 26 families. The functions of these disordered regions are diverse, including binding sites for protein, DNA, or RNA and also including flexible linkers. The functions of some of these regions are unknown. The disordered regions evolved significantly more slowly than the ordered regions for the two remaining families. The functions of these more slowly evolving disordered regions include sites for DNA binding. More work is needed to understand the underlying causes of the variability in the evolutionary rates of intrinsically ordered and disordered protein.

A second issue is that of periodicity in determining secondary and tertiary structure. The various experiments alluded to here, in which de novo foldable sequences are only rarely found, use completely random amino acid sequences. However, due to the propensity of DNA to form repetitive elements, this scenario is relatively unlikely in nature, at least once you have an already living organism. A recent study confirmed that periodicity greatly increases the likelihood of producing an ordered protein:

J Mol Biol 2002 Jul 19;320(4):833-40

On the role of periodism in the origin of proteins.

Shiba K, Takahashi Y, Noda T.
quote:
Two different views have been proposed for origins of genes (or proteins). One is that primordial genes evolved from random sequences. This view underlies the concept of modern in vitro evolution experiments that functional molecules (even proteins) evolved from random sequence-libraries. On the contrary, the second view reminds that "random sequences" would be an unusual state in which to find RNA or DNA, because it is their inherent nature to yield periodic structures during the course of semi-conservative replication. In this second view, the periodicity of DNA (or RNA) is responsible for emergence of primordial genes. Although recent reports on the variety of periodicities present in proteins, genes and genomes are consistent with the second view, it has yet to be experimentally tested. We assessed the significance of periodicities of DNA in the origin of genes by constructing such periodic DNAs. The results showed that periodic DNA produced ordered proteins at very high rates, which is in contrast to the fact that proteins with random sequences lack secondary structures. We concluded that periodicity played a pivotal role in the origin of many genes. The observation should pave the way for new experimental evolution systems for proteins.
These are just two phenomena (the ability of disordered regions to produce function and the ability of periodism to create ordered regions) that greatly increase the probability of generating functional proteins via natural processes. Any legitimate estimate of the probability of de novo protein function arising in nature would have to take these (and many more) factors into consideration.
Kirk Durston
Member
Member # 174

posted 04. September 2002 15:40
Resp. to Charlie: I will not have access to online journal articles for a few days, so I was not able to read the entire paper you cite, Charlie, but I did read the abstract. Here are some comments on what Grishin appears to be saying.

First, I want to remind everyone of my current thinking: if a new structure, modified structure, or new function can be obtained with less than 70 bits of additional information (corresponding to a tightening of the specificity tolerances of about 2^-70, or 8.5 x 10^-22), then the new structure, modified structure, or novel function is within the reach of natural processes. The specificity of NFP1 is likely far too tight for it to be constructed from scratch by natural processes, but given the existence of the parent gene that codes for luciferase, and given that the information required to make the change appears to be well below 70 bits, I see no problem, so far as I can tell, with making the transition without the aid of ID. Keep in mind that the Part A information already present in the luciferase gene, and the cellular system that makes DNA shuffling, cutting, pasting, etc., possible, are taken as givens. The total information required to produce NFP1 via the route that Grishin describes must include the Part A information, which puts it well above 70 bits and, thus, would still require ID (to provide the Part A information). To put it in terms of specified complexity, I'm saying that the specified complexity of the Part A system is too high to be generated by non-ID processes, but given the Part A system, the additional specified complexity required to generate NFP1 is minimal and well within the reach of natural processes. I prefer to work in terms of functional information, but equation (4) of my earlier post clearly shows the relationship between functional information and specified complexity, which I am defining as Nf/N. I have seen numerous similar cases.
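Equation (4) is not reproduced in this part of the thread. The sketch below assumes it is the usual definition I = -log2(Nf/N), which is consistent with the 36-bit figure given later for a specificity of 1 in 10^11. It converts between a specificity Nf/N and bits, and shows what the 70-bit cutoff means as a fraction of sequence space. The function names are illustrative only.

```python
import math

def functional_information(nf_over_n: float) -> float:
    """Bits of functional information for a specificity Nf/N (assumed form of Eq. (4))."""
    if not 0.0 < nf_over_n <= 1.0:
        raise ValueError("Nf/N must lie in (0, 1]")
    return -math.log2(nf_over_n)

def specificity_from_bits(bits: float) -> float:
    """Inverse mapping: the fraction Nf/N that corresponds to a given bit count."""
    return 2.0 ** -bits

CUTOFF_BITS = 70  # the proposed upper limit for what natural processes can reach

print(specificity_from_bits(CUTOFF_BITS))   # about 8.47e-22
print(functional_information(1e-11))        # about 36.54
```

Working in bits rather than raw fractions keeps the numbers readable and lets successive increments be added rather than multiplied.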

The question is whether the origin of all proteins and genetic regulatory systems can be accounted for by the incremental addition of information, so long as each increment remains below 70 bits. The two extremes are:

1. none of the proteins, etc. can be produced (extreme ID position)
2. all of the proteins, etc. can be produced (extreme Darwinian/naturalist position)

As you can see above, I'm somewhere in the middle. I see two problems with taking the second extreme position noted above. The first problem is that the increase in specified complexity (or functional information) is cumulative, as Leon Brillouin has pointed out. This means that in an undirected process, the probabilities of each of the intermediate steps must be multiplied. Once the resulting probability becomes too small, it is unlikely that such a fortuitous series of events will occur, and one must come up with another, more plausible explanation. The ready response to this problem is to argue that the incremental ratcheting of functional information is not an undirected process, but directed through natural selection. There are two problems with this. The first problem is that natural selection can only begin once we have a minimal life form, which requires about 300 different proteins or, most certainly, 150 different proteins. The information required for this is roughly 42,000 bits (a specified complexity of roughly 10^-12,600). Keep in mind that the upper limit for the universe to date is a specified complexity of 10^-120 (given that the total number of operations that could have occurred in the universe, at quantum rates, is about 10^120). What the naturalist must do is supply empirical evidence for some sort of mechanism that will direct the production of a minimal organism. Note that it must be directed, as random assembly is out of the question given the 10^120-operation limit of the universe (see Lloyd, S. (2002). Computational capacity of the universe. Phys. Rev. Lett. 88, no. 23, 237901-1 to 237901-4. Doi: 10.1103/PhysRevLett.88.237901).
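The cumulative point can be made concrete: when independent step probabilities multiply, their bit values simply add. The sketch below also restates the 42,000-bit minimal-organism figure as a power of ten and the 10^120-operation bound in bits. The three 2^-30 steps are hypothetical values chosen only for illustration; the 42,000 bits and 10^120 operations are the figures cited in the post.

```python
import math

LOG10_2 = math.log10(2)

def combined_bits(step_probabilities):
    """Bits for a chain of independent, undirected steps.

    Probabilities multiply, so their -log2 values (bits) add.
    Summing logs also avoids floating-point underflow for long chains.
    """
    return sum(-math.log2(p) for p in step_probabilities)

def bits_to_log10_probability(bits):
    """Express a bit count as the exponent of 10 in the equivalent probability."""
    return -bits * LOG10_2

# Three hypothetical steps of 2^-30 each combine to 90 bits,
# beyond a 70-bit cutoff even though no single step is.
print(combined_bits([2**-30] * 3))                  # 90.0

# The ~42,000-bit minimal-organism figure as a power of ten:
print(round(bits_to_log10_probability(42_000)))     # -12643

# The ~10^120-operation bound expressed in bits:
print(round(120 / LOG10_2, 1))                      # 398.6
```

The last line is where the figure of about 400 bits for the absolute upper limit comes from.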

Once we have a minimal organism, natural selection can begin operating, in theory directing the production of novel proteins and enhancing the regulatory systems. There is a second problem, however. The fine tuning of a protein within its functional sequence space is easily understood within a fitness landscape. An analogy would be a ball on a pool table. If the table is level, the ball won't move in any particular direction. But if a slope is introduced to the table, the ball will run downhill until it finds the lowest position on the table. A cell is a bit more complicated. The proteins and regulatory system are highly interactive. To produce a novel organism with, say, a thousand proteins, the fitness landscape must be exceedingly complex, such that the individual evolutionary trajectories not only each find their respective 'lowest' (fittest) point in the fitness landscape, but those points must also be such that the interactions between the different proteins, regulatory systems, etc. are viable. In other words, we are not talking about a fitness landscape that has one simple slope, but one composed of many hills and valleys. Back to the pool table analogy, it would be similar to the problem of dumping a bunch of balls on the table with the objective of getting them to form a 'happy face'. Simply sloping the pool table won't do it; all the balls will run to the same location. Instead, we would need a pool table with a pattern of dips and hills that formed the shape of the happy face, so that the balls would more likely settle out in that shape. A better-known analogy is Dawkins' famous 'METHINKS IT IS LIKE A WEASEL' program. The phrase was included in the code, forming the fitness landscape against which the randomly generated letters could be compared. Those letters that settled out right were retained, and those that weren't continued to change until they settled out right as well.
The problem with Dawkins' program, the 'happy face' pool table, and a fitness landscape sufficiently complex to produce a chicken is that all have a specified complexity that far exceeds what nature is capable of producing. In other words, in attributing to natural selection the ability to guide the trajectories of evolving proteins and regulatory systems to account for their extremely high specified complexity (or functional information), we have merely shuffled the problem from one area to another. Now we must account for the marvelous specified complexity of the earth's fitness landscape to explain the diversity of organic life. I'm not saying natural selection cannot do it. What I am saying is that if natural selection did do it, then the configuration of the fitness landscape would have required ID, and we're back to where we started.
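A minimal sketch of the 'METHINKS IT IS LIKE A WEASEL' procedure as the post describes it: letters that match the target are kept, and the others are re-drawn at random each generation. This is an illustration of that description, not Dawkins' own program; the function name, alphabet and generation limit are assumptions. The comparison against the hard-coded TARGET is where the target information enters the search.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"      # hard-coded target phrase
ALPHABET = string.ascii_uppercase + " "      # assumed 27-symbol alphabet

def weasel(seed=None, max_generations=10_000):
    """Letter-locking search: correct letters are kept, incorrect ones
    are re-drawn at random each generation. Returns the generation count."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in TARGET]
    for generation in range(max_generations):
        if "".join(current) == TARGET:
            return generation
        # Compare each letter with the target; only mismatches change.
        current = [c if c == t else rng.choice(ALPHABET)
                   for c, t in zip(current, TARGET)]
    raise RuntimeError("target not reached")

print(weasel(seed=1))
```

Because each wrong letter has a 1 in 27 chance of becoming correct per generation and correct letters never revert, the search finishes quickly; remove the comparison with TARGET and it becomes an undirected random search.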

Resp. to Peroxisome:

Peroxisome wrote, "That is not their data, and not what they said. You are referring to an "estimate" they made, and omitting the explicit and implicit caveats.
In fact, their data is that they found complementing mutants at ~1:10(4), and screened less than 2x 10(8) mutants."

Kirk: Of course it was an estimate they made, based on their data. It should have been quite clear from the calculations that I made immediately following that it would have to be an estimate. There simply is not enough time in the universe to sift through all the possible sequences. The fact remains that they conclude that a library of 5 x 10^23 members would be required for reasons which they cite (see page 10,600).

Regarding your statement that 'their data supports the idea that modern enzymes evolved from simple precursors', I'm afraid that you are misrepresenting their work. What they conclude is

"Our estimate of the low frequency of protein catalysts in sequence space indicates that it will not be possible to isolate enzymes from unbiased random libraries in a single step. The required library sizes far exceed what is currently accessible by experiment, even with in vitro methods (31, 35). Instead, as in natural evolution, the design of new enzymes will require incremental strategies in which, for instance, a suitable scaffold is first generated, binding and catalytic groups are subsequently added, and the ensemble is optimized in an iterative fashion. Our two-stage approach to binary-patterned mutases and work on the redesign of existing enzymes (36–38) demonstrate the power of stepwise and modular procedures for directing the course of evolution. By iteratively combining combinatorial mutagenesis and selection with intelligent design, it may also prove possible to create novel protein scaffolds, unknown in nature, and to endow them with tailored catalytic activities."

This is the concluding paragraph in the paper. Note the following:

1. In the lab, they conclude that the generation of novel protein scaffolds may be possible by iteratively combining combinatorial mutagenesis and selection with intelligent design (alluding to the process they used).

2. In nature, the 'design' (bad word) of new enzymes will require incremental 'strategies' that are the same as those used in the lab (note 'as in natural evolution'). Of course, as they point out, intelligent design is necessary in the lab, but for nature they must simply invoke 'strategies' that do the same thing. They do not say that this provides evidence that nature can actually do this, as Peroxisome wants to believe. Rather, since the obtaining of enzymes from random libraries would be impossible (which Peroxisome seems to think is a 'flat out untruth' … read the opening paragraphs and the first line in the above paragraph, Peroxisome), which necessitated the approach they used in the lab, they conclude that evolution would have had to do the same thing (since they are philosophically committed to believing the extreme position that nature is capable of producing all the proteins). Note the contradiction in their position: in the lab, ID is required; in nature, the same process would account for it but without ID (to be fair, they don't say this, and I know that there are a lot of closet ID molecular biologists out there, but that is what I infer from what they say). Nowhere do they provide evidence that nature can actually do what they did in the lab; they simply assume it. Assuming something does not provide evidence that the assumption is true. That is circular reasoning.

Now to your reference to Keefe & Szostak. I have this paper, and it actually supports the point I made when I stated that "sequence constraints are likely to increase as structural demands are increased" (Axe, D.D. (2000)). I considered including a discussion of this paper but left it out for brevity. But since Peroxisome fails to see the significance of Keefe and Szostak's paper, here goes.

What Keefe and Szostak found, by generating non-catalytic proteins of 80 amino acids from a random-sequence library and testing for the specified function of binding to ATP, was that the specificity Nf/N of a functional protein of 80 sites was roughly 1 in 10^11 (Keefe & Szostak, 2001). Using Eq. (4), this yields a functional information content of about 36 bits, or about .45 bits/residue. Here we have a simple protein with an extremely simple function, binding to ATP. In the work of Taylor et al., which was for a slightly more complex enzyme, the functional information required per residue was up to .83 bits. These results verify Axe's contention that sequence constraints increase with increased structural demands. My conclusion, which appears to be entirely warranted, is that .83 bits/residue is a reasonable estimate for calculating the functional information content of larger proteins. Keefe & Szostak's work confirms that.
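The per-residue figures can be checked with a short calculation, again assuming Eq. (4) is I = -log2(Nf/N). The unrounded value for Keefe & Szostak comes out at about 36.5 bits, or roughly 0.46 bits/residue. The 300-residue length in the second calculation is hypothetical, chosen only to show how the .83 bits/residue estimate would be applied to a larger protein.

```python
import math

def bits_per_residue(nf_over_n, length):
    """Functional information -log2(Nf/N) spread over the number of sites."""
    return -math.log2(nf_over_n) / length

# Keefe & Szostak (2001): 80-residue ATP binders at roughly 1 in 10^11
ks = bits_per_residue(1e-11, 80)
print(f"{ks:.3f} bits/residue")     # 0.457 bits/residue

# Applying 0.83 bits/residue (Taylor et al.) to a hypothetical
# 300-residue protein; the length is illustrative only.
print(f"{0.83 * 300:.0f} bits")     # 249 bits
```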

Resp. to Art: Art states, "The products of duplicated genes are, by and large, going to diverge in modest enough ways structurally, but very possibly in ways that have disproportionate evolutionary impact." I agree. Duplicated genes will account for paralogues, for which I see no need of ID at present.

Regarding your calculations: given your system in isolation, it doesn't seem like a large problem, but my upper limit of 70 bits already takes practical considerations into account. In reality we don't have a library of 95-residue, ready-made proteins to draw from. If such a pool were to occur in nature, the proteins would be constructed in a step-wise fashion via a random walk. This would do two things. First, there would be repetition (in other words, there would not be 10^23 different sequences to draw from). Second, because of the first factor, the resulting probability of finding an active enzyme would decrease to R!/R^R x 10^-24 = 95!/95^95 x 10^-24, which is approximately 10^-64. That is part of my reasoning behind adopting an upper limit of 70 bits. I have a paper currently being reviewed by a well-known biology journal. Prior to sending it off, I had the paper informally peer reviewed by several molecular biologists, including one who reviews for the journal I sent the paper to. He asked me for some clarification as to why I chose 70 bits, and here is his response to my clarification: "The number of bacteria on earth is estimated to be about 10^30. Estimating that they replicate on average once per year, and that their mutation rate is about 10^-10, then I wouldn't expect an event whose probability is less than about one in 10^30 to have occurred in the lifetime of the earth. So I guess I would place the limit at about 60-70 bits, roughly where you did. But I think this is too high also, because it just takes a system in isolation and doesn't consider how it would interact with other systems." Regarding the 'too high' comment, both of us agree that 70 bits is probably quite generous.
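The factorial expression above can be evaluated without computing 95! directly by working in logarithms with math.lgamma (lgamma(R + 1) = ln R!). This reproduces only the arithmetic; the formula R!/R^R and the 10^-24 library factor are taken from the post as given.

```python
import math

def log10_ordered_fraction(r):
    """log10 of R!/R^R, using lgamma to avoid huge intermediate numbers."""
    return (math.lgamma(r + 1) - r * math.log(r)) / math.log(10)

R = 95
exponent = log10_ordered_fraction(R) - 24   # include the 10^-24 library factor
print(f"95!/95^95 ~ 10^{log10_ordered_fraction(R):.1f}")   # ~ 10^-39.9
print(f"combined  ~ 10^{exponent:.1f}")                     # ~ 10^-63.9
```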

I would not agree with you that what Taylor et al. showed is that "functional proteins are not complex (in the Dembskian sense)." I can't speak for Bill, but I infer that when he uses 'specified complexity' he means a complex system that is specified beyond the upper limit of what nature might possibly achieve. I put that limit at 70 bits. The absolute upper limit, including quantum events, is about 400 bits, given 10^120 operations in the universe. Rather than showing that functional proteins are not complex, Taylor et al. come up with an estimate, based on empirical results, of the degree of complexity of their particular enzyme. They do not show that nature could actually do what they did in the lab, as I've discussed in my response to Peroxisome.

Regarding revising Yockey's estimate for cytochrome c downward, I don't think there is warrant for doing this, mainly because they were not working with the same protein. Yes, if we use .83 bits/residue, we would get a significant decrease, but keep in mind that .83 is a conservative estimate. We would need to compare the structural constraints of the two proteins before we would be justified in applying the conservative estimate to cytochrome c. Furthermore, I had another paper by Sauer that confirmed Yockey's estimate to within one order of magnitude. Be that as it may, I'm saying that 70 bits is the cutoff, and Taylor's protein requires more than 70 bits; therefore it is not just going to need ID in the lab, it is going to need ID in nature as well, even if we do downgrade cytochrome c.

Regarding your point, "Until we can estimate the true magnitude of the numbers of different, unrelated sequences that can satisfy a specification, we can only take results such as this as an upper bound," I should point out that the estimates provided by Keefe & Szostak and by Taylor et al. do not assume that only one sequence will do the job (1/N). Rather, they provide estimates of Nf/N. Their estimates result from a sampling of sequence space, albeit a small one. As a related aside, if you carefully go over Blanco's paper and another paper by Hagihara & Kim (Hagihara, Y. & Kim, P.S. (2002). Toward development of a screen to identify randomly encoded, foldable sequences. PNAS, 99, no. 10, 6619-6624. doi:10.1073/pnas.102172099), you will notice that folding sequence space for those proteins appears to be very close to the parent protein. In other words, we have additional data indicating that folding sequence space for a given protein is highly constrained.

Resp. to Frances: With regard to specificity, I've chosen 70 bits as the upper limit partly because, as you point out, proteins usually don't arise in nature via a single recombination from scratch where there are no duplicates in the hypothetical library. In reality, proteins are assembled in a step-wise fashion, either amino acid by amino acid or by larger modules. In either case, the assembly becomes a random walk. What is not normally realized is that a step-wise assembly is orders of magnitude less probable than a one-swoop, single recombination. The probabilities become large in cases such as the ones you go on to discuss, provided the sequence of the novel protein just happens to be composed of smaller sequences already contained in existing proteins (the Part A information is high enough to produce the protein with minimal extra information required). As I discussed at the outset of this post, I have no problem with this. The big question is whether all the Part A information can be generated through natural processes. Even if we take the minimal genome as a very conservative starting point and theorize that all the other life forms could be generated from it with the addition of no more than 70 bits per protein/regulatory system, we still need ID for the minimal genome itself. Note that it is an empirical fact that ID can produce configurations of much more than 70 bits, although the generation of even a minimal life form may be beyond our intellectual capabilities without the aid of reverse engineering an actual example.

Resp. to Grape Ape:

What the existence of disordered proteins shows is that there is not always a tight correlation between functionality and stable folds. In general there seems to be a pretty tight relationship but, of course, this is a generalization. For disordered proteins, functional sequence space becomes more relevant than folding sequence space. The specificity of those proteins would still be Nf/N. The question is whether the requisite Part A information exists to construct them without ID contributing any specified complexity (functional information). Another relevant area for research would be to see whether there is any sequence similarity between two different proteins such that one could trace a functional evolutionary path between the two folding proteins via the disordered protein. In other words, can the non-folding region of sequence space between two proteins be bridged by a functional, non-folding region? This may be the case, although I do not yet know of any examples, but the Part A information (the existing sequence from which to launch) would be relevant. I would also be interested in how much of non-folding sequence space is functional. Because of the general correlation between stable folds and functionality, I suspect that it is not large. A sampling of sequence space in the region of known disordered but functional sequences would be a place to start. In general, provided the binding sites remain functional and the disordered regions non-functional, the folding sequence space for the binding sites would become the relevant area to focus on.

Re. Shiba et al., "We assessed the significance of periodicities of DNA in the origin of genes by constructing such periodic DNAs. The results showed that periodic DNA produced ordered proteins at very high rates, which is in contrast to the fact that proteins with random sequences lack secondary structures."

It is not surprising that periodic DNA will produce ordered proteins. However, proteins encoded by periodic DNA will have a very low information-carrying ability and, thus, will produce simple topologies. Proteins with aperiodic sequences that are randomly generated should be expected to lack stable topologies, given that stable folding sequence space for complex topologies seems to be minuscule. However, the cell requires a large number of proteins with very complex folds that are not likely to be generated by mutating periodic DNA. In those cases the specified complexity is too high. What their work illustrates is that, if the protein carries only low amounts of information (low specified complexity), there are relatively simple ways to generate those sequences. But I am contending that the problems arise when it comes to accounting for proteins with a high specified complexity.

I haven't taken the time to read over this post looking for typos, so my apologies in advance for all the typos.

warren_bergerson
Member
Member # 262

posted 04. September 2002 17:02
Kirk,

First, let me state that I am impressed both by the amount of knowledge biologists have accumulated on this subject, and by your ability to articulate the points relevant to the issue being discussed.

It appears obvious to me, looking at the information being discussed, that 1) if a knowledgeable scientist believed there was a materialistic explanation for the evolution of proteins, 2) then the evidence would seem to clearly point in the direction of a ‘step wise’ non-random process of change, and 3) there would seem to be any number of potential physical processes or mechanisms that could produce the necessary ‘step wise’ non-random processes. As one probably overly simplistic example, it seems reasonable that in some manner genes code the ‘assembly instructions’ for constructing proteins (or some precursor to a protein). If this is the case, then there is little difficulty imagining a materialistic process which could produce ‘systematic non-random changes in these instructions’. It is not difficult to imagine such systematic non-random processes ‘evolving’ new proteins.

My question: "Given that there is at the very least a reasonable amount of evidence suggesting the existence of a non-random evolutionary design process, why isn’t anyone proposing and advocating such a theory?" And in a related manner, "Why is Dembski spending so much effort developing a measure which, while it would discredit a RM&NS theory, would not be particularly relevant to some type of non-random theory of evolutionary change?"

To put it in simple terms, why is the question of biological design being ‘framed’ in terms of ‘RM&NS’ versus ‘RM&NS can’t work’, when there are so many obvious alternatives to these two options? I find it hard to believe that something so obvious to an outsider isn’t also obvious to the people working in the field. Why do there appear to be so few efforts to formulate alternative theories of evolutionary change?

peroxisome
unregistered


posted 04. September 2002 18:34
Dear Kirk
I said "they also suggest that their data supports the idea that modern enzymes evolved from simple precursors".
You said
quote:
I'm afraid that you are misrepresenting their work
The paper says:
quote:
our results support suggestions (9, 26, 27) that modern enzymes could have evolved from primitive precursors constructed from a relatively small number of polar and nonpolar amino acids.
I think it is obvious that my statement was in no way a misrepresentation. Will you accept that you have made an untrue statement, and apologise?

Will you also accept that your previous statement:
quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes.
is in fact untrue?

You are also trying to gloss over the fact that you used a basis of calculation of 1 in 5x10(23), when the Taylor paper found ~10(4) hits in that estimated 5x10(23), cited a paper which found 1 in 2x10(12), and actually found hits at 1 in 10(4). You have ignored the possibility that a library of 10(23) would find additional hits, not found by the Taylor strategy. It strikes me that you may have used the highest estimate because it gave you the most favourable answer to your own position; but this would be a fundamentally dishonest approach.

I have pointed out clearly that on two occasions you have told an untruth. You should accept that you were wrong, or explain why not.

You have misrepresented the Taylor paper, and used an inappropriate number to give a biased argument. You should address this point.

I raise these issues so bluntly, because it seems to me that they strike directly at whether you are a witness of truth.

yours
per
===================
Edit: missed part of Kirk's post
Let me explain some elementary English. When Taylor et al write
quote:
Our estimate of the low frequency of protein catalysts in sequence space indicates that it will not be possible to isolate enzymes from unbiased random libraries in a single step.
they are saying that their estimate indicates ... will not be possible ... in a single step.
To highlight the important points:
"estimates" can be wrong
"indicates" is a major qualification of "will not be possible"
"in a single step" is a significant qualification, because it may well be possible in two steps!

It is a gross misrepresentation to equate their statement with:
quote:
since the obtaining of enzymes from random libraries would be impossible


[ 04 September 2002, 19:02: Message edited by: peroxisome ]

Moderator
Administrator
Member # 1

posted 04. September 2002 19:17
peroxisome,

This is a warning. From your very first post I sensed a hostile overtone. Hostility, and the "battle warrior" mentality are strictly frowned upon here. Either give other board participants the benefit of the doubt (don't assume that they are purposely misrepresenting data) and gently correct their mistakes, or leave our board. If you are not familiar with my moderating policy, please view the New Users Guide:

http://www.iscid.org/boards/ubb-get_topic-f-6-t-000076

charlie d.
Member
Member # 159

posted 04. September 2002 20:06
quote:
The ready response to this problem is to argue that the incremental ratcheting of functional information is not an undirected process, but directed through natural selection. There are two problems with this. The first problem is that natural selection can only begin once we have a minimal life form, which requires about 300 different proteins or, most certainly, 150 different proteins.

This is not correct. As I pointed out before on this board, natural selection will start working as soon as you have a population of inaccurate replicators in a limiting environment. These may be proteins, nucleic acids, or even noncovalent assemblies of simple organic compounds.
Thus, natural selection needs no proteins, no metabolism (as usually understood), let alone cells.

As for the rest of Kirk's argument, and this is my last word on it, it seems to me that it rests on an excessive reliance on theoretical models and an arbitrary "magical" cut-off. It also depends critically on some questionable interpretations of published data, as pointed out above.

The evidence, as we know it, shows the following:
- simple proteins with detectable biological activities can be readily identified in random peptide libraries that fit into a test tube, using simple selection protocols for a small number of reiterated cycles.
- the possibility of gradual evolution of proteins within structural families (for instance, by successive single amino acid substitutions) has been empirically demonstrated. Some proteins may be more "flexible" than others in this respect, but there certainly seems to be no major hurdle to Darwinian mechanisms here.
- significant transitions between structural domains, while maintaining protein "coherence" and stability, also seem to occur by common mechanisms, based on both evidence in artificial systems and inference from natural proteins. Successful transitions of this kind may be rare, which would explain the limited array of protein domains detectable in nature, but certainly not impossible.
It seems to me that, if these observations together do not prove that protein evolution is achievable by naturalistic mechanisms, they sure come pretty close.

To conclude, I also note that evolution-skeptical circles have continually moved the goalposts in this regard over the past few decades. First, biomolecules were supposed to be extremely rare and difficult to generate; now we know they can be generated in abiotic conditions and may be so abundant they can be found on comets. Then, "biological function" was supposed to be the insurmountable barrier, except that random peptide and ribozyme libraries have been shown to be effective sources of biological and catalytic activities. Now, it seems, a higher, information-theory-based probabilistic threshold has been raised; we'll see how long it will stand. I think, however, this should sufficiently explain the scientists' skepticism regarding claims about the supposed impossibility of protein evolution.

Kirk Durston
Member
Member # 174

posted 05. September 2002 16:59
Before I respond to the latest round of comments, I will summarize my overall response to the topic at hand.

I have shown that, since specified complexity (as I understand it) is a mathematical term that arises out of a mathematical derivation for an equation of functional information, it is a valid, objective term. Furthermore, when it comes to organic life, it is the physicochemical processes in the cell that determine functionality, not any subjective observer, which further keeps specified complexity as applied to organic life in the realm of an objective term.

I haven't seen any real objections to this (although I could easily have missed them due to the volume and depth of many of the comments). If this is so, then Dembski's contention that specified complexity is an objective term seems to stand.

The bulk of the discussion seems to center around whether natural processes can produce the kind of high degrees of specified complexity that we observe in organic life. There are only two known categories of artifacts that have high specified complexities:

1. certain sequences and configurations produced by human beings
2. organic life, including the simplest theoretical genome

Nothing else in nature produces anything remotely close to the kind of specified complexity seen in these two categories. To be specific, nothing else in nature produces sequences or configurations with a specified complexity anywhere near 10^-22. That is why organic life so obviously suggests design even to the 'most hardened atheist', as Darwinist Michael Ruse states in Philosophy of Biology.
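The thread treats a specificity (Nf/N) of 10^-22 and the 70-bit limit discussed later as roughly the same threshold. Assuming eqn. (4) has the form I = -log2(Nf/N), which is consistent with the 400-bit figure derived later from N = 10^120 and Nf = 1, a minimal sketch of the conversion in both directions:

```python
import math


def ratio_to_bits(ratio: float) -> float:
    """Bits of functional information for a specificity Nf/N (assumed I = -log2(Nf/N))."""
    return -math.log2(ratio)


def bits_to_ratio(bits: float) -> float:
    """Inverse conversion: the Nf/N ratio corresponding to a number of bits."""
    return 2.0 ** -bits


threshold_bits = ratio_to_bits(1e-22)  # about 73.1 bits
ratio_at_70 = bits_to_ratio(70)        # about 8.5e-22
```

So 10^-22 and 70 bits differ by only about 3 bits, which is why the two figures can be used almost interchangeably here.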

The materialist is philosophically committed to a completely natural explanation for the high specified complexity in organic life, regardless of the empirical evidence against him/her. Thus we have the discussion of whether or not natural processes can produce the high specified complexity witnessed in organic life.

Resp. to Warren:

Regarding 'systematic non-random processes 'evolving' new proteins,' we need to consider the non-random system that is going to do this. In order to avoid ID, the non-random system will have to have a specified complexity low enough for nature to achieve (looser than 10^-22). In general, selection systems have a significantly higher specified complexity than the sequences or configurations they select. This certainly holds true for organic life. In this discussion, we have primarily focused on proteins. The idea has been advanced that functional proteins might be assembled in a modular fashion. But such proteins will just as quickly evolve off along a continued evolutionary path back into non-functional sequence space, unless the cell also happens to have evolved a need for the new function, and that function increases the fitness of the cell enough for the new gene to increase in frequency within the population and be retained. So the system that 'identifies' novel functions is necessary for the last step in achieving and retaining the novel protein, just as Dawkins' program had a built-in system for identifying the functional letters. But because it is necessary, we must include the specified complexity of the non-random selection system in our analysis of whether or not the novel protein can be achieved through natural processes. In other words, even if the protein itself carries only 70 bits of functional information, the total information required to produce it significantly exceeds 70 bits (probably by at least two orders of magnitude). To use Dawkins' program as an example, its total information content included the information contained in the target phrase (since it was coded right into the program) as well as the information contained in the rest of the program.
So Dawkins solved the problem of randomly generating the phrase 'METHINKS IT IS LIKE A WEASEL', but only by introducing a larger problem: how the non-random selection system could arise without ID (which he doesn't address).
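For readers unfamiliar with Dawkins' program, here is a minimal sketch in the spirit of his 'weasel' example. The population size, mutation rate, and the choice to keep the parent in each generation's pool are assumptions for illustration, not Dawkins' published parameters. It shows the point being made here: the target phrase is written into the program, and the scoring function consults it directly. The last line estimates the information needed to specify the 28-character target by uniform random draw over a 27-symbol alphabet.

```python
import math
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "METHINKS IT IS LIKE A WEASEL"  # the goal is hard-coded into the program


def score(candidate: str) -> int:
    """Count positions that already match the hard-coded target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def weasel(pop_size: int = 100, rate: float = 0.05, seed: int = 0, max_gens: int = 10_000):
    """Return (final string, generations used); all parameters are assumptions."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_gens):
        if parent == TARGET:
            return parent, generation
        offspring = [mutate(parent, rate, rng) for _ in range(pop_size)]
        # Keeping the parent in the pool means the best score never decreases.
        parent = max(offspring + [parent], key=score)
    return parent, max_gens


# Bits needed to hit the target by a single uniform random draw:
# 28 characters, each chosen from 27 symbols (about 133 bits).
target_bits = len(TARGET) * math.log2(len(ALPHABET))
```

Calling `weasel()` reaches the target within a few hundred generations only because `score` compares against `TARGET`; remove that comparison and the selection step has nothing to select for.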

So what I am saying is that for organic life, there are mechanisms in place for point mutations, DNA shuffling, protein fusion, etc., which constitute the Part A information required for the processes. But the Part A information usually equals the Part B information if we ignore the selection system of the cell, which must provide the crucial need for the novel function, and vastly exceeds the Part B information contained in the novel protein if we include the cell, without which the protein becomes meaningless (or functionless).

There are other non-random selection systems in nature, although they can hardly be called systems. Physical laws can produce ordered systems, but there is a major problem with ordered systems: they cannot carry much information. In fact, they carry virtually none. Some examples are crystals and patterns arising out of chaotic interactions. Both outcomes are completely determined by the laws of physics, and both produce repeating sequences. When eqn. (4) is applied to them, they yield a functional information content of zero. That is precisely why high specified complexity is required for high information content: the sequence must be aperiodic (high N) but highly specified (low Nf). So physical laws cannot act as a non-random selection system when it comes to organic life, which has a very high specified complexity.
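Eqn. (4) itself is not reproduced in this part of the thread. Assuming it has the form I = log2(N/Nf), one way to read the zero-information claim is that a law-determined outcome has only one possible form, and that form is the observed one (N = Nf). A sketch contrasting that with an aperiodic, tightly specified sequence (the 20-residue example is hypothetical):

```python
import math


def functional_information(n_possible: float, n_functional: float) -> float:
    """Assumed form of eqn. (4): I = log2(N / Nf), in bits."""
    if not 0 < n_functional <= n_possible:
        raise ValueError("need 0 < Nf <= N")
    return math.log2(n_possible / n_functional)


# Fully law-determined outcome (e.g. a crystal lattice): the laws allow
# one form, and it is the observed one, so N = Nf = 1.
ordered = functional_information(1, 1)           # 0 bits

# Hypothetical aperiodic case: one acceptable 20-residue sequence
# out of 20**20 possibilities (high N, low Nf).
aperiodic = functional_information(20 ** 20, 1)  # about 86.4 bits
```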

Resp. to Peroxisome:

You would have more accurately represented the work of Taylor et al. if you had presented your selected quote within its context, especially the disclaimer immediately following it, as shown below:

"The binary distribution of hydrophilic/hydrophobic residues is inherent in the genetic code (NAN/NTN), and our results support suggestions (9, 26, 27) that modern enzymes could have evolved from primitive precursors constructed from a relatively small number of polar and nonpolar amino acids. There is, nevertheless, a low probability of finding catalysts, even when both position and identity of all critical active site residues are determined in advance. This finding contrasts with the ease of obtaining folded helical proteins through binary patterning (9), underscoring the exacting demands that catalysis places on protein design."

Note the counterpoint that immediately follows the phrase you selected. This carries a great deal of weight in light of the rest of the paper, the careful, intelligent determination of critical active sites, and the ID (human) in the lab that they admit to in general. There is a big difference between "This suggests such and such" and "This suggests such and such; nevertheless, here is a serious problem."

Regarding your contention that my statement, "a paper by Taylor et al. rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes," is false, I am mystified. Here are two quotes from their paper that should make it clear to an English exegetical expert such as yourself that what I said is true.

"Direct selection of catalysts from pools of fully randomized polypeptides is a conceivable alternative to de novo design, requiring no foreknowledge of structure or mechanism. An analogous approach has yielded RNA catalysts for a variety of chemical reactions (5). However, a 100-residue protein has 20^100 (1.3 x 10^130) possible sequences. Even a library with the mass of the Earth itself—5.98 x 10^27 g—would comprise at most 3.3 x 10^47 different sequences, or a miniscule fraction of such diversity. Unless protein catalysts are unexpectedly abundant and evenly distributed in sequence space, such a strategy will clearly be impractical."
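The figures in this quote can be reproduced roughly. The sketch below assumes an average residue mass of about 110 Da, which the quote does not state, so the Earth-mass library size is an estimate under that assumption:

```python
AVOGADRO = 6.022e23      # molecules per mole
EARTH_MASS_G = 5.98e27   # grams, the figure quoted by Taylor et al.
RESIDUE_MASS_DA = 110    # assumed average residue mass (not from the paper)
LENGTH = 100             # residues per protein

sequence_space = 20 ** LENGTH                               # exact integer, about 1.27e130
grams_per_molecule = LENGTH * RESIDUE_MASS_DA / AVOGADRO    # about 1.8e-20 g
max_library = EARTH_MASS_G / grams_per_molecule             # about 3.3e47 molecules
fraction_sampled = max_library / sequence_space             # about 2.6e-83
```

Under this assumption, an Earth-mass library would sample only about one part in 10^82 to 10^83 of the 100-residue sequence space.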

and,

" Our estimate of the low frequency of protein catalysts in sequence space indicates that it will not be possible to isolate enzymes from unbiased random libraries in a single step. The required library sizes far exceed what is currently accessible by experiment, even with in vitro methods (31, 35). "

In the first quote, note the 'it is conceivable but impractical' gist of the paragraph. The explanation included makes it pretty clear that 'impractical' is an obvious understatement, obvious within the context of what they have just stated. In the second quote, they simply say it will be impossible. Perhaps you were confused by the phrase 'single step'. This should not be taken to mean that they would dip into their hypothetical library just once and, if they weren't lucky, go home. Rather, it means that they could dip all day long and never hope to pull out one of those rare functional sequences ready-made.

Finally, there is your contention that the estimated Nf/N ratio of 1/(5 x 10^23) is wrong. The mistake you have made here is that the 10^4 hits you speak of were not out of a library of 5 x 10^23 members. For the benefit of the other discussants, here is the section where they derive their estimate:

"Extrapolating from our data and from modest sequence constraints on interhelical turns (23, 28–30), we can estimate that if every position in the protein had been randomized, a library of ~10^24 members would have been needed to obtain AroQ mutases. This estimate is based on the experimentally observed frequencies for the binary-patterned helical modules and the assumption that only a single amino acid is tolerated at each highly conserved position in the active site (i.e., Arg-11, Arg-28, Lys-39, Arg-51, Glu-52, and Gln-88). Previous studies (23, 28–30) suggest further that 1–10% of all possible turn segments will yield active catalysts. Finally, based on the low incidence of active clones observed in the H1/H2/H3 library, a templating/assembly effect of 10^4 for organizing the H1 and H2/H3 segments is included; this factor could turn out to be larger for assembly of helices and loops that have not been optimized by a preselection step. The required library size can thus be calculated as follows: 4,500 (binary patterned H1) x 17,500 (binary patterned H2/H3) x 20^6 (randomized active site residues) x 10^2 (fully randomized L1) x 10^2 (fully randomized L2) x 10^4 (templating effects) = 5 x 10^23.
The size of such a library is many orders of magnitude larger than that needed to identify noncatalytic ATP-binding proteins from random sequences (31). Although the estimated frequency of catalysts in protein sequence space will be contingent on the choice of building blocks and structural motif, on the difficulty of the chemical reaction, and on the level of catalytic activity needed for selection, construction of a moderately active enzyme also appears to be substantially more difficult than obtaining a ribozyme. For instance, it has been found that ~1 in 10^13 RNA molecules from a pool of random sequences promote a template-directed ligation with RNA substrates (32) …"
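The quoted product can be checked directly. The sketch multiplies the six factors from Taylor et al. and, assuming eqn. (4) has the form I = -log2(Nf/N), converts the resulting Nf/N of 1/(5 x 10^23) into bits. The dictionary labels simply mirror the quote:

```python
import math

factors = {
    "binary-patterned H1": 4_500,
    "binary-patterned H2/H3": 17_500,
    "randomized active-site residues": 20 ** 6,
    "fully randomized L1": 10 ** 2,
    "fully randomized L2": 10 ** 2,
    "templating effects": 10 ** 4,
}

library_size = math.prod(factors.values())  # exactly 5.04 x 10^23
bits = math.log2(library_size)              # about 78.7 bits for Nf/N = 1/library_size
```

The exact product is 5.04 x 10^23, which the paper rounds to 5 x 10^23.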

I trust that this clears up your confusion on this item. However, I see that you have also failed to read Keefe & Szostak's paper carefully, with the result that you are under the impression that I have made a mistake. To clear this up, read their conclusions. I have included the key statement below:

"The frequency with which ATP-binding proteins occur in sequence space can be estimated from the observed recovery of four such proteins from a non-redundant library of 6 x 10^12 random sequences. On the basis of the average behaviour of the proteins isolated before mutagenesis (Fig. 2), only about 10% of the potentially functional sequences present in the first round would be expected to generate correctly folded active proteins and thus survive to be amplified. Detailed measurements of the efficiency of each step in the selection and amplification process confirmed that there were no significant material losses that would have affected the recovery of active proteins.
We therefore estimate that roughly 1 in 10^11 of all random sequence proteins have ATP-binding activity comparable to the proteins isolated in this study. This frequency is similar to the recovery of ATP-binding RNAs from random-sequence RNA …"
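One plausible reading of how Keefe & Szostak's figures combine into "roughly 1 in 10^11": if only about 10% of functional sequences survive the first round, the four recovered proteins stand for about 40 functional sequences in the library. This is a reconstruction for illustration, not the paper's own calculation:

```python
import math

library_size = 6e12  # non-redundant random sequences screened
recovered = 4        # ATP-binding proteins isolated
survival = 0.10      # fraction of functional sequences expected to survive round 1

functional_in_library = recovered / survival         # about 40
frequency = functional_in_library / library_size     # about 6.7e-12, i.e. 1 in 1.5 x 10^11
order_of_magnitude = round(math.log10(1 / frequency))
```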

Hopefully, this shows that your concerns were unfounded.

Resp. to Charlie:

First, I trust that your concern that I was putting forward a 'questionable interpretation' of published data has been thoroughly laid to rest after having read my above response to Peroxisome.

Second, rather than being a 'magical cut-off,' the upper limit I propose is based on the following falsifiable hypothesis: "all sequences, artifacts, or configurations that require more than 70 bits of information will require ID." Thus far, all the empirical evidence we have verifies this hypothesis, and there is no empirical evidence that comes even close to falsifying it. I will admit that 70 bits is not carved in stone, but we certainly cannot go above 400 bits. The Phys. Rev. Lett. paper I cited shows that the maximum number of operations that could have taken place in the universe to date, assuming quantum frequencies (which are very fast), is about 10^120 (N = 10^120). The maximum specified complexity we could achieve in this scenario would occur if just one of those 10^120 operations had a unique function that no other operation had (Nf = 1). From eqn. (4), we see that the maximum information the universe is capable of having generated to date in any particular configuration is therefore about 400 bits. Biological and chemical processes don't take place at quantum frequencies, nor is the entire energy and mass of the universe available to produce proteins. For this reason, among others, 70 bits is a more realistic upper limit for my hypothesis. Ultimately, the materialist is going to have to put forward empirical data that falsifies my hypothesis.
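The 400-bit ceiling comes from inserting N = 10^120 and Nf = 1 into eqn. (4). Assuming eqn. (4) has the form I = log2(N/Nf), the arithmetic is:

```python
import math

max_operations = 10 ** 120  # N: upper bound on operations, per the cited Phys. Rev. Lett. paper
n_functional = 1            # Nf: a single uniquely functional outcome

max_bits = math.log2(max_operations / n_functional)  # about 398.6, i.e. "about 400 bits"
```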

Third, regarding natural selection prior to organic life: keep in mind that specified complexity is always relative to some function. In the case of organic life, the function is biological. The kind of natural selection that must produce functional proteins must somehow be tied to each protein's function within an organism. There are non-organic selection systems, but they will select according to their own criteria, not according to what is biologically functional within a minimal organism. An analogy is sending my 8-year-old daughter to the mall to select a bunch of stuff (I don't give her any criteria; I leave the choice to her). What I want to do is build a laptop computer, but since the selection I need is unlikely to coincide with the selection my daughter will perform at the mall, I'm not likely to be able to build a computer from what she selected.

The mere fact that selection mechanisms exist, and that those mechanisms operate through natural processes (topography selects where water collects), does not mean that these systems will have any relevance to biological function. The 150 genes in a bare-bones minimal genome are all absolutely essential; not one of them could be missing. In fact, the authors allow that such an organism is not likely to be able to survive under natural conditions. So we need some sort of non-biological, natural chemical selection that will select all the right proteins such that they just so happen to have an essential function within the first minimal life form. Compounding this is the fact that the functional sequence space for these 150 proteins is so minuscule that even if the entire planet were made of proteins, there would not likely be a single future-functional protein for this non-biological natural selection process to select.

You are right when you say that gradual evolution within structural families has been empirically demonstrated, but we need more than just gradual evolution to account for the arrival of any particular evolutionary trajectory at an area of sequence space that is so minuscule that there is no hope of lucking out within the lifetime of millions of universes. Yes, successful modular shuffling does occur. But as I've pointed out, it requires more Part A information than nature is capable of generating within the lifetime of the universe.
Your assertion that "simple proteins with detectable biological activities can be readily identified in random peptide libraries that fit into a test tube, using simple selection protocols for a small number of reiterated cycles" needs to be qualified. The 70-bit upper limit allows for this only for proteins of up to about 84 amino acids. Once you get larger than this, you can no longer expect to luck out. If one draws a series of letters out of a hat, it is not surprising if one occasionally comes up with a meaningful (functional) word. This should not, however, be taken as evidence that the entire text of Lord of the Rings could therefore also be produced. It is essential in this discussion that we all agree that sequences or configurations with a very low specified complexity need not be produced by ID. It is only sequences or configurations with very high specified complexities that require ID, and organic life is full of such examples.
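The 84-residue figure is not derived in the post. One reading that reproduces it is to assume a constant average of about 0.83 bits of functional information per residue, the rate implied by the thread's figure of 249 bits for a 300-residue protein. This is a simplification for illustration; real proteins vary residue by residue:

```python
BITS_PER_RESIDUE = 249 / 300  # assumed uniform rate, inferred from "300 amino acids ... 249 bits"
THRESHOLD_BITS = 70           # the proposed upper limit for natural processes

# Longest protein whose total functional information stays within the limit.
max_residues = int(THRESHOLD_BITS / BITS_PER_RESIDUE)  # 84
```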

Finally, I want to respond to your moving-goalpost comment. The synthesis of all 20 (now 22) amino acids under plausible early-earth conditions has yet to be demonstrated, given that the geological evidence falsifies the reducing atmosphere postulated by Miller. However, I see no real theoretical problem in generating them, as they have a very low specified complexity. The problem is arranging those amino acids into a sequence whose information content is too high for any natural process to produce. I would strenuously disagree with your statement that "random peptide and ribozyme libraries have been shown to be effective sources of biological and catalytic activities." That is not even remotely supported by the data, as we can see from Taylor et al.; keep in mind that they were working with a very simple, moderately active enzyme of only 95 amino acids. The average protein has 300 amino acids, with an information content of 249 bits and a conservative specificity of about 10^-75. We could never hope to find a functional 300-residue protein with a specificity like that. It's like watching a falling rock hit another rock and chip off a small piece, and thereby concluding that we now have a natural explanation for Mount Rushmore. All we have is a natural explanation for an event with a specified complexity of 1, which requires no ID at all.
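To confirm that 249 bits and a specificity of about 10^-75 describe the same quantity, assuming eqn. (4) has the form I = -log2(Nf/N), the relation can be inverted:

```python
import math

bits = 249
specificity = 2.0 ** -bits          # Nf/N implied by 249 bits
exponent = math.log10(specificity)  # about -74.96, i.e. roughly 10^-75
```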

If anything is changing, it is the explanatory power of materialism in accounting for organic life. Two decades ago, we pretty well figured we had it all sewn up. Then we found that substitutions tolerated singly were often not tolerated in concert, and the functional sequence space began to narrow. Then we discovered that most of sequence space is non-folding, which made the assembly of amino acids into functional proteins even more improbable. At the same time, we found that the functional sequence space for larger proteins seems to be minuscule. Now we are beginning to realize, with horror, that accounting for information-carrying proteins is going to be a piece of cake in comparison with coming up with an explanation for the operating system that regulates everything. For a good summary of this, read 'Pursuing arrogant simplicities', Nature, 21 March 2002, vol. 416, p. 247. If anything, scientists should be (or perhaps are) becoming increasingly skeptical about any natural scenario being able to account for the emergence of organic life.

It is an empirical fact that ID can produce sequences and configurations that require vastly more than 70 bits of information. It also appears that the universe is not capable of producing more than 400 bits of information, even at quantum frequencies. The simplest conceivable genome requires at least 42,000 bits of information. Science has to go where the empirical evidence points. The empirical evidence in this case points to ID.

Kirk Durston
Member
Member # 174

posted 05. September 2002 17:07
In my quote from Keefe & Szostak, there was a critical typo. The corrected version is as follows: "We therefore estimate that roughly 1 in 10^11 of all random sequence proteins have ATP-binding activity comparable to the proteins isolated in this study. This frequency is similar to the recovery of ATP-binding RNAs from random-sequence RNA …"

Note the exponent. My original post left out the '^', which made it read '1 in 1011'.

Sorry for this. I should have previewed my post.



All times are East Coast