ISCID Forums


» ISCID Forums   » General   » Brainstorms   » What Sort of Property is Specified Complexity? (Page 4)

Author Topic: What Sort of Property is Specified Complexity?
warren_bergerson
Member
Member # 262

Posted 12 September 2002 13:46
Kirk,

I think you did an excellent job of explaining the concepts N and Nf, which are the logical basis for quantifying the complexity or information content of a system. Complexity or information content (I think the terms can be used interchangeably) is a measurable, real-world, point-in-time property of a system, like temperature or speed.

Your comments, however, overlook the fact that living systems that exhibit what I call the ‘biological design’ property have important identifiable features in addition to measurable complexity. Of immediate concern, living systems exhibiting the biological design property have the ability to:

1. Maintain an adaptive or functional state. For a given set of options N, a specific form or option Nx is a member of the set of functional or adaptive forms only for a limited period of time. For biological systems, the set Nf is highly changeable or dynamic. In order to maintain an adaptive or functional state, biological systems are continually changing from a form Nx, which is a member of Nf at time t, to a different form Ny, which is a member of Nf at time t+k. The average time k involved in such required changes from Nx to Ny is a measure of the stability of a system. Systems exhibiting the biological design property, in addition to very high information content, are very unstable (in order to maintain adaptive or functional states, biological systems must generally be capable of finding new adaptive forms Ny from N in a matter of seconds).
2. Create new information or complexity. Living systems have the ability to increase complexity or information content, either by increasing the size of N or by splitting N. [The complexity of a mature multicellular organism is vastly greater than the information content of the DNA it inherited.]
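Warren's stability measure k can be illustrated with a small sketch. It assumes a hypothetical, sorted log of the times at which a system switched from one functional form Nx to another form Ny; the function name `stability_k` and the timestamps are illustrative inventions, not measured data.

```python
from statistics import mean


def stability_k(change_times: list[float]) -> float:
    """Average interval k between successive changes of functional form.

    change_times: hypothetical, sorted timestamps (e.g. in seconds) at which
    the system moved from one functional form Nx to a different form Ny.
    A smaller k means a less stable system in the sense used above.
    """
    if len(change_times) < 2:
        raise ValueError("need at least two change events")
    # Differences between consecutive change events.
    intervals = [later - earlier for earlier, later in zip(change_times, change_times[1:])]
    return mean(intervals)


# Illustrative timestamps only, not measured data.
print(stability_k([0.0, 1.5, 2.5, 4.0]))
```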

Both of the above traits are easily observed and measured.

In analyzing complexity, it is, IMO, important to distinguish between ‘systems exhibiting the biological design property’ and ‘the results produced by systems exhibiting the biological design property’. Both the results and systems producing the results exhibit very high degrees of complexity. However, the systems exhibiting design have identifying properties in addition to complexity.

IMO, proteins represent the result of a system with the biological design property. Measuring the complexity of the proteins provides only an estimate of the complexity or information content of the system producing the proteins.

Quote: HYPOTHESIS: all configurations requiring more than 70 bits of functional information will require ID.

In my terminology, this would be expressed as "configurations requiring more than 70 bits of functional information will require a biological design process". Again, in my terminology, this would not be a valid hypothesis, because there are systems that do not satisfy the criteria for a biological design process which could nevertheless generate or require more than 70 bits of functional information.

IMO, there is a very clear, unambiguous line in the real world between ‘systems exhibiting ID or the biological design property’ and ‘systems which don’t exhibit the design property’. I also believe there is a clearly identifiable distinction between ‘systems exhibiting the Darwinian or neo-Darwinian design property’ and those exhibiting ‘ID or biological design’. I do not, however, believe the distinction can be made entirely on the basis of one variable (complexity or information content).

Quote: This arises out of the Information Transfer Theorem which essentially states that information flows downhill for redundant systems.

I guess I am lucky I wasn’t aware of the theorem, or I might not have noticed that biological systems routinely create vast volumes of new information. The ability of biological systems to create information is easily demonstrated. As a simple example, the information content of an animal’s nervous system is many orders of magnitude greater than the information content of the organism’s DNA.

I assume the Information Transfer Theorem is valid for systems with certain constraints. The evidence would clearly suggest those constraints do not apply to biological systems.

Quote: I don't think we can support the notion that the fitness landscape can do it. There doesn't seem to be any empirically verified driving force within nature that ever steadily pushes simple life forms (42,000 bits of functional information) into higher functional information states

It should be noted that the ability of biological systems to create or increase complexity is a factual issue rather than a theoretical one. If you have (1) a reasonably accurate ‘complexity thermometer’ and aim it at (2) ‘systems with the biological design property’, then you will be able to measure whether mature organisms are more complex than the genetic material from which they developed. Using design science techniques, I suggest, these calculations are easily performed and the conclusion is very unambiguous.

Kirk Durston
Member
Member # 174

Posted 12 September 2002 14:23
Peroxisome, either you and I are not looking at the same paper, or English is not your first language. I don't mean this in a denigratory way; there are vast numbers of scholars whose mother tongue is not English and, therefore, can easily struggle when reading papers in the English language. In English, if one uses quotes (") around a statement, then one is making a literal quote. I did not use quotes, but my statements do fairly represent the paper by Taylor et al. Specifically, you said, "Taylor et al at no point use the figure 1 in 10(24) or 1 in 5x10(23). Will you accept that they do not use this phrase in their paper ? YES/NO."

Taylor et al. say, "Extrapolating from our data and from modest sequence constraints on interhelical turns (23, 28–30), we can estimate that if every position in the protein had been randomized, a library of ~10^24 members would have been needed to obtain AroQ mutases. This estimate is based on the experimentally observed frequencies for the binary-patterned helical modules and the assumption that only a single amino acid is tolerated at each highly conserved position in the active site (i.e., Arg-11, Arg-28, Lys-39, Arg-51, Glu-52, and Gln-88) Previous studies (23, 28–30) suggest further that 1–10% of all possible turn segments will yield active catalysts. Finally, based on the low incidence of active clones observed in the H1/H2/H3 library, a templating/assembly effect of 10^4 for organizing the H1 and H2/H3 segments is included; this factor could turn out to be larger for assembly of helices and loops that have not been optimized by a preselection step. The required library size can thus be calculated as follows: 4,500 (binary patterned H1) x 17,500 (binary patterned H2/H3) x 20^6 (randomized active site residues) x 10^2 (fully randomized L1) x 10^2 (fully randomized L2) x 10^4 (templating effects) = 5 x 10^23."
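As a check on the arithmetic in the quoted passage, the sketch below multiplies the factors Taylor et al. list. The dictionary labels are simply the paper's own descriptions; the final log2 line assumes Kirk's reading of the estimate as a frequency of one in the library size, which is the reading disputed elsewhere in this thread.

```python
import math

# Factors exactly as listed by Taylor et al. in the passage quoted above.
factors = {
    "binary patterned H1": 4_500,
    "binary patterned H2/H3": 17_500,
    "randomized active-site residues": 20 ** 6,
    "fully randomized L1": 10 ** 2,
    "fully randomized L2": 10 ** 2,
    "templating effects": 10 ** 4,
}

# Exact integer product of all factors.
library_size = math.prod(factors.values())
print(f"library size ~ {library_size:.2e}")  # ~5.04e+23

# Only meaningful as "bits" if the estimate is read as a 1-in-N frequency.
print(f"log2(library size) ~ {math.log2(library_size):.1f} bits")
```

The product comes to about 5.04 x 10^23, matching the paper's rounded 5 x 10^23.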

If there is any doubt in your mind that the calculated frequency is meant to be in the form of '1 over library size', note their discussion immediately following, where they compare it to "1 in 10^13 RNA molecules from a pool of random sequences …".

If you really think that no one knows (or at least can make a very good calculation based on data) what Nf/N is "for anything above Keefe & Szostak's figures" then clearly you do not think that the results that Taylor et al. obtained are valid. Given this, you ought to submit a response to PNAS showing where Taylor et al. went wrong in their calculation.

Regarding the impossibility of finding a moderately active enzyme by sampling a random sequence library, Taylor et al state, "Our estimate of the low frequency of protein catalysts in sequence space indicates that it will not be possible to isolate enzymes from unbiased random libraries in a single step. The required library sizes far exceed what is currently accessible by experiment, even with in vitro methods (31, 35)."

'Not be possible' means 'impossible', 'can't be done'. Their opening paragraphs explain why.

If English is not your mother tongue, then I certainly understand some of your confusion. However, if English is your mother tongue, then your whole approach should be re-thought. Whatever your position is in the matter of ID and specified complexity, you really ought to defend it by focusing on the science rather than absurd quibbles on whether or not exact quotes are used rather than summaries or descriptions.

Kirk Durston
Member
Member # 174

Posted 12 September 2002 16:47
Resp to Warren:

Considering functional proteins that fold, there is the island of stable folding sequence space (Ss); within that island, there is a smaller island of functional sequence space (Sf); and within that island is a smaller island of functional sequence space for a particular organism (Sx). Sx will never be bigger than Sf, and Sf will never be larger than Ss. Ss is determined by physics, specifically the energy well associated with the sequence. Theoretically, any protein with the same fold should be able to do the same job, but differences of only a few angstroms can sometimes make a large difference, so biology may restrict Nf to a smaller number than Ns. A particular organism may then further restrict what is functional for that organism; in that case, Nf is further restricted to Nx.

When I work with functional information, I typically set Nf = Ns, which makes the specificity as loose as allowable and, therefore, more attainable. Since Ns is set by physics, setting Nf = Ns means that Nf will never change. The demands of a particular organism may impose restrictions such that Nf = Nx. But since those demands may change with environment or by mutation, Nx, and therefore Nf (if we set Nf = Nx), can change with time. So if we set Nf = Ns, we have a constant number that does not change with time. But if we set Nf = Nx, we can only calculate a point-in-time value for Nf, which really doesn't tell us much about limitations to change over time.
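A minimal sketch of the point about nested spaces, assuming functional information takes the usual form I = -log2(Nf/N). The counts Ns, Nf and Nx below are hypothetical, chosen only so that Nx <= Nf <= Ns within a 100-residue sequence space; the sketch shows that setting Nf = Ns gives the smallest (loosest) number of bits.

```python
import math


def functional_information(n_functional: float, n_total: float) -> float:
    """Functional information in bits, assuming I = -log2(Nf / N)."""
    if not 0 < n_functional <= n_total:
        raise ValueError("require 0 < Nf <= N")
    return -math.log2(n_functional / n_total)


# Hypothetical counts for a 100-residue sequence space.
# Sx is inside Sf, which is inside Ss, so Nx <= Nf <= Ns.
N = 20.0 ** 100
Ns, Nf, Nx = 1e100, 1e95, 1e90

for label, count in [("Ss (Nf = Ns)", Ns), ("Sf", Nf), ("Sx (Nf = Nx)", Nx)]:
    print(f"{label}: {functional_information(count, N):.1f} bits")
```

Each tighter restriction shrinks the functional count and so raises the bits required; the Ns choice is the most attainable one.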

You said that the complexity of a mature multi-cellular organism is vastly greater than the information content of the inherited DNA. Keep in mind that it is specified complexity that is important in calculating functional information. The information for building a mature multi-cellular organism is largely contained in the genome. Some information comes from other sources such as organelles as well. When this functional information is collected, the information transfer theorem indicates that it will be greater than the outcome, which is a mature, functional organism. Yes, environmental effects will introduce a lot of variation into the functional organism, but it is functional information that we are concerned with. So I would have to disagree with you. Mature organisms proceeding from what is encoded in the genome do not violate the Information Transfer Theorem (ITT). The central dogma in biology is there because biology, like every other physical process in this universe, is subject to the ITT.

You stated that biological systems create vast volumes of new information. Setting aside what organisms with intelligence are capable of (due to their ability to perform ID), I would need an example of this before I could respond to your claim. To be specific, let's focus on information-carrying DNA, RNA, or proteins. I'm not aware of any examples. Mind you, I am distinguishing between the general concept of Shannon information, which does not distinguish at all between functional and non-functional information, and the special case of Shannon information which does. There are all sorts of natural processes that can increase Shannon information, but none that I am aware of that can increase functional information (aside from small fluctuations in the random 'noise' of events).

I don't doubt that biological systems can create new functional information, but it will always be less than what is already contained in their genome, if you are talking about creating new genetic information. Furthermore, due to the ITT, the sum of the new genetic information created and the existing genetic information lost per generation will always be equal to or less than zero. I should also point out that we are discussing whether or not ID was involved in the origin and diversification of biological systems. The fact that biological systems can create functional information is noteworthy, but says nothing about whether those biological systems are the product of ID. (Actually, it is evidence of an ID origin for biological systems.)

peroxisome
unregistered


Posted 12 September 2002 18:40
Hi Kirk
thanks for your queries on my ability to speak English.

I have a different issue to address. I believe that it is very important to be truthful. When making a scientific hypothesis, it is very important to cite fairly and honestly, and to acknowledge the limitations of data. It is fundamentally dishonest to misrepresent or skew data or references. Accordingly, when an honest mistake is made, it is very important to acknowledge that you are wrong, and to reflect an honest position.

You have made several statements which you supported by Taylor et al. When we look at Taylor et al, it fails to support your statements.

quote:
according to Taylor, the probability of obtaining a functional AroQ protein is still 1 chance in 10^24
This is an explicit reference to Taylor, yet they do not use this figure at any point. They estimate that 10^24 clones will yield AroQ mutases (plural); I think it is self-evident that this refers to the 10^4 mutants they got. It cannot be 1 in 10^24, because they specify (more than 1) in 10^24.

quote:
If you really think that no one knows (or at least can make a very good calculation based on data) what Nf/N is "for anything above Keefe & Szostak's figures" then clearly you do not think that the results that Taylor et al. obtained are valid.
Perhaps you didn't read Taylor et al.? They only screened ~10^8 clones, not 10^24. They then make a guess as to how many random clones would need to be made to produce the clones they did get. Do you understand that making a guess, or an estimate, is different from knowing something?

As another matter, it is unfaithful to represent their number as representing all the possible AroQ mutases. If you actually screened 10^24 clones, there is no telling how many AroQ mutases you would get; Taylor et al. do not even address this. It is a perversion to suggest that they do.

quote:
a paper by Taylor et al., rules out random sequence libraries on the basis of the highly constrained specificity of active enzymes

...they simply say it will be impossible...

Strangely enough, I count four explicit qualifiers on the statement Taylor et al make. If Taylor et al had wanted to say "it is impossible.", they would have said it- and they did not say it.

quote:
If English is not your mother tongue, then I certainly understand some of your confusion. However, if English is your mother tongue, then your whole approach should be re-thought. Whatever your position is in the matter of ID and specified complexity, you really ought to defend it by focusing on the science rather than absurd quibbles on whether or not exact quotes are used rather than summaries or descriptions.
When I focus on the science, what I see is that you misrepresent your sources, and that you omit facts, in a way which supports your argument.

I raised this in the first instance because I said it struck directly to whether you were a witness of truth. The answer is clear and plain for all to see.

yours
per

Grape Ape
Member
Member # 399

Posted 12 September 2002 18:55
quote:

Kirk wrote:

I don't doubt that biological systems can create new functional information, but it will always be less than what is already contained in their genome, if you are talking about creating new genetic information. Further more, due to the ITT, the sum of the new genetic information created and the existing genetic information that is lost per generation will always be equal to or less than zero.

{Probably going to regret asking this, but...}

Kirk, would you say that the occurrence of a gene duplication and the subsequent divergence of one of the paralogues to a new function would not be an example of a "functional information" increase over the prior state?

Unless I have misunderstood you here, your above comments seem to be at odds with these comments from page 1:

quote:
I need to point out that I did not assert that we have no idea how one protein could evolve into another, clearly a series of mutations could do it given enough time and opportunities. It is easy to look at two proteins which we theorize to be homologous and construct a mutational pathway to achieve the orthologous or paralogous protein. The problem arises when we calculate the probability of achieving the novel protein and then realize that it is too small.
I included that last sentence just so that I wouldn't be accused of quoting you out of context, but one need not have a "novel" protein (in the sense of de novo formation or completely unrelated function) in order to have an increase in functional information. Clearly two proteins with similar but different functions would represent a functional information increase over a situation where there is just one, no? Or, to expand things along those lines, a large and diverse protein family or superfamily would certainly contain more functional information than a single protein belonging to that family? Keep in mind, of course, that none of this requires crossing boundaries into new folds, nor does it require any de novo formation of functional proteins.
Kirk Durston
Member
Member # 174

Posted 12 September 2002 19:32
Resp to Grape Ape:

I agree with you; a gene duplication followed by a sequential divergence of one of the paralogues, if it achieved a new function, would produce an increase in functional information. I also grant that such a thing can happen without the need for ID, provided the increase in functional information is not greater than 70 bits.

It was a bit sloppy of me to say, "The problem arises when we calculate the probability of achieving the novel protein and then realize that it is too small." I see no need for ID in obtaining novel proteins, or paralogues with novel functions, provided that the additional functional information required does not exceed 70 bits.

I suppose a next question could be regarding the possibility that the paralogue, which has achieved a novel function with less than 70 bits of information, could also duplicate, diverge, etc, and achieve yet another protein with a novel function, all within the stable folding sequence space for that family of proteins.

My answer would be: only if the total information required to generate both paralogues was less than 70 bits. The reason is that the probability of achieving both paralogues (the first, and then a second with another novel function derived from it) is the probability of achieving the first paralogue multiplied by the probability of achieving the second given the existence of the first. So the total required functional information is set by the product of their probabilities (equivalently, the bits for each step add), where each probability refers to the chance of making the step from the parent gene. I should say that if the new function can be achieved by a paralogue that is less than 70 bits of functional information away within the same region of stable, folding sequence space, I would question whether a new function has been achieved, or a previously existing function regained. We might have to look at the fitness the new function provided. If it was not significant enough to prevent its loss, it may have been lost. If it was very significant, then there is a chance that it is a genuinely novel function. But even so, it could still have been lost in a bottleneck event somewhere in the past. One thing that ID, combined with the ITT, predicts is that there ought to be a slow loss of functions over time, depending upon the size of the population and the selective advantage such functions confer. We may have pseudogenes that are nothing more than ancient genes that can be reactivated, given the appropriate genetic engineering.
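The reasoning that successive steps compound can be illustrated in -log2 terms: multiplying step probabilities is the same as adding their bits. The step probabilities below are hypothetical values chosen for illustration; each step alone is under 70 bits, but the two together exceed it.

```python
import math


def bits(p: float) -> float:
    """Bits of functional information for a step with success probability p."""
    return -math.log2(p)


# Hypothetical step probabilities, for illustration only.
p_first = 2.0 ** -40   # parent gene -> first paralogue
p_second = 2.0 ** -35  # first paralogue -> second paralogue

p_both = p_first * p_second  # probabilities multiply ...
total_bits = bits(p_both)    # ... so bits add
assert math.isclose(total_bits, bits(p_first) + bits(p_second))

print(total_bits, total_bits > 70)  # 75.0 True under the 70-bit hypothesis
```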

So in summary, you are right. Paralogues that achieve new functions increase the functional information content of the genome, but the upper limit for how far this can go is 70 bits, if my hypothesis is correct. If 70 bits is too low, it certainly would not exceed 400 bits.

Moderator
Administrator
Member # 1

Posted 12 September 2002 19:57
This is a final warning for peroxisome.

Some of the basic rules here at Brainstorms are that we treat each other with respect, give each other the benefit of the doubt, and avoid attacking a person or groups of people. The purpose of Brainstorms is to hold constructive dialogue. peroxisome has repeatedly restricted this process.

Regardless of the soundness of peroxisome's arguments, it is quite apparent that he/she is:

1. not giving Kirk the benefit of the doubt and
2. stirring up hostility or participating in what I call the "Battle Warrior" mentality and
3. calling Kirk's character into question.

One more instance of this and peroxisome will be banned for 30 days.

[ 13 September 2002, 06:26: Message edited by: Moderator ]

Kirk Durston
Member
Member # 174

Posted 12 September 2002 21:02
Resp. to Peroxisome: I think the word 'guess' misrepresents what Taylor et al. actually did (see quote from their paper in my last response to you). Anyway, if there is a mistake in their estimate, why don't you show us where it was made.

I should also point out what they said immediately following their calculation of 5 x 10^23. It is as follows:

"The size of such a library is many orders of magnitude larger than that needed to identify noncatalytic ATP-binding proteins from random sequences (31). Although the estimated frequency of catalysts in protein sequence space will be contingent on the choice of building blocks and structural motif, on the difficulty of the chemical reaction, and on the level of catalytic activity needed for selection, construction of a moderately active enzyme also appears to be substantially more difficult than obtaining a ribozyme. For instance, it has been found that '1 in 10^13 RNA molecules from a pool of random sequences promote …"

First, note that they state that the difficulty in finding functional AroQ mutases will be 'many orders of magnitude larger' than what Keefe & Szostak found for 80-residue, ATP binding proteins.

Second, note that they refer to their 'estimated frequency.' A frequency is 1/N and in this case, they seem to be calculating N as 5 x 10^23. Keep in mind that this is not a number pulled out of thin air, but a number based on their data from partially randomized sequences.

Third, they go on to state that the construction will be "substantially" more difficult than obtaining a ribozyme, which they state has been found to have an Nf/N of "1 in 10^13". (Note to Peroxisome: no, they do not say 'Nf/N', but it is clear from the text that 1 in 10^13 is just that.)

Fourth, note that they state that the frequency will be contingent on the 'structural motif', etc. They are clearly implying that not all proteins will have a similar functional frequency Nf/N, the implication being that with increasing complexity of structural motif, etc., the functional frequency will decrease, as Douglas Axe has already pointed out.

All this to say that Taylor et al. seem to be indicating that the frequency (Nf/N) of moderately active AroQ enzymes in sequence space is a) orders of magnitude smaller than for an 80-residue, ATP binding protein, which has an Nf/N of 1 in 10^11, according to Keefe & Szostak and b) significantly smaller than the 1 in 10^13 figure for ribozymes and c) is about 1 in 5 x 10^23.
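Taking the figures in this thread at face value as frequencies Nf/N (Kirk's reading, which peroxisome contests), they can be converted to bits and compared with Kirk's proposed 70-bit limit. This is a sketch of that arithmetic only; it says nothing about whether the figures themselves are right.

```python
import math

THRESHOLD_BITS = 70  # Kirk's proposed limit for natural processes

# Frequencies (Nf/N) as cited in this thread.
frequencies = {
    "ATP-binding 80-mer (Keefe & Szostak)": 1e-11,
    "ribozyme (cited by Taylor et al.)": 1e-13,
    "AroQ mutase (Taylor et al. estimate)": 1 / 5e23,
}

for name, f in frequencies.items():
    info = -math.log2(f)  # functional information in bits
    verdict = "above" if info > THRESHOLD_BITS else "below"
    print(f"{name}: {info:.1f} bits ({verdict} {THRESHOLD_BITS})")
```

Under these assumptions the three figures come out at roughly 36.5, 43.2 and 78.7 bits respectively.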

So your 'guess' that larger moderately active enzymes should have a similar Nf/N to Keefe & Szostak's 1 in 10^11 is hard to accept. It seems, therefore, that the onus is on you to show us where Taylor et al. went wrong in their calculation if you do not like their estimate.

peroxisome
unregistered


Posted 13 September 2002 06:10
peroxisome's post here was an attempt to question the moderator's motives in the above warning. Discussion about the moderator's policy should only take place in a private discussion with: moderator@iscid.org

[ 13 September 2002, 06:25: Message edited by: Moderator ]

Art
Member
Member # 179

Posted 13 September 2002 08:30
Hi Kirk,

You asked about examples where Nf/N was independent of the size of the polymer being studied. Two specific examples come to mind: one from Bartel’s group (Ekland et al., Science 269, 365-370, 1995) and one from Szostak’s lab (Wilson et al., PNAS 98, 3750-3755, 2001). Briefly, Bartel found that the frequency of RNA ligases in a random population of 200-mers was fairly high, and that the frequency of long catalysts was roughly equal to that of short ones (I haven’t the details in front of me, but the longer ones, as I recall, were on the order of 90 or so nts, while the shorter ones were only 40-50 nts). The end result was that, if length were a consideration, the longer family of sequences should have been less than 10^-15 times as abundant as the shorter one, but this is not what was found. Szostak’s group found that the frequency of ca. 90-amino-acid streptavidin binders in a library somewhat analogous to that used in the Keefe and Szostak paper was on the order of 1 in 10^13, quite comparable to the frequency of much shorter (5-38 amino acids, according to work cited by Wilson et al.) streptavidin-binding peptides. As remarkable for this thread is the observation that the longer polypeptides bound with a much greater affinity; IOW, greater “activity” was not accompanied by a significantly lower Nf/N.

How do we make sense of results such as these? (I suspect that they are going to be pretty general; longer randomized polypeptide libraries have been available for so short a time that not much has been done, and I feel that other targets are going to display similar characteristics.) The easiest way is to hypothesize that functionality in proteins (and RNAs) is not a matter of large, highly constrained motifs, but rather of loose collections of highly degenerate ones. This “explains” why tools such as the PROSITE collection of motif signatures work in identifying functionality in unknown sequences. It also explains the lack of length dependence of Nf/N that is seen. (Look at things this way: if a functional unit needs only 5 amino acids, then it will be roughly as abundant in a library of 12-mers as in one of 90-mers.)
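The length-independence argument can be sketched with a deliberately simplified model that compares (a) a function requiring every residue to be one specific amino acid with (b) a function requiring only one specific contiguous 5-residue motif somewhere in the chain. The single exact motif is an assumption made for illustration; Art's hypothesis concerns looser, degenerate motifs, which would make model (b) even more permissive.

```python
import math

ALPHABET = 20  # amino acids


def log10_full_length(length: int) -> float:
    """log10 frequency if every one of `length` positions must be one specific residue."""
    return -length * math.log10(ALPHABET)


def log10_motif(length: int, motif_len: int) -> float:
    """Approximate log10 frequency of containing one specific contiguous motif.

    The expected number of occurrences is (length - motif_len + 1) / 20**motif_len,
    which approximates the hit probability when that number is small.
    """
    windows = length - motif_len + 1
    return math.log10(windows) - motif_len * math.log10(ALPHABET)


for L in (12, 90):
    print(L, round(log10_full_length(L), 1), round(log10_motif(L, 5), 1))
```

In this toy model, going from 12-mers to 90-mers shifts the motif frequency by about one order of magnitude (log10 of 86/8), while the full-length requirement drops by about 101 orders of magnitude.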

(Pardon the round numbers here - I am typing somewhat off the top of my head, and specifics are too long forgotten at the moment.)

Kirk Durston
Member
Member # 174

Posted 13 September 2002 10:43
Art, thank you for those two references. I'll look forward to going over them sometime in the next few weeks. I can't really comment on them until I've read them. Unfortunately, today is the last day I can afford the time to be in this discussion (but I must say that it has been time well spent).

Keeping in mind that I have not yet read the papers, here are some comments:

1. If the 90-residue protein was found to have an Nf/N of 1 in 10^13, then that is still two orders of magnitude rarer than Keefe & Szostak's 1 in 10^11 for an 80-mer. Of course, these are just two samples, and Bartel's group's findings may not fit that trend at all.

2. I suspect that for the average 300-amino-acid protein, there may be very significant diversity in Nf/N. If, for the same organism, the sequence is very loosely conserved, Nf/N may be quite high. But for highly conserved proteins such as ubiquitin, Nf/N might be very small.

3. Even if it turns out in the end that all proteins have an Nf/N somewhere around 1 in 10^11, all that would indicate is that they are within the reach of natural processes, according to my hypothesis, which puts a lower limit for Nf/N of 8 x 10^-22. So ID would not be required to generate functional proteins. However, for the barebones life form, we need a minimum of 150 specific genes. Not just any genes, but 150 highly specified genes. Given that the specificity of the average gene is assumed to be about 10^-11, the genomic specificity for 150 specified genes would be about 10^-1650.

4. I suspect that obtaining functional proteins is much easier than obtaining the regulatory system that governs when genes are switched on and off and under what conditions such that a mature multi-celled organism can be produced.

5. A protein with a specified complexity Nf/N of 10^-11 requires about 37 bits of functional information. This is well below my threshold of 70 bits. This is not as conservative as some of the limits I've seen Dembski propose, but I still think that my 70 bits is conservative when applied to macroscopic events (where quantum events become insignificant). I have suspected for some years now that 20 bits may be more realistic, so long as we are talking about realistic natural processes. I feel safe enough at 70 bits to go public with that figure, but not yet at 20 bits. Data will eventually tell. Keep in mind that the selection systems in the lab have involved a great deal of ID. So all we can say is that under the assumed Nf/N of 10^-11, Kirk says that proteins can be produced by ID, if his 70-bit limit is realistic. Kirk may find that the data supports 20 bits, or 30 bits, as more realistic. Data will tell.

6. This whole discussion began with the objectivity of specified complexity. If we take Nf/N as a measure of specified complexity, then I think we can see that it is a valid concept, just as valid as talking about functional frequencies, which also are represented by Nf/N, and just as valid as talking about functional information, which has Nf/N as its primary term. The question remains, just what degree of specified complexity is nature capable of, and is it high enough to rule out ID when it comes to organic life? It is already an empirical fact that ID (e.g., human) can produce systems and configurations requiring vastly more than 20 or 70 bits. Can nature do it as well? Since the laws of nature tend to produce effects that are repeatable, and capable of being described by simple equations, both of which are the opposite of what one needs to produce functional sequences that are not repeating, but yet are highly constrained to be functional, I see nothing in the laws of nature that is going to produce high degrees of functional specified complexity.
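The figures in point 3 can be checked in a few lines. One practical note: (10^-11)^150 is far below the smallest positive double, so a direct floating-point product underflows to 0.0; the sketch therefore works in log10 space. The per-gene specificity of 10^-11 is the assumption stated in point 3, not a measured value.

```python
import math

# 70-bit threshold expressed as a frequency: 2**-70.
threshold = 2.0 ** -70
print(f"{threshold:.2e}")  # 8.47e-22

# Genomic specificity for 150 genes at an assumed 1e-11 each.
per_gene_log10 = -11
genes = 150
genome_log10 = genes * per_gene_log10
print(genome_log10)        # -1650

# The naive product underflows a double, which is why log space is used.
print((1e-11) ** 150)      # 0.0

# Same specificity expressed in bits of functional information.
print(f"{-genome_log10 * math.log2(10):.0f} bits")  # 5481 bits
```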

Ultimately, the kind of research program we need is to get some data on the Nf/N for each of the first 150 genes. Then we will have a better idea of just what kind of challenge natural processes have to meet if it is going to happen without ID. Everything else nature does seems to be readily reproducible, which is why science works in the first place. Since the first living organism is supposed to have been produced by nature, those processes should be readily reproducible. They certainly shouldn't require hundreds or thousands of scientists, working over decades, perhaps centuries, to try to come up with a plausible (emphasis on 'plausible') scenario for the first living cell and then its diversification into all the organisms we see today. Thus far, our work on E. coli and Drosophila isn't helping much. We have almost a century's worth of data indicating that there are natural limits to biological change. ID predicts that these natural limits will be due to nature's inability to produce any significant increase in specified complexity beyond what one could expect in the 'noise' of random events, which is very low.

Again, thank you for those references, Art. I'll look forward to reading them.

warren_bergerson
Member
Member # 262

posted 13. September 2002 12:05
Kirk,
There is at least one ‘logical/mathematical’ type of system known to be capable of creating functional information: the basic Greek ‘variation-selection’ type of teleological causation system (the logical process underlying Darwinian and neo-Darwinian theory). Thus the assertion that the ITT applies to life forms must logically rest on the ‘assumption’ that no ‘variation-selection’ processes (or any other information-generating processes) operate on ‘genetic functional information’ within the lifetime of the organism. Do you agree with this logic?

The ITT is thus based not on ‘fact’, but on the ‘assumption’ that the only variation-selection processes associated with genetic material are the neo-Darwinian processes. To put it another way, if it can be demonstrated either 1) that there exist within-lifetime ‘variation-selection mechanisms’ operating on genetic material or 2) that functional genetic information is created within a lifetime, then it has been shown that the ITT does not apply to genetic functional information. Again, do you agree with the logic so far?

The first step in showing that the ITT does not apply to genetic information is to demonstrate that the amount of functional genetic information needed to ‘operate’ an organism far exceeds the amount of information transmitted in DNA. Using your estimates, the human genome transfers something like 60 million bits of information between generations (assuming the information is efficiently coded). This would seem to be a good ‘starting point’ assumption for the ‘maximum amount of information transferred between generations’. To demonstrate that life forms create functional genetic information, all that remains is to show that the minimum functional genetic information needed to successfully operate an organism is greater than 60 million bits.

To perform this calculation, we would start by estimating how many ‘genetically controlled’ operations an organism requires to survive and the minimum amount of information required to perform each operation. In estimating the functional information needed to perform these operations, we would also need to allow for the additional functional information required to permit evolutionary change.

To begin, one can estimate how many different compounds (proteins, etc.) an organism is capable of generating and how much information would be required for the organism to generate (and not generate) the appropriate compound at the appropriate time. I personally am not knowledgeable enough about the number of compounds, the number of start-stop mechanisms, or the complexity of protein assembly processes to perform these calculations. But from the discussions I have read, the information needed to perform such calculations (that is, to develop reasonable estimates) appears to be readily available. It does not, however, take much knowledge of computer programming to suggest that it will be extremely difficult to operate a complex organism with 60 million bits of transferred information.
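The estimate described in the last three paragraphs amounts to a sum compared against a budget. As a hedged Python sketch, the function names, the reserve parameter standing in for the allowance for evolutionary change, and the placeholder inputs are all hypothetical; none of the numbers are real estimates for any organism:

```python
# Hypothetical sketch: the inputs used with it are placeholders, not measurements.
TRANSMITTED_BITS = 60_000_000  # estimate quoted in the post for the human genome

def required_bits(operations, reserve_fraction=0.0):
    """Total minimum functional information for a list of
    (number_of_operations, bits_per_operation) pairs, plus an optional
    fractional reserve for evolutionary change."""
    base = sum(count * bits for count, bits in operations)
    return base * (1 + reserve_fraction)

def exceeds_transmitted(operations, reserve_fraction=0.0, budget=TRANSMITTED_BITS):
    """True if the estimated requirement is larger than the transmitted budget."""
    return required_bits(operations, reserve_fraction) > budget

# Placeholder inputs, for illustration only
placeholder_ops = [(30_000, 1_000), (30_000, 500)]
print(required_bits(placeholder_ops))            # 45000000.0
print(exceeds_transmitted(placeholder_ops))       # False
print(exceeds_transmitted(placeholder_ops, 0.5))  # True
```

The outcome depends entirely on the per-operation estimates and the reserve chosen, which is exactly the data the paragraph above says still has to be gathered.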

If one can demonstrate that biological systems create ‘functional genetic information’, then the next step would be to identify how this information is being generated. The issue here, IMO, is knowing what to look for. The assumption implicit in most analyses of functional information is that there is only a single process (random search) which generates functional information. In reality, there is a whole array of processes, from ‘slower than random search’ to ‘random search’ to ‘access stored solutions’, which generate functional information. There are also forms of these information-generating processes which generate novel or creative functional information. Once one can demonstrate that information is being generated, and once one recognizes the full range of potential information-creating processes, it is not, IMO, terribly difficult to identify the physical processes or mechanisms that are generating the functional information.


Quote: The information for building a mature multi-cellular organism is largely contained in the genome. … Mature organisms proceeding from what is encoded in the genome do not violate the Information Transfer Theorem (ITT). The central dogma in biology is there because biology, like every other physical process in this universe, is subject to the ITT.

We both agree, I think, that the ‘functional genetic information’ observed in nature could not have been generated by neo-Darwinian RM&NS processes. We both attribute the generation of this functional information to some sort of ‘intelligent design process’. You are arguing that this ‘intelligent design process’ occurred at some time in the past and that the results of the design process are being transferred from generation to generation by genetic material.

I am suggesting that intelligent design processes (I prefer the term biological design processes) are primarily within-lifetime processes. Biological systems, I am suggesting, have a very powerful and very efficient ‘within-lifetime’ capacity to generate functional information. I further suggest that these powerful within-lifetime processes are fully explainable in terms of observable, measurable processes.

The ITT can be expressed as ‘a system will not generate, create or increase the level of functional information unless the system contains functional-information-creating processes’. The design science model or theory I am proposing suggests the existence of powerful within-lifetime design processes, and thus of processes capable of generating functional genetic information within a lifetime. This model thus ‘predicts’ the creation of vast amounts of functional genetic information within the lifetime of a complex organism.

Your comments suggest that this prediction is contrary to the predictions of existing biological models and theories. As outlined above, this appears to be a testable, verifiable prediction. It is also a major prediction clearly separating design science theories from competing biological models and theories. Would you agree?

There is, IMO, a strong tendency in biology to confuse facts with assumptions and theories. The ITT, combined with the assumption that no within-lifetime processes generate functional genetic information, produces the prediction that functional genetic information will not increase during an organism's lifetime.

Quite obviously this prediction is not a fact. Furthermore, the validity of my prediction and the validity of the conflicting ‘conventional biology’ prediction do not depend on my subjective opinion, your subjective opinion, or the collective subjective opinions of all biological professionals. The validity of the predictions depends on what can be observed and measured. IMO, the knowledge and technical skills needed to test these predictions already exist. We must now simply wait for someone to attempt the measurement.

Grape Ape
Member
Member # 399

posted 13. September 2002 12:47
Hi Kirk. Thanks for your response.

quote:

I suppose a next question could be regarding the possibility that the paralogue, which has achieved a novel function with less than 70 bits of information, could also duplicate, diverge, etc, and achieve yet another protein with a novel function, all within the stable folding sequence space for that family of proteins.

Yes, that was where I was going with this. [Wink]
quote:

My answer would be, only if the total information required to generate both paralogues was less than 70 bits. The reason for this is that the probability of achieving a second paralogue with a second novel function, from the first paralogue with the first novel function is the probability of achieving the first paralogue, multiplied by the probability of achieving the second paralogue given the existence of the first paralogue.

I don't see how I could agree with you here. The key element you're leaving out is selection and subsequent fixation. If the first duplication event creates a new gene with a novel function, and that function is of some use to the cell (which is probably a prerequisite for the novel function evolving anyway), the extra gene should quickly spread throughout the population and exist in thousands (or millions or billions) of copies. Now that everyone has the first paralogue, the probability of a second occurrence is the same as (actually lower than) that of the first. The scenario that you put forth would only apply, AFAICT, if you're talking about a single individual which only reproduces to replacement (making an effective population of 1). In that case, the probability of two duplications happening to the same individual would be the square of the probability of one duplication occurring over a given time frame. But of course we're talking about a population with many individuals, where events can happen in parallel, so this wouldn't apply.

Incidentally, once there is one paralogue, the probability of subsequent duplications becomes greater. The reason is simply that there are now two targets to be duplicated rather than just one. Also, given that the most common mechanism for generating duplicates is unequal crossing-over, the chances of this happening increase when you have similar sequences next to each other on a chromosome. This is, for example, why short tandem repeats grow like they do, and it also allows gene families to expand rapidly. This is, IMO, the most likely explanation for why genes tend to follow a power-law distribution.
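The point about parallel trials, and about extra copies offering extra targets, can be made concrete with the standard probability of at least one success in n independent trials, 1 - (1 - p)^n. The Python sketch below is illustrative only: p is a made-up per-individual probability, the function name is hypothetical, and it treats individuals and gene copies as independent trials while ignoring fixation dynamics and drift.

```python
import math

def p_at_least_one(p_event: float, trials: int) -> float:
    """Probability of at least one success in `trials` independent trials,
    each with probability `p_event`. log1p/expm1 keep precision for tiny p."""
    return -math.expm1(trials * math.log1p(-p_event))

p = 1e-6  # hypothetical per-individual, per-generation probability

# One lineage of effective size 1: both events must hit the same individual
print(p * p)  # on the order of 1e-12

# A population of one million individuals, all "trying" in parallel
print(p_at_least_one(p, 1_000_000))  # about 0.632, i.e. 1 - 1/e

# More copies of the gene means more targets for the next duplication
print(p_at_least_one(p, 1) < p_at_least_one(p, 2))  # True
```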
quote:

So the addition of required functional information is proportional to the product of each of their probabilities, where each probability refers to the chance of making the step from the parent gene.

Again, the whole point of mutation/selection is that things can be "ratcheted up": each step, if it is not itself highly improbable, can be additive, creating cumulative change that would be too improbable as a single step. Multiplying the probabilities assumes a single step. In other words, I think it's a legitimate argument to talk about "functional islands", but for the argument to work, you must show that these islands are not connected or cannot be bridged via a low-probability event. (This is the essence of Behe/Dembski's IC argument.) In our situation here, we have a duplication/divergence event which is not itself too improbable, so there is no intrinsic limit to how many times this process can be iterated. And since we now have a situation where some amount of functional information -- say 10 bits -- can be added to the genome, repetitions of this process can surely add up to more than 70. The only way I can see your argument working (at least with regard to this scenario) is to show that a single step must have been greater than 70 bits.
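The difference between multiplying probabilities for one jump and adding up the waiting times of a ratchet can be put in numbers. This Python sketch assumes, as the argument above does, that each 10-bit step is reachable and selectable on its own and is fixed before the next one starts; it says nothing about whether such connected paths actually exist, which is the point in dispute. The function names are hypothetical.

```python
def expected_trials_single_jump(total_bits: float) -> float:
    """Expected trials for a one-shot search whose per-trial success
    probability is 2**-total_bits (mean of a geometric distribution)."""
    return 2.0 ** total_bits

def expected_trials_ratchet(total_bits: float, steps: int) -> float:
    """Idealized ratchet: the same total is split into `steps` equal,
    independently selectable steps, each fixed before the next begins,
    so expected trials add across steps instead of multiplying."""
    return steps * 2.0 ** (total_bits / steps)

print(expected_trials_single_jump(80))  # about 1.2e24
print(expected_trials_ratchet(80, 8))   # 8192.0
```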
Kirk Durston
Member
Member # 174

posted 13. September 2002 14:28
Warren, I'll respond to your post paragraph by paragraph, where the numbers below represent the paragraph numbers.

1. The Information Transfer Theorem (ITT) can be stated simply as 'it is a property of all redundant codes that information flows only in one direction.' There are various applications of this, but one is the transfer of information from a 64-codon alphabet to a 20-amino-acid alphabet. Information is dropped in the translation that cannot be recovered by working backward from the protein. The Central Dogma is merely one application of the ITT. In general, a theorem is not an assumption; rather, it is entailed by axioms. In the transmission of information, some information can go missing or errors in the information can occur. In a perfect transmission system that is non-redundant, information is perfectly conserved. In no transmission system, however, does the receiver actually receive more information than what is transmitted.
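The codon example can be checked directly. The Python sketch below encodes the standard genetic code (64 codons mapped onto 20 amino acids plus stop, written '*') and counts how many codon sequences collapse onto the same protein. It illustrates only why translation cannot be run backward, not the ITT in general; the names `translate` and `back_translations` are illustrative.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
DEGENERACY = Counter(CODE.values())  # number of codons per amino acid (or stop)

def translate(dna: str) -> str:
    """Translate complete codons of `dna` (frame 0) into one-letter amino acids."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def back_translations(protein: str) -> int:
    """Number of distinct codon sequences that encode `protein`."""
    return math.prod(DEGENERACY[aa] for aa in protein)

print(len(CODE), len(DEGENERACY))                # 64 21
print(translate("ATGTTA"), translate("ATGCTG"))  # ML ML
print(back_translations("ML"))                   # 6
```

Two different DNA sequences give the same protein, so from "ML" alone there is no way to tell which of the six codon sequences was the source.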

The ITT, therefore, also applies to 'variation-selection' systems. The two-part system works together to isolate a particular functional configuration. But in order to select for a functional configuration, the information for that function must be built into the system. That is why all software programs that attempt to model the emergence of functional information in biological systems must have the information for what to look for built into the variation-selection program. Since nature does not have the ability to front-load information into the world to produce the kind of functional information we see in organic life, every computer model that actually produces functional information will suffer from departures from realistic conditions, as I pointed out in my response to Schneider's paper.

Non-ID explanations for the origin of life postulate variation-selection systems where the selection system is also subject to variation. This reduces the amount of functional information that can be produced. Schneider's program would have produced even more information if he had not allowed the gene that coded for the binding protein to evolve. So, first, the ITT dictates that variation-selection systems will never produce more information than what is front-loaded into the system in the first place. Secondly, if the selection system is subject to variation, errors will be introduced into the information that has been front-loaded into the system, and the system will be inhibited from achieving its full original potential.

2. Given the above, ITT does apply to genetic information, as well as the larger variation-selection biological system.

3-6. The project you suggest is, in theory, a good one, but it might be centuries before we have sufficient knowledge and data to carry it out. Right now, I see genetics as the biggest reverse-engineering project that humanity has ever embarked upon. The more we learn about the information contained in the genome, the better we can see its relationship to, and effect on, the morphology of the finished organism, and the harder it is to come up with yet-wilder scenarios as to how it got its information. One thing we've learned is that protein-coding genes alone will not build the organism. But we've just begun to decipher the horrifically complex interacting regulatory system, with emphasis on 'just begun'. This system seems to be encoded into the genome along with the proteins. The discoveries we are making in this reverse-engineering project point to the possibility that ALL the information necessary to build a fully functional, mature organism is programmed into the genome and organelles, with possibly a small amount of regulatory function provided by other biological systems. Some organisms demonstrate information compression in that the same gene can produce two or even three completely different proteins simply by shifting the reading frame by one base. To do that, Nf/N for the gene must be absurdly tight. There is a concept known as 'the poison pill' … in attempting to swallow it, the victim chokes to death (the term is sometimes applied to a company whose acquisition drives the purchaser into bankruptcy). If ID is true, then I predict that genetics, and specifically decoding the genome, will be the poison pill for any materialistic explanation for the origin and diversity of life. As before, I strongly recommend reading the Nature article 'Arrogant simplicities'.

8,9: The 'biological design processes' that you suggest cannot be all biological if there is also intelligence involved. Perhaps you put the ID at the creation of these biological design processes, which is possible. But if that is the case, then all the information for all the organisms would have to be front-loaded into the system. Also, that information would then proceed to degenerate. Maybe I could sum it up with the following:

a) Only intelligent agents can create functional information
b) If biological design processes produce functional information, then the information must be front-loaded into the system unless there is going to be ongoing input by the intelligent agent.

If intelligent human scientists, working for decades with biological systems, cannot move E. coli or drosophila up the ladder of higher functional information by continual variation and intelligent selection, then I see no empirical basis for the belief that biological systems can do it without hundreds of scientists and expensive labs.

10. I don't see the need for powerful within-lifetime processes for generating the information required to build a fully mature organism; I think it is all there in the genome. It just needs to be activated by the appropriate conditions. Nevertheless, I do concede that some direct intervention by an intelligent agent may be necessary, although I can't point to anything that I am aware of. But I don't think you are talking about this second option.

11. I think I would agree, Warren. I see the ITT, which empirically continues to be verified in all other information transfer systems, as beautifully consistent with ID, but a reef for a completely materialistic model for the origin and diversity of life. For the first hundred years, natural theories looked at morphology as providing verification for the model. With advances in genetics, we now see that evolutionary theory based on morphology is horribly simplistic. Evolution occurs on the digital level of bases in genomes. But the moment we start working with digital information, all the axioms and theorems of information theory go to work, including ITT.

12,13: Well … I would dispute your contention that huge amounts of functional information are created during an organism's lifetime. Small amounts can be, and are, created by, say, our immune system. Lots of functional information is created by our minds (ID), but I think that all the information needed to construct a mature organism is pretty well tied up in the genes and regulatory system encoded in the genome. I do agree that more data or measurements are needed. That is why I am so anxious to get an idea of the Nf/N for a minimal genome.

Kirk Durston
Member
Member # 174

posted 13. September 2002 15:16
Grape, I specifically included the phrase 'given the existence of the first paralogue' to account for what you have pointed out. You are right that the generation of the second functional paralogue is not independent of the existence of the first. Fixation merely increases the number of opportunities for the second paralogue to evolve. The probability that a paralogue will achieve a new function via a random walk is R!/R^R × (Nf/N). If there is an incremental but significant advantage for each step closer in sequence space, then the R!/R^R factor disappears, as fixation ensures that we at least don't go backward, although we could go sideways, which still slows things down. Selection can't cut in until a mutation has occurred, and mutation is largely random in nature (though not always). The right mutations have to occur to produce the function, even with selection operating, and the probability that the right mutations occur is related to Nf/N and the size of the population, if we ignore the possibility of sideways, or neutral, mutations. Of course, the greater the population, the greater the probability that the right mutations will occur. But the likelihood that the second paralogue will become functional depends upon the Nf/N of both functional paralogues and on probabilistic limitations, which are also related to Nf/N. That is why I say the second function won't happen unless the total information required is less than 70 bits.
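To see the size of the random-walk factor, the expression R!/R^R × (Nf/N) can be evaluated directly. R is not defined in this excerpt, so the Python sketch below simply takes the formula as written, with R a positive integer, and computes the factor in log space; the names `walk_factor` and `p_random_walk` are hypothetical. Dropping the factor, as the post describes for strong selection, leaves Nf/N alone.

```python
import math

def walk_factor(r: int) -> float:
    """R!/R**R evaluated via log-gamma so that large R does not overflow."""
    if r < 1:
        raise ValueError("R must be a positive integer")
    return math.exp(math.lgamma(r + 1) - r * math.log(r))

def p_random_walk(r: int, nf_over_n: float) -> float:
    """Probability as written in the post: R!/R**R * (Nf/N)."""
    return walk_factor(r) * nf_over_n

for r in (2, 10, 100):
    # Extra bits the random-walk factor adds on top of -log2(Nf/N)
    print(r, -math.log2(walk_factor(r)))
```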

So the effect of selection (or, more accurately, elimination), if the fitness advantage is large enough, is to get rid of the random walk. The Nf/N still remains, however, as the target at which the random mutations must arrive, one step at a time. Selection merely preserves gains. The number of generations and the size of the population, however, may virtually guarantee that the target is reached by this ratcheting process, if Nf/N is high enough.

I see the processes of generating functional paralogues as limited by the boundaries of folding sequence space for that protein fold and by the number of functions that can be served by that same area of folding sequence space. In other words, there's only so much that nature can ratchet up the information and, as you pointed out, the need for the function has to be already there in order for selection to work. I get suspicious when I see two paralogues, each with different functions. Reason? Because it is also a mark of ID to be efficient, and if one fold, with two or three slight modifications, can do two or three different jobs, then go for it. If I see three different functions, then I get REALLY suspicious. If I ever observed a paralogue achieve a novel, beneficial function, I would think it more likely that an old function that was lost has just been fortuitously regained. It would answer the question as to why the unfulfilled function was there in the first place.

By the way, the generation of tandem repeats (i.e., genetic digital noise) indicates that the digital information of genetics is subject to the same problems as all other digital information storage and transfer devices, including being subject to the ITT.

Regarding how far apart these islands of folding sequence space are, I don't know. The Hagihara & Kim paper (Hagihara, Y. & Kim, P.S. (2002). Toward development of a screen to identify randomly encoded, foldable sequences. PNAS, 99, no. 10, 6619-6624. doi:10.1073/pnas.102172099) indicates that they may be a lot further apart than previously thought (their triple screening method revealed that the old one step method yielded a majority of false positives). Still, I have proposed elsewhere (if it is accepted for publication) a research program that not only measures the Nf/N for the 150 genes of a minimal genome, but also tries to get an idea of how far apart each of the 150 islands of folding sequence space are. The next step would be to examine all other biological proteins to see if there is a feasible path, via other proteins if need be, from the essential 150. Until then, I really can't say.



All times are East Coast

All content © ISCID and content contributor 2001-2003

The ISCID Forums are aimed at generating insight into the nature of complex systems (e.g. biological complexity, organizational complexity, etc.) and the ontological status of purpose, especially from the vantage point of various information- and design-theoretic models.
