Topic: Paper On Specificity
|
Jerry D. Bauer
Member
Member # 756
|
posted 11. April 2005 08:10
I wanted to throw out this paper here first to see if anyone is interested in it at Brainstorms. Tear it apart. I think it walks:
ESTIMATION OF THE SPECIFICITY OF FLAGELLA IN ESCHERICHIA COLI THROUGH PROTEIN AMINO ACID LENGTH AND STOICHIOMETRY
ABSTRACT: Determining specificity communicates the specified complexity of a system and provides a means to classify and compare complex systems. Herein, the flagellum of E. coli is investigated by examining 27 component proteins and precisely defining the specificity of each protein in order to mathematically conclude estimations of specified complexity in the flagellar system as a whole. The protein secretory system is included as a sub-system of the flagellum, and known Mot and Fl proteins are scrutinized as to their role in the flagellum, the amino acids comprising them and their frequency of appearance within the system. This paper concludes that the specificity of flagella in E. coli is surprisingly high--estimated at 10^13,167,898. Furthermore, this specificity, when translated into bits that quantify information, yields an information content of 43,742,810 bits for the system in its entirety.
Paper is HERE in PDF format. [ 11. April 2005, 08:12: Message edited by: Jerry D. Bauer ]
|
andyg
Member
Member # 415
|
posted 16. May 2005 19:00
You've just taken a series of proteins, and worked out the number of permutations that a protein of that length could consist of. What does this show?
If you want to get a different number for the specificity, why don't you take the nucleotide sequences, rather than amino acid sequences? Why choose one over the other?
|
Jerry D. Bauer
Member
Member # 756
|
posted 16. May 2005 22:18
Please read the paper to see what it shows:
"Knowing there are 20 letters (amino acids) of the alphabet and taking this to the power of all the letters in each sentence (the protein) then multiplying these totals, we get the total information content of the manuscript. This goes back at least as far as Yockey [19] and Brillouin [18] before him." [as referenced in the paper]
Then, taking the base-2 logarithm of that figure, as I did, we come out in Shannon's bits.
I could have calculated this in other ways, but since proteins are normally the units used to describe flagella, I might as well go with them rather than work from the nucleotide sequences of the protein-coding genes that encode them. Don't you reckon?
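For readers following the arithmetic, the calculation described above can be sketched in a few lines. This is only an illustration, not the paper's own code: the lengths and copy numbers in `EXAMPLE_PROTEINS` are made-up placeholders, and the function names are hypothetical. It works in logarithms because 20^n overflows very quickly, and it also converts the abstract's figure of 10^13,167,898 into bits.

```python
import math

# Placeholder data: (residues per protein, copies per flagellum).
# These numbers are illustrative assumptions, NOT values from the paper.
EXAMPLE_PROTEINS = {
    "FliC": (500, 2),
    "FliQ": (90, 12),
}

def log10_specificity(proteins, alphabet_size=20):
    """log10 of the product of alphabet_size ** (length * copies)."""
    total_residues = sum(length * copies for length, copies in proteins.values())
    return total_residues * math.log10(alphabet_size)

def log10_to_bits(log10_value):
    """Convert log10 of a count of possibilities into bits (log base 2)."""
    return log10_value * math.log2(10)

example_log10 = log10_specificity(EXAMPLE_PROTEINS)
example_bits = log10_to_bits(example_log10)

# The abstract's figure, 10^13,167,898, expressed in bits.
paper_bits = log10_to_bits(13_167_898)
print(round(paper_bits))
```

The last line reproduces the abstract's 43,742,810 bits, so the two headline numbers are at least consistent with each other. Whether the counting itself is the right measure is what the rest of the thread argues about.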
|
andyg
Member
Member # 415
|
posted 17. May 2005 14:20
I take issue with the idea that proteins are normally the units used to describe flagella. If you believe the flagellum to be designed, surely the design is implemented at the level of the code that makes the proteins?
I also question the part of the calculation that includes the numbers of each subunit. Does having 12 copies of FliQ make the flagellum more "specified" than one having 11? Does this mean that an actin filament that contains 1000 actin monomers is more specified than one having 900?
This addresses a point I raised in response to your answers about the strings I posed to you. You seem to be correlating the length of a string (or the number of amino acids in a protein, or the number of proteins in a structure) with "specificity". What is your definition of "Specificity", and what - according to you - is the difference between possessing specified information, and possessing complex specified information?
|
Christopher D. Beling
Member
Member # 723
|
posted 17. May 2005 21:30
Hi Jerry,
Thanks for your paper, which is so neat in its pedagogy, and for your goal of enumerating specificity. Two thoughts:
1) I see no real problem in specifying the information content of the flagellum on the basis of the protein sequences, since in my understanding of Yockey's work the information in the DNA sequence will be more than what you calculate, i.e. you will calculate a lower bound to the complex specified information.
2) I also agree with Andy that it is questionable to count a protein more than once. I know you are working from the basis of not taking any history into account. But to work on that basis when one knows that the proteins are being transcribed/translated from information on genes seems to be calculating something that is not relevant, and it drastically overestimates the complex specificity of the system, i.e. all the advantage of working with a lower bound as in (1) is lost. There is surely no more information in two FliC proteins as there is in one.
Chris [ 17. May 2005, 21:38: Message edited by: Christopher D. Beling ]
|
Jerry D. Bauer
Member
Member # 756
|
posted 18. May 2005 01:11
Andy, that paper was not meant to show design in anything, just to calculate the information content of the flagellum of E. coli K-12.
It doesn't make any difference whether we use proteins or the nucleotide sequences of the protein-coding genes that encode them. Each amino acid in a protein can be traced back to a codon: three nucleotides, each drawn from the 4 bases present in the sequence. Just divide the length of the nucleotide sequence by three and you have the same math. It's six of one and half a dozen of the other.
Look up the term "specified" in the Encyclopedia and I think that will help you. And yes, longer sequences are more specified. When a sequence reaches 500 bits, this becomes CSI.
I would have no idea what you mean by "processing" CSI as opposed to SI.
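As a quick check on the protein-versus-gene question, here is a minimal sketch that applies the same equiprobable-symbol counting at both levels. It counts raw sequence space only and deliberately ignores the redundancy of the genetic code; the function name and the 100-residue example are assumptions for illustration.

```python
import math

BITS_PER_AMINO_ACID = math.log2(20)        # one of 20 residues
BITS_PER_NUCLEOTIDE = math.log2(4)         # one of 4 bases
BITS_PER_CODON = 3 * BITS_PER_NUCLEOTIDE   # three bases per codon

def raw_bits(n_residues):
    """Return (protein-level bits, gene-level bits) for a protein of n_residues."""
    protein_level = n_residues * BITS_PER_AMINO_ACID
    gene_level = 3 * n_residues * BITS_PER_NUCLEOTIDE
    return protein_level, gene_level

protein_bits, gene_bits = raw_bits(100)
print(f"protein: {protein_bits:.1f} bits, gene: {gene_bits:.1f} bits")
```

Counted this way, a codon carries 6 bits against about 4.32 bits for an amino acid, so the nucleotide-level total comes out larger, which is the sense in which the protein-level figure acts as a lower bound.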
|
Jerry D. Bauer
Member
Member # 756
|
posted 18. May 2005 01:47
Hey Chris:
Yes, it would have been more had I gone with nucleotide sequences on this (if I didn't consider amino acids at all), but that's not the main reason. If you'll read the literature that led up to my inspiration for that project, Miller (see The Flagellum Unspun), Dembski's response to Miller, and Matzke getting in the middle of it using the work of Berg and others, I think you will agree proteins are the focus of the current debate. I hope I offered some evidence to sway it.
But I would have to consider the number of proteins present, or Yockey's (also Brillouin's and Thaxton's) formula, Omega = I^N, simply would not work, because N is the total number of proteins present (of the same type).
quote: There is surely no more information in two FliC proteins as there is in one.
Ahh...but there is. Surely you would agree that two books contain more information than either one of them would alone.
Also, I would invite Dr. Miller to discuss the paper if he happens to be reading along. [ 18. May 2005, 08:54: Message edited by: Jerry D. Bauer ]
|
andyg
Member
Member # 415
|
posted 18. May 2005 12:20
Jerry - you misread me. I wrote "possessing", not "Processing":
What is your definition of "Specificity", and what - according to you - is the difference between possessing specified information, and possessing complex specified information?
You seem to be conflating "information content" with "specificity". According to the ISCID definition which you referred me to, they aren't the same:
quote: The second component in the notion of specified complexity is the criterion of specificity. The idea behind specificity is that not only must an event be unlikely (complex), it must also conform to an independently given, detachable pattern. Specification is like drawing a target on a wall and then shooting the arrow. Without the specification criterion, we'd be shooting the arrow and then drawing the target around it after the fact.
According to you, a string of 499 random bits does not have CSI, but a string of 501 random bits does have CSI. Correct? [ 18. May 2005, 12:53: Message edited by: andyg ]
|
Jerry D. Bauer
Member
Member # 756
|
posted 18. May 2005 21:37
So I did misread that. Sorry.
My mathematical definition for specificity (there can also be a yes or no answer to the simple question "Is this particular part in the system specified?"): Specificity is inversely proportional to the odds of an event occurring.
They didn't really go far enough with the classic archer example in the Encyclopedia. Let me expand on it.
If a skilled archer is blindfolded and stands a hundred yards from a huge wall, say the wall of a football stadium the size of the Astrodome, and is then asked to hit the wall with an arrow, we wouldn’t be surprised if he did hit it, because the wall is so large the odds are in his favor. In fact, the wall is an enormous target, and we might be surprised if he missed. This action would communicate to us as simple information when we observe the arrow hitting the wall.
Now let’s repeat the experiment, painting the walls of this same stadium in black and white squares, each measuring 10 feet square, in a ‘checkerboard’ pattern as illustrated in the graphic.

At this point, the archer is blindfolded and asked to hit a white square. Again, it would not be surprising if he did hit it because, providing he hits the wall at all, he has a 50/50 chance of hitting a white square as opposed to a black one. But when the arrow hits a white square, this information becomes a little more specified to us, because the odds of the archer hitting the white square are a bit more against him doing so than just hitting the wall.
Suppose this wall is then painted into an ever increasing number of smaller colored squares, first a wall of 4 squares, then 8 squares, 32 squares, 64 squares, etc., with each square being a different color. The odds of the archer hitting the color he is specifically instructed to hit will ever increase against him. With the simple two color checkerboard pattern, the archer had a one in two chance of hitting the specified color, or ½ = 50% chance. With the four color checkerboard pattern, he had a one in four chance, or ¼ = 25% chance of hitting the color and so on. These odds will continue to decrease as the squares increase in number until eventually there will be a point where the odds of him hitting the color he is asked to hit will become so astronomically large against him that it becomes overwhelmingly unlikely he will hit it. As the odds against the archer go up, so does specificity.
In fact, specificity is just the denominator:

There must be a barrier at which those odds become so high against the archer that the event becomes impossible for him to achieve.
There is, and this is called the universal probability bound. Based on the number of particles in the universe, the shortest possible time unit (tP, the Planck time) and the age of the universe, it has been calculated that an event with a probability of less than 1 in 10^150 cannot happen in nature, and if we find one, we can certainly begin to suspect design.
If we take the base-2 logarithm of this barrier, we come out in bits: log2(10^150) is about 498, which rounds to 500. Thus the universal complexity bound is 500 bits.
Here is the difference between specificity and complexity:
Specificity = 10^150
Complexity = 500 bits
When considered together we have CSI: information so complex and specified that it cannot be a result of nature. [ 18. May 2005, 21:41: Message edited by: Jerry D. Bauer ]
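The archer arithmetic in this post can be sketched as follows, assuming every square is equally likely to be hit and exactly one square is the named target. The function name is hypothetical; the last line converts the 1-in-10^150 bound into bits.

```python
import math

def odds_and_bits(n_squares):
    """Chance of hitting one named square out of n equal squares, and log2(n)."""
    return 1 / n_squares, math.log2(n_squares)

for n in (2, 4, 8, 32, 64):
    chance, bits = odds_and_bits(n)
    print(f"{n:>3} squares: chance {chance:.4f}, {bits:.0f} bits")

# The bound of 1 in 10^150, expressed in bits.
bound_bits = 150 * math.log2(10)
print(f"10^150 corresponds to about {bound_bits:.1f} bits")
```

The bound works out to roughly 498.3 bits, which is where the rounded figure of 500 bits comes from. Under this definition the "specificity" and the bit count are the same quantity on two scales, one being the logarithm of the other.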
|
andyg
Member
Member # 415
|
posted 19. May 2005 12:45
While I appreciate your explanation, I would suggest that it is very different from Dembski's definition.
quote: The basic intuition here is straightforward. A single letter of the alphabet is specified without being complex (i.e., it conforms to an independently given pattern but is simple). A long sequence of random letters is complex without being specified (i.e., it requires a complicated instruction-set to characterize but conforms to no independently given pattern). A Shakespearean sonnet is both complex and specified.
This was taken from his article Explaining Specified Complexity.
In your example, the simple checkerboard pattern is specified but not complex. If the black and white pattern is replaced with a random array of 10 different colours, it is complex but not specified.
I welcome Salvador's comments on this.
|
Jerry D. Bauer
Member
Member # 756
|
posted 19. May 2005 17:19
quote: While I appreciate your explanation, I would suggest that it is very different from Dembski's definition.
Please mathematically state Dembski's version of it.
|
andyg
Member
Member # 415
|
posted 19. May 2005 18:46
I wasn't aware he had stated it mathematically. I cited an article from him above. If you disagree with his definition, you should say why, and preferably join in the discussion with Salvador on the other thread, because he is claiming that very short strings can have CSI. You can't both be right. :) [ 19. May 2005, 21:31: Message edited by: andyg ]
|
Jerry D. Bauer
Member
Member # 756
|
posted 19. May 2005 21:32
Oh, Sal and I can say opposite things and both be right. We’ve been doing this for years and if you want to know who is always right in these conversations, just ask us and the answer will always come out: “both.”
I hoped you would go into the math and notice something. This is based on the work of Dembski, but it is quite different math than the original and there is a reason for this. Dembski’s initial proposal had a flaw in it. We can read about that flaw in the ISCID Encyclopedia as they are quite frank and up front with it:
“Criticisms of Dembski's notion of specified complexity often target the notion of specification. Critics argue that it is a subjective concept, highly dependent on the observer's background knowledge and therefore not reliable as a scientific criterion.”
It WAS subjective as it was not mathematically quantified. Shifting the math the way I did, we can very accurately calculate the specificity in the simplest systems to the most complex and shift the complexity of information to where it should be, over to Shannon’s bits.
Dembski actually did this, but I’m not sure he realized he did. We all are familiar with his universal probability bound being a barrier in that any event with more than 1 chance in 10^150 of occurring cannot happen in nature without design. But this is the SPECIFICITY not COMPLEXITY because the specificity directly relates to the odds of an event occurring similar to his analogy of painting a tiny circle on a huge wall and having an archer hit it. He admits this is a highly specified event should the archer hit it, but it only is because the odds are staggering against him doing so. So the odds are the specificity.
He also defines another bound: "Accordingly, specified information of complexity greater than 500 bits cannot reasonably be attributed to chance. This 500-bit ceiling on the amount of specified complexity attributable to chance constitutes a universal complexity bound for CSI."
Note another term for this: universal COMPLEXITY bound and this is expressed in Shannon’s bits.
So Dembski nailed this from the beginning, he just did not do so mathematically. I am saying the same thing in math that Dembski said with words.
|
andyg
Member
Member # 415
|
posted 19. May 2005 21:55
I think we should take this discussion to the other thread. But before I do, I want to suggest (again) that you are quite wrong in your idea of what constitutes design:
quote: We all are familiar with his universal probability bound being a barrier in that any event with more than 1 chance in 10^150 of occurring cannot happen in nature without design.
I am looking out of my window now, and I see a passing cloud. The cloud is made up of a huge number of water molecules. The probability of those water molecules coming together in the atmosphere to form a cloud of that particular shape is well below 1 in 10^150. Are we to conclude that clouds are intelligently designed?
As others have pointed out, the probability of being dealt 13 hearts in a bridge hand is just the same as any other bridge hand. However, 13 hearts signifies something special in a way that any old hand does not. This requires some extra property, other than simple probability, to tell us about its significance. Just as when you say
quote: He admits this is a highly specified event should the archer hit it, but it only is because the odds are staggering against him doing so.
No - the event is specified not simply because it is improbable (after all - hitting any part of the wall is equally improbable in this example), but because there is something special about the target.
Please direct your reply to the other thread. Thanks. [ 19. May 2005, 22:23: Message edited by: andyg ]
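The bridge-hand point can be checked directly. The sketch below assumes a fair deal from a standard 52-card deck; the variable names are illustrative.

```python
import math

# Number of distinct 13-card hands from a 52-card deck.
hands = math.comb(52, 13)

# Every particular hand, all hearts or otherwise, has the same chance.
p_specific_hand = 1 / hands
bits_per_hand = math.log2(hands)

print(f"{hands:,} possible hands")
print(f"chance of any one named hand: {p_specific_hand:.3e}")
print(f"about {bits_per_hand:.1f} bits")
```

There are 635,013,559,600 possible hands, so 13 hearts and any named "ordinary" hand each come up with the same probability, about 39 bits' worth. Probability alone therefore cannot be what marks 13 hearts as special.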
|
Christopher D. Beling
Member
Member # 723
|
posted 23. May 2005 12:04
Jerry, just replying to your comment:
quote: Ahh...but there is. Surely you would agree that two books contain more information than either one of them would alone.
I would like to refer you to NFL (p. 129), quoting Dembski:
"For an example in the same spirit consider that there is no more information in two copies of Shakespeare's Hamlet than in a single copy. - - - .....Our information-theoretic formalism therefore agrees with our intuition that two copies of Hamlet contain no more information than a single copy"
What do you make of this? I am also interested that you say Thaxton and Yockey include the total number of proteins. Why?
Chris [ 23. May 2005, 12:05: Message edited by: Christopher D. Beling ]
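The two positions on copies can be set side by side. The following is one standard way to write the Hamlet remark using conditional information; it is offered as a sketch and is not necessarily the notation used on the cited page. The symbols A and B stand for events, and the last line simply takes the logarithm of the Omega = I^N formula quoted earlier in the thread.

```latex
\begin{align}
I(A) &= -\log_2 P(A) \\
I(A \,\&\, B) &= I(A) + I(B \mid A) \\
I(A \,\&\, A) &= I(A) + I(A \mid A) = I(A) - \log_2 1 = I(A) \\
\log_2 \Omega &= \log_2 I^{N} = N \log_2 I
\end{align}
```

On the first reading a second copy adds nothing, because it is fully determined by the first. On the second reading each of the N copies is counted as if it were drawn independently, so the total grows linearly with N. The disagreement in the thread is over which of these applies to repeated proteins.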