Topic: Towards a simple definition of CSI
|
Irving
Member
Member # 535
|
posted 20. March 2006 18:03
quote: Therefore, there are some concepts that are part of Specified Complexity that are not covered by Shannon (Shannon is more general).
Shannon is less general...since there are concepts that are part of Specified Complexity that are not covered by Shannon. CSI = Shannon + [these other concepts]. CSI is Information about Information.
|
|
Bruce Fast
Member
Member # 924
|
posted 20. March 2006 19:14
quote: quote:
Therefore, there are some concepts that are part of Specified Complexity that are not covered by Shannon (Shannon is more general).
Shannon is less general...since there are concepts that are part of Specified Complexity that are not covered by Shannon. CSI = Shannon + [these other concepts]. CSI is Information about Information.
Let me see, "there are concepts [(specifics)] that are part of Specified Complexity that are not covered by Shannon." CSI is more Specific, specific is the opposite of general, Shannon is more general. [ 20. March 2006, 19:15: Message edited by: Bruce Fast ]
|
|
Irving
Member
Member # 535
|
posted 20. March 2006 19:50
quote: Let me see, "there are concepts [(specifics)] that are part of Specified Complexity that are not covered by Shannon." CSI is more Specific, specific is the opposite of general, Shannon is more general.
Only if there were concepts of Shannon not covered by CSI would Shannon be more general. CSI is not more specific; it includes concepts "outside" of Shannon. Shannon is a "sub-set" of CSI. Shannon would be more general if it covered all of CSI and then some. Or would you prefer to say that CSI is more comprehensive than Shannon? A CSI pattern may be fully compressed à la Shannon, but the pattern itself [and its compression] is not ALL of CSI.
In a Venn Diagram you have one of three options...
1. The CSI circle is fully within the Shannon circle.
2. The Shannon circle is fully within the CSI circle.
3. The CSI circle and Shannon circle overlap, but one does not envelop the other.
I'd say #2, and may be convinced of #3, but I wouldn't say #1.
|
|
Christopher D. Beling
Member
Member # 723
|
posted 20. March 2006 19:53
Irving, Sorry, but the correct answer is #1. One must not confuse the concepts of CSI and Shannon Entropy. The Shannon Entropy of some code (irrespective of whether it forms any specified object after its implementation or not) is simply the measure of complexity for that code (or its object). If you like, it is the code's POTENTIAL information-bearing capacity. It has nothing to do with whether the code bears any meaningful (functional (specified)) information or not.
For example Yockey works out the Shannon Information (Entropy) for the iso-1-cytochrome c protein homologues as 233 bits in his book Information Theory, Evolution and the Origin of Life p75. However, there are a HUGE number of amino acid sequences having this amount of Shannon Information of which the VAST MAJORITY have absolutely no specification. Any string of ~54 completely RANDOM amino acids (most strings leading to absolutely no function) would have a similar Shannon Entropy (since one has ~4.3 = log2(20) bits of Shannon Info per amino acid). So we must agree that Shannon Information is the MEASURE OF COMPLEXITY and has nothing whatsoever to do with specification. It is the C part of CSI. Indeed we can define the complexity operator, Cop.
Cop(X)=-log2(P(X))
where X is ANY element of the configuration space, such that P(X) is the probability of getting X purely by chance. For an object to qualify for CSI its complexity component must have at least 500 bits of Shannon Entropy (or Shannon "information"), corresponding to a probability of ~10^-150. Hope this helps. Chris [ 20. March 2006, 20:12: Message edited by: Christopher D. Beling ]
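A minimal sketch of the complexity operator above, in Python. The names `cop` and `UPB_BITS` are illustrative labels that do not come from the post; the 500-bit threshold, the 233-bit figure and log2(20) are the values quoted above, and the per-residue figure assumes all 20 amino acids are equally likely.

```python
import math

UPB_BITS = 500  # threshold quoted in the post; the constant name is an assumption

def cop(p: float) -> float:
    """Complexity in bits of an outcome with chance probability p: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Bits per amino acid if all 20 are equally likely (about 4.32).
bits_per_residue = math.log2(20)

# Length of a random chain carrying 233 bits (about 54 residues).
residues_for_233_bits = 233 / bits_per_residue

# Chance probability that corresponds to exactly 500 bits.
p_at_threshold = 2.0 ** -UPB_BITS
```

Here `p_at_threshold` works out to roughly 3 x 10^-151, which is the order of magnitude the post rounds to 10^-150.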
|
|
Irving
Member
Member # 535
|
posted 20. March 2006 20:49
quote: Irving, Sorry, but the correct answer is #1. One must not confuse the concepts of CSI and Shannon Entropy. The Shannon Entropy of some code (irrespective of whether it forms any specified object after its implementation or not) is simply the measure of complexity for that code (or its object). If you like, it is the code's POTENTIAL information-bearing capacity. It has nothing to do with whether the code bears any meaningful (functional (specified)) information or not.
Sorry Christopher, but I don't buy #1. I understand what Shannon Entropy is. You are correct that Shannon is a measure of complexity...the C in CSI. BUT, CSI is MORE than just the C. It is the S as well (you can't ignore that). Which has been my point all along. CSI is MORE than just the numerical measure of the complexity of the Pattern. Any attempt to reduce CSI to just another fancy name for Complexity is doomed to miss the point.
|
|
David L. Hagen
Member
Member # 323
|
posted 20. March 2006 22:36
Thanks for your thoughtful comments, Bruce, on my proposition: quote: Reverse Engineering can typically recover "embodied" information and convert it to "encoded" design information. (Anyone have a better word than "embodied"?)
You stated: quote: I appreciate the term "embodied" information, yet I radically reject the "reverse engineering can typically recover" concept (when you do not permit access to the "encoded" information that may linger within the embodied info.). Consider the following: 1 - If you give a computer programmer the challenge of making a "functionally equivalent" Microsoft Word program, the result could be very functionally equivalent, but the source code will be wildly different.
Thanks for distinguishing between transformations between encoded systems, versus the difficulties between encoded vs embodied information.
Note my qualification: “typically recover ‘embodied’ information . . .”
Here I would consider the “functionally equivalent” features to be the CSI or “Design Information,” while the variations in source code between implementations are “synonymous” systems that achieve that CSI. I.e., where the CSI is defined as the “patent claims.” (Compare Yockey (2005) describing as “synonymous” the multiple codons that translate to a given amino acid, which others refer to as “degenerate” info, or as “redundant” info.)
However, where “program logic and operands” is the CSI, then the various programs that duplicate that “logic and operands” are synonymous, whatever the code used to implement them, e.g., across various character sets.
Reverse engineering can “typically” recover the electronic design from examining electrical and common electronic components. Even computer chips need to have the masks protected by copyright as well as patents, because of the relative ease of reverse engineering.
-----------------------
quote: 3 - Even an examination, and reverse engineering of Mt. Rushmore will produce blueprints, but those blueprints will likely carry significant differences to the originals, even if the resultant blueprints could do an excellent job of reproducing the presidents' faces.
From an engineering point of view, the “engineering design” required to form Mt. Rushmore can effectively be recovered by measuring Mt. Rushmore. That can be reduced to surface curvature definitions. Looking at deviations from smooth surfaces can give some measure of the tolerance in the original design specification. By contrast, where we can examine 10,000 samples of an article, a much closer estimate of the uncertainty can be obtained. While the original “blueprint” would not be recovered, an equivalent or synonymous design specification can usually be recovered.
The larger the number of components that can be evaluated, the more closely the tolerances of the original design specification can be recovered.
-------------------
quote: 2 - If you examine my phenotype in every detail (restricted to not examining the DNA or RNA, examine the proteins all you want) you cannot recover my DNA. For instance, my recessive genes leave no mark in my phenotype (as I understand it).
I do not consider “recessive genes” as “information that is transformed.” Rather they are discarded information or a “non-conservative” transformation of encoded to embodied information. Furthermore, the conversion of DNA to mRNA is a “synonymous” conversion which is not a completely conservative conversion. It converts the function specification (assuming no mutations), but not all the information preservation features such as included in the synonymous factors of the code from 64 codons to 20 amino acid components.
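The many-to-one step from 64 codons to 20 amino acids mentioned above can be illustrated with a small Python sketch. The table lists how many codons encode each amino acid in the standard genetic code (61 sense codons in total); the function names are hypothetical, and treating every synonymous codon as equally likely is a simplifying assumption.

```python
import math

# Number of codons per amino acid (one-letter codes) in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def synonymous_dna_count(protein: str) -> int:
    """Number of distinct coding sequences that translate to this protein."""
    return math.prod(DEGENERACY[aa] for aa in protein)

def bits_not_recoverable(protein: str) -> float:
    """Bits of codon choice that the protein sequence alone does not fix,
    assuming synonymous codons are equally likely."""
    return sum(math.log2(DEGENERACY[aa]) for aa in protein)

print(synonymous_dna_count("MLS"))  # 36: Met has 1 codon, Leu 6, Ser 6
```

Even a three-residue peptide such as Met-Leu-Ser has 36 possible coding sequences, so the original codon choices cannot be read back from the protein alone.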
Non-conservative information
There are also substantial categories where embodied information was non-conservatively transformed from CSI, and is not recoverable. One-way transformations are designed to be non-conservative and very difficult or in principle impossible to reverse engineer. E.g., hashing to form a digital signature.
---------------------------
To clarify “typically” in my previous proposition:
1) Encoded CSI can be transformed between conservative code systems.
2) Encoded CSI can be transformed to a subset of a synonymous code set.
3) Conversion of CSI from a synonymous code set to an equivalent synonymous subset code is only partially conservative of information preservation.
4) Encoded CSI that is conservatively transformed to embodied information can in principle be recovered by reverse engineering.
5) An increasing portion of encoded CSI that is non-conservatively transformed to embodied information can in principle be recovered as a function of the increasing number of embodied components that can be evaluated.
6) The full original CSI cannot be recovered from non-conservative transformations.
7) Transformations from an extremely large code space to a reduced embodied phase space are practically non-conservative.
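As a rough illustration of the one-way hashing example mentioned in this post, the sketch below uses SHA-256 from Python's standard `hashlib`. The post names no particular hash function, so SHA-256 is an assumption, as are the helper name `digest` and the sample messages.

```python
import hashlib

def digest(message: bytes) -> str:
    """Fixed-size SHA-256 fingerprint of a message, as a hex string."""
    return hashlib.sha256(message).hexdigest()

short = digest(b"blueprint")            # 9-byte input
long_ = digest(b"blueprint" * 10_000)   # 90,000-byte input

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short), len(long_))
```

Because inputs of any length map to the same 256-bit output, many inputs necessarily share each digest, so the original message cannot in general be reconstructed from the digest. This is the sense of "non-conservative" used in points 6 and 7.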
-----------------
DH Ed March 21
DNA is such an enormous synonymous code space that the transformation to the smaller protein sequence space results in partial loss of information protection. Folding proteins to final form often requires chaperone molecules which at least have information from different genes, and possibly structural information replicated apart from the genome. The protein folding issues, and time delays - from DNA to RNA to protein sequence to folded protein - result in effectively a one-way hash or transformation of information in biotic systems. Thus it appears effectively impossible that environmental impacts on proteins would have any impact on the genome. This results in the “Central Dogma” of modern biology.
Correspondingly, it appears to be very difficult if not practically impossible within known computing possibilities, to go from a functional specification of a protein back to a DNA sequence sufficient to make that protein.
Thus for that to happen within the Universal Probability Bound stretches the imagination and the very definition of science and knowledge!
-------------------------
By “Design Information,” I start with the output of numerous categories of engineering. I now need to think about how to evaluate obvious design principles and how to apply that to evaluating biotic systems.
-------------
Re: Chris & Irving on Shannon entropy vs CSI
I understand Shannon Entropy ("Information") to be the measure of channel capacity per Chris's comments. Yet it has no content, hence Irving's comments that CSI is "more" than Shannon. [ 21. March 2006, 22:04: Message edited by: David L. Hagen ]
|
|
Bruce Fast
Member
Member # 924
|
posted 20. March 2006 23:46
I find your 7 points to be logically accurate. The most salient of them being point 7.
quote: 7) Transformations from an extremely large code space to a reduced embodied phase space are practically non-conservative.
DNA is such an enormous code space that the transformation to proteins as embodied information is practically a one way hash that is effectively impossible to transform back to a subset of DNA code space within the Upper Probability Bound. This results in the “Central Dogma” of modern biology.
David, to re-track this thread to its original point, what is your understanding of CSI? Does CSI, as you understand it, differ substantively from what I have called CBI? If so, in what way?
How does DI differ from CSI, from CBI?
|
|
Christopher D. Beling
Member
Member # 723
|
posted 21. March 2006 09:54
quote: BUT, CSI is MORE than just the C. It is the S as well (you can't ignore that). Which has been my point all along. CSI is MORE than just the numerical measure of the complexity of the Pattern. Any attempt to reduce CSI to just another fancy name for Complexity is doomed to miss the point.
Apologies to you Irving. I do agree with everything you have said here, but what I don't understand is why you don't then affirm option (1) which is surely the consequent. The family of all coding strings having Shannon Entropy ("information") H has 2^H members, which can be very large. For example the number of family members for the 64 bit Shannon "information" family is 2^64, approximately 1.8 E+19, which is very large. Thus the Shannon circle is a BIG circle enclosing many strings. The fraction of these strings that carries a specification (pattern) S - i.e. has both C and S - is very small (possibly only a single string, if the specification is very tight), so the circle enclosing the CSI is very small and it clearly lies within the Shannon circle. So I think we must affirm that 1. The CSI circle is fully within the Shannon circle. Chris [ 21. March 2006, 10:14: Message edited by: Christopher D. Beling ]
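The counting argument in this post can be sketched in Python on a toy scale. The "specification" used here (strings that read the same reversed) is a purely hypothetical stand-in chosen so the sets are small enough to enumerate; it says nothing about how a real specification would be defined, and it models only Chris's reading of the Venn diagram, not Irving's objection.

```python
from itertools import product

def family_size(h_bits: int) -> int:
    """Number of equiprobable strings carrying h_bits of Shannon information."""
    return 2 ** h_bits

def is_specified(s: str) -> bool:
    """Hypothetical stand-in specification: the string reads the same reversed."""
    return s == s[::-1]

H = 8  # toy size; the post's example uses 64 bits
shannon_circle = {"".join(bits) for bits in product("01", repeat=H)}
specified_circle = {s for s in shannon_circle if is_specified(s)}

print(family_size(64))                             # 18446744073709551616
print(len(specified_circle), len(shannon_circle))  # 16 256
print(specified_circle <= shannon_circle)          # True
```

On this reading the specified strings form a small subset of all strings of the same length, which is what option 1 asserts.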
|
|
David L. Hagen
Member
Member # 323
|
posted 21. March 2006 21:55
Bruce
CBI Complex Blueprint Information
quote: CBI is any packet of information containing: information (order) with the nature of a blueprint (an independent detachable pattern) and sufficient complexity (having limited compressibility even knowing the nature of the data) to extend beyond UPB.
quote: . . .what is your understanding of CSI? Does CSI, as you understand it, differ substantively from what I have called CBI? If so, in what way? How does DI differ from CSI, from CBI?
I think your CBI is effectively what I think of as “Design Information”.
I understand CSI to include the Encyclopedia Britannica and the complete works of Shakespeare. We understand these to be clear products of intelligence. Neither can be explained by natural law or by stochastic processes (the proverbial crew of monkeys on keyboards.) Yet I don’t think of these as under “Blueprints” or Design as typified by products of engineers etc. Thus, I consider CBI to be a subset of CSI, using more popular terminology of “blueprints.”
4) Blueprint, Template, or Specified?
Long term we want to develop methodology to evaluate the genome: cf Yockey(2005): quote: The existence of the genome and the genetic code divides living organisms from non-living matter. There is nothing in the non-living physico-chemical world that remotely resembles the reactions that are determined by a sequence (i.e., the genome) and codes between sequences (i.e., the genetic code) that occur in living matter. . . . Even many scientists do not understand the distinctions between DNA, which is material, and the genome and the genetic code, which are non-material.
Unfortunately, Yockey and others get very upset at describing the genome as a "blueprint." e.g., quote: The genome is sometimes called a "blueprint" by people who have never seen a blueprint. Blueprints, no longer used, were two-dimensional, a poor metaphor indeed, for the linear and digital sequence of nucleotides in the genome.
Yockey (2005) Sect 2.2 (sic - should be 1.3) p 6.
Speaking as an engineer who happens to have seen a blueprint, such comments are an ad hominem attack that reflects a poor understanding of engineering drawings (“blueprints”) and/or manufacturing. e.g.,
Blueprints as encoding 3D Design Information
1) Specialist Coding: It requires training to learn how to read the engineering design information incorporated into an engineering drawing or “blueprint”. (e.g., show a blueprint to the average “unwesternized” tribesman and they would probably find it useful to start a fire.)
2) 1D-3D Information: Blueprints can encode one, two and three dimensional information as vectors and symbols;
3) Convertible to 3D CNC code: Industrial processes can re-encode the blueprint encoded information into a "one dimensional" sequence of codes to run a "Computer Numerical Control" ("CNC") machine;
4) Translatable to 3D product: CNC machines can in turn convert the 3D information encoded in the "1D" sequence into a three dimensional product.
Thus, I understand a “blueprint” to usually embody three dimensional design information which can be encoded into a CNC sequence from which a three dimensional product can be constructed. This is parallel to the information in a genome that can be translated via mRNA to form a 3D protein. (See also: “‘blueprint’ is a very misleading term . .” http://www.iscid.org/ubbcgi/ultimatebb.cgi?ubb=get_topic;f=6;t=000589;p=3#000030)
The term “template” gets a similar undeserved diatribe: quote: The linear structure of DNA and mRNA is often referred to as a template. A template is two-dimensional, is not subject to mutations, nor can it reproduce itself. This is a poor metaphor as anyone who has used a jigsaw will be aware.
Yockey (2005) Sect 2.2 (sic - should be 1.3) p 6
Such statements apparently indicate a limited understanding of manufacturing or of molecular biology. It is true that “template” is commonly used in engineering for a two dimensional guide. However “template” is also used for a three dimensional “pattern” or “guide” or “model” in both engineering and molecular biology. Cf quote: Eng: 1 A two-dimensional representation of a machine or other equipment used for building layout design. 2 A guide or a pattern used in manufacturing items. Mol Biol: The macromolecular model for the synthesis of another macromolecule.
McGraw Hill Dictionary of Scientific & Technical Terms 6th Ed 2003.
So I understand your “Complex blueprint information” as embodying design information that can be encoded into CNC sequence.
Any suggestions as to how to define terms and “not cause offense” with Blueprint or Template? Or we just explain all that Blueprint means and say it means what we define it to mean?
(On “Design Information” I am working to put together a detailed description of it and its numerous implications and the relationship vis CSI, Shannon Information, and Kolmogorov complexity. More on that in a future brainstorm.)
(PS I edited my comments on DNA in the previous post to clarify what I intended re the difficulties of obtaining a DNA sequence from a functional specification of a protein. But per your request we will keep discussions on that for another brainstorm.) [ 22. March 2006, 21:04: Message edited by: David L. Hagen ]
|
|
David L. Hagen
Member
Member # 323
|
posted 21. March 2006 22:24
Christopher quote: The fraction of these strings that carries a specification (pattern) S - i.e. has both C and S is very small - (possibly only a single string - if the specification is very tight) - so the circle enclosing the CSI is very small and it clearly lies within the Shannon circle? So I think we must affirm that 1. The CSI circle is fully within the Shannon circle.
Conversely, could not some specification in CSI be incompressible? e.g. what if we assigned each of 6 billion persons a unique but random number not previously assigned within 6 billion, and added their gender. The number can be reassembled from the order assigned and the numbers and genders. I think of the concatenation of those numbers and genders in order of assigning them to be CSI.
Measure of CSI <= Shannon
If so, would not the CSI in bits then be equal to the Shannon information for that string? If so, then the measure of at least one CSI number would equal Shannon Info, and not be "fully within the Shannon circle".
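The assignment example above can be put in numbers with a short Python sketch. It assumes the numbers form a uniformly random ordering of 1..N and that gender is recorded as one independent bit per person; both are simplifying assumptions, and the function names are hypothetical. Whether the resulting string counts as CSI is the question the post raises and is not something the code settles.

```python
import math

def permutation_bits(n: int) -> float:
    """log2(n!) via lgamma: bits needed to single out one ordering of n items."""
    return math.lgamma(n + 1) / math.log(2)

def assignment_bits(n_people: int) -> float:
    """Bits for a random unique number per person plus one gender bit each."""
    return permutation_bits(n_people) + n_people

total = assignment_bits(6_000_000_000)
print(f"{total:.3e} bits")
```

Under these assumptions the total is on the order of 10^11 bits, and a uniformly random ordering cannot be described in fewer bits on average, which is the incompressibility the post appeals to.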
|
|
Irving
Member
Member # 535
|
posted 21. March 2006 22:29
quote: ...but what I don't understand is why you don't then affirm option (1) which is surely the consequent.
Because Shannon is only applicable to the complexity of the pattern, not the whole of CSI. A particular, embodied instance of an information pattern is within the Shannon circle. However, a particular pattern of information is NOT CSI. It is only CSI when it is an independent, detachable pattern from the physical processes which bring it about. Thus Shannon can characterize the "pattern," but not the independent and detachable nature of that pattern. There are attributes of CSI that are "outside" of Shannon.
I will concede that ALL instances of a CSI pattern can have a Shannon Entropy, and that NOT every pattern that has a Shannon Entropy is CSI...thus if dealing specifically with ONLY the Shannon Complexity of an information pattern, the Shannon Complexity of CSI is within the circle of Shannon. However, with CSI one cannot separate the Complexity of an information pattern from its Specification. To do so is to render what makes a pattern CSI...meaningless...and thus "in such treatment" it is no longer CSI. Any characterization of CSI must include BOTH the complexity AND the specification for it to have any value. The nature of Specification places CSI outside the Shannon circle...
|
|
Irving
Member
Member # 535
|
posted 21. March 2006 22:44
quote: How does DI differ from CSI, from CBI?
I think the tie between Computer Software, DI, CBI, CSI, DNA, Language, and the Genome lies somewhere within the concept of symbolic representation. Blueprints use symbols to represent a physical instantiation. Software uses 1s and 0s to symbolically represent reality. Language uses symbols to communicate. It is the use of symbols (virtual reality), that is indicative of abstract reasoning. Symbols are an abstraction from reality and may constitute an independent, detachable specification.
Mt. Rushmore is a symbolic representation of 4 Presidents.
|
|
Bruce Fast
Member
Member # 924
|
posted 21. March 2006 23:28
David L. Hagen,
I appreciate your analysis of the CBI term. A sense of "this is acceptable" is really important. I have no great bond to the CBI term, it just kinda squirted out because CSI wasn't quite fitting. I am personally quite comfortable with your DI terminology.
Doug Wedel's thread "Must we infer a designer..." discusses pi as CSI information. It may be, I still am having trouble grasping what CSI is that DI is not, but I think you agree that the value of pi is not DI information, correct?
|
|
Christopher D. Beling
Member
Member # 723
|
posted 22. March 2006 20:28
Irving, I am agreeing with most of what you say but still have some difficulties - which I think come from a different perception on words. quote: Because Shannon is only applicable to the complexity of the pattern, not the whole of CSI
Could you describe to me what you mean by the "complexity of a pattern". I have a concept of a pattern as either being a code (or embodied code) that has some symmetry, or some algorithm forming it. If this is what you mean by "pattern" then "complexity of the pattern" must mean the total amount of Shannon Info (entropy) in bits required to prescribe (or describe) the particular pattern? Am I right?
If this is what you mean then there is the problem of compression - the "patterns" that have symmetry can be formed by algorithms - and thus may look as having high complexity (i.e. large number of bits - Shannon entropy), but in fact the Kolmogorov (compressed) complexity is much smaller.
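The compression point can be illustrated with a rough Python sketch using the standard `zlib` module. Compressed size is only a crude upper bound on Kolmogorov complexity, which is not computable in general, so this is an illustration under that assumption and not a measurement; the sample data and names are hypothetical.

```python
import random
import zlib

N = 4096  # length of each sample in bytes

# A highly patterned string: "AB" repeated. A short algorithm generates it.
patterned = b"AB" * (N // 2)

# A string with no pattern visible to zlib: seeded pseudo-random bytes.
rng = random.Random(0)
unpatterned = bytes(rng.getrandbits(8) for _ in range(N))

patterned_size = len(zlib.compress(patterned, 9))
unpatterned_size = len(zlib.compress(unpatterned, 9))

print(N, patterned_size, unpatterned_size)
```

Both strings occupy the same 4096 bytes, yet the patterned one shrinks to a small fraction of that while the other hardly shrinks at all. Note that the "unpatterned" bytes come from a short seeded program, so their true Kolmogorov complexity is also small; zlib simply cannot find that description, which shows why compressed size is only an upper bound.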
I also have a problem with the word "complexity". It seems it is used in two ways by IDers - which leads to some confusion: (i) It can just be used in the sense of "complexity" measure - in which case it gives the information (in bits) required to prescribe (or describe) the event (object). Small number corresponding to low complexity and large number to high complexity. I think this is the normal usage. (ii) It can also be used in the sense used by Micah (12th March): quote: Complexity (Universal Probability Bound)
that is, an event (object) that has complexity is one whose complexity measure exceeds the UCB (Universal Complexity Bound), i.e. whose chance occurrence falls below the UPB (Universal Probability Bound). Personally I think we should not use the word "complexity" in this case, but talk about the event (object) as being "complex".
Do you agree on this double usage? Do you agree that we should only use "complex" and not "complexity" when we are talking of an event beyond the UCB? How are you using the word? Sorry if I am being a nuisance. Chris [ 22. March 2006, 20:35: Message edited by: Christopher D. Beling ]
|
|
Irving
Member
Member # 535
|
posted 24. March 2006 17:01
quote: Could you describe to me what you mean by the "complexity of a pattern". I have a concept of a pattern as either being a code (or embodied code) that has some symmetry, or some algorithm forming it. If this is what you mean by "pattern" then "complexity of the pattern" must mean the total amount of Shannon Info (entropy) in bits required to prescribe (or describe) the particular pattern? Am I right?
To me you are correct. What is at issue is that not every pattern is CSI, even if its complexity exceeds the UPB. So you can calculate complexity measures above and below the UPB all day long, and not be dealing with CSI.
The exact same pattern could be CSI in one environment, and not CSI in another. The issue is Specification.
|
|
|