|
Author
|
Topic: algorithmic info, probability, etc.
|
Groucho
Member
Member # 605
|
posted 12. December 2002 18:15
Hey all. I'm new.
[Note: This message board was rejecting my post until I replaced all my parentheses with brackets. Is it because they were unbalanced? Hope that clears up any confusion.]
Could someone please tell me if the following is: a] obvious b] vacuous c] specious d] all of the above e] none of the above
If the answer is e, has someone already come up with something similar?
For all z, let H[z] denote the algorithmic information in z, a la G. Chaitin, i.e. the size of the smallest program Pz in binary bits that generates z for all input [i.e. where input is ignored]. Note that if z is random, then H[z] = len[z], where len designates the number of bits in z.
For all z, let prob[z] [i.e. the probability of z] be equivalent to 1/2^H[z]. The reasoning for this is as follows: Suppose z is an n-bit number whose value changes at random. An initial presumption might be that [for some specific constant C] prob[z = C] = 1/2^len[z], but this is not true. To say that z is changing at random is equivalent in one sense to saying that we are ignorant of what is effecting the change. However, whatever is effecting the change in z must have a description, and presumedly a programmatic description. So let Pz designate this program that is changing the value in z. [We are assuming that any “input” to Pz is actually part of Pz]. If the bits in Pz were changing at random, what is the probability that Pz would output C? Thus the probability of z = C is equal to the probability of the smallest program Pz that could generate c. [Note: Pz is provably random.]
Note: The distinction between a program and the input to a program is an arbitrary an artifical one. Both input and program can be treated as binary strings. The binary string input to a program could be treated as a numeric constant or decoded and interpreted by the program as more instructions. The binary string encoding the program could contain segments that were interpreted by the program not as instructions but numeric constants. We could take a program P of length n, cut off some arbitrary portion of it of length m and call this portion the input to P. In any case, we assume that a program is decoded on some abstract computer device understanding some rudimentary set of instructions [e.g. an Unlimited Register Machine]. In such a scenario there will of necessity be arbitrary conventions as to what portion of the string designates the input and which portion designates the program.
Let b denote a binary string encoding the biological world as we see it today. Let m denote a binary string encoding the mutations [ along with relevant static aspects of the physical world]. Let f_mb be a set denoting any program that will transform m into b. Let P_mb be the smallest such program. [In this instance note that we are assuming that P_mb has input, i.e. m., that is not considered part of P_mb] Let e denote the actual laws in our universe comprising natural selection that transformed the input mutation sequence m into b. Thus, e is an element of f_mb.
Theorem 1: The combined algorithmic information in E and m must equal or exceed the algorithmic information in b. More specifically: H[m] + H[e] >=H[b]. Proof: [This should really be obvious, but what the heck:] Lemma 1: H[P_mb] <= H[e]. P_mb desgnates the smallest program that will transform m into b. If the smallest way to transform m into b is to simply use e, then we can just designate P_mb = e. Thus H[P_mb] cannot be greater than H[e]. It is trivial to show that H[e] could be greater than H[P_mb]. Lemma 2: H[P_mb] + H[m] >= H[b]. Suppose the contrary. Then H[P_mb] + H[m] < H[b], which mean that H[P_mb] + H[m] < len[Pb]. By combining Pm [i.e. a minimal program that generates m] with P_mb, we can form a program Pb' that generates b without reading input m and is smaller than Pb. However, this is not possible, since Pb is the smallest program to generate b.
H[e] + H[m] >= H[b] follows directly from lemmas 1 and 2.
From the above, we can directly infer that prob[e]*prob[m] <= prob[b]. Look at this from a slightly different angle: It is easy to see for all f,x,z that, prob[f[x] = z] = prob[z]. To return to our example from above, assume z is an n-bit random number whose value is being changed - not but one program Pz - but by a program Pxz and an input value x. [Note that we are equating f to Pxz.] Thus for any two random values Pxz and x what is the probability that Pxz[x] [i.e. f[x]] will generate output z? It has to be equal to 1/2^H[z], i.e. 1/2^len[Pz]. [Maybe not a proof, but should be obvious enough.]
Thus, the probability of the biological world arising by pure chance [i.e. by a “tornado in a junkyard”] is no more unlikely than the probability of e and m together arising by chance. Note furthermore that if the algorithmic information in e [natural selection] is very small [i.e. its probability is large] then of necessity the algorithmic information in m [the mutation sequence] must be huge, and comparable to the algorithmic info in the biological world itself. OTOH, If the probability of m occuring is at all feasible, then we are of necessity inferring an encoding of natural selection of considerable size, and comparable to the algorithmic info in the biological world itself. From this fact could we not draw inferences regarding the “intelligence” of natural selection? Generally speaking, the “intelligence” of a computer program or biological organism is directly related to the complexity of its behavior, which is directly related to the amount of algorithmic information encoding it.
Dembski, if you happen to read this, I was wondering if you could humor me and comment on it. From previous discussions I’ve read, I get the vague feeling there is something fundamental about the nature of information that invalidates my conclusions above. [It may be obvious to you that in certain aspects of my discussion I am self-taught.] I know that a large random number requires a very large program to generate it, but nevertheless [as I noted] intelligence is usually directly equatable to complexity, is it not? Is it meaningless or pointless to infer the amount of algorithmic information in natural selection from the amount in the biological world? In any case, your comments on the above [including the probability issues] would be appreciated. [ 17. December 2002, 23:59: Message edited by: Groucho ]
IP: Logged
|
|
Moderator
Administrator
Member # 1
|
posted 12. December 2002 19:04
Hey Groucho. Welcome to Brainstorms!
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 12. December 2002 22:31
An additional observation:
P_mb was designated as the smallest program that will transform the input mutation sequence m into the biological world b. P_mb is thus a measure of how "close" m was to b to begin with. For example if m can be transformed into b via a very simplistic program [e.g. "flip all bits in m" "add 1 to m", etc.] then the mutation sequence m is practically the same as the output b. Also, len[P_mb] <= H[b]. [The proof is fairly trivial so I'll leave it off.] Thus the maximum distance seperating m from b is H[b], the length of P_mb if it could not exploit anything in m in generating b. [ 15. December 2002, 11:30: Message edited by: Groucho ]
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 13. December 2002 22:28
Before my initial post above, I had perused this forum only long enough to notice that Bill Dembski was a frequent contributor. However, I now realize that he doesn't actually ever respond to anybody.
But I'm somewhat surprised that no one else saw fit to respond to my post in any way. After further perusal today, I've noticed that some of the most preposterous, incoherent, obscurantist, pseudo-scientific nonsense [posted repeatedly in this forum by persons who shall remain nameless] elicits numerous replies. By comparison, my post is a model of clarity. Is what I've written really so irrelevant, so incoherent, or so ill-informed to merit no comment whatsoever? Is there no one in the [ahem] "International Society For Complexity Information and Design" with enough knowledge of Computability Theory to respond to my posts in any way [with objections, corrections, or whatever]?
Believe me, I'm not begging for replies. I guess I just should have lurked longer before bothering to post. [Admittedly, the clarity of my post was compromised by my failure to proof-read adequately, entailing the subsequent corrections.]
By this point I'm sure writing for an audience of one [me], but what the hell.
In my initial post, I stated that: H[m] + H[e] >= H[b],
i.e. the algorithmic information in the laws comprising natural selection combined with that of the mutation sequence must equal or exceed the algorthmic information in the biological world today.
That is certainly true [and maybe obvious] but the statement of it can be made stronger:
Suppose that m' is a mutation sequence such that e[m'] = b and for all x, if e[x] = b then H[m'] <= H[x].
IOW, m' is the smallest mutation sequence that e will transform into b. Then
H[m'] + H[Pmb] = H[b] and H[m'] + H[e] >= H[b]. [Pmb is defined as in my initial post - the smallest program that will transform m into b.] [ 15. December 2002, 11:31: Message edited by: Groucho ]
IP: Logged
|
|
Moderator
Administrator
Member # 1
|
posted 13. December 2002 23:29
Groucho, it might good to wait at least a week before complaining about people ignoring your post.
Also, the following comments don't work, as they are simply demeaning to others on this board.:
quote:
I've noticed that some of the most preposterous, incoherent, obscurantist, pseudo-scientific nonsense (posted repeatedly in this forum by persons who shall remain nameless) elicits numerous replies.
We don't require that every idea that pops into a person's head be precise or well formulated or even "mainstream." We're more into developing and fostering the scientific imagination...imagination being significant as it has been influential in the development of many successful scientific theories.
With all of that said, I'll see if I can illicit a response to your thread, since it appears to my unknowing eyes to be quite interesting. [ 13. December 2002, 23:30: Message edited by: Moderator ]
IP: Logged
|
|
Iain Strachan
Member
Member # 96
|
posted 14. December 2002 09:38
Hi, Groucho,
Sorry if it's been a while to respond to your interesting post. I saw it on Thursday and have been considering a response. However, bearing in mind that:
(1) I have a full time job that limits my activities in this discussion forum to my spare time; (2) I have a family that also takes great inroads into said spare time; (3) I never like giving a "knee-jerk" and not carefully thought out response when a considered response has been requested;
.. bearing in mind all that, you have to be patient in awaiting a response (from me anyway). I've noted that many people here post very frequently and at great length. Maybe their minds just work a lot faster than mine ;-)
Anyway, you wrote:
quote: H[e] + H[m] >= H(b) follows directly from lemmas 1 and 2.
From the above, we can directly infer that prob[e]*prob[m] <= prob{b}. Look at this from a slightly different angle: It is easy to see for all f,x,z that, prob[f[x] = z] = prob[z]. To return to our example from above, assume z is an n-bit random number whose value is being changed - not but one program Pz - but by a program Pxz and an input value x. [Note that we are equating f to Pxz.] Thus for any two random values Pxz and x what is the probability that Pxz[x] [i.e. f[x]] will generate output z? It has to be equal to H[z], i.e. len[Pz]. [Maybe not a proof, but should be obvious enough.]
Thus, the probability of the biological world arising by pure chance [i.e. by a “tornado in a junkyard”] is no more unlikely than the probability of e and m together arising by chance. Note furthermore that if the algorithmic information in e [natural selection] is very small [i.e. its probability is large] then of necessity the algorithmic information in m [the mutation sequence] must be huge, and comparable to the algorithmic info in the biological world itself. OTOH, If the probability of m occuring is at all feasible, then we are of necessity inferring an encoding of natural selection of considerable size, and comparable to the algorithmic info in the biological world itself. From this fact could we not draw inferences regarding the “intelligence” of natural selection? Generally speaking, the “intelligence” of a computer program or biological organism is directly related to the complexity of its behavior, which is directly related to the algorithmic information encoding it.
In another thread, I have extolled the advantage of "brainstorming", where no criticism is allowed for a period, during which the idea is allowed to evolve. Because of that, I'm hesitant to make a critical response, but as you requested any sort of feedback, I will state here that I think there is a flaw in your argument, but that perhaps examination of that flaw would help to better understand the Dembski-an methodology.
You state that prob[ e ]*prob[m] <= prob[ b ]
and then conclude that for b to assemble from a tornado in a junkyard is more likely than e & m arising simultaneously.
Two problems here:
(1) m is, as you say, a random sequence of events, causing mutations to occur at random. Therefore it must be generated by a long (and hence very unlikely) program & contain a large amount of algorithmic information. But the problem here is that m lacks what Dembski would describe as specificity . In other words any sequence of bits for m would adequately serve the purpose, if evolution is the way it happens. In other words, the information hasn't been specified, but e simply culls advantageous changes whenever the randomness of m throws up something interesting. So in your equation with Prob(m), one really needs to consider all possible, suitably random bit strings and add them all up. Since most bitstrings are random, this means Prob(suitable m) is close to 1.
(2) The "Tornado in a junkyard" is also a random process, which could be represented also by a bitstream, say t, which itself must also contain a huge amount of algorithmic information.
The other general problem I have here is that your conclusion of there being a huge amount of algorithmic information in m + e is that it appears to be the exact opposite of what is the case.
I would direct you to the paper:
A Computer Scientist's View of Life, the Universe and Everything
By Juergen Schmidhuber, one of the co-directors of IDSIA, a high profile Artificial Intelligence research institute in Switzerland.
In the paper, Schmidhuber argues that our universe is for the most part regular, i.e. all the electrons obey the same laws, things are predictable in general, like dropped toast always lands on the floor and not on the ceiling. As a result of this, S. argues that the universe is run by a relatively short algorithm on a Grand Universal Turing Machine "out there" somewhere.
It is the existence of the "short" computer algorithm that drives us that is key (IMO) to Dembski's ideas about detection of design. If the universe could be described as a bit-stream that is N bits long, and the program that generates the bit stream is M bits long, then the probability of this happening is 2^(M-N), because at most 2^M universes can be generated by a string of M bits in length. Now if M is short, then N>>M, and we get a near impossibility. In another paper:
J. Schmidhuber. Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5):857-873, 1997
which you can get from this page
Schmidhuber argues that this is a near miracle. Since he doesn't like the implications for "design" that this implies, in the former paper, he argues that in fact all possible universes are being run in parallel on a vast computer belonging to the "Great Programmer". This is a kind of algorithmic "multiverse" theory that gets you round the design problem. Most universes are chaotic and unpredictable; we just happen to be in one of the very few regular ones.
I think the above references may be of some use to you in developing and refining your theories.
Cheers, Iain.
Afterthought, added while editing. I don't know if this is relevant, but I would also question your assertion that m necessarily contains a huge amount of algorithmic information. The reason is that while we may be able to detect regularity, we are unable to detect randomness; we can only say that we have been unable to detect a scheme (ie a short algorithm that generates the info). Suppose that m is in fact generated by a pseudo-random number generator with an extremely long sequence. For example, the Mersenne Twister random number generator that I use in Genetic Algorithm simulations has a period of 2^19937 -1, or around 10^6000. Since there have only been 10^150 events in the cosmos, then it is entirely possible to conceive that m could have been generated as the output of a Mersenne Twister. We could never detect it, because we would have to generate the entire sequence, but it is plausible to suggest that a "Great Programmer" out there might have coded up the MT algorithm to drive the evolution of our universe. The MT algorithm contains less than 10^4 bits of "algorithmic information" in the code, and also requires a state table of 19936 bits. Hence the total algorithmic info is < 10^5 bits.
BTW; Quick hint on how to use the bulletin board system. Instead of making further posts correcting errors in your previous posts, it will be clearer if you use the "edit post" function; just click the rightmost icon that appears at the top of your post; then we don't have to track through several posts to collate the information ![[Razz]](tongue.gif) [ 14. December 2002, 10:16: Message edited by: Iain Strachan ]
IP: Logged
|
|
gedanken
Member
Member # 594
|
posted 14. December 2002 11:51
I have two difficulties with the ideas presented here. Some of these difficulties actually apply to ideas presented in Dembski’s NFL, but I don’t have the book at the moment and can’t directly relate my comments to the book.
Issue I:
I find that the “Chaitin” information measure is not a clearly defined quantity, for the following reasons.
(And I wish to establish some credentials in this area. I have developed both computer languages and computer central processing unit designs. These are not used professionally, just academic and experimental development. I have also studied algorithmic complexity at some length. But I wanted to establish experience in the areas of concern here, because it may be of some importance to the argument.)
All “computer programs” are written in a computer language. How long the shortest program is depends on what the language is. We can either make a firm declaration of what the available languages are, or we are making an open declaration that the language can range over all possible computer languages. I welcome any other declaration or definition -- so long as it has an objective test to determine if a given computer “language” is in fact among those allowed for writing the program that generates the information in question to be measured.
I note that the computer “language” could either be a hardware determined language, or a software determined language. For example we have “C++”, “FORTRAN”, “FORTH”, “Java”, etc. And there are hadrware machine languages like the binary code of the Power PC, binary code of the Pentium (version X), binary code of the Motorola 68xxx used in older Mackintoshes. (And of course there are assembly languages that serve as intermediates to those machine languages.)
Now the problem: I could define a software, or in fact a hardware language that generates any output in an arbitrarily short program. For example I could define my machine as interpreting a leading zero bit as meaning that all the rest of the program is a bit stream on one of these previous languages. But if the leading bit is a “one”, then the system puts out string S, where S is any particular description in question. For example, if the first bit is a 1, the language system (machine or software computer language) could output a work of Shakespeare, and otherwise if the bit is a 0 take the program and run it as a Pentium program.
The problem is of course, who defines the computer language? There is no objective criterion thereof. Since I can define the computer language I can choose the language that generates output string S in 1 bit. That is the shortest program (in the chosen language).
Since any specific output could have been produced by a suitably defined computer language in 1 bit, the “smallest program Pz in binary bits that generates z for all input” is always a single bit program. Therefore H[z] = 1 for all possible output strings S.
Note that the issue was quite distinct from the issue of how one actually finds the shortest program that generates the output. While intractable in a practical sense, it is something that can be defined because all programs up to the shortest could, in theory, simply be searched (in a mathematical sense) until the program that generates the output string S is discovered. This would in practical terms always take longer than the age of the universe, but this is not an “in principle” difficulty. I suspect that the choice of computer language is an issue that has not been explored.
Issue II:
Now one can choose the language arbitrarily -- such as Fortran IV or some specification. But this shows the arbitrary nature of the measure of information.
Why in nature is there any “probability” to be associated with bits produced specifically in “Fortran IV”? (Or chose your own definition of the language L, why is there a “probability” associated with the specific language L?)
Suppose I chose as my language some suitably defined system for processing genetic information in a physical system. Then RM&NS could be my computer processor, but the output is not deterministic. (Did we fail to mention that computer languages for information measure in the “Chaitin” sense must be deterministic, or else they are not useful?)
So my point here is that the association of a “probability” with the bit stream of the language is arbitrary, and has no relationship to probabilities of events in the real world. This is made especially clear when one considers the arbitrary nature of the choice of the computer language for the information measure. While a mutation may be highly likely in real-world genetic processing of organisms, it may produce an event that takes enormous volumes of code in the chosen computer language to represent.
IP: Logged
|
|
Iain Strachan
Member
Member # 96
|
posted 14. December 2002 13:33
Gedanken,
I believe I can have a go at answering both of the issues you raised.
(1) Concerning a machine that either runs a program or outputs a string S (say a Shakespeare work) depending on the leading bit being a 1 or a zero. Your claim is therefore that any string can be output with a 1-bit program. The fallacy here is that the string S which is output has to be a part of the program itself, and therefore must be accounted for in the length of the program. So the following C program:
code:
void main(void) { char S[] = "Shall I compare thee to a summer's day?"; for(int i=0;i<40;i++) putchar(S[i]); }
is going to be longer than
code:
void main(void) { for(int i=0;i<40;i++) putchar('*'); }
by virtue of the fact that the string had to be stored in the program data. Similarly your machine that outputs the string S has to store it somewhere & this must be accounted as part of the program length.
(2) The point concerning the arbitrariness of choice of the computer language is more subtle, and is dealt with by Schmidhuber in the paper:
"Discovering Neural Nets With Low Kolmogorov Complexity and High Generalization Capability", which can be retrieved from: this web site containing Schmidhuber's online publications
The relevant points are what he calls the Compiler theorem and the Invariance theorem .
The compiler theorem states tht there exists a Universal Turing Machine U with the property that for every TM C there exists a constant prefix Mu_C such that the output of TM C for bitstring p, will be the same for the TM U with the bitstring Mu_Cp.
In other words, Mu_C is a Compiler that translates a program written in TM C to one that runs on TM U
The Invariance theorem follows from this, in that for two TM's U1 and U2, then K_U1(s) = K_U2(s) + a constant. (because the length of the compiler is a constant and independent of the length of the program).
Since the difference in complexity is only a fixed constant, we can therefore make meaningful comparisons. Two different "programs", translated into the code for TM U therefore share mutual algorithmic information implicit in Mu_C.
Let's take an explicit example of how the Invariance theorem would work. Consider you had a TM G (for "Gedanken" ) that had a single byte assembly primitive whose effect was to print out the entire works of Shakespeare. If that were the case, in order to run a program written in this language on TM U, you would have to have a compiler Mu_G that contained the entire works of Shakespeare embedded in the string somewhere.
I think that should address your problem.
As for the search for the shortest program, in the above paper, there are two actual examples (admittedly on toy problems) where this has been carried out. In the paper, a TM is defined, and a probabilistic search for the shortest program is set up. This involves a refinement to the notion of K-complexity, because there is the "halting problem" to be considered. This is got round by modifying the complexity measure to add in the logarithm of the number of CPU cycles executed; (i.e. a long running program is considered more complex). The new measure is called Levin complexity.
Hope this addresses at least some of the points you raised.
BTW apologies if you read a truncated version of this post; I pressed the wrong button while composing the original post and posted an incomplete version
Iain. [ 14. December 2002, 14:28: Message edited by: Iain Strachan ]
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 15. December 2002 10:32
Iaian,
[Note: I still don't know why I can't post parentheses.]
Before addressing your comments specifically let me state my primary divergence with Dembski's ideas. Specifically, he treats "design" as an attribute which must transcend any mechanistic explanation [Or has he backed away from this]. I find this point of view to be suprisingly naive, inexplicable and nonconstructive for someone with a Phd in mathematics.
If a meaningful discussion about any entity is possible, no matter how intelligent and god-like the entity may be, then there must in theory be a useful description for it. Certainly, a useful way to treat an entity is as a program, i.e. an algorithm, i.e. a mechanism. Does it somehow limit the discussion to treat an entity as determinstic and one that behaves contigently based on input to generate output? No it does not. Certainly anything we could describe using the English language can be described at least as effectively [if not more so] via an algorithm. Furthermore, by treating it as an algorithm, we have access to all the results of formal computability theory, [as well as undoubtedly many other branches of mathematics].
Even if we ascribe the ability to design only to entities with a non-physical component [e.g. man?], does not there have to exist in theory a description [i.e. an algorithm] of how this non-physical component [e.g. a "spirit"] acts, and how it interacts with the physical? In any case, it is laughable to say that animals can't design.
Certainly in scripture the Judeo-Christian God ascribes to himself a certain deterministic modality [i.e. "If you do this, I will do this", etc.] Thus whether or not we're talking about "God" or "natural selection" we're talking about a specific thing that can be described via an algorithm. [Incidentally , God and "natural selection" are both disembodied.] We may not know what either creative entity will do in every circumstance. However, whether we call the creative entity "God" or "natural selection" it must account for the biological world as it exists today.
Presumedly, a direct coorelary to the biological world must have existed in the mind of God to begin with [if not something even much more complex]. Thus an algorithm for "God" must be at least as large as any minimal description of the biological world [i.e. H[G] >= H[b]]. This brings us to natural selection. If we were to encode all the static laws abstract principles, etc that govern natural selection, how complex a program would that be in relation to the natural world? I contend that if natural selection is simplistic, then there is an implicit requirement for most biological complexity to arise by chance mutation. IOW, if natural selection is very small, then most observed attributes of the natural world must be solely a result of random genetic drift. However this itself would be a miracle.
The above argument is what I am attempting to formalize. Certainly, I sympathize with the aims of Dembski, and in a subtle and very indirect way, I guess Dembski and I are contending the same thing. But as I've stated, Dembski's apparent axiom that design cannot result from a mechanism severely hampers his approach, IMO.
Before answering your specific objections, it might be useful to flesh out my original model as well as provide a specific example [Both of which I maybe should have done to begin with. I'll bracket this section so you'll know where it ends:]
----------------------------------------
e[m] = b: e - natural selection program m - the actual mutation sequence b - biological world today Pmb - the smallest program such that Pmb[m] = b. That is, for all f, if f[m] = b then H[Pmb] <= H[f].
Also [additions to my original post]:
m' - the smallest mutation sequence such that e[m'] = b. That is, for all x, if e[x] = b then H[m'] <= H[x].
Pm'b - the smallest program such that Pm'b[m'] = b. That is, for all f, if f[m'] = b then H[Pm'b] <= H[f].
S_fy[x] - a predicate that returns true if f[x]=y and false otherwise
As stated previously, e.m,m',b,Pmb,and Pm'b are all encoded as binary strings.
Let m equate to an arbitrary length sequence of couplets [B1,T1],[B2,T2],[B3,T3],[B4,T4]... where for each [BX,TX] - BX is a bit position <= N, indicating a mutation [change from 0 to 1 or vice versa] where N is the maximum relevant extent of the biological world [U] [i.e. len[U] = N]. TX is an elapsed time from the start indicating when the mutation took place. It can be assumed that at the start all bit positions in U contain 0. The initial segment of m can contain a series of couplets representing the starting state of U at elapsed time 0 [including non-living attributes], e.g. [3,0],[13221,0],[15888,0],[15889,0]...etc. Each such "mutation" at time 0 represents a change at that bit position in U from 0 to 1.
Let the output b also be represented by such a sequence, but interspersed with changes made by natural selection [i.e. the program e]: [B1,T1],[Bb,Ta],[Bb,Tb],[B2,T2],[Bc,Dc],[B3,T3],[B4,T4]...
Let Un[m] refer to a simple utility program that will convert an arbitrary-length mutation or output sequence [m or b] into an N-bit binary sequence representing the final state of U it cooresponds to. Thus Un in effect removes all the time info from a sequence, and records the last state at every bit position.
Thus, e,Pmb, and Pm'b are programs that convert input mutation sequences to the output sequence b. For clarity, the output sequence can be imagined as Un[b].
In any case, b,m,m'e,Pmb and Pm'b are still all encoded in binary.
Now, for an example - The Dawkins Weasel. [Hopefully the temporary switch from binary to alphabetic strings won't confuse anyone. Imagine a couplet in m containing a letter, or whatever works for you.]
Let b = "METHINKS IT IS A WEASEL"
b is the current biological state, and the only thing we know emperically. What can be inferred about m and e?
First of all, what we can NOT infer, merely from the fact that b is an english sentence, is that e itself can parse english sentences. The only thing we can infer is that e, if it was given nothing to work with in the input mutation sequence m, was nevertheless capable to generate the weasel string above, and thus must actually contain the entire weasel string [or the most efficient representation of it.] OTOH, if e does NOT contain a representation of the entire weasel string, then some portion of it must be derived from the mutation sequence m.
Suppose that instead of the weasel string, b was a string encoding [say in Pascal] an english language parser. In this case if e contained all this code [which it would have to for example, if it couldn't obtain parts of it from the mutation sequence] then e would be an english language parser.
These are the formulas that were previously proved: H[m] + H[e] >= H[b] H[m] + H[Pmb] >= H[b] H[m'] + H[e] >= H[b] H[m'] + H[Pm'b] = H[b]
The only situation that concerns us is if H[e] < H[b]. Otherwise, e is more complex and improbable than the biological world itself, and nothing more needs to be said.
If H[e] < H[b], the question arises, is it possible that prob[[e[x]=b]] > prob[b]? IOW, is it possible that prob[e]*[prob[[e[x]=b]|e] > prob[b]?
To return to the weasel example, suppose e is a program accepting any string cooresponding to "XXXXXXXXXIT IS A WEASEL". Since we know that the output b was "METHINKS IT IS A WEASEL", we know that the mutation sequence contained "METHINKS " to begin with. If e is defined as above, let Seb[m] be a specification for every m such that e[m] = b [i.e. b = the weasel string]. Thus Seb[m] = "If in the mutation sequence m the last occuring mutations at bit positions 1-9 coorespond to the 'METHINKS ' then accept else reject." Note that other than the word "METHINKS ", all the additional verbiage in Seb[m] is superfluous and not used in calculating H[Seb]. [Take my word for it for now.] Thus, H[Seb] = H[b] - H[e] = 9. Note also that if m' is the smallest mutation sequence such that Seb[m'] [i.e. e[m'] = b] then H[m'] = H[Seb] = 9. Note also that for all x prob[e[x]=b|e] = 2^[-9], because the probability that any n-character sequence [n >=9] begins with "METHINKS " is = 2^[-9], and there is an even distribution among mutation sequences cooresponding to such a string.
Thus, prob[e[m]=b] = prob[e]*prob[e[m]=b|e] = prob[b] = 2^[9-H[b]]*2^[-9] = 2^[-H[b]].
IOW, the probability of getting a mutation sequence that will cause e to generate the entire weasel string is certainly greater than the entire weasel string arising by chance. But ONLY given the existence of e. When the probability of e is factored in, the probability of the weasel string occurring chance is the same.
I believe the above scenario holds in all relevant situations. [in fact it may have already been proved.] Factor in the probality of any encoding for natural selection, and the probability of any mutation sequence leading to b (via e) is the same as if b happened purely by chance. However, "purely by chance" is a scenario antithetical to science. [BTW, that's the reason that Dembski's calculations regarding the Universal Probability Bound and the emergence of the flagellum are irrelevant, as he assumes pure chance, which even evolutionists would reject.]
Anyway, here's an attempt at a formalism:
For all f,x,y exists z: if f[z] = y, H[y] = n, H[f] = n-p, p<n --> prob[[f[x]=y]|f] = prob[S_fy] = prob[m'] = 2^[-p]
[where m' is defined as the smallest value [in terms of H] s..t. f[m'] = y] and Sfy is a specification for all x's such that f[x] = y]
"proof" [for now] - for all f,x,y prob[f[x] = y] = prob[y], prob[f]*prob[[f[x]=y]|f] = prob[y], 2^[p-n]*prob[[f[x]=y]|f] = 2^[-n], prob[[f[x]=y]|f] = 2^[-p]
-------------------------------------
Iaian:
I'll respond to your specific objections [as well as G.'s] in a seperate post. I wanted to give some further background above that would clarify my responses. Thanks for your patience. [ 18. December 2002, 12:34: Message edited by: Groucho ]
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 15. December 2002 14:32
Iaian:
quote: 1) m is, as you say, a random sequence of events, causing mutations to occur at random. Therefore it must be generated by a long (and hence very unlikely) program & contain a large amount of algorithmic information. But the problem here is that m lacks what Dembski would describe as specificity . In other words any sequence of bits for m would adequately serve the purpose, if evolution is the way it happens. In other words, the information hasn't been specified, but e simply culls advantageous changes whenever the randomness of m throws up something interesting. So in your equation with Prob(m ), one really needs to consider all possible, suitably random bit strings and add them all up. Since most bitstrings are random, this means Prob(suitable m) is close to 1.
Actually, if e(m) = b, then not just any mutation sequence "would adequately server the purpose." The specification I have in mind is S_eb(x): "if e(x) = b then accept else reject." The probability that x is such a sequence (given e) is (I contend) equal to prob(b)/prob(e), which is determined by the relative complexity of e and b. Of course the longer the sequence x, the higher the liklihood of S_eb(x), but this would be true if e did nothing at all. In what I've previously read in Dembski's writings, he apparently assumes that prob(e(m)=b|e) = prob(b), IOW he implicitly assumes that even with a natural selection mechanism in the mix (i.e. some set of laws determining what's preferable), anything that emerges via mutation and NS must be completely random. However, that's not the case if e does anything at all.
quote: The other general problem I have here is that your conclusion of there being a huge amount of algorithmic information in m + e is that it appears to be the exact opposite of what is the case.
The equation I gave H(e) + H(m) >= H(b) is so necessarily true as to obviate the need for any "proof" at all (although The proof I provided is completely valid.) (You might want to review my initial post (the corrections are now edited in.))
Of course if m is large enough, it will by itself by virtue of being a huge random number, be incredibly complex, so the statement H(m) + H(e) >= H(b) is somewhat trivial. That's why I added: H(m') + H(e) >= H(b), where m' is defined as the smallest mutation sequence s.t. e(m') = b.
quote: In the paper, Schmidhuber argues that our universe is for the most part regular, i.e. all the electrons obey the same laws, things are predictable in general, like dropped toast always lands on the floor and not on the ceiling. As a result of this, S. argues that the universe is run by a relatively short algorithm on a Grand Universal Turing Machine "out there" somewhere.
It is the existence of the "short" computer algorithm that drives us that is key (IMO) to Dembski's ideas about detection of design. If the universe could be described as a bit-stream that is N bits long, and the program that generates the bit stream is M bits long, then the probability of this happening is 2^(M-N), because at most 2^M universes can be generated by a string of M bits in length. Now if M is short, then N>>M, and we get a near impossibility. In another paper:
In The Design Inference, Dembski attempts to prove that design cannot result form chance processes or "law", the latter being an analogue for "mechanism." In another paper, he purports to prove that functions can't produce information. In his analysis of genetic algorthms he appears to be backpedaling somewhat from a dogmatic position on the subject. I don't necessarily want this to turn into a bash Dembski session, (otherwise I would go hunt up quotes from him on the internet, which would be difficult anyway, since a large percentage of his published work is accessable by sale only.)
However, if "design" can't result from known processes it's due to attributes of the known or theorized processes (their algorithmic complexity for one example), not because mechanisms can't "design" things.
In any case, you mention the Universe being modelled as a "Universal Turing Machine". It certainly could be modelled that way or quite possibly a more complex way as well. Anything that matched the available data most efficiently would do. But if a mechanism proposed is theorized as an explanation for life, then it better be pretty complicated, or all the information about life was in its input (and was preserved purely at random by the mechanism.) [ 15. December 2002, 14:53: Message edited by: Groucho ]
IP: Logged
|
|
gedanken
Member
Member # 594
|
posted 16. December 2002 23:58
Thanks, Iain, for the link.
The point I was making was that whatever was needed to produce any particular output string could be embedded in the TM itself. The string does not need to be embedded in the “program”, since it is a characteristic of the TM itself. So TM S (for Shakespeare) could dump the works of Shakespeare when given a 1, and when given a 0 followed by any program for some other universal TM U it could simply process the string as U would do so. It can be shown that S is a universal TM.
So for this particular TM S, the output of the works of Shakespeare are triggered by the single bit 1, and thus the K.. complexity is 1 as measured by this particular TM.
However, as Iain noted, the string might have to be embedded in the compiler that translates to a different TM (perhaps in most compressed form possible for the new target TM). But I can construct a TM B (for “bad”) that makes any particular string that you choose turn out to require a long compiler to that particular machine from your choice of machine U. My point was that it depends upon the particular TM, even though there are constraints upon this.
The reason that this point cannot be generalized in a useful way is a different aspect that comes from versions of the “invariance theorem”. This is the limitation | C_S (x) - C_U (x) | < c_SU for all strings x, where the constant c_SU depends strictly on the universal Turing machines S and U.
(This point was not in the link provided, and I’m sorry I don’t have a link at the moment).
So for the given machine S a particular finite class of outputs can be reduced in complexity measure by a constant. In my example, the particular machine S reduced the complexity measure of the works of Shakespeare by a constant which happened to correspond to the complexity measure of those same works as measured by some other machine U, when compared to that machine U.
In other words, a particular Universal TM S can reorder the list of works of Shakespeare to have short “programs”, but it cannot do so for an infinite list of such works.
It could also reduce a particular class of problems by a constant number of bits. The problem of course is that even though a countably infinite class of output strings could be reduced in complexity (with some increase of complexity felt in the measure for all other strings to compensate), it strictly is reducing the number of bits of the measure by this constant. So a sufficiently complex string (more complex by greater than particular machine’s constant c_SU) will still get a high complexity measure.
Putting additional complexity into the TM itself only staves off the measured complexity for a finite class or to a limited degree -- it cannot arbitrarily reorder the partial order produced by the Kolmogorov Complexity measure. But I still think that it is very interesting that the complexity of a given string or a given finite class does not have a single fixed Kolmogorov measure.
Most specifically, if one is considering the universe to be modeled by a TM, that particular TM could be one that puts a particular finite class of problems into the “simple” category (differing from other universal TMs that declare that finite class to be “complex”). It could be, for example, that the steps to create first life are simple and thus more probable by the measure as taken with a particular TM. This could correspond to the rules of Darwinian evolution being simplified in this universe. [ 17. December 2002, 00:28: Message edited by: gedanken ]
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 17. December 2002 12:49
Gedanken:
quote: Now the problem: I could define a software, or in fact a hardware language that generates any output in an arbitrarily short program.
quote: The point I was making was that whatever was needed to produce any particular output string could be embedded in the TM itself. The string does not need to be embedded in the “program”, since it is a characteristic of the TM itself. So TM S (for Shakespeare) could dump the works of Shakespeare when given a 1, and when given a 0 followed by any program for some other universal TM U it could simply process the string as U would do so. It can be shown that S is a universal TM.
Are you saying that there is no objective way to determine if a program that plays chess is more or less complex than one that can only play tic tac toe?
I can't emphasize the following more strongly: The issue you have raised is a complete NON-issue. For various reasons its not convenient for me to do this now [I don't have a .pdf reader installed], but if you go to Gregory Chaitin's home page [http://www.cs.auckland.ac.nz/CDMTCS/chaitin/], and read the first few paragraphs in on one of his earlier papers on the subject he specifically addresses this issue.
I'm more familiar with Chaitin than Kolmogorov on the complexity issue. I realize that "appeal to authority" is not a legitimate tactic of argument in a forum such as this. But are you saying that somehow that both got it wrong, and there is no objective way to measure complexity?
In my initial post I wrote " In any case, we assume that a program is decoded on some abstract computer device understanding some rudimentary set of instructions [e.g. an Unlimited Register Machine]" A URM program is comprised from a set of only three instructions Z[N] - move zero into register N; I[N] - increment register N; J[N,M,I] - "if the values in registers N and M are equal then jump to instruction i." Any program can be written using the above instruction set. Thus, we're assuming that at the lowest level, there's an abstract computer that understands only three instructions [and it doesn't have the works of Shakespeare imbedded in it.]
In your reference to TM's, I think you've confused a Turing Machine with a Turing Machine Program. A TM only understands a rudimentary set of instructions [akin to a URM] and doesn't have the works of Shakespeare or anything else embedded in it. Anything embedded in a TM program would be reflected in the length of that program
quote: Most specifically, if one is considering the universe to be modeled by a TM, that particular TM could be one that puts a particular finite class of problems into the “simple” category [differing from other universal TMs that declare that finite class to be “complex”]. It could be, for example, that the steps to create first life are simple and thus more probable by the measure as taken with a particular TM. This could correspond to the rules of Darwinian evolution being simplified in this universe.
Anything a program can do is of necessity "simple" for it. The question is, if creating life is simple for the universe, how long a program in URM instructions [or C instruction or pascal insturctions or 80X86 assembly instructions] would it be? If the universe puts the creation of life in the simple category, then H[b] [the algorithmic complexity of life] would still reflect this, as H[b] = len[Pb] where Pb is the "simplest" program that will produce b.
Thus, the equation I introduced still stands:
H[m'] + H[e] >= H[b],
H[Seb] + H[e] >= H[b],
prob[e]*prob[e[x]|e] <= prob[b].
prob[e]*prob[m'] <= prob[b].
[Where m' is the shortest mutation sequence s.t. e[m'] = b, and Seb is a specification for all x s.t. e[x] = b.] [ 17. December 2002, 13:01: Message edited by: Groucho ]
IP: Logged
|
|
Groucho
Member
Member # 605
|
posted 17. December 2002 18:01
Iain: quote: Afterthought, added while editing. I don't know if this is relevant, but I would also question your assertion that m necessarily contains a huge amount of algorithmic information. The reason is that while we may be able to detect regularity, we are unable to detect randomness; we can only say that we have been unable to detect a scheme [ie a short algorithm that generates the info]. Suppose that m is in fact generated by a pseudo-random number generator with an extremely long sequence. For example, the Mersenne Twister random number generator that I use in Genetic Algorithm simulations has a period of 2^19937 -1, or around 10^6000. Since there have only been 10^150 events in the cosmos, then it is entirely possible to conceive that m could have been generated as the output of a Mersenne Twister. We could never detect it, because we would have to generate the entire sequence, but it is plausible to suggest that a "Great Programmer" out there might have coded up the MT algorithm to drive the evolution of our universe. The MT algorithm contains less than 10^4 bits of "algorithmic information" in the code, and also requires a state table of 19936 bits. Hence the total algorithmic info is < 10^5 bits
I just wanted to acknowledge this point and admit I'm still not sure I see the relevance [so maybe you can clear it up for me.]
The only possible relevance of MT I can see is if it meant for example H[b] was much smaller than we presume [i.e. by using MT to encode it], in which case b would have a much higher liklihood of arising by chance. I don't know exactly how MT works but I think I can surmise adequately for my purposes. If MT's state table is 10^5 bits in length, then it can generate at most 2^[10^5] distinct outputs. Say MT is encoded such that you can provide it the state-table S as input containing any arbitrary content, along with N, a value indicating the number of bits to return as output before stopping. Thus, for a program P, if len[P] = N and N >= len[S], then prob[exists S such that MT[S,N] = P] <= 2^[S-N] IOW, if len[P] = 10^8 bits for example, then the chances of there even existing a value for S such that MT[S,N] = P is less than 1 in 2^[10^8-10^5]. Thus the possibility of any program P being minimally encoded by MT becomes vanishingly small very quickly, as len[P] becomes greater than len[S]. The probability of choosing an S at random [assuming one exists] such that MT[S,N] = P is 1 in 2^[10^5]. Factor in the probability of such an S even existing and its 1 in 2^[10^8]. But this was the same probability of P to begin with.
Thus I don't see the relevance of assuming a mutation sequence output by MT. Perfect randomness would give you an identical probability of hitting anything by chance. [ 17. December 2002, 21:46: Message edited by: Groucho ]
IP: Logged
|
|
gedanken
Member
Member # 594
|
posted 17. December 2002 23:23
Groucho said:
quote: Are you saying that there is no objective way to determine if a program that plays chess is more or less complex than one that can only play tic tac toe?
My question is whether Kolmogorov or Chaitin complexity qualifies as a well-defined measure of that question. I’m claiming that there can exist computers in which the program to play chess is actually simpler than the program to play tick tac toe. That computer would not be a URM or 80x86, but some theoretical computer could have that property -- in fact a countably infinite set of computers would have that property (considering various TM’s as those computers, for example.).
I think that we will agree that Kolmogorov or Chaitin complexity will eventually go up for a sufficiently large and complex string. There is no universal TM for which one cannot find a string that is more complex than any particular given string. Thus these measures do provide some sort of partial order property.
However I am questioning how well ordered these measures actually make the set of all possible strings. Since there can be great variability in how individual machines produce the measure, I am questioning how relevant the measure actually is.
I do believe that one can specify a very “primitive” architecture, such as a particular set of processors like Groucho did (such as the URM, or possibly an 80x86). Then, since a specific equivalent Turing Machine has been specified, there is a completely defined order (not partial order) on the complexity of all output strings. (Limited only by our ability of finding that particular order because of the halting problem.)
quote: In your reference to TM's, I think you've confused a Turing Machine with a Turing Machine Program. A TM only understands a rudimentary set of instructions [akin to a URM] and doesn't have the works of Shakespeare or anything else embedded in it. Anything embedded in a TM program would be reflected in the length of that program.
Actually you come close to understanding my point here. (I am in fact explicitly moving complexity from the program to the TM or computer.) The articles that discuss the Chaitin or K.. complexity assume exactly what you said -- that the TMs can be considered to be equivalent because they differ from any other universal TM in their measures by at most a constant depending only on the pair of machines. The problem is that constant can be arbitrarily large! There is no mathematical statement that there is any limitation on the complexity of the TM in question. In fact the proofs of the mathematical relationships depend precisely on leaving the TM unbounded in its properties except by precisely those in the proofs. And the articles fail to address this problem.
Nowhere in those articles is there any claim about the complexity of the TM or computer itself. If you can find one, please point that out to me -- where is there a claim that the TM is “simple”? Many of the aspects of the proofs of properties of TMs involves allowing completely arbitrary levels of complexity in the TM itself -- and the proofs would break down if one placed any limits on the size and complexity of the TM’s themselves. Since this is the basis for the mathematical proofs, I find that I am not in error to propose to use exactly those assumptions in examining the results of those proofs or that come from those proofs. Without some limitation or investigation of the complexity of the TM itself, these measures are not mathematically defined as making such a limitation!
(There may indeed be an extension of the K.. or Chaitin complexity that also integrates some measure of the complexity of the TM itself -- but then the use of such limits must be explained as to why they are relevant to what is being modeled and why they are applicable.)
Groucho’s article starts out with the issue of bits to represent a computer program in a particular computer model, and discusses probability of those bits as measuring the likelihood of that particular program arising. (One must ask -- how do programs arise? How is this related to real-world events in real-world systems that process events in any way.) Then the subject jumps to biological world -- taking the analogy of the language of the computer as a method of analyzing the chances of the process occurring in the real world of biology.
My point is that the physical world operates with relationships that we have been able to observe, and those relationships seem to be very complex. Physics, as much as it has attempted to find simple all-encompassing rules to understand everything has failed to find such rules that stay simple as one goes from realm to realm like macro to sub-atomic. (Even though there are also tremendous success stories in finding simplicities in nature, this has not born out in understanding that all of nature is simple.)
One can make the claim that the real world complexity can be compared in some way to a specifically simple TM or computer for purposes of understanding the probability of events in this world. But with the underlying complexity of the physical world, I find that the argument has not been justified that the simple TM or computer model (such as URM) is appropriate for judging probability in this physical world that seems to have enormous complexity in its relationships. And if I am free to choose the TM or computer model with which to make the measurement, I can change the order pretty much at will for reasonably finite sets of objects (modeled as bit “strings”).
Without finding an association that links the probabilities of bits in the model to random events in the real world, I am unconvinced about the efficacy of this type of model of complexity as relevant to finding probability of events. How do we justify the particular “simple” computer model as appropriate? This has not been done -- and I simply show the consequences of this lack of justification on results coming therefrom.
From Chaitin’s article “On the intelligibility of the universe and the notions of simplicity, complexity and irreducibility” (from Chaitin site:
quote: But, I must confess that AIT makes a large number of important hidden assumptions! What are they?
Well, one important hidden assumption of AIT is that the choice of computer or of computer programming language is not too important, that it does not affect program-size complexity too much, in any fundamental way. This is debatable.
…
(Emphasis added in last sentence, emphasis original in first sentence.)
“This is debatable”. I am debating it, in the current context of usage.
---
But I also want to point out that my post is directed at claiming that Chaitin measure is not coherent, not that the results Grocho introduced do not follow from the assumptions made. I feel that to continue this debate in this particular thread would really introduce two separate thought threads into the same forum thread.
I was responding in more detail because posters implied that what I said was not true, rather than saying that the statements were not relevant to this particular line of thought. I hope the above post clarifies my point, and if someone still disagrees with my point we can continue the discussion. But I would think that we could otherwise delay until and if this becomes relevant to the development of the original theme.
I can see how this can have some relevance to further issues that may arise in the future. I especially think that the understanding of how programs transform strings (as opposed to simply generate strings) is extremely relevant to this discussion. And this process is exactly how the proof of the Compiler and Invariance theorems are developed. So the issues that are at the heart of my issue may be important, even if the specific point is not yet needed. [ 18. December 2002, 01:26: Message edited by: gedanken ]
IP: Logged
|
|
gedanken
Member
Member # 594
|
posted 18. December 2002 01:37
I think that there is another highly relevant issue here. (And I’m making a separate post because the thought is really on a different track, though it is also bound up in the issues of the exact details of how K.. theories are developed.)
A very simple computer program could be:
If( rnd() > 0.5) write 1 to output tape, else write 0 to output tape. If haven’t reached terminal count, repeat previous steps.
The problem is this is not a valid Turing machine, according to the theories presented. The reason is there is no “random” function in a TM!
So a truly random sequence of events (as are predicted by modern Quantum Mechanics) cannot be represented by a TM.
This is at variance with the ideas presented in some referenced papers in which the universe is considered to be a simple TM. Since the universe is essentially random, and a TM according to these theories are essentially deterministic, we have an essential conflict. Pseudo-random generators are not “random” in the Quantum Mechanics sense, since they are essentially predictable and repeatable if one knows the input state. The same TM with the same input tape always creates the same output tape (presuming it halts).
So the output tape generated by my essentially very simple program above (with a “truly” random generator) would take a very large program to represent on most TMs if they are required to A: be fixed and deterministic, and B: to produce the identical output rather than the possibility of such output.
IP: Logged
|
|
|