|
Author
|
Topic: Who came up with this
|
sid
Member
Member # 1269
|
posted 23. April 2004 00:07
Let P(X) denote a computable predicate, i.e. a property, i.e. a program i.e. a function that returns true if X has property P and false otherwise. (Note: P *does not* denote "probability" in this thread.)
Let Pi(X) be such a property (denoting for example "X is intelligent").
Let Pn(X) be such a property (denoting for example "X contains natural selection laws")
Let E and B denote specific binary strings.
As binary strings, E and B can be interpreted as programs (decoded for example according to a rudimentary formalism like a Turing Machine or Unlimited Register Machine). Assume that E is a program that when executed outputs B. The string E contains any input it uses, i.e. let L(M) be a synonym for E, where M indicates the input of the program and L the code. L could refer to an intitial portion of the binary string E, and M the remainder of the string. (M for example could denote "mutations" and L, "laws"). Basically, everything E relies on as a starting state is contained within it, and E does not modify itself. Any memory that E modifies at run time is *not* considered part of E. ( E is "input-less" as described by G. Chaitin).
The following is according to elementary Bayesean probabability:
if prob(Pi(B)|Pn(E)) > prob(Pi(B)) then
prob(Pi(B) >= prob(Pi(B)|Pn(E))*prob(Pn(E))
IOW, assume that if natural selection has output B, then the probability of B containing intelligence is greater than with just blind chance. Then the probability that B is intelligent assuming we don't know *anything* about E or B is greater than or equal to the probability that B is intelligent assuming E contains natural selection laws *times* the prior probability that E contains natural selection laws.
So, for example, just pick some binary string B at random. Is it intelligent?. That's an example of intelligence occuring by blind chance. Now assume we know that B was generated by natural selection. In that case the probability of B being intelligent could conceivably be higher. But the chance of the thing that generated B containing natural selection has to be factored in. When you do that, you have greater chance of getting intelligence in B through blind chance.
PLEASE someone tell me you understand the above, and furthermore, you understand its significance. I tried this out in talk.origins, and not a single person over there had a clue.
Now, continuing on:
We have said that E and B are binary strings, and furthermore E is an input-less program that outputs B. According to G. Chaitin, E embodies a description of B. Therefore, a description of E cannot be smaller than the smallest description of B. Thus the smallest description of the natural selection laws and mutations which led to the biological world of today cannot be smaller than the smallest description of the biological world.
Furthermore, presumedly E describes itself as well. But it also describes B. So in essence the same description can be used for E and B. So in essence E and B are the same thing. So any time you have a set of conditions E that yields a specific outcome B, E and B are in effect the same thing.
(The following is just some informal ruminations.) If something E0 caused E, then the same principle applies to E0. Any cause will directly equate in a meaningful sense to the thing it caused. Now, this is not to imply that something *must* have a cause merely because it has some property or the other (e.g. Pi). Rather, the above is relevant only assuming the thing in question *did* have a cause. If something (e.g. "God") has always existed then by definition it doesn't have a cause. Thus G might output E0, although G is a starting state, without cause. (But we know for example that the biological world has not existed forever and thus presumedly had a cause.)
PLEASE someone tell me you understand the above, and furthermore, you understand its significance. I tried this out in talk.origins, and not a single person over there had a clue.
FURTHERMORE, I was wondering if anybody knows who originally came up with all this. I may have put my own spin on it, but I first read about it on the web a few years ago. I have gone looking for it since without success.
(The following can be inferred from well-known defintions of G. Chaitin. I am listing it here for the sake of completeness, but the preceding can be understood without reference to it.)
The probability that E contains natural selection laws, assuming we don't know anything about E, i.e.,
prob(Pn(E)) =
1/2^len(n_min),
where len(n_min) is the number of bits in the shortest encoding for an input-less program that outputs a string containing natural selection laws. [ 23. April 2004, 02:28: Message edited by: sid ]
IP: Logged
|
|
Claire
Member
Member # 725
|
posted 23. April 2004 01:05
Sid,
I understand your post. You said "just pick some binary string B at random"
"some". Complexity
"binary". Intelligence
then
"some binary". Context change then information (meaning with two word concepts).
"at random". Self created random. Design.
Claire [ 23. April 2004, 01:10: Message edited by: Claire ]
IP: Logged
|
|
sid
Member
Member # 1269
|
posted 23. April 2004 02:30
Claire -
I don't have a clue what you're talking about.
IP: Logged
|
|
Rex Kerr
Member
Member # 632
|
posted 23. April 2004 04:24
Claire's posts have a way of being a little cryptic.
In any case, sid, I understand your post just fine, or at least the mathematics. However, you don't need the conditional in your Elementary Bayesian probability thing. For *any* relationship between p(Pi(B)|Pn(E)) and p(Pi(B)) we have that
p(Pi(B)) >= p(Pi(B) & Pn(E)) = p(Pi(B)|Pn(E))*p(Pn(E))
Given that, I'm not sure what the significance of the first part is supposed to be.
In the second part, how can we assume that E describes E? E is E, but that's not the same as saying that it describes itself. In fact, if E is deterministic, and o(E) = B, then E describes itself iff E=B. But this property that o(F) = F (a fixed point of the function-output operator) is a strange one. It's satisfied by self-replicators, but if F is a random binary string then p(o(F) = F) is probably pretty low, depending on what the o(.) operator does. (Or, in other words, what machine you run your program on.)
And, most importantly, if B is something you wish to describe, you can't assume that the smallest E such that o(E)=B also has the property that o(E)=E. If it did, then B=E, which is quite some trick for a random string unless o(.) is a very boring processing environment. (I.e. "copy".)
So given these problems, I don't think I can see the significance of the second part.
Finally, if E is a fixed length N, and e(M) is the number of binary strings F of length M such that Pn(F), then
Pn(E) = 1/2^N * e(N)
and if E has a distribution of lengths given by the probability distribution p(N), then
Pn(E) = sum( N=1..infinity , p(N) * 1/2^N * e(N) )
which only reduces to your formula if p(N) = 0 for all values except N = len(n_min) as you define it, and then only if e(len(n_min))=1. If e(N)*p(N) rises fast relative to 2^N, then the formula won't even be a good approximation.
IP: Logged
|
|
sid
Member
Member # 1269
|
posted 23. April 2004 15:33
Rex Kerr wrote: quote: Finally, if E is a fixed length N, and e(M) is the number of binary strings F of length M such that Pn(F), then
Pn(E) = 1/2^N * e(N)
and if E has a distribution of lengths given by the probability distribution p(N), then
Pn(E) = sum( N=1..infinity , p(N) * 1/2^N * e(N) )
which only reduces to your formula if p(N) = 0 for all values except N = len(n_min) as you define it, and then only if e(len(n_min))=1. If e(N)*p(N) rises fast relative to 2^N, then the formula won't even be a good approximation.
First of all, if you could clean up the above so I can make sense of it, I would appreciate it.
You say that, "Pn(E) = 1/2^N * e(N)" and "Pn(E) = sum( N=1..infinity , p(N) * 1/2^N * e(N) )"
However, *I* defined Pn as a *property*, that is a predicate that returns true or false, "Let Pn(X) be such a property (denoting for example "X contains natural selection laws"). Furthermore, I said that I would not use P to refer to probability, but instead I was using "prob(...)" (so what is p(N), for example).
Your response above was to me defining the probability that E contains natural selection laws as, "prob(Pn(E)) = 1/2^len(n_min) "
This follows directly from *well-known results* of Gregory Chaitin who (along with Kalmogorov) is the author of algorithmic information theory.
Picture a universe where if you find a binary number X you can assume it was output by some program. [Note: when I say program I will mean *input-less* program as defined by Chaitin, unless stated otherwise.] Then the probability of X is the probability that some arbitrary program will output it, and that equates to 1/2^len(Px), where len(Px) is the number of bits in the shortest program that outputs X. This is a *well-known result*, and it is proven by Chaitin.
Now, the above assumes we don't know *anything* about the program that outputs X. We could presumedly have some knowledge (Pz(Y)) about the program Y that output X (where Pz is some property that Y posesses), such that prob(X|Pz(Y),o(Y)=X) > prob(X).
Such a model translates neatly to virtually any real-world scenario. For example, in an evolutionary paradigm, an ending biological state is the result of a series of mutations interacting with a rudimentary set of intrinsic selection rules. Computability theory tells us that the mutations and laws can be encapsulated in a program E. (To clarify, actually to correct what I have said previously, E *only* contains instructions. The final state of any memory it modifies at run-time is in effect E's output. We *do not* include in E's output any bit of memory that E did not directly modify.) Thus while E is running it modifies B (the biological world), and the ending state of B when E halts is the output.
So to sum up with another example:
What is the probability of Pi(B), that is the probability that B is intelligent assuming we don't know anything about B or E (the program that output B)?
prob(Pi(B)) = 1/2^len(i_min),
where i_min is the shortest program that outputs a string that is intelligent.
quote: p(Pi(B)) >= p(Pi(B) & Pn(E)) = p(Pi(B)|Pn(E))*p(Pn(E))
Given that, I'm not sure what the significance of the first part is supposed to be.
First of all, I want to emphasize that if B (the biological state of the world today) is finite then *purely from a logical standpoint* we can certainly assume that B was output by a finite describable mechanistic process. This is of course the assumption with evolution.
The point of the Bayesian formulation, p(Pi(B)) >= p(Pi(B)|Pn(E))*p(Pn(E)), is essentially to point out that even assuming a mechanistic cause for B, you don't get something for nothing.
Evolution doesn't say Pi(B) happened by blind chance. However, if evolution postulates some aspect of the natural world (e.g. Pn(E)) that *increases* the liklihood of Pi(B) over what you would expect with blind chance, the total liklihood of Pi(B) and Pn(E) is *decreased* by the same amount. So, you still have an aggregate event with the same liklihood as Pi(B) occuring by blind chance.
Do you understand what I am saying? Is this self-evident to evolutionists? Why is it irrelevant?
(And *please* don't bring up multiple universes, or some other esoteric B.S.)
quote: In the second part, how can we assume that E describes E? E is E, but that's not the same as saying that it describes itself. In fact, if E is deterministic, and o(E) = B, then E describes itself iff E=B. But this property that o(F) = F (a fixed point of the function-output operator) is a strange one. It's satisfied by self-replicators, but if F is a random binary string then p(o(F) = F) is probably pretty low, depending on what the o(.) operator does. (Or, in other words, what machine you run your program on.)
And, most importantly, if B is something you wish to describe, you can't assume that the smallest E such that o(E)=B also has the property that o(E)=E. If it did, then B=E, which is quite some trick for a random string unless o(.) is a very boring processing environment. (I.e. "copy".)
So given these problems, I don't think I can see the significance of the second part.
I'll grant you that the argument I used to equate E and B was a little informal and haphazard. However, the other thing I asserted regarding this, "Therefore, a description of E cannot be smaller than the smallest description of B" is absolutely true. E is a description of B. Thus E obviously cannot be smaller than the smallest description of B. Furthermore, the smallest descripion of E (call it e-min) cannot be smaller than the smallest description of B, because then e-min would describe B as well, but e-min is smaller than b-min, thus a contradiction (This second claim is proven more completely by others, but it should be kind of intuitive.)
So in a backward chain of causality, were "B is caused by E is caused by E0 is caused by E1 is caused by E2..." none of the referenced programs are smaller than b-min. Also (and this is demonstrated more convincingly by others) E,E0,E1,E2 are all descriptions of B.
Now as far as equating E and B, I may have to go hunt up some quotes to make this more convincing. But another way I personally might state it is, E can be transformed into B by a very trivial process, namely the universal program U, where U just decodes E into URM instructions and runs it. (The URM instruction set contains only three instructions). But U is only a few hundred bytes long (at most) Whereas E and B will be astronomical. You may not find this all that compelling. I may have to go hunt up some quotes (part of my problem, as I stated in my original post, since I don't remember who came up with all this). [ 23. April 2004, 16:20: Message edited by: sid ]
IP: Logged
|
|
sid
Member
Member # 1269
|
posted 23. April 2004 20:04
Rex Kerr wrote: quote: Finally, if E is a fixed length N, and e(M) is the number of binary strings F of length M such that Pn(F), then
Pn(E) = 1/2^N * e(N)
and if E has a distribution of lengths given by the probability distribution p(N), then
Pn(E) = sum( N=1..infinity , p(N) * 1/2^N * e(N) )
which only reduces to your formula if p(N) = 0 for all values except N = len(n_min) as you define it, and then only if e(len(n_min))=1. If e(N)*p(N) rises fast relative to 2^N, then the formula won't even be a good approximation
I don't want to split hairs about errors in your notation above, e.g. I assume by Pn(E) you mean Prob(Pn(E)). We definitely don't assume E is some specific length. In any case, I appreciate your reply.
However, I must assume that if your formula is correct it reduces to mine, prob(Pn(E)) = 1/2^len(n_min), as can be inferred from the following quote from G. Chaitin:
quote: "The second basic idea is that one must look not just at the size H(x) of the smallest program that calculates x, but also at the probability P(x) that a program picked at random calculates x. In other words, to understand the basic properties of the program-size complexity H(x), you need to study the algorithmic probability P(x), which is the probability that a program generated by coin tossing calculates x. This probability is well defined precisely because I stipulate that programs must be self-delimiting. And it turns out that algorithmic information theory is really a constructive version of measure (probability) theory, and that *THE SIZE OF THE SMALLEST PROGRAM FOR CALCULATING X IS ESSENTIALLY THE -LOG2 OF ITS ALGORITHMIC PROBABILITY, THE PROBABILITY OF CALCULATING X BY CHANCE* [emphasis added] H(x) = -log2 P(x) + O(1). " http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait0.html
IP: Logged
|
|
Rex Kerr
Member
Member # 632
|
posted 24. April 2004 00:11
In a Bayesian framework, of course
prob(Pi(B))
is independent of everything else. In fact,
prob(Pi(B)) = prob(Pi(B)|Pn(E))*prob(Pn(E)) + prob(Pi(B)|~Pn(E))*(1-prob(Pn(E)))
where ~ means "not".
But science in general and evolutionary biology in particular is not really concerned with picking worlds uniformly at random (or physical laws uniformly at random). We tend to restrict ourselves to *this* universe, and ask things like:
What is prob(Pi(B(t1))|B(t0),L)?
where t1 and t0 are different times, L is the set of physical laws that we know about, and B is the state of the world at a given time.
So I think I understand what you're saying, but we don't have enough information to know what to make of it. For instance, how do we know that prob(Pi(B)) isn't amazingly high?
As to the other claims--I agree that if g(F) is the least program that generates F, then g(g(F)) is at least as big as g(F). But I'm not sure that we can draw any profound conclusions from that, because we don't know what g(F) is for much of any F we care about.
Sorry about the sloppy notation in my first post. I'll try to be more careful. The bottom line, though, is that Chaitin's results hold with an error of size O(1), which may be a large constant, and only for uniformly random x. Eventually, as the size of x increases, H(x) will tend to -log2(P(x)), but for x that we actually care about, it is much less obvious that the O(1) term doesn't dominate the equality. My formula was for the actual value of the probability when x is not huge and may not be uniformly random.
Added in edit: Er, my formula assumes that x of a given length is picked uniformly at random. The probability of a given length may vary, though. [ 24. April 2004, 16:34: Message edited by: Rex Kerr ]
IP: Logged
|
|
|