ISCID Forums


Post New Topic  Post A Reply
my profile | search | faq | forum home
  next oldest topic   next newest topic
» ISCID Forums   » General   » Brainstorms   » Design Inference Methods vs. the NFL Theorem for Inference

   
Author Topic: Design Inference Methods vs. the NFL Theorem for Inference
Erik
Member
Member # 160

Icon 1 posted 27. March 2004 13:26      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
Context (readers who do not need the contex can skip ahead to the question below): Is learning/inference possible and, if so, how is it possible? An important development on that topic was the publication of the results in the following two articles*:

Wolpert D.H. (1996) "The Lack of A Priori Distinctions Between Learning Algorithms", Neural Computation, 8(7):1341-1390 (Preprint available here. Note that the order of the pages is reversed so that last page comes first in the PDF and PostScript files.)

Wolpert D.H. (1996) "The Existence of A Priori Distinctions Between Learning Algorithms", Neural Computation, 8(7):1391-1420 (Preprint available here.)

In the first of these articles Wolpert proves that all methods of learning/inference have the same average error, where the average is taken over all mathematically possible scenarios for learning/inference. It is no coincidence that David Hume is quoted in the beginning of that paper, because it is a formalization of the problem of induction. The second article discusses some senses in which some methods of inference can be better or worse than others, despite the fact that their average performance are the same.

There is also a NFL theorem for a slightly different, but closely related, setting, namely optimization. In order to be a better-than-average general-purpose optimizer one must be able to effectively learn from past failures and successes so that one can make better inferences about potential solutions that haven't been tested yet. But, as we might suspect from the related NFL theorems for learning/inference, no optimization algorithm will be able to do this (at least it won't be able to it well) in all optimization scenarios. All optimizers have the same average performance, where the average is taken over all mathematically possible optimization scenarios.

The general lesson from these theorems is not that learning/inference and optimization does not work in the real world, but rather that unless we exploit given a priori knowledge about the problem at hand we will have no formal guarantees that our methods will work better than, say, random guessing. Wolpert sums up this version of the problem of induction like this:
quote:
"Accordingly, unless one can establish a priori, before seeing any of the data d, that the f that generated d is one of the ones for which one's favorite algorithm performs better than other algorithms, noe has no assurances that that algorithm performs any better than the algorithm of purely random guessing.
This does not mean that one's algorithm must perform the same as random guessing in the real world. Rather it means that, formally, one cannot establish superiority to random guessing without making some assumptions. Note in particular that you cannot use your prior experience -- or even the billion years or so of 'prior experiences' of your genome, reflected in the design of your brain --- to circumvent this problem, since all that prior experience is, formally, just an extension of the training set d."

From "The Bayesian and Computational Learning Theories" by David Wolpert. Click here for a PostScript version of this text, which appears to be a contribution to an encyclopedia of cognitive sciences.

Despite the intimate connection between learning/inference and optimization there is an important difference: When all scenarios are equally probable all learning/inference methods will have the same performance as random guessing, and all optimizers will have the same performance as random guessing. This, however, is only a statement about the relative performance of different methods. What about the absolute performance? When all scenarios are equally probable random guessing has a good absolute optimization performance but a bad absolute learning/inference performance. Readers interested in an elementary demonstration of the point that random guessing has good absolute optimization performance should check out my first post in this old ISCID thread. Readers interested in a formal publication on the subject should check out this article:

English T.M. (2000) "Optimization Is Easy and Learning Is Hard In the Typical Function", preprint available here

It's about around here that William Dembski comes in. Dembski has attempted argue that the NFL theorem should somehow undermine our confidence in evolutionary biology. He has also proposed a method for inferring that an event was due to intelligent design. Although Dembski does not use the phrase "problem of induction", any criticism of the Darwinian algorithm based on the NFL theorem for optimization is effectively the criticism that the Darwinian algorithm fails to solve the problem of induction. And since it is very easy (random guessing will do!) to achieve a good absolute optimization performance when all scenarios are equally probably, such criticism is very impotent. That criticism does, however, invite a severe criticism of Design Inference methods. Unlike optimization, inference is very hard (random guessing won't do!) so any Design Inference method must somehow get around the problem of induction.

-------------------------------------

Question: My question for ID advocates is this: How does your favourite Design Inference method get around the problem of induction? More formally, exactly what a priori knowledge do you invoke to escape the NFL theorem for learning/inference? (ID advocates who do not endorse NFL-based criticisms of evolutionary biology, and thereby avoid inviting this question, need not feel that the question is warranted.)

Erik

* Some of the results probably go back at least to a 1994 paper by Cullen Shaffer entitled "A conservation law for generalization performance", but I haven't this papar so I can't be sure.

IP: Logged
Micah Sparacio
Member
Member # 6

Icon 1 posted 27. March 2004 13:46      Profile for Micah Sparacio   Email Micah Sparacio   Send New Private Message       Edit/Delete Post 
Erik,
To make things clearer, could you tell me how you see the "problem of induction" being any more significant for the Design Theorist than for the Theorist of Other Minds?

IP: Logged
Erik
Member
Member # 160

Icon 1 posted 27. March 2004 14:09      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
Micah Sparacio, your question assumes that I actually do see the problem of induction as more significant for ID advocates, which I do not. The problem of induction is significant for everyone. Because it is a very difficult problem we normally do not require anyone to solve it, but are content to make various assumptions that seem reasonable to us but which strictly speaking will not do.

Anyone who makes NFL-based criticisms of evolutionary biology (or anything else) forfeits such charity, though. Dembski has

(i) argued that the NFL teorem for optimization somehow undermines our confidence in evolutionary biology, and
(ii) proposed a Design Inference method that does not escape the NFL theorem for learning/inference.

It is (i) and (ii) together with the facts that

(a) the NFL theorem for optimization implies a good absolute performance, and
(b) the NFL theorem for learning/inference implies a bad absolute performance

that make my question warranted.

Erik

[ 27. March 2004, 14:12: Message edited by: Erik ]

IP: Logged
Janitor@MIT
Member
Member # 125

Icon 1 posted 31. March 2004 11:54      Profile for Janitor@MIT         Edit/Delete Post 
What makes an algorithm "Darwinian"?
IP: Logged
Erik
Member
Member # 160

Icon 1 posted 01. April 2004 15:33      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
Janitor@MIT asks: What makes an algorithm "Darwinian"?

That is not an appropriate topic for this thread. This thread is dedicated to the problem of induction (as formalized in the NFL results), and the consequences for "design inference methods" of employing the problem of induction in an attack on evolutionary biology. I will reply, but any further discussion is best held in a separate thread (I promise to edit this post to include a link to such a thread if the thread is started within, say, a week from now and it is obvious that the thread is in reaction to points I've made here).

To answer the question of what makes an algorithm Darwinian we should first note that the adjective "Darwinian" is not a terribly important adjective. It's status is the same as the adjective "red" -- it's a useful adjective in practice because we are often far from the rather fuzzy boundary between "Darwinian" (or "red") and "non-Darwinian" (or "non-red")

Roughly speaking, an algorithm is "Darwinian" if the data it manipulates can be thought of as a population of genotypes, and the manipulation of data occurs in such a way that genotypes mutate and their proportions in the population change in ways that are analogous to biological evolution by differential reproductive success.

[ 01. April 2004, 15:34: Message edited by: Erik ]

IP: Logged
Rex Kerr
Member
Member # 632

Icon 1 posted 01. April 2004 23:08      Profile for Rex Kerr     Send New Private Message       Edit/Delete Post 
The problem of induction is actually a problem for everyone, I think. However, the problem of induction is not quite as bad as it seems, I think.

Suppose you make a billion observations a pair of variables x and y from time T1 to T2 in the past and find that they have a functional relationship y(t) = f(x(t)). Suppose further that functional mappings are picked at random from the set of all possible mappings. What is the probability that y(t) = f(x(t)) between times T2 and T3? Since each one is random, this is really asking what the probability that accidentally, f(x(t)) = z(t) where z(t) is chosen completely independently.

If our mapping is random, then we may as well pick z(t) at random. Even if f(x(t)) has only two states, we get a probability of only 50% of matching for each sample. Take a billion samples, and you have a probability of 2^-(1000000000). But we observe consistent functional mappings over two blocks of time all the time.

So we conclude that our step of picking a random mapping was almost certainly wrong. It's far from clear what the right method is to pick mappings, but what is clear is that reality does not pick uniformly from "all mathematically possible scenarios".

Given this, I think the NFL theorems are interesting, but are not directly applicable to real-world situations. I think Erik has raised an interesting self-contradiction that NFL-refutes-evolution advocates run into, but I think evolution advocates, ID advocates, and everyone else can avoid extreme paranoia about the problem of induction.

IP: Logged
gedanken
Member
Member # 594

Icon 1 posted 02. April 2004 19:44      Profile for gedanken         Edit/Delete Post 
A question for Eric and Rex:

I am looking at Dr. Dembski’s TDR Chapter 12, “Reliability of the Criterion”.

In this chapter he argues that the reliability of the Explanatory Filter can be judged on inductive grounds. Yet all the cases that he suggests as a basis for such an induction are precisely what I would consider to be a filter for examples that “work”. In other words I believe he has mechanized the description of cherry picking of evidence for his “induction”. (Specifically, he uses examples that we are likely to acknowledge a significantly high prior probability of “designer” action than the probability of missing a step in the analysis of the “law like” pathway. In other words we have side knowledge that the essential comparison has already been done, comparing prior probability of designer action against probability of error of the analysis, and this is precisely the aspect that would distinguish a reliable case from an unreliable case for the Explanatory Filter. The steps of the “EF” filter explicitly exclude this particular side knowledge, and that side knowledge is smuggled in by way of the description of the cases to be used for the so-called “induction”.)

How does “cherry picking” figure into this “problem of induction”? (Especially “cherry picking” that is hidden by complicating or obfuscating structure to the case being examined and techniques of examination being used?)

[ 02. April 2004, 19:53: Message edited by: gedanken ]

IP: Logged
Rex Kerr
Member
Member # 632

Icon 1 posted 03. April 2004 04:46      Profile for Rex Kerr     Send New Private Message       Edit/Delete Post 
I don't have TDR, so I can't comment in detail, but the problem of smuggling is exactly what I was trying to address in my "does anything at all exhibit specified complexity" thread.

So, basically, I think it's a huge problem. Since I don't think that induction is a problem in general, I don't think it's problematic for the EF either in the abstract sense. However, induction is not perfectly reliable, and it's especially error-prone when you choose a biased sample to study. So, yes, I do think that is a problem.

Whether it will run especially afoul of NFL theorems is, I think, better for Erik to judge.

IP: Logged
gedanken
Member
Member # 594

Icon 1 posted 03. April 2004 09:22      Profile for gedanken         Edit/Delete Post 
Rex, his TDR statements are repetition of arguments made elsewhere, probably even linked in papers on ISCID.

I am trying to understand if Eric is referring to a meta level of application of NFL theorems to the very processes of identification of so-called "design". I don't understand this yet, it may just take time for me, and re-reading.

quote:
That criticism does, however, invite a severe criticism of Design Inference methods. Unlike optimization, inference is very hard (random guessing won't do!) so any Design Inference method must somehow get around the problem of induction.
This might relate to the very possibility of "cherry picking" that I was referring to.

In my "motive" thread (which I expect to return to this summer, with a mathematical paper) I point out relationships of the EF and failure of the induction implicit in the EF, quantifying that failure and comparing that to probabilities used in the EF. If the probability of the "induction" having missed something (having failed) is less than the joint prior probability of the designer acting to cause the event, then that "problem of induction" is not damaging to the EF steps and the EF is fairly reliable. But when not, that problem seems to come into play. But that is not automatic that the "problem of induction" comes into play, depending upon other factors like the prior probability of designer action (if possible to be known or bounded).

To put this a little differently, intelligent design methods may in some cases be based on an inductive argument, and in others not dependent on an inductive argument. By that I mean specifically the eliminative concept from an inductive test being embedded into the ID detection method. The practical usage of the EF actually smuggles this concept of doing the comparison of prior knowledge of structure of designer action into the process (even though explicitly eschewed by the formal steps). If one codified this smuggling into formal process steps (made it explicit) then one may develop a procedure that does not suffer from the limitations that Eric is referring to. (But of course such a codified version may not be applicable in the areas of interest to those in the ID movement, such as historical investigations of biology.) Thus Eric’s statement about “any” design inference method may be in error in that respect, as some may not be explicitly eliminative or may only require eliminative up to a certain level of judgment that falls short of seriously invoking a “problem of induction”.

Or it may be that the “smuggled” comparison is precisely the distinguishing case between an ordinary “problem of induction” and a specifically extraordinary problem. The “ordinary” problem of induction may be solved when comparative methods can be employed (whether smuggled in or made explicit), even though there is a comparative induction involved.

[ 03. April 2004, 09:54: Message edited by: gedanken ]

IP: Logged
Janitor@MIT
Member
Member # 125

Icon 1 posted 05. April 2004 17:01      Profile for Janitor@MIT         Edit/Delete Post 
It’s not topical or appropriate to request that you define your terms, Erik? But I appreciate the patient response, and I will follow you this far: calling an algorithm “Darwinian” is about as meaningful as calling an algorithm “red.”

The English paper http://www.tomenglishproject.com/cec2000.pdf
compares apples and oranges, as the author himself states more than once. I understood English as asserting that the reliability of induction/optimization depends upon a strong (information-theoretic) conservation principle. (Sounds almost “Dembskian” doesn’t it?) Seems intuitive enough. (Indeed, it’s certainly one of the most assured general results of modern science—contra the “popular” interpretation of Hume.)

In English’s paper the comparison of “apples and oranges” (specifically the different problems posed to his “active learner” and his “optimizer”) is a little misleading, as induction and prediction are certainly complements. Indeed, for any intelligent, adaptive, system, they are necessary complements.

The "problem" of induction is real, but exaggerated. Science (not to be confused with any particular theory or philosophy of science) is generally reliable, obviously, because inductions, predictions, and the optimization of both inductions and predictions is generally reliable. Which tells us as much about ourselves as it does the world we live in.

If it were otherwise, then scientists might just as well flip coins to decide.

IP: Logged
RBH
Member
Member # 380

Icon 1 posted 05. April 2004 19:11      Profile for RBH     Send New Private Message       Edit/Delete Post 
J@MIT wrote
quote:
If it were otherwise, then scientists might just as well flip coins to decide.
If it were otherwise, there wouldn't be scientists to flip the coins nor coins to be flipped.

RBH

[ 05. April 2004, 19:12: Message edited by: RBH ]

IP: Logged
Erik
Member
Member # 160

Icon 1 posted 15. April 2004 12:41      Profile for Erik   Email Erik   Send New Private Message       Edit/Delete Post 
Gedanken asks me and Rex Kerr about possible biases effects when we focus exclusively on examples for which the Bayesian prior probability of "Designer action" is high. I agree that it is a problem, but I don't think the NFL theorem for learning provides any substantial insight into this particular issue. Since I can't think of anything else to say in response, I'll be content with a brief note on the kind of formalism that underlies the NFL results:

The NFL theorems only make sense within a framework called the Extended Bayesian Formalism. This framework encompasses both conventional and Bayesian statistics. Roughly speaking, one supposes that there is a "true" probability distribution---let's denote it by an upper-case P. In addition, an optimizer or learner may have its own personal probability distribution---let's denote this by a lower-case p. If the optimizer/learner is a good Bayesian, it will make decisions based on its personal probility distribution p, but its performance will be evaluated by us according to the "true" probability distribution P. Above-average performance is obtained when p and P are biased towards the same outcomes. When P is uniform the same performance is obtained regardless of the extent of the bias in p.

If we accept the Extended Bayesian Formalism and take the bias towards "Designer action" to be a bias in p, then this will not help against the NFL theorems. However, if we accept the Extended Bayesian Formalism and take the bias towards "Designer action" to be a bias in P, then it follows that learners who are biased (either via p or some non-Bayesian decision procedure) towards classifying things as "designed" will perform better.

This is probably not exactly what Gedanken is asking for, and that will be my answer. That is, I think analyzing "cherry picking" in relation to the version of the problem of induction I have brought up in this thread will probably lead to the answer of slightly different question than the one Gedanken wants to see answered.

In a subsequent post Gedanken suggests that the eliminative nature of the EF might help it against the problem of induction. It does not. The NFL theorem for learning applies to any decision procedure and if we grant the assumption required for its proof it follows that any eliminative decision procedure is no better than random guessing when it comes to determining what is "designed" and what is not.

-----------------------

Janitor@MIT wonders if it is not appropriate to request definitions of terms. It is extremely appropriate to request definitions of unclear terms that play important roles in arguments and deductions. For instance, if I had proposed a mathematical theorem that applied only to "Darwinian algorithms" then it would have extremely appropriate to ask for a precise definition of the term "Darwinian algorithm". However, in this thread I have only the adjective "Darwinian" to describe algorithms in the same way we use colours in ordinary life to describe surfaces, i.e. to communicate an informal property rather than a property that is crucial for deductions. Since a precise definition of "Darwinian" is not relevant to my argument it is inappropriate to draw the focus away from the actual topic by bringing up the issue.

Janitor@MIT also claims:
quote:
The English paper http://www.tomenglishproject.com/cec2000.pdf
compares apples and oranges, as the author himself states more than once. I understood English as asserting that the reliability of induction/optimization depends upon a strong (information-theoretic) conservation principle. (Sounds almost “Dembskian” doesn’t it?) Seems intuitive enough. (Indeed, it’s certainly one of the most assured general results of modern science—contra the “popular” interpretation of Hume.)

I have been unable to find where the author states something to the effect that he compares apples and oranges. Exactly what are you referring to, Janitor@MIT?

The notion that the reliability of induction depends upon a strong conservation principle is a misunderstanding. It is the analysis of the reliability of induction that could be said to depend on a conservation principle, and the conclusion of this analysis is that induction is unreliable.

Janitor@MIT also wrote:
quote:
In English’s paper the comparison of “apples and oranges” (specifically the different problems posed to his “active learner” and his “optimizer”) is a little misleading, as induction and prediction are certainly complements. Indeed, for any intelligent, adaptive, system, they are necessary complements.
Exactly what does this mean? Which problems does Janitor@MIT think should have been posed to the learners and optimizers, respectively, in order for the comparison to not be misleading?

Erik

IP: Logged
Janitor@MIT
Member
Member # 125

Icon 1 posted 15. April 2004 15:18      Profile for Janitor@MIT         Edit/Delete Post 
I've e-mailed English requesting a clarification. Until English responds I will not argue the paper, as it is posted and anyone can interpret it as they will. I asked English if he poses two significantly different tests to his "active learner" and his "optimizer." I said that I understood the paper as supporting (contra Hume!) the reliability of (scientific) induction , learning, or optimization, whatever you want to call it (LOL), because of a strong principle of causality.
Erik repeats that the "darwinian" adjective or analogy is meaningless. I accept that. No modern scientist beleives in "Darwinism." They certainly don't in computer science. Maybe in biology...

IP: Logged
michaelgoodrich
Member
Member # 393

Icon 1 posted 16. April 2004 09:39      Profile for michaelgoodrich   Email michaelgoodrich   Send New Private Message       Edit/Delete Post 
Erik,

I think you have a flawed premise.

My "favourite Design Inference method" does not need to "get around the problem of induction"!

After all, it is an inference, no?

An inference is not a deduction, and inferencing means one is engaging in induction.

cheers,

IP: Logged
gedanken
Member
Member # 594

Icon 1 posted 16. April 2004 22:10      Profile for gedanken         Edit/Delete Post 
Mr. Goodrich,

The question is whether the inductive argument is valid. In other words, is "engaging in induction" meaningful in the specific case?

For example a flawed induction that cherry picks its cases is one that is demonstrably not meaningful argument.

[ 16. April 2004, 22:12: Message edited by: gedanken ]

IP: Logged


All times are East Coast  
Post New Topic  Post A Reply Close Topic    Move Topic    Delete Topic    Top Topic next oldest topic   next newest topic
 - Printer-friendly view of this topic
Hop To:

Contact Us | ISCID

All content © ISCID and content contributor 2001-2003

The ISCID Forums are aimed at generating insight into the nature of complex systems (e.g. biological complexity, organizational complexity, etc.) and the ontological status of purpose, especially from the vantage point of various information- and design-theoretic models.

Indexed by UBB Spider Hack  |  Powered by Infopop Corporation UBB.classicTM 6.3.1.1

PCID | Encyclopedia | Brainstorms | The Archive | News | Essay Contests | Chat Events | Membership