William A. Dembski
Member
Member # 7
posted 11. August 2004 02:09
I'm afraid that in my initial reaction to Cosma Shalizi's criticism of my article "Information as a Measure of Variation" (see here), I conceded more ground than I needed to. I've now revised the paper in light of his criticism (see here). In doing so, I found that I needed to change very little. A few paragraphs had to be added to make the connection with the Renyi entropy, but the logic of the paper remained intact. Moreover, Shalizi's remarks about information geometries proved largely irrelevant. The continuity spectra I defined were never intended to induce geometries on spaces of probability measures. Rather, they were intended to measure the continuity of probability paths, and continuity is more a topological than a strictly geometric property.
Shalizi's main beef was that the variational information is just a special case of the Renyi entropy (i.e., that I had reinvented the wheel). What becomes clear in looking at the literature on Renyi entropy, however, is that the variational information gets, as it were, lost among the trees. The Renyi entropy is indexed by a parameter r that ranges over the nonnegative real numbers. Most of the information measures associated with the Renyi entropy as the parameter r varies are of no physical significance. Cover and Thomas in their book Elements of Information Theory discuss the Renyi entropy briefly (pp. 499-501), but they give no play to the case r = 2, which is the case that corresponds to the variational information. Indeed, most of the play goes to the case r = 1, which corresponds to the Shannon entropy. What my paper does, therefore, is highlight the centrality of the variational information as a special case of the Renyi entropy by showing how it generalizes directly from the canonical information measure for events.
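To make the family concrete, here is a minimal Python sketch of the Renyi entropy of order r for a discrete distribution, H_r(p) = log(sum_i p_i^r) / (1 - r), with the r = 1 case taken as its limit, the Shannon entropy. It assumes base-2 logarithms (bits) and, following the statement above, treats the order-2 case -log(sum_i p_i^2) as the variational information. The last lines show one way to read the link to the information of an event: sum_i p_i^2 is the expected probability of the outcome that occurs, so the order-2 entropy is the surprisal of that expected probability. This is offered as an illustration, not as the derivation in the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence

def renyi_entropy(p: Sequence[float], r: float) -> float:
    """Renyi entropy of order r (in bits) of a discrete distribution p."""
    if r < 0:
        raise ValueError("order r must be nonnegative")
    probs = [x for x in p if x > 0]  # zero-probability outcomes contribute nothing
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if r == 1:
        # The r -> 1 limit of the family is the Shannon entropy.
        return -sum(x * math.log2(x) for x in probs)
    return math.log2(sum(x ** r for x in probs)) / (1 - r)

def surprisal(prob: float) -> float:
    """Information of an event with probability prob: -log2(prob)."""
    return -math.log2(prob)

p = [0.5, 0.25, 0.25]
for r in (0, 0.5, 1, 2):
    print(f"r = {r}: {renyi_entropy(p, r):.4f} bits")

# Order 2: -log2(sum p_i^2) is the surprisal of the expected probability E_p[p(X)].
expected_prob = sum(x * x for x in p)
print(renyi_entropy(p, 2), surprisal(expected_prob))
```

For p = (1/2, 1/4, 1/4) the Shannon case gives 1.5 bits, while the order-2 case gives -log2(0.375), roughly 1.415 bits; the values decrease as r grows, which is a general property of the family.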
There's been some concern, expressed by Shalizi but also by critics jumping on the bandwagon, that I conceded that there was no new mathematics in my paper. The phrase "no new mathematics" is a term of art among mathematicians. It means that there is no new theorem that advances the frontiers of mathematical knowledge. To illustrate what's at stake, some years ago I had a novel idea about how to generate diffusion-limited aggregates (DLAs). I submitted a short paper about it to a probability journal, but it was rejected because there was "no new mathematics" in it (these were the referee's exact words). What the paper provided, however, was a novel method for computing DLAs. The paper ended up being published in the Journal of Statistical Computation and Simulation. Critics who charge that my variational information paper should not be published at all because it contains "no new math" are thus missing the point.
What's novel about this paper and why, in my view, does it deserve to be published in a journal like Complexity? As I see it, the paper does several things:
(1) It reviews why the information associated with an event should take a certain form (namely, the surprisal value).
(2) It shows why this information measure for events gets short shrift in the information-theory literature: Shannon entropy tends to dominate, and it does so even when Shannon entropy is generalized to Renyi entropy.
(3) It provides a derivation of how the information measure for events properly generalizes to an information measure for probability measures. The derivation, as far as I know, is novel (perhaps Shalizi will prove me wrong). Its endpoint, however, is not novel: it is the Renyi entropy for r = 2, which is equivalent to the variational information.
(4) Once the privileged place of the variational information is clear, the next task is to show how it can be used to assess the informational continuity of probability paths. Doing so will prove crucial in any application of the variational information to assessing evolutionary processes (be these physical or biological). But there's a problem. The variational information does not take into account the metric structure of the underlying probability space and therefore suggests far more discontinuity than is actually present in probability paths. It's therefore necessary to factor in the metric structure of the underlying probability space, and I argue this is properly done using the Kantorovich-Wasserstein metric. The continuity spectra I define are new. There are no new theorems here and thus "no new mathematics" in the way this term is typically employed by mathematicians. But there are some novel definitions of continuity that are mathematically well-defined and applicable to evolving systems.
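The point about metric structure can be illustrated with a minimal Python sketch. It does not implement the continuity spectra defined in the paper, which are not spelled out here. Instead it compares two distances between probability distributions on the real line: total variation, used here as a stand-in for a metric-blind comparison because, like the variational information, it depends only on how much probability sits at each point and not on how far apart the points are; and the order-1 Kantorovich-Wasserstein distance, computed with the standard one-dimensional formula as the integral of the absolute difference between the two cumulative distribution functions. The choice of total variation and the helper names are assumptions of this sketch.

```python
from typing import Mapping

def total_variation(p: Mapping[float, float], q: Mapping[float, float]) -> float:
    """Total variation distance: ignores where the support points lie."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def wasserstein_1(p: Mapping[float, float], q: Mapping[float, float]) -> float:
    """Order-1 Kantorovich-Wasserstein distance for distributions on the real line.

    Uses W1 = integral of |F_p(x) - F_q(x)| dx, where F_p and F_q are the CDFs,
    which are step functions between consecutive support points.
    """
    points = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        cdf_p += p.get(left, 0.0)  # CDF values on the interval [left, right)
        cdf_q += q.get(left, 0.0)
        dist += abs(cdf_p - cdf_q) * (right - left)
    return dist

# Move a point mass a tiny distance along the line.
before = {0.0: 1.0}
after = {0.01: 1.0}
print(total_variation(before, after), wasserstein_1(before, after))
```

Shifting a point mass from 0 to 0.01 gives the maximal total variation of 1, as if the distribution had jumped, while the Wasserstein distance is only 0.01, reflecting how little the mass actually moved along the line.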
One final word about the criticism that the mathematics in this article is "old." The variational information, as a special case of Renyi entropy, has been around for 43 years. Fisher information, which is the basis for a lot of current research in information geometry and which Roy Frieden has, over the last five years, shown provides a framework for understanding most of physics, was developed by Ronald Fisher in 1925 and is now 79 years old. Fisher information has gotten a lot of play over the years. The variational information, I submit, has not. My paper provides a rationale for thinking that this form of information has more going for it than previously suspected.
[ 12. August 2004, 16:28: Message edited by: William A. Dembski ]