David L. Hagen
posted 08. February 2006 21:21
New Technique for Finding Needles in Haystacks: Geometric Approach to Distinguishing between a New Source and Random Fluctuations
Ramani S. Pilla (1,*), Catherine Loader (1), and Cyrus C. Taylor (2)
(1) Department of Statistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, USA
(2) Department of Physics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, USA
“We propose a new test statistic based on a score process for determining the statistical significance of a putative signal that may be a small perturbation to a noisy experimental background. We derive the reference distribution for this score test statistic; it has an elegant geometrical interpretation as well as broad applicability. We illustrate the technique in the context of a model problem from high-energy particle physics. Monte Carlo experimental results confirm that the score test results in a significantly improved rate of signal detection.”
Phys. Rev. Lett. 95, 230202 (2 Dec. 2005). DOI: 10.1103/PhysRevLett.95.230202
Geometric Reasoning for Signal Discovery
Ramani S. Pilla, Catherine Loader & Cyrus C. Taylor
“. . . In this modern era, scientific discoveries involve searching large quantities of data for small disorder---akin to finding the proverbial "needle in the haystack." Finding that "needle in a haystack" requires sophisticated statistical methods. The goal becomes searching for the unusual activity, or signal, among all random background data. Researchers working with large amounts of data encounter the fundamental problem of being able to distinguish a real signal from a random variation present in the data. That is, determining the statistical significance of a putative signal. . . . a financial institution monitors transactions for fraud and security risks;”
Simulation and Visualization Credit: Ramani S. Pilla, Catherine Loader and Cyrus C. Taylor
“. . . fundamentally important problems have a common underlying statistical thread: detecting a significant signal in a large amount of background `noise'. The statistical challenge then becomes developing fast, powerful, and reliable methods to distinguish the signal from random fluctuations in chaotic data. . . . A fundamental thread underlying this work is the concept of a random field; a mathematical model of evolutional fluctuating complex systems parametrized by a multi-dimensional manifold such as a curve or a surface. As the parameter varies, the random field carries information and therefore has complex stochastic structure. Data local to each location is compared to the background data looking for sufficiently large discrepancies indicating the presence of a signal. As one considers all possible locations for the signal, a random field is generated. At the core of our method is the idea of posing the problem in terms of classical "hypothesis-based testing" paradigm to detect statistical disorder in the data. There are two challenges in making the method a practically useful one: defining efficient test statistics (i.e., a function of data), and determining the critical cut-off value that enables the researcher to make a decision, at a given false positive rate, to reject the null hypothesis of no signal present in the data. Our method further exploits the flexibility behind the long-established geometric method pioneered independently by Harold Hotelling and Hermann Weyl in their 1939 seminal papers. This geometric method is extended to the current problem of detecting a signal; in particular, in creating an approximation to find the critical cut-off value. Our technique based on geometric reasoning significantly enhances the researchers' ability to distinguish a signal. . . .
The relative improvement of the new method over the previous standard of chi-square goodness-of-fit is particularly salient when the signal is hard to detect. . . . In effect, detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to vital links to scientific success.”
-------------------
Another software method for detecting unusual events is:
Software Tool Finds 'Needles' In Data 'Haystacks'
D.S. Bright and D.E. Newbury, "Maximum pixel spectrum: a new tool for detecting and recovering rare, unanticipated features from spectrum image data cubes," Journal of Microscopy, Nov. 2004, pp. 186-193.
--------------------------
An improved method of statistical identification of ordered data distinct from chaotic background. This appears to be very significant to the task of identifying complex specificity in Intelligent Design. See the Brainstorms discussion of interfaces under "Mechanically Specific Relativity".
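For anyone who wants to see the general idea from the quoted article in code (compare the data at each location to the background, take the largest discrepancy over all locations, and reject "no signal" only above a cut-off chosen for a given false positive rate), here is a rough Python sketch. It is NOT the authors' score test, and it does not use the Hotelling-Weyl geometric approximation; the cut-off is instead estimated by brute-force Monte Carlo simulation of the null. The unit-variance Gaussian background, the fixed window width, and all function names are my own assumptions for illustration.

```python
import random


def scan_statistic(data, width):
    """Largest standardized window sum over all window locations.

    Assumes a zero-mean, unit-variance background, so a window sum of
    `width` bins has standard deviation sqrt(width) under the null.
    """
    scale = width ** 0.5
    window = sum(data[:width])
    best = window / scale
    for start in range(1, len(data) - width + 1):
        # Slide the window one bin to the right.
        window += data[start + width - 1] - data[start - 1]
        best = max(best, window / scale)
    return best


def null_cutoff(n_bins, width, alpha, n_sims, rng):
    """Monte Carlo estimate of the (1 - alpha) quantile of the null maximum."""
    maxima = sorted(
        scan_statistic([rng.gauss(0.0, 1.0) for _ in range(n_bins)], width)
        for _ in range(n_sims)
    )
    index = min(n_sims - 1, int((1.0 - alpha) * n_sims))
    return maxima[index]


def make_data(n_bins, rng, signal_start=None, signal_width=0, amplitude=0.0):
    """Background noise, optionally with a flat bump added as the 'signal'."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
    if signal_start is not None:
        for i in range(signal_start, signal_start + signal_width):
            data[i] += amplitude
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    cutoff = null_cutoff(n_bins=200, width=10, alpha=0.05, n_sims=2000, rng=rng)
    data = make_data(200, rng, signal_start=90, signal_width=10, amplitude=1.5)
    stat = scan_statistic(data, 10)
    print("cutoff:", round(cutoff, 3), "statistic:", round(stat, 3))
    print("reject null (signal detected):", stat > cutoff)
```

The simulated cut-off is larger than the single-location 5% threshold because the maximum is taken over many locations; as I read the article, the geometric method is aimed at approximating exactly this kind of cut-off without the simulation.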
{Edit: Added software reference and Brainstorms discussion link.} [ 08. February 2006, 22:26: Message edited by: David L. Hagen ]