ISCID News Editor
Member # 1417
posted 20. April 2006 13:37
Source: PLoS Computational Biology
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
Received: October 20, 2005; Accepted: January 25, 2006; Published: March 17, 2006
Gustavo Glusman, Shizhen Qin, M. Raafat El-Gewely, Andrew F. Siegel, Jared C. Roach, Leroy Hood, Arian F. A. Smit
The identification and characterization of the complete ensemble of genes^ is a main goal of deciphering the digital information stored in the human genome^. Many algorithms^ for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational^ strand biases caused by transcription-coupled DNA^ repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns^ and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.”
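To make the polyadenylation idea concrete, here is a minimal sketch of counting the canonical signal hexamer AATAAA on each strand of a sequence. It is a toy illustration only, not the published ROAST or PASTA algorithms. The function names, the single-hexamer choice, and the bias score are all assumptions made for this example.

```python
# Toy illustration only: NOT the published ROAST or PASTA algorithms.
# Assumption: only the canonical hexamer AATAAA is considered.
POLYA_SIGNAL = "AATAAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def polya_strand_counts(seq: str) -> tuple[int, int]:
    """Return (forward, reverse) counts of the signal hexamer."""
    seq = seq.upper()
    forward = count_overlapping(seq, POLYA_SIGNAL)
    reverse = count_overlapping(reverse_complement(seq), POLYA_SIGNAL)
    return forward, reverse


def polya_strand_bias(seq: str) -> float:
    """Hypothetical score in [-1, 1]: (reverse - forward) / total.

    Positive values mean fewer signals on the forward strand, which
    would be the expected direction if a forward-strand transcript
    were under selection against premature polyadenylation.
    """
    forward, reverse = polya_strand_counts(seq)
    total = forward + reverse
    return 0.0 if total == 0 else (reverse - forward) / total
```

A real method would need far more than this: long genomic windows, a background model for how often the hexamer occurs by chance, and some handling of repeats. The sketch only shows what "strand-specific" means when counting a signal.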
To date, genes have been identified from genomic sequence using two basic concepts: the identification of specific signals delineating the structure of genes, and similarity to previously known genes. Here the authors describe four novel algorithms based on a third basic concept: the identification and quantification of mutational and selectional effects of transcription. Central to this work is a detailed analysis of interspersed repeats, the “Junk DNA^” left behind by transposon^ activity, that is usually discarded when predicting genes even though it amounts to nearly half the human genome. Using the new methodology, the authors identify thousands of potential novel genes, some of which appear not to code for protein products. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many “genomic deserts,” regions currently thought to be devoid of genes.
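As a rough picture of what "quantifying a mutational strand bias" could look like, the sketch below computes a simple compositional skew, (G+T) minus (A+C) over their sum, in sliding windows along one strand. This is not the Greens or CHOWDER algorithm, which per the synopsis rest on a detailed analysis of interspersed repeats. The choice of a G+T versus A+C skew as the statistic, the function names, and the window parameters are assumptions for illustration.

```python
# Toy illustration only: NOT the published Greens or CHOWDER algorithms.
# Assumption: strand bias is summarized as a (G+T) vs (A+C) skew.

def gt_skew(seq: str) -> float:
    """Return (G+T - A-C) / (G+T+A+C) for one strand; 0.0 if no A/C/G/T."""
    seq = seq.upper()
    gt = seq.count("G") + seq.count("T")
    ac = seq.count("A") + seq.count("C")
    total = gt + ac
    return 0.0 if total == 0 else (gt - ac) / total


def windowed_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, skew) for each full window of the given size."""
    last_start = len(seq) - window
    return [
        (start, gt_skew(seq[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]
```

For example, `windowed_skew("GGTTAACC", 4, 4)` gives one window with skew 1.0 followed by one with skew -1.0. A sustained run of same-sign windows over a long region is the kind of pattern a transcription-associated bias would be expected to leave, although the sign convention here is an assumption and real data would need a statistical test against the local background.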
Read the full article at PLoS Computational Biology
Copyright: © 2006 Glusman et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
* To whom correspondence should be addressed. E-mail: Gustavo@SystemsBiology.org
[Emphases added by ISCID News Editor]
[Link-underlined terms with ^ indicate linked entry in ISCID Encyclopedia of Science and Philosophy as added by ISCID News Editor]
[ 20. April 2006, 13:45: Message edited by: ISCID News Editor ]