ISCID Forums


Author Topic: Improved algorithm for identification of cis-regulatory gene transcription sites
ISCID News Editor
Moderator
Member # 1417

posted 16. October 2004 11:31
BioMed Central, BMC Bioinformatics, September 9, 2004

Copyright © 2004 Sinha et al; licensee BioMed Central Ltd.
BMC Bioinformatics. 2004; 5: 129.
doi: 10.1186/1471-2105-5-129. Published online 2004 September 9.

Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila

Saurabh Sinha, Mark D Schroeder, Ulrich Unnerstall, Ulrike Gaul, and Eric D Siggia

Saurabh Sinha: saurabh@lonnrot.rockefeller.edu; Mark D Schroeder: schroem@mail.rockefeller.edu; Ulrich Unnerstall: unnersu@mail.rockefeller.edu; Ulrike Gaul: gaul@mail.rockefeller.edu; Eric D Siggia: siggiae@mail.rockefeller.edu

Received July 8, 2004; Accepted September 9, 2004.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Abstract

Background
The discovery of cis-regulatory modules in metazoan genomes is crucial for understanding the connection between genes and organism diversity. It is important to quantify how comparative genomics can improve computational detection of such modules.

Results
We run the Stubb software on the entire D. melanogaster genome, to obtain predictions of modules involved in segmentation of the embryo. Stubb uses a probabilistic model to score sequences for clustering of transcription factor binding sites, and can exploit multiple species data within the same probabilistic framework. The predictions are evaluated using publicly available gene expression data for thousands of genes, after careful manual annotation. We demonstrate that the use of a second genome (D. pseudoobscura) for cross-species comparison significantly improves the prediction accuracy of Stubb, and is a more sensitive approach than intersecting the results of separate runs over the two genomes. The entire list of predictions is made available online.

Conclusion
Evolutionary conservation of modules serves as a filter to improve their detection in silico. The future availability of additional fruitfly genomes therefore carries the prospect of highly specific genome-wide predictions using Stubb.


Background
Several computational approaches to the problem of predicting cis-regulatory modules ('CRM's) have been reported recently. Berman et al. [1], Markstein et al. [2] and Halfon et al. [3] predicted CRM's involved in body patterning in the fly, and experimentally verified their predictions. The underlying principle in these algorithms was to detect dense clusters of binding sites, as determined by matches (above some threshold) to catalogued transcription factor weight matrices. The algorithm of Rajewsky et al. [4], called Ahab, avoided the use of thresholds on weight matrix matches by a probabilistic modeling of CRM's. Ahab predictions within the segmentation gene network were subjected to extensive experimental validation, with excellent overall success (Schroeder et al. [5]). Most predicted CRM's, when placed upstream of a reporter gene, faithfully reproduce one or more aspects of the endogenous gene expression pattern. Moreover, an analysis of binding site composition over the entire set of validated modules reveals that Ahab's prediction of binding sites correlates well with expression patterns produced by the modules and suggests basic rules governing module composition.
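The threshold-based principle described above (dense clusters of weight-matrix matches) can be illustrated with a minimal sketch. This is not the code of any of the cited programs: the count matrix, the threshold, the window size and the function names are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log-odds scores (uniform background assumed)."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return matrix

def site_hits(sequence, matrix, threshold):
    """Return start positions whose weight-matrix score is at or above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        if any(base not in BASES for base in word):
            continue  # skip words containing N or other ambiguity codes
        score = sum(matrix[i][BASES.index(base)] for i, base in enumerate(word))
        if score >= threshold:
            hits.append(start)
    return hits

def dense_windows(hits, window, min_sites):
    """Return (window_start, site_count) for windows anchored at a hit that
    contain at least min_sites hit start positions."""
    clusters = []
    for start in hits:
        count = sum(1 for h in hits if start <= h < start + window)
        if count >= min_sites:
            clusters.append((start, count))
    return clusters

# Hypothetical toy matrix that strongly prefers the word ACGT (columns: A, C, G, T).
toy_counts = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
toy_matrix = log_odds_matrix(toy_counts)
toy_hits = site_hits("TTACGTTTACGTTT", toy_matrix, threshold=5.0)
print(dense_windows(toy_hits, window=10, min_sites=2))  # prints [(2, 2)]
```

The hard cutoff in `site_hits` is exactly what the probabilistic approach of Ahab is described as avoiding: a site scoring just below the threshold contributes nothing here, whereas a probabilistic model lets weak sites contribute in proportion to their strength.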

The Stubb algorithm (Sinha et al. [6]) extended Ahab's approach by incorporating the use of two-species sequence information. Stubb also allows the option of scoring positional correlations between binding sites, but this option was not exercised in this study. For each sequence window analyzed, Stubb first computes the homologous sequence in the second species and aligns them using LAGAN (Brudno et al. [7]). The sequence is then partitioned into "blocks" (contiguous ungapped aligned regions of high percent identity) and non-blocks (sequence fragments between consecutive blocks, in either species). Putative binding sites in blocks are scored under an assumption of common evolutionary descent, using a probabilistic model of binding site evolution. Thus a "weak" site that is well conserved will score higher, while a "strong" site that is poorly conserved will have its score down-weighted. The score of the sequence window includes contributions from binding sites in blocks as well as in non-blocks. Stubb is implemented so that it can be run either on single species or two species data. In the single species mode, it is practically identical to the Ahab program. The Stubb software is available for download from http://edsc.rockefeller.edu/cgi-bin/stubb/download.pl
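The partition into blocks and non-blocks can be sketched roughly as follows. This is only an illustration of the idea, not Stubb's actual procedure: it assumes a pairwise alignment is already available as two equal-length gapped strings, treats each maximal gap-free run as a candidate block, and uses hypothetical cutoffs (`min_length`, `min_identity`) that do not come from the paper.

```python
def ungapped_segments(aligned_a, aligned_b):
    """Yield (start, end) column ranges of maximal gap-free runs in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    start = None
    for col, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        gapped = a == "-" or b == "-"
        if not gapped and start is None:
            start = col              # a new gap-free run begins here
        elif gapped and start is not None:
            yield (start, col)       # the run ended at the previous column
            start = None
    if start is not None:
        yield (start, len(aligned_a))

def percent_identity(aligned_a, aligned_b, start, end):
    """Percentage of identical columns in the half-open column range [start, end)."""
    matches = sum(1 for a, b in zip(aligned_a[start:end], aligned_b[start:end]) if a == b)
    return 100.0 * matches / (end - start)

def find_blocks(aligned_a, aligned_b, min_length=10, min_identity=70.0):
    """Keep gap-free runs that are long enough and similar enough (assumed cutoffs)."""
    return [
        (s, e)
        for s, e in ungapped_segments(aligned_a, aligned_b)
        if e - s >= min_length and percent_identity(aligned_a, aligned_b, s, e) >= min_identity
    ]

# Toy alignment; '-' marks a gap.
seq_a = "ACGTACGTAC--TTGA"
seq_b = "ACGTACGAACGGT-GA"
print(find_blocks(seq_a, seq_b, min_length=5, min_identity=80.0))  # prints [(0, 10)]
```

Columns outside the returned ranges would correspond to the non-blocks, where sites in each species are scored without the assumption of common descent.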


[ 03. December 2004, 21:35: Message edited by: ISCID News Editor ]



All times are East Coast  