ISCID News Editor
Member # 1417
posted 04. December 2004 01:37
Published November 11, 2004
Highly Conserved Non-Coding Sequences^ Are Associated with Vertebrate Development
Adam Woolfe, Martin Goodson, Debbie K. Goode, Phil Snell, Gayle K. McEwen, Tanya Vavouri, Sarah F. Smith, Phil North, Heather Callaway, Krys Kelly, Klaudia Walter, Irina Abnizova, Walter Gilks, Yvonne J. K. Edwards, Julie E. Cooke, Greg Elgar
In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico^ and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
Received July 30, 2004; Accepted October 21, 2004; Published November 11, 2004
Copyright: © 2004 Woolfe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Academic Editor: Sean Eddy, Howard Hughes Medical Institute and Washington University, United States of America
Abbreviations: CNE, conserved non-coding element; CNS, central nervous system; EST, expressed sequence tag; EVL, enveloping layer; GFP, green fluorescent protein; GO, Gene Ontology; GRN, gene regulatory network; hpf, hours post-fertilisation; MLAGAN, multiple LAGAN; rCNE, regionally defined conserved non-coding element; UTR, untranslated region
* To whom correspondence should be addressed. E-mail: email@example.com
Citation: Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, et al. (2005) Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development. PLoS Biol 3(1): e7.
Introduction - Paragraphs 1 and 2
Identification and characterisation of cis-regulatory regions within the non-coding DNA of vertebrate genomes remain a challenge for the post-genomic era. The idea that animal development is controlled by cis-regulatory^ DNA elements (such as enhancers and silencers) is well established and has been elegantly described in invertebrates such as Drosophila and the sea urchin [1,2,3,4]. These elements are thought to comprise clustered target sites for large numbers of transcription factors and collectively form the genomic instructions for developmental gene regulatory networks (GRNs). However, relatively little is known about GRNs in vertebrates. Any approach to elucidate such networks necessitates the discovery of all constituent cis-regulatory elements and their genomic locations. Unfortunately, initial in silico identification of such sequences is difficult, as current knowledge of their syntax or grammar is limited. By contrast, computational approaches for protein-coding exon prediction are well established, based on their characteristic sequence features, evolutionary conservation across distant species, and the availability of cDNAs and expressed sequence tags (ESTs), which greatly facilitate their annotation.
The completion of a number of vertebrate genome sequences [5,6,7,8,9], as well as the concurrent development of genomic alignment, visualisation, and analytical bioinformatics tools (for an overview see ), has made large genomic comparisons not only possible but an increasingly popular approach for the discovery of putative cis-regulatory elements. Comparing DNA sequences from different organisms provides a means of identifying common signatures that may have functional significance. Alignment algorithms optimise these comparisons so that slowly evolving regions can be anchored together and highlighted against a background of more rapidly evolving DNA that is free of any functional constraints.
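As a rough illustration of how conserved regions can stand out against a faster-evolving background, the sketch below scans an already-aligned pair of sequences with a sliding window and reports the windows whose identity passes a cut-off. This is not the authors' pipeline. The function names, the window size, the step, and the 0.9 threshold are all assumptions chosen for the example, and the input is assumed to be two equal-length strings from some existing pairwise alignment, with "-" marking gaps.

```python
def windowed_identity(aligned_a, aligned_b, window=100, step=50):
    """Return (start, fraction_identical) for each window of a pairwise alignment.

    Both inputs are assumed to be rows of the same alignment (equal length,
    '-' for gaps). Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        seg_a = aligned_a[start:start + window]
        seg_b = aligned_b[start:start + window]
        # Count columns where both sequences carry the same base.
        matches = sum(1 for x, y in zip(seg_a, seg_b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(aligned_a, aligned_b, window=100, step=50, threshold=0.9):
    """Keep only the windows whose identity is at or above the (assumed) threshold."""
    scores = windowed_identity(aligned_a, aligned_b, window, step)
    return [(start, frac) for start, frac in scores if frac >= threshold]


if __name__ == "__main__":
    # Toy alignment: the first 10 columns are identical, the last 10 all differ.
    seq_a = "A" * 10 + "C" * 10
    seq_b = "A" * 10 + "G" * 10
    print(conserved_windows(seq_a, seq_b, window=10, step=10))
```

On real data the window and threshold would need tuning to the species pair being compared. Over a distance as large as human to Fugu, neutral sequence is expected to score near background, so windows that stay near the top of the range are the candidates of interest.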
Another highly successful approach to increasing the resolving power of comparative analyses is to use multi-species alignments combining both closely related and highly divergent organisms [14,22,23,24]. By using large evolutionary distances, even the slowest-evolving neutral DNA has reached equilibrium, thereby significantly improving the signal to noise ratio in genomic alignments. Although non-coding sequences generally lack sequence conservation between highly divergent species, there are a number of striking examples where comparison between human and pufferfish (Fugu rubripes) gene regions has readily identified highly conserved non-coding sequences that have been shown to have some function in vivo [25,26,27,28,29,30,31,32,33,34]. Humans and Fugu last shared a common ancestor around 450 million years ago, predating the emergence of the majority of all extant vertebrates, implying that any non-coding sequences conserved between these two species are likely to be fundamental to vertebrate life. The Fugu genome has the added advantage of being highly compact, reducing intronic and intergenic distances almost 10-fold [7,36]. Without exception, all reported examples of non-coding conservation between these two species have been associated with genes that play critical roles in development. This suggests that some aspects of developmental regulation are common to all vertebrates and that whole-genome comparisons may be particularly powerful in identifying regulatory networks of this kind.
[Emphases added by ISCID News Editor]
[Link-underlined terms with ^ indicate linked entry in ISCID Encyclopedia of Science and Philosophy as added by ISCID News Editor]
Read the Full Research Article at PLoS Biology
[ 04. December 2004, 03:06: Message edited by: ISCID News Editor ]