Repeat annotation

TE annotation

Transposable elements (TEs) are mobile, repetitive DNA sequences that constitute a structurally dynamic component of genomes. TEs are virtually ubiquitous: they have been found in nearly all eukaryotic organisms studied. They make up a large fraction of many genome sequences (e.g. 90% of the wheat genome), and there is no doubt that modern genomic DNA has evolved in close association with TEs. The forces controlling the dynamics of TE spread within a species are poorly understood, as are the systemic effects of the elements on their host genomes. Insertions of individual TEs may lead to genome restructuring (e.g. inversions), mutations in genes, or changes in gene regulation. Some TE insertions may even have become “domesticated” to play roles in the normal functions of the host. Despite their manifold effects, abundance, and ubiquity, we understand very little about most aspects of TE biology.


One way of furthering our knowledge of TE biology is through the computational analysis of TEs in the growing number of complete genomic sequences. By comparing the abundance and distribution of TEs across entire genomes in detail, we can infer which fundamental biological properties of TEs are shared and which differ among species. However, meaningful inferences about TE biology based on computationally derived TE annotations can only be made if we are confident in the results of these analyses.

In general, TE discovery remains a major challenge for TE annotation. The key problem of de novo TE identification is the nested structure of the repeats in a genome. Many TEs are characterised by direct (Long Terminal Repeats, LTR) or inverted (Terminal Inverted Repeats, TIR) repeats at the ends of their sequences. In addition, TEs are often found embedded in larger repeats called segmental duplications, or in other TEs. This nested structure makes it difficult to extract true TEs rather than mosaics of repeats. Consequently, a good TE annotation relies critically on an expertly assembled set of reference sequences, which currently cannot be obtained automatically. This crucial step is now the bottleneck in any method or pipeline for annotating TEs in genome sequences. Assembling such a reference set is most difficult in genomes where only a few TE families are known. In these situations, we need good de novo TE detection procedures.
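To make the distinction between direct and inverted terminal repeats concrete, here is a minimal Python sketch. It is an illustration only, not part of the pipelines described on this page: the function names and the parameter `k` are hypothetical, and the exact-match comparison is a simplifying assumption, since real TE ends diverge and are normally found by alignment.

```python
# Illustrative sketch only. Names and the exact-match rule are assumptions;
# real terminal repeats are degenerate and require alignment-based detection.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_repeat_type(seq: str, k: int = 10) -> str:
    """Compare the first and last k bases of seq.

    Returns "direct" if both ends are identical (LTR-like),
    "inverted" if one end is the reverse complement of the other (TIR-like),
    and "none" otherwise or if seq is too short to hold two k-mers.
    """
    if k <= 0 or len(seq) < 2 * k:
        return "none"
    head, tail = seq[:k].upper(), seq[-k:].upper()
    if head == tail:
        return "direct"
    if head == reverse_complement(tail):
        return "inverted"
    return "none"
```

For example, `terminal_repeat_type("ACGTTGGGGGAACGT", k=5)` compares `ACGTT` with `AACGT`, whose reverse complement is `ACGTT`, so the ends form an inverted repeat.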

The identification of TEs typically relies on the results of a single computational method (see Bergman and Quesneville 2007 for a review). Our studies (Quesneville et al. 2005) indicate that single methods suffer from a high false-positive rate and are not sensitive enough. To address this, we develop pipelines that integrate results from multiple homology-based and de novo TE identification methods. Our pipelines use the combined computational evidence to improve the quality of TE annotations. Moreover, our system is designed for use with a genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations.
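One simple way to picture combining evidence from several methods is to keep only the genomic regions predicted by at least a given number of independent methods. The Python sketch below illustrates that idea under stated assumptions: the method names, the half-open coordinate convention, and the `min_methods` threshold are all hypothetical, and this is not the integration logic our pipelines actually use.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed half-open coordinates: [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def supported_regions(
    predictions: Dict[str, List[Interval]], min_methods: int = 2
) -> List[Interval]:
    """Return regions covered by at least min_methods distinct methods."""
    if min_methods < 1:
        raise ValueError("min_methods must be at least 1")
    events: List[Tuple[int, int]] = []
    for intervals in predictions.values():
        # Merge within a method first so each method counts once per position.
        for start, end in merge(intervals):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal coordinates, ends (-1) sort before starts (+1)

    regions: List[Interval] = []
    depth = 0
    region_start = 0
    for pos, delta in events:
        new_depth = depth + delta
        if depth < min_methods <= new_depth:
            region_start = pos  # support rises to the threshold
        elif new_depth < min_methods <= depth:
            regions.append((region_start, pos))  # support falls below it
        depth = new_depth
    return merge(regions)


# Hypothetical predictions from three methods on one sequence.
example = {
    "homology": [(100, 200), (400, 500)],
    "denovo_a": [(150, 250)],
    "denovo_b": [(180, 220), (450, 480)],
}
consensus = supported_regions(example, min_methods=2)
```

Raising `min_methods` trades sensitivity for fewer false positives; a manual curation step can then review the regions that only one method supports.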

Following the annotation of the euchromatic transposable elements in the D. melanogaster genome (Release 4) and in A. thaliana, which have been incorporated into their respective community databases (FlyBase, TAIR), we are now engaged in several international collaborations to annotate TEs.

See the data obtained with our tools

Update: 18 May 2017
Creation date: 23 Dec 2009