

Data from Buisine, Quesneville and Colot (Genomics 2008) used to annotate the transposable elements (TEs) of the A. thaliana genome (release 5). The annotations are available on the TAIR web site.

 TE reference sequences.

Set 1 – “RU”:
 TE reference sequences available from the Repbase Update (RU) database consist of a collection of consensus sequences built from previously characterized TEs. These reference sequences were built by different people, and thus different assumptions/criteria may have been used in each case.
 To work around this potential inconsistency and to evaluate the performance of reference sequences constructed using distinct criteria, we have generated three additional sets.

Set 2 – "Optimized Coding" (OptCoding):
 This set originates from RU (set 1), but was modified in order to maximize the coding potential of the TE reference sequences. When possible, the sequence of each ORF was replaced by the sequence of the longest uninterrupted reading frame. In addition, frame shifts and internal stop codons were systematically corrected. The non-coding regions of the reference sequences were left unchanged and are therefore identical to those of RU. This set is geared toward the detection of TEs with an intact coding potential, which makes divergent copies and non-coding TEs more difficult to detect.
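 As an illustration of what "longest uninterrupted reading frame" means, the following minimal Python sketch scans the three forward frames of a sequence and returns the longest run of codons containing no stop codon. It is not the procedure used to build OptCoding; the function name, the restriction to the forward strand and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made for this example.

```python
# Standard stop codons (assumption: standard genetic code, DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_uninterrupted_frame(seq):
    """Return (start, end, frame) of the longest stop-free codon run.

    Only the three forward frames are scanned. `start` is inclusive,
    `end` is exclusive, and both are 0-based positions in `seq`.
    """
    seq = seq.upper()
    best = (0, 0, 0)  # start, end, frame
    for frame in range(3):
        run_start = frame  # start of the current stop-free run
        last = frame       # end of the last complete codon seen
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run.
                if pos - run_start > best[1] - best[0]:
                    best = (run_start, pos, frame)
                run_start = pos + 3
            last = pos + 3
        # Close the run that reaches the end of the sequence.
        if last - run_start > best[1] - best[0]:
            best = (run_start, last, frame)
    return best


if __name__ == "__main__":
    start, end, frame = longest_uninterrupted_frame("ATGAAATAGCCCGGGTTTAAA")
    print(start, end, frame)
```

 A complete treatment would also scan the reverse complement and would need the ORF boundaries of each reference sequence, neither of which is covered here.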

Set 3 – "Maximized Size" (MaxSize):
In this set, reference sequences have been built with the aim of best representing the exhaustive sequence repertoire of a TE family. To this end, sequences were assembled from the longest genomic copies, and no attention was paid to the coding potential. This approach may lead in some instances to the creation of artificially long and chimeric reference sequences of mixed origin.

Set 4 – "Optimized" (Opt):
 This set maximizes both the coding potential and the total length of the reference sequences. This was achieved by merging the coding sequences of OptCoding and the non-coding sequences of MaxSize. The design of Opt is therefore intermediate between that of OptCoding and MaxSize.

 The Opt and MaxSize sets are aimed at detecting short sequences that RU misses because they are under-represented among the copies and were therefore excluded from the consensus.

Update: 12 Aug 2015
Creation date: 21 Jun 2010