REPET



The REPET package integrates bioinformatics programs in order to tackle biological issues at the genomic scale. It is distributed under

and deposited with the Agence de Protection des Programmes (APP) under the Inter Deposit Digital Number FR 001 480007 000 R P 2008 000 31 235.

The latest release is REPET package v3.0. The changes in this release are listed in the CHANGELOG (19.10 kB). Previous releases remain available.

For a quick install, REPET and its dependencies are containerized in a free Docker image available for download.

A video tutorial on using the REPET Docker image is also available.

 

How to cite the REPET package:

The REPET package has been improved over time. Its three pipelines are TEdenovo (building the TE library), PASTEC (consensus classification) and TEannot (TE annotation).

Here is the full list of publications:

TEfinder (Blaster, Grouper, Matcher): Quesneville H, Nouaud D, Anxolabéhère D (2003) Detection of New Transposable Element Families in Drosophila melanogaster and Anopheles gambiae Genomes. J Mol Evol 57 (Suppl 1): S50–S59. https://doi.org/10.1007/s00239-003-0007-2
TEdenovo: Flutre T, Duprat E, Feuillet C, Quesneville H (2011) Considering Transposable Element Diversification in De Novo Annotation Approaches. PLoS ONE 6(1): e16526. https://doi.org/10.1371/journal.pone.0016526
PASTEC (included in TEdenovo): Hoede C, Arnoux S, Moisset M, Chaumier T, Inizan O, Jamilloux V, et al. (2014) PASTEC: An Automatic Transposable Element Classification Tool. PLoS ONE 9(5): e91929. https://doi.org/10.1371/journal.pone.0091929
TEannot: Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, et al. (2005) Combined Evidence Annotation of Transposable Elements in Genome Sequences. PLoS Comput Biol 1(2): e22. https://doi.org/10.1371/journal.pcbi.0010022
Long join procedure (included in TEannot): Ahmed I, Sarazin A, Bowler C, Colot V, Quesneville H (2011) Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis. Nucleic Acids Research 39(16): 6919–6931. https://doi.org/10.1093/nar/gkr324

A methodological article describes how to tackle the TE annotation of large genomes, reduce the resources needed to build the TE library, improve the consensus library, and obtain metrics on annotation quality (NTE, LTE):

V. Jamilloux, J. Daron, F. Choulet and H. Quesneville, "De Novo Annotation of Transposable Elements: Tackling the Fat Genome Issue," in Proceedings of the IEEE, vol. 105, no. 3, pp. 474-481, March 2017, doi: 10.1109/JPROC.2016.2590833. https://ieeexplore.ieee.org/document/7562280

Brief description of REPET

Its two main pipelines are dedicated to detecting, annotating and analysing repeats in genomic sequences, and are specifically designed for transposable elements (TEs).

  • TEdenovo: this pipeline starts by comparing the genome with itself using BLASTER. It then clusters the matches with GROUPER, RECON and PILER, three clustering programs specific to interspersed repeats. For each cluster, it builds a multiple alignment from which a consensus sequence is derived. Finally, these consensus sequences are classified according to TE features and redundancy is removed. The result is a library of classified, non-redundant consensus sequences.
  • TEannot: this pipeline mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, using BLASTER, RepeatMasker and CENSOR. An empirical statistical filter is applied to discard false-positive matches. Short simple repeats (SSRs) are annotated along the way with TRF, RepeatMasker and MREPS. The pipeline then uses MATCHER to chain, via dynamic programming, TE fragments that belong to the same disrupted copy. A "long join" procedure is subsequently applied to connect distant fragments. Finally, annotations are exported to GFF3 and gameXML files.
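The chaining step of TEannot can be illustrated with a minimal Python sketch of collinear chaining by dynamic programming. This is not MATCHER's actual implementation: the `Fragment` fields, the `best_chain` function and the `gap_penalty` parameter are hypothetical names introduced here, and the scoring scheme (sum of fragment scores minus a linear penalty on the genomic gap) is an assumption made for the example. Two fragments are considered chainable when they do not overlap and appear in the same order on both the genome and the TE consensus.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """One match between the genome and a TE consensus (hypothetical layout)."""
    g_start: int   # start on the genome
    g_end: int     # end on the genome
    c_start: int   # start on the TE consensus
    c_end: int     # end on the TE consensus
    score: float   # match score


def best_chain(fragments: List[Fragment], gap_penalty: float = 0.0) -> List[Fragment]:
    """Return the highest-scoring chain of collinear, non-overlapping fragments.

    Classic O(n^2) dynamic programming: best[j] is the best score of any
    chain that ends with fragment j.
    """
    frags = sorted(fragments, key=lambda f: (f.g_start, f.c_start))
    n = len(frags)
    if n == 0:
        return []

    best = [f.score for f in frags]   # a chain may consist of a single fragment
    prev = [-1] * n                   # back-pointers used to rebuild the chain

    for j in range(n):
        for i in range(j):
            a, b = frags[i], frags[j]
            # a may precede b only if it ends before b starts on both axes
            if a.g_end < b.g_start and a.c_end < b.c_start:
                gap = b.g_start - a.g_end
                candidate = best[i] + b.score - gap_penalty * gap
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i

    # Walk back from the best-scoring chain end
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    demo = [
        Fragment(100, 200, 1, 100, 50.0),
        Fragment(300, 400, 120, 220, 60.0),
        Fragment(500, 600, 50, 90, 40.0),   # out of order on the consensus
    ]
    for frag in best_chain(demo):
        print(frag)
```

In this sketch a larger `gap_penalty` makes the chain less willing to bridge distant fragments, which is the kind of trade-off a separate "long join" step is meant to revisit for fragments that are far apart.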

The REPET package also contains PASTEClassifier (Hoede C. et al. 2014), which classifies TEs according to the Wicker classification (Wicker T. et al. 2007).

Prerequisites and installation notes are described in the INSTALL file.

The use of reference banks is optional but strongly advised, as it improves the classification of your consensus sequences.

Several banks formatted for use with REPET are currently available. Choose a bank according to how well its reference transposable element families cover the species to be annotated:

 

A file specially formatted for REPET ("REPET edition") is available for a fee.

 

The Viridiplantae v3.0 formatted for REPET is available here

 

  • Dfam: the Dfam 3.6 bank formatted for REPET is also available.

 

Optional: if you want to search for protein domains in TE consensus sequences using HMM profiles, you need hmmpfam (from the hmmer2 package) or hmmpress and hmmscan (from the hmmer3 package), together with an appropriate bank of HMM profiles (http://hmmer.janelia.org/). We advise using our latest version: ProfilesBankForREPET_Pfam35.0_GypsyDB.hmm

 

The older bank, ProfilesBankForREPET_Pfam27.0_GypsyDB.hmm.tar.gz, also specially formatted for REPET, is still available.
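Before launching a run that relies on HMM profiles, it can be useful to check which HMMER programs are installed. The following Python sketch is not part of REPET: the `detect_hmmer` function is a hypothetical helper that only looks for the executables named above on the `PATH`, and it assumes they are installed under their standard names.

```python
import shutil
from typing import Callable, Optional


def detect_hmmer(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Report which HMMER generation is usable, based on executables on the PATH.

    Returns "hmmer3" if both hmmpress and hmmscan are found, "hmmer2" if only
    hmmpfam is found, and None otherwise. The `which` argument exists so the
    lookup can be replaced in tests.
    """
    if which("hmmpress") and which("hmmscan"):
        return "hmmer3"
    if which("hmmpfam"):
        return "hmmer2"
    return None


if __name__ == "__main__":
    found = detect_hmmer()
    if found is None:
        print("No HMMER programs found: the HMM profile search is unavailable.")
    else:
        print(f"Found {found} programs on the PATH.")
```

The check prefers hmmer3 when both generations are present; this ordering is a choice made for the example, not a documented REPET behaviour.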

Help and documentation

To learn how to use the REPET pipelines, read the TEdenovo and TEannot tutorials and follow the practical work.

Development

The development of REPET has followed eXtreme Programming guidelines since release 1.3 (July 2009).

Contact

Bug reports and feature requests are very welcome! Please contact us via email at urgi-repet[[@]]inra.fr.

If you want to receive updates, send an email to urgi-repet[[@]]inra.fr with the following information:

  • First name
  • Last name
  • Email
  • Institution
  • Address
  • City
  • Zip
  • Country
  • Architecture (linux-x64)
  • Job scheduling system (SGE or Torque)

Authors and contributors

Hadi Quesneville, Olivier Inizan
Timothée Flutre, Claire Hoede
Elodie Duprat, Sandie Arnoux
Gaël Faroux, Françoise Alfama
Delphine Autard, Jonathan Kreplak
Timothée Chaumier, Véronique Jamilloux
Marc Bras, Mark Moissette
Benoit Bely, Tina Alaeitabar
Anna-Sophie Fiston-Lavier, Emmanuelle Permal
Erwan Ortie, Emeric Henrion
Valentin Marcon, Eric Penneçot
Johann Confais, Mariène Wan

Funding

  • The "Modulome" project funded by 
  • The TransPLANT FP7 European project (EU 7th Framework Programme, contract number 283496)
  • INRAE

Access mode(s):

Downloadable

Keyword(s):

detection, annotation, transposable element, pipeline
