Publications

International,  COM (communication) 21 Oct 2022   [hal-03815061] REPET evolutions: faster and easier

Transposable elements (TEs) are major players in the structure and evolution of eukaryotic genomes. Thanks to their ability to move around and replicate within genomes, they are probably the most important contributors to genome plasticity. Their detection and annotation are considered essential and must be undertaken in any genome sequencing project. The REPET package [1, 2] integrates bioinformatics pipelines dedicated to detecting, annotating and analyzing TEs in genomic sequences. The two main pipelines are (i) TEdenovo, which searches for interspersed repeats, builds consensus sequences and classifies them [3] according to TE features, and (ii) TEannot, which mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, to provide TE annotations. The REPET package is continuously being made faster by parallelizing several key bottleneck steps. In addition, several strategies that reduce the time required to analyze large genomes have been tested. With these speed improvements and adapted strategies, REPET is now able to annotate and analyze genomes such as that of maize, a 2.3 Gb genome of which more than 85% consists of TEs [4], on a current computer cluster. With this tool, the PlantBioinfoPF platform provides a TE annotation service. Indeed, we are now able to offer automatic TE annotation of good quality through a process called "Repet-Factory". This process uses the REPET software suite with parameters optimized for TE detection specificity and computing time. It can annotate several genomes successively in batches, with the required traceability and reproducibility of the analyses. Moreover, a Virtual Research Environment (VRE) for TE annotation and analysis has been developed on Virtual Machines (VMs). An Ansible script instantiates VMs with all the packages and tools required for a complete genome annotation with the REPET package.
This script allows the VRE to be easily re-instantiated on other infrastructures, which greatly simplifies the installation of the REPET package with all its required dependencies. We also simplified the distribution of REPET, to make it more available and portable for users, by developing a Docker image of REPET (https://hub.docker.com/r/urgi/docker_vre_aio). The REPET tool is a cornerstone of the platform. In addition to its use in the genome TE annotation service and its availability for download, it is also the basis of the RepetDB [5] database (https://urgi.versailles.inrae.fr/repetdb), hosted by the platform, which provides libraries of reference TE sequences for more than 50 species.
