
INSTALL

The REPET package is distributed under the CeCILL license. Please read the LICENSE file distributed with it.
It has been deposited with the Agence de Protection des Programmes (APP) under the Inter Deposit Digital Number FR 001 480007.

The latest public release of the REPET package, rel-v3.0 (November 2019), is freely downloadable here.

For more information about the REPET package, read the doc/README file in the package or this page.

NEW: For a quick install, REPET and its dependencies are available as a downloadable Docker image.

Brief installation instructions

Dependencies
Install
References

Proper usage of REPET requires a 64-bit Unix-like system running on a cluster, with the following widely used components:

Dependencies

Full use of the pipelines in the REPET package requires installing external programs.
Each program is described below with:

the name under which it is known,
the version with which it has been tested,
the name under which it should be found in the user PATH,
the URL from which to download it.

For each program, read its installation procedure carefully and follow it: bugs may come not from REPET but from a faulty installation of an external program.

Programming language interpreter: Python, version > 2.6 and < 3.0
Python modules: MySQLdb (on a computer cluster, this module must be reachable from the master and slave nodes), logging, yaml
Database management system: MySQL, version >= 5.0. The table engine must be "MyISAM". With recent MySQL versions the default engine is InnoDB, so set the "default-storage-engine" option to "MyISAM" in "/etc/mysql/my.cnf".
Batch-queuing system: Slurm, >= 17.11.2 on Ubuntu 18.04 or >= 17.02.9 on CentOS 7, https://slurm.schedmd.com/download.html ; SGE, >= 6.1u5 on CentOS 5.5 or >= 2011.11 on CentOS 6, http://gridscheduler.sourceforge.net/ ; TORQUE, >= 3.0.2, PBS
Pairwise alignment: NCBI-BLAST+, >= 2.2.26 ; and/or NCBI-BLAST, >= 2.2.26 ; and/or WU-BLAST 2.0
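
The Python requirements at the top of this list can be checked from the interpreter that will run REPET. The sketch below is an illustration and not part of the REPET package: the function names are hypothetical, and the version constraint is read at minor-version granularity (an assumption), which in practice means Python 2.7. On a cluster, it would need to be run on the master and on a slave node, since MySQLdb must be importable from both.

```python
from __future__ import print_function

import importlib
import sys

# Module names taken from the dependency list above.
REQUIRED_MODULES = ("MySQLdb", "logging", "yaml")


def python_version_ok(version_info=None):
    """Return True if (major, minor) is above 2.6 and below 3.0.

    Assumption: the constraint is compared on (major, minor) only,
    so 2.6.x patch releases are rejected and 2.7.x is accepted.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return (2, 6) < major_minor < (3, 0)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Missing modules:", ", ".join(missing_modules()) or "none")
```

Both functions accept explicit arguments, so the checks can be exercised without changing the running interpreter, for example `python_version_ok((2, 7, 18))`.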

Optional but highly recommended:

HSP clustering: RECON, version 1.08, recon.pl ; and/or PILER, version 1.0, piler
Protein domain search: hmmer3 package (hmmpress and hmmscan), http://hmmer.org/
Consensus clustering: blastclust, version > 2.2.20, from the NCBI-BLAST suite ; MCL, version 1.008, 09-308
Repeat masking: CENSOR, version 4.1, censor ; and/or RepeatMasker, version 4.0.6, RepeatMasker
SSR detection: TRF, version 4.04, trf ; and/or MREPS, version 2.6, mreps
Randomized sequence generation: Shuffle, version 2.2 (in HMMER, squid), shuffle ; or esl-shuffle in the hmmer3 package

Optional but highly recommended banks (see the tutorials in the "doc" directory):

For full use of the pipelines, you will need Repbase Update, the well-known databank of known repeats. The REPET edition is available here.
To search for protein domains with HMM profiles in your TE consensus sequences, you need an appropriate bank of HMM profiles. A bank formatted for REPET is available here.

Optional:

MAFFT, version 6.240, mafft, https://mafft.cbrc.jp/alignment/software/
RepeatScout, version 1.0.5, https://bix.ucsd.edu/repeatscout/
Structural search for LTR retrotransposons: LTRHarvest, from the Genome Tools 1.5.10 package, gt

/!\ Warning: MATCHER (which is part of the BLASTER suite distributed with REPET) is also the name of an EMBOSS program, so name conflicts are possible.
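
Since each program has to be found under a given name in the user PATH, and a name such as MATCHER can be provided by two different packages, it can help to list every match for a name rather than only the first. The following is a minimal sketch, not a REPET tool: the function names are hypothetical, and "matcher" is an assumed spelling of the MATCHER executable name that may differ in your installation. The other names are those given in the lists above.

```python
from __future__ import print_function

import os

# Executable names as given in the dependency lists above.
# "matcher" is an ASSUMED executable name for MATCHER; adjust if needed.
LISTED_NAMES = ("recon.pl", "piler", "censor", "RepeatMasker",
                "trf", "mreps", "mafft", "gt", "matcher")


def find_all_on_path(name, path=None):
    """Return every executable file called `name` on PATH, in lookup order.

    Entries that resolve to the same real file (for example through a
    symlinked directory) are reported once. Empty PATH entries are
    skipped, which is a simplification of shell behaviour.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        real = os.path.realpath(candidate)
        if real in seen:
            continue
        seen.add(real)
        matches.append(candidate)
    return matches


def path_report(names=LISTED_NAMES, path=None):
    """Map each name to 'missing', 'ok' (one match) or 'ambiguous'."""
    status = {}
    for name in names:
        hits = find_all_on_path(name, path)
        if not hits:
            status[name] = "missing"
        elif len(hits) == 1:
            status[name] = "ok"
        else:
            status[name] = "ambiguous"
    return status


if __name__ == "__main__":
    for name, state in sorted(path_report().items()):
        print("%-14s %s" % (name, state))
```

A name reported as "ambiguous" has several distinct executables on PATH; the first one in PATH order is the one a shell would run, so the order of directories in PATH decides which program is used.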

Install

To install the REPET package, extract the files from REPET_linux-x64_X.X.tar.gz: tar -xvf REPET_linux-x64_X.X.tar.gz
Most of the REPET package is written in Python, an interpreted object-oriented language that does not require compilation.
The TE_finder suite included in REPET is written in C++, and its binaries are provided. If you need to build it from the C++ sources, go to the GitHub repository.

The binaries in the REPET package run only on 64-bit Linux computers. Please contact us at urgi-repet[[@]]inra.fr if you would like to run REPET on a different architecture.

/!\ Warning: some C++ tools in the REPET package are multithreaded, so you need a workstation or PC with at least 4 cores (e.g. two dual-core CPUs).
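
The two host requirements stated above (64-bit Linux, at least 4 cores) can be verified before running the provided binaries. This is a hedged sketch with hypothetical names. It assumes that "linux-x64" in the archive name corresponds to the machine string "x86_64", and it counts the logical CPUs reported by the operating system, which may differ from physical cores.

```python
from __future__ import print_function

import multiprocessing
import platform

MIN_CORES = 4  # minimum stated in the warning above


def host_problems(system=None, machine=None, cores=None):
    """Return a list of human-readable problems; empty if the host fits."""
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()
    if cores is None:
        cores = multiprocessing.cpu_count()

    problems = []
    if system != "Linux":
        problems.append("operating system is %s, not Linux" % system)
    # Assumption: the provided binaries target x86_64.
    if machine != "x86_64":
        problems.append("architecture is %s, not x86_64" % machine)
    if cores < MIN_CORES:
        problems.append("only %d core(s), at least %d needed" % (cores, MIN_CORES))
    return problems


if __name__ == "__main__":
    found = host_problems()
    if found:
        for problem in found:
            print("WARNING:", problem)
    else:
        print("Host matches the stated requirements.")
```

On a cluster, the check is most useful on the compute nodes, since those are the machines that run the multithreaded tools.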

References

Below is a non-exhaustive list of publications related to the REPET package and the programs it integrates:
* RECON: Bao, Z. & Eddy, S. R. (2002), 'Automated de novo identification of repeat sequence families in sequenced genomes.', Genome Research 12(8), 1269--1276.
* PILER: Edgar, R. C. & Myers, E. W. (2005), 'PILER: identification and classification of genomic repeats.', Bioinformatics 21 Suppl 1.
* MAP: Huang, X. (1994), 'On global sequence alignment.', Comput Appl Biosci 10(3), 227--235.
* CENSOR: Kohany, O.; Gentles, A. J.; Hankus, L. & Jurka, J. (2006), 'Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.', BMC Bioinformatics 7, 474.
* MCL: Enright, A. J.; Van Dongen, S. & Ouzounis, C. A. (2002), 'An efficient algorithm for large-scale detection of protein families.', Nucleic Acids Res 30(7), 1575--1584.
* REPEATMASKER: Smit, A. F. A.; Hubley, R. & Green, P. (1996-2004), RepeatMasker Open-3.0., <http://www.repeatmasker.org>.
* TRF: Benson, G. (1999), 'Tandem repeats finder: a program to analyze DNA sequences.', Nucleic Acids Res 27(2), 573--580.
* MREPS: Kolpakov, R.; Bana, G. & Kucherov, G. (2003), 'mreps: efficient and flexible detection of tandem repeats in DNA.', Nucleic Acids Res 31(13), 3672--3678.
* MAFFT: Katoh, K.; Misawa, K.; Kuma, K. & Miyata, T. (2002), 'MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.', Nucleic Acids Res 30(14), 3059--3066.
* RepeatScout: Price, A. L.; Jones, N. C. & Pevzner, P. A. (2005), 'De novo identification of repeat families in large genomes.', to appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05), Detroit, Michigan.

Update: 23 Mar 2021
Creation date: 29 Aug 2016

Publication supervisor: A-F. Adam-Blondon