RepetDB is an InterMine database that provides consensus repeat sequences detected and classified by TEdenovo and used by TEannot to annotate their copies in genomes. TEdenovo and TEannot are part of the REPET software package.
The easiest way to search for consensus repeats in RepetDB is to use the consensus search form on the homepage.
This form can search consensus repeats by organism (Taxon group selection), by classification (Wicker classification, potential chimeric or other elements such as virus-like elements) or by the similarity features found on them (protein profile features or BLAST hits against Repbase transposons).
The taxon group selection contains a tree of taxonomy groups fetched from the NCBI taxonomy database. You can select a specific species or a larger taxonomy group like ‘Eukaryota’.
The Wicker classification selection is divided into three parts: classes, orders and superfamilies. Selecting any class, order or superfamily restricts the selection of the other ranks. For example, if you select the ‘Helitron’ order, ‘Class II’ is then selected automatically.
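The way a lower-rank selection restricts the higher ranks can be pictured as a lookup through parent links. The sketch below is illustrative only: the two mapping tables cover a small subset of the Wicker classification, and the function name `resolve_selection` is hypothetical, not part of RepetDB.

```python
# Illustrative subset of the Wicker hierarchy (not the full classification).
ORDER_TO_CLASS = {
    "LTR": "Class I",
    "LINE": "Class I",
    "TIR": "Class II",
    "Helitron": "Class II",
}
SUPERFAMILY_TO_ORDER = {
    "Copia": "LTR",
    "Gypsy": "LTR",
    "hAT": "TIR",
    "Helitron": "Helitron",
}


def resolve_selection(order=None, superfamily=None):
    """Return the (class, order, superfamily) implied by a selection.

    Hypothetical helper: a selected superfamily implies its order, and a
    selected order implies its class.
    """
    if superfamily is not None:
        implied_order = SUPERFAMILY_TO_ORDER[superfamily]
        if order is not None and order != implied_order:
            raise ValueError(
                f"superfamily {superfamily!r} does not belong to order {order!r}"
            )
        order = implied_order
    wicker_class = ORDER_TO_CLASS[order] if order is not None else None
    return wicker_class, order, superfamily


# Selecting only the 'Helitron' order implies 'Class II'.
print(resolve_selection(order="Helitron"))
```

Storing only the parent of each rank keeps the tables small, and the same lookup detects inconsistent selections such as a superfamily paired with the wrong order.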
Activating the “potential chimeric” checkbox will search for consensus repeats that have an ambiguous classification.
Activating the “not chimeric” checkbox will search for consensus repeats that do not have an ambiguous classification.
The other elements selection contains classifications such as potential host gene, virus or unclassified (for consensus repeats with an unknown Wicker classification).
The similarity features search is a text area in which you can list accessions or entry names from GyDB and PFAM for protein profiles, and accessions or entry names from Repbase for BLAST hit features. This field searches the protein profiles or BLAST hits detected on the consensus repeats.
Once you have searched for consensus repeats and clicked on one, you reach the consensus page. This page contains the following parts:
The header part of the consensus page contains basic information such as the consensus identifier, the consensus length and the Wicker classification. It can also contain information on the genomic annotation of the consensus: the number of copies and full-length copies, the number of fragments and full-length fragments, and the cumulative coverage of the copies on the genome.
This section describes the dataset in which the consensus was detected, classified and annotated. You can find the organism of origin and the genome assembly. Optionally, this section can also list the software used to generate the dataset, a comment on how the dataset was created, and the name of a contact person for any questions about the dataset.
This part contains a table of statistics on the consensus copies annotated on the genome. The table groups statistics on copy length, identity and coverage over the consensus.
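As a rough illustration of what such a table summarizes, the sketch below computes minimum, median, mean and maximum values for a handful of copies. The copy records are made-up values, the function names are hypothetical, and the coverage formula (copy length as a percentage of consensus length) is an assumption; the source does not state how RepetDB computes these figures.

```python
from statistics import mean, median


def summarize(values):
    """Return min, median, mean and max of a non-empty list of numbers."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def copy_statistics(copies, consensus_length):
    """Summarize length, identity and coverage for a list of copies."""
    lengths = [copy["length"] for copy in copies]
    identities = [copy["identity"] for copy in copies]
    # Assumption: coverage = copy length as a percentage of consensus length.
    coverages = [100.0 * length / consensus_length for length in lengths]
    return {
        "length": summarize(lengths),
        "identity": summarize(identities),
        "coverage": summarize(coverages),
    }


# Hypothetical copies of a 1000 bp consensus.
example_copies = [
    {"length": 500, "identity": 90.0},
    {"length": 1000, "identity": 80.0},
]
stats = copy_statistics(example_copies, consensus_length=1000)
for measure, summary in stats.items():
    print(measure, summary)
```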
All the features that have been detected on the consensus can be visualized in an embedded JBrowse browser. The reference sequence in this JBrowse instance is the consensus, on which the similarity features and structural features are located. For details on these features, see the “Similarity features” and “Structural features” sections that follow.
If the consensus has any similarity features, such as protein profiles or transposon BLAST hits, this section lists them in a detailed table. Each type of feature has its own table displaying the positions on the consensus (“Query start” and “Query stop”), the positions on the hit (“Hit start” and “Hit stop”), the hit e-value, the hit identity and details on the hit (its source database, accession, description or classification).
Transposon BLAST hits match Repbase transposon elements and include a link to the Repbase database.
Protein profile hits match GyDB or PFAM profiles and include a link to the profile card on the source database website.
As with the similarity features, each type of structural feature has its own table: SSR regions, ORF regions, terminal repeat (TR) regions and poly-A regions.
Like any InterMine database, RepetDB benefits from standard features such as the query builder, template queries and lists. This section describes these features to help you use them to their full potential.
The query builder is a complex interface that can be used to build any kind of custom query against the RepetDB database.
If you want to learn more about how to use the query builder, see the following tutorial.
You can also use the query builder to modify an existing query.
To modify a query from the consensus search form:
Template queries are parameterized queries that have the advantage of being easily reusable. RepetDB shares public templates, which you can find on the templates page, but anyone can add private (for-your-eyes-only) templates.
To create a template:
RepetDB can operate on custom lists of data. You can save lists from results pages or create them by uploading lists of identifiers. Lists can be used when running template queries and analyzed by a series of widgets on a list analysis page. You can merge, subtract and find common members if you have more than one list.
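The list operations mentioned above correspond to ordinary set operations. The sketch below shows them on two made-up identifier lists; the identifiers and function names are hypothetical. Both an asymmetric and a symmetric difference are shown because the text does not say which one the "subtract" operation performs.

```python
def merge_lists(first, second):
    """Union: identifiers present in either list."""
    return sorted(set(first) | set(second))


def common_members(first, second):
    """Intersection: identifiers present in both lists."""
    return sorted(set(first) & set(second))


def asymmetric_difference(first, second):
    """Identifiers present in the first list but not in the second."""
    return sorted(set(first) - set(second))


def symmetric_difference(first, second):
    """Identifiers present in exactly one of the two lists."""
    return sorted(set(first) ^ set(second))


# Hypothetical consensus identifiers.
list_a = ["consensus_1", "consensus_2", "consensus_3"]
list_b = ["consensus_2", "consensus_3", "consensus_4"]

print(merge_lists(list_a, list_b))
print(common_members(list_a, list_b))
print(asymmetric_difference(list_a, list_b))
print(symmetric_difference(list_a, list_b))
```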
All lists, public as well as personal (if you are logged in), can be viewed on the Lists page, where you can search them and perform operations on them. To create a new list yourself, click on ‘Lists’ and then on ‘Upload’ in the toolbar on any RepetDB page. RepetDB’s list creation tool helps you upload a list of identifiers, and the list can contain a mix of identifier types.
To preserve a list from a query:
Descriptions and tags can also be edited after a list is saved.
All lists and queries from the current session are saved temporarily in RepetDB. To save them permanently, you can create a MyMine account. You only need to provide an email address and a password to create an account; no other information is required. Your saved data is always private.
You can then access all your lists, queries and templates via the MyMine page. In MyMine you can save lists and queries you create in the QueryBuilder. You can even use the QueryBuilder to turn queries into new templates of your own. You can export/import queries and templates as XML to share them with others.
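To give an idea of what an exported query looks like, the sketch below parses a small query XML document and lists its output columns and constraints. The `query`, `view` and `constraint` layout reflects the usual shape of InterMine query XML as far as I understand it; the `Consensus` class, its attributes, the model name and the function `describe_query` are hypothetical and should be checked against an actual RepetDB export.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported query: class and attribute names are illustrative only.
QUERY_XML = """
<query name="long_consensus" model="genomic"
       view="Consensus.primaryIdentifier Consensus.length">
  <constraint path="Consensus.length" op="&gt;" value="1000"/>
</query>
"""


def describe_query(xml_text):
    """Return the output columns and constraints of a query XML string."""
    root = ET.fromstring(xml_text.strip())
    # The view attribute is a space-separated list of output paths.
    views = root.get("view", "").split()
    constraints = [
        (node.get("path"), node.get("op"), node.get("value"))
        for node in root.iter("constraint")
    ]
    return views, constraints


views, constraints = describe_query(QUERY_XML)
print("Columns:", views)
print("Constraints:", constraints)
```

Inspecting a query this way before importing it can help confirm which columns and filters a shared query will apply.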