The URGI newsletter #2 May 2013


Data integration is a key challenge for modern bioinformatics.

Data integration aims to give biologists an intuitive way to explore data produced by diverse projects. Large-scale international projects generate large volumes of heterogeneous and unrelated data, and the challenge is to integrate them with other publicly available data. Recent improvements in the throughput of nucleotide sequencing technologies, with their many applications in genomics and genetics, make it critical to have powerful information systems able to store, manipulate and explore these data.

GnpIS, our multispecies integrative information system dedicated to plants and fungal pests, aims to tackle such challenges. It bridges genetic and genomic data, allowing researchers to cross-reference genetic information (genetic maps, QTL, markers, SNPs, germplasms, genotypes) with genomic data (genomic sequences, physical maps, genome annotation, expression data) for species of agronomic interest.

GnpIS is now used in several large national and international projects, such as the national “investissement d’avenir” projects BREEDWHEAT, AMAIZING, PeaMUST, RAPSODYN, Biomass For The Future and AKER.

Coordination and cooperation between bioinformatics teams are needed more than ever to manage and exploit the deluge of data. Genetic and genomic data are dispersed among several databases: they are scattered, difficult to access and not always in compatible formats. What is needed is interoperability and data comparison at a global scale, which requires integrating data from a myriad of sources (e.g. genetic, phenotypic, genomic, agronomic). An ambitious objective would be to develop an International Information System that makes all the available data accessible through a single entry point. The construction of such a system started in the framework of the EU-FP7 project Transplant, and URGI is a partner in this endeavor. URGI is also involved in two French infrastructure projects: Phenome (an infrastructure for high-throughput phenotyping) and IFB (Institut Français de Bioinformatique).


15 Feb 2013 

Efficient comparison of sets of intervals with NC-lists
Matthias Zytnicki et al.


07 Feb 2013 

Scouting and scraping the A. thaliana repeatome
Florian Maumus et al.

ALPHY (Alignement, Phylogénie, Génomique Comparative et Bioinformatique), Lyon, 6-7 February 2013

13 Jan 2013 

Impact of transposable elements on transcriptome: the example of Drosophila melanogaster
Hadi Quesneville et al.

International Plant & Animal Genome XXI / January 12-16, 2013  - San Diego, CA, USA



URGI 2012 highlights

Wheat@URGI : the URGI website dedicated to wheat

The wheat community can now access wheat data via a new dedicated URGI website. It allows browsing wheat data (sequences, genetic maps, physical maps, germplasms, SNPs), tools (Triannot Pipeline) and project descriptions. The Wheat@URGI website is available at .

URGI hosts the IWGSC wheat survey sequence

All wheat survey sequence chromosome assemblies from IWGSC are now available for BLAST at the URGI sequence repository. IWGSC coordinating committee members can also download the sequence assemblies and the genome-zipper results, which give their chromosome order as deduced from their syntenic relationships. The procedure for requesting an account is detailed here.

ISO9001 certification

Since 2008, URGI has been establishing a Quality Management System according to the ISO9001 standard. It unites URGI members around the improvement of their working methods. We have improved the reliability and traceability of our results to better satisfy our platform users. Our Quality Management System was approved by LRQA (Lloyd's Register Quality Assurance) as meeting the ISO 9001:2008 Management System Standard on October 26th, 2012. It applies to the whole URGI unit for the following processes: Perform bioanalysis; Develop, design and maintain software; Put data and software into production; Valorise the realizations; Manage Unit and projects; Manage quality system; Manage databases; Manage system and network; Manage human resources; Manage financial resources.


Links from BLAST Results to GBrowse

New links now connect the results of BLAST searches launched via our Mobyle portal to our genome browsers.

This new functionality is currently available for 25 databanks:


  • Botrytis T4 ORFs protein
  • Botrytis B05.10 ORFs protein
  • Leptosphaeria maculans proteins
  • Sclerotinia ORFs protein
  • Zea mays 5a Peptide sequences of the Working Gene Set. from (March 2011)


  • Botrytis T4 genome supercontigs
  • Leptosphaeria maculans assembly
  • Leptosphaeria maculans Genes (CDS)
  • Vitis vinifera 8X PN40024 Chromosomes from The French-Italian Public Consortium
  • Vitis vinifera 12X PN40024 Chromosomes from The French-Italian Public Consortium
  • Zea mays refGen V2 chromosomes
  • Zea mays refGen V2 from (March 2011) (one databank per chromosome)



End of two grape projects: GrapeReSeq and Muscares

The aim of both projects was to take advantage of the grapevine genome sequence to develop strategies and comprehensive tools for allele mining in the Vitaceae gene pool and for the genetic improvement of grapevine disease resistance. The data will be progressively released into the public domain.

  • PN40024 NGS data are now available on the public site.
  • An 18K Illumina chip has been developed within the GrapeReSeq project.

End of MetaQTL project

MetaQTL was an ANR bioinformatics project dealing with methods and bioinformatics tools for QTL meta-analysis and the integration of meta-QTL, physical maps and genome sequence. A new version of the GnpIS GnpMap database module was developed to manage results from QTL and MetaQTL experiments. A paper is in preparation in collaboration with UMR Le Moulon.

End of GnpSeqNGS project

GnpSeqNGS was a project funded by GIS IBISA in 2011. Its goal was to improve the GnpIS information system to manage NGS data, such as genomic sequencing or resequencing data produced by 454, Illumina or HiSeq technologies. For each run, it is now possible to get a description of the sequencing experiment and an overview of the bioinformatics analyses performed. Interoperability with other GnpIS modules gives access to more specific information, such as polymorphism detection analysis through the dedicated GnpSNP module. Analysis result files (for example VarScan results) can be downloaded or used in the Galaxy environment.

End of SyntenyViewer project

Plant Synteny viewer was a project funded by INRA BAP, whose goal was to develop a graphical tool to display and navigate between genes in ancestral chromosomes and modern chromosomes. The software was developed by URGI. A paper is in preparation in collaboration with INRA GDEC.

GnpIS Data summary

The following table summarizes the public data present in our GnpIS system.

Data type            Taxa  Experiments  Features
Genetic maps            7            -        68
Genetic markers         7            -     32896
QTL                     2           32       819
MetaQTL                 1           11        19
SNP                    42          449    193519
Indels                 42          197     10441
Expressions             5            8       103
Genome                  8            -        11
Genes                   8            -    818867
Genetic resources    4772            -     16587
Phenotypes           4772            -     80768
Phenotypes (GxE)        6            3       131

URGI organizes training sessions to help biologists and bioinformaticians master methodologies and tools, whether URGI in-house tools (analysis tools, databases) or externally developed tools and methods.

These sessions are held either at the INRA Versailles center or directly at the participants' own INRA center.

For in-house tools, the sessions are free of charge (course cost). Some sessions are organized in collaboration with other platforms or initiatives at the national level. The agenda is set each year according to user needs. Send an e-mail to for information and registration.

  • Software development: eXtreme Programming methodology
  • Data integration: Talend Open Studio software (ETL) to extract, transform and load data
  • Data analysis
    • REPET software for repeat analysis on genomes
    • Apollo software (GMOD tools) for expert annotation
    • MapHits URGI software (in the Galaxy environment) for polymorphism analysis (NGS sequences)
    • RNASeqDiff workflow (in the Galaxy environment) for RNASeq differential analysis (NGS)
    • S-Mart tool (inside or outside the Galaxy environment) for RNASeq and ChipSeq pre- and post-analysis (NGS)
  • Database queries: GnpIS, with all its databases and associated query tools (quick search, Biomart, Galaxy)

Latest events in 2013:

