The URGI newsletter #3

The URGI newsletter #3 Jul 2014

Editorial

The URGI has recently moved into the field of comparative genomics through various academic collaborations (Read et al., Nature 2013; Pont et al., Plant J. 2013; Slotte et al., Nature Genetics 2013), studying genetic variation within or between species in order (i) to identify genes that are important for the evolution of a particular species, (ii) to study genome evolution, and (iii) to improve the identification of genes involved in important traits across species. There is no doubt that comparative genomics today plays a key role in translational biology for agronomic research.

Our GnpIS information system has been improved to support this research field. Until recently, we relied only on the well-known GBrowse_syn, demonstrated by our public instance of the GBrowse_syn synteny viewer, which shows genome alignments of two Botrytis cinerea genomes and the closely related species Sclerotinia sclerotiorum. This has recently been complemented by a new module, SyntenyViewer, a tool for navigating the synteny between species according to the representation of their common ancestral chromosomes.

This year, we have also significantly increased the wheat data we host as part of our participation in the International Wheat Genome Sequencing Consortium (IWGSC). GnpIS is the main repository of the IWGSC and provides newly released public data and new dedicated tools (see article “Wheat 3B reference sequence”). Wheat scientists will find here up-to-date genomic sequences, physical maps, SNP and transcriptomic data.

Improving wheat data sharing is an important goal at URGI. We chair the WheatIS Expert Working Group, an international collaboration endorsed by the Wheat Initiative (supported by the G20 Agricultural Ministers to coordinate worldwide research efforts in the fields of wheat genetics, genomics, physiology, breeding and agronomy). The Expert Working Group’s project aims at building an International Wheat Information System, hereafter called WheatIS, to support the wheat research community. The main objective is to provide a single-access web-based system to the available data resources and bioinformatics tools.

In the last 18 months, we have also worked hard to strengthen our capacity to deal with the “big data” challenge by improving our hardware and software infrastructure. We now offer an enhanced analysis environment with our Galaxy server (see article “NGS analysis through URGI Galaxy server”), which uses our improved computer cluster (see article “Updates and upgrades of The URGI's Cluster”).

Publications

23 Jun 2014 

Ancestral repeats have shaped epigenome and genome composition for millions of years in Arabidopsis thaliana
Florian Maumus et al.

Nature communications

02 May 2014 

PASTEC: An Automatic Transposable Element Classification Tool.
Claire Hoede et al.

PLoS One. 2014 May 2;9(5):e91929. doi: 10.1371/journal.pone.0091929. eCollection 2014.

07 Apr 2014 

Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter
Florian Maumus et al.

PLoS ONE

18 Nov 2013 

Distribution, evolution, and diversity of retrotransposons at the flamenco locus reflect the regulatory properties of piRNA clusters
Zanni V et al.

Proc Natl Acad Sci U S A. 2013 Nov 18.

26 Aug 2013 

GnpIS: an information system to integrate genetic and genomic data from plants and fungi
Delphine Steinbach et al.

Database, Vol. 2013, Article ID bat058, doi:10.1093/database/bat058

News

Events

Wheat 3B reference sequence

As part of the 3BSEQ project, INRA – GDEC, CEA – Génoscope and INRA – URGI produced a reference sequence of the hexaploid wheat chromosome 3B. The sequence is composed of a 774 Mb pseudomolecule and 1450 unlocalized scaffolds.

 

The Wheat 3B reference sequence is available:

  • to BLAST (public access): whole chromosome and CDS only (nucleotide and peptide).

Note that the URGI public BLAST also allows querying the IWGSC survey sequence assemblies.

The tracks are Scaffolds, BACs, Gaps, Markers, CDS, ncRNA, rRNA, tRNA and TEs.

 

The sequence analysis has been described by Frederic Choulet at PAG 2014:

We established a strategy that combined several technologies to sequence 8452 Bacterial Artificial Chromosomes pooled by 10 and were able to assemble a pseudomolecule of 774 Mb carrying 7264 protein-coding genes and 85% of transposable elements. Comparative genomics with model grasses revealed that wheat has recently undergone massive inter- and intrachromosomal gene duplications. Distribution of both structural and functional features highlighted a striking compartmentalization. Chromosomal extremities, corresponding to regions where meiotic recombination takes place, are enriched in genes originating from recent duplication events, expressed in specific conditions, and with functions related to adaptation, which contrasts with the features of the central region of the chromosome. Such a reference sequence provides an important resource to support the identification of genes underlying important traits and novel insights into the organization, function and evolution of a complex polyploid genome.

Updates and upgrades of The URGI's Cluster

"The power of Isengard is at your command, Sauron, Lord of the Earth."
  Saruman

The URGI's cluster
Our cluster is composed of 78 nodes, which represent 856 cores [Tip: you can check this with the command 'qstat -f' on saruman], giving our infrastructure a theoretical computing power of about 10 Tflops. We use Sun Grid Engine as the job manager. The two main queues are all.q, which is the default, and long.q, which is intended for long jobs (up to one month) or for jobs requesting more than 16 GB of RAM.
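
The queue rule described above can be sketched in a few lines of Python. This is only an illustration, not the cluster's actual configuration. The queue names and the 16 GB limit come from the text. The function names and the 24-hour threshold for a "long" job are assumptions, since the newsletter does not say where "long" begins. The command is built with the standard `qsub -q` option for selecting a queue.

```python
# Hypothetical helper: pick a queue following the rule in the text.
RAM_LIMIT_GB = 16          # from the newsletter: above this, use long.q
LONG_JOB_HOURS = 24.0      # assumption: the newsletter gives no threshold


def choose_queue(runtime_hours, ram_gb, long_job_hours=LONG_JOB_HOURS):
    """Return 'long.q' for long or memory-hungry jobs, else 'all.q'."""
    if ram_gb > RAM_LIMIT_GB or runtime_hours > long_job_hours:
        return "long.q"
    return "all.q"


def qsub_command(script, runtime_hours, ram_gb):
    """Build (but do not run) a qsub command line targeting the chosen queue."""
    return ["qsub", "-q", choose_queue(runtime_hours, ram_gb), script]
```

For example, `qsub_command("job.sh", 1, 32)` would target long.q because the job asks for more than 16 GB, while a one-hour, 4 GB job stays on all.q.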

New enhancements
 Over the last year, we made improvements at a very low level of the infrastructure. The RAM of some nodes was upgraded, as was the storage facility, which moved from one NetApp FAS3240 running Data ONTAP 7 to two NetApp FAS3240 appliances running Data ONTAP 8.1 in cluster mode. The new system and hardware provide load balancing and failover management: if one NetApp fails, you will be able to continue your work without noticing it (although perhaps with slightly slower access to the data). This upgrade will also allow us to plan storage system updates without an outage.

Another enhancement concerns the connection between the cluster and the data storage provided by the NetApp, which moved from an aggregate of four 1G Ethernet links to two 10G Ethernet links. This reduces the bottleneck between the nodes and the storage, bringing the data to the computing nodes faster and speeding up job processing.
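
As a rough worked example of what the link upgrade means, the nominal figures can be compared directly. This is a back-of-the-envelope sketch: it uses only the link counts and speeds given above, ignores protocol overhead and the fact that a single transfer may not use every link of an aggregate, and the 100 GB dataset size in the usage note is a made-up illustration.

```python
def aggregate_gbps(links, gbps_per_link):
    """Nominal aggregate bandwidth of identical links, in Gbit/s."""
    return links * gbps_per_link


def transfer_seconds(size_gigabytes, gbps):
    """Ideal time to move size_gigabytes over a gbps link (1 byte = 8 bits)."""
    return size_gigabytes * 8 / gbps


OLD_GBPS = aggregate_gbps(4, 1)    # four 1G Ethernet links
NEW_GBPS = aggregate_gbps(2, 10)   # two 10G Ethernet links
SPEEDUP = NEW_GBPS / OLD_GBPS      # nominal improvement factor
```

Under these idealized assumptions the nominal bandwidth goes from 4 to 20 Gbit/s, a factor of 5, so a hypothetical 100 GB dataset would take about 40 s to transfer instead of about 200 s.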

A failover mechanism was also installed on the SGE queueing system: if the master node fails, a shadow node takes over within 5 min so that running jobs are preserved.

We also tuned Saruman, the server used for job submission, to cope with heavy processes launched on it after an incorrect estimate of the load they would generate. In such cases Saruman was at best very slow and at worst crashed, so that we had to reboot it, which killed all running jobs and processes. To avoid this, we set a limit on processes running on saruman: no process can now use more than 10 GB of RAM. A process that exceeds the limit crashes, but neither the server nor other users' processes are affected.
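
The newsletter does not say how the memory limit is enforced on saruman. As one possible illustration of the idea (an assumption, not the administrators' actual setup), the sketch below uses the POSIX address-space limit exposed by Python's standard `resource` module on Linux. It caps a child process at 1 GiB for demonstration purposes (the real cap is 10 GB) and shows that only the offending process fails. The function names are hypothetical.

```python
import resource
import subprocess
import sys

DEMO_LIMIT_BYTES = 1024**3  # 1 GiB for the demo; the newsletter's cap is 10 GB


def cap_address_space():
    """Run in the child just before exec: cap its virtual memory."""
    resource.setrlimit(resource.RLIMIT_AS, (DEMO_LIMIT_BYTES, DEMO_LIMIT_BYTES))


def run_capped(code):
    """Run a Python snippet in a child process under the memory cap."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=cap_address_space,  # the limit applies to the child only
        capture_output=True,
        text=True,
    )
```

Calling `run_capped("bytearray(2 * 1024**3)")` asks the child to allocate 2 GiB. The allocation is refused and the child exits with an error, while the parent process keeps running normally.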

In order to be more efficient and to respond quickly to users, we complemented Ganglia with additional monitoring of the cluster through Nagios, which alerts the administrators by e-mail when a problem occurs on the cluster (e.g. heavy load on the submit node or a crash of compute nodes).
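
To give an idea of what such a check looks like, here is a minimal sketch of a load check following the usual Nagios plugin convention of exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL). It is not the check actually deployed at URGI: the function names and the warning and critical thresholds are illustrative assumptions.

```python
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_load(load1, warn=8.0, crit=16.0):
    """Map a 1-minute load average to a (status code, message) pair.

    The warn/crit thresholds are hypothetical defaults.
    """
    if load1 >= crit:
        return CRITICAL, f"CRITICAL - load average {load1:.2f}"
    if load1 >= warn:
        return WARNING, f"WARNING - load average {load1:.2f}"
    return OK, f"OK - load average {load1:.2f}"


def check_submit_node():
    """Check the load of the machine this runs on (Unix only)."""
    load1, _, _ = os.getloadavg()
    return check_load(load1)
```

A plugin script built on this would print the message and exit with the status code, leaving it to Nagios to decide when to send the e-mail alert.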

Your system administrators, Claire and Mikael.

NGS analysis through URGI Galaxy server

As part of national and international projects, the URGI platform develops tools and workflows for NGS analyses, focusing on structural variation detection, RNA-Seq differential expression analysis and comparative genomics. The Galaxy system was chosen as the integrative platform to set up workflows, making them reusable and accessible.
In April 2013, the URGI plant and fungal platform launched a new call for proposals dedicated to research units of the BAP, SPE and EFPA INRA divisions. We aimed to provide data analysis environments at the appropriate scale for NGS analysis (storage, computing resources and bioinformatics tools), together with training to allow scientists to perform analyses by themselves. The first round of submitted projects focused on URGI lab topics (SNP calling, RNA-Seq) in order to use our Galaxy instance and our in-house workflows. In 2013-2014, 16 short-term projects were selected for a six-month duration. Two Galaxy training sessions were held at URGI to help biologists with NGS data handling, tool execution and workflow design. Tutorials and training materials are publicly available and published as Galaxy pages. Unix access to our cluster, with the same storage and duration criteria, was also provided to the advanced users of the projects (those familiar with Unix), who had to attend a training session on cluster job submission.

As a partner in the « France Genomique » bioinformatic work package and a member of the French Bioinformatic Infrastructure, we are involved in the development of a virtual environment for the Galaxy server infrastructure, with a larger scope, that will benefit the French scientific community.

© 2024 http://urgi.versailles.inra.fr/