GenOak consortium (restricted)

Access restricted to GenOak project consortium


Genome assemblies:   fasta files

| Assemblies | Qrob_V1 (Plomion et al. 2016) | Qrob_V2 | Qrob_H2.3 (Haplome) | Qrob_pseudomolecules v2 | Qrob_unassigned |
|---|---|---|---|---|---|
| Assembly size (bp) | 1,354,311,717 | 1,455,104,916 | 814,282,569 | 716,731,785 | 97,636,684 |
| # Scaffolds | 17,910 | 8,827 | 1,409 | 12 Chr (871 scaffolds) | 538 |
| N50 | 256,640 | 821,707 | 1,342,530 | 57,352,617 | 621,902 |
| L50 | 1,468 | 537 | 192 | 5 | 37 |
| N90 | 35,065 | 198,501 | 333,129 | 44,977,106 | 96,968 |
| L90 | 6,626 | 1,880 | 649 | 10 | 195 |
| Max length (bp) | 1,922,255 | 5,542,037 | 5,871,596 | 115,639,695 | 2,943,817 |
| Min length (bp) | 2,003 | 2,000 | 2,095 | 39,860,516 | 2,095 |
| #N (% assembly) | 156,586,910 (11.6%) | 67,010,588 (4.6%) | 23,989,938 (2.9%) | 20,278,712 (2.8%) | 3,790,126 (3.9%) |

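N50: length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% of the assembly

L50: smallest number of scaffolds whose summed length reaches at least 50% of the assembly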
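As an illustration of these definitions, here is a minimal Python sketch that computes Nx and Lx from a list of scaffold lengths. The function name `nx_lx` and the toy lengths are made up for the example; this is not the script used to produce the table above.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for scaffold lengths, e.g. fraction=0.5 gives (N50, L50)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    threshold = fraction * sum(ordered)      # e.g. 50% of the assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # `length` is the shortest scaffold in the covering set (Nx),
            # `count` is the number of scaffolds in that set (Lx).
            return length, count
    raise AssertionError("unreachable for non-empty input")


# Toy data (hypothetical lengths, total = 300)
toy = [100, 80, 60, 40, 20]
print(nx_lx(toy, 0.5))  # (80, 2)
print(nx_lx(toy, 0.9))  # (40, 4)
```

The same function gives N90 / L90 by passing `fraction=0.9`.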

Gene prediction statistics

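Per release:

| Assembly / gene release | v1 / v1 | v2 / v2.2 | Haplome_2.3 | Haplome_2.3 (final list of genes) |
|---|---|---|---|---|
| #Genes | 110,058 | 79,052 | 43,755 | 25,808 |
| # Uncomplete genes | 4,500 | 2,021 | 515 | 0 |

Per gene quality class:

| Assembly / gene release | v1 / v1 | v1 / v1 | v2 / v2.2 | v2 / v2.2 | Haplome_2.3 | Haplome_2.3 | Haplome_2.3 |
|---|---|---|---|---|---|---|---|
| Gene quality | regular | unreliable | regular & manual | low confidence | regular & manual | low confidence | final list of genes |
| #Genes | 58,778 | 51,280 | 53,484 | 23,547 | 29,665 | 13,575 | 25,808 |
| Gene space (Mb) | 120.6 | 22.5 | 150 | 20 | 83.5 | 11.7 | 75 |
| Gene mean / median size (bp) | 2,051 / 1,530 | 439 / 291 | 2,809 / 2,050 | 851 / 417 | 2,813 / 2,055 | 858 / 422 | 2,907 / 2,137 |
| CDS mean / median size (bp) | 1,025 / 810 | 266 / 243 | 1,062 / 831 | 286 / 273 | 1,068 / 831 | 286 / 273 | 1,174 / 942 |
| #Polypeptide < 500 bp | 10,881 | 51,280 | 12,847 | 23,547 | 7,000 | 13,575 | 4,367 |
| #Polypeptide > 3 kb | 1,695 | NA | 2,019 | NA | 1,162 | NA | 1,162 |
| #Genes with introns | 44,595 (76%) | 18,999 (37%) | 42,627 (80%) | 13,313 (57%) | 23,723 (80%) | 7,736 (57%) | 20,297 (79%) |
| # Introns per gene | 2.7 | 0.5 | 3.2 | 0.9 | 3.2 | 0.9 | 3.3 |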

 

Transcript sequences

 

Pseudomolecule + unassigned scaffolds:  JBrowse OakMine_PM1N  

 
 

Haplome release v2.3 gene release October 2016/11/25 

We used the results of a first round of OrthoMCL + CAFE analysis to curate the set of genes.

We removed many genes associated with transposable elements (TE), unreliable genes, and small regular genes that were split or very badly predicted and that belonged to clusters containing only Qrob genes (none of the other 15 species in the analysis). In the end, we kept 25808 predicted proteins.
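The "clusters containing only Qrob genes" criterion can be sketched in Python as follows. This assumes the cluster file uses the common OrthoMCL groups layout (`group_id: taxon|gene taxon|gene ...`) and that oak genes carry the acronym `Qr`. The function name and the example lines are hypothetical. It only flags candidate clusters and does not reproduce the additional checks described above (TE association, gene size, prediction quality).

```python
def species_specific_groups(lines, species="Qr"):
    """Return {group_id: [members]} for groups made up of a single species."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        genes = members.split()
        # The taxon acronym is the part before the first "|" (assumed layout)
        taxa = {gene.split("|", 1)[0] for gene in genes}
        if taxa == {species}:
            result[group_id.strip()] = genes
    return result


# Made-up groups for illustration only
example = [
    "grp1: Qr|geneA Qr|geneB",
    "grp2: Qr|geneC At|geneD Vv|geneE",
    "grp3: Pt|geneF Pt|geneG",
]
print(species_specific_groups(example))
# {'grp1': ['Qr|geneA', 'Qr|geneB']}
```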

Quercus robur gene information file

List of genes deleted

GFF file

Coding sequences (nucleotide + amino acid sequences)

Haplome release v2.3 unfiltered gene release  

43755 genes (out of 79052 from v2) were mapped on the Haplome v2.3 assembly. Gene IDs and structures are unchanged.
Some tags may differ between a gene in V2 and the same gene in Haplome: 28 genes are newly tagged uncomplete, either because N makes up more than 20% of their coding sequence or because of an in-frame stop.

  • regular: 28488 genes with [N<20%] AND [[length > 500 nt] OR [gene with length < 500 nt with oak transcript evidence > 90% of coverage]]
  • manual_v1: 1130 genes predicted using mapping of gene from manual curation in v1
  • manual_v2: 47 genes manually curated in v2.2 
  • unreliable: 13575 "low confidence" genes with length < 500 nt without transcript evidence > 90% of coverage
  • uncomplete: 515 genes [without start and/or stop] OR [N>20%]
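The automatic tagging rules in the list above can be written as a small decision function. This is a sketch only: the function and parameter names are invented here, the manual_v1 / manual_v2 tags (assigned by curation) and the in-frame stop check are not modelled, and the handling of the exact boundaries (length of exactly 500 nt, N of exactly 20%) is an assumption, since the rules above do not specify them.

```python
def gene_quality(length_nt, n_fraction, has_start, has_stop, transcript_coverage):
    """Assign a gene_qual-style tag from the rules listed above (sketch)."""
    # uncomplete: [without start and/or stop] OR [N > 20%]
    if not (has_start and has_stop) or n_fraction > 0.20:
        return "uncomplete"
    # regular: length > 500 nt, or shorter with > 90% oak transcript coverage
    if length_nt > 500 or transcript_coverage > 0.90:
        return "regular"
    # unreliable: short gene without sufficient transcript evidence
    return "unreliable"


print(gene_quality(1200, 0.00, True, True, 0.0))   # regular
print(gene_quality(300, 0.00, True, True, 0.95))   # regular
print(gene_quality(300, 0.00, True, True, 0.40))   # unreliable
print(gene_quality(1200, 0.35, True, True, 1.0))   # uncomplete
print(gene_quality(1200, 0.00, True, False, 1.0))  # uncomplete
```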

GFF file of gene prediction H2.3 (Eugene + manual annotation) on assembly release Haplome v2.3

Coding (uncomplete not included) sequences predicted on Haplome v2.3 (43240 proteins)

List of 2088 manually predicted genes in V2.2 (2067 using 1567 from V1 + 21 only in V2). Of these, 1181 genes were recovered in Haplome; four of them are tagged uncomplete (stop in frame) and are therefore not translated.

  • Manually annotated genes: Location in assembly V2 and haplome. Gene symbol and description are given when available: Qrob_V1_v2_haplome_AnnotV1.csv (237.53 kB)

List of 35297 genes in V2 not recovered in Haplome
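The gene counts quoted in this section are mutually consistent, which can be checked with a few lines of Python. All numbers are taken from the text above; only the variable names are introduced here.

```python
# Haplome v2.3 unfiltered gene release, counts per gene_qual tag
counts = {
    "regular": 28488,
    "manual_v1": 1130,
    "manual_v2": 47,
    "unreliable": 13575,
    "uncomplete": 515,
}

genes_v2 = 79052                             # genes predicted on assembly V2
mapped = sum(counts.values())                # genes mapped on Haplome v2.3
translated = mapped - counts["uncomplete"]   # uncomplete genes are not translated
not_recovered = genes_v2 - mapped            # genes in V2 not recovered in Haplome

print(mapped)         # 43755
print(translated)     # 43240
print(not_recovered)  # 35297
```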

GFF file of TE annotation recovered in Haplome v2.3

Counterpart between V2 (diploid) and Haplome V2.3

Assembly release v2.2 gene prediction: JBrowse, OakMine

79052 genes were predicted and tagged with quality flags, reported in the GFF file in the gene_qual tag:

  • regular: gene with length > 500 nt OR (gene with length < 500 nt with oak transcript evidence > 90% of coverage)
  • manual_v1: 1990 genes predicted using mapping of gene from manual curation in v1
  • manual_v2: 98 genes manually curated in v2.2 
  • unreliable: gene with length < 500 nt without transcript evidence > 90% of coverage
  • uncomplete: gene without start and/or stop
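To tally genes per quality flag from a GFF file, something like the following Python sketch could be used. It assumes GFF3-style attributes (`key=value` pairs separated by `;` in column 9) and that gene_qual is carried by features of type `gene`; both are assumptions about the file layout. The example lines are made up.

```python
from collections import Counter


def count_gene_qual(gff_lines):
    """Count gene features per gene_qual value (GFF3-style attributes assumed)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        counts[attrs.get("gene_qual", "missing")] += 1
    return counts


# Hypothetical records for illustration only
example = [
    "##gff-version 3",
    "scaffold_1\tEuGene\tgene\t100\t900\t.\t+\t.\tID=gene1;gene_qual=regular",
    "scaffold_1\tEuGene\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_2\tEuGene\tgene\t50\t300\t.\t-\t.\tID=gene2;gene_qual=unreliable",
]
print(dict(count_gene_qual(example)))
# {'regular': 1, 'unreliable': 1}
```

In practice the lines would come from an open file handle, for example `count_gene_qual(open(path))`, where `path` points to the downloaded GFF file.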

Genes updated between gene prediction V2 and V2.2: 150 genes were manually curated or restored from the Eugene prediction to recover a good ORF. 4 genes were deleted. Qrob_v2_Genes_v2.2_20151202_genes_modified.lst (2.34 kB)

GFF file of gene prediction V2.2 (Eugene + manual annotation) on assembly release V2.

Coding (uncomplete not included) sequences predicted (V2.2) on assembly release V2. 

 Other annotation files (Annotation on assembly release V2)

Assembly release v1 JBrowse  

Genes (coding sequences) predicted by Eugene:

These fasta files contain reliable and unreliable Eugene-predicted genes (without UTRs):

  • Qrob_Pxxxxxxx.1: gene with length > 500 nt OR (gene with length < 500 nt with oak transcript contig evidence)
  • Qrob_uPxxxxxxx.1: gene with length < 500 nt without transcript evidence (at the time of the prediction pipeline)
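Because the reliable / unreliable status is encoded in the sequence ID prefix, fasta records can be split with a simple pattern match. The Python sketch below assumes the `x` placeholders stand for digits and that the suffix after the dot is a numeric version; the function name and the example IDs are hypothetical.

```python
import re

# Qrob_P<digits>.<n> is reliable, Qrob_uP<digits>.<n> is unreliable (assumed layout)
ID_PATTERN = re.compile(r"^Qrob_(u?)P(\d+)\.(\d+)$")


def v1_gene_class(seq_id):
    """Return 'reliable', 'unreliable', or None if the ID does not match."""
    match = ID_PATTERN.match(seq_id)
    if match is None:
        return None
    return "unreliable" if match.group(1) else "reliable"


print(v1_gene_class("Qrob_P0000010.1"))   # reliable
print(v1_gene_class("Qrob_uP0000020.1"))  # unreliable
print(v1_gene_class("scaffold_12"))       # None
```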

List of plant species used to detect expansion/contraction of gene families in oak

| Scientific name | Species acronym | Number of predicted proteins | Version | Reference (DOI) | |
|---|---|---|---|---|---|
| Quercus robur | Qr | 43240 | haplome (v2.3) | this study | |
| Malus domestica | Md | 63514 | v1.0 | 10.1038/ng.654 | Phytozome11 |
| Prunus persica | Pp | 27864 | v2.1 | 10.1038/ng.2586 | |
| Populus trichocarpa | Pt | 41335 | v3.0 | 10.1126/science.1128691 | |
| Citrus clementina | Cc | 24533 | v1.0 | 10.1038/nbt.2906 | |
| Fragaria vesca | Fv | 32831 | v1.1 | 10.1038/ng.740 | |
| Arabidopsis lyrata | Al | 32657 | v1.0 | 10.1038/ng.807 | |
| Solanum tuberosum | St | 35119 | v3.4 | 10.1038/nature10158 | |
| Arabidopsis thaliana | At | 27416 | TAIR10 | 10.1038/35048692 | |
| Ricinus communis | Rc | 31221 | v0.1 | 10.1038/nbt.1674 | |
| Glycine max | Gm | 56044 | Wm82.a2.v1 | 10.1038/nature08670 | |
| Vitis vinifera | Vv | 26346 | Genoscope.12X | 10.1038/nature06148 | |
| Carica papaya | Cp | 27584 | ASGPBv0.4 | 10.1038/nature06856 | |
| Theobroma cacao | Tc | 29452 | v1.1 | 10.1186/gb-2013-14-6-r53 | |
| Eucalyptus grandis | Eg | 36376 | v2.0 | 10.1038/nature13308 | |
| Citrullus lanatus | Wa | 23440 | v1 | 10.1038/ng.2470 | |

Download