
Search Results

185 results found with an empty search

Blog Posts (158)

  • Biodiversity Reference Genomes at ENA and the ERGA Data Portal

    At this month's ERGA Plenary meeting, on Monday, November 17 at 15:00 CET, Joana Paupério and Alexey Sokolov will give a presentation on Biodiversity Reference Genomes at ENA (European Nucleotide Archive) and the ERGA Data Portal. More information is given below.

    Abstract

    Reference genomes produced under the European Reference Genome Atlas (ERGA) are being publicly shared through the European Nucleotide Archive (ENA) and are accessible through the ERGA Data Portal. The ENA is the European node of the International Nucleotide Sequence Database Collaboration (INSDC), which also includes the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ). These sequence repositories support the archiving of this reference data and collaborate with the community, providing resources for the management, sharing and dissemination of data to promote re-use. Here we will present the reference genome data structure at ENA and the service developments that support open and FAIR (Findable, Accessible, Interoperable and Reusable) sharing of genome data. These include the development of metadata standards with the community for reporting enriched source information, and setting up the data structure for increased accessibility and interoperability. Enhancements have also been made to support the upscaling of submissions to ENA, considering the diversity of taxa and genome characteristics. A new model for handling annotation is being developed at the ENA, decoupling annotations from the genome records themselves. Cross-references to other data types and search services were improved to facilitate reference genome findability and access, with rich metadata to support uptake in biodiversity research.

    The ERGA Data Portal (https://portal.erga-biodiversity.eu) provides a single open-access platform ensuring FAIR access to all genome data generated by ERGA. It integrates data and metadata from major international repositories (BioSamples, ENA, Ensembl, BioImage Archive, and Wellcome Open Research), together with cross-references to GoaT, TolQC, and the NBN Atlas. Automated pipelines built with Apache Airflow and Apache Beam continuously harmonise and update the integrated dataset around each unique species taxon. The web portal, built with Angular and FastAPI, offers intuitive search and filtering tools, species-level detail pages, status tracking, and a publications browser. Programmatic access is available through an open API, enabling integration with external tools such as Ensembl dashboards, Jupyter notebooks, and institutional analysis workflows. Analytical and visual layers, powered by BigQuery and Python Dash, provide interactive phylogenetic, geospatial, and metadata dashboards for data exploration. Together, these components create a sustainable and extensible infrastructure supporting ERGA's mission to deliver high-quality genomic resources for European biodiversity and to promote open, data-driven research and conservation.

    Speakers

    Joana Paupério is a Biodiversity Project Manager at the European Nucleotide Archive (ENA, EMBL-EBI), where she is responsible for biodiversity data coordination. She works with the community to understand their needs, and supports data structuring and submission to the sequence archives. She is involved in a number of projects and initiatives working towards FAIR biodiversity genomics data and infrastructure linking. Joana is also co-lead of the ELIXIR Biodiversity Community.

    Alexey Sokolov is a project lead at EBI, where he is responsible for building scalable, FAIR-compliant data platforms for life-science research. He has contributed to the development of modern genomic data portals and cloud-based analytics ecosystems supporting large international consortia. His work centres on transforming complex, heterogeneous biological data into accessible, well-structured resources that enable researchers to generate new scientific insights.

    🔔 To receive the Zoom link and join this and our upcoming plenary meetings, register as an ERGA member.
    ▶️ You can watch all previous ERGA Plenary talks here.

    If you would like to suggest a speaker or topic for a future plenary session, please contact us at training@erga-biodiversity.eu. We welcome your input!

  • Join the Taxon Sampling SOP Hackathon!

    The ERGA Sampling & Sample Processing Committee invites everyone to join the online Taxon Sampling SOP (Standard Operating Procedure) "Hackathon" on Friday 5 December, 10:00-12:00 CET. We'll work in groups to advance the taxon-specific instructions for sampling, processing and shipping the raw material for genome sequencing. Please contact samples@erga-biodiversity.eu for the Zoom link to join in! 🔗 Add this event to your calendar!


Other Pages (27)

  • Our Partner Projects | ERGA

    OUR PARTNER PROJECTS

    ERGA is the pan-European partner of the Earth Biogenome Project (EBP)

    Regional Partners:
      • French Atlas of Marine Genomes (ATLASea)
      • Earth Biogenome Project Norge (EBP-Nor)
      • Swedish Earth BioGenome Project

    Worldwide Partners:

  • Annotation_guide | ERGA

    Structural annotation - So you want to annotate protein-coding genes in your genome?
    Version 1.0 - August 2023

    1. Before you start
    2. Do you want to do your own annotation?
    3. Evaluate your Annotation
    4. Finalise your Annotation

    Authors: Alice Dennis, Jèssica Gómez, Leanne Haggerty, Lucile Soler, Aureliano Bombarley, Henrik Lantz, Florian Maumus, Hugues Roest Crollius, Fergal Martin, Jean-Marc Aury, Christian deGuttry, Robert Waterhouse, and the ERGA Annotation committee.

    STEP 1 - Before you start

    Step 1a: Be sure the assembly is done and you are working with a frozen/stable version!

    Table 1: Genome assembly evaluation before annotation.
    Rationale: Low consensus accuracy, incomplete genomes, and contamination lead to poor annotation. It is thus essential to evaluate your genome before you start the annotation process.
      1. Consensus accuracy and assembly completeness evaluated (suggestion: Merqury)
      2. Gene space completeness evaluated by:
         a. Conserved gene space (suggestion: BUSCO)
         b. RNA-Seq mapping (suggestion: STAR/Minimap2)
      3. Organelle/contamination screening and removal:
         a. Organelle genomes (suggestion: Minimap2)
         b. Contaminations (suggestion: BlobTools)
      4. Uncollapsed duplication for the consensus haploid assembly (suggestion: purge_dups)
      5. Full TE completeness evaluated with the LAI (suggestion: LTR_Retriever)
      6. Does your genome meet EBP standards?

    Step 1b: Is your assembly done? If yes, go to Step 1c. If no, go to Table 1.
    Step 1c: Is the assembly publicly available? Public release is necessary for annotation by Ensembl. If yes, go to Step 1d. If no, go to Step 2.
    Step 1d: Is the public assembly linked to ERGA? If yes, go to Step 1f. If no, go to Step 1e.
    Step 1f: This will make the assembly available for annotation at Ensembl rapid, provided that relevant transcriptomic data are also publicly available (ENA).
    Step 1e: Instructions on how to link your project to ERGA.

    STEP 2 - Do you want to do your own annotation?

    Step 2a: Do you want to do your own annotation? If yes, go to Step 2b.
    Step 2b: Gather all available evidence data: transcriptomic and protein datasets to support the annotation process.

    Table 2: Evaluation of your evidence data. The accuracy of the genome annotation process is very sensitive to the amount and quality of your evidence data.
      1. RNA-Seq transcriptomic data
         a. Mapping evaluation (suggestion: STAR)
         b. Transcript models (suggestion: StringTie)
         c. Gene space completeness (suggestion: BUSCO)
      2. Protein dataset
         a. Gene space completeness (suggestion: BUSCO)
         b. Percentage of full protein alignments (suggestion: Spaln)
      3. IsoSeq transcriptomic data
         a. Mapping evaluation (suggestion: Minimap2)
         b. Transcript models (suggestion: StringTie)
         c. Gene space completeness (suggestion: BUSCO)

    Done? If yes, go to Step 2c.
    Step 2c: Repeat prediction. ERGA recommends: RepeatModeler2, RepeatMasker, Protein Excluder, TEclass, PASTEC, TEdenovo. Done? If yes, go to Step 2d.
    Step 2d: Ab initio training and prediction. ERGA recommends: AUGUSTUS and GeneMark-ET/EP/ETP.
    Step 2e: Gene modelling. ERGA recommends: TSEBRA (BRAKER-based predictions), EVidenceModeler, and MAKER. Done? For an evaluation of your evidence data, go back to Table 2. Once done, you are ready for the final quality and contamination check and you can go to Step 3a.

    STEP 3 - Evaluate your annotation

    Step 3a: Evaluate your annotation. The following suggestions can be carried out in any order:
    Step 3b: MAKER eAED scores.
    Step 3c: Gene family analysis.
    Step 3d: Genome visualization: 1. IGV; 2. Apollo (manual curation); 3. EasyGB (JBrowse for a simple dataset).
    Step 3e: Generate basic gene model summary statistics and compare with related species.
    Step 3f: BUSCO, and visual inspection in a browser in context with the evidence.
    Step 3g: Use mapped reads to estimate: 1. How many apparently transcribed regions don't have annotation? 2. How many genes or exons are supported by read data?
    Step 3h: Compare gene content to related species annotated with a similar approach.

    Are you happy with the metrics for each of the parameters on which the annotation has been evaluated? Remember that some of them may depend on the phylum. This is your DIY annotation v1. Again, this is a stopping place. Do not go forward until this is complete. If you are happy with the metric assessment, you can move to Step 4.

    STEP 4 - Finalise your annotation

    Step 4a: Create proper file formats (ENA GFF3 format recommendations). Consider replacing the identifiers produced by the different gene annotation tools (e.g., gene-1) with a more meaningful identifier (SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY).
    Step 4b: Provide this annotation to Ensembl as a second track (via GFF3 submission to ENA) and go back to Step 1f.
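The identifier scheme suggested in Step 4a (SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the ERGA or ENA implementation: the guide does not say how the parts are joined or padded, so the sketch assumes they are concatenated without separators, that XXX is a three-digit zero-padded sequence number, and that YYYYYY is a six-digit zero-padded gene number. The function names `make_gene_id` and `rename_ids` and the species code `XxYy` are hypothetical.

```python
import re

# Sequence-type labels taken from the scheme in Step 4a (Chr/Scf/Ctg).
SEQ_TYPES = ("Chr", "Scf", "Ctg")


def make_gene_id(species_code, assembly_version, seq_type, seq_number, gene_number):
    """Build one identifier of the form SpeciesCode+AssemblyVersion+Type+XXX+G+YYYYYY.

    Assumption: parts are concatenated directly; XXX and YYYYYY are
    zero-padded to 3 and 6 digits respectively.
    """
    if seq_type not in SEQ_TYPES:
        raise ValueError(f"seq_type must be one of {SEQ_TYPES}, got {seq_type!r}")
    if not re.fullmatch(r"[A-Za-z]+", species_code):
        raise ValueError("species_code should contain letters only")
    return f"{species_code}{assembly_version}{seq_type}{seq_number:03d}G{gene_number:06d}"


def rename_ids(tool_ids, species_code, assembly_version, seq_type, seq_number):
    """Map tool-generated IDs (listed in order along one sequence) to new IDs."""
    return {
        old_id: make_gene_id(species_code, assembly_version, seq_type, seq_number, i)
        for i, old_id in enumerate(tool_ids, start=1)
    }


if __name__ == "__main__":
    # Hypothetical species code and tool-generated IDs, for illustration only.
    for old, new in rename_ids(["gene-1", "gene-2"], "XxYy", 1, "Chr", 2).items():
        print(old, "->", new)
```

Numbering genes in their order along each sequence keeps the new identifiers sortable by position; the resulting mapping would still need to be applied consistently to the ID and Parent attributes of the GFF3 file before submission.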

  • REGISTER | ERGA

    ERGA Registration Form
