
Biodiversity Reference Genomes at ENA and the ERGA Data Portal

At this month's ERGA Plenary meeting, on Monday, November 17 at 15:00 CET, Joana Paupério and Alexey Sokolov will give a talk on Biodiversity Reference Genomes at ENA (European Nucleotide Archive) and the ERGA Data Portal. More information is below.



Abstract

Reference Genomes produced under the European Reference Genome Atlas (ERGA) are being publicly shared through the European Nucleotide Archive (ENA) and are accessible through the ERGA Data Portal.

The ENA is the European node of the International Nucleotide Sequence Database Collaboration (INSDC), which also includes the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ). These sequence repositories support the archiving of this reference data and collaborate with the community, providing resources for the management, sharing and dissemination of data to promote re-use.

Here we will present the reference genome data structure at ENA and the service developments that support open and FAIR (Findable, Accessible, Interoperable and Reusable) sharing of genome data. These include developing metadata standards with the community for reporting enriched source information, and setting up the data structure for increased accessibility and interoperability. Enhancements have also been made to support the upscaling of submissions to ENA, considering the diversity of taxa and genome characteristics. A new model for handling annotation is being developed at the ENA, decoupling annotations from the genome records themselves. Cross-references to other data types and search services have been improved to facilitate reference genome findability and access, with rich metadata to support uptake in biodiversity research.
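As a rough illustration of what searching for reference genomes at ENA can look like from a script, the sketch below builds a query against ENA's public Portal API search endpoint. The abstract does not name a specific endpoint or query, so everything here is an assumption made for illustration: the choice of the `assembly` result type, the `tax_tree` taxonomy filter, the returned field names, and the helper function names (which are hypothetical).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: ENA's public Portal API search endpoint.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_assembly_search_url(
    tax_id,
    fields=("accession", "scientific_name", "tax_id"),  # assumed field names
    limit=10,
):
    """Build a search URL for assembly records under a taxon and its descendants."""
    params = {
        "result": "assembly",            # search genome assembly records
        "query": f"tax_tree({tax_id})",  # the taxon plus everything below it
        "fields": ",".join(fields),      # columns to return
        "format": "json",
        "limit": limit,
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


def fetch_assemblies(tax_id, **kwargs):
    """Run the search and return the parsed JSON response (needs network access)."""
    url = build_assembly_search_url(tax_id, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `build_assembly_search_url(40674, limit=5)` builds a query for up to five assembly records under Mammalia (NCBI Taxonomy ID 40674), and `fetch_assemblies(40674, limit=5)` would run it. Building the URL separately from fetching keeps the query easy to inspect before any request is sent.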

The ERGA Data Portal (https://portal.erga-biodiversity.eu) provides a single open-access platform ensuring FAIR access to all genome data generated by ERGA. It integrates data and metadata from major international repositories—BioSamples, ENA, Ensembl, BioImage Archive, and Wellcome Open Research—together with cross-references to GoaT, TolQC, and the NBN Atlas. Automated pipelines built with Apache Airflow and Apache Beam continuously harmonise and update the integrated dataset around each unique species taxon.

The web portal, built with Angular and FastAPI, offers intuitive search and filtering tools, species-level detail pages, status tracking, and a publications browser. Programmatic access is available through an open API, enabling integration with external tools such as Ensembl dashboards, Jupyter notebooks, and institutional analysis workflows. Analytical and visual layers, powered by BigQuery and Python Dash, provide interactive phylogenetic, geospatial, and metadata dashboards for data exploration. Together, these components create a sustainable and extensible infrastructure supporting ERGA’s mission to deliver high-quality genomic resources for European biodiversity and to promote open, data-driven research and conservation.



Speakers


Joana Paupério is a Biodiversity Project Manager at the European Nucleotide Archive (ENA, EMBL-EBI), where she is responsible for biodiversity data coordination. She works with the community, understands their needs, and supports data structuring and submission to the sequence archives. She is involved in a number of projects and initiatives working towards FAIR biodiversity genomics data and infrastructure linking. Joana is also co-lead of the ELIXIR Biodiversity Community.


Alexey Sokolov is a project lead at EBI, where he is responsible for building scalable, FAIR-compliant data platforms for life-science research. He has contributed to the development of modern genomic data portals and cloud-based analytics ecosystems supporting large international consortia. His work centres on transforming complex, heterogeneous biological data into accessible, well-structured resources that enable researchers to generate new scientific insights.



🔔 To receive the Zoom link and join this and our upcoming plenary meetings, register as an ERGA member.


▶️ You can watch all previous ERGA Plenary talks here.


If you would like to suggest a speaker or topic for a future plenary session, please contact us at training@erga-biodiversity.eu. We welcome your input!
