ERGA at the BioHackathon Europe 2025

luisamarins19
Sep 19, 2025

Are you interested in bioinformatics and eager to connect with peers worldwide while working on projects with direct applications for the life sciences? Join this year’s BioHackathon Europe, taking place from 3 to 7 November 2025. Online participation is free and registration is still open! Face-to-face registration is currently full, but you can join the waiting list.

“BioHackathon Europe is an annual event that brings together bioinformaticians and computational biologists from around the world. It’s organised by ELIXIR Europe, and offers an intense week of hacking, with participants working on diverse and exciting projects. BioHackathon is a community-driven event, which provides an opportunity for members of the life sciences community to meet and work together on topics of common interest. The goal is to create code that addresses challenges in bioinformatics research.”

This year, members of the ERGA community will lead two projects at the BioHackathon and are looking for more contributors to join their teams! The first project, proposed by the Data Analysis Committee, focuses on developing workflows for phylogenomics. The second project, developed by the Annotation Committee in collaboration with the Research Data Alliance, addresses FAIR metadata associated with genome annotations. Check out the project abstracts below and join the one that interests you most!

Project 3: Automatic workflow for benchmarking BUSCO genes for phylogenomics

Abstract

Phylogenomics is a central aspect of biodiversity genomics, as it reveals the relationships among organisms and key evolutionary processes such as introgression and gene flow. Genome-scale datasets are increasingly a reality in phylogenomics due to the availability of genomes for an ever-growing number of species. BUSCO datasets (universal single-copy orthologs) have become standard in assessing genome assembly completeness and are fully integrated into the pipelines of large genome consortia such as ERGA. Due to their low-copy nature, BUSCO genes are also increasingly used in phylogenomics, from genome skimming data to high-quality chromosome-scale genomes. Yet, their phylogenetic performance has not been thoroughly explored. Preliminary analyses show that BUSCO genes can recover robust phylogenetic relationships, but their single-copy nature is challenged: most BUSCO genes display varying levels of paralogy when using biodiverse species sets, and failure to account for this can negatively affect phylogenetic reconstruction.

This BioHackathon aims to build an automatic phylogenomics pipeline using the output of the BUSCO software. Contrary to existing pipelines, we aim to explicitly resolve paralogy events, thereby resulting in larger and more informative datasets. This pipeline will be used to benchmark the phylogenetic performance of the newly defined BUSCO lineage datasets, identifying not only the prevalence and evolutionary depth of the various paralogs but also resolving them for improved phylogenetic utility of BUSCO genes. This project will result in a fully-fledged FAIR-compliant phylogenomics pipeline based on BUSCO and an assessment of the phylogenetic performance of new BUSCO gene sets (version odb12).
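As a rough illustration of the first step such a pipeline might take, the sketch below tallies, for each BUSCO gene, how many species report it as Complete, Duplicated, Fragmented or Missing. It is not the project's actual method. It assumes tab-separated tables in the style of BUSCO's full_table.tsv (comment lines starting with "#", then the BUSCO id and its status in the first two columns); the function names, species labels and example rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def read_busco_statuses(handle):
    """Return {busco_id: status} from one full_table.tsv-style file (assumed layout)."""
    statuses = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header comments
        busco_id, status = row[0], row[1]
        # Duplicated genes appear on several rows, all with the same status.
        statuses[busco_id] = status
    return statuses


def summarise_paralogy(tables):
    """Map {species: {busco_id: status}} to {busco_id: Counter of statuses}."""
    summary = defaultdict(Counter)
    for statuses in tables.values():
        for busco_id, status in statuses.items():
            summary[busco_id][status] += 1
    return summary


def strictly_single_copy(summary, n_species):
    """BUSCO ids reported as Complete (single-copy) in every species."""
    return sorted(b for b, counts in summary.items() if counts["Complete"] == n_species)


# Hypothetical, hand-written example tables (not real BUSCO output).
EXAMPLE_A = (
    "# Busco id\tStatus\tSequence\n"
    "1at1\tComplete\tchr1\n"
    "2at1\tDuplicated\tchr1\n"
    "2at1\tDuplicated\tchr2\n"
    "3at1\tMissing\n"
)
EXAMPLE_B = (
    "# Busco id\tStatus\tSequence\n"
    "1at1\tComplete\tchr4\n"
    "2at1\tComplete\tchr2\n"
    "3at1\tComplete\tchr7\n"
)

tables = {
    "species_a": read_busco_statuses(io.StringIO(EXAMPLE_A)),
    "species_b": read_busco_statuses(io.StringIO(EXAMPLE_B)),
}
summary = summarise_paralogy(tables)
single_copy = strictly_single_copy(summary, n_species=len(tables))
print(single_copy)
```

Genes that fail the strict single-copy filter, like 2at1 in this toy example, are the ones a paralogy-aware pipeline would try to resolve rather than discard.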

Leads: Tereza Manousaki, Iker Irisarri, Tom Brown

Project 23: Streamlining FAIR Metadata for Biodiversity Genome Annotations

Abstract

Initiatives like the European Reference Genome Atlas (ERGA) and Australian Tree of Life (ATOL) comprise a scientific response to the current severe threats to biodiversity, generating thousands of reference genomes for species across the tree of life. Unfortunately, few solutions exist for structured reporting, quality assessment and persistent deployment of metadata pertaining to the annotation of functional and structural features along the assembled genomes.

ERGA is developing a format and repository for annotation reports to support collection of metadata for genome annotations in line with the existing ERGA Assembly Reports (EAR). ATOL is currently building a "Genome Engine" capable of producing automated genome notes and INSDC submissions for genomic data. Similar initiatives exist in the context of other biodiversity projects.

In parallel, the FAIRification of Genomic Annotations Working Group (FGA-WG) in the Research Data Alliance (RDA) has been developing a secondary, harmonised FAIR metadata model and infrastructure to improve discovery and reuse of publicly available genomic annotations/tracks, across biomedical and biodiversity fields.

This project brings together – for the first time – participants from two disjoint BioHackathon projects, BH2024 project #31 and BH2023 project #20, across ERGA, ATOL, FGA-WG, Ensembl, EBP-Nor and other initiatives, to form a "supergroup" to tackle the metadata challenges pertaining to biodiversity genome annotations!

For the 2025 BioHackathon, we will develop automated processing, validation and transformation of collected metadata consistently deployed across repositories for ERGA Annotation Reports and ATOL Genome Notes (primary metadata), and metadata harmonised according to the recommendations from the FGA-WG group (secondary metadata).

Leads: Sveinung Gundersen, Alice Dennis, Tiffanie Nelson

Check the full list of projects selected for this year’s BioHackathon Europe

Read about ERGA's participation in the BioHackathon Europe 2023.

#BioHackEU25