- Biodiversity Reference Genomes at ENA and the ERGA Data Portal
At this month's ERGA Plenary meeting, on Monday, November 17 at 15:00 CET, Joana Paupério and Alexey Sokolov will present on Biodiversity Reference Genomes at ENA (European Nucleotide Archive) and the ERGA Data Portal. More information is available below.
Abstract
Reference genomes produced under the European Reference Genome Atlas (ERGA) are being shared publicly through the European Nucleotide Archive (ENA) and are accessible through the ERGA Data Portal. The ENA is the European node of the International Nucleotide Sequence Database Collaboration (INSDC), which also includes the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ). These sequence repositories archive this reference data and collaborate with the community, providing resources for the management, sharing and dissemination of data to promote re-use. Here we will present the reference genome data structure at ENA and the service developments that support open and FAIR (Findable, Accessible, Interoperable and Reusable) sharing of genome data. These include developing, together with the community, metadata standards for reporting enriched source information, and setting up a data structure for increased accessibility and interoperability. Enhancements have also been made to support the upscaling of submissions to ENA, considering the diversity of taxa and genome characteristics. A new model for handling annotation is being developed at the ENA, decoupling annotations from the genome records themselves. Cross-references to other data types and search services were improved to facilitate reference genome findability and access, with rich metadata to support uptake in biodiversity research. The ERGA Data Portal (https://portal.erga-biodiversity.eu) provides a single open-access platform ensuring FAIR access to all genome data generated by ERGA.
It integrates data and metadata from major international repositories (BioSamples, ENA, Ensembl, BioImage Archive, and Wellcome Open Research), together with cross-references to GoaT, TolQC, and the NBN Atlas. Automated pipelines built with Apache Airflow and Apache Beam continuously harmonise and update the integrated dataset around each unique species taxon. The web portal, built with Angular and FastAPI, offers intuitive search and filtering tools, species-level detail pages, status tracking, and a publications browser. Programmatic access is available through an open API, enabling integration with external tools such as Ensembl dashboards, Jupyter notebooks, and institutional analysis workflows. Analytical and visual layers, powered by BigQuery and Python Dash, provide interactive phylogenetic, geospatial, and metadata dashboards for data exploration. Together, these components create a sustainable and extensible infrastructure supporting ERGA's mission to deliver high-quality genomic resources for European biodiversity and to promote open, data-driven research and conservation.
Speakers
Joana Paupério is a Biodiversity Project Manager at the European Nucleotide Archive (ENA, EMBL-EBI), where she is responsible for biodiversity data coordination. She works with the community to understand their needs and supports data structuring and submission to the sequence archives. She is involved in a number of projects and initiatives working towards FAIR biodiversity genomics data and infrastructure linking. Joana is also co-lead of the ELIXIR Biodiversity Community. Alexey Sokolov is a project lead at EBI, where he is responsible for building scalable, FAIR-compliant data platforms for life-science research. He has contributed to the development of modern genomic data portals and cloud-based analytics ecosystems supporting large international consortia.
His work centres on transforming complex, heterogeneous biological data into accessible, well-structured resources that enable researchers to generate new scientific insights. 🔔 To receive the Zoom link and join this and our upcoming plenary meetings, register as an ERGA member. ▶️ You can watch all previous ERGA Plenary talks here. If you would like to suggest a speaker or topic for a future plenary session, please contact us at training@erga-biodiversity.eu. We welcome your input!
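The abstract says the portal's pipelines harmonise data from several repositories "around each unique species taxon". As a rough illustration of that taxon-centric idea only, the plain-Python sketch below groups records from different sources under a shared taxon ID. It does not use Apache Airflow or Apache Beam, and the record fields (`source`, `tax_id`, `accession`) and example values are hypothetical, not the ERGA Data Portal's actual schema.

```python
from collections import defaultdict
from typing import Any

# Hypothetical flat record; the field names are illustrative assumptions,
# not the ERGA Data Portal schema.
Record = dict[str, Any]


def harmonise_by_taxon(records: list[Record]) -> dict[int, dict[str, list[str]]]:
    """Group records into one entry per taxon ID, listing accessions per source."""
    merged: defaultdict = defaultdict(lambda: defaultdict(list))
    for rec in records:
        merged[rec["tax_id"]][rec["source"]].append(rec["accession"])
    # Convert nested defaultdicts to plain dicts for a stable, serialisable result.
    return {tax_id: dict(by_source) for tax_id, by_source in merged.items()}


if __name__ == "__main__":
    # Hypothetical taxon IDs and accessions, for illustration only.
    example = [
        {"source": "BioSamples", "tax_id": 1001, "accession": "SAMPLE-1"},
        {"source": "ENA", "tax_id": 1001, "accession": "ASSEMBLY-1"},
        {"source": "ENA", "tax_id": 1002, "accession": "ASSEMBLY-2"},
    ]
    for tax_id, by_source in harmonise_by_taxon(example).items():
        print(tax_id, by_source)
```

A production pipeline would also need to reconcile conflicting metadata and track updates over time; the sketch only shows why a single taxon key makes records from different repositories easy to join.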
- Join the Taxon Sampling SOP Hackathon!
The ERGA Sampling & Sample Processing Committee invites everyone to join the online Taxon Sampling SOP (Standard Operating Procedure) "Hackathon" on Friday 5 December, 10:00-12:00 CET. We'll work in groups to make progress on taxon-specific instructions for sampling, processing and shipping raw material for genome sequencing. Please contact samples@erga-biodiversity.eu for the Zoom link to join in! 🔗 Add this event to your calendar!
- From Genomes to Impact: the Genomics for Biodiversity Conference 2025
The last few days of October marked a much-anticipated event: the Genomics for Biodiversity Conference, organized by BGE and ERGA. Hosted by the BIOPOLIS/CIBIO Association near Porto, Portugal, the conference was streamed live to a global audience, bringing together around 70 in-person participants and a similar number of remote attendees. Photo: in-person attendees, including most of the BGE case study representatives. The conference marked the conclusion of BGE's “Genome Applications” work package and aimed to demonstrate the many different ways genomic science is being used to help address real-world issues, such as biodiversity conservation and the development of the bioeconomy. Importantly, the goal was to showcase genomics-informed actions that are already having an impact, rather than potential applications of the data. The main highlight of the conference was the series of talks presenting projects (case studies) that have received support from BGE. In total, 27 case study leaders presented their results, outlined the methods used to achieve them, discussed the relevance of their genomic insights in the policy context, and described their efforts to share the newly generated knowledge with relevant stakeholders. The presentations sparked many interesting conversations that went beyond each project's scientific approach, addressing other important matters such as genomics outreach and how best to convert results into actionable knowledge. The programme also included an exciting line-up of four keynote talks addressing the links between biodiversity genomics, policy, and society. You can (re)watch the keynote talks here. On the final day of the conference, parallel sessions and open discussions focused on engaging stakeholders in biodiversity genomics research and on how the BGE case studies are influencing European conservation and bioeconomy policies.
The closing session brought together speakers from the wider biodiversity genomics community, who presented their work in engaging 5-minute flash talks: an opportunity to explore the landscape of research beyond the BGE project and Europe. Overall, the conference was a lively event and offered a great opportunity for networking and for strengthening connections between members of the European Reference Genome Atlas (ERGA) community. As the BGE case studies come to a close, stay tuned: many exciting publications and other outputs are on the way!
- Big data and small brown birds: how can whole-genome sequencing inform conservation of the threatened aquatic warbler?
An aquatic warbler from West Pomerania, where the population is currently being restored. Genetic monitoring will reveal whether it is the offspring of translocated individuals or an unrelated immigrant. Photo: Justyna Kubacka. The ERGA-BGE case study led by Dr Justyna Kubacka of the Museum and Institute of Zoology, Polish Academy of Sciences, is drawing to a close. You can read about the project here. In a nutshell, the study relied upon available DNA extracts to carry out whole-genome re-sequencing of, ultimately, 60 individuals of the aquatic warbler. This small brown songbird is a threatened habitat specialist breeding in central-European fens, a vanishing wetland habitat. The aquatic warbler went through a steep decline, especially within the past 150-200 years, caused by the destruction of its breeding sites. Aquatic warblers were sampled in two geographical populations with hardly any breeding habitat in between. The first was the Biebrza Valley in north-eastern Poland, a large, stable and well-connected population. It was sampled in 1997 and 2018, enabling temporal comparisons. The second, located about 600 km to the west as the crow flies, was West Pomerania in Poland, a moderate-sized, isolated and steeply declining population. There, samples were obtained in 1999, allowing a spatial comparison with Biebrza. With the whole-genome sequences at hand, thanks to ERGA-BGE funding, and supported by colleagues Dr Wiktor Kuśmirek and Dr Krystyna Nadachowska-Brzyska, Dr Kubacka set out to evaluate inbreeding, effective population size and genetic bottlenecks, as well as population structure between the two distant populations. Importantly, the team relied on the recently published, high-quality chromosome-level reference genome of the aquatic warbler. The project focused on two populations of aquatic warbler: Biebrza Valley marshes and Western Pomerania. Photos by Justyna Kubacka and Knyva. Results paint a worrisome picture.
Genomic diversity was low in both populations, comparable to that found in some passerines endemic to islands. The only good news was that no temporal trend was detected in Biebrza. The low diversity could compromise the ability of the aquatic warbler to adapt to changing conditions, such as increasing drought, the emergence of new diseases or a lower abundance of its preferred arthropod prey on the marsh, all of which are expected under global warming. This is especially concerning because, unlike in many other birds, only female aquatic warblers incubate the eggs and feed the young. With this pattern of parental care, raising young successfully under environmental change could be a challenge. Hence, depleted adaptive potential could compromise population growth more in the aquatic warbler than in species in which both parents raise their offspring. The team then looked at inbreeding with a powerful tool: runs of homozygosity (ROHs), stretches of the genome where the two copies, one inherited from each parent, are identical because both descend from a common ancestor. This kind of inbreeding results not only from mating between closely related individuals, but also from small population size, because a small population is more strongly affected by genetic drift (random loss of genetic diversity) than a large one. Long ROHs (above 1 million base pairs) indicate inbreeding within the most recent 10-50 generations (20-100 years in the aquatic warbler) and are a strong proxy for extinction risk. The pattern of long ROHs showed increased inbreeding in West Pomerania compared to Biebrza, and no temporal shift in the latter. This means that over the 20-100 years before sampling, West Pomerania had faced enhanced genetic drift and accelerated loss of genomic diversity. On top of this, average relatedness between individuals was clearly elevated in this population relative to Biebrza. Genomic diversity was similar across all the studied populations.
However, this amount of genomic variation is comparable to that of endemic island passerines with small populations. Recent inbreeding, identified with long runs of homozygosity (ROHs), was elevated in West Pomerania, indicating enhanced genetic drift in the 20-100 years before sampling. Aquatic warblers from West Pomerania were more closely related to one another than the Biebrza birds, pointing to a lack of immigration and low population size in the former. With the dense genomic data, the team was also able to do some detective work and reconstruct historical changes in effective population size. This parameter reflects the strength of genetic drift: the smaller the effective population size, the stronger the drift. The results were consistent with the decline in numbers and showed a recent genetic bottleneck around the industrial era in the mid-1800s, when wetland destruction accelerated. This result was especially clear for the Biebrza samples. For West Pomerania, the effective population size was much lower than in Biebrza over the examined period. A picture emerged of a founding event, followed by expansion in the mid-1800s and then a steep drop. Therefore, the genomic diversity of West Pomerania could have been depleted at least twice. Historical changes in effective population size revealed a clear genetic bottleneck in Biebrza around the industrial revolution, when wetland degradation accelerated. In West Pomerania, effective population size has been lower and genomic diversity appears to have gone through a bottleneck twice. Finally, the team checked whether the two populations bear signs of genetic divergence. Results indicated only very weak genetic differentiation and no clear evidence for two separate genetic populations. Apparently, either insufficient time has passed for genetic differences to accumulate between West Pomerania and Biebrza, or some weak connectivity had been maintained. Nevertheless, this connectivity was too little to stop genomic erosion in West Pomerania.
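The ROH-based inbreeding measure described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not the team's pipeline (the text does not say which software they used): it treats any stretch without a heterozygous site as a ROH, applies the 1 Mb threshold mentioned in the text, and estimates inbreeding as F_ROH, the fraction of the genome covered by long ROHs. Dedicated tools such as PLINK (`--homozyg`) or `bcftools roh` use scanning windows or hidden Markov models and tolerate genotyping errors and missing data, which this sketch does not. The chromosome and heterozygous positions in the example are hypothetical, and the 2-year generation time is only inferred from the text's "10-50 generations (20-100 years)".

```python
from typing import NamedTuple

# Inferred from the text: 10-50 generations correspond to ~20-100 years.
GENERATION_TIME_YEARS = 2


class Roh(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def call_rohs(chrom: str, het_positions: list[int], chrom_length: int,
              min_length: int = 1_000_000) -> list[Roh]:
    """Naive ROH caller: report every stretch free of heterozygous sites
    that is at least `min_length` bp long."""
    # Sentinels (-1 and chrom_length) let runs touch either chromosome end.
    bounds = [-1, *sorted(het_positions), chrom_length]
    rohs = []
    for left, right in zip(bounds, bounds[1:]):
        run = Roh(chrom, left + 1, right)  # strictly between two het sites
        if run.length >= min_length:
            rohs.append(run)
    return rohs


def f_roh(rohs: list[Roh], genome_length: int) -> float:
    """Inbreeding estimated as the fraction of the genome covered by ROHs."""
    return sum(r.length for r in rohs) / genome_length


def generations_to_years(generations: float) -> float:
    return generations * GENERATION_TIME_YEARS


if __name__ == "__main__":
    # Hypothetical 5 Mb chromosome with three heterozygous sites.
    rohs = call_rohs("chr1", [100_000, 2_500_000, 2_600_000], 5_000_000)
    print(rohs)
    print(f"F_ROH (ROHs >= 1 Mb): {f_roh(rohs, 5_000_000):.3f}")
    print(generations_to_years(10), generations_to_years(50))  # 20 100
```

Raising `min_length` restricts the estimate to longer, younger ROHs, which is the logic behind using long ROHs as a signal of recent inbreeding.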
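The link between effective population size and the loss of diversity can be made concrete with the textbook Wright-Fisher expectation for a diploid population, H_t = H_0 (1 - 1/(2 N_e))^t, where H is expected heterozygosity and t is the number of generations. The sketch below is an idealised model (constant N_e, no mutation, migration or selection), not the demographic reconstruction used in the study, and the N_e values are hypothetical.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Wright-Fisher expectation H_t = H_0 * (1 - 1/(2*Ne))**t for diploids."""
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


if __name__ == "__main__":
    # Hypothetical effective sizes for a small, isolated vs a large population,
    # over 50 generations (~100 years at 2 years per generation).
    for ne in (50, 1000):
        retained = expected_heterozygosity(1.0, ne, 50)
        print(f"Ne={ne}: {retained:.1%} of initial heterozygosity retained")
```

With these made-up numbers, the smaller population keeps roughly 60% of its initial heterozygosity after 50 generations, versus about 97.5% for the larger one. This illustrates why a persistently low effective population size, as inferred for West Pomerania, implies faster genomic erosion.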
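To illustrate what "very weak genetic differentiation" means, here is a minimal F_ST sketch for biallelic SNPs based on allele frequencies in two populations: F_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T the expected heterozygosity of the pooled populations, combined across loci as a ratio of sums. Values near 0 mean the populations are genetically similar; 1 means fixed differences. This simple estimator has no sample-size correction (unlike, for example, the Weir and Cockerham or Hudson estimators). The text does not say which measure the team used, and the frequencies below are hypothetical.

```python
def fst(freqs_a: list[float], freqs_b: list[float]) -> float:
    """Ratio-of-sums F_ST = sum(H_T - H_S) / sum(H_T) for two populations,
    given the frequency of one allele per biallelic locus."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same loci")
    numerator = denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean within-population expected heterozygosity (equal weights).
        h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Expected heterozygosity if the two populations were pooled.
        p_bar = (p_a + p_b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        numerator += h_t - h_s
        denominator += h_t
    # Loci monomorphic in both populations contribute nothing.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs in two populations.
    pop_1 = [0.30, 0.55, 0.10, 0.80]
    pop_2 = [0.32, 0.50, 0.12, 0.78]
    print(f"F_ST = {fst(pop_1, pop_2):.4f}")
```

Summing numerators and denominators across loci before dividing, rather than averaging per-locus ratios, keeps low-diversity loci from dominating the estimate.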
While the analysis relied on historical samples, it offers an important insight for present-day conservation efforts. A large-scale translocation project is being carried out to restore the nearly lost West Pomerania population. The results of the ERGA-BGE project indicate that genetic drift and inbreeding could have contributed to its swift decline. To restore this population through translocation without repeating its genomic history, it is crucial to improve the breeding habitat and extend its area. It is also fundamental to enlarge or create stepping-stone habitats between West Pomerania and the nearest large and stable populations, to restore gene exchange. The work could not have been performed without the collectors, Prof. Andrzej Dyrcz, Dr Benedikt Giessing, Dr Jarosław Krogulec and Grzegorz Kiljan. For a quarter of a century, the samples were curated by Prof. Michael Wink at Heidelberg University and by Dr Martin Päckert at the Senckenberg Natural History Collections in Dresden, Germany. The project received funding from the European Union under the European Union's Horizon Europe research and innovation programme, co-funded by the Swiss Government and the British Government. The bioinformatic analysis was conducted with the support of the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw, Poland (ICM UW).
About the Author
Dr Justyna Kubacka is an evolutionary ecologist and ornithologist who is constantly gaining skills in population genomics. She works at the Museum and Institute of Zoology, Polish Academy of Sciences, Warsaw. She belongs to the aquatic warbler genome team of the ERGA Pilot reference genome project. In her free time, she enjoys cross-country skiing and gardening.
- Genomic Connections #6 - The code of life: the world of bioinformatics
In this month's episode of the Genomic Connections podcast, Kasia and Christian talk with Rutger Vos and Diego de Panis about their experiences in bioinformatics and the central role this relatively new discipline plays in transforming vast amounts of data into biological insights. Rutger is a bioinformatician and head of the Data Competence Center at the Naturalis Biodiversity Center (Leiden, Netherlands). Diego is a biologist based at the Leibniz Institute for Zoo and Wildlife Research (Berlin, Germany) with more than 10 years of data-driven research experience, assembling sequences and connecting genes with phenotypes and biological processes. This episode comes in a different format than usual: it was scripted and produced as a video interview rather than as a podcast. You can watch the video here, and listen to the audio-only version here. 🎧 You can listen to Genomic Connections on Spotify and PocketCast. Check out this recent Connections post in which we discuss "What is bioinformatics?" 🔔 Follow the Genomic Connections Podcast on Spotify to make sure you never miss an episode! https://open.spotify.com/show/01aF7AUVF0PvydbxZADTvN?si=PFC5G62gRtCE2D14esbWnQ Do you have any suggestions about how we can improve the podcast, or biodiversity genomics topics you would like us to cover? Send us a message! media@erga-biodiversity.eu
- Celebrating 100+ BGE genomes!
In October 2025, Biodiversity Genomics Europe (BGE) reached a milestone: over one hundred genomes have been sequenced and publicly released as part of the project. More than simply collecting samples and sequencing genomes, BGE demonstrates that this can be achieved in a distributed manner, building capacity and contributing to the development of standardized workflows that can be scaled up across Europe. In this way, BGE plays a key role in supporting the ambitious goals of the European Reference Genome Atlas (ERGA), the European regional node of the Earth BioGenome Project. While many more BGE genomes are on the way, in the process of data generation or computational assembly, the release of the 100th high-quality reference genome marks an important milestone for the project.
Genomes across the tree of life
From its onset, BGE aimed to sequence broadly across the tree of life, allocating resources to groups of species that are often neglected in biodiversity research, the so-called “Dark Taxa”. Species were also prioritized for their relevance to agriculture, fisheries and key ecosystem processes, as were endemic and threatened species, pests, and disease vectors. Taxonomic distribution of the first 108 BGE genomes: representatives of 10 different eukaryotic phyla have already been sequenced! Species sequenced as of October 6th, 2025. About one third of the BGE genomes released so far are so-called “community genomes”. This means that they were produced with the direct support of the broader community of biodiversity researchers in Europe through open calls promoted by BGE: researchers had the chance to nominate species for sequencing and, if selected, to sample them and author the genome reports that characterize those genomes. Click here to read more about this “community-driven” approach to genome sequencing.
Another third of the genomes were sampled in targeted campaigns in some of Europe's most important biodiversity hotspots, in different areas of Greece, Spain, and Slovenia. These countries are exceptionally rich in biodiversity and were therefore considered priority areas, where genomic data will help generate knowledge about many species that are still poorly known to science. Finally, some of the genomes sequenced are connected to BGE's set of Case Studies: applied projects that seek to demonstrate the use of biodiversity genomics in different real-world settings. One of these projects, led by researchers at Estación Biológica de Doñana-CSIC in Seville, Spain, investigates the genomic basis of vectorial capacity and insecticide resistance in Culex mosquitoes using reference genomes and population genomics. The genomes of four Culex species were produced by BGE and analysis is underway; the new knowledge brought by this study will inform authorities and could shape genomics-informed public health policies. Click here to learn more about this case study.
Sequencing & Assembling - a continental effort
Following sampling, the subsequent steps of DNA extraction, sequencing and genome assembly are performed by six sequencing centers distributed across Europe. The “BGE Sequencing network” includes Genoscope, the Centro Nacional de Análisis Genómico (CNAG), the Wellcome Sanger Institute's Tree of Life Programme, SciLifeLab, the University of Florence and the University of Bari. These institutions vary in size, infrastructure, and areas of expertise, and by bringing them together, the BGE project has enabled valuable exchanges and technical developments that strengthen European capacity for genomic sequencing and research. The BGE network of sequencing centers: 6 partner centers distributed across Europe are responsible for producing and assembling the data for a diverse set of species, exchanging protocols and their experiences.
Open genomes - data available to all
All data produced under BGE are made openly available for anyone to use, and the genome assemblies are no exception. All the genomes produced so far, and those that will follow, are made available through the European Nucleotide Archive (ENA) and can be accessed and downloaded by anyone, anywhere in the world. Head to the ERGA Data Portal to browse the data produced by BGE and other ERGA-associated initiatives: https://portal.erga-biodiversity.eu/home To further increase the transparency and reproducibility of the reference resources produced, BGE is also producing Genome Reports: standardized technical publications describing all the methodological steps taken to produce a reference genome (sampling, sequencing, assembly and annotation). The reports are available in two dedicated collections: the “Genome Reports from the Biodiversity Genomics Europe Project” Collection hosted at Open Research Europe, an open access publishing venue for European Commission-funded research; and the “European Reference Genome Atlas (ERGA) Community Genome Reports” Collection hosted at Pensoft's RIO Journal, which also serves as an open platform for other ERGA-associated genomes beyond BGE.
Ongoing work and more genomes to come!
Reaching 100 genomes is an important milestone, but it's not the end. Many more genomes are currently in production, and the number of releases is expected to increase in the coming months as the BGE project approaches its conclusion. Stay tuned for more updates! Other important Sequencing Status Milestones Reached for ERGA-BGE Species. Explore the ERGA-BGE page on the Genomes on a Tree (GoaT) platform (https://goat.genomehubs.org/projects/ERGA-BGE) for an overview of all the assemblies being produced by BGE's genome stream.
🧬 Check the collection of ERGA-BGE genomes at the European Nucleotide Archive
🌍 Check the distribution of all genomes sequenced under the ERGA umbrella
📑 Check the collection of ERGA-BGE Genome Reports already available
Learn more: What is a reference genome and what does it mean to sequence it? What is the European Reference Genome Atlas? How are ERGA and BGE connected?
Photos by Georgiakakis, P., Frédéric Zuberer, Trichas, A., Elicio Tapia, Sébastien-Lavergne, Stephan Koblmüller, Alfredo Garcia, Schmidt Ocean Institute, Simon Vasut, Ricardo Jorge Lopes, Miquel Pontes, Helena Bilandžija, Thomas Daftsios, FP Lima, Mina Trikali, Jairo Patiño, Daniel Fernández, Pedro Oromí.
- Madeiran Insect Bioblitz: Genomes in the laurel – insects across an Atlantic island.
About the hotspot
Set in the Atlantic Ocean, Madeira's steep valleys, high heath, and pockets of ancient laurel forest shelter a remarkable variety of insects, many found nowhere else in the world. Several species are particularly vulnerable to extinction because of their small ranges, fragmented habitats, and the fast-changing environmental conditions of the islands. Add heat, habitat loss, and invasive species, and the pressure mounts. Building reliable genomic references helps us distinguish look-alike species, track new arrivals, and make smarter decisions about protecting what is unique about Madeira, whose Laurisilva forest is a UNESCO World Heritage site. Figure 1: Madeiran Landscapes. Photos by aqualuso, Jérémy Glineur, Daria Agafonova and Ad Thiry, obtained via Canva. Did you know? Madeira's Laurisilva forest alone shelters 500+ endemic invertebrate species, and roughly 1 in 5 of the ~3,000 recorded insect species are unique to the archipelago. Madeiran insects play key roles in pollination, decomposition, and maintaining forest health, yet many species remain poorly known to science.
About the activity
The Madeiran Insect Bioblitz (MIB) project engaged citizens in biodiversity genomics research, exploring the diversity of insects and generating reference genomes and barcodes. Two main activities were organized in Funchal. A Diptera workshop was hosted at the University of Madeira and brought together technicians, educators, and researchers involved in nature conservation. Led by the specialist Dr. Paula Riccardi, participants learned to identify fly families using specimens collected with Malaise traps at six sites. They sorted, labelled, and compared Diptera with a reference collection, gaining practical skills in taxonomy and specimen handling that are important for conservation. Photo Gallery: Diptera workshop. Next, the bioblitz moved to the Funchal Natural History Museum.
Open to the broader public, including amateur naturalists and local community members, this activity combined indoor exploration of the museum's insect collections with outdoor sampling in the butterfly garden. Participants learned about insect anatomy, sampling methods, and identification techniques, while also using the iNaturalist platform to document species. With nets, lenses, and a dose of curiosity, participants helped build a growing reference collection, sharpened their eyes for Madeira's insects, and strengthened local conservation know-how. Those hands-on moments (spotting, sorting, and seeing tiny worlds up close) sparked fresh questions. What else lives in our gardens? Which species are still waiting to be found? That curiosity now fuels ongoing monitoring and research, turning small discoveries into big wins for ecosystem health. We extend a big thank you to everyone who was involved in organizing these events and to all the participants! Pictures taken by Luena Soraya, Thais Coppen, Hugo Silva, Paula Riccardi, and Emily Hartop. This initiative was funded through Biodiversity Genomics Europe (BGE), a project funded by the European Union's Horizon Europe Framework Programme (Food, Bioeconomy, Natural Resources, Agriculture and Environment).
More Information:
Madeira Insects Bioblitz (MIB) website and activity reports: https://entomoteca.web.uma.pt/mib/
UNESCO World Heritage - Laurisilva of Madeira: https://whc.unesco.org/en/list/934/
Announcement of the bioblitz event in a local news portal: https://funchalnoticias.net/2024/09/13/bioblitz-de-insetos-no-museu-de-historia-natural-do-funchal/
Secretaria Regional do Ambiente (SRA), A floresta Laurissilva da Madeira – Património Natural, Funchal, 2004.
- Connection #8 - Bioinformatics: reassembling the book of life
The European Reference Genome Atlas (ERGA) and the European node of the International Barcode of Life (iBOL Europe), two international communities of scientists brought together under the Biodiversity Genomics Europe project, are joining forces for “Connections”, a series of blog posts that explore the fascinating world of biodiversity genomics and the intersection of their communities. In our previous posts, we compared DNA to a book: barcodes help us identify which book we are holding, while reference genomes enable us to read every page. But here is the twist: by the time DNA leaves the wet lab, it is as if the book has been run through a paper shredder. DNA extraction, library preparation, and DNA sequencing together break the long DNA sequence into millions of pieces (check Connections blog #3 for an overview of these steps of the genomic workflow). Bioinformatics is the art of turning that pile of shreds back into something we can read, search, and compare. It is the art that turns barcodes and reference genomes into usable knowledge. Figure 1: Informatics and advanced computing are necessary to analyse the huge amount of data generated for genomic research. Bioinformatics is the product of molecular biology meeting computing. Early bioinformatics gave us the first sequence alignment methods and substitution matrices, dynamic-programming alignment algorithms, searchable databases, and the first “find-it-fast” tools that supercharged homology searches. As sequencing scaled up, assembly algorithms emerged, followed by hybrid approaches for long-read platforms. Alongside the algorithms came a range of file formats (FASTA/FASTQ/BAM/CRAM/GFF/GTF), workflow engines, and the hard-won lesson that reproducibility matters more than quick fixes. For barcoding, the task is targeted: extract a standard marker (the book's “abstract”), check its quality, compare it against a trusted reference database, and report the best match together with a measure of confidence.
Think of well-indexed catalogues and fast look-ups, ideal for monitoring and quick assessments. For reference genomes, the task is editorial: correct sequencing errors, assemble the millions of pieces into chromosomes, phase haplotypes, scaffold and polish using multiple lines of evidence (long reads, linked reads, Hi-C, RNA-seq), and annotate genes and repeats. That finished “book” enables population genomics, local adaptation, and conservation genomics studies. Figure 2: Examples of some common bioinformatics tasks when working with genomic data from across the tree of life. Bioinformatics is the art that turns raw data into knowledge with useful applications for biodiversity. Modern analyses involve dozens of steps (quality checks, trimming, deduplication, mapping, variant calling, assembly, scaffolding, annotation), all wrapped in containers and workflows so that a colleague can re-run them on Tuesday and get the same answer. Good metadata is the structure that holds all the pages together: sample, permit, locality, preservation, instrument, kit, and version numbers (check this episode of the Genomic Connections Podcast to learn more about the importance of metadata). Without that structure, even the finest assembly becomes a vague curiosity.
A few field notes from the trenches
Everyone has a story of a 2 a.m. run that failed because a file was called final_FINAL_reallyFinal.fastq.gz. We have all been rescued by checksums, saved by containerised toolchains, and learned never to delete intermediate files before the MultiQC report is green. We name scripts after pets, we comment our code (eventually), and we celebrate the day a 500 GB BAM shrinks elegantly into a reproducible VCF.
Why does this matter to BGE?
For iBOL Europe, robust bioinformatics means clean barcode libraries, sound assignments, and credible trend analyses.
For ERGA, it means reference genomes that stand up to re-analysis and can power subsequent population, functional, and comparative genomics: the applications stakeholders care about (from conservation planning to bioeconomy uses). Bioinformatics is not an afterthought: it is a research field in its own right! It is the bridge from sequencer output to decisions. Treat pipelines as publishable methods, treat metadata as data, and treat your future self as a collaborator who deserves clarity. In the next post, we will show how these computational foundations are applied in practical settings, including monitoring, policy, and management, without losing sight of the overall context (or the pages).
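The "paper shredder" analogy in this post can be made concrete with a toy greedy overlap assembler: repeatedly merge the two fragments with the longest suffix-prefix overlap until no overlap of at least `min_overlap` characters remains. Greedy merging is chosen here purely for illustration. It is not how modern assemblers work (they build overlap, string or de Bruijn graphs and must cope with sequencing errors, repeats and two haplotypes), and it can mis-assemble repetitive text. The fragments in the example are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if it is shorter than `min_overlap`."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none is left."""
    # Drop duplicates (keeping order) and fragments contained in another one.
    unique = list(dict.fromkeys(fragments))
    contigs = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    shreds = ["BROWN_FOX", "THE_QUICK", "QUICK_BRO"]
    print(greedy_assemble(shreds))  # ['THE_QUICK_BROWN_FOX']
```

When fragments share no overlaps, the function returns several separate contigs instead of forcing a join, loosely mirroring how real assemblies stay fragmented where evidence is missing.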
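The barcoding task described in this post (compare a query marker against a trusted reference database and report the best match with a confidence measure) can be sketched with a textbook Needleman-Wunsch global alignment, itself an example of the dynamic programming mentioned above. This is a toy. The scoring values (match +1, mismatch -1, gap -2), the 97% identity cutoff and the reference sequences are illustrative assumptions. Real barcode identification relies on curated reference libraries and faster, more sophisticated search tools.

```python
from typing import Optional


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Needleman-Wunsch global alignment; returns identity (0-1) as
    matches / alignment columns, recovered by traceback."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, counting matches and columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 0.0


def best_match(query: str, references: dict[str, str],
               min_identity: float = 0.97) -> Optional[tuple[str, float]]:
    """Return (name, identity) of the closest reference, or None if the best
    identity falls below the (illustrative) threshold."""
    scored = [(name, global_identity(query, seq)) for name, seq in references.items()]
    if not scored:
        return None
    name, identity = max(scored, key=lambda item: item[1])
    return (name, identity) if identity >= min_identity else None


if __name__ == "__main__":
    # Hypothetical short reference sequences, for illustration only.
    refs = {"species_A": "ACGTTGCAAGGCTTAC", "species_B": "ACGATGCTAGGCATAC"}
    print(best_match("ACGTTGCAAGGCTTAC", refs))  # ('species_A', 1.0)
    print(best_match("TTTTTTTTTTTTTTTT", refs))  # None
```

Returning `None` below the threshold, rather than always naming the closest species, reflects the idea of reporting a match only "with confidence".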
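The "rescued by checksums" habit mentioned in this post can look like this in Python: stream a possibly very large file through an MD5 hash in fixed-size chunks and compare the digest with the value published alongside the file. MD5 is assumed here only because it is a common choice for sequence-file checksums; the same code works with `hashlib.sha256`. The file name in the example is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large FASTQ/BAM files
    never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Union[str, Path], expected: str) -> bool:
    """True if the file's MD5 matches `expected` (case- and whitespace-insensitive)."""
    return md5sum(path) == expected.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "reads.fastq"  # hypothetical file
        data = b"@read1\nACGT\n+\nIIII\n"
        reads.write_bytes(data)
        print(verify_md5(reads, hashlib.md5(data).hexdigest()))  # True
        print(verify_md5(reads, "0" * 32))  # False
```

Running such a check right after every transfer, and before deleting any source copy, is what turns a 2 a.m. failure into a quick re-download.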
- ERGA meets VGP in New York City
Last week, ERGA was present at the Vertebrate Genomes Project 2025 Conference, held at The Rockefeller University in New York City from September 30 to October 1. ERGA and VGP share a long-standing collaboration, working together on shared workflows for genome assembly and evaluation and collaborating under the Biodiversity Genomics Europe (BGE) project. So far, dozens of vertebrate genomes from species found across Europe have been sequenced under the ERGA umbrella, directly contributing to the VGP's goal of sequencing all ~70,000 living vertebrate species. Discussing vertebrate genomes in the heart of New York City. Tom Brown, coordinator of the ERGA IT & Infrastructure Committee, presented the work within the BGE project on FAIR (Findable, Accessible, Interoperable, and Reusable) genome assembly publishing and on establishing distributed models of genome generation across Europe. As the VGP and the Earth BioGenome Project (EBP) begin their Phase II expansions, attention must be given to fully FAIR reporting and publishing of all outputs, from sample to reference genome, coordinated across all nodes of the EBP. In New York, Tom presented ERGA's solutions for generating reports for all genomes produced as part of the BGE project and for sharing all bioinformatic workflows on WorkflowHub. Photos by Chul Lee.
Relevant links
VGP Website: https://vertebrategenomesproject.org/
Larivière, D., Abueg, L., Brajuka, N. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 42, 367–370 (2024). https://doi.org/10.1038/s41587-023-02100-3
- ERGA at the EMBO course in genome sequencing, assembly, curation, and downstream analyses
This September, the heart of Tuscany beat to the rhythm of genomics! From 8 to 12 September 2025, Florence, Italy, welcomed researchers from around the globe for the third edition of the EMBO Practical Course on Genome Sequencing, Assembly, Curation, and Downstream Analyses. This week-long event explored the entire workflow of reference genome projects, from sample preparation to sequencing, assembly, and annotation, and provided a key opportunity to showcase ERGA and how its collaborative infrastructures and shared standards support the generation of high-quality reference genomes across Europe.
EMBO course instructors and participants enjoying a moment together in front of Florence’s Basilica di Santa Croce.
The course was designed for action. Participants worked hands-on with real datasets (PacBio HiFi, Oxford Nanopore, Illumina Hi-C, and RNA-seq) to perform de novo assembly, scaffolding, haplotig purging, and genome annotation within the Galaxy platform, using the Training Infrastructure as a Service (TIaaS). Beyond technical exercises, the course offered insights into reproducible workflows, pangenome development, and AI-assisted functional annotation. ERGA workflows and tools drew particular interest, showing how standardized approaches can streamline genome projects and make high-quality genomic data widely accessible. Participants at all career stages left with new knowledge and skills to apply these cutting-edge practices in their own research. This edition of the EMBO course was organized by Claudio Ciofi (University of Florence, IT) and Giulio Formenti (The Rockefeller University, US; ERGA Council member for Italy).
The co-organizers and invited speakers included Aureliano Bombarely (IBMCP/CSIC, ES; ERGA Council member for Spain), Silvia Manrique (CSIC, ES), Jean-François Flot (Université libre de Bruxelles, BE), Nadège Guiglielmoni (University of Cologne, DE), Astrid Böhne (Museum Koenig Bonn, DE), Tom Brown (Leibniz Institute for Zoo and Wildlife Research, DE; Chair of the ERGA IT Committee), Kirsty McCaffrey (The Rockefeller University, US), Marco Sollitto (University of Florence, IT), Alice Mouton and Björn Grüning (University of Freiburg, DE), with assistance from Camilla Reginatto De Pierri (University of Florence, IT; Chair of the ERGA TKT Committee).
Text by Camilla Reginatto De Pierri, from the ERGA Training & Knowledge Transfer Committee.
- Efficient evidence-based genome annotation with EviAnn
This month's ERGA BioGenome Analysis & Applications Seminar will feature a talk by Aleksey V. Zimin about EviAnn (Evidence-based Annotator), a novel evidence-based eukaryotic gene annotation system.
Tuesday, September 23rd 2025, 15:00 CEST
YouTube link: https://www.youtube.com/live/n1usz4-mCXo
Abstracts
Efficient evidence-based genome annotation with EviAnn
For many years, machine learning-based ab initio gene finding approaches have been central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these approaches was originally sustained by the high cost and low availability of gene expression data, a primary source of evidence for gene annotation along with protein homology. However, innovations in modern sequencing technologies have revolutionized the acquisition of gene expression data, allowing scientists to rely more heavily on this class of evidence. In addition, proteins found in a multitude of well-annotated genomes represent another invaluable resource for gene annotation. Existing annotation packages often underutilize these data sources, which prompted us to develop EviAnn (Evidence-based Annotator), a novel evidence-based eukaryotic gene annotation system. EviAnn takes a strongly data-driven approach, building the exon-intron structure of genes from transcript alignments or protein-sequence homology rather than from purely ab initio gene finding techniques. We show that when provided with the same input data, EviAnn consistently outperforms current state-of-the-art packages, including BRAKER3, MAKER2, and FINDER, while using considerably less compute time. Annotation of a mammalian genome can be completed in less than an hour on a single multi-core server. EviAnn is freely available under an open-source license from https://github.com/alekseyzimin/EviAnn_release and from Bioconda as “eviann”.
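The abstract describes building the exon-intron structure of genes from transcript alignments rather than from ab initio prediction. As a rough, generic illustration of the very first step of that idea (this is not EviAnn's implementation, whose internals are not described here), the Python sketch below reads a single spliced alignment in SAM CIGAR notation, where the `N` operation marks skipped reference bases, and turns it into exon and intron coordinates. The function name `exons_and_introns` and the example alignment are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the reference

# One CIGAR operation: a length followed by an operation code (SAM specification).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def exons_and_introns(ref_start: int, cigar: str) -> Tuple[List[Interval], List[Interval]]:
    """Split one spliced transcript alignment into exon and intron intervals.

    Hypothetical helper for illustration only. ref_start is the 1-based
    leftmost reference position (the SAM POS field). An 'N' operation skips
    reference bases and is read here as an intron; the aligned blocks
    between introns form exons.
    """
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    exons: List[Interval] = []
    introns: List[Interval] = []
    pos = ref_start          # next reference position to be consumed
    block_start = ref_start  # start of the exon block currently being built
    for length_text, op in _CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=XD":
            # Aligned bases (or a small deletion) extend the current exon.
            pos += length
        elif op == "N":
            # Close the current exon, record the intron, start a new exon after it.
            if pos > block_start:
                exons.append((block_start, pos - 1))
            introns.append((pos, pos + length - 1))
            pos += length
            block_start = pos
        # I, S, H and P do not consume the reference, so pos is unchanged.
    if pos > block_start:
        exons.append((block_start, pos - 1))
    return exons, introns


if __name__ == "__main__":
    # Two aligned blocks separated by a 200-bp skipped region, plus a soft clip.
    print(exons_and_introns(1000, "50M200N30M10S"))
    # → ([(1000, 1049), (1250, 1279)], [(1050, 1249)])
```

An evidence-based annotation pipeline would typically go much further, for example merging many such alignments, reconciling them with protein homology, and identifying coding regions; this sketch only shows how an intron chain can be read off one alignment.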
Practical introduction to genome annotation with EviAnn
In the second part of the presentation, I will explain how to use EviAnn to annotate genomes of small and large eukaryotes. I will show how to find and download protein evidence from NCBI and describe the inputs and outputs of EviAnn. For demonstration purposes, I will run an annotation of a small fungal genome.
Speaker
Dr. Aleksey V. Zimin, Research Scientist, Department of Biomedical Engineering, Johns Hopkins University, USA
I have been working in the field of Bioinformatics since 2002, beginning with my collaborations with The Institute for Genomic Research (TIGR) and Celera Genomics. The main goals of my research are (i) developing algorithms and software for de novo genome assembly and annotation for the latest generation of sequencing data and (ii) applying the software to produce high-quality annotated assemblies for the most challenging genomes. I lead the development of the open-source MaSuRCA genome assembly package, which is currently able to produce accurate, high-quality assemblies from sequencing data produced by Illumina, PacBio, and Oxford Nanopore instruments. As of today, MaSuRCA has been used to assemble over 2600 eukaryotic genomes submitted to NCBI GenBank. I played a leading role in producing assemblies for many challenging genome projects, including the 22 Gbp genome of Loblolly pine (Pinus taeda), the 17 Gbp genome of bread wheat (Triticum aestivum), the 3 Gbp genome of Atlantic salmon (Salmo salar), and many other plants and animals. In recent years, I was the lead author of several widely used bioinformatics software packages, such as the MUMmer4 sequence aligner, the POLCA and JASPER assembly polishers, and the SAMBA scaffolder. Most recently, the focus of my research has encompassed transcriptome assembly, protein alignment, and genome annotation algorithms. My most recent work includes a novel automated genome annotation package called EviAnn, which sets a new standard for automated genome annotation software.
The software titles that I develop and/or maintain are available under an open-source license from my GitHub repository https://github.com/alekseyzimin, and several titles are also available from Bioconda.