Efficient evidence-based genome annotation with EviAnn

luisamarins19
Sep 17, 2025

Updated: Sep 23, 2025

This month's ERGA BioGenome Analysis & Applications Seminar will feature a talk by Aleksey V. Zimin on EviAnn (Evidence-based Annotator), a novel evidence-based eukaryotic gene annotation system.

Tuesday, September 23rd 2025 - 15:00 CEST

YouTube link: https://www.youtube.com/live/n1usz4-mCXo


Abstracts

Efficient evidence-based genome annotation with EviAnn

For many years, machine learning-based ab initio gene finding approaches have been central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these approaches was originally sustained by the high cost and low availability of gene expression data, a primary source of evidence for gene annotation along with protein homology. However, innovations in modern sequencing technologies have revolutionized the acquisition of gene expression data, allowing scientists to rely more heavily on this class of evidence. In addition, proteins found in a multitude of well-annotated genomes represent another invaluable resource for gene annotation. Existing annotation packages often underutilize these data sources, which prompted us to develop EviAnn (Evidence-based Annotator), a novel evidence-based eukaryotic gene annotation system. EviAnn takes a strongly data-driven approach, building the exon-intron structure of genes from transcript alignments or protein-sequence homology rather than from purely ab initio gene finding techniques. We show that when provided with the same input data, EviAnn consistently outperforms current state-of-the-art packages including BRAKER3, MAKER2, and FINDER, while utilizing considerably less computer time. Annotation of a mammalian genome can be completed in less than an hour on a single multi-core server. EviAnn is freely available under an open-source license from https://github.com/alekseyzimin/EviAnn_release and from Bioconda as “eviann”.

Practical introduction to genome annotation with EviAnn

In the second part of the presentation, I will explain how to use EviAnn to annotate the genomes of small and large eukaryotes. I will show how to find and download protein evidence from NCBI and describe the inputs and outputs of EviAnn. For demonstration purposes, I will run an annotation of a small fungal genome.

Speaker

Dr. Aleksey V. Zimin

Research Scientist, Department of Biomedical Engineering

Johns Hopkins University, USA

I have been working in the field of bioinformatics since 2002, beginning with my collaborations with The Institute for Genomic Research (TIGR) and Celera Genomics. The main goals of my research are (i) developing algorithms and software for de novo genome assembly and annotation for latest-generation sequencing data and (ii) applying the software to produce high-quality annotated assemblies for the most challenging genomes. I lead the development of the open-source MaSuRCA genome assembly package, which can produce accurate, high-quality assemblies from sequencing data generated by Illumina, PacBio, and Oxford Nanopore instruments. To date, MaSuRCA has been used to assemble over 2600 eukaryotic genomes submitted to NCBI GenBank. I have played a leading role in producing assemblies for many challenging genome projects, including the 22 Gbp genome of loblolly pine (Pinus taeda), the 17 Gbp genome of bread wheat (Triticum aestivum), the 3 Gbp genome of Atlantic salmon (Salmo salar), and those of many other plants and animals. In recent years, I have been the lead author of several widely used bioinformatics software packages, such as the MUMmer4 sequence aligner, the POLCA and JASPER assembly polishers, and the SAMBA scaffolder. Most recently, my research has expanded to include transcriptome assembly, protein alignment, and genome annotation algorithms. My most recent work includes a novel automated genome annotation package called EviAnn, which sets a new standard for automated genome annotation software. The software packages that I develop and/or maintain are available under an open-source license from my GitHub page at https://github.com/alekseyzimin, and several are also available from Bioconda.