Search Results

27 results found with an empty search

  • Our Partner Projects | ERGA

    OUR PARTNER PROJECTS. ERGA is the pan-European partner of the Earth Biogenome Project (EBP). Regional Partners: French Atlas of Marine Genomes (ATLASea); Earth Biogenome Project Norge (EBP-Nor); Swedish Earth BioGenome Project. Worldwide Partners:

  • CS - Citizen Science

    CS - Citizen Science (citizenscience@erga-biodiversity.eu). The Citizen Science and Outreach Committee aims to facilitate collaboration and communication between scientists, stakeholders and citizens to increase trust in the scientific process and ensure that genomics research reflects the needs and perspectives of the broader community. By engaging a broad range of stakeholders, including policy makers, non-governmental organisations, industry representatives and citizens, the Committee seeks to foster a fruitful multi-stakeholder dialogue and to support traditional community knowledge through citizen science and stakeholder perspectives. To achieve its goals, the Committee works to engage citizens and stakeholders in the research process and to promote public understanding of genomics through public events and other activities. In addition, the Committee advocates for policies that support genomics research and its translation into practical applications for the benefit of society. (V.1.0 02.05.2023) Chair: Elena Buzan. Coordinators: Christian de Guttry, Luísa Marins. Steering Committee: Jacob Höglund, Lino Ometto, Svein-Ole Mikalsen, Chiara Bortoluzzi. Welcome to the new members of the ERGA Executive Board! Press Releases Connections #9: How Biodiversity Genomics drives conservation impact ERGA News #32 - November 2025

  • Media & Communications

    Media & Communications (media@erga-biodiversity.eu). The Media & Communications Committee is responsible for communicating ERGA's goals, actions, and accomplishments internally and externally. Our committee produces newsletters, press releases and blog posts, manages the website, and maintains social media accounts. We are responsible for developing communication strategies, implementing plans for publicising ERGA events and activities, and ensuring that all relevant information is disseminated in a timely and accurate manner. It is our responsibility to raise awareness about ERGA both inside and outside the scientific community, in order to encourage more people to support and join our community and contribute to our mission. Coordinators: Christian de Guttry, Luísa Marins. Steering Committee: Alice Mouton, Jan Zwilling. Follow #ERGA! Stay connected: follow us on social media for updates and insights. 🌍 https://linktr.ee/erga_biodiversity #Genomes for #Biodiversity

  • Annotation Committee

    Annotation Committee (annotation@erga-biodiversity.eu). Annotations transform genomes into larger sources of knowledge and offer critical added value to genome assemblies. They serve as a direct link between the genome sequence and function and facilitate comparisons across taxa, both on a large and small scale. The ERGA Annotation Committee is composed of researchers and bioinformaticians who have experience in using computational methods to predict the structural and functional composition of whole genomes. We strive to understand and use the most advanced analytical methods for annotating genomes, to develop new methods for annotation and its evaluation, and to align with the standards set by the EBP. We are committed to ensuring that the best possible annotation pipelines are available to and used by the annotation community and new researchers entering this field. The committee meets regularly to review the progress of the annotation in ERGA genomes, discuss current issues and challenges, and propose changes to improve the annotation process. (V.1.0 16.05.2023) Chair: Alice Dennis. Coordinator: Christian de Guttry. Steering Committee: Aureliano Bombarely, Hugues Roest Crollius, Henrik Lantz, Fergal Martin, Florian Maumus. Committee Resources: Structural Annotation Guide; Pre-trained AUGUSTUS models. ERGA Knowledge Hub Pipelines: BRAKER3 Protein-Coding Annotation Pipeline.

  • FAQs | erga

    Frequently Asked Questions (FAQ) about ERGA

    What is ERGA?
    The European Reference Genome Atlas is a community of peers working to advance the generation of reference genomes for European biodiversity. ERGA members share a passion for biodiversity and see reference genomes as key resources that can boost our understanding of biodiversity and inform conservation strategies. Our community is made up of researchers with very diverse expertise and backgrounds working in the European continent or interested in European biodiversity. ERGA also represents the European node of the global Earth BioGenome Project, which has the goal of coordinating the generation of reference genomes for all of Earth’s biodiversity.

    What are ERGA’s main goals?
    ERGA’s Core Objectives are to:
    - Create and consolidate a collaborative and interdisciplinary network of scientists across Europe and associated countries to deliver reference genome sequences;
    - Connect relevant infrastructures across Europe following a distributed model for genome sequence generation and analysis that can increase dynamically;
    - Develop guidelines and best practices for state-of-the-art reference genome sequence generation, and disseminate them through training and knowledge transfer;
    - Connect BioGenome initiatives working on European species to each other and with ERGA’s own initiatives to maximise synergies.

    How can I get involved and contribute to ERGA?
    Firstly, please register as an ERGA member. Membership is free and will ensure you receive our monthly newsletter and information about upcoming events and meetings. Once you become a member, you will have easy access to ERGA meetings. Our monthly plenary meetings are a good starting point to get to know the community. If you are interested in, or need support with, a specific step of the genome generation process, you might want to interact with or even join one of the open ERGA committees. Each committee has its own way of operating and a monthly meeting slot. If you want to participate in any of the committees, just send an email to the committee’s address to be added to their communication channels and learn the best opportunities to contribute. If you have an ongoing genome project of any European eukaryotic species, you can associate it with ERGA as an ERGA Community Genome. Check this page for more information on this procedure.

    What are the benefits of joining ERGA?
    If you are a researcher working on biodiversity genomics, joining and following ERGA’s activities can bring many advantages, including:
    - Taking an active role in the generation of high-quality reference genomes for biodiversity conservation;
    - Networking: through our network you will be able to interact and collaborate with colleagues from all across Europe working on topics related to your research;
    - Getting support from the ERGA Committees: as a member, you have direct access to groups of specialists in all steps of the genome production workflow;
    - Going beyond science: besides producing reference genomes and connecting researchers, ERGA is also committed to reaching out beyond academia to disseminate the importance of biodiversity and the role of genomics;
    - From theory to practice: lead the application of genomics technologies to biodiversity research and conservation directly in the field.

    What is the policy of ERGA on data?
    Check our Open Data Policy. This covers key requirements and recommendations regarding the collection, processing, storage, and publishing of metadata and data related to the production of high-quality reference genomes. If you have questions or concerns about our data policy, please reach out to the IT & Infrastructure committee at itinfra@erga-biodiversity.eu.

    How can I connect with other members of ERGA in my country?
    To interact with the ERGA Community in your country, please contact your country’s Council representative through the email available here and ask about any local initiatives already in place and how to engage. If your country is not yet represented in the ERGA Council, we are happy to welcome new countries and hope to have representation from all European countries! Please refer to the Governance Document for more details on how to join the ERGA Council as a representative of your country.

    How can I get in touch if I have other questions?
    You can reach out to ERGA through many channels. Here are some ways to get in touch with us:
    - Email us at contact@erga-biodiversity.eu
    - Join the ERGA Keybase team and ask your question in one of the many channels (instructions for this are provided when you sign up to become a member)
    - Social media: follow us on X @erga_biodiv (previously Twitter), ERGA LinkedIn and Mastodon.

  • Library | ERGA

    ERGA Library (publications)
    - Biodiversity Genomics Research Practices Require Harmonising to Meet Stakeholder Needs in Conservation. Year: 2025. DOI/URL: https://onlinelibrary.wiley.com/doi/10.1111/mec.70001
    - A chromosome-level genome assembly of the European green toad (Bufotes viridis). Year: 2025. DOI/URL: https://doi.org/10.1093/g3journal/jkaf002
    - Chromosome-level reference genome assembly for the mountain hare (Lepus timidus). Year: 2025. DOI/URL: https://peercommunityjournal.org/articles/10.24072/pcjournal.514/
    - The genome sequence of the Violet Copper, Lycaena helle (Denis & Schiffermüller, 1775). Year: 2025. DOI/URL: https://doi.org/10.12688/f1000research.156485.1
    - Nuclear and mitochondrial genome assemblies for the endangered wood-decaying fungus Somion occarium. Year: 2025. DOI/URL: https://doi.org/10.1093/gbe/evaf003
    - Chromosome-scale genome assembly and de novo annotation of Alopecurus aequalis. Year: 2024. DOI/URL: https://doi.org/10.1038/s41597-024-04222-y
    - A Faroese perspective on decoding life for sustainable use of nature and protection of biodiversity. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00068-0
    - The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00054-6
    - Building a Portuguese coalition for biodiversity genomics. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00061-7
    - Contextualising samples: supporting reference genomes of European biodiversity through sample and associated metadata collection. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00053-7
    - First Chromosome-Level Genome Assembly of a Ribbon Worm from the Hoplonemertea Clade, Emplectonema gracile, and Its Structural Annotation. Year: 2024. DOI/URL: https://doi.org/10.1093/gbe/evae127 (Funded by the Research Council of Norway project “InvertOmics—phylogeny and evolution of lophotrochozoan invertebrates based on genomic data” (project number: 300587 to T.H.S.))
    - The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion. Year: 2024. DOI/URL: https://doi.org/10.1038/s41437-024-00720-2

  • OUR COMMUNITY | ERGA

    OUR COMMUNITY
    ERGA is a bottom-up initiative based on people, consisting of hundreds of scientists across the entire European continent and beyond. Research institutions, infrastructure facilities and partner genome projects will play an important role within ERGA. Finally, ERGA will focus on societal needs, particularly those related to biodiversity conservation. For that purpose, ERGA will engage governmental and non-governmental entities and will closely involve citizens in different actions.

    Executive Board (executive-board@erga-biodiversity.eu)
    - Robert Waterhouse, Chair
    - Ann Mc Cartney, Vice Chair
    - Olga Vinnere Pettersson, Vice Chair
    - Tyler Alioto, Scientific Officer
    - Kay Lucek, Partnership Officer
    - Stefaniya Kamenova, Dissemination Officer
    - Lada Lukić Bilela, Social Integration Officer
    - Camila Mazzoni, Funding Opportunities Officer
    - Jaakko Pohjoismäki, Genomic Outreach Officer

    Council of Country Representatives
    Contact the national representatives for more information on the ERGA community in your country!
    - Andorra (Andorra@erga-biodiversity.eu): Manel Niell

    Committees
    - SSP - Sampling & Sample Processing: samples@erga-biodiversity.eu
    - DAC - Data Analysis Committee: analysis@erga-biodiversity.eu
    - Media & Communications: media@erga-biodiversity.eu
    - Social Justice Committee: socialjustice@erga-biodiversity.eu
    - SAC - Sequencing and Assembly Committee: assembly@erga-biodiversity.eu
    - ITIC - IT & Infrastructure Committee: itinfra@erga-biodiversity.eu
    - CS - Citizen Science: citizenscience@erga-biodiversity.eu
    - Annotation Committee: annotation@erga-biodiversity.eu
    - ELSI - Ethical, Legal, and Social Issues: elsi@erga-biodiversity.eu
    - TKT - Training and Knowledge Transfer: training@erga-biodiversity.eu

    Our Partners
    ERGA is the pan-European partner of the Earth Biogenome Project (EBP). Partner categories: Affiliated Initiatives; Associated Partners.

    Pilot Project
    Pilot Project Committee Coordinators (pilot@erga-biodiversity.eu): Giulio Formenti, Alice Mouton, Ann Mc Cartney.

    Former Contributors
    SSP - Sampling & Sample Processing Committee: Astrid Böhne (Former Committee Chair)

  • Glossary | ERGA

    Glossary
    This page provides explanations about terms and acronyms often used within ERGA and in the context of Biodiversity Genomics.
    (Genome) annotation: The process of identifying the functions of different pieces of a genome. This includes genes that code for proteins and non-coding features (e.g. intron-exon structure of protein-coding genes, promoters, transposable elements). Typically performed using computational methods, followed by manual curation.
    (Genome) assembly: A genome assembly is a representation of an organism’s genome that is made using computer programs to turn (assemble) raw sequence data into longer, continuous sequences.
    (Genome) completeness: An estimate of how well a reference genome represents the complete sequence of the target organism. A complete genome should equal the haploid genome size of the target, but completeness may also be defined as the state in which ‘all chromosomes are gapless and have no runs of 10 or more ambiguous bases, there are no unplaced or unlocalized scaffolds, and all expected chromosomes are present’ (https://www.ncbi.nlm.nih.gov/assembly/). There are different approaches to estimating completeness, such as BUSCO and K-mer analysis.
    ABS: Access & Benefit Sharing.
    BGE: Biodiversity Genomics Europe. The BGE Project has received funding through a Horizon Europe call on Biodiversity and Ecosystem Services. The overarching BGE project includes two streams of genomic research, reference genomes and barcoding, in an effort to establish ERGA and BIOSCAN as the European nodes of the Earth Biogenome Project and of the International Barcode of Life (IBOL), respectively.
    BUSCO: A bioinformatic method (Benchmarking Universal Single-Copy Orthologues) used to estimate the completeness of the coding fraction of an organism’s genome based on the proportion of (lineage-specific) single-copy orthologous genes that are found in a genome assembly.
    Biodiversity genomics: The application of genomic methods to research biodiversity.
    CARE Principles: The CARE principles for Indigenous data governance (https://www.gida-global.org/care) provide a governance framework that supports the recognition of Indigenous Peoples’ rights and interests in their physical and digital data as well as their Indigenous Knowledges.
    CBD: Convention on Biological Diversity.
    COPO: The Collaborative OPen Omics (COPO) platform is for researchers to publish their research assets, providing metadata annotation and deposition capability. It allows researchers to describe their datasets according to community standards and broker the submission of such data to appropriate repositories whilst tracking the resulting accessions/identifiers. Learn more about COPO in this article by the Earlham Institute.
    CS: Citizen Science Committee.
    Chromosome-level assembly: The process of generating a contiguous sequence of all chromosomes of a genome, often aided by genetic maps or proximity ligation techniques (3C-seq, Hi-C); the term is also used to refer to the resulting genome sequence.
    Council meetings: During the monthly ERGA council meetings, the representatives of countries and other genome projects associated with ERGA meet to discuss and vote on important matters related to ERGA’s governance and actions. The council is the main decision-making body of the consortium. Learn more about ERGA's structure in our Governance Document.
    DAC: Data Analysis Committee.
    DSI: Digital Sequence Information. Learn more: https://www.cbd.int/dsi-gr/
    DToL: The Darwin Tree of Life Project aims to sequence the genomes of 70,000 species of eukaryotic organisms in Britain and Ireland.
    EBP: The Earth BioGenome Project.
    EBP Genome assembly quality standard 6.C.Q40: Minimum reference standard of 6.C.Q40, i.e. megabase N50 contig continuity and chromosomal-scale N50 scaffolding, with less than 1/10,000 error rate. For species with chromosome N50 smaller than a megabase this will be C.C.Q40. Additional recommendations include K-mer completeness >90%, BUSCO complete single-copy >90%, BUSCO complete duplicated <5%, and Gaps/Gbp <1000.
    EC: European Commission.
    ELSI: Ethical, Legal, and Social Issues (Committee).
    ENA: The European Nucleotide Archive (https://www.ebi.ac.uk/ena) is a global repository for sequence data and provides resources that support management and access to sequence data.
    ERGA: European Reference Genome Atlas.
    ERGA Plenary: Our plenary meetings are open to all registered ERGA members and generally include short updates given by committee chairs and one invited talk on various themes connected to biodiversity genomics (watch the previous ones here).
    ERGANews: ERGA’s monthly newsletter, which includes important updates about the consortium, each of the committees and associated projects. Our newsletters are usually published on the first Tuesday of each month. All editions of the newsletter are stored here.
    Equity Deserving: According to the Canadian Council (https://canadacouncil.ca/glossary/equity-seeking-groups), equity-deserving groups are those individual researchers, communities, Peoples, regions or countries that have identified barriers to equal access, opportunities, and resources due to disadvantage and/or discrimination and that are actively seeking, and deserving of, social justice and reparation. The discrimination experienced could be caused by attitudinal, historic, social, and environmental barriers based on a range of characteristics including (but not limited to) sex, age, ethnicity, disability, economic status, gender, gender expression, nationality, race, sexual orientation, and creed.
    FAIR Principles: A set of principles to guide appropriate management and curation of scientific data (https://www.go-fair.org/fair-principles/) that emphasise data accessibility and use by ensuring that data are Findable, Accessible, Interoperable, and Reusable. Due to the increasing amount of scientific data being deposited, FAIR guidelines promote a data format that is amenable to automated computational access of data by stakeholders.
    Galaxy: Galaxy is an open source, web-based platform for data-intensive biomedical research.
    Genome Report: A genome report is a technical publication that describes all the steps taken to produce a reference genome: sampling, sequencing, assembling, annotating. They often have a standardised format and structure that allows readers to quickly and easily understand the quality of the genome and how it was generated.
    GoaT: Genomes On A Tree.
    HE: Horizon Europe; sometimes refers to the BGE project funded under HE.
    HSM: Hierarchical Storage Management is both a data management and data storage technique which transparently manages the movement of data between the different layers of a tiered storage based on file size thresholds, usage and I/O pressure. Usually, a tiered storage is composed of one or more layers of disk arrays, ordered by capacity, latency, redundancy and storage cost. A slow but economically effective archival layer is at the bottom, composed of magnetic tape libraries and automated tape robots, with the highest capacity and latency. The movement between layers is automatically triggered.
    Haplotype: A haplotype refers to the collection of genetic material within an organism that is inherited together. Haplotype may be used to describe a few loci or any number of chromosomes (a chromosome-scale haplotype).
    Hi-C: Sequencing-based method used to study three-dimensional interactions among chromatin regions by measuring the frequency of contact between pairs of loci. Since contact frequency is related to the distance between a pair of loci, Hi-C linking information is used to help with scaffolding stages during a genome assembly process.
    Hi-C map / graph production: The occurrence and frequency of Hi-C contacts are analysed and used in assembly scaffolding. They are typically visualised in Hi-C 2D heatmaps with the full genome sequence on the X and Y axes and a mark for each observed contact.
    HiFi reads: HiFi (High Fidelity) PacBio reads are produced by taking multiple sequences of the same molecule to provide a consensus sequence that is usually 12-20 kbp long and has a low error rate (>99.9% consensus accuracy).
    INSDC: The International Nucleotide Sequence Database Collaboration (https://www.insdc.org/) is an initiative between the DDBJ, EMBL-EBI and NCBI that together act as a global repository of sequence data and associated metadata, and provide tools and services that allow access to genomic resources.
    ITIC: IT & Infrastructure Committee.
    IsoSeq: A sequencing protocol developed by PacBio that aims to sequence full-length transcripts using the accurate, long-read capabilities of PacBio HiFi technology. IsoSeq data facilitate analysis of transcriptomes and genome annotation by identifying full-length isoforms of transcripts.
    JEDI / DEIJ: Justice, Equity, Diversity, and Inclusion Subcommittee.
    K-mer: A K-mer is a DNA sequence of length k; for example, the sequence AGCT contains the 3-mers (K-mers of length 3) AGC and GCT.
    Library: DNA, cDNA, or RNA that has been prepared for NGS within (usually) a specific size range and containing adapters, which are designed to be appropriate for (a) specific sequencing platform(s).
    M&C: Media & Communications Committee.
    Metadata: A collection of data that provides contextual information about multiple characteristics of other, corresponding original data.
    ONT: Oxford Nanopore Technologies (ONT; https://nanoporetech.com/) is a next generation sequencing technology whereby sequence data are generated from the changes in current that occur as single-stranded DNA or RNA molecules pass through nanoscale protein pores (nanopores). ONT provides long read data (up to several megabases) that facilitate genome assembly.
    Omni-C: Modified version of Hi-C that uses a sequence-independent endonuclease during its protocol to produce more even sequence coverage, increasing overall resolution.
    Open data: Open data are freely accessible and unrestricted data that can be accessed, used, reused and shared with third parties for any purpose.
    PUID: A permanent unique identifier is a unique label for an object that does not change, such as the Digital Object Identifier (DOI) attached to a scientific publication.
    PacBio: Pacific Biosciences (PacBio; https://www.pacb.com/) is a single-molecule, real-time (SMRT) next generation sequencing technology in which sequence data are generated by fluorescent light emission that occurs when a DNA polymerase adds nucleotides. PacBio produces long read data (tens of kilobases) that facilitate genome assembly.
    RNA-Seq: RNA-Seq is a technique that determines the complete or partial RNA sequence using NGS. The RNA expression profiles vary in different tissues of the same organism and can be influenced by physiopathological circumstances. RNA-Seq data facilitate genome assembly by providing empirical evidence for annotation of transcribed regions.
    Reference genome: An accepted standard representation of an organism’s DNA sequence. High-quality reference genomes typically have high completeness (chromosome-level with few gaps in sequence), few errors, and are annotated and accessible. A reference genome serves as a tool for alignment-based analyses, such as variant calling or RNA-Seq, and has many other applications, for example, phylogenetics and evolutionary relationships, identification of genes and variants, functional analysis and comparative genomics. Reference genomes referred to as “drafts” are those that are under active construction and refinement, and not yet finalised through manual curation.
    SAC: Sequencing and Assembly Committee.
    SOP: A standard operating procedure (SOP) is a document that provides detailed instructions on how to perform an activity, outlining the step-by-step process required for its execution.
    SRA: Sequence Read Archive.
    SSP: Sampling & Sample Processing (Committee).
    TKT: Training & Knowledge Transfer Committee.
    References
    - The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics (Glossary). bioRxiv 2023.09.25.559365; doi: https://doi.org/10.1101/2023.09.25.559365
    - How genomics can help biodiversity conservation; doi: https://doi.org/10.1016/j.tig.2023.01.005
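    Three of the quantitative terms in the glossary above (K-mer, the N50 used in the EBP quality standard, and the Q40 error-rate threshold) can be illustrated with a short Python sketch. This is an illustration only, not an ERGA or EBP tool: the function names are invented for this example, N50 is computed with the common definition (the length of the shortest sequence among the longest sequences that together cover at least half of the assembly), and the quality value assumes the standard Phred relation Q = -10 * log10(error rate).

```python
import math


def kmers(sequence, k):
    """Return every K-mer (substring of length k) in the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("at least one sequence length is required")


def phred_qv(error_rate):
    """Convert a per-base error rate into a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(kmers("AGCT", 3))               # the glossary example: AGC and GCT
    print(n50([2, 3, 4, 5, 6]))           # toy lengths, not real data
    print(round(phred_qv(1 / 10_000), 1)) # 1 error in 10,000 bases is Q40
```

    Under these definitions, the "6" in 6.C.Q40 corresponds to a contig N50 of at least 10^6 bases, and Q40 to the "less than 1/10,000 error rate" stated in the glossary entry.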

  • Annotation_guide | ERGA

    Structural annotation - So you want to annotate protein-coding genes in your genome?
    Version 1.0 - August 2023
    Contents: 1. Before you start; 2. Do you want to do your own annotation?; 3. Evaluate your Annotation; 4. Finalise your Annotation
    Authors: Alice Dennis, Jèssica Gómez, Leanne Haggerty, Lucile Soler, Aureliano Bombarely, Henrik Lantz, Florian Maumus, Hugues Roest Crollius, Fergal Martin, Jean-Marc Aury, Christian de Guttry, Robert Waterhouse, and the ERGA Annotation committee.

    STEP 1 - Before you start
    Step 1a: Be sure the assembly is done and you are working with a frozen/stable version!
    Table 1: Genome assembly evaluation before annotation. Rationale: low consensus accuracy, incomplete genomes, and contaminations lead to poor annotation. It is thus essential to evaluate your genome before you start the annotation process.
      1. Consensus accuracy and assembly completeness evaluated (suggestion: Merqury)
      2. Gene space completeness evaluated by:
         a. Conserved gene space (suggestion: BUSCO)
         b. RNA-Seq mapping (suggestion: STAR/Minimap2)
      3. Organelle/contamination screening and removal:
         a. Organelle genomes (suggestion: Minimap2)
         b. Contaminations (suggestion: BlobTools)
      4. Uncollapsed duplication for the consensus haploid assembly (suggestion: purge_dups)
      5. Full TE completeness evaluated with the LAI (suggestion: LTR_retriever)
      6. Does your genome meet EBP standards?
    Step 1b: Is your assembly done? If yes, go to Step 1c; if no, go to Table 1.
    Step 1c: Is the assembly publicly available? Public release is necessary for annotation by Ensembl. If yes, go to Step 1d; if no, go to Step 2.
    Step 1d: Is the public assembly linked to ERGA? If yes, go to Step 1f. If no, go to Step 1e.
    Step 1f: This will make the assembly available for annotation at Ensembl Rapid Release, provided that relevant transcriptomic data are also publicly available (ENA).
    Step 1e: Instructions on how to link your project to ERGA.

    STEP 2 - Do you want to do your own annotation?
    Step 2a: Do you want to do your own annotation? If yes, go to Step 2b.
    Step 2b: Gather all available evidence data: transcriptomic and protein datasets to support the annotation process.
    Table 2: Evaluation of your evidence data. The accuracy of the genome annotation process is very sensitive to the amount and quality of your evidence data.
      1. RNA-Seq transcriptomic data
         a. Mapping evaluation (suggestion: STAR)
         b. Transcript models (suggestion: StringTie)
         c. Gene space completeness (suggestion: BUSCO)
      2. Protein dataset
         a. Gene space completeness (suggestion: BUSCO)
         b. Percentage of full protein alignments (suggestion: Spaln)
      3. IsoSeq transcriptomic data
         a. Mapping evaluation (suggestion: Minimap2)
         b. Transcript models (suggestion: StringTie)
         c. Gene space completeness (suggestion: BUSCO)
    Done? If yes, go to Step 2c.
    Step 2c: Repeat prediction. ERGA recommends: RepeatModeler2, RepeatMasker, ProtExcluder, TEclass, PASTEC, TEdenovo. Done? If yes, go to Step 2d.
    Step 2d: Ab initio training and prediction. ERGA recommends: AUGUSTUS and GeneMark-ET/EP/ETP.
    Step 2e: Gene modelling. ERGA recommends: TSEBRA (BRAKER-based predictions), EVidenceModeler, and MAKER. Done? For an evaluation of your evidence data go back to Table 2. Once done, you are ready for the final quality and contamination check and you can go to Step 3a.

    STEP 3 - Evaluate your annotation
    Step 3a: Evaluate your annotation. There is no temporal order for the following suggestions:
    Step 3b: MAKER eAED scores.
    Step 3c: Gene family analysis.
    Step 3d: Genome visualization: 1. IGV; 2. Apollo (manual curation); 3. EasyGB (JBrowse for a simple dataset).
    Step 3e: Generate basic gene model summary statistics and compare with related species.
    Step 3f: BUSCO, visual inspection in browser in context with evidence.
    Step 3g: Use mapped reads to estimate: 1. How many apparently transcribed regions don't have annotation? 2. How many genes or exons are supported by read data?
    Step 3h: Compare gene content to related species with a similar annotation approach.
    Are you happy with the assessment of each of the metrics for which the annotation has been evaluated? Remember that some of them may depend on the phylum. This is your DIY annotation v1. Again, this is a stopping place. Do not go forward until this is complete. If you are happy with the metric assessment, you can move to Step 4.

    STEP 4 - Finalise your annotation
    Step 4a: Create proper file formats (ENA GFF3 format recommendations). Consider changing the identifiers produced by the different gene annotation tools (e.g., gene-1) to a more meaningful identifier (SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY).
    Step 4b: Provide this annotation to Ensembl as a second track (via GFF3 submission to ENA) and go back to Step 1f.
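    The identifier scheme suggested in Step 4a (SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY) could be applied along the following lines. This is a minimal sketch under stated assumptions, not an official ERGA or ENA tool: it assumes XXX is a zero-padded three-digit sequence number, YYYYYY a zero-padded six-digit gene counter that restarts on each sequence, and that the parts are joined without separators. The function names and the species code "LepTim" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Sequence types named in the Step 4a pattern.
VALID_SEQ_TYPES = {"Chr", "Scf", "Ctg"}


def make_gene_id(species_code, assembly_version, seq_type, seq_number, gene_number):
    """Build one identifier following the assumed reading of the Step 4a pattern."""
    if seq_type not in VALID_SEQ_TYPES:
        raise ValueError(f"seq_type must be one of {sorted(VALID_SEQ_TYPES)}")
    return (
        f"{species_code}{assembly_version}"
        f"{seq_type}{seq_number:03d}"  # XXX: assumed 3-digit sequence number
        f"G{gene_number:06d}"          # YYYYYY: assumed 6-digit gene counter
    )


def rename_genes(genes, species_code, assembly_version):
    """Map tool-generated gene IDs to new identifiers.

    `genes` is an iterable of (old_id, seq_type, seq_number) tuples, expected
    in genome order so that counters increase along each sequence.
    """
    counters = defaultdict(int)  # one gene counter per sequence
    mapping = {}
    for old_id, seq_type, seq_number in genes:
        counters[(seq_type, seq_number)] += 1
        mapping[old_id] = make_gene_id(
            species_code, assembly_version, seq_type, seq_number,
            counters[(seq_type, seq_number)],
        )
    return mapping


if __name__ == "__main__":
    # Hypothetical input: two genes on chromosome 1, one on scaffold 12.
    example = [("gene-1", "Chr", 1), ("gene-2", "Chr", 1), ("gene-3", "Scf", 12)]
    for old, new in rename_genes(example, "LepTim", "1").items():
        print(old, "->", new)
```

    Keeping the old-to-new mapping as a dictionary makes it straightforward to rewrite the ID and Parent attributes of a GFF3 file consistently in a later step.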

  • SUPPORT | ERGA

    ERGA Support Request

  • ERGA Community genomes (beta) | ERGA

    Are you embarking on a reference genome project? Do you want to learn about the steps required for success? Then join the growing family of ERGA Community Genomes! ERGA aims to coordinate the production of high-quality annotated genome assemblies that represent eukaryotic biodiversity in Europe. A key part of this is building capacity across European researchers and institutes by supporting the growing community of scientists in biodiversity genomics through the provision of guidelines, workflows, and best practices that explain and greatly facilitate the successful execution of the many steps required along the complex workflow for reference genome generation. The guidelines below cover many of the main steps along the genome generation workflow, providing step-by-step advice and answers to frequently asked questions to help researchers navigate the complexities and find out where to turn for additional assistance: Menu 1. Pre sampling 2. Sample Acquisition Strategy 3. DNA/RNA extraction 4. Libraries preparation 5. DNA sequencing data 6. RNA sequencing data 7. Assembly completed 8. Annotation completed 9. Downstream analysis PLEASE NOTE: This is a work in progress . The initial beta version of these guidelines has been developed with input from the ERGA Committee Coordinators and its continued development and further elaboration are still ongoing and will include all ERGA Ccommittees. Contributors: Tom Brown, Diego de Panis, Joao Pimenta, Christian de Guttry, Ann Mc Cartney, Rita Monteiro, Javier Palma, Luisa Marins, Astrid Böhne, Robert Waterhouse, Camila Mazzoni. 1. Pre sampling Letter of Support Do you wish to indicate in your grant proposal that you are knowledgeable about where and how to find support for the genome generation pipeline? Considering the difficulty of obtaining funding for research in areas where you have no prior experience, your application can be supported by a letter of support from our chairs. 
    If you would like this type of assistance, please indicate so on this form.

    Your grant's genomic section
    I. Do you need assistance in preparing a complete, convincing, and coherent grant application with realistic budget estimates for your project? It can be challenging to prepare your first grant in this area of research, and ERGA offers its hub of knowledge to assist you on your first journey into the world of reference genomes. An online meeting can be arranged where you can benefit from the experience of researchers who have already been through this process several times.
    II. Has the grant already been written, but you are unsure about the content of the reference genome generation section? The expertise of ERGA's Committees can assist you. Upon request, we will conduct a brief review of the genomic section of your grant proposal to help ensure it is of a high standard.

    Check the current status of the reference genome for the species you wish to sequence
    Is another research group already producing the reference genome for your species? On the following portals, you can check whether anyone has already produced, or is currently producing, the reference genome for your species:
    ENA: The European Nucleotide Archive (ENA) operates as a public archive for nucleotide sequence data. By bringing together databases for raw sequence data, assembly information and functional annotation, the ENA provides a comprehensive and integrated resource for this fundamental source of biological information.
    ERGA data portal: This portal allows you to see species for which high-quality reference genomes are being produced, or have already been produced, by ERGA-affiliated projects.
    GoaT: Genomes on a Tree presents genome-relevant metadata for all eukaryotic taxa across the tree of life. Metadata in GoaT include genome assembly attributes, genome sizes, C values, and chromosome numbers from multiple sources.
    GoaT also collects information from various BioGenome projects about the species they plan to sequence and/or have already started sequencing.
    Other BioGenome consortia: Darwin Tree of Life (DToL), Catalan Initiative for the Earth BioGenome Project (CBP), the Earth BioGenome Project (EBP), the Vertebrate Genomes Project (VGP).

    2. Sample acquisition strategy

    Are you planning your field work and would like to know where to start? Here are some key considerations to help you get started:

    a. Permits
    Sample collection should comply with local and EU regulations. Before collecting, make sure you have all the permits required for collecting, exporting, and sequencing the species, including open-access data deposition.

    i. ABS - Nagoya: The Nagoya Protocol is an international agreement that was adopted in 2010 as a supplementary agreement to the Convention on Biological Diversity. The Nagoya Protocol sets out rules and procedures regarding access to genetic resources and the sharing of the benefits derived from their use. It also provides guidance on how to ensure the fair and equitable sharing of the benefits arising from the utilisation of traditional knowledge associated with genetic resources. The Protocol has been ratified by over 100 countries, and has been widely praised as an effective tool for promoting the conservation and sustainable use of biological diversity. To proceed, researchers should first verify whether their country is a party to the Protocol and then contact the designated focal point, outlining their intention to sequence the genome and release the data, and requesting the necessary permit.

    ii. Sample collection permit: It grants permission to the holder to collect wildlife samples, with the understanding that all samples are used for scientific research and educational purposes only.
    The holder of this permit must abide by all applicable international, national, and local legislation in the collection of wildlife samples. No sample may be collected without prior approval from the relevant authority. Some general rules are listed below, with more detailed information presented in the ERGA Sampling Code of Conduct:
    1. Ensure that all samples collected are properly labelled and documented.
    2. Protect the samples adequately while in transport.
    3. Maintain records of all samples collected.
    4. Obtain authorisation to transport the samples to the appropriate research facility.
    5. Dispose of all unwanted samples in a manner that is respectful to the environment and in compliance with all applicable laws.
    6. Provide a copy of the collected samples to the appropriate authorities upon request.
    7. Abide by the conditions of the permit.

    b. Traditional Knowledge and Biocultural labels
    It is important to determine whether an indigenous population or a local community is involved in the project, or whether the species is of special concern to them. In that case, a label should be requested. The Labels allow communities to express local and specific conditions for sharing and engaging in future research and relationships in ways that are consistent with already existing community rules, governance and protocols for using, sharing and circulating knowledge and data. The primary objectives are to enhance and legitimise locally based decision-making and Indigenous governance frameworks for determining ownership, access, and culturally appropriate conditions for sharing historical, contemporary, and future collections of cultural heritage and Indigenous data. For more information check the ‘Local Contexts’ website.

    c. Sample Collection
    What sampling procedures do you follow in the field? For the generation of reference genomes, the ideal preservation method is liquid nitrogen.
    Many biological materials can be stored in liquid nitrogen, including cells, tissue samples, and entire individuals. Because liquid nitrogen freezes samples rapidly, it allows researchers to store them for long periods and minimises DNA/RNA degradation. It is essential to collect and process samples following the requirements specified by the sequencing facility. Species that can be kept alive can be transported to a lab for processing, with the material flash-frozen immediately after dissection to prevent DNA and RNA degradation. The sample should be dissected on a plate placed on ice, to keep it cold, and then flash-frozen in liquid nitrogen.

    d. Taxonomic Validation
    Did an expert taxonomist confirm the identity of the collected species? Taxonomic validation is a complex and important process that is necessary for accurately classifying organisms by their physical and genetic traits. There have been cases in which a reference genome was produced and the sequenced species later turned out not to be the intended target. Whenever possible, we recommend DNA barcoding the sample to prevent this.

    e. Vouchering
    A voucher specimen consists of a representative sample of the collected species. A voucher preserves as much as possible of the physical remains of an organism, serving as a verifiable and permanent record of wildlife. The sample is typically collected in the field and preserved in a herbarium or museum collection. Set aside a voucher specimen and take scaled pictures, following the requirements of the respective collection facility (link of some facilities as an example). In addition, e-vouchers, which consist of digital documentation and images, are also permissible in certain cases.

    f. Biobanking
    Biobanking refers to the storage of biological samples for research purposes.
    Animal/plant tissue biobanking is used to track genetic changes over time, which can help understand the evolution of species. Material for biobanking should be deposited in biobank repositories. In addition to tissue biobanking, DNA biobanking is also possible. Ideally, the tissue and DNA come from the same specimen that will be sequenced, but for very small specimens a different individual can be used. The material should preferably be deposited in a repository in its country of origin. If national infrastructure is not available, or in addition to it, the LIB Biobank at Museum Koenig, Bonn, can centrally store any ERGA project samples. For contact information and sample requirements, please see the LIB Biobank deposition guidelines.

    g. Storage
    Samples must be kept as cold as possible to prevent DNA degradation prior to sequencing. If possible, place the sample tubes in dry ice, a charged LN2 dry shipper (< -150°C) or a -80°C freezer. Please note that wet ice and -20°C freezers are not appropriate for the storage of tubes containing samples intended for genome sequencing.

    h. Material Transfer Agreements
    Material Transfer Agreements (MTAs) are agreements between two parties, typically a provider and a recipient, that govern the transfer of biological samples. MTAs are used to ensure that the provider of the material is adequately compensated for the use of the material, and that the recipient of the material is legally and ethically responsible for its use. Sample providers should be aware of any MTA that applies, for example when sending biological material between their research facility and sequencing centres or biobanks. Please check the requirements with your sequencing centre and biobank. More information can be found in the CETAF Code of Conduct and Best Practices (Example MTA without change in ownership).

    i. Shipping
    All samples must be shipped on dry ice or in a dry shipper. Please make sure that the dry ice is refilled often, including at borders. Pay attention to the regulations of non-EU countries in Europe.
    j. ERGA manifest
    Do you wish to learn what metadata you need to submit with your sample in order to register it with ERGA as an ERGA Community genome? This is the ERGA sample manifest. Fields marked in bold are the mandatory variables.

    k. ENA mandatory fields
    The European Nucleotide Archive (ENA) operates as a public archive for nucleotide sequence data. This is the ENA checklist of minimum requirements to register a physical sample.

    3. DNA/RNA extraction

    Have you acquired your samples and are you ready to extract DNA and RNA? Ideally, high molecular weight (HMW) DNA and RNA should not be shipped, but extracted on site or handled very carefully prior to delivery.

    DNA
    a. DNA extraction protocols: DNA extraction is the process of isolating DNA from cells, tissues or other biological samples.
    b. High Molecular Weight DNA extraction protocols: Please see the following section, Library preparation, for a list of recommended protocols for extracting and preparing HMW DNA for sequencing.

    RNA extraction protocols
    RNA extraction involves separating ribonucleic acid (RNA) from a cell or a tissue sample.

    DNA concentration, integrity, and purity
    i. DNA concentration: typically measured in nanograms per microliter; it can be determined using techniques such as Qubit assays.
    ii. DNA integrity: a measure of the quality of the DNA. It can be determined using gel electrophoresis or PCR-based methods. It is important to ensure that the DNA is intact and not degraded, as degradation can affect the accuracy of results.
    iii. DNA purity: a measure of the level of impurities in DNA samples. It is important to ensure that DNA is free from contaminants, as these can affect the accuracy of results. DNA purity can be assessed using spectrophotometry-based methods such as NanoDrop.

    4. Library preparation

    You’ve extracted your DNA and are wondering how to go about generating the DNA/RNA data required to assemble or annotate your genome?
    DNA library preparation is a key step in the process of sequencing. The library preparation will determine the quality of your assembly and annotation, so the DNA must be processed properly for accurate and reliable results to be obtained. Here you can find our recommended library preparation protocols for PacBio, Oxford Nanopore Technologies (ONT), Chromatin Conformation Capture (Hi-C) sequencing and whole-transcript sequencing, among others.

    PacBio HiFi
    Typically made up of DNA fragments around 10-15 kb in size and with an accuracy of over 99%, PacBio HiFi reads are constructed by circularising DNA and creating a Circular Consensus Sequence (CCS) with high accuracy. This protocol has a history of producing high-quality de novo reference genomes for a wide range of species.

    ONT
    Oxford Nanopore Technologies offers an alternative way to read long pieces of DNA, via electrical fluctuations caused by the nucleotides passing through a membrane pore. The reads sequenced here can be much longer than with PacBio HiFi (typically over 30 kb, and ultra-long library protocols are established to sequence reads of over 200 kb in length) but come with a higher error rate. As the hardware and base-calling software have improved over time, the modal error rate has fallen from over 15% to around 1%.

    Hi-C
    Arima or Dovetail Genomics 3-dimensional Chromatin Conformation Capture libraries allow us to gain insight into the organisation of the genome into Topologically Associating Domains (TADs), eu- and heterochromatin, and chromosomes. In the generation of a reference genome, we leverage the information that regions close together in the linear sequence are more likely to be close together in 3D space to order and orient our smaller assembled sequences (contigs and scaffolds) into chromosomes.
    Hi-C protocols generally follow the steps of isolating nuclei, cross-linking chromatin in its 3D conformation, digesting the DNA at either enzyme motif sites (Arima) or DNase-exposed areas of the genome (Dovetail), and then sequencing the two cross-linked regions via paired-end sequencing on an Illumina device.

    Illumina shotgun sequencing
    Whole Genome Sequencing (WGS, or shotgun sequencing) aims to sequence the entire genome in short fragments (typically 100 or 150 bp paired-end libraries) with high accuracy (Q30, i.e. 99.9% accuracy). It is useful for error-correction of the final assembly, or for identifying sequences from parental lines when performing a trio-binned assembly.

    RNA-seq
    Recommended to support the annotation of your reference genome. Sequencing of RNA-seq libraries is typically performed on an Illumina instrument after RNA has been extracted from your tissues of interest (usually brain or gonad for genome annotation), converted to cDNA and finally amplified before loading onto an instrument.

    Iso-Seq
    The PacBio Iso-Seq protocol offers full-length sequencing of transcripts, which is particularly powerful when annotating alternative isoforms in the genome. The sequencing is performed on a PacBio instrument and again leverages the repeated sequencing of circular cDNA to create a high-accuracy consensus sequence for each transcript.

    5. DNA sequencing data

    You’ve finished the DNA sequencing for your genome and want some guidance with your assembly to ensure you meet ERGA quality standards? The Sequencing and Assembly Committee will prepare a number of workflows that you can download and run to assemble your genome.

    I’m having trouble with my assembly
    Assembling a partially pentaploid, highly repetitive, AT-rich genome? The Sequencing & Assembly Committee (SAC) would love to hear about your genome and can advise on what to do next.
    Contact assembly@erga-biodiversity.eu to arrange a presentation at the fortnightly committee meeting and get feedback from our members.

    6. RNA sequencing data - “An assembly is nothing without an annotation”

    After you have produced a reference-quality genome assembly, you should think about annotating the key features of your genome. This includes, but is not limited to, finding and recording the locations of: repeat sequences; transposable elements; telomeres and centromeres; protein-coding sequences; microRNA sequences (miRNA); non-coding RNA sequences (ncRNA). The Annotation Committee has prepared a number of workflows that you can download and run to annotate your genome. The Annotation Committee can guide you through some of these steps. For ERGA Community genomes, we also recommend uploading your genome to ENA, where Ensembl can annotate it using publicly available transcript data.

    7. Assembly completed

    You have produced a genome assembly and want to associate it with ERGA as a Community genome? Here we detail the steps required to obtain the ERGA label for your genome, along with some best-practice recommendations for what to do next:

    How do I know if my assembly is good enough?
    First, your assembly should meet the EBP metrics; the Sequencing and Assembly Committee will be able to guide you through the post-assembly QC process. Either submit an ERGA Assembly Report (EAR) or present your genome at a SAC meeting.

    Open-access genomes for all
    If you have a high-quality genome and want to associate it with ERGA, it needs to be of EBP quality and in the public domain. We recommend uploading your genome to ENA and then contacting the SAC. Once your genome has the “Seal of Approval”, we will link your publicly available genome to the ERGA Community Genomes BioProject.

    8. Annotation completed

    You have an assembly and annotation that you wish to associate with ERGA as a Community genome?
    Here we detail the steps required to obtain the ERGA label for your genome, along with some recommendations for what to do next:

    How do I know if my assembly and annotation are good enough?
    First, your assembly should meet the EBP metrics; the ERGA Annotation Committee will be able to guide you through the post-assembly QC process. Either submit an EAR or present your genome at a SAC meeting. Your annotation should be in a format that can be downloaded and used by all (e.g. gff3) and linked to your assembly.

    How do I get the ERGA label?
    You need to upload your assembly, annotation and all sequenced data to ENA in order to be associated with the ERGA BioProject. Once your genome and data are available, contact the SAC to get the “Seal of Approval” and have your genome linked to the ERGA BioProject. If you wish to make use of the Ensembl rapid annotation, all associated transcript sequencing data also need to be published on ENA.

    What next?
    Now that you have a high-quality genome, a host of analyses can be performed, including Population Genomics, Phylogenomics, Comparative Genomics & Functional Genomics. The Data Analysis Committee has produced a guide on how to conduct a variety of Downstream Analyses.

    9. Downstream analysis

    You have an ERGA reference genome and you would like to analyse the data? Here we suggest the next steps for planning your downstream analysis to the highest scientific standards, recommending frameworks and pipelines to tackle your research questions using your reference genome. High-quality reference genomes are an essential tool for detecting genic and intergenic regions and identifying genetic variants (e.g. SNPs, CNVs, and structural variants), which are crucial for understanding processes in the different fields of genomic research.
    The Data Analysis Committee (DAC) can provide additional help through its subcommittees devoted to the different fields of genomic research: Population Genomics, Phylogenomics, Comparative Genomics & Functional Genomics. You can contact the subcommittee relevant to your research question and meet with several experts in the field. You can also take the opportunity to present your research to the ERGA community and get relevant feedback to develop it further. DAC also offers opportunities for training through its conferences and workshops, organised in collaboration with the Training and Knowledge Transfer Committee.

    DAC Subcommittees

    i. Population Genomics: this subcommittee brings together researchers who specialise in studying genetic variation and evolutionary processes within populations. This field combines the principles of genetics, genomics, and population biology to understand how genetic diversity arises, spreads, and changes over time. The main objective of this group is to support the investigation of the genetic factors influencing the composition and dynamics of populations and species. Through collaborative efforts and interdisciplinary approaches, the subcommittee intends to contribute to the broader field of genomics and its applications in various areas of biodiversity.

    ii. Phylogenomics: this subcommittee brings together researchers who focus on studying evolutionary relationships and the diversification of organisms using genomic data. The subcommittee's main purpose is to support research on the accurate and robust reconstruction of phylogenetic trees or evolutionary histories using genomic information. Through collaboration with research teams, the subcommittee intends to provide valuable insights into the tree of life and clarify the evolutionary history of European species.

    iii.
    Comparative Genomics: this subcommittee brings together researchers devoted to studying and comparing the genomes of different organisms to gain insights into their evolutionary relationships, genetic variations, and functional elements. Comparative genomics combines genomics, bioinformatics, and evolutionary biology to explore the similarities and differences in the genetic makeup of various species. The subcommittee's main purpose is to support the analysis and interpretation of genomic data from multiple organisms, identifying shared and unique genomic characteristics in order to infer evolutionary relationships, gene function, and evolutionary processes. The subcommittee intends to advance our understanding of the genomic landscape across species. By comparing and analysing genomic data, research in this field will offer insights into the evolutionary history and functional elements of genomes, ultimately contributing to various aspects of biological research.

    iv. Functional Genomics: this subcommittee brings together researchers devoted to understanding the functional elements and activities of genomes, clarifying the functions and interactions of genes, non-coding elements, and regulatory networks, as well as their roles in various biological processes and disease conditions. The main objective of this group is to support research on how genomic information is translated into functional outcomes, exploring the relationships between DNA sequences, gene expression patterns, protein production, and cellular processes. The subcommittee intends to provide insights into the functional aspects of genomes, gene functions, regulatory networks, and their impact on biological processes and disease conditions.
