
Search Results

27 results found with an empty search

  • SSP - Sampling & Sample Processing

    samples@erga-biodiversity.eu
    The Sampling and Sample Processing Committee (SSP) gathers and shares information on all steps from choosing which species to sequence to sending correctly prepared samples to sequencing centers. This also includes vouchering, biobanking and standardized metadata collection, as well as the exchange of best practices for sampling and lab protocols across the whole variety of organisms that ERGA aims to include. The committee is thus often the first port of call for new sequencing projects within ERGA, and it collaborates closely with scientific collections and sequencing centers, as well as several other ERGA committees, on issues such as collection permits and equal opportunities for projects from all over Europe. To accomplish these diverse tasks, we depend on input from taxonomic experts across the whole eukaryotic tree of life. (V.1.0 02.05.2023)
    Co-Chairs: Katja Reichel, Jaakko Pohjoismäki
    Coordinator: Rita Monteiro
    Steering Committee: Astrid Böhne, Jennifer Leonard, Olga Vinnere Pettersson, Torsten Hugo Struck
    Committee Resources:
      • 💡 ERGA Knowledge Hub
      • ▶️ ERGA SSP Youtube Playlist
      • Böhne, A., Fernández, R., Leonard, J.A. et al. Contextualising samples: supporting reference genomes of European biodiversity through sample and associated metadata collection. npj Biodivers 3, 26 (2024). https://doi.org/10.1038/s44185-024-00053-7
    Welcome to the new members of the ERGA Executive Board! Press Releases Connections #9: How Biodiversity Genomics drives conservation impact ERGA News #32 - November 2025

  • ITIC - IT & Infrastructure Committee

    itinfra@erga-biodiversity.eu
    In the IT and Infrastructure Committee, we aim to facilitate robust and reproducible science in all steps of creating an ERGA genome. The committee oversees the use of various platforms to keep all other committees up to date, secure and compliant with EU laws and regulations. It is central to ensuring that data and metadata are accessible to the public and in line with our Open Access Policies. Our main tasks revolve around providing material for the ERGA community to facilitate the management and publication of (meta)data consistent with the goals of ERGA and the wider Earth BioGenome Project community. Please send us an email if you would like to be involved in our work. (V.2.0 15.05.2025)
    Coordinators: Tom Brown, Christian de Guttry
    Resources:
      • 🔗 How-to-guide: Submitting data to ENA
      • Gustafsson, O.J.R., Wilkinson, S.R., Bacall, F., Pireddu, L., Soiland-Reyes, S., Leo, S., Owen, S., Juty, N., Fernández, J.M., Grüning, B. and Brown, T., 2024. WorkflowHub: a registry for computational workflows. arXiv preprint arXiv:2410.06941. https://doi.org/10.48550/arXiv.2410.06941

  • DAC - Data Analysis Committee

    analysis@erga-biodiversity.eu
    The Data Analysis Committee (DAC) aims to foster collaboration and knowledge sharing in genomic data analysis among ERGA members, and to enhance the development of applications in genomics. DAC aims to develop and implement standard protocols for downstream data analysis, providing high-standard frameworks and pipelines to tackle research questions across different groups of organisms. Additionally, together with the Training and Transfer of Knowledge committee (TKT), DAC is responsible for providing training opportunities to the ERGA and general scientific communities through the organization of workshops and conferences. Finally, DAC aspires to improve translational communication with stakeholders and citizen scientists by actively engaging with the activities of the Citizen Science committee (CS committee), to influence species management and protect Earth’s biodiversity. (V.1.0 02.05.2023)
    Chair: Tereza Manousaki
    Coordinator: João Pimenta
    Leader of the Population Genomics Subcommittee: Mari Jose Ruiz
    Leaders of the Phylogenomics Subcommittee: Pascalia Kapli, Iker Irisarri
    Leader of the Comparative Genomics Subcommittee: Toni Gabaldón
    Leader of the Functional Genomics Subcommittee: Steven Van Belleghem
    Steering Committee: José Melo-Ferreira, Elena Buzan (CS committee representative), Alice Mouton (TKT committee representative), Joan Pons
    Committee Resources:
      • 💡 ERGA Knowledge Hub
      • ERGA BioGenome Analysis and Applications seminars
      • DAC Playlist

  • SAC - Sequencing and Assembly Committee

    assembly@erga-biodiversity.eu
    The Sequencing and Assembly Committee (SAC) fosters collaboration across the ERGA community by coordinating efforts and providing a platform to exchange ideas on genome sequencing and assembly methods. This inclusive framework helps bring together new and affiliated projects under the ERGA umbrella. Working closely with other committees, genome projects, and consortia, SAC promotes and ensures visibility of up-to-date workflows and standardised pipelines, supporting their alignment with the Earth BioGenome Project (EBP) mission and quality requirements while contributing to their development and adaptation. The committee focuses on end-to-end laboratory and analytical practices that deliver high-quality data for ERGA assemblies, including DNA extraction, library preparation and sequencing across platforms, as well as bioinformatic procedures for integrating data to generate high-quality genomes. The committee encourages the sharing of standardised SOPs, guidance, and troubleshooting advice. In addition, SAC develops and maintains a framework for assembly evaluation to ensure quality standards are met and complex cases are addressed through community feedback, thereby supporting continuous improvement and knowledge sharing. (V.2.0 01.12.2025)
    Chair: Tyler Alioto
    Coordinator: Diego de Panis
    Steering Committee: Camila Mazzoni, Henrik Lantz, Jean-Marc Aury, Kerstin Howe, Nadège Guiglielmoni
    Looking for assistance and guidance on how to assemble a genome? The Sequencing and Assembly Committee can help!
      • Join our Slack Channel! Here you can post your questions and start conversations with the Sequencing and Assembly community from the ERGA consortium.
      • Use our resources! Here we have a collection of Genome Assembly Workshops collected and curated by the members of the ERGA SAC.
      • Join our mailing list! Send an email to assembly@erga-biodiversity.eu to join the ERGA Sequencing and Assembly mailing list and get regular updates about the activities of the SAC.
      • Present at our meetings! Send an email to assembly@erga-biodiversity.eu to request a slot to present at a SAC meeting if you would like feedback on your project. We can advise on steps to improve an assembly or on pipelines that you may find useful.
    Resources:
      • 💡 ERGA Knowledge Hub
      • ▶️ ERGA SAC Youtube Playlist
      • 🔗 Galaxy workflow for de-novo genome assembly using PacBio HiFi and HiC data
      • 🔗 Galaxy workflow for de-novo genome assembly using ONT, Illumina WGS and HiC data

  • CS - Citizen Science

    citizenscience@erga-biodiversity.eu
    The Citizen Science and Outreach Committee aims to facilitate collaboration and communication between scientists, stakeholders and citizens to increase trust in the scientific process and ensure that genomics research reflects the needs and perspectives of the broader community. By engaging a broad range of stakeholders, including policy makers, non-governmental organisations, industry representatives and citizens, the Committee seeks to foster a fruitful multi-stakeholder dialogue and to support traditional community knowledge through the use of citizen science and stakeholder perspectives. To achieve its goals, the Committee works to engage citizens and stakeholders in the research process and to promote public understanding of genomics through various public events and other activities. In addition, the Committee backs policies that support genomics research and its translation into practical applications for the benefit of society. (V.1.0 02.05.2023)
    Chair: Elena Buzan
    Coordinators: Christian de Guttry, Luísa Marins
    Steering Committee: Jacob Höglund, Lino Ometto, Svein-Ole Mikalsen, Chiara Bortoluzzi

  • Media & Communications

    media@erga-biodiversity.eu
    The Media & Communications Committee is responsible for communicating ERGA's goals, actions, and accomplishments internally and externally. Our committee produces newsletters, press releases, and blog posts, manages the website, and maintains social media accounts. We are responsible for developing communication strategies, implementing plans for publicising ERGA events and activities, and ensuring that all relevant information is disseminated in a timely and accurate manner. It is our responsibility to raise awareness about ERGA both inside and outside the scientific community, in order to encourage more people to support and join our community and contribute to our mission.
    Coordinators: Christian de Guttry, Luísa Marins
    Steering Committee: Alice Mouton, Jan Zwilling
    Follow #ERGA! Stay connected! Follow us on social media for updates and insights. 🌍 https://linktr.ee/erga_biodiversity #Genomes for #Biodiversity

  • Annotation Committee

    annotation@erga-biodiversity.eu
    Annotations transform genomes into larger sources of knowledge and offer critical added value to genome assemblies. They serve as a direct link between the genome sequence and function and facilitate comparisons across taxa, both on a large and small scale. The ERGA Annotation Committee is composed of researchers and bioinformaticians who have experience in using computational methods to predict the structural and functional composition of whole genomes. We strive to understand and use the most advanced analytical methods for annotating genomes, to develop new methods for annotation and its evaluation, and to align with the standards set by the EBP. We are committed to ensuring that the best possible annotation pipelines are available to and used by the annotation community and new researchers entering this field. The committee meets regularly to review the progress of annotation in ERGA genomes, discuss current issues and challenges, and propose changes to improve the annotation process. (V.1.0 16.05.2023)
    Chair: Alice Dennis
    Coordinator: Christian de Guttry
    Steering Committee: Aureliano Bombarely, Hugues Roest Crollius, Henrik Lantz, Fergal Martin, Florian Maumus
    Committee Resources:
      • Structural Annotation Guide
      • Pre-trained AUGUSTUS models
      • 💡 ERGA Knowledge Hub
    Pipelines:
      • BRAKER3 Protein-Coding Annotation Pipeline

  • FAQs | erga

    Frequently Asked Questions (FAQ)
    What is ERGA?
    The European Reference Genome Atlas is a community of peers working to advance the generation of reference genomes for European biodiversity. ERGA members share a passion for biodiversity and see reference genomes as key resources that can boost our understanding of biodiversity and inform conservation strategies. Our community is made up of researchers with very diverse expertise and backgrounds working in the European continent or interested in European biodiversity. ERGA also represents the European node of the global Earth BioGenome Project, which has the goal of coordinating the generation of reference genomes for all of Earth’s biodiversity.
    What are ERGA’s main goals?
    ERGA’s Core Objectives are to:
      • Create and consolidate a collaborative and interdisciplinary network of scientists across Europe and associated countries to deliver reference genome sequences;
      • Connect relevant infrastructures across Europe following a distributed model for genome sequence generation and analysis that can increase dynamically;
      • Develop guidelines and best practices for state-of-the-art reference genome sequence generation, and disseminate them through training and knowledge transfer;
      • Connect BioGenome initiatives working on European species to each other and with ERGA’s own initiatives to maximise synergies.
    How can I get involved and contribute to ERGA?
    Firstly, please register as an ERGA member. Membership is free and will ensure you receive our monthly newsletter and information about upcoming events and meetings. Once you become a member, you will have easy access to ERGA meetings. Our monthly plenary meetings are a good starting point to get to know the community. If you are interested in or need support with a specific step of the genome generation process, you might want to interact with or even join one of the open ERGA committees. Each committee has its own way of operating and a monthly meeting slot. If you want to participate in any of the committees, just send an email to the committee’s address to be added to its communication channels and learn the best opportunities to contribute. If you have an ongoing genome project on any European eukaryotic species, you can associate it with ERGA as an ERGA Community Genome. Check this page for more information on this procedure.
    What are the benefits of joining ERGA?
    If you are a researcher working on biodiversity genomics, joining and following ERGA’s activities can bring many advantages, including:
      • Taking an active role in the generation of high-quality reference genomes for biodiversity conservation;
      • Networking - through our network you will be able to interact and collaborate with colleagues from all across Europe working on topics related to your research;
      • Getting support from the ERGA Committees - as a member, you have direct access to groups of specialists in all steps of the genome production workflow;
      • Going beyond science - besides producing reference genomes and connecting researchers, ERGA is also committed to reaching out beyond academia to disseminate the importance of biodiversity and the role of genomics;
      • Moving from theory to practice - lead the application of genomics technologies to biodiversity research and conservation directly in the field.
    What is the policy of ERGA on data?
    Check our Open Data Policy. This covers key requirements and recommendations regarding the collection, processing, storage, and publishing of metadata and data related to the production of high-quality reference genomes. If you have questions or concerns about our data policy, please reach out to the IT & Infrastructure committee at itinfra@erga-biodiversity.eu.
    How can I connect with other members of ERGA in my country?
    To interact with the ERGA Community in your country, please contact your country’s Council representative through the email available here and ask about any local initiatives already in place and how to engage. If your country is not yet represented in the ERGA council, we are happy to welcome new countries and hope to have representation from all European countries! Please refer to the Governance Document for more details on how to join the ERGA Council as a representative of your country.
    How can I get in touch if I have other questions?
    You can reach out to ERGA through many channels. Here are some ways to get in touch with us:
      • Email us at contact@erga-biodiversity.eu
      • Join the ERGA Keybase team and ask your question in one of the many channels (instructions for this are provided when you sign up to become a member)
      • Social media: follow us on X @erga_biodiv (previously Twitter), ERGA LinkedIn and Mastodon.

  • Library | ERGA

    ERGA Library
      • Publication: Biodiversity Genomics Research Practices Require Harmonising to Meet Stakeholder Needs in Conservation. Year: 2025. DOI/URL: https://onlinelibrary.wiley.com/doi/10.1111/mec.70001
      • Publication: A chromosome-level genome assembly of the European green toad (Bufotes viridis). Year: 2025. DOI/URL: https://doi.org/10.1093/g3journal/jkaf002
      • Publication: Chromosome-level reference genome assembly for the mountain hare (Lepus timidus). Year: 2025. DOI/URL: https://peercommunityjournal.org/articles/10.24072/pcjournal.514/
      • Publication: The genome sequence of the Violet Copper, Lycaena helle (Denis & Schiffermüller, 1775). Year: 2025. DOI/URL: https://doi.org/10.12688/f1000research.156485.1
      • Publication: Nuclear and mitochondrial genome assemblies for the endangered wood-decaying fungus Somion occarium. Year: 2025. DOI/URL: https://doi.org/10.1093/gbe/evaf003
      • Publication: Chromosome-scale genome assembly and de novo annotation of Alopecurus aequalis. Year: 2024. DOI/URL: https://doi.org/10.1038/s41597-024-04222-y
      • Publication: A Faroese perspective on decoding life for sustainable use of nature and protection of biodiversity. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00068-0
      • Publication: The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00054-6
      • Publication: Building a Portuguese coalition for biodiversity genomics. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00061-7
      • Publication: Contextualising samples: supporting reference genomes of European biodiversity through sample and associated metadata collection. Year: 2024. DOI/URL: https://doi.org/10.1038/s44185-024-00053-7
      • Publication: First Chromosome-Level Genome Assembly of a Ribbon Worm from the Hoplonemertea Clade, Emplectonema gracile, and Its Structural Annotation. Year: 2024. DOI/URL: https://doi.org/10.1093/gbe/evae127 (Funded by the Research Council of Norway project “InvertOmics—phylogeny and evolution of lophotrochozoan invertebrates based on genomic data” (project number: 300587 to T.H.S.))
      • Publication: The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion. Year: 2024. DOI/URL: https://doi.org/10.1038/s41437-024-00720-2

  • Glossary | ERGA

    Glossary
    This page explains terms and acronyms often used within ERGA and in the context of biodiversity genomics. Terms can be filtered alphabetically or by category: Annotation, Citizen Science, Data Analysis, ELSI, IT & Infrastructure, Media & Communications, Other, Sampling & Sample Processing, Sequencing & Assembly.
    (Genome) annotation: The process of identifying the functions of different pieces of a genome. This includes genes that code for proteins and non-coding features (e.g. intron-exon structure of protein-coding genes, promoters, transposable elements). Typically performed using computational methods, followed by manual curation.
    (Genome) assembly: A genome assembly is a representation of an organism’s genome that is made using computer programs to turn (assemble) raw sequence data into longer, continuous sequences.
    (Genome) completeness: An estimate of how well a reference genome represents the complete sequence of the target organism. A complete genome should equal the haploid genome size of the target, but a genome may also be defined as complete when ‘all chromosomes are gapless and have no runs of 10 or more ambiguous bases, there are no unplaced or unlocalized scaffolds, and all expected chromosomes are present.’ (https://www.ncbi.nlm.nih.gov/assembly/). There are different approaches to estimating completeness, such as BUSCO or K-mer analysis.
    ABS: Access & Benefit Sharing
    BGE: Biodiversity Genomics Europe. The BGE Project has received funding through a Horizon Europe call on Biodiversity and Ecosystem Services. The overarching BGE project includes two streams of genomic research, reference genomes and barcoding, in an effort to establish ERGA and BIOSCAN as the European nodes of the Earth BioGenome Project and of the International Barcode of Life (IBOL), respectively.
    BUSCO: A bioinformatic method (Benchmarking Universal Single-Copy Orthologues) used to estimate the completeness of the coding fraction of an organism’s genome based on the proportion of (lineage-specific) single-copy orthologous genes that are found in a genome assembly.
    Biodiversity genomics: The application of genomic methods to research biodiversity.
    CARE Principles: The CARE principles for Indigenous data governance (https://www.gida-global.org/care) provide a governance framework that supports the recognition of Indigenous Peoples’ rights and interests in their physical and digital data as well as their Indigenous Knowledges.
    CBD: Convention on Biological Diversity
    COPO: The Collaborative OPen Omics (COPO) platform is for researchers to publish their research assets, providing metadata annotation and deposition capability. It allows researchers to describe their datasets according to community standards and broker the submission of such data to appropriate repositories whilst tracking the resulting accessions/identifiers. Learn more about COPO in this article by the Earlham Institute.
    CS: Citizen Science Committee
    Chromosome-level assembly: The process of generating a contiguous sequence of all chromosomes of a genome, often aided by genetic maps or proximity ligation techniques (3C-seq, Hi-C); the term is also used to refer to the resulting genome sequence.
    Council meetings: During the monthly ERGA council meetings, the representatives of countries and other genome projects associated with ERGA meet to discuss and vote on important matters related to ERGA’s governance and actions. The council is the main decision-making body of the consortium. Learn more about ERGA's structure in our Governance Document.
    DAC: Data Analysis Committee
    DSI: Digital Sequence Information - learn more: https://www.cbd.int/dsi-gr/
    DToL: The Darwin Tree of Life Project aims to sequence the genomes of 70,000 species of eukaryotic organisms in Britain and Ireland.
    EBP: The Earth BioGenome Project
    EBP Genome assembly quality standard 6.C.Q40: Minimum reference standard of 6.C.Q40, i.e. megabase N50 contig continuity and chromosomal-scale N50 scaffolding, with less than 1/10,000 error rate. For species with chromosome N50 smaller than a megabase this will be C.C.Q40. Additional recommendations include K-mer completeness >90%, BUSCO complete single-copy >90%, BUSCO complete duplicated <5%, and Gaps/Gbp <1000.
    EC: European Commission
    ELSI: Ethical, Legal, and Social Issues (Committee)
    ENA: The European Nucleotide Archive (https://www.ebi.ac.uk/ena) is a global repository for sequence data and provides resources that support management of and access to sequence data.
    ERGA: European Reference Genome Atlas
    ERGA Plenary: Our plenary meetings are open to all registered ERGA members and generally include short updates given by committee chairs and one invited talk on various themes connected to biodiversity genomics (watch the previous ones here).
    ERGANews: ERGA’s monthly newsletter includes important updates about the consortium, each of the committees and associated projects. Our newsletters are usually published on the first Tuesday of each month. All editions of the newsletter are stored here.
    Equity Deserving: According to the Canada Council for the Arts (https://canadacouncil.ca/glossary/equity-seeking-groups), equity-deserving groups are those individual researchers, communities, Peoples, regions or countries that have identified barriers to equal access, opportunities, and resources due to disadvantage and/or discrimination and that are actively seeking, and deserving of, social justice and reparation. The discrimination experienced could be caused by attitudinal, historic, social, and environmental barriers based on a plethora of characteristics including (but not limited to) sex, age, ethnicity, disability, economic status, gender, gender expression, nationality, race, sexual orientation, and creed.
    FAIR Principles: A set of principles to guide appropriate management and curation of scientific data (https://www.go-fair.org/fair-principles/) that emphasise data accessibility and use by ensuring that data are Findable, Accessible, Interoperable, and Reusable. Due to the increasing amount of scientific data being deposited, FAIR guidelines promote a data format that is amenable to automated computational access of data by stakeholders.
    Galaxy: Galaxy is an open source, web-based platform for data-intensive biomedical research.
    Genome Report: A genome report is a technical publication that describes all the steps taken to produce a reference genome: sampling, sequencing, assembling, annotating. Genome reports often have a standardised format and structure that allows readers to quickly and easily understand the quality of the genome and how it was generated.
    GoaT: Genomes On A Tree
    HE: Horizon Europe; sometimes refers to the BGE project funded under HE
    HSM: Hierarchical Storage Management is both a data management and data storage technique which transparently manages the movement of data between the different layers of a tiered storage based on file size thresholds, usage and I/O pressure. Usually, a tiered storage is composed of one or more layers of disk arrays, ordered by capacity, latency, redundancy and storage cost. A slow but economically effective archival layer is at the bottom, composed of magnetic tape libraries and automated tape robots, with the highest capacity and latency. The movement between layers is automatically triggered.
    Haplotype: A haplotype refers to the collection of genetic material within an organism that is inherited together. Haplotype may be used to describe a few loci or any number of chromosomes (a chromosome-scale haplotype).
    Hi-C: Sequencing-based method used to study three-dimensional interactions among chromatin regions by measuring the frequency of contact between pairs of loci. Since contact frequency is related to the distance between a pair of loci, Hi-C linking information is used to help with scaffolding stages during a genome assembly process.
    Hi-C map / graph production: The occurrence and frequency of Hi-C contacts are analysed and used in assembly scaffolding. They are typically visualised in Hi-C 2D heatmaps with the full genome sequence on the X and Y axes and a mark for each observed contact.
    HiFi reads: HiFi (High Fidelity) PacBio reads are produced by sequencing the same molecule multiple times to provide a consensus sequence that is usually 12-20 kbp long and has a low error rate (>99.9% consensus accuracy).
    INSDC: The International Nucleotide Sequence Database Collaboration (https://www.insdc.org/) is an initiative between the DDBJ, EMBL-EBI and NCBI that together act as a global repository of sequence data and associated metadata, and provide tools and services that allow access to genomic resources.
    ITIC: IT & Infrastructure Committee
    IsoSeq: A sequencing protocol developed by PacBio that aims to sequence full-length transcripts using the accurate, long-read capabilities of PacBio HiFi technology. IsoSeq data facilitate analysis of transcriptomes and genome annotation by identifying full-length isoforms of transcripts.
    JEDI / DEIJ: Justice, Equity, Diversity, and Inclusion Subcommittee
    K-mer: A K-mer is a DNA sequence of length k; for example, the sequence AGCT contains the 3-mers (K-mers of length 3) AGC and GCT.
    Library: DNA, cDNA, or RNA that has been prepared for NGS within (usually) a specific size range and containing adapters, which are designed to be appropriate for (a) specific sequencing platform(s).
    M&C: Media & Communications Committee
    Metadata: A collection of data that provides contextual information about multiple characteristics of other, corresponding original data.
    ONT: Oxford Nanopore Technologies (ONT; https://nanoporetech.com/) is a next generation sequencing technology whereby sequence data are generated from the changes in current that occur as single-stranded DNA or RNA molecules pass through nanoscale protein pores (nanopores). ONT provides long read data (up to several megabases) that facilitate genome assembly.
    Omni-C: Modified version of Hi-C that uses a sequence-independent endonuclease during its protocol to produce more even sequence coverage, increasing overall resolution.
    Open data: Open data are freely accessible and unrestricted data that can be accessed, used, reused and shared with third parties for any purpose.
    PUID: A permanent unique identifier is a unique label for an object that does not change, such as the Digital Object Identifier (DOI) attached to a scientific publication.
    PacBio: Pacific Biosciences (PacBio; https://www.pacb.com/) is a single-molecule, real time (SMRT) next generation sequencing technology in which sequence data are generated by fluorescent light emission that occurs when a DNA polymerase adds nucleotides. PacBio produces long read data (tens of kilobases) that facilitate genome assembly.
    RNA-Seq: RNA-Seq is a technique that determines the complete or partial RNA sequence using NGS. The RNA expression profiles vary in different tissues of the same organism and can be influenced by physiopathological circumstances. RNA-Seq data facilitate genome annotation by providing empirical evidence for transcribed regions.
    Reference genome: An accepted standard representation of an organism’s DNA sequence. High-quality reference genomes typically have high completeness (chromosome-level with few gaps in sequence), few errors, and are annotated and accessible. A reference genome serves as a tool for alignment-based analyses, such as variant calling or RNA-Seq, and has many other applications, for example, phylogenetics and evolutionary relationships, identification of genes and variants, functional analysis and comparative genomics. Reference genomes referred to as “drafts” are those that are under active construction and refinement, and not yet finalised through manual curation.
    SAC: Sequencing and Assembly Committee
    SOP: A standard operating procedure (SOP) is a document that provides detailed instructions on how to perform an activity, outlining the step-by-step process required for its execution.
    SRA: Sequence Read Archive
    SSP: Sampling & Sample Processing (Committee)
    TKT: Training & Knowledge Transfer Committee
    References:
      • The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics (Glossary). bioRxiv 2023.09.25.559365; doi: https://doi.org/10.1101/2023.09.25.559365
      • How genomics can help biodiversity conservation; doi: https://doi.org/10.1016/j.tig.2023.01.005
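Several glossary entries describe small calculations: the K-mer example (AGCT and its 3-mers), the N50 statistic behind the EBP 6.C.Q40 standard, and the "less than 1/10,000 error rate" attached to Q40. The following is a minimal Python sketch of those three ideas, not an ERGA tool. The function names are illustrative, and reading Q40 as a Phred-scaled quality value is an assumption that happens to agree with the 1/10,000 figure in the glossary.

```python
def kmers(sequence, k):
    """Return every overlapping substring of length k, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def phred_to_error_rate(q):
    """Convert a Phred-scaled quality value into an expected per-base error rate.

    Assumption: the 'Q' in Q40 is Phred-scaled, so Q = -10 * log10(error rate).
    """
    return 10 ** (-q / 10)


def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


if __name__ == "__main__":
    print(kmers("AGCT", 3))         # the glossary example: AGC and GCT
    print(phred_to_error_rate(40))  # roughly one error in 10,000 bases
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Real assemblies would normally be evaluated with dedicated tools such as those named in the annotation guide (for example Merqury or BUSCO); the sketch only makes the definitions concrete.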

  • Annotation_guide | ERGA

    Structural annotation - So you want to annotate protein-coding genes in your genome?
    Version 1.0 - August 2023

    1. Before you start 2. Do you want to do your own annotation? 3. Evaluate your annotation 4. Finalise your annotation

    Authors: Alice Dennis, Jèssica Gómez, Leanne Haggerty, Lucile Soler, Aureliano Bombarley, Henrik Lantz, Florian Maumus, Hugues Roest Crollius, Fergal Martin, Jean-Marc Aury, Christian de Guttry, Robert Waterhouse, and the ERGA Annotation committee.

    STEP 1 - Before you start

    Step 1a: Be sure the assembly is done and you are working with a frozen/stable version!

    Table 1: Genome assembly evaluation before annotation.
    Rationale: Low consensus accuracy, incomplete genomes, and contamination lead to poor annotation. It is thus essential to evaluate your genome before you start the annotation process.
    1. Consensus accuracy and assembly completeness evaluated (suggestion: Merqury)
    2. Gene space completeness evaluated by:
       a. Conserved gene space (suggestion: BUSCO)
       b. RNA-Seq mapping (suggestion: STAR/Minimap2)
    3. Organelle/contamination screening and removal:
       a. Organelle genomes (suggestion: Minimap2)
       b. Contamination (suggestion: BlobTools)
    4. Uncollapsed duplication for the consensus haploid assembly (suggestion: purge_dups)
    5. Full TE completeness evaluated with the LAI (suggestion: LTR_Retriever)
    6. Does your genome meet EBP standards?

    Step 1b: Is your assembly done? If yes, go to Step 1c. If no, go to Table 1.
    Step 1c: Is the assembly publicly available? Public release is necessary for annotation by Ensembl. If yes, go to Step 1d. If no, go to Step 2.
    Step 1d: Is the public assembly linked to ERGA? If yes, go to Step 1f. If no, go to Step 1e.
    Step 1e: Instructions on how to link your project to ERGA.
    Step 1f: Linking to ERGA makes the assembly available for annotation at Ensembl rapid, provided that relevant transcriptomic data are also publicly available (ENA).

    STEP 2 - Do you want to do your own annotation?

    Step 2a: Do you want to do your own annotation? If yes, go to Step 2b.
    Step 2b: Gather all available evidence data: transcriptomic and protein datasets to support the annotation process.

    Table 2: Evaluation of your evidence data. The accuracy of the genome annotation process is very sensitive to the amount and quality of your evidence data.
    1. RNA-Seq transcriptomic data
       a. Mapping evaluation (suggestion: STAR)
       b. Transcript models (suggestion: StringTie)
       c. Gene space completeness (suggestion: BUSCO)
    2. Protein dataset
       a. Gene space completeness (suggestion: BUSCO)
       b. Percentage of full protein alignments (suggestion: Spaln)
    3. IsoSeq transcriptomic data
       a. Mapping evaluation (suggestion: Minimap2)
       b. Transcript models (suggestion: StringTie)
       c. Gene space completeness (suggestion: BUSCO)
    Done? If yes, go to Step 2c.

    Step 2c: Repeat prediction. ERGA recommends: RepeatModeler2, RepeatMasker, Protein Excluder, TEclass, PASTEC, TEdenovo. Done? If yes, go to Step 2d.
    Step 2d: Ab initio training and prediction. ERGA recommends: AUGUSTUS and GeneMark-ET/EP/ETP.
    Step 2e: Gene modelling. ERGA recommends: TSEBRA (BRAKER-based predictions), EvidenceModeler, and MAKER. Done? For an evaluation of your evidence data, go back to Table 2. Once done, you are ready for the final quality and contamination check and you can go to Step 3a.

    STEP 3 - Evaluate your annotation

    Step 3a: Evaluate your annotation. The following suggestions can be carried out in any order:
    Step 3b: MAKER eAED scores.
    Step 3c: Gene family analysis.
    Step 3d: Genome visualization: 1. IGV; 2. Apollo (manual curation); 3. EasyGB (JBrowser for a simple dataset).
    Step 3e: Generate basic gene model summary statistics and compare with related species.
    Step 3f: BUSCO, visual inspection in browser in context with evidence.
    Step 3g: Use mapped reads to estimate: 1. How many apparently transcribed regions don't have annotation? 2. How many genes or exons are supported by read data?
    Step 3h: Compare gene content to related species with a similar annotation approach.

    Are you happy with the assessment of each metric used to evaluate the annotation? Remember that some of them may depend on the phylum. This is your DIY annotation v1. Again, this is a stopping place. Do not go forward until this is complete. If you are happy with the metric assessment, you can move to Step 4.

    STEP 4 - Finalise your annotation

    Step 4a: Create proper file formats (ENA GFF3 format recommendations). Consider replacing the identifiers produced by the different gene annotation tools (e.g., gene-1) with more meaningful identifiers (SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY).
    Step 4b: Provide this annotation to Ensembl as a second track (via GFF3 submission to ENA) and go back to Step 1f.
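
    The identifier scheme suggested in Step 4a can be sketched in a few lines of Python. This is only an illustration under assumptions the guide does not spell out: the components are concatenated without separators, XXX is a zero-padded sequence number, and YYYYYY is a zero-padded gene number. The function names, the species code "EXSP", and the example values are all hypothetical.

```python
def make_gene_id(species_code, assembly_version, seq_type, seq_number, gene_number):
    """Build an identifier following SpeciesCode+AssemblyVersion+Chr/Scf/Ctg-XXX+G+YYYYYY.

    Assumed layout (not confirmed by the guide): no separators,
    3-digit sequence number, 6-digit gene number.
    """
    if seq_type not in ("Chr", "Scf", "Ctg"):
        raise ValueError(f"seq_type must be Chr, Scf or Ctg, got {seq_type!r}")
    return f"{species_code}{assembly_version}{seq_type}{seq_number:03d}G{gene_number:06d}"


def build_id_map(tool_ids, species_code, assembly_version, seq_type, seq_number):
    """Map tool-generated IDs (e.g. gene-1) on one sequence to new IDs, in input order."""
    return {
        old_id: make_gene_id(species_code, assembly_version, seq_type, seq_number, i)
        for i, old_id in enumerate(tool_ids, start=1)
    }


if __name__ == "__main__":
    # Hypothetical species code and tool-generated identifiers.
    for old_id, new_id in build_id_map(["gene-1", "gene-2"], "EXSP", 1, "Chr", 2).items():
        print(old_id, "->", new_id)
```

    Keeping the old-to-new mapping as a dictionary makes it easy to apply the same renaming consistently to every file that refers to the genes, and to keep a record of the original tool identifiers.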

  • SUPPORT | ERGA

    ERGA Support Request
