
Would you like to start your first project that includes the generation of a high-quality reference genome?
Generating high-quality reference genomes is a complex process that requires more than advanced computing and bioinformatic skills: it also requires knowledge that spans from how to collect your sample to the laws that you have to comply with.
Below you will find step-by-step guidelines on how to associate your independent genome-sequencing project with ERGA.

Letter of Support
Do you wish to indicate in your grant proposal that you are knowledgeable about where and how to find support for the genome generation pipeline? Considering the difficulty of obtaining funding for research in areas where you have no prior experience, your application can be supported by a letter of support from our chairs. If you would like to have this type of assistance, please indicate so on this form.
Your grant's genomic section
I. Are you in need of assistance in preparing a complete, convincing, and coherent grant application with realistic budget estimates for your project?
It can be challenging to prepare your first grant in this area of research, and ERGA provides its hub-of-knowledge to assist you in your first journey into the world of reference genomes. An online meeting can be arranged where you can benefit from the experience of researchers who have already passed through this process several times. If you would like to have this type of assistance, please indicate so on this form.
II. The grant has already been written but you are unsure of the content of the reference genome generation section?
The expertise of ERGA's members can assist you in this endeavor. Upon request, we will conduct a brief review of the genomic section of your grant proposal in order to assess its feasibility.
Check the current status of the reference genome for the species you wish to sequence
Is another research group already producing the reference genome for your species? On one of the following portals, you can check to see if anyone has already produced the reference genome for your species:
- ENA: The European Nucleotide Archive (ENA) operates as a public archive for nucleotide sequence data. By bringing together databases for raw sequence data, assembly information and functional annotation, the ENA provides a comprehensive and integrated resource for this fundamental source of biological information.
- ERGA data portal: This portal allows you to see species for which high-quality reference genomes have already been produced by ERGA-affiliated research groups.
- GoaT: Genomes on a Tree (GoaT) is built using GenomeHubs 2 to present genome-relevant metadata for all eukaryotic taxa across the tree of life. Metadata in GoaT include genome assembly attributes, genome sizes, C values, and chromosome numbers from multiple sources.
- Other genomic consortia: DToL, CBP, AtlaSea, EBP, VGP.
2. Sample acquisition strategy
You are planning your field work and would like to know where to start? Here is a step-by-step guide:
a. Permits
Sample collection should comply with local and EU regulations. Before collecting, make sure you hold all the permits required for collecting, exporting and sequencing the species, including permission for open-access data deposition.
i. ABS - Nagoya: The Nagoya Protocol is an international agreement that was adopted in 2010 as a supplementary agreement to the Convention on Biological Diversity. The Nagoya Protocol sets out rules and procedures regarding the access to genetic resources, and the sharing of the benefits derived from their utilization. It also provides guidance on how to ensure the fair and equitable sharing of the benefits arising from the utilization of traditional knowledge associated with genetic resources. The Protocol has been ratified by over 100 countries, and has been widely praised as an effective tool for promoting the conservation and sustainable use of biological diversity.
ii. Sample collection permit: It grants permission to the holder to collect wildlife samples, with the understanding that all samples are used for scientific research and educational purposes only. The holder of this permit must abide by all applicable federal, state, and local laws in the collection of wildlife samples. No sample may be collected without prior approval from the relevant authority. Some general rules:
- Ensuring that all samples collected are properly labeled and documented.
- Providing adequate protection of the samples while in transport.
- Maintaining records of all samples collected.
- Obtaining authorization to transport the samples to the appropriate research facility.
- Returning all unneeded samples to the original site.
- The holder of this permit is responsible for disposing of all unwanted samples in a manner that is respectful to the environment and in compliance with all applicable laws.
- The holder of this permit must provide a copy of the collected samples to the appropriate authorities upon request.
- This permit is valid for one year from the date of issuance.
- The undersigned agrees to abide by the conditions of this permit.
b. Traditional Knowledge and Biocultural labels
It is important to determine whether an indigenous population or a local community is involved in the project or whether the species is of special concern to them. In that case a label should be requested. The Labels allow communities to express local and specific conditions for sharing and engaging in future research and relationships in ways that are consistent with already existing community rules, governance and protocols for using, sharing and circulating knowledge and data. The primary objectives are to enhance and legitimize locally based decision-making and Indigenous governance frameworks for determining ownership, access, and culturally appropriate conditions for sharing historical, contemporary, and future collections of cultural heritage and Indigenous data. For more information check the ‘Local Contexts’ website.
c. Sample Collection
What sampling procedures do you follow in the field? For the generation of reference genomes, the preferred preservation method is flash-freezing in liquid nitrogen. Many organic materials can be stored in liquid nitrogen, including cells, tissue samples and entire individuals. Because liquid nitrogen freezes samples rapidly, it allows researchers to store samples for long periods of time while minimizing DNA/RNA degradation. It is essential to collect and process samples following the requirements specified by the sequencing facility. Species that can be kept alive can be transported to a lab, where the material is processed and flash-frozen immediately after dissection to prevent DNA and RNA degradation. The sample should be dissected on a plate placed on ice, to keep it cold, and then flash-frozen in liquid nitrogen.
d. Taxonomic Validation
Did an expert taxonomist confirm the identity of the collected species? Taxonomic validation is a complex and important process that is necessary for accurately classifying organisms by their physical and genetic traits. There have been cases in which a reference genome was produced and the sequenced specimen later turned out not to be the targeted species. Whenever possible, we recommend DNA barcoding the sample to prevent this from occurring.
e. Vouchering
A voucher specimen is a representative sample of the collected species. A voucher preserves as much as possible of the physical remains of an organism, serving as a verifiable and permanent record of wildlife. It is typically collected in the field and preserved in a herbarium or museum collection. Set the voucher specimen aside separately and take scaled pictures, following the requirements of the respective collection facility.
f. Biobanking
Biobanking refers to the storage of biological samples for research purposes. Animal/plant tissue biobanking is used to track genetic changes over time, which can help understand the evolution of species. Material for biobanking should be deposited in a biobank repository. In addition to tissue biobanking, DNA biobanking is also possible. Ideally, the tissue and DNA come from the same specimen that will be sequenced, but for very small specimens a different individual can be used. Preferably, the material should be deposited in a repository in its country of origin.
g. Storage
Samples must be kept as cold as possible to prevent DNA degradation prior to sequencing. If possible, place the sample tubes on dry ice, in a charged LN2 dry shipper (< -150°C) or in a -80°C freezer. Please note that wet ice and -20°C freezers are not appropriate for the storage of tubes containing samples intended for genome sequencing.
h. Material Transfer Agreements
MTAs are agreements between two parties, typically a provider and a recipient, that govern the transfer of biological samples. MTAs are used to ensure that the provider of the material is adequately compensated for the use of the material, and that the recipient of the material is legally and ethically responsible for its use. Sample providers should be aware of any MTA, for example when sending biological material between their research facility and sequencing centres. More information can be found in the CETAF Code of Conduct and Best Practices (Example MTA without change in ownership).
i. Shipping
All samples must be shipped on dry ice or in a dry shipper. Please make sure that the courier refills the dry ice at borders and as often as needed during transit. Pay particular attention to the regulations of European countries outside the EU.
j. ERGA reduced manifest
Do you wish to learn what metadata you need to submit with your sample in order to register it with ERGA as an ERGA Satellite genome? This is the ERGA Reduced manifest. Fields marked in bold are the mandatory variables.
k. ENA mandatory fields:
The European Nucleotide Archive (ENA) operates as a public archive for nucleotide sequence data. This is the ENA checklist of minimum requirements to register a physical sample.
Did you acquire the samples and you are ready to extract DNA and RNA?
DNA
a. DNA extraction protocols: DNA extraction is the process of isolating DNA from cells, tissues or other biological samples.
b. High Molecular Weight (HMW) DNA extraction protocols: Please see the Libraries Preparation section below for a list of recommended protocols for extracting and preparing HMW DNA for sequencing.
RNA extraction protocols
RNA extraction involves separating ribonucleic acid (RNA) from a cell or a tissue sample. The Libraries Preparation section details some recommendations for sequencing RNA in order to annotate your reference genome.
DNA concentration, integrity, and purity
i. DNA concentration: typically measured in nanograms per microliter; it can be determined using techniques such as Qubit assays.
ii. DNA integrity: a measure of the quality of the DNA. DNA integrity can be determined using gel electrophoresis or PCR-based methods. It is important to ensure that the DNA is intact and not degraded, as this can affect the accuracy of results. The Bioanalyzer is the most commonly used instrument.
iii. DNA purity: a measure of the level of impurities in DNA samples. It is important to ensure that DNA is free from contaminants, as this can affect the accuracy of results. DNA purity can be assessed using spectrophotometry-based methods such as NanoDrop.
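As a rough illustration of how spectrophotometer readings might be screened before library preparation, the sketch below flags absorbance ratios that fall outside commonly cited ranges for clean DNA (A260/A280 around 1.8 and A260/A230 around 2.0-2.2). The function name and the exact acceptance windows are assumptions made for this example, not ERGA requirements; always follow the thresholds given by your sequencing facility.

```python
from typing import List


def assess_purity(a260_a280: float, a260_a230: float) -> List[str]:
    """Return warnings for absorbance ratios outside the assumed windows."""
    warnings = []
    # Assumed window around the commonly cited ~1.8 ratio for pure DNA.
    if not 1.7 <= a260_a280 <= 2.0:
        warnings.append(f"A260/A280 = {a260_a280:.2f} is outside 1.7-2.0")
    # Assumed window around the commonly cited 2.0-2.2 range.
    if not 2.0 <= a260_a230 <= 2.2:
        warnings.append(f"A260/A230 = {a260_a230:.2f} is outside 2.0-2.2")
    return warnings


if __name__ == "__main__":
    # Hypothetical readings for two extractions.
    for name, r280, r230 in [("sample_A", 1.85, 2.10), ("sample_B", 1.50, 1.20)]:
        issues = assess_purity(r280, r230)
        print(name, "OK" if not issues else "; ".join(issues))
```

A clean result here only speaks to purity; concentration (e.g. Qubit) and integrity still need to be checked separately.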
You’ve extracted your DNA and are wondering how to turn it into the sequencing data needed to assemble or annotate your genome? Library preparation is a key step in the sequencing process: the quality of your libraries will determine the quality of your assembly and annotation, so the DNA must be processed properly for accurate and reliable results to be obtained. Here you can find our recommended library preparation protocols for PacBio, Oxford Nanopore Technologies, Chromatin Conformation Capture (Hi-C) sequencing and whole-transcript sequencing, among others. We also have a list of ERGA-affiliated Sequencing Centres that are experienced with the various stages of sequencing required to create a Reference-Quality Genome.
PacBio HiFi
PacBio HiFi reads are typically 10-15 kb long with an accuracy of over 99%. They are produced by circularising DNA fragments and sequencing each molecule repeatedly to build a high-accuracy Circular Consensus Sequence (CCS). This protocol has a track record of producing high-quality de novo reference genomes for a wide range of species.
ONT
Oxford Nanopore Technologies offers an alternative way to read long pieces of DNA, via the electrical fluctuations caused by nucleotides passing through a membrane pore. The reads sequenced here can be much longer than with PacBio HiFi (typically over 30 kb, and ultra-long library protocols are established for reads of over 200 kb in length) but come with a higher error rate. As the hardware and base-calling software have improved over time, the modal error rate has fallen from over 15% to close to 1%. https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13588
Hi-C Arima or Dovetail genomics
3-dimensional Chromatin Conformation Capture libraries allow us to gain insight into the organisation of the genome into Topologically Associating Domains (TADs), eu- and heterochromatin, and chromosomes. In the generation of a reference genome, we leverage the fact that regions close together in the linear sequence are more likely to be close together in 3D space to order and orient our smaller assembled sequences (contigs and scaffolds) into chromosomes. Hi-C protocols generally follow the steps of isolating nuclei, cross-linking chromatin in its 3D conformation, digesting the DNA at either enzyme motif sites (Arima) or DNase-exposed areas of the genome (Dovetail), and then sequencing the two cross-linked regions via paired-end sequencing on an Illumina device.
Illumina short reads
Useful for error-correction of the final assembly, or identifying sequences from parental lines when performing a trio-binned assembly, Whole Genome Sequencing (WGS or Shotgun Sequencing) aims to sequence the entire genome in short fragments (typically 100/150bp paired-end libraries) with high accuracy (Q30 or 99.9% accuracy).
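The "Q30 or 99.9% accuracy" equivalence follows from the standard Phred scale, where a quality score Q corresponds to an error probability of 10^(-Q/10). The small sketch below converts in both directions; the function names are chosen for this example only.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score Q."""
    return 1.0 - 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred quality score implied by a per-base error probability p."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(f"Q{q}: {phred_to_accuracy(q):.4%} accuracy")
    # A 1% error rate, as quoted for recent nanopore reads, in Phred terms.
    print(f"1% error rate: Q{error_rate_to_phred(0.01):.0f}")
```

The same conversion applies to the long-read figures quoted above: a 1% error rate corresponds to Q20, while Q30 corresponds to one error per 1,000 bases.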
RNA-seq
Recommended to support the annotation of your reference genome. RNA-seq libraries are typically sequenced on an Illumina instrument after RNA has been extracted from your tissues of interest (usually brain or gonad for genome annotation), converted to cDNA and finally amplified before loading onto the instrument.
Iso-seq
The PacBio Iso-seq protocol offers full-length sequencing of transcripts, which is particularly powerful when annotating alternate isoforms in the genome. The sequencing is performed on a PacBio instrument and again leverages the repeated sequencing of circular cDNA to create a high-accuracy consensus sequence for each transcript.
You have produced a genome assembly and want to associate it with ERGA as a Satellite genome? Here we detail the next steps required to obtain the ERGA label for your genome and some recommendations for what to do next as part of our best practices:
How do I know if my assembly is good enough?
First, your assembly should meet the EBP metrics. The Sequencing and Assembly Committee (SAC) will be able to guide you through the post-assembly QC process: either submit an EAR or present your genome at a SAC meeting.
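Contiguity is one of the quantities usually reported in post-assembly QC, most often as an N50 value. As a minimal sketch (not the SAC's own tooling), the code below computes N50 from the sequence lengths in a FASTA file. The function names are assumptions for this example, and the thresholds an assembly has to meet are not reproduced here; consult the EBP metrics for those.

```python
from typing import Iterable, List


def sequence_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Collect the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    demo = [">contig1", "ACGTACGT", ">contig2", "ACGT", ">contig3", "AC"]
    print(n50(sequence_lengths(demo)))
```

For a real assembly, the lines would come from an open file handle, for example `n50(sequence_lengths(open("assembly.fa")))`, where `assembly.fa` is a placeholder file name.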
Open-access genomes for all
If you have a high-quality genome and want to associate it with ERGA, it needs to be in the public domain. We recommend uploading your genome to ENA, where you will be able to link your genome to the ERGA BioProject.
“An assembly is nothing without an annotation”
After you have produced a reference-quality genome assembly, you should think about annotating the key features of your genome. This includes, but is not limited to, finding and recording the locations of:
- Repeat sequences, or Transposable Elements
- Telomeres and Centromeres
- Protein-coding sequences
- MicroRNA sequences (miRNA)
- Non-coding RNA sequences (ncRNA)
The Annotation Committee can guide you through some of these steps. For ERGA Satellite genomes, we also recommend uploading your genome to ENA, where Ensembl will annotate it using publicly available transcript data.
- Present to SAC/perform an EAR
- ERGA-suggested genome annotation workflows
- I need help with my annotation - present to Annotation committee
You have an assembly and annotation that you wish to associate with ERGA as a Satellite genome? Here we detail the next steps required to obtain the ERGA label for your genome and some recommended next steps:
How do I know if my assembly and annotation are good enough?
First, your assembly should meet the EBP metrics. The Sequencing and Assembly Committee (SAC) will be able to guide you through the post-assembly QC process: either submit an EAR or present your genome at a SAC meeting. Your annotation should be in a format that can be downloaded and used by all (e.g. GFF3) and linked to your assembly.
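One simple way to check that an annotation is actually linked to its assembly is to confirm that every sequence ID used in the GFF3 file also appears as a FASTA header in the assembly. The sketch below does only that, relying on the GFF3 convention of nine tab-separated columns with the sequence ID in the first; the function names are assumptions for this example, and it is not a full GFF3 validator.

```python
from typing import Iterable, List, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Sequence IDs (first word after '>') found in FASTA-formatted lines."""
    ids: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(gff3_lines: Iterable[str]) -> Set[str]:
    """Sequence IDs (column 1) used by feature lines in a GFF3 file."""
    ids: Set[str] = set()
    for number, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            raise ValueError(f"line {number}: expected 9 columns, got {len(columns)}")
        ids.add(columns[0])
    return ids


def unlinked_seqids(fasta_lines: Iterable[str], gff3_lines: Iterable[str]) -> List[str]:
    """GFF3 sequence IDs that have no matching record in the assembly."""
    return sorted(gff3_seqids(gff3_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    assembly = [">chr1 example", "ACGT", ">chr2", "GG"]
    annotation = [
        "##gff-version 3",
        "chr1\t.\tgene\t1\t4\t.\t+\t.\tID=gene1",
        "chr3\t.\tgene\t1\t2\t.\t+\t.\tID=gene2",
    ]
    print(unlinked_seqids(assembly, annotation))
```

An empty result means every annotated sequence exists in the assembly; any IDs returned point to a naming mismatch that should be resolved before upload.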
How do I get the ERGA label?
You need to upload your assembly (and annotation) to ENA in order to be associated with the ERGA BioProject. Our upload guide can be found here with steps on how to format your assembly and annotation.
Publishing my genome
You may also wish to publish your assembly and annotation in a Genome Report. Here you can find scripts to generate an ERGA Genome Report, which can be linked to your submission.
What next?
Now that you have a high-quality genome, there is a host of analyses that can be performed, including Population Genomics, Phylogenomics, Comparative Genomics and Functional Genomics. The Data Analysis Committee (DAC) has produced a guide on how to conduct a variety of Downstream Analyses.
- How to upload your genome/annotation to ENA & link it to ERGA (Reduced Manifest)
- I need help analysing my genome - contact the DAC (DAC SOPs)