Frequently Asked Questions

What metadata does the Genome Engine require?

The list of Bioplatforms metadata fields used in the Genome Engine is available here. The list contains fields which are collected or defined during sampling, sequencing, or internal Bioplatforms processing steps. Fields with controlled vocabularies or other value or format constraints are designated. These fields have been selected to comply with ENA and broader Tree of Life standards to sufficiently document provenance and facilitate interoperability and reusability.

Controlled vocabulary terms are listed here. Use of controlled vocabulary terms is necessary for accurate data filtering and for compliance with ENA and Tree of Life standards.

What do I do if my organism doesn’t have a taxon ID?

All data ingested by the Genome Engine must have a valid NCBI taxon ID.

For taxa not yet incorporated into the NCBI taxonomy, you will need to make a submission for a new/temporary taxon. Instructions for requesting a new taxon are available in ENA’s documentation.

If the taxon has not yet been formally described, you can request a new taxon ID using an informal name. Once a formal classification has been published, the naming information for that taxon ID can be updated.

What attribution or authorship information will be brokered to ENA/INSDC?

The project lead listed in the Bioplatforms metadata will be entered in the ‘center name’ field for each record. This field is used by broker accounts to designate the identity of the party on whose behalf the data are being deposited. The ‘center name’ will be visible on the record pages in ENA, and will be included in downloaded metadata.

What quality assembly can I expect? Will it meet the Earth BioGenome standards?

We aim to generate assemblies that meet the Earth BioGenome Project (EBP) assembly standards where possible. These standards stipulate multiple criteria, but they are often represented in shorthand as 6.C.Q40, equating to a contig N50 of 1 Mb or above, a scaffold N50 at chromosomal scale, and an error rate below 1 in 10,000. However, the Genome Engine does not mandate manual curation, which is required to verify a chromosome-level assembly. Additionally, the Genome Engine will generate contig-level assemblies if no scaffolding data (e.g. Hi-C) are available. Hence, assemblies will be assessed against the applicable metrics: contig N50 of at least 1 Mb, QV of at least 40, less than 5% false duplications, greater than 90% k-mer completeness, and greater than 90% single-copy conserved genes according to BUSCO. Only assemblies that meet these criteria will be automatically deposited into ENA.
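As an illustration only, the applicable metrics listed above can be written as a simple check. This is a minimal sketch, not the Genome Engine's actual implementation: the `AssemblyMetrics` class, its field names, and the `failed_criteria` function are hypothetical, and 1 Mb is assumed to mean 1,000,000 bp. The `qv_to_error_rate` helper uses the standard Phred relationship, under which a QV of 40 corresponds to an error rate of 1 in 10,000.

```python
from dataclasses import dataclass


@dataclass
class AssemblyMetrics:
    """Hypothetical container for the metrics named in this FAQ."""
    contig_n50_bp: int            # contig N50 in base pairs
    qv: float                     # Phred-scaled consensus quality value
    false_duplication_pct: float  # false duplications, as a percentage
    kmer_completeness_pct: float  # k-mer completeness, as a percentage
    busco_single_copy_pct: float  # single-copy conserved genes (BUSCO), as a percentage


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error rate (QV 40 -> 1e-4)."""
    return 10 ** (-qv / 10)


def failed_criteria(m: AssemblyMetrics) -> list[str]:
    """Return the names of the criteria that the assembly does not meet."""
    checks = {
        # Assumption: 1 Mb is taken as 1,000,000 bp.
        "contig N50 >= 1 Mb": m.contig_n50_bp >= 1_000_000,
        "QV >= 40": m.qv >= 40,
        "false duplications < 5%": m.false_duplication_pct < 5,
        "k-mer completeness > 90%": m.kmer_completeness_pct > 90,
        "BUSCO single copy > 90%": m.busco_single_copy_pct > 90,
    }
    return [name for name, passed in checks.items() if not passed]
```

An empty list from `failed_criteria` would correspond to an assembly that meets every applicable metric; any names returned indicate which metrics would need attention during manual curation.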

The genome sequence generated by the Genome Engine will be made available, along with quality metrics, to researchers prior to brokering, providing the opportunity for manual curation. Researchers can choose to further assess and curate genomes to meet the full EBP standards.

What happens if my assembly “fails”/is rejected?

If the assembly does not pass the applicable EBP metrics, it will not be brokered automatically; instead, it will be provided to you for manual curation. You will have the opportunity to adjust the assembly to improve its quality metrics. If the revised metrics meet the minimum values specified above, the Genome Engine can proceed with brokering.

We understand that generating an assembly that meets the specified quality criteria may not be feasible for all taxa, for reasons such as small organism size or scarcity of sample material due to threatened species status. If you believe that a genome is of the best quality that can reasonably be expected for a taxon, even if it does not meet the applicable EBP minimum metrics, we can override the usual quality requirements and proceed with brokering the assembly to ENA.
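The decision described in this answer can be summarised in a few lines. This is a sketch under stated assumptions rather than the Genome Engine's real logic: the `Outcome` enum, the `brokering_outcome` function, and the `override_approved` flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcomes for an assembly after quality assessment."""
    BROKER = "broker to ENA"
    MANUAL_CURATION = "provide to researcher for manual curation"


def brokering_outcome(meets_metrics: bool, override_approved: bool = False) -> Outcome:
    """Decide what happens to an assembly.

    meets_metrics: True if the applicable EBP minimum metrics are met.
    override_approved: True if the quality requirements have been overridden,
        e.g. because the assembly is the best that can be expected for the taxon.
    """
    if meets_metrics or override_approved:
        return Outcome.BROKER
    return Outcome.MANUAL_CURATION
```

In this sketch an assembly that is returned for manual curation would simply be re-assessed after adjustment, and `brokering_outcome` called again with the updated result.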

Can I check the assembly before it is released?

Genome assemblies will be made available to researchers to provide the opportunity for testing, manual curation and quality control prior to publishing.

What type of input data does the Genome Engine use?

Our core input data type is PacBio HiFi reads, with optional Hi-C reads for scaffolding and haplotype resolution. We can also use Oxford Nanopore R10+ reads for primary assembly, and ultra-long reads for scaffolding in combination with HiFi reads.

Can I use the Genome Engine if I’m not a member of a Framework project?

Yes. You will need to submit your raw data to an INSDC database and follow our metadata guidelines. From there, the Genome Engine can ingest your data and perform the assembly. Please contact us first if you would like to do this.