Genome Annotation

Galaxy Australia supports genome annotation with the FgenesH++ annotation tool. Users need to apply for access to this tool: please see the service notes here and apply for access here.

This How-to-Guide describes the steps required to annotate your genome on the Galaxy Australia platform (see Fig 1). It was developed in consultation between the Bioplatforms Australia Threatened Species Initiative, Galaxy Australia, and Australian BioCommons.

If you need help, the Galaxy community is both approachable and helpful. Ask them questions!

Quick start guide

  1. Log in to Galaxy Australia
  2. Apply for access to FgenesH++
  3. Create a new history
  4. Upload your assembled reference genome, repeat masked reference genome, .cdna, .pro and .dat files from the transcriptome workflow
  5. Load and execute workflows, using required options
  6. Review workflow report and perform additional QC as needed
  7. Re-run workflows, or individual tools, as needed
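
If you would rather script steps 3 and 4 than use the web interface, the sketch below shows one possible approach with the third-party BioBlend library. It is not part of the published workflow. It assumes that your transcriptome outputs carry the literal suffixes .cdna, .pro and .dat, that you have a Galaxy API key for your account, and that https://usegalaxy.org.au is the server you are using. The function names `plan_uploads` and `upload_inputs` are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# Suffix -> role, taken from the input list in this guide.
# Assumption: the transcriptome outputs keep these literal suffixes.
ROLE_BY_SUFFIX = {".cdna": "cDNA", ".pro": "protein", ".dat": "dat"}


def plan_uploads(genome, masked_genome, transcriptome_files):
    """Return {role: Path} for the five inputs; raise if one is missing."""
    plan = {
        "reference genome": Path(genome),
        "masked reference genome": Path(masked_genome),
    }
    for name in transcriptome_files:
        path = Path(name)
        role = ROLE_BY_SUFFIX.get(path.suffix.lower())
        if role is not None:
            plan[role] = path
    missing = [role for role in ROLE_BY_SUFFIX.values() if role not in plan]
    if missing:
        raise ValueError(f"missing transcriptome outputs: {', '.join(missing)}")
    return plan


def upload_inputs(plan, api_key, history_name="Genome annotation",
                  url="https://usegalaxy.org.au"):
    """Create a new history and upload every planned file (needs network)."""
    # Imported here so the planning step works without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance  # pip install bioblend

    gi = GalaxyInstance(url=url, key=api_key)
    history = gi.histories.create_history(name=history_name)
    for path in plan.values():
        gi.tools.upload_file(str(path), history["id"], file_name=path.name)
    return history["id"]
```

Calling `plan_uploads` first catches a missing .cdna, .pro or .dat file before anything is sent to the server. Access to FgenesH++ (step 2) still has to be requested separately.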

How to cite the workflow

Silver, L. (2024). Fgenesh annotation -TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.881.4

The overall workflow

Fig 1. The approach described in this How-to-Guide, including Quick Start guide steps 1) registration, 2) upload of input files, and 3) FgenesH++ genome annotation. Required workflow steps are blue, and optional steps are red.

The different elements of this approach are summarised below:

| Process name | Workflow name | Description | Inputs | Outputs |
| --- | --- | --- | --- | --- |
| UPLOAD FILES | Not applicable | See the different upload options. | Reference genome, repeat masked reference genome, .cdna, .pro and .dat files from transcriptome assembly workflow | Uploaded data! |
| GENOME ANNOTATION | Fgenesh annotation-TSI | Splitting of reference and masked reference genome files, annotation of genome using FgenesH++, merging of annotation files and extraction of mRNA, CDS and protein sequences | Reference genome, repeat masked reference genome, .cdna, .pro and .dat files from transcriptome assembly workflow | GFF3 of annotated genes, fasta file of mRNA, CDS and protein sequences which were annotated |
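
As one example of additional QC on the outputs listed above, the following minimal sketch counts the feature types in a GFF3 file. It relies only on the general GFF3 layout (nine tab-separated columns, with the feature type in column 3). The exact feature types written by the workflow are not stated in this guide, so the ones mentioned here are assumptions, as is the function name `count_feature_types`.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count column-3 feature types, skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(
                f"expected 9 tab-separated columns, got {len(fields)}"
            )
        counts[fields[2]] += 1
    return counts
```

To use it, download the GFF3 from your history and pass an open file handle, for example `count_feature_types(open("annotation.gff3"))`, where the file name is a placeholder. Comparing the number of mRNA features with the number of sequences in the mRNA and protein fasta outputs is a quick consistency check.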

In-depth workflow guide

Register and log in

  1. To register for Galaxy Australia, visit the login page.
  2. Click the Register here link, as shown in Fig 2.
  3. Complete the registration wizard and click Create.
  4. Log in to your account!

Fig 2. Log-in / registration menu for Galaxy Australia.

Upload data file(s)

  1. In Galaxy Australia, create a new history and click on Upload Data.
  2. Choose local files (see Fig 3).

Fig 3.

  3. Upload your assembled reference genome and masked reference genome (see the repeat masking workflow), as well as the .cdna, .pro and .dat outputs from your transcriptome assembly. Note: Softberry recommends that the genome be hard-masked rather than soft-masked.
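
Before uploading, you may want to confirm how your masked genome is actually masked. The sketch below is a minimal check, not part of the workflow. It relies on the usual conventions that soft-masked bases are lowercase and hard-masked bases are replaced with N. The function names `masking_summary` and `hard_mask` are hypothetical.

```python
def masking_summary(fasta_lines):
    """Count total bases, lowercase (soft-masked) bases and uppercase N bases."""
    total = soft = hard = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        soft += sum(1 for base in line if base.islower())
        hard += line.count("N")
    return {"total": total, "soft_masked": soft, "hard_masked": hard}


def hard_mask(fasta_lines):
    """Yield lines with lowercase bases replaced by N; headers are unchanged."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)
```

A non-zero `soft_masked` count suggests the file is soft-masked, and `hard_mask` shows how it could be converted. Note that the `hard_masked` count also includes any N bases from assembly gaps, so treat it as an upper bound on masked repeat content.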

Run the annotation workflow

  1. Make sure you are logged into Galaxy Australia
  2. Visit this link to:
    • retrieve the workflow for genome annotation, and
    • import into your Galaxy Australia workflows
  3. Once you have reached the workflow screen, select the play button for the annotation workflow (see Fig 4).

Fig 4.

  4. The workflow invocation window will open.
  5. Select your assembled reference genome fasta file (Fig 5).
  6. Select your repeat masked reference genome fasta file (Fig 5).

Fig 5.

  7. Select the matrix of a closely related species (full list of matrices) (Step 1 in Fig 6).
  8. Select the appropriate reference database (Step 2 in Fig 6).
  9. Select the appropriate NR database (TSI animals or TSI plants) (Step 3 in Fig 6).

Fig 6.

  10. Select your cDNA (Step 1 in Fig 7), protein (Step 2 in Fig 7) and .dat files (Step 3 in Fig 7).

Fig 7.

  11. Choose your reference genome as the sequence file for tasks 6 and 7 (Steps 1 and 2 in Fig 8).
  12. Choose the most appropriate lineage to run BUSCO on the protein output file (Step 1 in Fig 9).

Fig 8.

Fig 9.
