
Genome-assembly: Genome scaffolding with Hi-C on Galaxy Australia

This How-to-Guide describes the steps required to scaffold your genome on the Galaxy Australia platform using a HiC scaffolding workflow developed by the Vertebrate Genomes Project and modified by the Galaxy Australia team in consultation with the Bioplatforms Australia Threatened Species Initiative and Australian BioCommons.

This workflow has been created from a Vertebrate Genomes Project (VGP) scaffolding workflow.

  • For more information about the VGP project see the Galaxy-VGP project page.
  • The VGP scaffolding workflow is hosted at WorkflowHub.
  • Some minor changes have been made to better fit TSI project data: the optional SAK info and sequence graph inputs have been removed; the required input format for the genome has changed from GFA to FASTA; and the estimated genome size is now entered by the user rather than extracted from the output of a previous workflow.

Please see the HiC Scaffolding section in the VGP assembly tutorial for additional information about this workflow.

Note: If you initially assembled the genome with HiFi data only and you now have HiC data, consider re-assembling the genome with the VGP HiFi-HiC assembly pipeline, which can give better results than using HiFi data alone.

Register and log in

  1. To register for Galaxy Australia, visit the login page.
  2. Click the Register here link.
  3. Complete the registration wizard and click Create.
  4. Log in to your account.

Upload data files

Please see the how-to guide for HiFi genome assembly for additional information about uploading data from the Bioplatforms Australia Data Portal.

  • In Galaxy Australia, create a new history
  • Import data
    • assembly.fasta. This file may be in one of your Galaxy histories.
    • HiC data: concatenated HiC_F.fastqsanger.gz, concatenated HiC_R.fastqsanger.gz
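If your HiC reads arrive as several fastq.gz files per direction, they need to be joined into one forward file and one reverse file before upload. The following is a minimal sketch, assuming you do this on your own computer before uploading; the function name and the file names in the comments are hypothetical. It relies on the fact that gzip files can be joined byte-for-byte and still decompress as one stream. List the forward and reverse parts in the same order so that read pairs stay aligned.

```python
import shutil
from pathlib import Path


def concatenate_gzip(parts, destination):
    """Append each gzip file in `parts` to `destination`, in the order given.

    Gzip allows several compressed members in one file, so a plain byte-wise
    join produces a valid .gz file without decompressing anything.
    """
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(destination)


# Hypothetical usage: keep the lane order identical for both directions.
# concatenate_gzip(["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"], "HiC_F.fastqsanger.gz")
# concatenate_gzip(["lane1_R2.fastq.gz", "lane2_R2.fastq.gz"], "HiC_R.fastqsanger.gz")
```

Galaxy also offers tools for concatenating datasets inside a history, which may be more convenient if the files are already uploaded.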

Import the scaffolding workflow

Please see the how-to guide for HiFi genome assembly for additional information about how to import and run workflows.

  • Visit this link to:
    • Retrieve the workflow for TSI-Scaffolding-with-HiC
    • Import into your Galaxy Australia workflows

Run the workflow

  • Click on the Workflow tab, find this workflow and click on the triangle run button.
  • Add in the required inputs:
    • assembly.fasta
    • restriction enzymes
    • HiC forward and reverse reads: one concatenated file for each direction, in fastqsanger.gz format
    • Estimated genome size as integer
    • Lineage for BUSCO
  • Click Run
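The estimated genome size must be entered as a plain integer, whereas genome sizes are often quoted in forms such as "1.2 Gb". The sketch below shows one way to do the conversion; the helper name is hypothetical, and it assumes the value is wanted in bases with decimal (not binary) prefixes.

```python
import re

# Decimal multipliers for the unit prefixes commonly used for genome sizes.
_UNITS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}


def genome_size_to_int(text):
    """Convert a size such as '1.2 Gb', '850Mb' or '3100000000' to an integer."""
    match = re.fullmatch(
        r"\s*([0-9]*\.?[0-9]+)\s*([kmg]?)(?:b|bp)?\s*", text.lower()
    )
    if match is None:
        raise ValueError(f"Cannot interpret genome size: {text!r}")
    value, unit = match.groups()
    # round() guards against floating-point noise in values such as 1.2 * 10**9.
    return round(float(value) * _UNITS[unit])
```

For example, `genome_size_to_int("1.2 Gb")` gives `1200000000`, which is the form the workflow input expects.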

What the workflow does

Step | Inputs | Tool | Outputs
--- | --- | --- | ---
Map HiC reads to genome | assembled_genome.fasta, HiCR1.fastqsanger.gz, HiCR2.fastqsanger.gz | BWA-MEM2 | HiCR1.bam, HiCR2.bam
Merge bams | HiCR1.bam, HiCR2.bam | Filter and merge | HiC.bam
Make pre-scaffolding Pretext map | HiC.bam | Pretext map | Pretext map output
Make pre-scaffolding Pretext map snapshot | Pretext map output | Pretext snapshot | HiC contact map (view)
Scaffold | assembled_genome.fasta, HiC.bam | YaHS | scaffolded_assembly.fasta
Map HiC reads to scaffold | scaffolded_assembly.fasta, HiCR1.fastqsanger.gz, HiCR2.fastqsanger.gz | BWA-MEM2 | HiCR1scaffold.bam, HiCR2scaffold.bam
Merge bams | HiCR1scaffold.bam, HiCR2scaffold.bam | Filter and merge | HiCscaffold.bam
Make post-scaffolding Pretext map | HiCscaffold.bam | Pretext map | Pretext map output scaffold
Make post-scaffolding Pretext map snapshot | Pretext map output scaffold | Pretext snapshot | HiC contact map scaffold (view, compare to pre-scaffold map)
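The order of the steps above follows from their data dependencies: each step consumes files supplied by the user or produced by an earlier step. The sketch below is only an illustration of that structure, not how Galaxy represents the workflow internally; the step and file names are copied from the table, and the function name is hypothetical.

```python
# Files the user supplies when launching the workflow.
USER_INPUTS = {
    "assembled_genome.fasta",
    "HiCR1.fastqsanger.gz",
    "HiCR2.fastqsanger.gz",
}

# (step name, inputs, outputs), in the order the table lists them.
STEPS = [
    ("Map HiC reads to genome",
     ["assembled_genome.fasta", "HiCR1.fastqsanger.gz", "HiCR2.fastqsanger.gz"],
     ["HiCR1.bam", "HiCR2.bam"]),
    ("Merge bams", ["HiCR1.bam", "HiCR2.bam"], ["HiC.bam"]),
    ("Make pre-scaffolding map", ["HiC.bam"], ["Pretext map output"]),
    ("Make pre-scaffolding snapshot", ["Pretext map output"], ["HiC contact map"]),
    ("Scaffold", ["assembled_genome.fasta", "HiC.bam"], ["scaffolded_assembly.fasta"]),
    ("Map HiC reads to scaffold",
     ["scaffolded_assembly.fasta", "HiCR1.fastqsanger.gz", "HiCR2.fastqsanger.gz"],
     ["HiCR1scaffold.bam", "HiCR2scaffold.bam"]),
    ("Merge scaffold bams", ["HiCR1scaffold.bam", "HiCR2scaffold.bam"], ["HiCscaffold.bam"]),
    ("Make post-scaffolding map", ["HiCscaffold.bam"], ["Pretext map output scaffold"]),
    ("Make post-scaffolding snapshot",
     ["Pretext map output scaffold"], ["HiC contact map scaffold"]),
]


def check_dataflow(steps, user_inputs):
    """Return all available files, or raise if a step needs a file not yet made."""
    available = set(user_inputs)
    for name, inputs, outputs in steps:
        missing = [item for item in inputs if item not in available]
        if missing:
            raise ValueError(f"Step {name!r} is missing inputs: {missing}")
        available.update(outputs)
    return available
```

Calling `check_dataflow(STEPS, USER_INPUTS)` succeeds for the order shown, which makes the point that the same HiC reads are mapped twice: once to the original assembly and once to the scaffolded result.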

Check the outputs

The main outputs are:

  • scaffolded_assembly.fasta
  • comparison of pre- / post- scaffolding contact maps

For more information about what these outputs mean, please see the HiC Scaffolding section in the VGP assembly tutorial.
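Alongside the contact maps, a quick numerical check is to compare sequence counts and N50 before and after scaffolding. The sketch below is not part of the workflow; it assumes you have downloaded both FASTA files uncompressed, and the function names are hypothetical. Scaffolding typically yields fewer, longer sequences, so the sequence count would usually drop and N50 rise.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical usage with local copies of the workflow input and output:
# before = read_fasta_lengths("assembled_genome.fasta")
# after = read_fasta_lengths("scaffolded_assembly.fasta")
# print(len(before), n50(before), len(after), n50(after))
```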

Acknowledgements

Workflow citation:

Syme, A., Silver, L., based on VGP Project (2024). TSI-Scaffolding-with-HiC (based on VGP-HiC-scaffolding). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1054.1
