This How-to Guide describes the steps required to scaffold your genome on the Galaxy Australia platform using a HiC scaffolding workflow developed by the Vertebrate Genomes Project (VGP) and modified by the Galaxy Australia team in consultation with the Bioplatforms Australia Threatened Species Initiative (TSI) and Australian BioCommons.
This workflow has been created from a Vertebrate Genomes Project (VGP) scaffolding workflow.
- For more information about the VGP project, see the Galaxy-VGP project page.
- The VGP scaffolding workflow is hosted at WorkflowHub.
- Some minor changes have been made to better fit TSI project data: the optional inputs of SAK info and sequence graph have been removed; the required input format for the genome has changed from GFA to FASTA; and the estimated genome size must now be entered by the user rather than being extracted from the output of a previous workflow.
Please see the HiC Scaffolding section in the VGP assembly tutorial for additional information about this workflow.
Note: If you initially assembled the genome with HiFi data only and now have new HiC data, you may wish to consider re-assembling the genome with the VGP HiFi-HiC assembly pipeline, which can give better results than using HiFi data alone.
Register and login
- To register for Galaxy Australia, visit the login page.
- Click the "Register here" link.
- Complete the registration wizard and click "Create".
- Log in to your account.
Upload data files
Please see the how-to guide for HiFi genome assembly for additional information about uploading data from the Bioplatforms Australia Data Portal.
- In Galaxy Australia, create a new history
- Import data:
  - Genome assembly: assembly.fasta. This file may already be in one of your Galaxy histories.
  - HiC data: concatenated HiC_F.fastqsanger.gz and concatenated HiC_R.fastqsanger.gz
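If your HiC reads arrived as several files per direction (for example, one pair per sequencing lane), they must be combined into one forward and one reverse file before upload. The following is a minimal Python sketch, not part of the Galaxy workflow, that assumes the lane files sit in one local directory and follow a hypothetical `<prefix>_R1.fastq.gz` / `<prefix>_R2.fastq.gz` naming pattern; adjust the pattern to your own file names. It relies on the fact that a gzip file may hold several compressed members back to back, so the parts can be joined as raw bytes without decompressing them. Each reverse file is derived from its forward partner so that both outputs list the lanes in the same order and mate pairs stay matched.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def concatenate_gz(parts: list[Path], out_path: Path) -> Path:
    """Byte-concatenate gzipped FASTQ files into one multi-member gzip file.

    A gzip file may contain several compressed members one after another,
    so the parts are copied as raw bytes without being decompressed.
    """
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def build_hic_pairs(read_dir: Path, out_dir: Path) -> tuple[Path, Path]:
    """Combine per-lane HiC files into one forward and one reverse file.

    Hypothetical naming pattern: each lane has <prefix>_R1.fastq.gz and a
    matching <prefix>_R2.fastq.gz in read_dir.
    """
    forward = sorted(read_dir.glob("*_R1.fastq.gz"))
    if not forward:
        raise ValueError(f"no *_R1.fastq.gz files found in {read_dir}")
    # Derive each reverse file from its forward partner so both outputs
    # list the lanes in exactly the same order and mates stay paired.
    reverse = []
    for fwd in forward:
        mate = fwd.with_name(fwd.name.replace("_R1.fastq.gz", "_R2.fastq.gz"))
        if not mate.exists():
            raise FileNotFoundError(f"missing reverse-read file for {fwd.name}")
        reverse.append(mate)
    f_out = concatenate_gz(forward, out_dir / "HiC_F.fastqsanger.gz")
    r_out = concatenate_gz(reverse, out_dir / "HiC_R.fastqsanger.gz")
    return f_out, r_out
```

For example, `build_hic_pairs(Path("raw_hic"), Path("."))` would write HiC_F.fastqsanger.gz and HiC_R.fastqsanger.gz to the current directory, ready to upload to your new history.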
Import the scaffolding workflow
Please see the how-to guide for HiFi genome assembly for additional information about how to import and run workflows.
- Visit this link to:
  - Retrieve the workflow for TSI-Scaffolding-with-HiC
  - Import it into your Galaxy Australia workflows
Run the workflow
- Click on the "Workflow" tab, find this workflow and click the triangle run button.
- Add the required inputs:
  - assembly.fasta
  - Restriction enzymes
  - HiC forward and reverse reads: these need to be a single concatenated file for each set, in fastqsanger.gz format
  - Estimated genome size, as an integer
  - Lineage for BUSCO
- Click "Run".
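Two of the inputs are typed into form fields, so it can help to prepare them beforehand. This is a minimal Python sketch under two assumptions that you should check against the field descriptions on the workflow run form: (i) the estimated genome size is a plain count of base pairs, such as 1200000000 for a 1.2 Gb genome, and (ii) the restriction enzymes field takes the enzymes' recognition motifs as one comma-separated string. The helper names `genome_size_to_int` and `join_motifs` are hypothetical and not part of Galaxy.

```python
from __future__ import annotations

import re
from decimal import Decimal

# Multipliers for the unit suffixes this sketch accepts (case-insensitive).
# Assumption: the workflow wants the genome size in base pairs.
_UNITS = {"": 1, "bp": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9}


def genome_size_to_int(text: str) -> int:
    """Turn '1.2 Gb', '850Mb' or '3,100,000' into a whole number of base pairs."""
    match = re.fullmatch(r"\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse genome size {text!r}")
    number, unit = match.group(1).replace(",", ""), match.group(2).lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit {match.group(2)!r}")
    # Decimal avoids floating-point error, so 1.2 * 10**9 stays exact.
    return int((Decimal(number) * _UNITS[unit]).to_integral_value())


def join_motifs(motifs: list[str]) -> str:
    """Upper-case, validate and de-duplicate motifs, joined by commas.

    Assumption: the restriction enzymes field expects a comma-separated
    list of recognition motifs.
    """
    cleaned: list[str] = []
    for motif in motifs:
        motif = motif.strip().upper()
        # Allow A/C/G/T plus N as a wildcard base.
        if not re.fullmatch(r"[ACGTN]+", motif):
            raise ValueError(f"unexpected characters in motif {motif!r}")
        if motif not in cleaned:
            cleaned.append(motif)
    return ",".join(cleaned)
```

With these helpers, `genome_size_to_int("1.2 Gb")` gives 1200000000, and `join_motifs(["gatc", " GANTC "])` gives "GATC,GANTC"; these motifs are illustrative only, so use the ones that match your HiC library preparation.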
What the workflow does
Step | Inputs | Tool | Outputs |
Map HiC reads to genome | assembled_genome.fasta, HiCR1.fastqsanger.gz, HiCR2.fastqsanger.gz | BWA-MEM2 | HiCR1.bam, HiCR2.bam |
Merge BAMs | HiCR1.bam, HiCR2.bam | Filter and merge | HiC.bam |
Make pre-scaffolding Pretext map | HiC.bam | PretextMap | Pretext map output |
Make pre-scaffolding Pretext map snapshot | Pretext map output | PretextSnapshot | HiC contact map (view) |
Scaffold | assembled_genome.fasta, HiC.bam | YaHS | scaffolded_assembly.fasta |
Map HiC reads to scaffold | scaffolded_assembly.fasta, HiCR1.fastqsanger.gz, HiCR2.fastqsanger.gz | BWA-MEM2 | HiCR1scaffold.bam, HiCR2scaffold.bam |
Merge BAMs | HiCR1scaffold.bam, HiCR2scaffold.bam | Filter and merge | HiCscaffold.bam |
Make post-scaffolding Pretext map | HiCscaffold.bam | PretextMap | Pretext map output scaffold |
Make post-scaffolding Pretext map snapshot | Pretext map output scaffold | PretextSnapshot | HiC contact map scaffold (view and compare with the pre-scaffolding map) |
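The data flow in the table can be checked mechanically. The Python sketch below is a reading aid, not how Galaxy executes the workflow: it transcribes the table's rows into a hypothetical `STEPS` list, walks them in order, and confirms that every input is either supplied by the user or produced by an earlier step. It also lists which steps read a given dataset, which makes visible that the HiC reads are mapped twice, once to the original assembly and once to the scaffolds.

```python
from __future__ import annotations

# (step, inputs, tool, outputs), transcribed from the table above.
STEPS = [
    ("Map HiC reads to genome",
     ("assembled_genome.fasta", "HiCR1.fastqsanger.gz", "HiCR2.fastqsanger.gz"),
     "BWA-MEM2", ("HiCR1.bam", "HiCR2.bam")),
    ("Merge BAMs", ("HiCR1.bam", "HiCR2.bam"), "Filter and merge", ("HiC.bam",)),
    ("Make pre-scaffolding Pretext map", ("HiC.bam",), "PretextMap",
     ("Pretext map output",)),
    ("Make pre-scaffolding Pretext map snapshot", ("Pretext map output",),
     "PretextSnapshot", ("HiC contact map",)),
    ("Scaffold", ("assembled_genome.fasta", "HiC.bam"), "YaHS",
     ("scaffolded_assembly.fasta",)),
    ("Map HiC reads to scaffold",
     ("scaffolded_assembly.fasta", "HiCR1.fastqsanger.gz", "HiCR2.fastqsanger.gz"),
     "BWA-MEM2", ("HiCR1scaffold.bam", "HiCR2scaffold.bam")),
    ("Merge BAMs", ("HiCR1scaffold.bam", "HiCR2scaffold.bam"), "Filter and merge",
     ("HiCscaffold.bam",)),
    ("Make post-scaffolding Pretext map", ("HiCscaffold.bam",), "PretextMap",
     ("Pretext map output scaffold",)),
    ("Make post-scaffolding Pretext map snapshot", ("Pretext map output scaffold",),
     "PretextSnapshot", ("HiC contact map scaffold",)),
]

# Datasets supplied by the user rather than produced by a step.
USER_INPUTS = {"assembled_genome.fasta", "HiCR1.fastqsanger.gz", "HiCR2.fastqsanger.gz"}


def check_order(steps, user_inputs) -> set[str]:
    """Walk the steps in order; fail if a step needs a dataset not yet available."""
    available = set(user_inputs)
    for step, inputs, _tool, outputs in steps:
        missing = [name for name in inputs if name not in available]
        if missing:
            raise ValueError(f"{step!r} runs before {missing} exist")
        available.update(outputs)
    return available


def consumers(steps, dataset: str) -> list[str]:
    """List the steps that read a given dataset."""
    return [step for step, inputs, _tool, _outputs in steps if dataset in inputs]
```

For instance, `consumers(STEPS, "HiC.bam")` returns the pre-scaffolding Pretext map step and the YaHS scaffolding step, showing that the same merged BAM drives both the first contact map and the scaffolding itself.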
Check the outputs
The main outputs are:
- scaffolded_assembly.fasta
- pre- and post-scaffolding HiC contact maps, for comparison
For more information about what these outputs mean, please see the HiC Scaffolding section in the VGP assembly tutorial.
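Alongside the contact maps, a quick numeric sanity check is to compare the number of sequences and the N50 of the input assembly with those of scaffolded_assembly.fasta. This check is an addition, not part of the workflow: the Python sketch below assumes both files have been downloaded from Galaxy as uncompressed FASTA, and the helper names are hypothetical. Because scaffolding joins contigs, you would expect fewer sequences and a higher N50, while the total length should change little apart from any gap characters inserted at the joins.

```python
from pathlib import Path


def read_lengths(fasta: Path) -> list:
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths = []
    current = None
    with fasta.open() as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # A header closes the previous record and starts a new one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarise(fasta: Path) -> dict:
    """Sequence count, total length and N50 for one assembly file."""
    lengths = read_lengths(fasta)
    return {"sequences": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Calling `summarise(Path("assembly.fasta"))` and `summarise(Path("scaffolded_assembly.fasta"))` on your downloaded files gives the two sets of numbers to compare.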
Acknowledgements
- We acknowledge and thank the Vertebrate Genomes Project for this workflow. This is not original work; it has been extracted from the VGP scaffolding workflow, with some minor changes to the inputs to better support various groups of researchers. More information about VGP workflows: Galaxy-VGP project page.
- Bioplatforms Australia Threatened Species Initiative
- Galaxy Australia
- The Australian BioCommons.
Workflow citation:
Syme, A., Silver, L., based on VGP Project (2024). TSI-Scaffolding-with-HiC (based on VGP-HiC-scaffolding). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1054.1