Structure-based function annotation

Overview

Teaching: 5 min
Exercises: min
Questions
Objectives
  • Learn how to obtain a functional understanding of an uncharacterised protein using its structure.

Genome annotation and why structure matters

Despite the explosion in genomic data, a large fraction of genes remain annotated only as hypothetical proteins. Traditional genome annotation relies on sequence similarity, but this approach breaks down when homologous sequences are missing or highly diverged.

In this session, we’ll walk through a case study from the genome of Candidatus Protochlamydia naegleriophila.

  1. Before commencing the exercise, navigate to the relevant working directory:

     cd $MYSCRATCH/2025-ABACBS-workshop/exercises/exercise1/
    
  2. Download the GenBank genome annotation file:

     wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/499/655/GCF_001499655.1_PNK1/GCF_001499655.1_PNK1_genomic.gbff.gz
    
  3. Count the number of hypothetical protein annotations:

     zgrep "hypothetical" GCF_001499655.1_PNK1_genomic.gbff.gz | wc -l
    
  4. Count the number of protein-coding genes:

     zgrep "protein_id" GCF_001499655.1_PNK1_genomic.gbff.gz | wc -l
    

Review

Note that 802 of the 2516 protein-coding genes (about 32%) have no functional annotation.
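The two counts above can also be gathered in a single pass, with the percentage computed at the same time (802 / 2516 is roughly 31.9%). The sketch below is not part of the workshop materials: `summarise_annotations` is a hypothetical helper name, and it assumes that every annotated protein carries one `/protein_id` qualifier and that hypothetical proteins are flagged in the `/product` qualifier. Because it only inspects `/product` lines, its count may differ slightly from the plain `zgrep "hypothetical"` count, which matches the word anywhere in the file.

```shell
# summarise_annotations FILE.gz
# Hypothetical helper: one pass over a gzipped GenBank flat file.
# Counts /protein_id qualifiers (assumed: one per annotated protein) and
# /product qualifiers whose value contains the word "hypothetical".
summarise_annotations() {
  gzip -dc "$1" | awk '
    /\/protein_id="/               { total++ }
    /\/product="[^"]*hypothetical/ { hypo++ }
    END {
      if (total == 0) { print "no protein_id qualifiers found"; exit 1 }
      printf "%d / %d (%.1f%%) hypothetical\n", hypo, total, 100 * hypo / total
    }'
}

# Usage (after the download step above):
#   summarise_annotations GCF_001499655.1_PNK1_genomic.gbff.gz
```

Reading the file once keeps both counts consistent with each other, and printing the ratio alongside them avoids working out the percentage by hand.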

The good news: protein structures are often more conserved than sequences, and they are tightly linked to function. Recent advances in structure prediction (using tools we’ll explore today) allow us to analyse structures at genomic scale. By comparing predicted 3D structures to known proteins, we can uncover functional relationships that sequence-based approaches miss.

Representative example: uncovering the function of a “hypothetical protein”

We’ll examine a gene annotated as a conserved hypothetical protein (locus tag PNK_0205) and explore how protein structure-based annotation can yield new functional hypotheses.
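Structure prediction needs the protein sequence of the target, which can be pulled out of the annotation file that was already downloaded. The following is a minimal sketch rather than the workshop's own method: `extract_translation` is a hypothetical helper, and it assumes the locus tag appears in either a `/locus_tag` or an `/old_locus_tag` qualifier of the CDS feature (RefSeq files sometimes keep the original tag only in the latter) and that the `/translation` qualifier follows it within the same feature.

```shell
# extract_translation TAG FILE.gz
# Hypothetical helper: print the /translation of the CDS feature carrying
# the given locus tag as a FASTA record.
extract_translation() {
  gzip -dc "$2" | awk -v tag="$1" '
    # A feature key in column 6 marks the start of a new feature.
    /^     [^ ]/ { hit = 0; in_seq = 0 }
    # Matches both /locus_tag="TAG" and /old_locus_tag="TAG".
    index($0, "locus_tag=\"" tag "\"") { hit = 1 }
    hit && /\/translation="/ {
      sub(/.*\/translation="/, "")
      print ">" tag
      in_seq = 1
    }
    in_seq {
      closing = ($0 ~ /"[ ]*$/)   # the closing quote ends the qualifier
      gsub(/[ "]/, "")            # drop indentation and quotes
      if ($0 != "") print
      if (closing) in_seq = 0
    }'
}

# Usage (assumes the tag is present in the downloaded file):
#   extract_translation PNK_0205 GCF_001499655.1_PNK1_genomic.gbff.gz > PNK_0205.fasta
```

If the command prints nothing, the tag is probably stored under a different identifier in this release of the file. Running `zgrep "PNK_0205"` on the file is a quick way to check.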

This case study is taken from:

Strategy

  1. Predict the 3D structure of the target protein as a monomer.
  2. Search for similar annotated structures.
  3. Compare the gene neighborhood of our protein with that of the protein behind the matched structure.
  4. Predict the complex structure of potential interaction partners based on known functional associations.


Key Points

  • Many genes do not have functional annotations.

  • Predicted structures can provide important clues about function.