Genome annotation with Prokka

Author(s): Anna Syme, Torsten Seemann, Simon Gladman

Overview
Questions:

How can we annotate a bacterial genome?

How can we visualize annotated genomic features?

Objectives:

Load genome into Galaxy

Annotate genome with Prokka

View annotations in JBrowse

Requirements:

Introduction to Galaxy Analyses

Time estimation: 1 hour

Level: Introductory

Supporting Materials:

Slides

Datasets

Workflows

FAQs

Recordings

Tutorial (February 2021), 20 min

Available on these Galaxies

Known Working

UseGalaxy.eu ✅ ⭐️

UseGalaxy.org (Main) ✅ ⭐️

Possibly Working

Galaxy@AuBi

GalaxyTrakr

MISSISSIPPI

UseGalaxy.be

UseGalaxy.cz

UseGalaxy.fr

UseGalaxy.no

UseGalaxy.org.au

Published: Mar 6, 2018

Last modification: Aug 7, 2024

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00168

Rating: 3.8 (0 recent ratings, 18 all time)

Revision: 25

In this section we will use a software tool called Prokka to annotate a draft genome sequence. Prokka is a “wrapper”; it collects together several pieces of software (from various authors), and so avoids “re-inventing the wheel”.

Prokka finds and annotates features (both protein-coding regions and RNA genes, e.g. tRNA and rRNA) present on a sequence. Prokka uses a two-step process to annotate protein-coding regions: first, the coding regions on the genome are identified using Prodigal; second, the function of each encoded protein is predicted by similarity to proteins in one of many protein or protein-domain databases. Prokka can annotate bacterial, archaeal and viral genomes quickly, generating standard output files in GenBank, EMBL and GFF formats. More information about Prokka can be found in Prokka's GitHub repository.
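As a rough illustration of the first step, the sketch below performs a naive open-reading-frame (ORF) scan. This is not Prodigal's algorithm: Prodigal trains itself on the input genome, scores candidate genes and, under translation table 11, also considers GTG and TTG start codons. The sketch only looks for ATG...stop stretches on the forward strand, and `naive_orfs` and its `min_codons` threshold are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def naive_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) 0-based, half-open coordinates of forward-strand
    ORFs that begin with ATG and end with a stop codon (stop included).

    Hypothetical toy gene finder: ignores the reverse strand, alternative
    start codons and everything else a real gene caller models.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting both the start and stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `naive_orfs("CCATGAAATAGCC", min_codons=2)` returns `[(2, 11)]`: ATG at position 2, then AAA, then the TAG stop.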

Agenda

In this tutorial, we will deal with:

Import the data

Annotate the genome

Examine the output

View annotated features in JBrowse

What’s Next

Import the data

Prokka requires assembled contigs (in FASTA format) as input.
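Before annotating, it can help to check what the input looks like. A minimal Python sketch, assuming contigs.fasta has been downloaded to your own computer (the function names are hypothetical, not part of Prokka or Galaxy), that counts the contigs and their lengths:

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].split()[0]  # ID = first word after '>'
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def contig_summary(text: str) -> Dict[str, int]:
    """Number of contigs, total assembly length and longest contig (bp)."""
    lengths = [len(seq) for _, seq in read_fasta(text)]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }
```

Usage: `with open("contigs.fasta") as fh: print(contig_summary(fh.read()))`.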

Hands-on: Obtaining our data
Make sure you have an empty analysis history. Give it a name.

To create a new history, click the new-history icon at the top of the history panel.
Import the following file from Zenodo or from the shared data library:
https://zenodo.org/record/1156405/files/contigs.fasta
Copy the link location

Click Upload Data at the top of the tool panel

Select Paste/Fetch Data

Paste the link(s) into the text field

Press Start

Close the window

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

Go to Data (top panel), then Data libraries

Navigate to the correct folder as indicated by your instructor.

On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.

Select the desired files

Click on Add to History near the top and select as Datasets from the dropdown menu

In the pop-up window, choose

“Select history”: the history you want to import the data to (or create a new one)

Click on Import

Annotate the genome

Now we will run the tool called Prokka.

Hands-on: Annotate genome

Run Prokka (Galaxy version 1.14.5+galaxy0) with the following parameters (leave everything else unchanged):

“contigs to annotate”: contigs.fasta
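To relate the Galaxy form to the command line, here is a rough sketch of an equivalent Prokka invocation, built as an argument list in Python. It assumes a local Prokka installation; the output directory, prefix and CPU count are hypothetical choices, and the Galaxy wrapper may set other options behind the scenes.

```python
from typing import List


def prokka_command(contigs: str, outdir: str = "prokka_out",
                   prefix: str = "contigs", kingdom: str = "Bacteria",
                   gcode: int = 11, cpus: int = 4) -> List[str]:
    """Build (but do not run) a Prokka command line as an argument list.

    Defaults here are illustrative assumptions, not the Galaxy tool's settings.
    """
    return [
        "prokka",
        "--outdir", outdir,     # directory that will hold all output files
        "--prefix", prefix,     # output file name prefix, e.g. contigs.gff
        "--kingdom", kingdom,   # Archaea, Bacteria, Mitochondria or Viruses
        "--gcode", str(gcode),  # translation table; 11 = bacterial code
        "--cpus", str(cpus),    # number of CPU threads
        contigs,                # input: assembled contigs in FASTA format
    ]
```

With Prokka installed, the list could be passed to `subprocess.run(prokka_command("contigs.fasta"), check=True)`. Building a list rather than a single string avoids shell-quoting problems with file names.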

Examine the output

Once Prokka has finished, examine each of its output files.

The GFF and GBK (GenBank) files contain all of the information about the annotated features, in different formats.
The .txt file contains a summary of the number of features annotated.
The .faa file contains the protein sequences of the annotated genes.
The .ffn file contains the nucleotide sequences of the annotated genes.
The .fna file contains the nucleotide sequences of the input contigs; it is used as the reference genome in the next section.
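To cross-check the .txt summary, the GFF output can be tallied directly. A minimal sketch, assuming the gff output has been downloaded from Galaxy; `count_gff_features` is a hypothetical helper name. Prokka's GFF3 ends with a ##FASTA section holding the contig sequences, so the parser stops there.

```python
from collections import Counter
from typing import Tuple


def count_gff_features(text: str) -> Tuple[Counter, Counter]:
    """Count GFF3 features per type (column 3) and per contig (column 1)."""
    by_type, by_contig = Counter(), Counter()
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this line is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # GFF3 feature lines have exactly nine columns
        by_contig[cols[0]] += 1
        by_type[cols[2]] += 1
    return by_type, by_contig
```

The per-type counts can be compared with the .txt summary, and the per-contig counts show which contigs carry the most annotations before you browse them one at a time.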

View annotated features in JBrowse

Now that we have annotated the draft genome sequence, we would like to view the annotated features in the JBrowse genome viewer. First, we have to create a JBrowse instance as a Galaxy dataset; then we can view it within Galaxy.

Hands-on: Visualize the annotation

Run JBrowse (Galaxy version 1.16.11+galaxy1) with the following parameters:

“Reference genome to display”: Use a genome from history

“Select the reference genome”: fna output of Prokka

This sequence will be the reference against which annotations are displayed.

“Produce Standalone Instance”: Yes

“Genetic Code”: 11: The Bacterial, Archaeal and Plant Plastid Code

Click on Insert Track Group

We will now set up one track. Each track is a dataset displayed underneath the reference sequence (which is shown as nucleotides, taken from the FASTA file). We will display the annotations (the Prokka .gff file).

In 1: Track Group

“Track Category”: gene annotations

Click on Insert Annotation Track and fill it with:

“Track Type”: GFF/GFF3/BED Features

“GFF/GFF3/BED Track Data”: gff output of Prokka

A new file will be created in your history; it contains the JBrowse interactive visualisation. We will now view its contents and explore it.

Inspect the “JBrowse on data XX and data XX - Complete” file by clicking on the eye icon

The JBrowse window will appear in the centre Galaxy panel.

Display all the tracks and practice maneuvering around.

Click on the tick boxes on the left to display the tracks.

Select contig 1 in the drop-down box. Only one contig can be displayed at a time.

Zoom by clicking on the plus and minus buttons.

JBrowse displays the sequence and a 6-frame amino acid translation.

Right-click on a gene/feature annotation (the bars on the annotation track), then select View Details to see more information, such as:

gene name

product name

You can also download the FASTA sequence by clicking on the disk icon.
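The 6-frame amino acid translation that JBrowse draws under the sequence can be reproduced in a few lines. A minimal sketch, assuming plain DNA input; the function names are hypothetical and the frame labels (+1 to +3 forward, -1 to -3 on the reverse complement) are a common convention that may not match JBrowse's display order. Translation table 11, selected in the JBrowse form, assigns the same amino acids as the standard code and differs only in its start codons, so one codon table serves both.

```python
from typing import Dict

BASES = "TCAG"
# NCBI translation table 1 in TCAG order; table 11 assigns the same amino
# acids and differs only in which codons may act as start codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate codon by codon; codons with other letters (e.g. N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> Dict[str, str]:
    """Translate all three forward and three reverse-complement frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

For instance, `six_frames("ATGGCC")["+1"]` gives `"MA"`, and `"*"` marks stop codons just as JBrowse highlights them.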

What’s Next

After automatic annotation of a prokaryotic genome, inspecting the predicted genes in JBrowse may reveal errors, such as wrong gene start or stop positions, split genes or merged genes. To fix these errors by hand, or simply to rename genes or add functional data (e.g., Gene Ontology terms), it helps a lot to set up a manual curation project using Apollo.

The Apollo training should provide additional guidance.

You've Finished the Tutorial

Key points

Prokka is a useful tool to annotate a bacterial genome.

JBrowse can be used to inspect the annotation of a genome.

Frequently Asked Questions

Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Genome Annotation topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

Citing this Tutorial

Anna Syme, Torsten Seemann, Simon Gladman, Genome annotation with Prokka (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/annotation-with-prokka/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{genome-annotation-annotation-with-prokka,
	author = "Anna Syme and Torsten Seemann and Simon Gladman",
	title = "Genome annotation with Prokka (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/annotation-with-prokka/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

Congratulations on successfully completing this tutorial!

Go Further
Do you want to extend your knowledge? Follow one of our recommended follow-up trainings:

Slides: Refining Genome Annotations with Apollo (prokaryotes)

Hands-on: Refining Genome Annotations with Apollo (prokaryotes)

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.

shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/genome-annotation/tutorials/annotation-with-prokka/tutorial.json | jq .admin_install_yaml -r)

Alternatively, you can copy and paste the following YAML:

---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools:
- name: prokka
  owner: crs4
  revisions: bf68eb663bc3
  tool_panel_section_label: Annotation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: prokka
  owner: crs4
  revisions: 111884f0d912
  tool_panel_section_label: Annotation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: jbrowse
  owner: iuc
  revisions: a6e57ff585c0
  tool_panel_section_label: Graph/Display Data
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
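In the shed-tools command, the `<(curl ... | jq .admin_install_yaml -r)` part downloads the tutorial's JSON metadata and extracts its admin_install_yaml field as raw text. On systems without jq, a rough Python equivalent might look like the sketch below; the function names are hypothetical, and it assumes the JSON keeps the admin_install_yaml key used in that command.

```python
import json
import urllib.request

TUTORIAL_JSON = ("https://training.galaxyproject.org/training-material/api/topics/"
                 "genome-annotation/tutorials/annotation-with-prokka/tutorial.json")


def extract_install_yaml(json_text: str) -> str:
    """Return the admin_install_yaml field as text, like `jq .admin_install_yaml -r`."""
    return json.loads(json_text)["admin_install_yaml"]


def fetch_install_yaml(url: str = TUTORIAL_JSON) -> str:
    """Download the tutorial metadata and extract the tool list (needs network)."""
    with urllib.request.urlopen(url) as resp:
        return extract_install_yaml(resp.read().decode("utf-8"))
```

The returned text could be written to a file such as tools.yml and passed to `shed-tools install -g GALAXY -a API_KEY -t tools.yml`.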
