
Identification of non-canonical ORFs and their potential biological function


Published: Jul 11, 2023
Last Updated: Sep 12, 2024

Index of contents

  1. Introduction

  2. Galaxy workflow


Introduction


What do we mean by non-canonical ORFs?

Definition of canonical

.footnote[Source: Cambridge dictionary]


What do we mean by non-canonical ORFs?

Mean Ribo-Seq expression and Ribo-Seq expression standard deviation (SD) have been plotted for human lymphoblastoid cells from RPFdbV2.

.footnote[Source: Erady et al. 2021]


What do we mean by non-canonical ORFs?

The dark proteome: translation from noncanonical open reading frames.


Why study non-canonical ORFs?


Why study non-canonical ORFs?

Noncanonical open reading frames encode functional proteins essential for cancer cell survival.


Why study non-canonical ORFs?

Translation and natural selection of micropeptides from long non-canonical RNAs.


Why have they not been characterized?


Why study small peptides?

The largely unexplored biology of small proteins in pro- and eukaryotes.


Why study small peptides?

Short peptides regulate gene expression.


Why study small peptides?

Example of the biological function of small peptides.

.footnote[Source: Steinberg and Koch 2021]


Annotated as non-coding RNAs?

Coding or non-coding? This is the question.


Annotated as non-coding RNAs?

The small peptide world in long noncoding RNAs.


Annotated as non-coding RNAs?

SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients.


Why study intrinsically disordered proteins (IDP)?

Why study intrinsically disordered proteins?

.footnote[Source: Babu et al. 2012]


Disorder-Function Paradigm

Disorder-Function paradigm.


Disorder-Function Paradigm

Expanding the paradigm: intrinsically disordered proteins and allosteric regulation.


Disorder-Function Paradigm

Intrinsically disordered proteins/regions and insight into their biomolecular interaction.

.footnote[Source: Chakrabarti and Chakravarty 2022]


How to identify non-canonical ORFs?

How to identify non-canonical ORFs.


How to identify non-canonical ORFs?

IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences.


How to identify non-canonical ORFs?

Illustration of an isoform switch.


Galaxy Workflow

Galaxy workflow.

A full, detailed explanation is given in the Genome-wide alternative splicing analysis Galaxy training.


Galaxy Workflow

Full workflow image.



Galaxy Workflow

Workflow summary.


Initial QC assessment

QC step.

Identify potential artifacts that may impact the interpretation of downstream analysis.


Mapping and identification of novel splicing sites with RNASTAR

Mapping step with RNASTAR.

Two-pass alignment enables sequence reads to span novel splice junctions by fewer nucleotides, conferring greater read depth and providing significantly more accurate quantification of novel splice junctions.
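For readers working outside Galaxy, the two-pass idea can be sketched as a command line. The snippet below only assembles the arguments and does not run anything. `--twopassMode Basic` is STAR's built-in two-pass option; the paths, the thread count and the helper name `build_star_two_pass_command` are placeholders assumed for illustration, and the Galaxy workflow may configure the two passes differently.

```python
from typing import List


def build_star_two_pass_command(
    genome_dir: str,
    reads_1: str,
    reads_2: str,
    out_prefix: str,
    threads: int = 4,
) -> List[str]:
    """Assemble (but do not run) a STAR call that uses built-in two-pass mapping."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # pre-built STAR index (assumed to exist)
        "--readFilesIn", reads_1, reads_2,    # paired-end reads (hypothetical file names)
        "--twopassMode", "Basic",             # junctions found in pass 1 are reused in pass 2
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


command = build_star_two_pass_command(
    "star_index", "sample_R1.fastq", "sample_R2.fastq", "sample_"
)
print(" ".join(command))
```

Keeping the command as a list of arguments makes it straightforward to pass to `subprocess.run` later without shell quoting issues.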


Post-mapping QC assessment with RSeQC

Post-mapping QC.

RSeQC is a toolkit for generating RNA-seq-specific quality control metrics. The figure corresponds to RSeQC junction saturation of known (A) and novel (B) splicing sites.
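The idea behind a junction saturation curve can be illustrated with a small, simplified sketch: subsample the spliced reads at increasing fractions and count how many distinct known and novel junctions have been seen so far. This is not RSeQC's implementation; the function name, the junction representation and the toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, intron start, intron end)


def junction_saturation(
    read_junctions: Sequence[Junction],
    known: Set[Junction],
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),
    seed: int = 0,
) -> List[Tuple[float, int, int]]:
    """Return (fraction, known junctions seen, novel junctions seen) per subsample."""
    shuffled = list(read_junctions)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the sketch reproducible
    curve = []
    for fraction in fractions:
        # Each subsample is a prefix of the shuffled reads, so subsamples are nested.
        subset = set(shuffled[: int(len(shuffled) * fraction)])
        n_known = len(subset & known)
        curve.append((fraction, n_known, len(subset) - n_known))
    return curve


# Toy data: one entry per spliced read, annotated junctions listed in `known`.
known = {("chr1", 100, 200), ("chr1", 300, 400)}
reads = [("chr1", 100, 200)] * 6 + [("chr1", 300, 400)] * 3 + [("chr1", 500, 600)]
curve = junction_saturation(reads, known)
for fraction, n_known, n_novel in curve:
    print(fraction, n_known, n_novel)
```

A curve that flattens before the full dataset is reached suggests that sequencing depth is sufficient to detect most junctions of that class; novel junctions typically saturate later than known ones.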


Reference-based transcriptome assembly and quantification with StringTie

Transcriptome assembly and quantification with StringTie.

StringTie is a fast and highly efficient assembler of RNA-seq alignments into potential transcripts.


Post-assembly QC assessment with rnaQUAST

Post-assembly QC.

rnaQUAST provides a range of completeness and correctness statistics that help identify and address potential errors or gaps in the assembly process. The figure is an rnaQUAST cumulative isoform plot.


Isoform switching and functional analysis with IsoformSwitchAnalyzeR

Isoform switching and functional analysis.

IsoformSwitchAnalyzeR performs the differential isoform usage analysis using DEXSeq.
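Isoform usage is commonly summarized as the isoform fraction (IF), the share of a gene's expression contributed by one isoform, and the change between conditions as dIF = IF2 - IF1. The sketch below computes both for a single gene. It covers only this arithmetic and leaves out the statistical test performed with DEXSeq; the function names and expression values are assumed for illustration.

```python
from typing import Dict


def isoform_fractions(isoform_expression: Dict[str, float]) -> Dict[str, float]:
    """IF = isoform expression / total expression of the parent gene."""
    gene_expression = sum(isoform_expression.values())
    if gene_expression == 0:
        # Gene not expressed: define every isoform fraction as zero.
        return {isoform: 0.0 for isoform in isoform_expression}
    return {
        isoform: value / gene_expression
        for isoform, value in isoform_expression.items()
    }


def delta_isoform_fractions(
    condition_1: Dict[str, float], condition_2: Dict[str, float]
) -> Dict[str, float]:
    """dIF = IF in condition 2 minus IF in condition 1, per isoform."""
    if_1 = isoform_fractions(condition_1)
    if_2 = isoform_fractions(condition_2)
    isoforms = sorted(set(if_1) | set(if_2))
    return {iso: if_2.get(iso, 0.0) - if_1.get(iso, 0.0) for iso in isoforms}


# Hypothetical expression values for one gene with two isoforms.
control = {"isoform_A": 80.0, "isoform_B": 20.0}
treated = {"isoform_A": 30.0, "isoform_B": 70.0}
dif = delta_isoform_fractions(control, treated)
for isoform, value in dif.items():
    print(isoform, round(value, 2))
```

In this toy gene the dominant isoform changes from A to B, which is the pattern an isoform switch analysis looks for, provided the change is also statistically supported.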


Isoform switching and functional analysis with IsoformSwitchAnalyzeR

IsoformSwitchAnalyzeR consequences plot.

To analyze large-scale patterns in predicted isoform switch (IS) consequences, IsoformSwitchAnalyzeR counts all isoform switching events that result in a gain or loss of a specific consequence (e.g. protein domain gain/loss).
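The counting behind such a summary can be sketched as a simple tally of gains and losses per consequence type. The input format, a list of (feature, direction) pairs, and the function name are assumptions made for illustration and do not reflect IsoformSwitchAnalyzeR's internal data structures.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_consequences(
    events: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count switching events per feature, split into gains and losses."""
    summary: Dict[str, Dict[str, int]] = {}
    for (feature, direction), n in Counter(events).items():
        if direction not in ("gain", "loss"):
            raise ValueError(f"unexpected direction: {direction!r}")
        summary.setdefault(feature, {"gain": 0, "loss": 0})[direction] += n
    return summary


# Hypothetical switching events, one (feature, direction) pair per event.
events = [
    ("protein domain", "gain"),
    ("protein domain", "loss"),
    ("protein domain", "loss"),
    ("signal peptide", "gain"),
]
summary = summarize_consequences(events)
for feature, counts in summary.items():
    print(feature, counts)
```

Comparing the gain and loss counts for each feature shows whether switches in a dataset tend to push in one direction, for example a general loss of protein domains.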


Thank you!

This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network tutorial content is licensed under the Creative Commons Attribution 4.0 International License.

References

  1. Babu, M. M., R. W. Kriwacki, and R. V. Pappu, 2012 Versatility from Protein Disorder. Science 337: 1460–1461. 10.1126/science.1228775
  2. Erady, C., A. Boxall, S. Puntambekar, N. S. Jagannathan, R. Chauhan et al., 2021 Pan-cancer analysis of transcripts encoding novel open-reading frames (nORFs) and their potential biological functions. npj Genomic Medicine 6: 10.1038/s41525-020-00167-4
  3. Steinberg, R., and H.-G. Koch, 2021 The largely unexplored biology of small proteins in pro- and eukaryotes. The FEBS Journal 288: 7002–7024. 10.1111/febs.15845
  4. Chakrabarti, P., and D. Chakravarty, 2022 Intrinsically disordered proteins/regions and insight into their biomolecular interactions. Biophysical Chemistry 283: 106769. 10.1016/j.bpc.2022.106769