---
title: "In-silico cleavage of polypeptides using the cleaver package"
author:
- name: Sebastian Gibb
  affiliation: Department of Anesthesiology and Intensive Care, University Medicine Greifswald, Germany.
package: cleaver
abstract: >
    This vignette describes the in-silico cleavage of polypeptides using
    the `cleaver` package.
output:
    BiocStyle::html_document:
        toc_float: TRUE
        tidy: TRUE
bibliography: cleaver.bib
vignette: >
    %\VignetteIndexEntry{In-silico cleavage of polypeptides}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteKeywords{Proteomics, Bioinformatics, Cleavage, Polypeptides}
    %\VignetteEncoding{UTF-8}
    %\VignettePackage{cleaver}
---

```{r environment, echo=FALSE, message=FALSE, warning=FALSE}
library("cleaver")
library("UniProt.ws")
library("BRAIN")
```

# Introduction

Most proteomics experiments need protein (peptide) separation and cleavage
procedures before these molecules could be analyzed or identified
by mass spectrometry or other analytical tools.

`r BiocStyle::Biocpkg("cleaver")` allows in-silico cleavage of polypeptide
sequences to e.g. create theoretical mass spectrometry data.

The cleavage rules are taken from the
[ExPASy PeptideCutter tool](https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html)
[@peptidecutter].

# Simple Usage

Loading the `r BiocStyle::Biocpkg("cleaver")` package:
```{r}
library("cleaver")
```

Getting help and list all available cleavage rules:
```{r, eval=FALSE}
help("cleave")
```

Cleaving of *Gastric juice peptide 1 (P01358)* using *Trypsin*:
```{r}
## cleave it
cleave("LAAGKVEDSD", enzym="trypsin")
## get the cleavage ranges
cleavageRanges("LAAGKVEDSD", enzym="trypsin")
## get only cleavage sites
cleavageSites("LAAGKVEDSD", enzym="trypsin")
```

Sometimes cleavage is not perfect and the enzym miss some cleavage positions:
```{r}
## miss one cleavage position
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
## miss zero or one cleavage positions
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
```

Combine `r BiocStyle::Biocpkg("cleaver")` and
`r BiocStyle::Biocpkg("Biostrings")` [@Biostrings]:
```{r}
## create AAStringSet object
p <- AAStringSet(c(gaju="LAAGKVEDSD", pnm="AGEPKLDAGV"))

## cleave it
cleave(p, enzym="trypsin")
cleavageRanges(p, enzym="trypsin")
cleavageSites(p, enzym="trypsin")
```

# Insulin \& Somatostatin Example

Downloading *Insulin (P01308)* and *Somatostatin (P61278)* sequences
from the [UniProt](http://www.uniprot.org) [@uniprot] database using
`r BiocStyle::Biocpkg("UniProt.ws")` [@UniProt.ws].
```{r}
## load UniProt.ws library
library("UniProt.ws")

## select species Homo sapiens
up <- UniProt.ws(taxId=9606)

## download sequences of Insulin/Somatostatin
s <- select(up,
    keys=c("P01308", "P61278"),
    columns=c("sequence"),
    keytype="UniProtKB"
)

## fetch only sequences
sequences <- setNames(s$Sequence, s$Entry)

## remove whitespaces
sequences <- gsub(pattern="[[:space:]]", replacement="", x=sequences)
```

Cleaving using *Pepsin*:
```{r}
cleave(sequences, enzym="pepsin")
```

# Isotopic Distribution Of Tryptic Digested Insulin

A common use case of in-silico cleavage is the calculation of the
isotopic distribution of peptides (which were enzymatic digested in the
in-vitro experimental workflow). Here
`r BiocStyle::Biocpkg("BRAIN")` [@BRAIN; @BRAIN2] is used to calculate
the isotopic distribution of `r BiocStyle::Biocpkg("cleaver")`'s output.
(please note: it is only a toy example, e.g. the relation of intensity values
between peptides isn't correct).
```{r}
## load BRAIN library
library("BRAIN")

## cleave insulin
cleavedInsulin <- cleave(sequences[1], enzym="trypsin")[[1]]

## create empty plot area
plot(NA, xlim=c(150, 4300), ylim=c(0, 1),
     xlab="mass", ylab="relative intensity",
     main="tryptic digested insulin - isotopic distribution")

## loop through peptides
for (i in seq(along=cleavedInsulin)) {
  ## count C, H, N, O, S atoms in current peptide
  atoms <- BRAIN::getAtomsFromSeq(cleavedInsulin[[i]])
  ## calculate isotopic distribution
  d <- useBRAIN(atoms)
  ## draw peaks
  lines(d$masses, d$isoDistr, type="h", col=2)
}
```

# Session Information

```{r sessioninfo, echo=FALSE}
sessionInfo()
```

# References