TPP2D 1.0.0
Thermal proteome profiling (TPP) (Franken et al., 2015; Savitski et al., 2014) is an unbiased mass spectrometry-based method to assess protein-ligand interactions. It applies the cellular thermal shift assay (CETSA) (Molina et al., 2013) on a proteome-wide scale. In brief, it monitors the profiles of proteins in cells over a temperature gradient and aims to detect shifts induced by ligand-protein interactions in a treatment versus a control sample. 2D-TPP is a refined version of the assay (Becher et al., 2016) that combines a concentration gradient of the ligand of interest with a temperature gradient. This package analyzes data from 2D-TPP experiments using a functional analysis approach.
Please note that methods for analyzing conventional TPP datasets (e.g. single-dose or melting-curve experiments) can be found at: https://bioconductor.org/packages/release/bioc/html/TPP.html and https://git.embl.de/childs/TPP-data-analysis/blob/master/NPARC_paper/reports/NPARC_workflow.Rmd .
This vignette is not intended as an in-depth introduction to thermal proteome profiling; please refer to other sources, such as the publications cited above, for this purpose. The package can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TPP2D")
Alternatively, install the development version of the package from GitHub:
BiocManager::install("nkurzaw/TPP2D")
library(TPP2D)
This package provides a tool for finding ‘hits’ (proteins whose thermal stability is affected by the treatment used in the experiment) at a given false discovery rate (FDR). Please note that a change in the thermal stability of a protein does not guarantee that it interacts with the molecule used as treatment. However, we try to give the user additional information by specifying whether an observed effect is likely due to stabilization or to a change in expression or solubility of a given protein, to make the interpretation of detected hits as easy as possible.
library(dplyr)
library(TPP2D)
After loading dplyr and the TPP2D package itself, we start by loading an example dataset that is supplied with the package and importing it with the import2dDataset function.
For this purpose, we need to supply a config table that describes which experimental conditions the different TMT labels correspond to and that provides the paths to the raw data files (note: since this example dataset is included in the package, it does not contain a “Path” column; this column is, however, mandatory if the data are read in from external raw files).
data("config_tab")
data("raw_dat_list")
config_tab
## Compound Experiment Temperature 126 127L 127H 128L 128H 129L 129H 130L
## 1 Compound1 exp1 37.0 0 0.5 2 10 40 - - -
## 2 Compound1 exp1 37.8 - - - - - 0 0.5 2
## 3 Compound1 exp2 40.4 0 0.5 2 10 40 - - -
## 4 Compound1 exp2 44.0 - - - - - 0 0.5 2
## 5 Compound1 exp3 46.9 0 0.5 2 10 40 - - -
## 6 Compound1 exp3 49.8 - - - - - 0 0.5 2
## 7 Compound1 exp4 52.9 0 0.5 2 10 40 - - -
## 8 Compound1 exp4 55.5 - - - - - 0 0.5 2
## 9 Compound1 exp5 58.6 0 0.5 2 10 40 - - -
## 10 Compound1 exp5 62.0 - - - - - 0 0.5 2
## 11 Compound1 exp6 65.4 0 0.5 2 10 40 - - -
## 12 Compound1 exp6 66.3 - - - - - 0 0.5 2
## 130H 131L RefCol
## 1 - - 126
## 2 10 40 129L
## 3 - - 126
## 4 10 40 129L
## 5 - - 126
## 6 10 40 129L
## 7 - - 126
## 8 10 40 129L
## 9 - - 126
## 10 10 40 129L
## 11 - - 126
## 12 10 40 129L
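Following the layout of config_tab printed above, a config table for one's own data could be assembled by hand. The sketch below builds the two rows of a single hypothetical experiment in which two temperatures are multiplexed in one run. The compound name, temperatures and file path are placeholders; only the label columns, the RefCol column and the “Path” column name are taken from the text above, so please check the import2dDataset help page for the exact requirements.

```r
# Sketch of a config table for one hypothetical 2D-TPP experiment, mirroring
# the layout of config_tab above (two temperatures multiplexed in one run)
tmt_labels <- c("126", "127L", "127H", "128L", "128H",
                "129L", "129H", "130L", "130H", "131L")

my_config <- data.frame(
  Compound    = "MyCompound",   # hypothetical compound name
  Experiment  = "exp1",
  Temperature = c(37.0, 37.8),
  stringsAsFactors = FALSE
)

# Concentration per TMT label in each row; "-" marks labels not used there
label_values <- rbind(
  c("0", "0.5", "2", "10", "40", "-", "-", "-", "-", "-"),
  c("-", "-", "-", "-", "-", "0", "0.5", "2", "10", "40")
)
for (i in seq_along(tmt_labels)) {
  # [[<- keeps label names such as "126" unmodified as column names
  my_config[[tmt_labels[i]]] <- label_values[, i]
}

my_config$RefCol <- c("126", "129L")  # vehicle (concentration 0) channel per row
# Placeholder path to the raw data file; required when reading external files
my_config$Path <- "path/to/exp1_raw_data.txt"
```

Building the label columns with `[[<-` avoids R's automatic renaming of syntactically invalid names like "126" to "X126".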
We then call the import function (note: here we supply a list of data frames to the “data” argument in place of the raw data files that would normally be specified in the “Path” column of the config table; when that column is present, the “data” argument can simply be omitted):
import_df <- import2dDataset(
    configTable = config_tab,
    data = raw_dat_list,
    idVar = "protein_id",
    intensityStr = "signal_sum_",
    fcStr = "rel_fc_",
    nonZeroCols = "qusm",
    geneNameVar = "gene_name",
    addCol = NULL,
    qualColName = "qupm",
    naStrs = c("NA", "n/d", "NaN"),
    concFactor = 1e6,
    medianNormalizeFC = TRUE,
    filterContaminants = TRUE)
## The following valid label columns were detected:
## 126, 127L, 127H, 128L, 128H, 129L, 129H, 130L, 130H, 131L.
## Importing 2D-TPP dataset: exp1
## Removing duplicate identifiers using quality column 'qupm'...
## 20 out of 20 rows kept for further analysis.
## Importing 2D-TPP dataset: exp1
## Removing duplicate identifiers using quality column 'qupm'...
## 20 out of 20 rows kept for further analysis.
## Importing 2D-TPP dataset: exp2
## Removing duplicate identifiers using quality column 'qupm'...
## 20 out of 20 rows kept for further analysis.
## Importing 2D-TPP dataset: exp2
## Removing duplicate identifiers using quality column 'qupm'...
## 20 out of 20 rows kept for further analysis.
## Importing 2D-TPP dataset: exp3
## Removing duplicate identifiers using quality column 'qupm'...
## 15 out of 15 rows kept for further analysis.
## Importing 2D-TPP dataset: exp3
## Removing duplicate identifiers using quality column 'qupm'...
## 15 out of 15 rows kept for further analysis.
## Importing 2D-TPP dataset: exp4
## Removing duplicate identifiers using quality column 'qupm'...
## 14 out of 14 rows kept for further analysis.
## Importing 2D-TPP dataset: exp4
## Removing duplicate identifiers using quality column 'qupm'...
## 14 out of 14 rows kept for further analysis.
## Importing 2D-TPP dataset: exp5
## Removing duplicate identifiers using quality column 'qupm'...
## 9 out of 9 rows kept for further analysis.
## Importing 2D-TPP dataset: exp5
## Removing duplicate identifiers using quality column 'qupm'...
## 9 out of 9 rows kept for further analysis.
## Importing 2D-TPP dataset: exp6
## Removing duplicate identifiers using quality column 'qupm'...
## 4 out of 4 rows kept for further analysis.
## Importing 2D-TPP dataset: exp6
## Removing duplicate identifiers using quality column 'qupm'...
## 4 out of 4 rows kept for further analysis.
## Ratios were correctly computed!
## Median normalizing fold changes...
recomp_sig_df <- recomputeSignalFromRatios(import_df)
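Please refer to the help page of import2dDataset for an in-depth description of its arguments. Essentially, the function needs to know the names or prefixes of the columns in the raw data files that contain the different types of information, such as the protein identifier or the raw and relative signal intensities measured for the different TMT labels. The subsequent call to recomputeSignalFromRatios recomputes reporter ion intensities from the fold changes (columns value and log2_value in the table below).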
The imported synthetic dataset consists of 17 simulated protein 2D thermal profiles (protein1-17) and 3 spiked-in true positives (tp1-3). After signal recomputation, it is a data frame with the following columns:
column | description | required |
---|---|---|
representative | protein identifier | Yes |
qupm | number of unique quantified peptides | No |
qusm | number of unique spectra | No |
clustername | gene name | Yes |
temperature | incubation temperature (°C) | Yes |
experiment | experiment identifier | No |
label | TMT label | No |
RefCol | TMT label of the reference (vehicle) channel | No |
conc | treatment concentration | No |
raw_value | raw reporter ion intensity sum | No |
raw_rel_value | raw relative fold change compared to vehicle condition at the same temperature | No |
log_conc | log10 treatment concentration | Yes |
rel_value | median normalized fold change | No |
value | recomputed reporter ion intensity | No |
log2_value | recomputed log2 reporter ion intensity | Yes |
Here, the “required” column indicates which of these columns are necessary for using the downstream functions.
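When preparing a data frame by other means, it can be useful to check up front that the columns marked as required in the table above are present. The helper below is a hypothetical convenience function written in base R for illustration; it is not part of TPP2D, and the toy values are made up.

```r
# Columns marked "Yes" in the "required" column of the table above
required_cols <- c("representative", "clustername", "temperature",
                   "log_conc", "log2_value")

# Hypothetical helper: names of required columns that a data frame lacks
missing_required <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy data frame (made-up values) that lacks the log2_value column
toy_df <- data.frame(
  representative = c("P1", "P1"),
  clustername    = c("protein1", "protein1"),
  temperature    = c(37.0, 37.0),
  log_conc       = c(-6.3, -5.7),
  stringsAsFactors = FALSE
)
missing_required(toy_df)
## [1] "log2_value"
```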
We then begin the actual data analysis by fitting two competing models to each protein profile: an H0 model, which is expected when a protein profile remains unaffected by a given treatment, and an H1 model, which fits a constrained sigmoidal dose-response model across all temperatures. The goodness of fit of both models is then compared for each protein, and an \(F\) statistic is computed.
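Comparing two nested fits by their residual sums of squares (RSS) follows the usual logic of an F test. The sketch below is a generic base R illustration, not the package's implementation: a hypothetical helper computes \(F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(n - p_1)}\) from an H0 and an H1 fit with \(p_0\) and \(p_1\) parameters on \(n\) observations. A simple linear term in log_conc stands in for TPP2D's constrained sigmoidal dose-response model, and the degrees of freedom used by competeModels may differ.

```r
# Hypothetical helper: F statistic comparing two nested least-squares fits
nested_f_stat <- function(rss0, rss1, p0, p1, n) {
  d1 <- p1 - p0  # additional parameters of the alternative (H1) model
  d2 <- n - p1   # residual degrees of freedom of the H1 model
  ((rss0 - rss1) / d1) / (rss1 / d2)
}

# Toy profile: 5 concentrations x 3 temperatures with simulated values
set.seed(1)
toy <- expand.grid(log_conc = c(-7, -6.3, -5.7, -5, -4.4),
                   temperature = factor(c(37, 41, 45)))
toy$log2_value <- 20 - as.numeric(toy$temperature) +
  0.3 * toy$log_conc + rnorm(nrow(toy), sd = 0.1)

# H0: signal depends on temperature only; H1: adds a (here linear) dose effect
fit_h0 <- lm(log2_value ~ temperature, data = toy)
fit_h1 <- lm(log2_value ~ temperature + log_conc, data = toy)

f_stat <- nested_f_stat(
  rss0 = sum(residuals(fit_h0)^2),
  rss1 = sum(residuals(fit_h1)^2),
  p0   = length(coef(fit_h0)),
  p1   = length(coef(fit_h1)),
  n    = nrow(toy)
)
f_stat
```

For linear models the result matches the F value reported by `anova(fit_h0, fit_h1)`, which is a convenient sanity check.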
competed_models <- competeModels(
    df = recomp_sig_df)
Next, we create a null model from our dataset with bootstrapNull, which is needed to estimate the FDR for a given \(F\) statistic in the following step.
set.seed(12, kind = "L'Ecuyer-CMRG")
null_model <- bootstrapNull(
    df = recomp_sig_df,
    ncores = 1, B = 1/5)
Please note that setting \(B = 1/5\) (corresponding to \(B \times 10 = 2\) permutations) is not enough to guarantee faithful FDR estimation; this value has simply been chosen for a fast demonstration. We recommend using at least \(B = 2\) in practice.
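To illustrate why a null distribution of \(F\) statistics is needed, and why more resamples give a more reliable estimate, the sketch below generates null \(F\) statistics with a textbook residual bootstrap under H0: residuals of the H0 fit are resampled and added back onto the H0 fitted values, so any dose effect is absent by construction, and the H0 versus H1 comparison is repeated on each resample. This is a generic scheme written in base R as an assumption; it is not claimed to be what bootstrapNull does internally, and a linear log_conc term again stands in for the sigmoidal H1 model.

```r
# Generic residual bootstrap under H0 (illustrative, not TPP2D's internals)
set.seed(12)
toy <- expand.grid(log_conc = c(-7, -6.3, -5.7, -5, -4.4),
                   temperature = factor(c(37, 41, 45)))
# Simulated profile without any dose effect
toy$log2_value <- 20 - as.numeric(toy$temperature) + rnorm(nrow(toy), sd = 0.1)

# F statistic of an H1 fit (extra linear dose term) versus an H0 fit
f_from_fits <- function(dat) {
  h0 <- lm(log2_value ~ temperature, data = dat)
  h1 <- lm(log2_value ~ temperature + log_conc, data = dat)
  anova(h0, h1)$F[2]
}

fit_h0 <- lm(log2_value ~ temperature, data = toy)
n_boot <- 200  # number of resamples; more resamples give a smoother null

null_f <- vapply(seq_len(n_boot), function(b) {
  boot_dat <- toy
  # Add resampled H0 residuals back onto the H0 fit: no dose effect by design
  boot_dat$log2_value <- fitted(fit_h0) +
    sample(residuals(fit_h0), replace = TRUE)
  f_from_fits(boot_dat)
}, numeric(1))

summary(null_f)
```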
To estimate the FDR for all observed \(F\) statistics and retrieve all significant hits at a chosen FDR \(\alpha\), we use the following functions:
fdr_tab <- computeFdr(
    df_out = competed_models,
    df_null = null_model)
hits <- findHits(
    fdr_df = fdr_tab,
    alpha = 0.1)
hits %>%
    dplyr::select(clustername, nObs, F_statistic, fdr)
## # A tibble: 8 x 4
## clustername nObs F_statistic fdr
## <chr> <int> <dbl> <dbl>
## 1 tp1 20 68.0 0
## 2 tp2 30 53.9 0
## 3 tp3 50 28.2 0
## 4 protein10 50 2.61 0
## 5 protein15 50 2.15 0
## 6 protein16 40 1.62 0
## 7 protein1 20 1.23 0
## 8 protein14 40 0.988 0
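The idea behind FDR estimation and hit selection can be sketched with a standard empirical estimator: for a threshold \(t\), the expected number of false positives is the fraction of null \(F\) statistics at or above \(t\) times the number of observed statistics, divided by the number of observed statistics at or above \(t\). The helper below is hypothetical and uses made-up toy values; computeFdr and findHits may define the estimate differently in detail, so treat it as an illustration only.

```r
# Hypothetical helper: empirical FDR for each observed F statistic
empirical_fdr <- function(f_obs, f_null) {
  vapply(f_obs, function(t) {
    # Expected false positives: null exceedance rate times number of tests
    exp_false <- mean(f_null >= t) * length(f_obs)
    n_called  <- sum(f_obs >= t)  # proteins called at threshold t
    min(1, exp_false / n_called)
  }, numeric(1))
}

# Made-up observed and null F statistics
f_obs  <- c(protA = 68.0, protB = 54.0, protC = 1.1, protD = 0.4)
f_null <- c(0.2, 0.5, 0.9, 1.3, 0.7, 0.1, 2.0, 0.4, 0.6, 0.3)

fdr <- empirical_fdr(f_obs, f_null)
alpha <- 0.1
names(fdr)[fdr <= alpha]  # proteins called as hits at FDR <= alpha
## [1] "protA" "protB"
```

Here protC gets an FDR of \(0.2 \times 4 / 3 \approx 0.27\) and protD one of \(0.7 \times 4 / 4 = 0.7\), so only the two strongest effects pass \(\alpha = 0.1\).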
Finally, we can fit and plot proteins that have come up as significant in our analysis, for example using the H0 model:
plot2dTppFit(recomp_sig_df, "tp1", model_type = "H0")
or, analogously, using the H1 model:
plot2dTppFit(recomp_sig_df, "tp1", model_type = "H1")
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
##
## Random number generation:
## RNG: L'Ecuyer-CMRG
## Normal: Inversion
## Sample: Rejection
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] TPP2D_1.0.0 dplyr_0.8.0.1 BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] zip_2.0.1 Rcpp_1.0.1 highr_0.8
## [4] pillar_1.3.1 compiler_3.6.0 BiocManager_1.30.4
## [7] plyr_1.8.4 bitops_1.0-6 iterators_1.0.10
## [10] tools_3.6.0 digest_0.6.18 evaluate_0.13
## [13] tibble_2.1.1 gtable_0.3.0 pkgconfig_2.0.2
## [16] rlang_0.3.4 openxlsx_4.1.0 foreach_1.4.4
## [19] cli_1.1.0 parallel_3.6.0 yaml_2.2.0
## [22] xfun_0.6 stringr_1.4.0 knitr_1.22
## [25] grid_3.6.0 tidyselect_0.2.5 glue_1.3.1
## [28] R6_2.4.0 fansi_0.4.0 rmarkdown_1.12
## [31] bookdown_0.9 tidyr_0.8.3 ggplot2_3.1.1
## [34] purrr_0.3.2 magrittr_1.5 scales_1.0.0
## [37] codetools_0.2-16 htmltools_0.3.6 assertthat_0.2.1
## [40] colorspace_1.4-1 labeling_0.3 utf8_1.1.4
## [43] stringi_1.4.3 RCurl_1.95-4.12 lazyeval_0.2.2
## [46] munsell_0.5.0 doParallel_1.0.14 crayon_1.3.4
Becher, I., Werner, T., Doce, C., Zaal, E.A., Tögel, I., Khan, C.A., Rueger, A., Muelbaier, M., Salzer, E., Berkers, C.R., et al. (2016). Thermal profiling reveals phenylalanine hydroxylase as an off-target of panobinostat. Nature Chemical Biology 12, 908–910.
Franken, H., Mathieson, T., Childs, D., Sweetman, G.M.A., Werner, T., Tögel, I., Doce, C., Gade, S., Bantscheff, M., Drewes, G., et al. (2015). Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nature Protocols 10, 1567–1593.
Molina, D.M., Jafari, R., Ignatushchenko, M., Seki, T., Larsson, E.A., Dan, C., Sreekumar, L., Cao, Y., and Nordlund, P. (2013). Monitoring Drug Target Engagement in Cells and Tissues Using the Cellular Thermal Shift Assay. Science 341, 84–88.
Savitski, M.M., Reinhard, F.B.M., Franken, H., Werner, T., Savitski, M.F., Eberhard, D., Martinez Molina, D., Jafari, R., Dovega, R.B., Klaeger, S., et al. (2014). Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, 1255784.