---
title: "Working with StarBioTrek package"
author: "Claudia Cava, Isabella Castiglioni"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
    number_sections: false
    toc_depth: 2
    highlight: haddock
references:
- id: ref1
  title: graphite - a Bioconductor package to convert pathway topology to gene network
  author:
  - family: Sales G, et al.
    given:
  journal: BMC bioinformatics
  volume: 13
  DOI: "10.1186/1471-2105-13-20"
  number:
  pages: 20
  issued:
    year: 2012
- id: ref2
  title: The GeneMANIA prediction server biological network integration for gene prioritization and predicting gene function
  author:
  - family: Warde-Farley D, et al.
    given:
  journal: Nucleic Acids Res.
  volume: 38
  number:
  pages: W214-20
  issued:
    year: 2010
- id: ref3
  title: qgraph Network visualizations of relationships in psychometric data
  author:
  - family: S. Epskamp, et al.
    given:
  journal: Journal of Statistical Software
  volume: 48
  number: 4
  pages: 1-18
  issued:
    year: 2012
- id: ref4
  title: GOplot an R package for visually combining expression data with functional analysis
  author:
  - family: Walter W, et al.
    given:
  journal: Bioinformatics
  volume: 31
  number: 17
  pages: 2912-4
  issued:
    year: 2015
- id: ref6
  title: GC-content normalization for RNA-Seq data
  author:
  - family: Risso, D., Schwartz, K., Sherlock, G., & Dudoit, S.
    given:
  journal: BMC Bioinformatics
  volume: 12
  number: 1
  pages: 480
  issued:
    year: 2011
- id: ref7
  title: Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma
  author:
  - family: Noushmehr, H., Weisenberger, D.J., Diefes, K., Phillips, H.S., Pujara, K., Berman, B.P., Pan, F., Pelloski, C.E., Sulman, E.P., Bhat, K.P. et al.
    given:
  journal: Cancer cell
  volume: 17
  number: 5
  pages: 510-522
  issued:
    year: 2010
- id: ref8
  title: Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma
  author:
  - family: Ceccarelli, Michele and Barthel, Floris P and Malta, Tathiane M and Sabedot, Thais S and Salama, Sofie R and Murray, Bradley A and Morozova, Olena and Newton, Yulia and Radenbaugh, Amie and Pagnotta, Stefano M and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.12.028"
  DOI: "10.1016/j.cell.2015.12.028"
  volume: 164
  number: 3
  pages: 550-563
  issued:
    year: 2016
- id: ref9
  title: Comprehensive molecular profiling of lung adenocarcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature13385"
  DOI: "10.1038/nature13385"
  volume: 511
  number: 7511
  pages: 543-550
  issued:
    year: 2014
- id: ref10
  title: Comprehensive molecular characterization of gastric adenocarcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature13480"
  DOI: "10.1038/nature13480"
  issued:
    year: 2014
- id: ref11
  title: Comprehensive molecular portraits of human breast tumours
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11412"
  DOI: "10.1038/nature11412"
  volume: 490
  number: 7418
  pages: 61-70
  issued:
    year: 2012
- id: ref12
  title: Comprehensive molecular characterization of human colon and rectal cancer
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11252"
  DOI: "10.1038/nature11252"
  volume: 487
  number: 7407
  pages: 330-337
  issued:
    year: 2012
- id: ref13
  title: Genomic classification of cutaneous melanoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.05.044"
  DOI: "10.1016/j.cell.2015.05.044"
  volume: 161
  number: 7
  pages: 1681-1696
  issued:
    year: 2015
- id: ref14
  title: Comprehensive genomic characterization of head and neck squamous cell carcinomas
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature14129"
  DOI: "10.1038/nature14129"
  volume: 517
  number: 7536
  pages: 576-582
  issued:
    year: 2015
- id: ref15
  title: The somatic genomic landscape of chromophobe renal cell carcinoma
  author:
  - family: Davis, Caleb F and Ricketts, Christopher J and Wang, Min and Yang, Lixing and Cherniack, Andrew D and Shen, Hui and Buhay, Christian and Kang, Hyojin and Kim, Sang Cheol and Fahey, Catherine C and others
    given:
  journal: Cancer Cell
  URL: "http://doi.org/10.1016/j.ccr.2014.07.014"
  DOI: "10.1016/j.ccr.2014.07.014"
  volume: 26
  number: 3
  pages: 319-330
  issued:
    year: 2014
- id: ref16
  title: Comprehensive genomic characterization of squamous cell lung cancers
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11404"
  DOI: "10.1038/nature11404"
  volume: 489
  number: 7417
  pages: 519-525
  issued:
    year: 2012
- id: ref17
  title: Integrated genomic characterization of endometrial carcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature12113"
  DOI: "10.1038/nature12113"
  volume: 497
  number: 7447
  pages: 67-73
  issued:
    year: 2013
- id: ref18
  title: Integrated genomic characterization of papillary thyroid carcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2014.09.050"
  DOI: "10.1016/j.cell.2014.09.050"
  volume: 159
  number: 3
  pages: 676-690
  issued:
    year: 2014
- id: ref19
  title: The molecular taxonomy of primary prostate cancer
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.10.025"
  DOI: "10.1016/j.cell.2015.10.025"
  volume: 163
  number: 4
  pages: 1011-1025
  issued:
    year: 2015
- id: ref20
  title: Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma
  author:
  - family: Linehan, W Marston and Spellman, Paul T and Ricketts, Christopher J and Creighton, Chad J and Fei, Suzanne S and Davis, Caleb and Wheeler, David A and Murray, Bradley A and Schmidt, Laura and Vocke, Cathy D and others
    given:
  journal: New England Journal of Medicine
  URL: "http://doi.org/10.1056/NEJMoa1505917"
  DOI: "10.1056/NEJMoa1505917"
  volume: 374
  number: 2
  pages: 135-145
  issued:
    year: 2016
- id: ref21
  title: Comprehensive molecular characterization of clear cell renal cell carcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature12222"
  DOI: "10.1038/nature12222"
  volume: 499
  number: 7456
  pages: 43-49
  issued:
    year: 2013
- id: ref22
  title: Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma
  author:
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cancer Cell
  URL: "http://dx.doi.org/10.1016/j.ccell.2016.04.002"
  DOI: "10.1016/j.ccell.2016.04.002"
  volume: 29
  pages: 723-736
  issued:
    year: 2016
- id: ref23
  title: Complex heatmaps reveal patterns and correlations in multidimensional genomic data
  author:
  - family: Gu, Zuguang and Eils, Roland and Schlesner, Matthias
    given:
  journal: Bioinformatics
  URL: "http://dx.doi.org/10.1093/bioinformatics/btw313"
  DOI: "10.1093/bioinformatics/btw313"
  pages: "btw313"
  issued:
    year: 2016
- id: ref24
  title: "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages"
  author:
  - family: Silva, TC and Colaprico, A and Olsen, C and D'Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H
    given:
  journal: F1000Research
  URL: "http://dx.doi.org/10.12688/f1000research.8923.1"
  DOI: "10.12688/f1000research.8923.1"
  volume: 5
  number: 1542
  issued:
    year: 2016
- id: ref25
  title: "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data"
  author:
  - family: Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan
    given:
  journal: Nucleic Acids Research
  URL: "http://dx.doi.org/10.1093/nar/gkv1507"
  DOI: "10.1093/nar/gkv1507"
  volume: 44
  number: 8
  pages: e71
  issued:
    year: 2016
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache=FALSE)
```

```{r, echo = FALSE,hide=TRUE, message=FALSE,warning=FALSE}
devtools::load_all(".")
```

# Introduction

Motivation:
New technologies have made it possible to identify marker gene signatures. However, gene expression-based signatures have limitations: they do not consider the metabolic role of the genes, and they are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to overcome these limits [@ref12].
`StarBioTrek` provides methods to measure pathway activity and cross-talk among pathways, also integrating network and TCGA data. New measures are under development.

# Installation

To install the package, use the code below.
```{r, eval = FALSE}
# Install BiocManager first if it is not available
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("StarBioTrek")
```

# `Get data`: Get pathway and network data

## `SELECT_path_species`: Select the pathway database and species of interest
The user can select the pathway database and species of interest using functions implemented in graphite [@ref1].

```{r, eval = TRUE}
library(graphite)
# List the pathway databases and species available through graphite
sel<-pathwayDatabases()
```

```{r, eval = TRUE, echo = FALSE}
knitr::kable(sel, digits = 2,
             caption = "List of pathway databases and species",row.names = FALSE)
```

## `GetData`: Searching pathway data for download
The user can easily search pathway data and their genes using the `GetData` function. It can download pathways from several databases and species using the following parameters:

```{r, eval = TRUE}
species="hsapiens"
pathwaydb="kegg"
path<-GetData(species,pathwaydb)
```

## `GetPathData`: Get genes inside pathways
The user can identify the genes inside the pathways of interest.

```{r, eval = FALSE}
pathway_ALLGENE<-GetPathData(path_ALL=path[1:3])
```

## `GetPathNet`: Get interacting genes inside pathways
`GetPathNet` generates a list of interacting genes for each pathway.

```{r, eval = FALSE}
pathway_net<-GetPathNet(path_ALL=path[1:3])
```

## `ConvertedIDgenes`: Convert gene IDs to gene symbols
The user can convert the gene IDs into gene symbols.

```{r, eval = TRUE}
pathway<-ConvertedIDgenes(path_ALL=path[1:10])
```

## `getNETdata`: Searching network data for download
You can easily search network data from GeneMANIA using the `getNETdata` function [@ref2].
The network category can be filtered using the following parameters:

* **PHint** Physical_interactions
* **COloc** Co-localization
* **GENint** Genetic_interactions
* **PATH** Pathway
* **SHpd** Shared_protein_domains

The species can be filtered using the following parameters:

* **Arabidopsis_thaliana**
* **Caenorhabditis_elegans**
* **Danio_rerio**
* **Drosophila_melanogaster**
* **Escherichia_coli**
* **Homo_sapiens**
* **Mus_musculus**
* **Rattus_norvegicus**
* **Saccharomyces_cerevisiae**

By default the organism is Homo sapiens.

The example shows the shared protein domain network for Saccharomyces_cerevisiae. For more information, see the `SpidermiR` package.

```{r, eval = TRUE}
organismID="Saccharomyces_cerevisiae"
netw<-getNETdata(network="SHpd",organismID)
```

# `Integration data`: Integration between pathway and network data

## `pathnet`: Network of interacting genes for each pathway according to a network type (PHint, COloc, GENint, PATH, SHpd)

The function `pathnet` creates a network of interacting genes (downloaded from GeneMANIA) for each pathway. Interacting genes are genes that belong to the same pathway, and the interaction is taken from the network chosen by the user through the parameters of the function `getNETdata`.
The output is a network of genes belonging to the same pathway.

```{r, eval = TRUE}
lista_net<-pathnet(genes.by.pathway=pathway[1:5],data=netw)
```

## `listpathnet`: List of interacting genes for each pathway (list of genes) according to a network type (PHint, COloc, GENint, PATH, SHpd)

The function `listpathnet` creates a list of interacting genes for each pathway. Interacting genes are genes that belong to the same pathway, and the interaction is taken from the network chosen by the user through the parameters of the function `getNETdata`.
The output is a list of the genes that belong to the same pathway and have an interaction in the network.
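
The idea of restricting a network to the genes of one pathway can be illustrated with a small base R sketch. It is only a conceptual illustration and not the package's implementation: the edge list `toy_network`, the gene set `toy_pathway_genes` and the gene names are hypothetical toy data, not output of `getNETdata` or `ConvertedIDgenes`.

```{r, eval = FALSE}
# Hypothetical toy edge list (not GeneMANIA data)
toy_network <- data.frame(
  gene_a = c("A", "A", "B", "C", "D"),
  gene_b = c("B", "C", "E", "D", "E"),
  stringsAsFactors = FALSE
)
# Hypothetical genes of a single pathway
toy_pathway_genes <- c("A", "B", "C")

# Keep only the edges whose two endpoints both belong to the pathway
in_pathway <- toy_network$gene_a %in% toy_pathway_genes &
  toy_network$gene_b %in% toy_pathway_genes
toy_pathway_net <- toy_network[in_pathway, ]

# Pathway genes that take part in at least one retained interaction
toy_interacting <- unique(c(toy_pathway_net$gene_a, toy_pathway_net$gene_b))
```

In this toy case two edges survive the filter, and the interacting genes are the three pathway members. The corresponding package call is: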
```{r, eval = TRUE}
list_path<-listpathnet(lista_net=lista_net,pathway=pathway[1:5])
```

# `Pathway summary indexes`: Score for each pathway

## `GE_matrix`: Grouping gene expression profiles in pathways
Given human KEGG pathway data and a gene expression matrix, this step produces a matrix with the gene expression levels grouped by pathway.
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function `GE_matrix` creates a profile of gene expression levels for each pathway given by the user:

```{r, eval = TRUE}
list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
```

## `GE_matrix_mean`: Mean gene expression for each pathway
Given human KEGG pathway data and a gene expression matrix, this step produces a P×G matrix (pathways in the columns, genes in the rows) with the mean gene expression of only those genes contained in the pathways supplied by the user.

```{r, eval = TRUE}
list_path_plot<-GE_matrix_mean(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
```

## `average`: Average of genes for each pathway starting from a matrix of gene expression
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function `average` creates an average matrix (S×P: S are the samples and P the pathways) of gene expression for each pathway:

```{r, eval = FALSE}
score_mean<-average(pathwayexpsubset=list_path_gene)
```

## `stdv`: Standard deviations of genes for each pathway starting from a matrix of gene expression
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function `stdv` creates a standard deviation matrix of gene expression for each pathway:

```{r, eval = TRUE}
score_st_dev<-stdv(gslist=list_path_gene)
```

# `Pathway cross-talk indexes`: Score for pairwise pathways

## `eucdistcrtlk`: Euclidean distance for cross-talk measure
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data)
the function `eucdistcrtlk` creates a Euclidean distance matrix of gene expression for each pair of pathways.

```{r, eval = FALSE}
score_euc_distance<-eucdistcrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
```

## `dsscorecrtlk`: Discriminating score for cross-talk measure
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function `dsscorecrtlk` creates a discriminating score matrix for each pair of pathways as a measure of cross-talk. The discriminating score is given by $|M_1 - M_2| / (S_1 + S_2)$, where $M_1$ and $M_2$ are the means and $S_1$ and $S_2$ the standard deviations of the expression levels of the genes in pathway 1 and in pathway 2.

```{r, eval = FALSE}
cross_talk_st_dv<-dsscorecrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
```

# `Selection of pathway cross-talk`: Selection of pathway cross-talk

## `svm_classification`: SVM classification
Given the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined how effectively the classes can be separated based on their pairwise pathway profiles.
This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under Curve (AUC). The AUC is estimated by cross-validation (k-fold cross-validation, k=10). The function randomly selects a fraction of the TCGA data (e.g. nf = 60, i.e. 60% of the original dataset) to form the training set and assigns the remaining samples to the testing set (40% of the original dataset). For each pair of pathways the user can obtain a score matrix with the methods described above (e.g. `dev_std_crtlk`) and can focus on the pairs of pathways that are able to differentiate a particular subtype from the normal type.
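
The split into training and testing samples and the AUC can be sketched in base R. This is a toy illustration under stated assumptions, not the code of `svm_classification`: no SVM is fitted, the simulated score of a single pathway pair stands in for the classifier output, the split is done per class (an assumption made here so that both classes appear in the test set), and all names starting with `toy_` as well as the helper `auc_rank` are hypothetical.

```{r, eval = FALSE}
set.seed(1)  # reproducible toy example

# Hypothetical scores for one pathway pair: 10 normal and 10 tumour samples
toy_scores <- c(rnorm(10, mean = 0), rnorm(10, mean = 2))
toy_labels <- rep(c("normal", "tumour"), each = 10)

# Draw nf% of the samples of each class for the training set
nf <- 60
split_class <- function(idx, nf) sample(idx, size = round(length(idx) * nf / 100))
train_idx <- c(split_class(which(toy_labels == "normal"), nf),
               split_class(which(toy_labels == "tumour"), nf))
test_idx <- setdiff(seq_along(toy_scores), train_idx)

# AUC from ranks (Mann-Whitney statistic): the probability that a randomly
# chosen tumour sample has a higher score than a randomly chosen normal one
auc_rank <- function(score, label, positive = "tumour") {
  r <- rank(score)
  n_pos <- sum(label == positive)
  n_neg <- sum(label != positive)
  (sum(r[label == positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_test <- auc_rank(toy_scores[test_idx], toy_labels[test_idx])
```

With 20 samples and `nf = 60`, 12 samples go to the training set and 8 to the testing set. The package call that performs the SVM-based evaluation is: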
```{r, eval = FALSE}
nf <- 60
res_class<-svm_classification(TCGA_matrix=score_euc_dista[1:30,],nfs=nf,
                              normal=colnames(norm[,1:10]),tumour=colnames(tumo[,1:10]))
```

# `IPPI`: Driver genes for each pathway
The function `IPPI` uses pathway and network data to calculate the driver genes for each pathway. Please see Cava et al. BMC Genomics 2017.

```{r, eval = FALSE}
DRIVER_SP<-IPPI(pathax=pathway_matrix[,1:3],netwa=netw_IPPI[1:50000,])
```

# `Visualization`: Gene interactions and pathways
StarBioTrek provides several functions that prepare data for visualizing gene-gene interactions and pathway cross-talk with the qgraph package [@ref3]. The function `plotcrosstalk` prepares the data:

```{r, eval = TRUE}
formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)

library(qgraph)
qgraph(formatplot[[1]], minimum = 0.25, cut = 0.6, vsize = 5, groups = formatplot[[2]],
       legend = TRUE, borders = FALSE,layoutScale=c(0.8,0.8))
```

```{r, eval = TRUE}
qgraph(formatplot[[1]],groups=formatplot[[2]], layout="spring", diag = FALSE,
       cut = 0.6,legend.cex = 0.5,vsize = 6,layoutScale=c(0.8,0.8))
```

A circle plot can be generated using the function `circleplot` [@ref4]. A score can be assigned to each gene.

```{r, eval = FALSE}
formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)
# Random scores, used here only to illustrate the plot
score<-runif(length(formatplot[[2]]), min=-10, max=+10)
circleplot(preplot=formatplot,scoregene=score)
```

```{r, fig.width=6, fig.height=4, echo=FALSE, fig.align="center"}
library(png)
library(grid)
img <- readPNG("circleplot.png")
grid.raster(img)
```

******

### Session Information

******
```{r sessionInfo}
sessionInfo()
```

# References