---
title: "SBGNview: Pathway based omics data integration, visualization and analysis"
author: "Xiaoxi Dong, Weijun Luo"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  bookdown::html_document2:
    fig_caption: yes
    number_sections: yes
    toc: yes
editor_options:
  chunk_output_type: console
bibliography: REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{SBGNview functions}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r , echo = FALSE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
library(knitr)
```

# About

SBGNview is a tool set for visualizing omics data on pathway maps and for pathway related data analysis. Pathways are rendered with a community standard notation: Systems Biology Graphical Notation ([SBGN](http://sbgn.github.io/sbgn))[@le2009systems]. Given an omics data table and a pathway file (SBGN-ML format with layout information), SBGNview can display omics data as colors on glyphs and output image files.

For omics data, SBGNview supports automatic ID mapping of common gene/protein/compound ID types (e.g. Entrez Gene ID, UNIPROT, ChEBI etc.). For pathway files, SBGNview can automatically retrieve SBGN-ML files from common pathway databases (e.g. Reactome, MetaCyc, SMPDB, PANTHER, METACROP etc.). To support visualizing multiple types of data on the same glyph/arc, SBGNview provides extensive options to control glyph and edge features (e.g. color, line width, label color/size etc.). To facilitate pathway based analysis, SBGNview can search for pathways by keywords, extract node information (e.g. gene set, compound set) and highlight the shortest path between two nodes.

# Introduction

Molecular pathways have been widely used in omics data analysis. We previously developed an R/BioConductor package called Pathview, which maps, integrates and visualizes omics data onto KEGG pathway graphs[@luo2013pathview]. Since its publication, Pathview has been widely used in numerous omics studies and analysis tools.
Here we introduce the SBGNview package, which adopts Systems Biology Graphical Notation (SBGN)[@le2009systems] and greatly extends the Pathview project by supporting multiple major pathway databases besides KEGG.

Key features:

* Pathway diagram is drawn with SBGN notations
  Pathway maps use different glyph shapes to represent different types of molecules (macromolecules or simple chemicals) and different arc shapes to represent different reaction types (consumption or catalysis). Collectively, these are called graphical notations. SBGN is a community developed notation standard and has been used by major pathway databases (e.g. Reactome, SMPDB, PANTHER pathways, MetaCrop etc.). For details about SBGN, please check [http://sbgn.github.io/sbgn](http://sbgn.github.io/sbgn)

* Supports major pathway databases and user defined pathways.
  As a community standard, SBGN is adopted by major pathway databases, including Reactome, Panther, PathwayCommons, MetaCrop etc. Therefore, users can use **SBGNview** to visualize and interpret their omics data on any pathways from these databases. In addition, molecular biologists often summarize new discoveries or literature knowledge in pathways, and create their own pathway maps. This can be done with SBGN editing/drawing tools (https://sbgn.github.io/software), and the pathways can be saved as SBGN-ML files for use as input to SBGNview. This makes **SBGNview** much more flexible than existing tools such as Pathview and PaintOmics, which only support KEGG pathways.

* Extensive choices for graphical control.
  Like **Pathview**, **SBGNview** supports multiple samples for each gene/compound. In addition, it provides rich options to control glyph/arc attributes such as line color/width and text size/color/wrapping/positioning etc. This gives users maximal control over the pathway graphs.

* Pathway related data extraction and analysis
    + [Search and automatically download](#searchPathways) pathway files from major databases.
      Keywords can be pathway names or molecule IDs (gene symbol, compound name, UNIPROT, CHEBI, UNICHEM etc.)
    + Pathway gene/compound set extraction. The gene sets can be used for gene set enrichment analysis.

# Installation

## Install SBGNview

Install **SBGNview** through Bioconductor.

```{r install, eval = FALSE}
BiocManager::install(c("SBGNview"))
```

# Overview

To visualize omics data on a SBGN pathway map, we need two inputs:

1. A SBGN-ML file containing the pathway information: nodes, edges and their layout (coordinates).
2. An omics data table in which rows are genes/compounds and columns are different measurements. The measurements can be any numeric values, such as fold change, abundance etc.

Given these two inputs, **SBGNview** parses the SBGN-ML file, renders a .svg graph with SBGN notation, and displays the omics data of each gene/compound on its corresponding glyph on the SBGN map. Each measured value is displayed as a color corresponding to the value. When there are multiple samples/measurements for each molecule, nodes are divided into multiple slices correspondingly. The output images can be in SVG, PDF, PNG or PS format.

## A quick example {#quickStart}

A quick example to visualize a demo gene expression dataset on the pathway "Adrenaline and noradrenaline biosynthesis" and highlight several interesting nodes, edges and a path.

```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
# load demo dataset and pathway information of built-in collection of SBGN-ML files.
# We use a cancer microarray dataset 'gse16873.d' from package 'pathview'
library(SBGNview)
data("gse16873.d","pathways.info","sbgn.xmls")
# search for pathways with user defined keywords
input.pathways <- findPathways("Adrenaline and noradrenaline biosynthesis")
# render SBGN pathway graph and output image files
SBGNview.obj <- SBGNview(
    gene.data = gse16873.d[,1:3],
    gene.id.type = "entrez",
    input.sbgn = input.pathways$pathway.id,
    output.file = "quick.start",
    output.formats = c("png")
)
print(SBGNview.obj)
```

Two image files (a .svg file and a .png file) will be created in the current working directory:

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
list.files(
    pattern = "quick.start",
    full.names = TRUE)
```

```{r quickStartFig, echo = FALSE,fig.cap="\\label{fig:quickStartFig}Quick start example: Adrenaline and noradrenaline biosynthesis pathway. "}
include_graphics("quick.start_P00001.svg")
```

[Link to SBGN notation](https://cdn.rawgit.com/sbgn/process-descriptions/b2904462d11bd8d65e9c7a1318d95d468048cb50/templates/PD_L1V1.3.svg)

In this example, the [original pathway SBGN-ML file is from pathwayCommons ](http://apps.pathwaycommons.org/pathways?uri=http%3A%2F%2Fidentifiers.org%2Fpanther.pathway%2FP00001) with [improved layout](#ourCollection) (node-edge overlaps are removed by routed edges).
We can highlight nodes, edges and a path:

```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
outputFile(SBGNview.obj) <- "quick.start.highlight.elements"
SBGNview.obj +
    highlightArcs(class = "production",color = "red") +
    highlightArcs(class = "consumption",color = "blue") +
    highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                   stroke.width = 4,
                   stroke.color = "green") +
    highlightPath(from.node = "tyrosine",
                  to.node = "dopamine",
                  from.node.color = "green",
                  to.node.color = "blue",
                  shortest.paths.cols = "purple",
                  input.node.stroke.width = 6,
                  path.node.stroke.width = 5,
                  path.node.color = "purple",
                  path.stroke.width = 5,
                  tip.size = 10 )
```

```{r quickStartFigModified, echo = FALSE,fig.cap="\\label{fig:quickStartFigModified}Quick start example: Adrenaline and noradrenaline biosynthesis pathway. Highlight nodes and edges."}
include_graphics("quick.start.highlight.elements_P00001.svg")
```

The colors of consumption arcs and production arcs are set to blue and red, respectively. Tyrosine and epinephrine are highlighted by a thicker border (stroke width) and green color. Note that there are four nodes mapped to (+-)-epinephrine. A shortest path from tyrosine to dopamine is highlighted with purple arcs and nodes. The start (tyrosine) and end (dopamine) nodes have thicker borders and different colors. Since there are multiple dopamine nodes in the map, a random dopamine node is selected. If the user wants a specific node, function *changeIds* can help find the node IDs corresponding to input IDs. The user can then run *highlightNodes* and/or *highlightPath* again with "node IDs" instead of "compound name". See [this example](#findNode) for details.

# Getting started

*SBGNview* is the main function to overlay omics data on SBGN pathway maps. It extracts node and edge data from the SBGN-ML file and creates a SBGN graph in SVG format. Then it maps omics data to the glyphs and renders the graph with mapped data as colors.
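
The idea of turning measured values into glyph colors can be sketched in base R. This is only an illustration, not SBGNview's own code: the helper `value_to_color`, its arguments and the green/gray/red palette are all assumptions made for this example.

```r
# Hypothetical helper (not part of SBGNview): map numeric values to colors
# on a diverging scale, low = green, middle = gray, high = red (assumed palette).
value_to_color <- function(values, limit = 1, n.bins = 10,
                           palette = c("green", "gray", "red")) {
  # One color per bin, interpolated along the palette
  bin.colors <- grDevices::colorRampPalette(palette)(n.bins)
  # Clamp values to [-limit, limit] so that outliers take the end colors
  clamped <- pmax(pmin(values, limit), -limit)
  # Cut the clamped range into n.bins equal-width bins
  breaks <- seq(-limit, limit, length.out = n.bins + 1)
  bin.index <- cut(clamped, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)
  bin.colors[bin.index]
}

# Two measurements for one gene give two colors,
# which would correspond to two slices on its glyph
fold.changes <- c(sample1 = -0.7, sample2 = 1.5)
slice.colors <- value_to_color(fold.changes)
slice.colors
```

Clamping before binning is a design choice in this sketch: it keeps a few extreme values from compressing the color range used by the rest of the data.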
Currently it maps gene/protein omics data to "macromolecule" glyphs and maps compound omics data to "simple chemical" glyphs. Please see the documentation of function *SBGNview* for more details.

The *SBGNview* function returns a *SBGNview* object, which contains the information necessary to render the SBGN graph and can be further modified to change graph features. See [this section](#sbgnviewObj) for more details.

## SBGN pathway file (SBGN-ML)

A SBGN pathway is defined in a special XML format (SBGN-ML file). It contains information about the pathway content (molecules and their reactions) as well as graph layout information. There are two main types of data in SBGN-ML files:

1. node data (in tag "glyph"), such as node location, width, height and node class (macromolecule, simple chemical etc.).
2. edge data (in tag "arc"), such as arc class, start node and end node.

For more details, see: https://github.com/sbgn/sbgn/wiki/SBGN_ML

### SBGN-ML pathway file from online databases

Several online databases provide SBGN-ML files, such as pathwayCommons, Reactome and MetaCrop. They can be downloaded from their webpage or FTP site.

### SBGNview's SBGN-ML file collection{#ourCollection}

Many pathways from the above databases do not have a desirable layout and often have extensive node-node overlaps and node-edge crossings. We therefore refined the layout and removed node-node overlaps. For node-edge crossings, we computed spline edges to resolve this issue and added additional elements in the SBGN-ML file to encode spline edges. The resulting collection of SBGN-ML files is available in a separate [GitHub repository](https://github.com/datapplab/SBGN-ML.files/tree/master/data/SBGN). **SBGNview** can automatically search in this pathway collection and download the SBGN-ML files. Users can further modify the SBGN-ML files using other tools (e.g. [newt editor](http://newteditor.org/)) for desired node layout.
The package used to lay out nodes and route spline edges is currently under development and will be released in the near future.

#### Information about pre-generated SBGN-ML file collection

We can check the information of all pre-generated SBGN-ML files:

```{r , echo = TRUE, message = FALSE, warning = FALSE}
data("pathways.stat")
gse16873 <- gse16873.d[,1:3]
input.pathway.ids = input.pathways$pathway.id
head(pathways.info)
pathways.stat
```

There are two common scenarios of using *SBGNview*:

* Using [our pre-generated SBGN-ML files](https://github.com/datapplab/SBGNhub/tree/master/data/SBGN.with.stamp)
  In this scenario, the SBGN-ML file contents are stored in data **sbgn.xmls** (included in package *SBGNview.data*). SBGNview can automatically retrieve SBGN-ML contents from this dataset by pathway IDs.

* Using SBGN-ML files from other sources.
  In this scenario, more parameters and/or ID mapping files (map between omics data IDs and SBGN-ML file glyph IDs) are needed. Please see the documentation of function *SBGNview* for more details.

#### Search for pathways by keywords {#searchPathways}

SBGNview has several functions to search for pathways by keyword and automatically download SBGN-ML files.

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
pathways <- findPathways(c("bile acid","bile salt"))
head(pathways)
pathways.local.file <- downloadSbgnFile(pathways$pathway.id[1:3])
pathways.local.file
```

By default *findPathways* searches for keywords in pathway names. It can also search by different ID types:

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
pathways <- findPathways(c("tp53","Trp53"),keyword.type = "SYMBOL")
head(pathways)
pathways <- findPathways(c("K04451","K10136"),keyword.type = "KO")
head(pathways)
```

#### Different layout for the same pathway

Researchers may have different tastes for a "good looking" layout. We have created different layouts for each pathway.
Users can download them from [pre-generated SBGN-ML files](https://github.com/datapplab/SBGNhub/tree/master/data/SBGN.with.stamp) and [try them](#tryDifferentLayout).

### User customized SBGN-ML file

We can also create a SBGN-ML file from scratch. Several tools like Newt editor (http://newteditor.org/) let the user draw a pathway diagram and save it as a SBGN-ML file. The tools may also be able to generate a primitive pathway layout, but these layouts often have too many node-node overlaps and edge-node crossings. Therefore, we recommend using our SBGN-ML file collection [mentioned above](#ourCollection), which has been optimized to solve these problems.

## Omics data

SBGNview can visualize a range of omics data, including both gene (or transcript, protein, enzyme) data and compound (or metabolite, chemical, small molecule) data.

### Gene expression data

Gene/protein related data will be mapped to "macromolecule" nodes on a SBGN map.

### Chemical compound data

Chemical compound data will be mapped to "simple chemical" nodes on a SBGN map. Here we simulate a compound dataset.

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
cpd.data <- sim.mol.data(mol.type = "cpd",
                         id.type = "KEGG COMPOUND accession",
                         nmol = 50000,
                         nexp = 2)
head(cpd.data)
```

## Visualize gene data

Most SBGN-ML files from online resources use their own ID types, which often differ from the ID type in the omics data. If the ID types are different, we need to map the omics IDs to the node IDs in the SBGN-ML file. In the [quick start example](#quickStart) ("Adrenaline and noradrenaline biosynthesis" pathway), the SBGN-ML file uses pathwayCommons IDs for gene/protein nodes, whereas the omics dataset uses Entrez gene IDs. The function **SBGNview** can automatically map common ID types such as ENTREZ, UniProt etc.
to nodes in our [pre-generated SBGN-ML files](https://github.com/datapplab/SBGN-ML.files/tree/master/data/SBGN) as shown in the [quick start example](#quickStart).

We can also do it manually using function *changeDataId*, which is called by *SBGNview* to do ID mapping. Supported ID type pairs can be found in *data(mapped.ids)*. *changeDataId* uses [pre-generated mapping tables](https://github.com/datapplab/SBGN-ML.files/tree/master/data/id.mapping) or **pathview** to do the mapping. If the input-output ID type pair is not in *data(mapped.ids)* or can't be mapped by **pathview**, the user needs to provide the mapping table explicitly using the "id.mapping.table" argument.

Let's change the IDs in the gene expression omics data.

```{r , echo = TRUE, message = FALSE, warning = FALSE}
gene.data <- gse16873
head(gene.data[,1:2])
```

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
gene.data <- changeDataId(data.input.id = gene.data,
                          input.type = "entrez",
                          output.type = "pathwayCommons",
                          cpd.or.gene = "gene",
                          sum.method = "sum"
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE}
head(gene.data[,1:2])
```

Now we run *SBGNview*, the main function to overlay omics data on SBGN map.

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
SBGNview.obj <- SBGNview(
    gene.data = gene.data,
    input.sbgn = "P00001",
    output.file = "test_output",
    gene.id.type = "pathwayCommons",
    output.formats = c("svg")
)
SBGNview.obj
```

By default **SBGNview** will generate a .svg file. Other formats can be added through the "output.formats" argument; for example, adding "pdf", "ps" and "png" would create three additional files in the same folder.

```{r figGeneData, echo = FALSE,fig.cap="\\label{fig:figGeneData}Visualization of gene expression data."}
include_graphics("test_output_P00001.svg")
```

## Visualize both gene data and compound data

Here, for demo purposes, we change the KEGG compound IDs to pathwayCommons compound IDs, although *SBGNview* can do this automatically (e.g.
in the [quick start example](#quickStart)).

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
cpd.data <- changeDataId(data.input.id = cpd.data,
                         input.type = "kegg.ligand",
                         output.type = "pathwayCommons",
                         cpd.or.gene = "compound",
                         sum.method = "sum"
)
head(cpd.data)
```

Now we can visualize both gene and compound data. In this example, we use the original gene expression data with "ENTREZ" IDs to show SBGNview's automatic ID mapping ability.

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
SBGNview.obj <- SBGNview(
    gene.data = gse16873,
    cpd.data = cpd.data,
    input.sbgn = "P00001",
    output.file = "test_output.gene.compound",
    gene.id.type = "entrez",
    cpd.id.type = "pathwayCommons",
    output.formats = c("svg")
)
SBGNview.obj
```

```{r figGeneAndCpdData, echo = FALSE,fig.cap="\\label{fig:figGeneAndCpdData}Visualization of both gene expression and compound abundance data."}
include_graphics("test_output.gene.compound_P00001.svg")
```

## About *SBGNview* object {#sbgnviewObj}

**SBGNview** operates in a way similar to **ggplot2**: the main function *SBGNview* returns a *SBGNview* object (similar to the *ggplot* object "p" returned by function *ggplot* in **ggplot2**), which contains all information needed to render the SBGN graph, including the output file path. Printing this object will render the graph and write output image files. A *SBGNview* object can be further modified by several built-in functions to highlight nodes/edges/paths (e.g. *SBGNview.obj*+*highlightNodes()*, similar to *p*+*geom_boxplot()* in **ggplot2**).

* These operations will generate plot files:
    + *SBGNview(...) + highlightNodes(...)* or *SBGNview.obj + highlightNodes(...)*
      How it works: The functions return a *SBGNview* object to the R console. The returned object is evaluated as a top-level R expression, so it is implicitly printed using the *print.SBGNview* function in the **SBGNview** package.
      For more details, please see the documentation of function *print.SBGNview*.
    + run *SBGNview.obj* in R console
      The mechanism is the same as above. The object run in the R console is implicitly printed.
    + *print(SBGNview.obj)*
      In this case the "print.SBGNview" function is run explicitly.
    + *for (i in 1:2) {print(SBGNview.obj)}*
      Same as above: the "print.SBGNview" function is run explicitly.

* These commands will NOT generate plot files:
    + *SBGNview.obj = SBGNview(...)+highlightNodes(...)*
      In this case, the assignment operation "=" makes the returned object invisible, so it is not printed.
    + *for (i in 1:2) {SBGNview.obj}*
      In this case SBGNview.obj is no longer a top-level R expression, so it won't be implicitly printed.

### Structure of *SBGNview* object

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
result.one.sbgn <- SBGNview.obj$data[[1]]
names(result.one.sbgn)
glyphs <- result.one.sbgn$glyphs.list
arcs <- result.one.sbgn$arcs.list
str(glyphs[[1]])
str(arcs[[1]])
```

### Change output file in a *SBGNview* object

We can change the output file using built-in function *outputFile*:

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
outputFile(SBGNview.obj)
outputFile(SBGNview.obj) <- "test.change.output.file"
outputFile(SBGNview.obj)
SBGNview.obj
outputFile(SBGNview.obj) <- "test.print"
outputFile(SBGNview.obj)
print(SBGNview.obj)
```

# Try different layout for the same pathway.{#tryDifferentLayout}

If the default layout is not ideal, users have two options:

* Modify the layout manually using tools like [newt editor](http://newteditor.org/)
* Download pre-generated SBGN-ML files with different layout.
  https://github.com/datapplab/SBGN-ML.files/tree/master/data/SBGN

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
download.file("https://raw.githubusercontent.com/datapplab/SBGNhub/master/data/SBGN.with.stamp/pathwayCommons/http___identifiers.org_panther.pathway_P00001.1.sbgn",destfile = "P00001.new.layout.sbgn")
SBGNview(
    gene.data = gse16873,
    gene.id.type = "entrez",
    input.sbgn = "P00001.new.layout.sbgn",
    sbgn.gene.id.type = "pathwayCommons",
    output.file = "test.different.layout",
    output.formats = c("svg")
)
```

```{r differentLayout, echo = FALSE,fig.cap="\\label{fig:differentLayoutFig}Graph with different layout."}
include_graphics("test.different.layout_P00001.new.layout.sbgn.svg")
```

# Modify graph elements

It is useful to highlight interesting nodes, edges or paths in a pathway map. This can be done by modifying the *SBGNview* object, which contains all information needed to render a SBGN map.

## Built-in functions

Like ggplot2, the *SBGNview* object can be further modified by concatenating it with modification functions using binary operator *+* (see [quick start](#quickStart) for example).
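
The *+* and printing behavior described above follows a general S3 pattern in R, which can be sketched with a toy class. Everything named here (`toyGraph`, `toyModifier`, `new_toy_graph`, `highlight_toy_nodes`) is hypothetical and only mimics the pattern; it is not how **SBGNview** is implemented internally.

```r
# Toy constructor (hypothetical): an object that remembers its output file
new_toy_graph <- function(output.file) {
  structure(list(output.file = output.file, highlighted = character(0)),
            class = "toyGraph")
}

# A modifier returns a small tagged object, much like a ggplot2 layer
highlight_toy_nodes <- function(node.set) {
  structure(list(node.set = node.set), class = "toyModifier")
}

# "+" is dispatched on the class of the left operand and returns a new object
"+.toyGraph" <- function(e1, e2) {
  stopifnot(inherits(e2, "toyModifier"))
  e1$highlighted <- union(e1$highlighted, e2$node.set)
  e1
}

# The side effect (here only a message, standing in for writing image files)
# lives in the print method, so it happens only when the object is printed
print.toyGraph <- function(x, ...) {
  cat("rendering", x$output.file, "with highlighted nodes:",
      paste(x$highlighted, collapse = ", "), "\n")
  invisible(x)
}

# Assignment: the object is built but nothing is printed or rendered
toy.obj <- new_toy_graph("toy.svg") +
  highlight_toy_nodes(c("tyrosine", "dopamine"))

# Explicit print: the print method runs
print(toy.obj)
```

Keeping the rendering in the print method is what lets modifiers be chained freely: each *+* only updates the stored description, and the output is produced once, when the final object is printed.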
## Highlight nodes

### Highlight all nodes

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
highlight.all.nodes.sbgn.obj <- SBGNview.obj + highlightNodes(
    # Here we set argument "node.set" to select all nodes
    node.set = "all",
    stroke.width = 4,
    stroke.color = "green")
outputFile(highlight.all.nodes.sbgn.obj) = "highlight.all.nodes"
print(highlight.all.nodes.sbgn.obj)
```

```{r highlightAllNodes, echo = FALSE,fig.cap="\\label{fig:highlightAllNodes}Highlight all nodes."}
include_graphics("highlight.all.nodes_P00001.svg")
```

### Highlight nodes by class

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
highlight.macromolecule.sbgn.obj <- SBGNview.obj + highlightNodes(
    # Here we set argument "select.glyph.class" to select macromolecule nodes
    select.glyph.class = "macromolecule",
    stroke.width = 4,
    stroke.color = "green")
outputFile(highlight.macromolecule.sbgn.obj) = "highlight.macromolecule"
print(highlight.macromolecule.sbgn.obj)
```

```{r highlightMacromolecule, echo = FALSE,fig.cap="\\label{fig:highlightMacromolecule}Highlight macromolecule nodes."}
include_graphics("highlight.macromolecule_P00001.svg")
```

### Show node IDs instead of node labels

```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
highlight.all.nodes.sbgn.obj <- SBGNview.obj + highlightNodes(
    node.set = "(+-)-epinephrine",
    stroke.width = 4,
    stroke.color = "green",
    # Here we set argument "show.glyph.id" to display node ID instead of the original label.
    show.glyph.id = TRUE,
    label.font.size = 10)
outputFile(highlight.all.nodes.sbgn.obj) = "highlight.all.id.nodes"
print(highlight.all.nodes.sbgn.obj)
```

```{r highlightNodes, echo = FALSE,fig.cap="\\label{fig:highlightNodes}Highlight nodes using node IDs."}
include_graphics("highlight.all.id.nodes_P00001.svg")
```

## Adjust node labels.

### Label position, font size, color, change labels

The function *highlightNodes* can also be used to adjust labels.
In this example, we move the labels horizontally and vertically, and change their color and font size.

```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
my.labels <- c("Tyr","epinephrine")
names(my.labels) <- c("tyrosine", "(+-)-epinephrine")
SBGNview.obj.adjust.label <- SBGNview.obj + highlightNodes(
    node.set = c("tyrosine", "(+-)-epinephrine"),
    stroke.width = 4,
    stroke.color = "green",
    label.x.shift = 0,
    # Labels are moved up a little bit
    label.y.shift = -20,
    label.color = "red",
    label.font.size = 30,
    label.spliting.string = "",
    # Node labels can be customized by a named vector. The names of the vector are the IDs assigned to argument "node.set". Values of the vector are the new labels for display.
    labels = my.labels)
outputFile(SBGNview.obj.adjust.label) <- "adjust.label"
print(SBGNview.obj.adjust.label)
```

```{r changeLabel, echo = FALSE,fig.cap="\\label{fig:changeLabel}Modify node labels."}
include_graphics("adjust.label_P00001.svg")
```

### Label text wrapping into multiple lines

Some nodes may have long labels that overlap with surrounding graph elements. In this case we can set the parameter *label.spliting.string* to "any" so the label will be wrapped into multiple lines that fit the width of the node.
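
As a rough picture of what "wrapped into lines that fit the node width" can mean, here is a base R sketch. It is not SBGNview's wrapping code: the helper `wrap_label`, the `node.width` units and the `char.width.ratio` of 0.6 are assumptions chosen for illustration, and it reads "any" as "break after any character".

```r
# Hypothetical helper (not SBGNview code): break a label into pieces
# short enough to fit a node of the given width.
wrap_label <- function(label, node.width, font.size = 10,
                       char.width.ratio = 0.6) {
  if (nchar(label) == 0) return(label)
  # Rough number of characters per line; char.width.ratio is an assumed
  # average character width relative to the font size
  chars.per.line <- max(1, floor(node.width / (font.size * char.width.ratio)))
  # Breaking at "any" position: cut the string into fixed-size pieces
  starts <- seq(1, nchar(label), by = chars.per.line)
  substring(label, starts, starts + chars.per.line - 1)
}

label.lines <- wrap_label("SmallMolecule_96737c854fd379b17cb3b7715570b733",
                          node.width = 90)
label.lines
```

Long glyph IDs like the one above contain no spaces, which is why a splitter that only breaks at a given separator string would leave them on a single line.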
```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
SBGNview.obj.change.label.wrapping <- SBGNview.obj + highlightNodes(
    node.set = c("tyrosine", "(+-)-epinephrine"),
    stroke.width = 4,
    stroke.color = "green",
    show.glyph.id = TRUE,
    label.x.shift = 10,label.y.shift = 20,label.color = "red",
    label.font.size = 10,label.spliting.string = "any")
outputFile(SBGNview.obj.change.label.wrapping) = "change.label.wrapping"
print(SBGNview.obj.change.label.wrapping)
```

```{r changeWrapping, echo = FALSE,fig.cap="\\label{fig:changeWrapping}Change how labels are wrapped."}
include_graphics("change.label.wrapping_P00001.svg")
```

## When one input ID maps to multiple nodes {#findNode}

In the example above, we saw that one input ID (e.g. "(+-)-epinephrine") can be mapped to multiple nodes in the graph. If we want to focus on a few particular ones, we can use function *highlightNodes* to display the node IDs, which are unique to each node:

```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
test.show.glyph.id <- SBGNview.obj+
    highlightNodes(
        node.set = c("tyrosine", "(+-)-epinephrine"),
        stroke.width = 4,
        stroke.color = "green",
        show.glyph.id = TRUE,
        label.x.shift = 10,label.y.shift = 20,label.color = "red",
        label.font.size = 10,
        # When "label.spliting.string" is set to a string that is not in the label (including an empty string ""), the label will not be wrapped into multiple lines.
        label.spliting.string = "")
outputFile(test.show.glyph.id) <- "test.show.glyph.id"
print(test.show.glyph.id)
```

```{r displayNodeIds, echo = FALSE,fig.cap="\\label{fig:displayNodeIds}Show node IDs of mapped nodes."}
include_graphics("test.show.glyph.id_P00001.svg")
```

We can find the mapping between input IDs and node IDs:

```{r, echo = TRUE, results = 'hide', eval = TRUE, message = FALSE, warning = FALSE}
mapping <- changeIds(input.ids = c("tyrosine", "(+-)-epinephrine"),
                     input.type = "CompoundName",
                     output.type = "pathwayCommons",
                     cpd.or.gene = "compound",
                     limit.to.pathways = input.pathway.ids[1]
)
```

```{r, echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE}
mapping
```

We can pick two nodes to highlight and find a shortest path between them.

```{r, echo =TRUE}
outputFile(SBGNview.obj) <- "highlight.by.node.id"
```

```{r, echo = TRUE, eval = TRUE, results = 'hide', message = FALSE, warning = FALSE}
SBGNview.obj+
    highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                   stroke.width = 4,
                   stroke.color = "red") +
    highlightPath(from.node = "SmallMolecule_96737c854fd379b17cb3b7715570b733",
                  to.node = "SmallMolecule_7753c3822ee83d806156d21648c931e6",
                  node.set.id.type = "pathwayCommons",
                  from.node.color = "green",
                  to.node.color = "blue",
                  shortest.paths.cols = c("purple"),
                  input.node.stroke.width = 6,
                  path.node.stroke.width = 3,
                  path.node.color = "purple",
                  path.stroke.width = 5,
                  tip.size = 10)
```

```{r highlightNodesById, echo = FALSE,fig.cap="\\label{fig:highlightNodesById}Highlight nodes and shortest path using node IDs."}
include_graphics("highlight.by.node.id_P00001.svg")
```

## Modify *SBGNview* object directly

More graph features can be controlled by directly modifying the *SBGNview* object.
```{r , echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
result.one.sbgn <- SBGNview.obj$data[[1]]
names(result.one.sbgn)
glyphs <- result.one.sbgn$glyphs.list
arcs <- result.one.sbgn$arcs.list
str(glyphs[[1]])
str(arcs[[1]])
```

# Retrieve pathway related information{#extractInformation}

## Extract node information

Node information can be extracted using function *sbgnNodes*.

```{r , echo = TRUE, message = FALSE,results='hide', warning=FALSE}
node.info <- sbgnNodes(input.sbgn = c("P00001","P00002"),
                       output.gene.id.type = "SYMBOL",
                       output.cpd.id.type = "chebi",
                       species = "hsa"
)
```

The returned list contains information about all nodes in the SBGN-ML file.

```{r, echo = TRUE}
head(node.info[[1]])
```

For example, the complex membership information can be retrieved by accessing the "complex" element. Macromolecules are represented by gene symbols. Simple chemicals are represented by ChEBI IDs (e.g. 33568). When multiple IDs of the output type match the same node in the SBGN-ML file, the target IDs are concatenated by "; ". In the following example, the complex with ID "Complex_4e65cdd554d14679587b7822e6426705" has two members: 1. a protein (symbol Slc18A2 etc.) and 2. a simple chemical (ChEBI 33568)

```{r, echo = TRUE}
```

# ID mapping

SBGNview can automatically map common ID types to SBGN-ML glyphs in our [pre-generated SBGN-ML files](https://github.com/datapplab/SBGN-ML.files/tree/master/data/SBGN).
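
To make the mapping step concrete, the base R sketch below maps a small data matrix onto glyph IDs through a mapping table and sums the rows that land on the same glyph. The data, the table `id.map` and its column names are invented for this example. Summation is assumed here because the *changeDataId* calls in this vignette pass `sum.method = "sum"`; the function's actual internals may differ.

```r
# Toy omics data: three input genes, two samples
expr <- matrix(c(1.0, 2.0, 0.5,
                 0.25, 0.5, 1.5),
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("sample1", "sample2")))

# Hypothetical mapping table: geneA and geneB map to the same glyph
id.map <- data.frame(input.id = c("geneA", "geneB", "geneC"),
                     glyph.id = c("glyph1", "glyph1", "glyph2"),
                     stringsAsFactors = FALSE)

# Keep only mapping rows whose input ID is present in the data
mapped <- id.map[id.map$input.id %in% rownames(expr), ]

# Re-index the data by glyph ID, summing rows that share a glyph
glyph.data <- rowsum(expr[mapped$input.id, , drop = FALSE],
                     group = mapped$glyph.id)
glyph.data
```

Other aggregation choices (for example the mean or the maximum) would fit the same structure by replacing the `rowsum` step.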
Supported ID types can be accessed as follows:

```{r , echo = TRUE, message = FALSE, warning = FALSE}
data("mapped.ids")
```

## Map between two types of IDs

Besides *changeDataId*, which changes IDs for omics data, SBGNview provides functions to map between different types of IDs:

```{r , echo = TRUE, results = 'hide',message = FALSE, warning = FALSE }
mapping <- changeIds(
    input.ids = c("tyrosine", "(+-)-epinephrine"),
    input.type = "CompoundName",
    output.type = "pathwayCommons",
    cpd.or.gene = "compound",
    limit.to.pathways = "P00001"
)
```

```{r , echo = TRUE,message = FALSE, warning = FALSE }
head(mapping)
```

```{r , echo = TRUE, results = 'hide',message = FALSE, warning = FALSE }
mapping <- changeIds(
    input.ids = c("tyrosine", "(+-)-epinephrine"),
    input.type = "CompoundName",
    output.type = "chebi",
    cpd.or.gene = "compound"
)
```

```{r , echo = TRUE,message = FALSE, warning = FALSE }
head(mapping)
```

## Re-use downloaded ID mapping tables

SBGNview has generated pairwise ID mapping tables (between various gene/compound ID types and pathway glyph IDs and pathway IDs) for the pre-collected SBGN-ML files. SBGNview automatically downloads these mapping tables into the folder specified by parameter "SBGNview.data.folder" if the file is not already in that folder. Users can therefore keep the downloaded files and specify "SBGNview.data.folder" to re-use them. The default SBGNview.data.folder is "SBGNview.tmp.data" in the working directory. In the following example, we set "SBGNview.data.folder" to "./SBGNview.tmp.data" so SBGNview doesn't need to download the ID mapping table again.
```{r , echo = TRUE, results = 'hide',message = FALSE, warning = FALSE }
mapping <- changeIds(
    input.ids = c("tyrosine"),
    input.type = "CompoundName",
    output.type = "chebi",
    cpd.or.gene = "compound",
    SBGNview.data.folder = "./SBGNview.tmp.data"
)
```

```{r , echo = TRUE,message = FALSE, warning = FALSE }
head(mapping)
```

## Extract molecule list from pathways {#extractList}

```{r , echo = TRUE, results='hide', message = FALSE, warning = FALSE }
mol.list <- getMolList(
    database = "metacrop"
    ,mol.list.ID.type = "ENZYME"
    ,org = "ath"
    ,output.pathway.name = FALSE
    ,truncate.name.length = 50
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE }
mol.list[[1]]
```

```{r , echo = TRUE, results='hide', message = FALSE, warning = FALSE }
mol.list <- getMolList(
    database = "pathwayCommons",
    mol.list.ID.type = "ENTREZID",
    org = "hsa"
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE }
mol.list[[1]]
```

```{r , echo = TRUE, results='hide', message = FALSE, warning = FALSE }
mol.list <- getMolList(
    database = "pathwayCommons",
    mol.list.ID.type = "ENTREZID",
    org = "mmu"
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE }
mol.list[[2]]
```

```{r , echo = TRUE, results='hide', message = FALSE, warning = FALSE }
mol.list <- getMolList(
    database = "MetaCyc",
    mol.list.ID.type = "KO",
    org = "eco"
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE }
mol.list[[2]]
```

```{r , echo = TRUE, results='hide', message = FALSE, warning = FALSE }
mol.list <- getMolList(
    database = "pathwayCommons",
    mol.list.ID.type = "chebi",
    cpd.or.gene = "compound"
)
```

```{r , echo = TRUE, message = FALSE, warning = FALSE }
mol.list[[2]]
```

# Example using selected database

## Use Reactome pathway database

```{r , echo = TRUE,results = 'hide', message = FALSE, warning = FALSE}
is.reactome <- pathways.info[,"sub.database"]== "reactome"
reactome.ids <- pathways.info[is.reactome ,"pathway.id"]
SBGNview.obj <- SBGNview(
    gene.data = gse16873,
    gene.id.type = "entrez",
    input.sbgn = reactome.ids[1:2],
    output.file = "demo.reactome",
    output.formats = c("svg")
)
SBGNview.obj
```

## Use MetaCrop pathway database

```{r , echo = TRUE,results = 'hide', message = FALSE, warning = FALSE}
is.metacrop <- pathways.info[,"sub.database"]== "MetaCrop"
metacrop.ids <- pathways.info[is.metacrop ,"pathway.id"]
SBGNview.obj <- SBGNview(
    gene.data = c(),
    input.sbgn = metacrop.ids[1:2],
    output.file = "demo.metacrop",
    output.formats = c("svg")
)
SBGNview.obj
```

# Test SBGN reference cards

```{r , echo = TRUE,results = 'hide', message = FALSE, warning = FALSE}
downloadSbgnFile(c("AF_Reference_Card.sbgn"
                   ,"PD_Reference_Card.sbgn"
                   ,"ER_Reference_Card.sbgn"
))
SBGNview.obj <- SBGNview(
    gene.data = c()
    ,input.sbgn = c("AF_Reference_Card.sbgn"
                    ,"PD_Reference_Card.sbgn"
                    ,"ER_Reference_Card.sbgn"
    )
    ,sbgn.gene.id.type ="glyph"
    ,output.file = "./test.refcards"
    ,output.formats = c("pdf")
    ,font.size = 1
    ,logic.node.font.scale = 10
    ,status.node.font.scale = 10
)
SBGNview.obj
```

# FAQs

## Color key

### Turn off color key

```{r , eval = FALSE, echo = TRUE,results = 'hide', message = FALSE, warning = FALSE}
# Not run!
SBGNview(
    key.pos = "none"
)
```

# References

# Session Info

```{r}
sessionInfo()
```