%\VignetteIndexEntry{cisPath}
%\VignetteKeywords{Visualizing and managing models of protein-protein interaction (PPI) networks}
%\VignettePackage{cisPath}
\documentclass[11pt]{article}
\usepackage{Sweave}
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage[authoryear,round]{natbib}

\setlength{\textheight}{8.5in}
\setlength{\textwidth}{6in}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.25in}
\setlength{\evensidemargin}{0.25in}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

\begin{document}
\title{\bf How to use the cisPath Package}
\author{Likun Wang}
\maketitle
\noindent
Institute of Systems Biomedicine, Peking University Health Science Center.\\
\noindent
\begin{center}
{\tt wanglk@hsc.pku.edu.cn}
\end{center}
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
This package is used to manage, visualize, and edit protein--protein interaction (PPI) networks. Results can be viewed and managed with different browsers across different platforms. The network is displayed as a {\tt force-directed graph} (\url{http://bl.ocks.org/4062045}) using the JavaScript library {\tt D3} (\url{www.d3js.org}). Both mouse and touch screen users can view and edit the network graph easily. The HTML file follows the {\tt HTML 4.01 Strict} and {\tt CSS version 3} standards to maintain consistency across different browsers. Chrome, Firefox, Safari, and IE9 will all properly display the PPI view. Please contact us if these paths do not display correctly.

\section{Data}
As an example, we have generated PPI data for several species from the PINA database \citep{Cowley12, Wu09}, the iRefIndex database \citep{Razick08, Aranda11}, and the STRING database \citep{Szklarczyk11, Franceschini13}. Users can download these files from \url{http://www.isb.pku.edu.cn/cisPath/}. If you make use of these files, please cite PINA, iRefIndex, or STRING accordingly.
Users can edit the PPI data downloaded from these databases, or combine them with their private data to construct more comprehensive PPI networks. In this introduction, we select only a small portion of the available PPI data as an example. An ID mapping file, generated from the UniProt database \citep{UniProt12}, is also provided in this package. The following examples (section 4 below) will show how these files may be used.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Getting started}
To load the \Rpackage{cisPath} package, type {\tt library(cisPath)}. Six methods are included in this package to format and combine the files downloaded from the PINA, iRefIndex, and STRING databases. These methods are {\tt getMappingFile}, {\tt formatSTRINGPPI}, {\tt formatSIFfile}, {\tt formatPINAPPI}, {\tt formatiRefIndex}, and {\tt combinePPI}. Three other methods are also provided to manage, visualize, and edit PPI networks. These methods are {\tt cisPath}, {\tt networkView}, and {\tt easyEditor}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Format PPI data}
\subsection{Format PPI data downloaded from the STRING database}
The method {\tt formatSTRINGPPI} is used to format the PPI file downloaded from the STRING database. Before formatting the PPI data, users should generate the identifier mapping file with the method {\tt getMappingFile}. The input file for {\tt getMappingFile} is downloaded from the UniProt (\url{http://www.uniprot.org/}) database. The URLs for different species are listed in the help page, which can be opened by typing {\tt $?$getMappingFile}. The output file contains the identifier mapping information required by the method {\tt formatSTRINGPPI}. Each line contains both the Ensembl Genomes Protein identifier and the Swiss-Prot accession number for a given protein.
\begin{center}
<<>>=
library(cisPath)
# Example UniProt/Swiss-Prot file shipped with the package
sprotFile <- system.file("extdata", "uniprot_sprot_human10.dat",
                         package="cisPath")
mappingFile <- file.path(tempdir(), "mappingFile.txt")
getMappingFile(sprotFile, output=mappingFile)
@
\end{center}

The input file for {\tt formatSTRINGPPI} is downloaded from the STRING database (\url{http://string-db.org/}). The URL for this file is \url{http://string-db.org/newstring\_download/protein.links.v9.1/9606.protein.links.v9.1.txt.gz}.
\begin{center}
<<>>=
STRINGPPI <- file.path(tempdir(), "STRINGPPI.txt")
fileFromSTRING <- system.file("extdata", "protein.links.txt",
                              package="cisPath")
formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output=STRINGPPI, 700)
@
\end{center}

Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins. An edge value is estimated for each link between two interacting proteins. This value is defined as {\tt max(1,log(1000-STRING\_SCORE,100))}. It may be treated as the ``cost'' of the edge when determining the shortest paths between two proteins. Advanced users can edit the file and change the value of each edge.

\subsection{Format PPI data downloaded from the PINA2 database (SIF format)}
The input file is downloaded from the PINA2 database (\url{http://cbg.garvan.unsw.edu.au/pina/}). Access \url{http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do} to download PPI files in the {\tt SIF format} for different species.
\begin{center}
<<>>=
input <- system.file("extdata", "Homo_sapiens_PINA2.sif", package="cisPath")
PINA2PPI <- file.path(tempdir(), "PINA2PPI.txt")
formatSIFfile(input, mappingFile, output=PINA2PPI)
@
\end{center}

Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins. The edge value for each link between two interacting proteins is assigned as {\tt 1}. This may be treated as the ``cost'' of the edge when determining the shortest paths between two proteins.
Advanced users can edit the file and change the edge values for each edge.

\subsection{Format PPI data downloaded from the PINA database (MITAB format)}
The input file is downloaded from the PINA database (\url{http://cbg.garvan.unsw.edu.au/pina/}). Access \url{http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do} to download PPI files with the {\tt MITAB format} for different species.
\begin{center}
<<>>=
input <- system.file("extdata", "Homo_sapiens_PINA100.txt", package="cisPath")
PINAPPI <- file.path(tempdir(), "PINAPPI.txt")
formatPINAPPI(input, output=PINAPPI)
@
\end{center}

Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins. The edge value for each link between two interacting proteins is assigned as {\tt 1}. This may be treated as the ``cost'' of determining the shortest paths between two proteins. Advanced users can edit the file and change the edge values for each edge.

\subsection{Format PPI data downloaded from the iRefIndex database}
The input file is downloaded from the iRefIndex database (\url{http://www.irefindex.org/wiki/}). Access \url{http://irefindex.org/download/irefindex/data/archive/release_13.0/psi_mitab/MITAB2.6/} to download PPI files with the {\tt MITAB2.6 format} for different species.
\begin{center}
<<>>=
input <- system.file("extdata", "9606.mitab.100.txt", package="cisPath")
iRefIndex <- file.path(tempdir(), "iRefIndex.txt")
formatiRefIndex(input, output=iRefIndex)
@
\end{center}

Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins. The edge value for each link between two interacting proteins is assigned as {\tt 1}. This may be treated as the ``cost'' of determining the shortest paths between two proteins. Advanced users can edit the file and change the edge values for each edge.
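
\noindent The edge values produced by the formatting methods above can be compared with a short worked example. Recall that the STRING-derived edge value is {\tt max(1,log(1000-STRING\_SCORE,100))}. Assuming that {\tt log(x,100)} denotes the base-100 logarithm, as in R's {\tt log(x, base)}, and writing $s$ for the STRING score and $c(s)$ for the edge value (both symbols are shorthand introduced here, and the scores are chosen purely for illustration), we obtain
\begin{align*}
c(s)   &= \max\bigl(1,\ \log_{100}(1000 - s)\bigr),\\
c(0)   &= \max(1,\ \log_{100} 1000) = \max(1,\ 1.5) = 1.5,\\
c(700) &= \max(1,\ \log_{100} 300) \approx \max(1,\ 1.24) = 1.24,\\
c(900) &= \max(1,\ \log_{100} 100) = \max(1,\ 1) = 1,\\
c(950) &= \max(1,\ \log_{100} 50) \approx \max(1,\ 0.85) = 1.
\end{align*}
Under this reading, a higher STRING score yields a lower edge value, every score of 900 or more receives the minimum value of {\tt 1}, and no STRING edge exceeds {\tt 1.5}. Edges formatted from the PINA or iRefIndex files always carry the value {\tt 1}, so they are never more costly than a STRING edge.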
\subsection{Combine PPI data from different databases}
The method {\tt combinePPI} is used to combine the PPI files that were generated from different databases.
\begin{center}
<<>>=
output <- file.path(tempdir(), "allPPI.txt")
combinePPI(c(PINA2PPI, iRefIndex, STRINGPPI), output)
@
\end{center}

We have processed the PPI data from the PINA, iRefIndex, and STRING databases. Users can access these files from our website \url{http://www.isb.pku.edu.cn/cispath/}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Examples}
\subsection{Example of the networkView method}
This method is used to visualize the input proteins in the PPI network and display the evidence that supports the specific interactions among these proteins. If the PPI information downloaded from PINA and/or STRING is used, the PubMed IDs for the corresponding publications and/or the STRING scores will also be shown. If a PubMed ID is provided, users can access the publication directly using the link. Occasionally, two given proteins cannot interact directly, but their interaction can be mediated by scaffold proteins. This tool therefore also displays other proteins that can interact with at least two of the main proteins. In addition, users can choose the visual style (color and node size) in which the proteins are displayed in the network.
\begin{center}
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "networkView")
networkView(infoFile, c("MAGI1","TP53BP2","TP53", "PTEN"), outputDir,
            FALSE, c(1,1,1,0), displayMore=TRUE)
@
\end{center}

\begin{figure}
\centering
\includegraphics{networkView.jpg}
\caption{networkView result}
\label{fig1}
\end{figure}

Figure \ref{fig1} is a screenshot of the HTML file, which will open automatically. In this example, the selected proteins are TP53, TP53BP2, MAGI1, and PTEN. The first three proteins are assigned as the main nodes and PTEN is treated as a leaf node.
The three main nodes are displayed as bigger circles filled with default colors. The links between these main nodes are highlighted with bold lines. The leaf node (PTEN) is displayed as a smaller circle in another default color. All node colors can be customized with the parameter {\tt nodeColors} of this method. Other proteins that can interact with at least two of the given main proteins are also presented as leaf nodes. The color of these leaf nodes can be customized with the parameter {\tt leafColor}. The evidence that supports the functional interactions between the main proteins is shown in the table below the graph.
\begin{center}
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "networkView2")
inputFile <- system.file("extdata", "networkView.txt", package="cisPath")
# Read the proteins, node sizes, and node colors from a comma-separated file
rt <- read.table(inputFile, sep=",", comment.char="", header=TRUE)
proteins <- as.vector(rt[,1])
sizes <- as.vector(rt[,2])
colors <- as.vector(rt[,3])
networkView(infoFile, proteins, outputDir, FALSE, sizes, colors, FALSE)
@
\end{center}

\noindent
\subsection{Example of the cisPath method}
This method is used to identify and visualize the shortest functional paths between two given proteins in the PPI network. The input for this function is a file containing PPI information with an edge cost for each pair of interacting proteins. To identify the paths with the fewest steps rather than the minimal total cost, the parameter {\tt byStep} should be set to {\tt TRUE}. In this case, every edge cost is set to {\tt 1}. Setting this parameter to {\tt TRUE} allows users to view more of the possible paths between two proteins. Please see the file {\tt PPI\_Info.txt} in this package for the input file format. Taking the protein {\tt TP53} as an example, entering the following code in R outputs all the shortest paths to the target proteins {\tt MAGI1} and {\tt GH1}.
\begin{center}
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "TP53_example1")
results <- cisPath(infoFile, outputDir, "TP53", c("MAGI1", "GH1"), byStep=TRUE)
@
\end{center}

\begin{figure}
\centering
\includegraphics{cisPathOutput.jpg}
\caption{cisPath result}
\label{fig2}
\end{figure}

The HTML file will open automatically. Figure \ref{fig2} is a screenshot of this file. Users can query the path to a target protein by entering its gene name or Swiss-Prot accession number. All of the shortest paths are listed in the table at the bottom. Upon input, a randomly selected shortest path is shown graphically and highlighted by a bold black line. Other proteins that can interact with at least two proteins on the shortest path are also displayed as leaf nodes. The colors of the main nodes and leaf nodes in this graph can be customized with the parameters {\tt nodeColors} and {\tt leafColor}, respectively, of the method {\tt cisPath}. Users can change the view of this graph by dragging a node. When the {\tt Release graph} button is clicked, the nodes will find their appropriate positions automatically. The evidence for the interactions between the proteins along the shortest path is shown in the table below the graph. By clicking the {\tt View Graph} link in the table, users can select the path to be shown graphically.

To avoid entering invalid target protein names, the unique Swiss-Prot accession numbers may be used as input instead. The Swiss-Prot accession numbers can be looked up in the UniProt (\url{http://www.uniprot.org/}) database.
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "TP53_example2")
results <- cisPath(infoFile, outputDir, "TP53",
                   targetProteins=c("Q96QZ7", "P01241"), swissProtID=TRUE)
results
@

This method may be run without specifying target proteins.
If target proteins are not provided, this method will identify the shortest paths from the source protein to all the other relevant proteins. Users can then query the results with a browser whenever they find an interesting ``target'' protein, without launching R.
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "TP53_example3")
results <- cisPath(infoFile, outputDir, "TP53")
results
@

This method may also be run without specifying a source protein. If the source protein is not provided, this method will open an HTML web page. Users can then query the results with a browser whenever they find an interesting ``source'' protein, without launching R.
<<>>=
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(tempdir(), "cisPathResult")
results <- cisPath(infoFile, outputDir)
@

\noindent
\subsection{Example of downloading PPI data from our website}
The following code first downloads the PPI data that was generated for this example, and then identifies the shortest paths from {\tt TP53} to all other proteins. With combined PPI data from PINA, iRefIndex, and STRING, the output for a single source protein is about {\tt 1.5G} in size. Although the output in such cases is large, we strongly suggest that users try running the method with a source protein but no target protein.
\begin{center}
<<>>=
outputDir <- "/home/user/TP53"
# Create the output directory
dir.create(outputDir, showWarnings = FALSE, recursive = TRUE)
# infoFile: path where the PPI data file will be saved.
infoFile <- file.path(outputDir, "Homo_sapiens_PPI.txt")
# Download PPI data
url <- "http://www.isb.pku.edu.cn/cispath/data/Homo_sapiens_PPI.txt"
download.file(url, infoFile)
# Shortest paths (by number of steps) from TP53 to all other proteins
results <- cisPath(infoFile, outputDir, "TP53", byStep=TRUE)
# Run without a source protein, writing to a second output directory
outputDir <- "/home/user/cisPathWeb"
results <- cisPath(infoFile, outputDir)
@
\end{center}

\noindent
\subsection{Example of the easyEditor method}
The output HTML file of this method is an editor for network graphs. Users can draw and edit their own networks with this editor.
\begin{center}
<<>>=
outputDir <- file.path(tempdir(), "easyEditor")
easyEditor(outputDir)
@
\end{center}

\noindent
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\newpage
\bibliographystyle{apalike}
\bibliography{cisPath}
\end{document}