\title{Synthetic Genetic Interaction in Yeast genes}
\author{N. LeMeur and Z. Jiang}

\begin{document}
\maketitle

\section{Introduction}

Synthetic genetic interaction experiments are now being conducted to better understand cellular interactions. The generated data have already proven to be extremely valuable \citep{Davierwala2005, Tong2004, Zhao2005}. Synthetic lethality, in particular, is a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development, as synthetic lethal pairs could be used to selectively kill cancer cells while leaving normal cells relatively unharmed.

In this package, we propose statistical and computational tools for a systems biology approach to analyzing synthetic genetic interactions. Currently, our methods can be used to find relationships between synthetic genetic interactions and cellular organizational units such as multi-protein complexes or sequence motifs.

\section{Synthetic genetic interaction data}

Several synthetic genetic datasets are now publicly available. This package currently provides six datasets:
\begin{itemize}
\item \cite{Tong2004} Systematic genetic analysis with ordered arrays of yeast deletion mutants.
\item \cite{Pan2006} DNA integrity experiment in \textit{S. cerevisiae}.
\item \cite{measday2005sys} Systematic yeast synthetic lethal and synthetic dosage lethal screens identify genes required for chromosome segregation.
\item \cite{schuldiner2005efa} Genetic Interaction Data (EMAP) from the yeast early secretory pathway.
\item \cite{Collins2007} Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map.
\item \cite{Chervitz} Synthetic genetic interaction data as recorded by the \textit{Saccharomyces} Genome Database in January, 2007.
\end{itemize}

In this package, and as reported by most authors, we use the term \textit{query genes} for the genes that are specifically tested by the experimenter and \textit{array genes} for the target genes, usually spotted on an array (\textit{e.g.,} SGA, dSLAM). We note, however, that an analogy can be made with the \textit{bait} and \textit{prey} terms used in proteomic experiments (\textit{e.g.,} Y2H, APMS).

\subsection{Synthetic genetic array data, Tong et al. (2004)}

\cite{Tong2001} used \textit{Synthetic Genetic Array} (SGA) technology to investigate synthetic genetic interactions in \textit{S. cerevisiae}. The package \Rpackage{SLGI} contains both the raw and preprocessed data from \cite{Tong2004}. To access those data you first need to load the package \Rpackage{SLGI} and the yeast genome annotation package (\Rpackage{org.Sc.sgd.db}):

<<>>=
library("SLGI")
library("org.Sc.sgd.db")
## loading Tong et al. data
data(SGA)
data(Atong)
@

The \Robject{SGA} object contains the systematic names of all the \Sexpr{length(SGA)} genes tested by \cite{Tong2004}, including both the ones that were reported as presenting synthetic genetic interactions and the ones that were not (\Robject{SGAraw} corresponds to the original list parsed from Table 1 of the supplementary material of \cite{Tong2001}).

We can verify that the genes reported by \cite{Tong2004} are well characterized. To that end, we use the yeast annotation data package \Rpackage{org.Sc.sgd.db}:

<<>>=
rejected <- length(intersect(SGA, org.Sc.sgdREJECTORF))
@

We note that at this time \Sexpr{rejected} genes (out of the \Sexpr{length(SGA)}) are among the rejected ORFs listed by the \textit{Saccharomyces} Genome Database (SGD, \href{http://www.yeastgenome.org/}{http://www.yeastgenome.org/}).
To map common gene names or aliases to systematic names, one can use the following:

<<>>=
updateSGA <- mget(SGA, org.Sc.sgdCOMMON2ORF, ifnotfound = NA)
@

The \Robject{tong2004raw} data.frame contains the original data reported by \cite{Tong2004} as Table S1 in their online supporting material. The \Robject{Atong} data contains the association matrix extracted from the \Robject{tong2004raw} data.frame. The gene names were updated to systematic gene names. \cite{Tong2004} selected \Sexpr{dim(Atong)[[1]]} query genes that are known to be involved in a chosen set of molecular functions.

<<>>=
data(essglist)
esg <- names(essglist)
n1 <- sum(esg %in% dimnames(Atong)[[1]])
n2 <- sum(esg %in% dimnames(Atong)[[2]])
@

There are \Sexpr{n1} essential genes among the query genes. \cite{Tong2004} pointed out that some of the query genes are partially functioning alleles of essential genes, so we assume that these genes are valid query genes. There are also \Sexpr{n2} essential genes among the reported array genes that showed a synthetic lethal (SL) interaction with at least one of the query genes. We checked these three genes. Two of them, ``YJL174W'' and ``YPL075W'', are annotated as both ``lethal'' and ``viable'' in the \Special{SGD} database. The other gene, ``YBR121C'', is ``lethal''. We do not have the resources to track down why this gene appears on the \Special{SGA} array \cite{Tong2001}.

\subsubsection{Synthetic lethal and synthetic dosage lethal screens, Measday et al. (2005)}

\cite{measday2005sys} performed systematic yeast synthetic lethal and synthetic dosage lethal screens using the SGA approach \citep{Tong2001}. They first tested 14 query genes and found 84 non-essential genes that synthetically interact with at least one query gene (\Robject{SLchr}). Then they tested interactions between 3 query genes and the genome-wide set of deletion strains at 3 different temperatures. They found 141 array genes that interact with at least one query gene (\Robject{SDL}).
These screens identified genes required for chromosome segregation.

\subsection{DNA integrity experiment in \textit{S. cerevisiae}, Pan et al. (2006)}

The package contains raw and preprocessed data from \cite{Pan2006} obtained in Boeke's lab.

<<>>=
data(Boeke2006raw)
data(Boeke2006)
@

\Robject{Boeke2006raw} is a data frame with 5775 observations and \Robject{Boeke2006} is an incidence matrix reporting the systematic genetic interactions identified between 74 query genes and the deletion gene set in \cite{Pan2004} (see man pages for more details).

The technology used by Boeke and collaborators is slightly different from the approach taken by \cite{Tong2001}. They used heterozygote diploid-based synthetic lethality analyzed by microarray (dSLAM). The 21991 probes spotted on the dSLAM array are available by calling \Robject{dSLAM.GPL1444} or \Robject{dSLAM} (see man pages for more details).

\subsection{Genetic Interaction Data (EMAP), Schuldiner et al. (2005) and Collins et al. (2007)}

We also collected data generated by Collins and collaborators. These data differ from the others in that they have been heavily preprocessed using the authors' own procedure, EMAP (epistatic miniarray profiles). The data are presented as an incidence matrix and are accompanied by some metadata, e.g., systematic names and mutated alleles.

<<>>=
## Schuldiner et al. (2005)
data(gi2005)
data(gi2005.metadata)
@

\subsection{Saccharomyces Genome Database}

We provide synthetic genetic interaction data as recorded by the \textit{Saccharomyces} Genome Database in January, 2007. The data can be accessed using \Robject{SGD.SL} (synthetic lethal), \Robject{SGD.SynRescue} (synthetic rescue) and \Robject{SGD.SynGrowthDefect} (synthetic growth defect).

\section{Transcription Factor data}

The transcription factor binding affinity data were extracted from \cite{Lee2002}. They are represented as a matrix whose rows are \textit{S. cerevisiae} systematic gene names and whose columns are known transcription factors.
Each entry is the p-value, as reported by \cite{Lee2002}, for the transcription factor (TF) binding upstream of the gene.

<<>>=
data(TFmat)
@

\section{Example of analysis: Synthetic genetic interactions and multi-protein complexes}

To integrate synthetic genetic interactions with multi-protein complexes, we can make use of the interactome as defined in the \Rpackage{ScISI} package. The \Rpackage{ScISI} package, or \textit{In Silico Interactome for Saccharomyces cerevisiae}, provides an interactome built for computational experimentation. The \Robject{ScISI} is a binary incidence matrix where the rows are indexed by the gene locus names and the columns are indexed by the identification codes of the protein complexes, based on the repository from which they were obtained. This interactome is currently built from the IntAct, Gene Ontology and MIPS curated databases, and from estimated protein complexes from the \Rpackage{apComplex} package. In this vignette, we will make use of a subset of the \Robject{ScISI} interactome, the \Robject{ScISIC} data, which only contains the data from the curated databases.

<<>>=
library(ScISI)
data(ScISIC)
ScISIC[1:5, 1:5]
@

As an example we will use the data generated by \cite{Pan2006}. First, one needs to reduce the interactome matrix and the genetic interaction matrix to the same list of genes. This can be done using the \Rfunction{gi2Interactome} function.

<<>>=
data(Boeke2006)
data(dSLAM)
dim(Boeke2006)
Boeke2006red <- gi2Interactome(Boeke2006, ScISIC)
dim(Boeke2006red)
@

Next we can identify multi-protein complexes that present synthetic interactions among their own proteins (\textbf{within} interaction) or share synthetic interactions with another multi-protein complex (\textbf{between} interaction) using the \Rfunction{getInteraction} function. This function requires the incidence matrix, the array list and the interactome of interest.
<<>>=
interact <- getInteraction(Boeke2006red, dSLAM, ScISIC)
@

Then, one might want to know which multi-protein complexes share at least \textit{n} interactions:

<<>>=
intSummary <- iSummary(interact$bwMat, n=5)
@

Finally, we want to know whether any of those interactions are statistically significant. To that end, we developed two approaches. First, using a graph theory approach, we test whether those interactions are randomly distributed within the interactome.

<<>>=
modelBoeke <- modelSLGI(Boeke2006red, universe=dSLAM, interactome=ScISIC,
                        type="intM", perm=5)
@

A \Rfunction{plot} function allows you to visualize the result. In this case, we note that the number of observed synthetic genetic interactions is globally higher than in the simulated data.

<<>>=
plot(modelBoeke, pch=20)
@

<<>>=
print(plot(modelBoeke, pch=20))
@

Note that here, to limit computation time, we only performed 5 permutations, but for a real analysis 100 permutations or more are strongly recommended.

Next, we can perform a hypergeometric test to identify the multi-protein complexes that present an unusual number of synthetic genetic interactions. The \Rfunction{test2Interact} function allows you to summarize the genetic interactions within one cellular organizational unit or between 2 cellular organizational units, taking into account all the interactions tested (positive or negative).
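
To make the test concrete, the following is a sketch of a standard one-sided hypergeometric tail probability. The symbols $N$, $K$, $t$ and $x$ are introduced here for illustration only, and we assume, without having confirmed it against the \Rfunction{hyperG} source, that the function evaluates a quantity of this form. Let $N$ be the total number of gene pairs tested, $K$ the total number of observed interactions, $t$ the number of pairs tested within a complex (or between a pair of complexes) and $x$ the number of interactions observed among those $t$ pairs. If the interactions were distributed at random, then
\[
P(X \geq x) = \sum_{k=x}^{\min(t,K)} \frac{{K \choose k}{N-K \choose t-k}}{{N \choose t}} .
\]
A small value suggests more interactions than expected by chance. In base R this tail probability corresponds to \texttt{phyper(x - 1, K, N - K, t, lower.tail = FALSE)}.
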
One can compute the global interaction matrix as follows:

<<>>=
array <- dSLAM[dSLAM %in% rownames(ScISIC)]
query <- rownames(Boeke2006)[rownames(Boeke2006) %in% rownames(ScISIC)]
allInteract <- matrix(1, nrow=length(query), ncol=length(array),
                      dimnames=list(query, array))
tested <- getInteraction(allInteract, dSLAM, ScISIC)
@

<<>>=
testedInteract <- test2Interact(iMat=interact$bwMat, tMat=tested$bwMat,
                                interactome=ScISIC)
## total observed interactions and total number of tested pairs
significant <- hyperG(cbind("Tested"=testedInteract$tested,
                            "Interact"=testedInteract$interact),
                      sum(Boeke2006red),
                      nrow(Boeke2006red)*length(dSLAM))
@

\bibliographystyle{plainnat}
\bibliography{SLGI}

\end{document}