\name{essGene} \Rdversion{1.1} \alias{essGene} \title{ Essential member genes in a gene set } \description{ This function extracts data for essential member genes in a gene set. Essential genes are genes that have changes over noise level. } \usage{ essGene(gs, exprs, ref = NULL, samp = NULL, gsets = NULL, compare = "paired", use.fold = TRUE, rank.abs = FALSE, use.chi = FALSE, chi.p = 0.05, ...) } \arguments{ \item{gs}{ character, either the name of an interesting gene set in a gene set collection passed by \code{gsets} argument, or a vector of gene IDs. Make sure that the same gene ID system is used for both \code{gs} and \code{exprs}. } \item{exprs}{ an expression matrix or matrix-like data structure, with genes as rows and samples as columns. } \item{ref}{ a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. Default ref = NULL, all columns are considered as target experiments. } \item{samp}{ a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. Default samp = NULL, all columns other than ref are considered as target experiments. } \item{gsets}{ a named list, each element contains a gene set that is a character vector of gene IDs or symbols. For example, type \code{head(kegg.gs)}. A gene set can also be a "smc" object defined in PGSEA package. Make sure that the same gene ID system is used for both \code{gsets} and \code{exprs}. Default to be NULL, then argument \code{gs} needs to be a vector of gene IDs. } \item{compare}{ character, which comparison scheme to be used: 'paired', 'unpaired', '1ongroup', 'as.group'. 'paired' is the default, ref and samp are of equal length and one-on-one paired by the original experimental design; 'as.group', group-on-group comparison between ref and samp; 'unpaired' (used to be '1on1'), one-on-one comparison between all possible ref and samp combinations, although the original experimental design may not be one-on-one paired; '1ongroup', comparison between one samp column at a time vs the average of all ref columns. } \item{use.fold}{ Boolean, whether the input \code{gage} results used fold changes or t-test statistics as per gene statistics. Default use.fold= TRUE. } \item{rank.abs}{ boolean, whether to sort the essential gene data based on absoluate changes. Default to be FALSE. } \item{use.chi}{ boolean, whether to use chi-square test to select the essential genes. Default to be FALSE, use the mean plus standard deviation of all gene changes instead. Check details for more information. } \item{chi.p}{ numeric value between 0 and 1, cutoff p-value for the chi-square test to select the essential genes. Default to 0.05. } \item{\dots}{ other arguments to be passed into the inside \code{gagePrep} function. } } \details{ There are two different criteria for essential gene selection. One uses a chi-square test to determin whether the change of a gene is more than noise. A second considers any changes beyond 1 standard deviation from mean of all genes as real. Note that essential genes are different from core genes considered in \code{esset.grp} function. Essential genes may change in a different direction than the overall change of a gene set. But core genes need to change in the in the interesting direction(s) of the gene set test. } \value{ A expression data matrix extracted for the essential genes, with similar structure as \code{exprs}. } \references{ Luo, W., Friedman, M., Shedden K., Hankenson, K. and Woolf, P GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161 } \author{ Weijun Luo } \seealso{ \code{\link{gage}} the main function for GAGE analysis; \code{\link{geneData}} output and visualization of expression data for selected genes; \code{\link{esset.grp}} non-redundant signcant gene set list; } \examples{ data(gse16873) cn=colnames(gse16873) hn=grep('HN',cn, ignore.case =TRUE) dcis=grep('DCIS',cn, ignore.case =TRUE) #kegg test for 1-directional changes data(kegg.gs) gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, ref = hn, samp = dcis) rownames(gse16873.kegg.p$greater)[1:3] gs=unique(unlist(kegg.gs[rownames(gse16873.kegg.p$greater)[1:3]])) essData=essGene(gs, gse16873, ref =hn, samp =dcis) head(essData) ref1=1:6 samp1=7:12 #generated text file for data table, pdf files for heatmap and scatterplot for (gs in rownames(gse16873.kegg.p$greater)[1:3]) { outname = gsub(" |:|/", "_", substr(gs, 10, 100)) geneData(genes = kegg.gs[[gs]], exprs = essData, ref = ref1, samp = samp1, outname = outname, txt = TRUE, heatmap = TRUE, Colv = FALSE, Rowv = FALSE, dendrogram = "none", limit = 3, scatterplot = TRUE) } } \keyword{multivariate} \keyword{manip}