\name{sigGeneSet}
\Rdversion{1.1}
\alias{sigGeneSet}
\title{
Significant gene set from GAGE analysis
}
\description{
This function sorts and counts significant gene sets based on a q- or
p-value cutoff.
}
\usage{
sigGeneSet(setp, cutoff = 0.1, dualSig = (0:2)[2], qpval = c("q.val",
"p.val")[1], heatmap=TRUE, outname="array", pdf.size = c(7,7),
p.limit=c(0.5, 5.5), stat.limit=5, ...)
}
\arguments{
  \item{setp}{
    the result object returned by the \code{gage} function, either a numeric
    matrix or a list of two such matrices. Check \code{gage} help
    information for details.
  }
  \item{cutoff}{
    numeric, q- or p-value cutoff, between 0 and 1. Default 0.1 (for
    q-value). When p-value is used, the recommended cutoff is 0.001 for
    data with more than 2 replicates per condition, or 0.01 for smaller
    sample sizes.
  }
  \item{dualSig}{
    integer, switch argument controlling how dual-significant gene sets
    should be treated. This argument matters only when Stouffer's method is
    not used in the \code{gage} function (\code{use.stouffer=FALSE}), so it
    normally makes no difference.
    0: discard such gene sets from the final significant gene set list;
    1: keep such gene sets in the more significant direction and remove
    them from the less significant direction;
    2: keep such gene sets in the lists for both directions.
    Default to 1. Dual-significant means a gene set is called significant
    simultaneously in both 1-directional tests (up- and down-regulated).
    Check the details for more information.
  }
  \item{qpval}{
    character, specifies the column name used for gene set selection,
    i.e. which type of q- or p-value to use. Default to "q.val" (q-value
    using the BH procedure). "p.val" is the unadjusted global p-value and
    may sometimes be used as the selection criterion.
  }
  \item{heatmap}{
    boolean, whether to plot a heatmap for the selected gene sets as a
    PDF file. Default to TRUE.
  }
  \item{outname}{
    a character string, to be used as the prefix of the output data
    files. Default to "array".
  }
  \item{pdf.size}{
    a numeric vector to specify the width and height of the PDF graphics
    region in inches. Default to c(7, 7).
  }
  \item{stat.limit}{
    numeric vector of length 1 or 2 to specify the value range of gene
    set statistics to visualize using the heatmap. Statistics beyond this
    range are reset to the nearest limit. Default to 5, i.e. plot all gene
    set statistics within the (-5, 5) range. May also be NULL, i.e. plot
    all statistics without limit. This argument allows optimal
    differentiation between most gene set statistic values when extreme
    positive or negative values exist and would otherwise squeeze the
    normal-value region.
  }
  \item{p.limit}{
    numeric vector of length 1 or 2 to specify the value range of gene
    set -log10(p-values) to visualize using the heatmap. Values beyond
    this range are reset to the nearest limit. Default to c(0.5, 5.5),
    i.e. plot all -log10(p-values) within this range. This argument is
    similar to argument \code{stat.limit}.
  }
  \item{\dots}{
    other arguments to be passed to the internal \code{gs.heatmap}
    function, which is a wrapper of the \code{heatmap2} function.
  }
}
\details{
By default, heatmaps are produced to show the gene set perturbations
using either -log10(p-value) or statistics.

Since gage package version 2.2.0, Stouffer's method is used as the
default procedure for more robust p-value summarization. With the
original p-value summarization, i.e. the negative log sum, which follows
a Gamma distribution under the null hypothesis, the global p-value could
be heavily affected by a small subset of extremely small individual
p-values from pair-wise comparisons. Such a sensitive global p-value
leads to the "dual significance" phenomenon: a gene set is significantly
up-regulated in one subset of experiments, but down-regulated in another
subset. Note that dual-significant gene sets are not the same as gene
sets called significant in 2-directional tests, although they are
related.
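
The three \code{dualSig} policies described above can be illustrated with
a small stand-alone sketch. This is only an illustration under stated
assumptions, not the code used inside \code{sigGeneSet}: the helper
\code{pick.sets}, the toy q-values, the strict \code{<} comparison against
the cutoff, and the output names \code{greater} and \code{less} are all
hypothetical choices made for the example.

\preformatted{
# Made-up q-values for three gene sets in each 1-directional test
# (toy numbers for illustration, not gage output).
q.up   <- c(setA = 0.01, setB = 0.04,  setC = 0.50)
q.down <- c(setA = 0.60, setB = 0.002, setC = 0.03)
cutoff <- 0.1

# Hypothetical helper (not part of gage): apply a dualSig policy.
pick.sets <- function(q.up, q.down, cutoff, dualSig = 1) {
  up   <- q.up < cutoff      # significant as up-regulated
  down <- q.down < cutoff    # significant as down-regulated
  dual <- up & down          # significant in both directions
  if (dualSig == 0) {
    # 0: discard dual-significant sets from both lists
    up[dual] <- FALSE
    down[dual] <- FALSE
  } else if (dualSig == 1) {
    # 1: keep each dual-significant set only in its more significant
    # direction (ties go to "up" here, an arbitrary choice)
    up.wins <- q.up <= q.down
    up[dual & !up.wins] <- FALSE
    down[dual & up.wins] <- FALSE
  }
  # 2: leave both lists untouched
  list(greater = names(q.up)[up], less = names(q.down)[down])
}

res0 <- pick.sets(q.up, q.down, cutoff, dualSig = 0)
res1 <- pick.sets(q.up, q.down, cutoff, dualSig = 1)
res2 <- pick.sets(q.up, q.down, cutoff, dualSig = 2)
}

In this toy input only \code{setB} passes the cutoff in both directions,
so it is dropped entirely with \code{dualSig = 0}, kept only in the
down-regulated list with \code{dualSig = 1}, and kept in both lists with
\code{dualSig = 2}.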
}
\value{
The \code{sigGeneSet} function returns a named list of the same structure
as the \code{gage} result. Check \code{gage} help information for details.
}
\references{
  Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P.
  GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis.
  BMC Bioinformatics 2009, 10:161
}
\author{
  Weijun Luo
}
\seealso{
  \code{\link{gage}} the main function for GAGE analysis;
  \code{\link{esset.grp}} non-redundant significant gene set list;
  \code{\link{essGene}} essential member genes in a gene set;
}
\examples{
data(gse16873)
cn=colnames(gse16873)
hn=grep('HN',cn, ignore.case =TRUE)
dcis=grep('DCIS',cn, ignore.case =TRUE)
data(kegg.gs)

#kegg test for 1-directional changes
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis)
#kegg test for 2-directional changes
gse16873.kegg.2d.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis, same.dir = FALSE)
gse16873.kegg.sig<-sigGeneSet(gse16873.kegg.p, outname="gse16873.kegg")
str(gse16873.kegg.sig)
gse16873.kegg.2d.sig<-sigGeneSet(gse16873.kegg.2d.p, outname="gse16873.kegg")
str(gse16873.kegg.2d.sig)
#also check the heatmaps in pdf files named "*.heatmap.pdf".
}
\keyword{htest}
\keyword{multivariate}
\keyword{manip}