\name{bga}
\alias{bga}
\alias{plot.bga}
\title{Between group analysis}
\description{
Discrimination of samples using between group analysis, as described by
Culhane et al. (2002).
}
\usage{
bga(dataset, classvec, type = "coa", \dots)
\method{plot}{bga}(x, axis1 = 1, axis2 = 2, arraycol = NULL, genecol = "gray25",
     nlab = 10, genelabels = NULL, \dots)
}
\arguments{
  \item{dataset}{Training dataset. A \code{\link{matrix}}, \code{\link{data.frame}},
    \code{\link[Biobase:ExpressionSet-class]{ExpressionSet}} or
    \code{\link[marray:marrayRaw-class]{marrayRaw-class}}. If the input is gene
    expression data in a \code{\link{matrix}} or \code{\link{data.frame}}, the rows
    and columns are expected to contain the variables (genes) and cases (array
    samples), respectively.}
  \item{classvec}{A \code{factor} or \code{vector} which describes the classes in
    the training dataset.}
  \item{type}{Character string, one of \code{"coa"}, \code{"pca"} or \code{"nsc"},
    indicating which data transformation (ordination) is applied. The default is
    \code{type = "coa"}.}
  \item{x}{An object of class \code{bga}: the output from \code{bga} or
    \code{\link[made4:bga.suppl]{bga.suppl}}. It contains the projection
    coordinates from \code{bga}; the \code{$ls}, \code{$co} or \code{$li}
    coordinates are plotted.}
  \item{arraycol, genecol}{Character, colour of the points on the plot. If
    \code{arraycol} is \code{NULL}, a set of contrasting colours, one for each class
    of cases (microarray samples), is obtained using \code{getcol} and used on the
    array (case) plot. \code{genecol} is the colour of the points representing the
    variables (genes) on the gene plot.}
  \item{nlab}{Numeric. An integer giving the number of variables (genes) at the
    ends of each axis to be labelled on the gene plot.}
  \item{axis1}{Integer, the column number for the x-axis. The default is 1.}
  \item{axis2}{Integer, the column number for the y-axis. The default is 2.}
  \item{genelabels}{A vector of variable labels. If \code{genelabels = NULL}, the
    row names of the input \code{dataset} are used.}
  \item{\dots}{Further arguments passed to or from other methods.}
}
\details{
\code{bga} performs a between group analysis on the input dataset. This function
calls \code{\link[ade4:between]{between}}. The input format of the dataset is
verified using \code{\link[made4:array2ade4]{array2ade4}}.

Between group analysis (BGA) is a supervised method for sample discrimination and
class prediction. BGA ordinates groups (sets of grouped microarray samples) rather
than individual samples; that is, the groups of samples are projected into a
reduced dimensional space. This is most easily done by applying COA or PCA to the
group means. The ordination used (COA, PCA or NSC) is set by the parameter
\code{type}.

The user must define the microarray sample groupings in advance. These groupings
are defined using the input \code{classvec}, which is a \code{factor} or
\code{vector}.

\bold{Cross-validation and testing of bga results:}
bga results should be validated using leave-one-out jackknife cross-validation
with \code{\link[made4:bga.jackknife]{bga.jackknife}}, and by projecting a blind
test dataset onto the bga axes using \code{\link[made4:suppl]{suppl}}.
\code{bga} and \code{\link[made4:suppl]{suppl}} are combined in
\code{\link[made4:bga.suppl]{bga.suppl}}, which requires both a training and a
test dataset as input. It is important to ensure that the selection of cases for
the training and test sets is not biased, and generally many cross-validations
should be performed. The function \code{\link[made4:randomiser]{randomiser}} can
be used to randomise the selection of training and test samples.

\bold{Plotting and visualising bga results:}

\emph{1D plots, showing one axis only:} 1D graphs can be plotted using
\code{\link[made4:between.graph]{between.graph}} and
\code{\link[made4:graph1D]{graph1D}}.
\code{\link[made4:between.graph]{between.graph}} is used for plotting the cases,
and requires both the coordinates of the cases (\code{$ls}) and their centroids
(\code{$li}). It accepts an object of class \code{bga}.
\code{\link[made4:graph1D]{graph1D}} can be used to plot either cases
(microarrays) or variables (genes) and only requires a vector of coordinates.

\emph{2D plots:} Use \code{plot.bga} to plot results from \code{bga}.
\code{plot.bga} calls \code{\link[made4:plotarrays]{plotarrays}} to draw an xy
plot of the cases (\code{$ls}) and \code{\link[made4:plotgenes]{plotgenes}} to
draw an xy plot of the variables (genes).

\emph{3D plots:} 3D graphs can be generated using
\code{\link[made4:do3d]{do3d}} and \code{\link[made4:html3D]{html3D}}.
\code{\link[made4:html3D]{html3D}} produces a web page in which a 3D plot can be
interactively rotated and zoomed, and in which classes or groups of cases can be
easily highlighted.

\bold{Analysis of the distribution of variance among axes:}
It is important to know which cases (microarray samples) are discriminated by each
axis. The number of axes (principal components) from a \code{bga} equals the
number of classes minus one, that is, \code{length(levels(classvec)) - 1}. The
distribution of variance among the axes is described by the eigenvalues
(\code{$eig}) of the \code{bga} analysis. These can be visualised in a scree plot
using \code{\link[ade4:scatter]{scatterutil.eigen}}, as is done in
\code{plot.bga}. It is also useful to visualise, as a heatmap, the principal
components from a \code{bga}, a principal component analysis
(\code{\link[ade4:dudi.pca]{dudi.pca}}) or a correspondence analysis
(\code{\link[ade4:dudi.coa]{dudi.coa}}). In made4, the function
\code{\link[made4:heatplot]{heatplot}} draws such a heatmap with more suitable
default colours.
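\bold{Sketch of the between-group ordination:}
The base R code below is a simplified, hypothetical illustration of why a BGA has
\code{length(levels(classvec)) - 1} axes. It is not the \code{bga} or
\code{\link[ade4:between]{between}} implementation, which uses the ade4 row and
column weights of the chosen ordination. It assumes an unweighted, PCA-like
analysis of a simulated, gene-centred dataset: each sample is replaced by its class
mean, a singular value decomposition of these group means gives the between-group
axes, and the samples and class centroids are then projected onto those axes. All
object names (\code{sim}, \code{group.means}, \code{sample.coords},
\code{class.centroids}) are hypothetical.
\preformatted{
# Hypothetical sketch, not the made4/ade4 implementation: a simplified,
# unweighted between-group PCA in base R.
set.seed(1)
classvec <- factor(rep(c("A", "B", "C", "D"), each = 5))  # K = 4 classes
n.genes <- 50

# Genes in rows and samples (arrays) in columns, as bga() expects
sim <- matrix(rnorm(n.genes * length(classvec)), nrow = n.genes)
sim[1:5, classvec == "B"] <- sim[1:5, classvec == "B"] + 3  # signal in class B

# Put samples in rows and centre each gene (column)
X <- scale(t(sim), center = TRUE, scale = FALSE)

# Replace every sample by the mean of its class: the between-group table
group.means <- apply(X, 2, function(g) ave(g, classvec))

# Ordinate the group means; the eigenvalues describe the between-group variance
s <- svd(group.means)
eig <- s$d^2 / nrow(X)
n.axes <- sum(eig > 1e-8 * max(eig))  # count numerically non-zero eigenvalues

# Project the samples onto the between-group axes (compare $ls) and average
# the projections within each class to get the centroids (compare $li)
axes <- s$v[, seq_len(n.axes), drop = FALSE]
sample.coords <- X %*% axes
class.centroids <- rowsum(sample.coords, classvec) / as.vector(table(classvec))

n.axes                          # 3, i.e. nlevels(classvec) - 1
round(eig[seq_len(n.axes)], 3)  # eigenvalues, e.g. for a scree plot
}
Because the centred class means satisfy one linear constraint (their sum,
weighted by class size, is zero), they span at most
\code{nlevels(classvec) - 1} dimensions, so only that many eigenvalues are
non-zero when there are more variables than classes.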
\bold{Extracting a list of top variables (genes):}
Use \code{\link[made4:topgenes]{topgenes}} to get a list of the variables or cases
at the ends of an axis. It returns the top n variables (by default n = 5) at the
positive end, the negative end or both ends of an axis.
\code{\link[made4:sumstats]{sumstats}} can be used to return the angle (slope) and
the distance from the origin of a list of coordinates.

For more details see Culhane et al. (2002) and
\url{http://bioinf.ucd.ie/research/BGA}.
}
\value{
A list of class \code{bga} containing:
  \item{ord}{Results of the initial ordination. A list of class \code{"dudi"}
    (see \code{\link[ade4:dudi]{dudi}}).}
  \item{bet}{Results of the between group analysis. A list of class
    \code{"between"} (see \code{\link[ade4:between]{between}}) and \code{"dudi"}
    (see \code{\link[ade4:dudi]{dudi}}).}
  \item{fac}{The input \code{classvec}, the \code{factor} or \code{vector} which
    describes the classes in the input dataset.}
}
\references{
Culhane AC, et al. (2002) Between-group analysis of microarray data.
\emph{Bioinformatics} 18(12):1600-1608.
}
\author{Aedin Culhane}
\seealso{
\code{\link[made4:suppl]{suppl}}, \code{\link[made4:bga.suppl]{bga.suppl}},
\code{\link[ade4:between]{between}}, \code{\link[made4:bga.jackknife]{bga.jackknife}}
}
\examples{
data(khan)
if (require(ade4, quietly = TRUE)) {
  khan.bga <- bga(khan$train, classvec = khan$train.classes)
  print(khan.bga)
  plot(khan.bga, genelabels = khan$annotation$Symbol)

  # Provide a view of the principal components (axes) of the bga
  heatplot(khan.bga$bet$ls, dend = "none")
}
}
\keyword{manip}
\keyword{multivariate}