%\VignetteIndexEntry{ACME}
%\VignetteDepends{ACME}
%\VignetteKeywords{ACME}
%\VignetteKeywords{ACME}
%\VignettePackage{ACME}
\documentclass[12pt,fullpage]{article}

\usepackage{amsmath,epsfig,fullpage}
\usepackage{hyperref}
\usepackage{url}
\usepackage[authoryear,round]{natbib}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}

\author{Sean Davis$^\ddagger$\footnote{sdavis2@mail.nih.gov}}
\begin{document}
\title{Using the ACME package}
\maketitle
\begin{center}$^\ddagger$Genetics Branch\\ National Cancer Institute\\ National Institutes of Health
\end{center}


\tableofcontents
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview of ACME} 
Data obtained from high-density oligonucleotide tiling arrays present new computational challenges for users. ACME (Algorithm for Capturing Microarray Enrichment) is a method for determing genomic regions of enrichment in the context of tiling microarray experiments.  ACME identifies signals or "peaks" in tiled array data using a user-defined sliding window of n-base-pairs and a threshold (again, user-defined) strategy to assign a probability value (p-value) of enrichment to each probe on the array. This approach has been applied successfully to at least two different genomic applications involving tiled arrays: ChIP-chip and DNase-chip.  However, it can potentially be applied to tiling array data whenever regions of relative enrichment are expected.  

The ACME algorithm is quite straightforward.  Using a user-defined quantile of the data, called the threshold, any probes in the data that are above that threshold are considered positive probes.  For example, if a user chooses a threshold of 0.95, then, of course, 5 percent of the total data are going to be positive probes.  To look for enrichment, a sliding window of fix number of base pairs (the chosen window size) is examined centered on each probe.  Enrichment is calculated using a chi-square of the number of expected positive probes in the window as compared to the expected number.  A p-value is then assigned to each probe.  Note that these p-values are not corrected for multiple comparisons and should be used as a guide to determining regions of interest rather than a strict statistical significance level.    

\section{Getting Started using ACME}


<<>>=
library(ACME)
@ 

This loads the ACME library.

To illustrate the package, we begin by loading some example data from two nimblegen arrays.  The arrays were custom-designed to assay HOX genes in a ChIP-chip experiment.

<<>>=
datdir <- system.file('extdata',package='ACME')
fnames <- dir(datdir)
example.agff <- read.resultsGFF(fnames,path=datdir)
example.agff
@ 

Now, \Robject{a} is an R data structure (of class \Rclass{ACMESet}) that contains the data from two test GFF files.

<<>>=
calc <- do.aGFF.calc(example.agff,window=1000,thresh=0.95)
@ 

The function do.aGFF.calc takes as input an \Rclass{ACMESet} object, a window size (usually 2-3 times the expected fragment size from the experiment and large enough to include about 10 probes, at least), and a threshold which will be used to determine which probes are counted as positive in the chi-square test.

If desired, the results can be plotted in an R graphics window. The raw signal intensities of each oligonucleotide (Chip/total genomic DNA) will be displayed as grey points; corresponding P values will be displayed in red. The dotted horizontal line represents the threshold as defined in the call to \Rfunction{do.aGFF.calc}.  In the following example, R plots the results from an arbitrarily chosen region on chromosome 1, genome coordinates 10,000-50,000.

<<fig=TRUE>>=
plot(calc,chrom='chr1',sample=1)
@ 

And one can find significant regions of interest using:

<<>>=
regs <- findRegions(calc)
regs[1:5,]
@ 

\subsection{Generating files for viewing in genome browsers}

The Affymetrix Integrated Genome Browser (IGB) is a very fast, cross-platform (Java-based) genome browser that can display data in many formats.  By generating so-called ``sgr'' files, one can view both the raw data and the calculated p-values in a fully interactive manner.  A simple function, \Rfunction{write.sgr}, will generate such files that can then be loaded into that browser.  The function also serves as a model for how to generate other file formats.  With minor modifications, other formats can be generated.

<<>>=
# write both calculated values and raw data
write.sgr(calc)
# OR write only calculated data
write.sgr(calc,raw=FALSE)
@ 

Export to the UCSC genome browser bedGraph format is also supported.

<<>>=
# or for the UCSC genome browser
write.bedGraph(calc)
@ 


\end{document}