%\VignetteIndexEntry{cghMCR findMCR}
%\VignetteKeyword{platform}
%\VignettePackage{cghMCR}
\documentclass[12pt]{article}
\usepackage{hyperref}
\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

\begin{document}
\author{Jianhua Zhang\and Bin Feng}

\title{How to use cghMCR}

\maketitle

\section{Overview}

This vignette demonstrates how to use \Rpackage{cghMCR} to locate minimum common regions (MCRs) across arrayCGH profiles derived from different samples. The MCR approach was originally proposed by Dr.~Lynda Chin's lab (Aguirre et al., 2004) to identify chromosome regions that show common gains or losses across samples profiled on an arrayCGH platform. The \Rpackage{cghMCR} package implements the algorithm as follows:

\begin{itemize}
\item MCRs are identified based on the segments obtained using \Rpackage{DNAcopy}.
\item Segments whose means lie above an upper percentile threshold (set by the parameter \Robject{alteredHigh}) or below a lower percentile threshold (\Robject{alteredLow}) are identified as altered.
\item If two or more altered segments are separated by less than 500 kb (a distance controlled by the parameter \Robject{gapAllowed}), the entire region spanned by the segments is considered to be an altered span.
\item Highly altered segments or spans are retained as informative spans that define discrete locus boundaries.
\item Informative spans are compared across samples to identify overlapping groups of positive or negative value segments.
\item Minimal common regions (MCRs) are defined as contiguous spans whose recurrence rate across samples is at least the value of the parameter \Robject{recurrence}.
\end{itemize}    
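The last two steps above can be illustrated with a minimal base-R sketch. It is not the \Rpackage{cghMCR} implementation: the function \Rfunction{toyMCR}, its input columns (\texttt{sample}, \texttt{chrom}, \texttt{start}, \texttt{end}, \texttt{status}) and the toy spans are hypothetical, and \Robject{recurrence} is assumed here to be a percentage of samples. On each chromosome and for each direction, span boundaries are split into elementary intervals, the samples covering each interval are counted, and adjacent intervals that meet the recurrence threshold are joined into one region.

<<>>=
## Hypothetical sketch of the recurrence step; not the cghMCR code.
## Assumes `recurrence` is a percentage of samples.
toyMCR <- function(spans, nSamples, recurrence) {
  out <- list()
  keys <- paste(spans$chrom, spans$status)
  for (key in unique(keys)) {
    sub <- spans[keys == key, ]
    ## Elementary intervals between consecutive span boundaries
    bounds <- sort(unique(c(sub$start, sub$end)))
    if (length(bounds) < 2) next
    starts <- head(bounds, -1)
    ends <- tail(bounds, -1)
    ## Samples whose spans cover each elementary interval
    covering <- lapply(seq_along(starts), function(i)
      unique(sub$sample[sub$start <= starts[i] & sub$end >= ends[i]]))
    keep <- 100 * lengths(covering) / nSamples >= recurrence
    ## Join runs of adjacent qualifying intervals into one region
    runs <- rle(keep)
    runEnd <- cumsum(runs$lengths)
    runStart <- runEnd - runs$lengths + 1
    for (r in which(runs$values)) {
      idx <- runStart[r]:runEnd[r]
      out[[length(out) + 1]] <- data.frame(
        chrom = sub$chrom[1], status = sub$status[1],
        mcr.start = starts[runStart[r]], mcr.end = ends[runEnd[r]],
        samples = paste(sort(unique(unlist(covering[idx]))), collapse = ","),
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}

## Toy informative spans for three samples (hypothetical values)
toySpans <- data.frame(sample = c("s1", "s2", "s3"), chrom = 7,
                       start = c(100, 150, 400), end = c(300, 350, 500),
                       status = "gain", stringsAsFactors = FALSE)
toyRes1 <- toyMCR(toySpans, nSamples = 3, recurrence = 50)
toyRes1
@

With three samples and a threshold of 50\%, only the interval from 150 to 300, covered by both \texttt{s1} and \texttt{s2}, qualifies as a common region in this toy example.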

\section{Getting Started}

The example data used in this vignette were artificially constructed in the \textit{Agilent} arrayCGH format and contain only 5000 probes to keep execution time reasonable.

\subsection{Read the sample data}

The sample data are stored in the \textit{data} subdirectory and can be loaded using \Rfunction{data}.

<<>>=
library("cghMCR")
data("sampleData")
@

The sample data were created by reading three fabricated files with \Rfunction{read.Agilent} and normalizing them with \Rfunction{maNorm} (\texttt{norm = "loess"}) from \Rpackage{marray}. See the \Rpackage{marray} documentation for details on reading Agilent profile data.

\Robject{sampleData} has three samples with intensity measures for 5000 probes.

<<>>=
maNsamples(sampleData)
length(maLabels(maGnames(sampleData)))
@ 

\subsection{Identify chromosome segments}

For each sample, we first need to identify chromosome segments with similar intensity measures. The function \Rfunction{getSegments} is a wrapper around the main functions of \Rpackage{DNAcopy}, which detect chromosome regions within which probe intensities remain similar.

<<>>=
segments <- getSegments(sampleData)
@

The result of the segmentation analysis (\Robject{segments}) is a list with three elements:

<<>>= 
names(segments)
@ 

The \textit{data} element contains the normalized data, the \textit{output} element contains the chromosome segments identified, and the \textit{call} element contains the function call, including the parameters passed. We can now plot the original data together with the segments to see what the segments look like.

\begin{Sinput}
> plot(segments)
\end{Sinput}

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{segplot}
    \caption{Segmentation plot: normalized data and the segments identified.}
    \label{fig:segplot}
  \end{center}
\end{figure}

\subsection{Identify MCRs}

The \Robject{output} element of the \Robject{segments} object generated in the previous section contains the segmentation results and is used to identify the MCRs. The numeric parameter \Robject{gapAllowed} gives the distance, in basepairs, below which two adjacent altered segments are joined to form an altered span. The numeric parameters \Robject{alteredLow} and \Robject{alteredHigh} specify the lower and upper percentile thresholds: only segments whose means are below the lower threshold or above the upper threshold are considered altered and included in the subsequent analysis. \Robject{recurrence} is an integer giving the rate at which a region must recur as a gain or loss across samples before it can be declared an MCR. Because the sample data contain few probes, the parameters below are chosen to produce presentable results rather than to be correct for real data.
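The thresholding and gap-joining steps controlled by these parameters can be sketched in base R. The sketch is not the \Rpackage{cghMCR} code: \Rfunction{toyAlteredSpans} and the toy segments are hypothetical, the percentile thresholds are assumed to be computed with \Rfunction{quantile} over one sample's segment means, and only segments with the same direction (gain or loss) are assumed to be joined. The column names \texttt{chrom}, \texttt{loc.start}, \texttt{loc.end} and \texttt{seg.mean} follow the segment table produced by \Rpackage{DNAcopy}; the thresholds 0.25 and 0.75 are used only to suit the six toy segments.

<<>>=
## Hypothetical sketch of thresholding and gap joining; not the cghMCR code.
## segs: one sample's segments (chrom, loc.start, loc.end, seg.mean).
toyAlteredSpans <- function(segs, gapAllowed, alteredLow, alteredHigh) {
  ## Assumed: percentile thresholds over this sample's segment means
  thr <- unname(quantile(segs$seg.mean, c(alteredLow, alteredHigh)))
  segs$status <- ifelse(segs$seg.mean < thr[1], "loss",
                        ifelse(segs$seg.mean > thr[2], "gain", NA))
  alt <- segs[!is.na(segs$status), ]
  alt <- alt[order(alt$chrom, alt$loc.start), ]
  spans <- NULL
  for (i in seq_len(nrow(alt))) {
    n <- if (is.null(spans)) 0 else nrow(spans)
    ## Assumed: join only same-direction segments closer than gapAllowed
    if (n > 0 && spans$chrom[n] == alt$chrom[i] &&
        spans$status[n] == alt$status[i] &&
        alt$loc.start[i] - spans$end[n] < gapAllowed) {
      spans$end[n] <- max(spans$end[n], alt$loc.end[i])
    } else {
      spans <- rbind(spans, data.frame(chrom = alt$chrom[i],
                                       status = alt$status[i],
                                       start = alt$loc.start[i],
                                       end = alt$loc.end[i],
                                       stringsAsFactors = FALSE))
    }
  }
  spans
}

## Six toy segments on two chromosomes (hypothetical values)
toySegs <- data.frame(chrom = c(1, 1, 1, 1, 1, 2),
                      loc.start = c(0, 1000, 1300, 5000, 9000, 0),
                      loc.end = c(900, 1200, 2000, 8000, 9500, 3000),
                      seg.mean = c(0.9, 0.0, 0.8, 0.1, -0.7, -0.9))
toyRes2 <- toyAlteredSpans(toySegs, gapAllowed = 500,
                           alteredLow = 0.25, alteredHigh = 0.75)
toyRes2
@

The two gains on chromosome 1 lie 400 bp apart, less than \Robject{gapAllowed}, so they are joined into a single span from 0 to 2000, while the two losses on different chromosomes remain separate spans.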
<<>>=
cghmcr <- cghMCR(segments, gapAllowed = 500, alteredLow = 0.20,
                   alteredHigh = 0.80, recurrence = 50)
mcrs <- MCR(cghmcr)
@ 

Using the above settings, we obtain four MCRs, only one of which (on chromosome 7) is common to two samples.

<<>>=
print(mcrs[, c("chromosome", "status", "mcr.start", "mcr.end",
               "samples")])
@ 

To include the ids of the probes within each MCR, call the function \Rfunction{mergeMCRProbes}, which appends them to the MCR table in a \texttt{probes} column. Multiple probe ids are separated by commas.

<<>>=
mcrs <- mergeMCRProbes(mcrs, segments[["data"]])
print(mcrs[, c("chromosome", "status", "mcr.start", "mcr.end",
               "probes")])
@ 
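The idea behind attaching probe ids to MCRs can be sketched by collecting, for each MCR, the probes on the same chromosome whose positions fall between \texttt{mcr.start} and \texttt{mcr.end}, and pasting their ids together. This is a hypothetical illustration, not the \Rfunction{mergeMCRProbes} code: \Rfunction{toyMergeProbes} and its probe table (columns \texttt{id}, \texttt{chrom}, \texttt{pos}) are assumptions, and the \textit{data} element of \Robject{segments} used above may be structured differently.

<<>>=
## Hypothetical sketch of attaching probe ids to MCRs; not the
## mergeMCRProbes code. probes: columns id, chrom, pos (assumed layout).
toyMergeProbes <- function(mcr, probes) {
  mcr$probes <- vapply(seq_len(nrow(mcr)), function(i) {
    inside <- probes$chrom == mcr$chrom[i] &
      probes$pos >= mcr$mcr.start[i] & probes$pos <= mcr$mcr.end[i]
    ## Order the ids by position and separate them with commas
    ids <- probes$id[inside][order(probes$pos[inside])]
    paste(ids, collapse = ",")
  }, character(1))
  mcr
}

## Toy MCRs and probes (hypothetical values)
toyMcr <- data.frame(chrom = c(7, 7), mcr.start = c(100, 500),
                     mcr.end = c(300, 600))
toyProbes <- data.frame(id = c("p1", "p2", "p3", "p4"),
                        chrom = c(7, 7, 7, 8),
                        pos = c(150, 250, 550, 150),
                        stringsAsFactors = FALSE)
toyRes3 <- toyMergeProbes(toyMcr, toyProbes)
toyRes3
@

Probe \texttt{p4} lies on another chromosome and is therefore not assigned to either toy MCR.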


\section{Session Information}

The versions of R and of the packages loaded when generating this vignette were:

<<echo=FALSE>>=
sessionInfo()
@

\section{References}
Aguirre AJ, Brennan C, Bailey G, Sinha R, Feng B, Leo C, Zhang Y,
Zhang J, Bardeesy N, Cauwels C, Cordon-Cardo C, Redston MS, DePinho RA and
Chin L. High-resolution Characterization of the Pancreatic Adenocarcinoma
Genome. Proc Natl Acad Sci U S A. 2004;101(24):9067--9072.


\end{document}