\name{constructBins}
\alias{constructBins}
\title{
Construct bin-level ChIP-sep data from an aligned read file
}
\description{
Preprocess and construct bin-level ChIP-sep data from an aligned read file.
}
\usage{
constructBins( infile=NULL, fileFormat=NULL, outfileLoc="./", 
    byChr=FALSE, excludeChr=NULL, 
    fragLen=200, binSize=200, capping=0, perl = "perl" )
}
\arguments{
  \item{infile}{
    Name of the aligned read file to be processed.
}
  \item{fileFormat}{
    Format of the aligned read file to be processed.
    Currently, \code{constructBins} permits the following aligned read file formats:
        \code{"eland_result"} (Eland result), \code{"eland_extended"} (Eland extended),
        \code{"eland_export"} (Eland export), \code{"bowtie"} (default Bowtie),
        \code{"sam"} (SAM), and \code{"bed"} (BED).
}
  \item{outfileLoc}{
    Directory of processed bin-level files. 
    By default, processed bin-level files are exported to the current directory. 
}
  \item{byChr}{
    Construct separate bin-level file for each chromosome? 
    Possible values are \code{TRUE} or \code{FALSE}.
    If \code{byChr=FALSE}, bin-level data for all chromosomes are exported to one file. 
    If \code{byChr=TRUE}, bin-level data for each chromosome is exported to a separate file.
    Default is \code{FALSE}.
}
  \item{excludeChr}{
    Vector of chromosomes that will be excluded from the analysis.
}  
  \item{fragLen}{
    Average fragment length. Default is 200.
}
  \item{binSize}{
    Size of bins. Default is 200.
}
  \item{capping}{
    Maximum number of reads allowed to start at each nucleotide position. 
    To avoid potential PCR amplification artifacts, the maximum number of reads
    that can start at a nucleotide position is capped at \code{capping}. 
    Capping is not applied if non-positive value is used for \code{capping}.
    Default is 0 (no capping).
}
  \item{perl}{
    Name of the perl executable to be called. Default is \code{"perl"}.
}
}
\details{
Bin-level files are constructred from the aligned read file and 
exported to the directory specified in \code{outfileLoc} argument.
If \code{byChr=FALSE}, bin-level files are named 
as \code{[infileName]_fragL[fragLen]_bin[binSize].txt}.
If \code{byChr=TRUE}, bin-level files are named 
as \code{[infileName]_fragL[fragLen]_bin[binSize]_[chrID].txt},
where \code{chrID} is chromosome IDs that reads align to.
These chromosome IDs are extracted from the aligned read file.
Chromosomes that are specified in \code{excludeChr}
will not be included in the processed bin-level files.
Constructed bin-level files can be loaded into the R environment using the method \code{readBins}.

\code{constructBins} currently supports the following aligned read file formats: 
Eland result (\code{"eland_result"}), Eland extended (\code{"eland_extended"}),
Eland export (\code{"eland_export"}), default Bowtie (\code{"bowtie"}), 
SAM (\code{"sam"}), and BED (\code{"bed"}).
This method assumes that these aligned read files are obtained from single-end tag (SET) experiments
and it retains only reads mapping uniquely to the reference genome. 
}
\value{
Processed bin-level files are exported to the directory specified in \code{outfileLoc}.
}
\references{
Kuan, PF, D Chung, JA Thomson, R Stewart, and S Keles (2011), 
"A Statistical Framework for the Analysis of ChIP-Seq Data", 
\emph{Journal of the American Statistical Association}, Vol. 106, pp. 891-903.
}
\author{ Dongjun Chung, Pei Fen Kuan, Sunduz Keles }
\seealso{
\code{\link{readBins}}, \code{\linkS4class{BinData}}.
}
\examples{
\dontrun{
constructBins( infile="/scratch/eland/STAT1_eland_results.txt", 
    fileFormat="eland_result", outfileLoc="/scratch/eland/", 
    byChr=FALSE, excludeChr="chrM", fragLen=200, binSize=200, capping=0 )
}
}
\keyword{models}
\keyword{methods}