\name{pcKeepCompDetect}
\alias{pcKeepCompDetect}
\title{
  Automatic detection of a fitted \code{pcKeepComp} parameter for the
  \code{filterFFT} function
}
\description{
  This function tries to find the minimum number of components needed in an
  FFT filter to reach, or get as close as possible to, a given correlation
  value. You usually do not need to call this function directly; it is used by
  \code{filterFFT} by default.
}
\usage{
  pcKeepCompDetect(data, pc.min=0.01, pc.max=0.1, max.iter=20, verbose=FALSE,
      cor.target=0.98, cor.tol=1e-3, smpl.num=25, smpl.min.size=2^10,
      smpl.max.size=2^14)
}
\arguments{
  \item{data}{
    Numeric vector to be filtered.
  }
  \item{pc.min, pc.max}{
    Minimum and maximum allowed values for \code{pcKeepComp}, both in the
    range 0:1.
  }
  \item{max.iter}{
    Maximum number of iterations.
  }
  \item{verbose}{
    Print extra (debug) information.
  }
  \item{cor.target}{
    Target correlation between the filtered and the original profiles. A value
    around 0.99 is recommended for Next Generation Sequencing data and around
    0.7 for Tiling Arrays.
  }
  \item{cor.tol}{
    Tolerance allowed between the obtained correlation and the target one.
  }
  \item{smpl.num}{
    If \code{data} is a large vector, some samples from the vector will be
    used instead of the whole dataset. This parameter sets the number of
    samples to pick.
  }
  \item{smpl.min.size, smpl.max.size}{
    Minimum and maximum size of the samples. These are used for the selection
    and sub-selection of ranges with meaningful values (i.e., different from
    0 and NA). Powers of 2 are recommended but not mandatory.
  }
  \item{\dots}{
    Parameters to be passed to \code{autoPcKeepComp}.
  }
}
\details{
  This function predicts a suitable \code{pcKeepComp} value for the
  \code{filterFFT} function. This is the recommended proportion of components
  (a value in the range 0:1) to keep in \code{filterFFT} in order to obtain a
  correlation equal to, or close to, \code{cor.target}.

  The search starts from the two given values \code{pc.min} and \code{pc.max}
  and uses linear interpolation to quickly reach a value for which the
  correlation between the filtered and the original profiles is within
  \code{cor.tol} of \code{cor.target}.

  To allow quick detection without an exhaustive search, this function uses a
  subset of the data, randomly sampling regions with meaningful coverage
  values (i.e., different from 0 or NA) that are larger than
  \code{smpl.min.size}. If it is not possible to obtain \code{smpl.max.size}
  values from a region (for example, because of flanking 0's), at least
  \code{smpl.min.size} values will be used to check the correlation. The mean
  correlation over all sampled regions is used to test the performance of the
  \code{pcKeepComp} parameter.

  If the number of meaningful bases in \code{data} is less than
  \code{smpl.min.size * (smpl.num/2)}, the whole \code{data} vector is used
  instead of sampling.
}
\value{
  Fitted \code{pcKeepComp} value
}
\author{
  Oscar Flores \email{oflores@mmb.pcb.ub.es},
  David Rosell \email{david.rosell@irbbarcelona.org}
}
\keyword{ attribute }
\examples{
  # Load dataset
  data(nucleosome_htseq)
  data = as.vector(coverage.rpm(nucleosome_htseq)[[1]])

  # Get recommended pcKeepComp value
  pckeepcomp = pcKeepCompDetect(data, cor.target=0.99)
  print(pckeepcomp)

  # Call filterFFT
  f1 = filterFFT(data, pcKeepComp=pckeepcomp)

  # This can also be done in a single call
  f2 = filterFFT(data, pcKeepComp="auto", cor.target=0.99)

  # Plot
  plot(data[1:2000], col="black", type="l", lwd=2)
  lines(f1[1:2000], col="red", lwd=2)
  lines(f2[1:2000], col="blue", lwd=2, lty=2)
  legend("bottom", c("original", "two calls", "one call"),
      col=c("black", "red", "blue"), lty=c(1,1,2), horiz=TRUE, bty="n")
}
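\section{Illustrative sketch of the search}{
  The linear-interpolation search described in Details can be sketched in base
  R as shown here. This is a minimal illustration under stated assumptions,
  not the package's implementation: \code{fft_keep} and \code{search_pc} are
  hypothetical helpers introduced only for this example, the filter simply
  zeroes the high-frequency FFT components, the correlation is computed on the
  whole vector (no sampling of regions), and the input is assumed to contain
  no NA values.

```r
# Hypothetical low-pass filter (an assumption, not the filterFFT source):
# keep only the lowest-frequency fraction `pc` of the FFT components.
fft_keep <- function(x, pc) {
  n <- length(x)
  spec <- fft(x)
  keep <- max(1, round(n * pc))
  if (2 * keep <= n) {
    # Zero the high-frequency band, symmetrically so the result stays real
    spec[(keep + 1):(n - keep + 1)] <- 0
  }
  Re(fft(spec, inverse = TRUE)) / n
}

# Hypothetical search: interpolate linearly between two bracketing values of
# pc until the correlation is within cor.tol of cor.target.
search_pc <- function(x, pc.min = 0.01, pc.max = 0.1, cor.target = 0.98,
                      cor.tol = 1e-3, max.iter = 20) {
  score <- function(pc) cor(x, fft_keep(x, pc))
  lo <- pc.min
  hi <- pc.max
  c.lo <- score(lo)
  c.hi <- score(hi)
  # If the target is not bracketed, return the closest allowed bound
  if (c.lo >= cor.target) return(lo)
  if (c.hi <= cor.target) return(hi)
  pc <- hi
  for (i in seq_len(max.iter)) {
    # Linear interpolation between (lo, c.lo) and (hi, c.hi)
    pc <- lo + (cor.target - c.lo) * (hi - lo) / (c.hi - c.lo)
    c.pc <- score(pc)
    if (abs(c.pc - cor.target) <= cor.tol) break
    # Keep the target bracketed for the next iteration
    if (c.pc < cor.target) {
      lo <- pc
      c.lo <- c.pc
    } else {
      hi <- pc
      c.hi <- c.pc
    }
  }
  pc
}

# Usage on synthetic data: a smooth periodic profile plus noise
set.seed(1)
x <- sin(2 * pi * (1:4096) / 150) + rnorm(4096, sd = 0.15)
pc <- search_pc(x, cor.target = 0.98)
print(pc)
print(cor(x, fft_keep(x, pc)))
```

  Because keeping more components can only bring the filtered profile closer
  to the original, the correlation does not decrease as \code{pc} grows, which
  is what makes a bracketing search of this kind reasonable in the sketch.
}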