\name{inputDataToADaCGHData} \alias{inputDataToADaCGHData} %- Also NEED an '\alias' for EACH other topic documented here. \title{Convert CGH data to ff data frames} \description{ An input data frame with CGH data is converted to several ff files and data checked for potential errors and location duplications. } \usage{ inputDataToADaCGHData(ffpattern = paste(getwd(), "/", sep = ""), MAList = NULL, cloneinfo = NULL, filename = NULL, sep = "\t", quote = "\"", na.omit = FALSE, minNumPerChrom = 10) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{ffpattern}{ See argument \code{pattern} in \code{\link[ff]{ff}}. The default is to create the "ff" files in the current working directory. } \item{MAList}{The name of an object of class \code{MAList} (\code{\link[limma]{as.MAList}}) or \code{SegList} (e.g., \code{\link[snapCGH]{dim.SegList}}). See vignnettes for these packages for details about these objects.} \item{cloneinfo}{A character vector with the full path to a file that conforms to the characteristis of \code{file} in function \code{\link[snapCGH]{read.clonesinfo}} (see details in the vignette) or the name of a data frame with at least a column named "Chr" (with chromosomal informtaion) and "Position".} \item{filename}{ Name of data RData file that contains the data frame with original, non-ff, data. Note: this is the name of the RData file (possibly including path), NOT the name of the data frame. The first three columns of the data frame are the IDs of the probes, the chromosome number, and the position, and all remaining columns contain the data for the arrays, one column per array. The names of the first three column do not matter, but the order does. Names of the remaining columns will be used if existing; otherwise, fake array names will be created. } \item{sep}{Argument to \code{\link{read.table}} if reading a \code{cloneinfo} file.} \item{quote}{Argument to \code{\link{read.table}} if reading a \code{cloneinfo} file.} \item{na.omit}{Omit NAs? If there are NAs and na.omit is set to FALSE, the function will stop with an error.} \item{minNumPerChrom}{If any chromosome has fewer observations than minNumPerChrom the function will fail. This can help detect upstream pre-processing errors.} } \details{ If there are identical positions (in the same chromosome) a small random uniform variate is added to get unique locations. Commented examples of reading objects from \pkg{limma} and \pkg{snapCGH} are provided in the vignnette. %% ~~ If necessary, more details than the description above ~~ } \value{ This function is used mainly for its side effects: writing several ff files to the current working directory (the actual names are printed out). In addition, and since we need to manipulate the complete set of original data, the return value is a data frame that is could be used later to speed up certain calculations. Right now, however, this is not used for anything, except for information purposes. This table is similar to a dictionary or hash table. This data frame has (number of arrays * number of chromosomes) rows. The columns are \item{Index}{The integer index of the entry, 1:number of arrays * number of chromosomes} \item{ArrayNum}{The array number} \item{Arrayname}{The name of the array} \item{ChromNum}{The chrosome number} \item{ChromName}{The chromosome name. Yes, chromosome must be numeric, but the values of ChromNum form a set of integers starting at one and going up to the total number of different chromosomes. E.g., if you only have two chromosomes, say 3 and 22, ChromNum contains values 1 and 2, whereas ChromName contains values 3 and 22.} \item{posInit}{The first position (in a vector ordered from 1 to total number of probes, with probes ordered by chromosome and position within chromosome) of a probe of this chromosome.} \item{posEnd}{The last position of a probe of this chromosome.} } % \references{ % %% ~put references to the literature/web site here ~ % } \note{ Converting a very large data set into a set of ff files can be memory consuming. Since this function is mainly used for its side effects (leaving the ff files in the disk), it can be run in a separate process that will then be killed. See an example below using \pkg{multicore}. (For the example you must install \pkg{multicore}). } \author{Ramon Diaz-Uriarte \email{rdiaz02@gmail.com}} % \seealso{ % %% ~~objects to See Also as \code{\link{help}}, ~~~ % } \examples{ ## Create a temp dir for storing output. ## (Not needed, but cleaner). dir.create("ADaCGH2_example_input_dir") originalDir <- getwd() setwd("ADaCGH2_example_input_dir") Sys.sleep(1) ## Get location (and full filename) of example data file fname <- list.files(path = system.file("data", package = "ADaCGH2"), full.names = TRUE, pattern = "inputEx1") tableChromArray <- inputDataToADaCGHData(filename = fname) ### Clean up (DO NOT do this with objects you want to keep!!!) load("chromData.RData") load("posData.RData") load("cghData.RData") delete(cghData); rm(cghData) delete(posData); rm(posData) delete(chromData); rm(chromData) unlink("chromData.RData") unlink("posData.RData") unlink("cghData.RData") unlink("probeNames.RData") ### Running in a separate process ### This example only does anything if you have multicore installed. if(require(multicore)) { parallel(inputDataToADaCGHData(filename = fname), silent = FALSE) tableChromArray <- collect()[[1]] if(inherits(tableChromArray, "try-error")) { stop("ERROR in input data conversion") } ### Clean up (DO NOT do this with objects you want to keep!!!) load("chromData.RData") load("posData.RData") load("cghData.RData") delete(cghData); rm(cghData) delete(posData); rm(posData) delete(chromData); rm(chromData) unlink("chromData.RData") unlink("posData.RData") unlink("cghData.RData") unlink("probeNames.RData") } ### Try to prevent problems in R CMD check Sys.sleep(2) ### Delete temp dir setwd(originalDir) Sys.sleep(2) unlink("ADaCGH2_example_input_dir", recursive = TRUE) Sys.sleep(2) } \keyword{ IO }