\name{yeastAnn} \alias{yeastAnn} \alias{getProbe2SGD} \alias{procYeastGeno} \alias{formatGO} \alias{formatChrLoc} \alias{getGEOYeast} \alias{getYGExons} \title{Functions to annotate yeast genom data} \description{ Given a GEO accession number for a yease data set and the extensions for annotation data files names that are available from Yeast Genom web site, the functions generates a data package with containing annoatation data for yeast genes in the GEO data set. } \usage{ yeastAnn(base = "", yGenoUrl, yGenoNames = c("literature_curation/gene_literature.tab", "chromosomal_feature/SGD_features.tab", "literature_curation/gene_association.sgd.gz"), toKeep = list(c(6, 1), c(1, 5, 9, 10, 12, 16, 6), c(2, 5, 7)), colNames = list(c("sgdid", "pmid"), c("sgdid", "genename", "chr", "chrloc", "chrori", "description", "alias"), c("sgdid", "go")), seps = c("\t", "\t", "\t"), by = "sgdid") getProbe2SGD(probe2ORF = "", yGenoUrl, fileName = "literature_curation/orf_geneontology.tab", toKeep = c(1, 7), colNames = c("orf", "sgdid"), sep = "\t", by = "orf") procYeastGeno(baseURL, fileName, toKeep, colNames, seps = "\t") getGEOYeast(GEOAccNum, GEOUrl, geoCols = c(1, 8), yGenoUrl) formatGO(gos, evis) formatChrLoc(chr, chrloc, chrori) getYGExons(srcUrl, yGenoName = "chromosomal_feature/intron_exon.tab", sep = "\t") } \arguments{ \item{base}{\code{base} a file name for a matrix with two columns. The first column is probe ids and the second one are the mappings to SGD ids used by all the Yeast Genome data files. If \code{base} = "", the whole genome will be mapped based on a data file that contains mappings between all the ORFs and SGD ids} \item{GEOAccNum}{\code{GEOAccNum} a character string for the accession number given by GEO for a yeast data set} \item{GEOUrl}{\code{GEOUrl} a character string for the url that contains a common CGI for all the GEO data. Currently it is \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?}} \item{geoCols}{\code{geoCols} a vector of integers for the coloumn numbers of the source file from GEO that maps yeast probe ids to ORF ids} \item{yGenoUrl}{\code{yGenoUrl} a character string for the url that is a directory in Yeast Genom web site that contains directories for yeast annotation data. Currently it is \url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/}} \item{baseURL}{see yGenoUrl} \item{yGenoNames}{\code{yGenoNames} a vector of character strings for the names of yeast annotation data. Each of the strings can be appended to yGenoUrl to make a complete url for a data file} \item{fileName}{a character string for the extension part of the source data file that can be used to target genes to SGD ids} \item{toKeep}{\code{toKeep} a list of vector of integers with numbers corresponding to column numbers of yeast genom data files that will kept when data files are processed. The length of toKeep must be the same as yGenoName (a vector for each file)} \item{colNames}{\code{colNames} a list of vectors of character strings for the names to be given to the columns to keep when processing the data. Again, the length of colNames must be the same as yGenoNames} \item{seps}{\code{seps} a vector of characters for the separators used by the data files included in yGenoNames} \item{sep}{singular version of seps} \item{by}{\code{by} a character string for the column that is common in all data files to be processed. The column will be used to merge separate data files} \item{probe2ORF}{\code{probe2ORF} a matrix with mappings of yease target genes to ORF ids that in turn can be mapped to SGD ids} \item{gos}{\code{gos} a vector of character strings for GO ids retrieved from Yeast Genome Project} \item{evis}{\code{evis} a vector of character string for the evidence code associated with go ids} \item{chr}{\code{chr} a vector of character strings for chromosome numbers} \item{chrloc}{\code{chrloc} a vector of integers for chromosomal locations} \item{chrori}{\code{chrori} a vector of characters that can either be w or c that are used for strand of yeast chromosomes} \item{srcUrl}{\code{srcUrl} a character string for the url where source yeast genome data are stroed} \item{yGenoName}{\code{yGenoName} a character string for the yeast genome file name to be processed} } \details{ To merge files, the system has to map the target genes in the base file to SGD ids and then use SGD ids to map traget genes to annotation data from different sources. \code{\link{formatGO}} adds leading 0s to goids when needed and then append the evidence code to the end of a goid following a "@". \code{\link{formatChrLoc}} assigns a + or - sing to \code{chrloc} depending on whether the corresponding \code{chrori} is w or c and then append \code{chr} to the end of \code{chrloc} following a "@". \code{\link{getGEOYeast}} gets yeast data from GEO for the columns specified. } \value{ \code{\link{yeastAnn}} returns a matrix with traget genes annotated by data from selected data columns in different data sources. \code{\link{getProbe2SGD}} returns a matrix with mappings between target genes and SGD ids. \code{\link{procYeastGeno}} returns a data matrix. \code{\link{formatGO}} returns a vector of character strings. \code{\link{formatChrLoc}} returns a vector of character strings. \code{\link{getGEOYeast}} returns a matrix with the number of columns specified. } \references{\url{ftp://genome-ftp.stanford.edu}} \author{Jianhua Zhang} \seealso{} \examples{ \dontrun{ yeastData <- yeastAnn(GEOAccNum = "GPL90") } } \keyword{manip}