\documentclass[12pt]{article}
\usepackage{fullpage}
\usepackage[pdftex,
            bookmarks,
            bookmarksopen,
            pdfauthor={David Clayton},
            pdftitle={snpMatrix-differences Vignette}]{hyperref}

\title{Differences between snpStats and snpMatrix}
\author{David Clayton}
\date{\today}

\usepackage{Sweave}
\SweaveOpts{echo=TRUE, pdf=TRUE, eps=FALSE}

\begin{document}
\setkeys{Gin}{width=1.0\textwidth}

%\VignetteIndexEntry{snpMatrix-differences}
%\VignettePackage{snpStats}

\maketitle

\section*{The {\tt snpMatrix} and {\tt snpStats} packages}

The package ``{\tt snpMatrix}'' was written to provide data classes and
methods to facilitate the analysis of whole genome association studies in R.
In the data classes it implements, each genotype call is stored as a single
byte and, at this density, data for single chromosomes derived from large
studies and new high-throughput gene chip platforms can be handled in memory
by modern PCs and workstations. The object-oriented programming model
introduced with version 4 of the S-plus package, usually termed
``S4 methods'', was used to implement these classes.

{\tt snpStats} initially arose out of the need to store, and analyse, SNP
genotype data in which subjects cannot be assigned to the three possible
genotypes with certainty. This necessitated a change in the way in which data
are stored internally, although {\tt snpStats} can still handle conventionally
called genotype data stored in the original {\tt snpMatrix} storage mode.
{\tt snpStats} currently lacks some facilities which were present in
{\tt snpMatrix} (although, hopefully, the important gaps will soon be filled)
but it also includes several new facilities. This vignette simply describes
differences for users converting from the old {\tt snpMatrix} package.

\section*{Classes}

Function names have, for the most part, remained unchanged so that existing
analysis scripts will continue to work with minimal modification.
Initially it was also hoped to retain the old class names, since the classes
were (mostly) backwards-compatible. But this proved troublesome and, in
versions 1.1.4 and later, the class names have been changed
(see Table~\ref{tab:classes}).

\begin{table}[h]
  \centering
  \begin{tabular}{ll}
    \hline
    {\tt snpMatrix} class & {\tt snpStats} class\\
    \hline
    {\tt snp.matrix} & {\tt SnpMatrix}\\
    {\tt X.snp.matrix} & {\tt XSnpMatrix}\\
    {\tt single.snp.tests} & {\tt SingleSnpTests}\\
    {\tt single.snp.tests.score} & {\tt SingleSnpTestsScore}\\
    {\tt snp.tests.glm} & {\tt GlmTests}\\
    {\tt snp.tests.glm.score} & {\tt GlmTestsScore}\\
    {\tt snp.estimates.glm} & {\tt GlmEstimates}\\
    {\tt imputation.rules} & {\tt ImputationRules}\\
    \hline
  \end{tabular}
  \caption{Changes in class names}
  \label{tab:classes}
\end{table}

Two functions have been provided to help users convert objects of a
{\tt snpMatrix} class to the corresponding {\tt snpStats} class:
\begin{itemize}
\item {\tt convert.snpMatrix}: Converts a {\tt snpMatrix} object to the
  corresponding {\tt snpStats} class
\item {\tt convert.snpMatrix.dir}: Converts all saved {\tt snpMatrix} objects
  in a given directory
\end{itemize}

\section*{Differences}

A major difference is that the basic class, now {\tt SnpMatrix}, supports
uncertain genotypes, as generated by imputation programs.

Two classes have been removed, namely the {\tt snp} and {\tt X.snp} classes.
These were originally devised to support the loss of a dimension of a
{\tt snp.matrix} or {\tt X.snp.matrix} when a single row or column was
selected with {\tt drop=TRUE} in force in the selection operator {\tt []}.
However, these classes were never fully satisfactory and were seldom used. In
{\tt snpStats} the {\tt drop=} option is no longer allowed during row and
column selection; dimensions are never dropped.
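
As a minimal sketch of this behaviour (not taken from the package
documentation), the following assumes that {\tt snpStats} is installed and
that its example dataset {\tt testdata} provides a {\tt SnpMatrix} object
named {\tt Autosomes}; the dataset and the object name are assumptions made
for illustration only.

\begin{verbatim}
library(snpStats)
data(testdata)             # assumed to load the SnpMatrix object "Autosomes"

# Select a single column without supplying drop=
one.snp <- Autosomes[, 1]

is(one.snp, "SnpMatrix")   # the selection is expected to remain a SnpMatrix
dim(one.snp)               # both dimensions are expected to be retained
\end{verbatim}
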
A word of warning, however: if {\tt drop=} does occur in the selection
operator, the object will be treated as a simple matrix of type {\tt raw};
this is the class that {\tt SnpMatrix} extends, and this class does allow
{\tt drop=}.

There has been a cosmetic, but important, change in the {\tt XSnpMatrix}
class as compared with its forerunner. The {\tt Female} slot has been renamed
as {\tt diploid} to emphasize that this class is not only used for SNPs on
the X chromosome, but for any SNP genotypes which may be haploid; this
includes SNPs on the Y chromosome and mitochondrial SNPs.

The functions for computing pairwise linkage disequilibrium statistics have
been replaced by a single, rewritten function, {\tt ld}. The large band
matrix which this function generates in one usage is stored using the
{\tt dsCMatrix} class defined in the {\tt Matrix} package (which is now
required).

The function {\tt read.pedfile} has been rewritten, this time entirely in R.
It has different arguments from the function of the same name in
{\tt snpMatrix} and may be somewhat slower, but is somewhat more flexible.

The {\tt ImputationRules} class has changed as a result of the introduction
of the new storage convention for uncertain genotypes. In the new coding,
uncertainty of calls is represented by (grouped) posterior probabilities of
assignment to the three genotypes. This change was necessary because one of
the imputation methods in {\tt snpMatrix} only produced a posterior
expectation of the genotype (when coded 0, 1 or 2) and this could not be
accommodated unambiguously in the extended coding.

The {\tt GlmTests} and {\tt GlmTestsScore} classes (formerly
{\tt snp.tests.glm} and {\tt snp.tests.glm.score}) have changed slightly in
order to accommodate ongoing work on methods for multinomial and multivariate
phenotypes.
The {\tt test.names} slot has been renamed as {\tt snp.names} and its
function has been changed slightly (although this should only affect more
complicated uses of {\tt snp.rhs.tests}). A new slot, {\tt var.names}, has
been added; this holds the name of the variable(s) tested against SNPs.

\end{document}