%\documentclass[a4paper,12pt]{article}
\documentclass[12pt]{article}
\usepackage{fullpage}
% \usepackage{times}
%\usepackage{mathptmx}
%\renewcommand{\ttdefault}{cmtt}
\usepackage{graphicx}
\usepackage[pdftex,
            bookmarks,
            bookmarksopen,
            pdfauthor={David Clayton},
            pdftitle={Imputed SNP analyses with chopsticks}]
           {hyperref}
\title{Imputed SNP analyses and meta-analysis with chopsticks}
\author{David Clayton}
\date{\today}
\usepackage{Sweave}
\SweaveOpts{echo=TRUE, pdf=TRUE, eps=FALSE}

\begin{document}
\setkeys{Gin}{width=1.0\textwidth}

%\VignetteIndexEntry{Imputation and meta-analysis}
%\VignettePackage{chopsticks}

\maketitle

% R code as
%<<>>=
%
%@

\section*{Getting started}

The need for imputation in SNP analysis studies arises when we have a
smaller set of samples in which a large number of SNPs have been
typed, and a larger set of samples typed for only a subset of those
SNPs. We use the smaller, complete dataset (which will be termed the
{\em training dataset}) to impute the missing SNPs in the larger,
incomplete dataset (the {\em target dataset}). Examples of such
applications include:
\begin{itemize}
\item use of HapMap data to impute association tests for a large
  number of SNPs, given data from genome-wide studies using, for
  example, a 500K SNP array, and
\item meta-analyses which seek to combine results from two platforms,
  such as the Affymetrix 500K and Illumina 550K platforms.
\end{itemize}
Here we will not use a real example such as the above to explore the
use of {\tt chopsticks} for imputation, but will instead generate a
fictitious example from the data analysed in earlier exercises. This
is particularly artificial in that we have seen that these data
suffer from extreme heterogeneity of population structure.

We start by attaching the required libraries and accessing the data
used in the exercises:
<<>>=
library(chopsticks)
library(hexbin)
data(for.exercise)
@

We shall sample 200 subjects in our fictitious study as the training
data set, select alternate SNPs to be potentially missing or present
in the target dataset, and split the training set into two parts
accordingly:
<
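A minimal sketch of such a split, assuming (as in the earlier
exercises) that {\tt data(for.exercise)} supplies a {\tt snp.matrix}
called {\tt snps.10} with 1000 subjects, and using illustrative
variable names rather than the vignette's own code, might be:
<<eval=FALSE>>=
## Illustrative sketch only: snps.10 and the variable names below are
## assumptions, not the vignette's own code
training <- sample(1000, 200)           # 200 subjects form the training set
odd  <- seq(1, ncol(snps.10), by = 2)   # SNPs typed only in the training data
even <- seq(2, ncol(snps.10), by = 2)   # SNPs typed in both datasets
missing <- snps.10[training, odd]       # training genotypes at SNPs absent from the target
present <- snps.10[training, even]      # training genotypes at SNPs shared with the target
@
Under such a scheme, the remaining 800 subjects, restricted to the
{\tt even} SNPs, would play the role of the target dataset.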