\documentclass{article} \usepackage{fullpage} \usepackage{hyperref} \marginparwidth 0pt \oddsidemargin 0pt \evensidemargin 0pt \topmargin 0pt \textwidth 16cm \textheight 21cm %\VignetteIndexEntry{QTL Mapping using Diversity Outbred Mice} \usepackage{Sweave} \begin{document} \SweaveOpts{concordance=TRUE} \setkeys{Gin}{width=1.0\textwidth} \title{QTL Mapping using Diversity Outbred Mice} \author{Daniel M. Gatti} \date{08 October 2013} \maketitle \section{Introduction} Quantitative Trait Locus (QTL) mapping in DO mice is performed in several steps. First, we use the founder haplotype contributions to perform linkage mapping. In the mapping model, we adjust for kinship between DO mice using the R package \href{http://cran.r-project.org/web/packages/QTLRel/}{QTLRel}. Then, we perform permutations to determine and empirical signficance threshold. Next, we select chromosomes with QTL peaks above the signficance threshold, examine the founder allele effects and determine support intervals. Finally, we impute the founder SNPs onto the DO genomes to perform association mapping in the QTL intervals. \section{Mapping Models} \subsection{Linkage Mapping} Linkage mapping involves the use of founder haplotype probabilities. We perform point mapping at each marker on the array. We fit an additive model that regresses the phenotype on the eight founder haplotype contributions and incorporates an adjustment for the kinship between samples. \begin{equation} y = X\alpha + H\beta + Zu + \varepsilon \end{equation} where: \begin{itemize} \item{\emph{n} is the number of samples} \item{\emph{y} is an \emph{n} x 1 vector of phenotype values for each sample} \item{\emph{X} is an \emph{n} x \emph{p} matrix of \emph{p} fixed covariates (sex, diet, etc.) } \item{$\alpha$ is a \emph{p} x 1 vector of fixed effects } \item{\emph{H} is an \emph{n} x 8 matrix of founder haplotype contributions (each row sums to 1) } \item{$\beta$ is an 8 x 1 vector of founder haplotype effects } \item{\emph{Z} is an \emph{n} x \emph{n} matrix of error covariances between samples } \item{\emph{u} is an \emph{n} x 1 vector of ???} \item{$\varepsilon$ is an \emph{n} x 1 vector of residual errors} \end{itemize} \subsection{Association Mapping} Between each pair of markers, we assign the genotype state with the highest probability to each DO sample. We then query the \href{ftp://ftp.jax.org/SNPtools/variants/}{Sanger Mouse Genomes SNP file} to obtain all of the founder SNPs in the interval. For each Sanger SNP, we impute the Sanger SNPs onto DO genomes as follows: \begin{equation} a_{j}=\sum_{i=1}^{8}s_{i}h_{ij} \end{equation} where: \begin{itemize} \item{\emph{a} is the allele call (coded as 0, 1 or 2) for sample \emph{j} } \item{\emph{s} is the Sanger founder allele call (coded as 0 or 1) } \item{\emph{h} is the founder haplotype contribution of founder \emph{i} for sample \emph{j} } \end{itemize} \begin{equation} y = X\alpha + A\beta + Zu + \varepsilon \end{equation} where: \begin{itemize} \item{\emph{n} is the number of samples} \item{\emph{y} is an \emph{n} x 1 vector of phenotype values for each sample} \item{\emph{X} is an \emph{n} x \emph{p} matrix of \emph{p} fixed covariates (sex, diet, etc.) } \item{$\alpha$ is a \emph{p} x 1 vector of fixed effects } \item{\emph{A} is an \emph{n} x 3 matrix of imputed allele calls } \item{$\beta$ is an 3 x 1 vector of allele effects } \item{\emph{Z} is an \emph{n} x \emph{n} matrix of error covariances between samples } \item{\emph{u} is an \emph{n} x 1 vector of ???} \item{$\varepsilon$ is an \emph{n} x 1 vector of residual errors} \end{itemize} \section{QTL Mapping} We will use example data from \href{http://www.ncbi.nlm.nih.gov/pubmed/22345611}{Svenson et.al, \emph{Genetics}, 2012}. Breifly, 149 mice (75 F, 74 M) were placed on either a chow (n = 100) or a high fat diet (n = 49). A variety of clinical phenotypes were measured at two time points, roughly 14 weeks apart. In this example, we will map the hemoglobin distribution width (HDW) at the second time point. We will load this data from the \href{http://bioconductor.org}{Bioconductor} data package \texttt{MUGAExampleData}. <>= library(DOQTL) library(MUGAExampleData) data(pheno) data(model.probs) @ QTL mapping requires phenotype and genotype data. Here, we have a \texttt{data.frame} of phenotypes called \texttt{pheno} and a 3D array of founder haplotype contributions (num.samples x 8 founders x num.markers) called \texttt{model.probs}. The sample IDs must be in rownames(pheno) and dimnames(model.probs)[[1]] and they must match each other. We will map the hemoglobin distribution width at time point 2 (HDW2). \vspace{5 mm} First, we need to create a kinship matrix using the founder contributions. <>= K = kinship.probs(model.probs) @ Second, we need to create a matrix of additive covariates to run in the model. In this case, we will use sex, diet and CHOL1. Note that the sample IDs must be in \texttt{rownames(covar)}. <>= covar = data.frame(sex = as.numeric(pheno$Sex == "M"), diet = as.numeric(pheno$Diet == "hf")) rownames(covar) = rownames(pheno) @ Third, we need to get the marker locations on the array. <>= load(url("ftp://ftp.jax.org/MUGA/muga_snps.Rdata")) @ Fourth, we map the phenotype using \texttt{scanone}. <>= qtl = scanone(pheno = pheno, pheno.col = "HDW2", probs = model.probs, K = K, addcovar = covar, snps = muga_snps) @ Fifth, we run permutations to determine significane thresholds. We recommend running at least 1,000 permutations. In this demo, we run 100 permutations to save time. <>= perms = scanone.perm(pheno = pheno, pheno.col = "HDW2", probs = model.probs, addcovar = covar, snps = muga_snps, nperm = 100) thr = quantile(perms, probs = 0.95) @ We then plot the LOD curve for the QTL. <>= plot(qtl, sig.thr = thr, main = "HDW2") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{QTL plot of HDW2. The LOD of the mode in Eqn. 1 is plotted along the mouse genome. The red line is the p $<$ 0.05 significance threshold.} \label{fig:qtlplot} \end{figure} The largest peak appears on Chr 9. The linkage mapping model (Eqn. 1) produces an estimate of the effect of each founder allele at each marker. We can plot these effects (model coefficients) on Chr 9 to see which founders contribute to a high HDW. <>= coefplot(qtl, chr = 9) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Coefficient plot of HDW2 on Chr 9. The top panel shows the 8 estimated founder allele effects along Chr 9. The NOD/ShiLtJ allele contributes to high values and the A/J and PWK/PhJ alleles contribute to low values. The bottom panel shows the LOD score.} \label{fig:qtlplot} \end{figure} Note that the DO mice with alleles from three strains, 129S1/SvImJ, NZO/HlLtJ and WSB/EiJ, have lower changes in cholesterol than the other five strains. Remember these strains because they will appear again below. \vspace{5 mm} We then determine the width of the QTL support interval using \texttt{bayesint}. Note that this function only provides reasonable support intervals if there is a single QTL on the chromosome. <<>>= interval = bayesint(qtl, chr = 9) interval @ The QTL support interval is 4.7 Mb wide. \vspace{5 mm} Finally, we narrow the candidate gene list by imputing the founder SNPs onto the DO genomes. This idea is essentially \href{http://www.ncbi.nlm.nih.gov/pubmed/22847376}{assocation mapping} in an outbred population. <>= ma = assoc.map(pheno = pheno, pheno.col = "HDW2", probs = model.probs, K = K, addcovar = covar, snps = muga_snps, chr = interval[1,2], start = interval[1,3], end = interval[3,3]) tmp = assoc.plot(ma, thr = 4) unique(tmp$sdps) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Association mapping plot of HDW2 in the Chr 9 support interval. The top panel shows the LOD score from assocation mapping (Eqn. 3) in the QTL support interval. The bottom panel shows the genes and non-coding RNAs from the \href{http://informatics.jax.org/}{Mouse Genome Informatics} database.} \label{fig:qtlplot} \end{figure} We can get the genes in the QTL interval using the \texttt{get.mgi.features()} function. <>= mgi = get.mgi.features(chr = interval[1,2], start = interval[1,3], end = interval[3,3], type = "gene", source = "MGI") nrow(mgi) head(mgi) @ There are 169 genes in the QTL support interval. Several SNPs have LOD scores above 4. This is a somewhat arbitrary cutoff and an appropriate threshold will be supplied in future version of DOQTL. In this case, there may be more than one variant that influences the phenotype. \section{SessionInfo} <>= sessionInfo() @ \end{document}