\name{snp.rhs.estimates}
\alias{snp.rhs.estimates}
\title{Fit GLMs with SNP genotypes as independent variable(s)}
\description{
  This function fits a generalized linear model with phenotype as
  dependent variable and with a series of SNPs (or small sets of SNPs)
  as predictor variables. Optionally, one or more potential confounders
  of a phenotype-genotype association may be included in the model. In
  order to protect against misspecification of the variance function,
  "robust" estimates of the variance-covariance matrix of estimates may
  be calculated in place of the usual model-based estimates.
}
\usage{
snp.rhs.estimates(formula, family = "binomial", link, weights, subset,
                  data = parent.frame(), snp.data, rules = NULL,
                  sets = NULL, robust = FALSE, uncertain = FALSE,
                  control = glm.test.control())
}
\arguments{
  \item{formula}{The model formula, with phenotype as dependent variable
    and any potential confounders as independent variables. Note that
    parameter estimates are not returned for these model terms}
  \item{family}{A string defining the generalized linear model
    family. This currently should (partially) match one of
    \code{"binomial"}, \code{"Poisson"}, \code{"Gaussian"} or
    \code{"gamma"} (case-insensitive)}
  \item{link}{A string defining the link function for the GLM. This
    currently should (partially) match one of \code{"logit"},
    \code{"log"}, \code{"identity"} or \code{"inverse"}.
    The default action is to use the "canonical" link for the family
    selected}
  \item{data}{The dataframe in which the model formula is to be
    interpreted}
  \item{snp.data}{An object of class \code{"SnpMatrix"} or
    \code{"XSnpMatrix"} containing the SNP data}
  \item{rules}{Optionally, an object of class
    \code{"ImputationRules"}}
  \item{sets}{Either a vector of SNP names (or numbers) for the SNPs to
    be added to the model formula, or a logical vector of length equal
    to the number of columns in \code{snp.data}, or a list of short
    vectors defining sets of SNPs to be included (see \code{Details})}
  \item{weights}{"Prior" weights in the generalized linear model}
  \item{subset}{Array defining the subset of rows of \code{data} to use}
  \item{robust}{If \code{TRUE}, robust (Huber-White) estimates of the
    variance-covariance matrix of the parameter estimates are returned}
  \item{uncertain}{If \code{TRUE}, uncertain genotypes are used and
    scored by their posterior expectations. Otherwise they are treated
    as missing}
  \item{control}{An object giving parameters for the IRLS algorithm
    fitting of the base model and for the acceptable aliasing amongst
    new terms to be tested. See \code{\link{glm.test.control}}}
}
\details{
  Homozygous SNP genotypes are coded 0 or 2 and heterozygous genotypes
  are coded 1. For SNPs on the X chromosome, males are coded as
  homozygous females. For X SNPs, it will often be appropriate to
  include sex of subject in the base model (this is not done
  automatically).

  The "robust" option causes Huber-White estimates of the
  variance-covariance matrix of the parameter estimates to be
  returned. These protect against mis-specification of the variance
  function in the GLM, for example if binary or count data are
  overdispersed.

  If a \code{data} argument is supplied, the \code{snp.data} and
  \code{data} objects are aligned by rowname. Otherwise all variables in
  the model formulae are assumed to be stored in the same order as the
  rows of the \code{snp.data} object.

  Usually SNPs to be fitted in models will be referenced by name.
  However, they can also be referenced by number, indicating the
  appropriate column in the input \code{snp.data}. They can also be
  referenced by a logical selection vector of length equal to the number
  of columns in \code{snp.data}.

  If the \code{rules} argument is supplied, SNPs may be imputed using
  these rules and included in the model.
}
\value{
  An object of class \code{\link[=GlmEstimates-class]{GlmEstimates}}
}
\author{David Clayton \email{david.clayton@cimr.cam.ac.uk}}
\note{
  A factor (or several factors) may be included as arguments to the
  function \code{strata(...)} in the \code{formula}. This fits all
  interactions of the factors so included, but leads to faster
  computation than fitting these in the normal way. Additionally, a
  \code{cluster(...)} call may be included in the base model
  formula. This identifies clusters of potentially correlated
  observations (e.g. for members of the same family); in this case, an
  appropriate robust estimate of the variance of the parameter estimates
  is used.

  If uncertain genotypes (e.g. as a result of imputation) are used, the
  interpretation of the regression coefficients is questionable; the
  regression coefficient for an imperfect measurement of a variable is
  a biased (attenuated) estimate of the coefficient of the variable
  measured.
}
\seealso{\code{\link{GlmEstimates-class}},
  \code{\link{snp.lhs.estimates}},
  \code{\link{snp.rhs.tests}},
  \code{\link{SnpMatrix-class}}, \code{\link{XSnpMatrix-class}}}
\examples{
data(testdata)
test <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                          data = subject.data, snp.data = Autosomes,
                          sets = 1:10)
print(test)
test2 <- snp.rhs.estimates(cc ~ region + sex, family = "binomial",
                           data = subject.data, snp.data = Autosomes,
                           sets = 1:10)
print(test2)
test.robust <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                                 data = subject.data, snp.data = Autosomes,
                                 sets = 1:10, robust = TRUE)
print(test.robust)
}
\keyword{htest}
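\section{Sketch: alternative forms of the sets argument}{
  The argument list and Details describe several ways of selecting SNPs
  through \code{sets}: by name, by column number, by a logical selection
  vector, or as a list of short vectors. The following is a minimal
  sketch of those forms, not a definitive recipe. It assumes the
  \code{testdata} objects \code{Autosomes} and \code{subject.data} used
  in the Examples and that the \code{snpStats} package is attached. The
  choice of the first three SNPs and the names \code{snp.names},
  \code{sel}, \code{by.name}, \code{by.number}, \code{by.logical} and
  \code{by.list} are illustrative only. The list form with numeric
  vectors is an assumption based on the description of \code{sets}.
  \preformatted{
library(snpStats)   # assumption: the package providing SnpMatrix objects
data(testdata)

# By name: the names of the first three SNP columns (illustrative choice)
snp.names <- colnames(Autosomes)[1:3]
by.name <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                             data = subject.data, snp.data = Autosomes,
                             sets = snp.names)

# By number: column positions in snp.data
by.number <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                               data = subject.data, snp.data = Autosomes,
                               sets = 1:3)

# By logical vector: one entry per column of snp.data
sel <- rep(FALSE, ncol(Autosomes))
sel[1:3] <- TRUE
by.logical <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                                data = subject.data, snp.data = Autosomes,
                                sets = sel)

# As a list: each element defines one small set of SNPs
# (assumed here to accept column numbers as well as names)
by.list <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
                             data = subject.data, snp.data = Autosomes,
                             sets = list(1:2, 3:4))
print(by.list)
  }
  The first three calls are intended to select the same three SNPs, so
  comparing their printed results is a simple way to check that a
  selection vector picks out the columns intended.
}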