%\VignetteIndexEntry{ASSET Vignette} %\VignettePackage{ASSET} %\VigetteDepends{ASSET} \documentclass[a4paper]{article} \begin{document} \title{ASSET(Association analysis for SubSETs) Package} \maketitle \section*{Introduction} ASSET is a suite of statistical tools specifically designed to be powerful for pooling association signals across multiple studies when true effects may exist only in a subset of the studies and could be in opposite directions across studies. The method explores all possible subsets (or a restricted set if user specifies so) of studies and evaluates fixed-effect meta-analysis-type test-statistics for each subset. The final test-statistic is obtained by maximizing the subset-specific test-statistics over all possible subsets and then evaluating its significance after efficient adjustment for multiple-testing, taking into account the correlation between test-statistics across different subsets due to overlapping subjects. The method not only returns a p-value for significance for the overall evidence of association of a SNP across studies, but also outputs the "best subset" containing the studies that contributed to the overall association signal. For detection of association signals with effects in opposite directions, ASSET allows subset search separately for positively- and negatively- associated studies and then combines association signals from two directions using a chi-square test-statistic. The method can take into account correlation due to overlapping subjects across studies (e.g. shared controls). Although the method is originally developed for conducting genetic association scans, it can also be applied for analysis of non-genetic risk factors as well. The ASSET package consists of two main functions: (1) h.traits and (2) h.types. The function h.traits is suitable for conducting meta-analysis of possibly different traits when summary level data are available from individual studies. The function allows for correlation among different studies/traits, which, for example, may arise due to shared subjects across studies. This function can also be used to conduct "meta-analysis" across multiple correlated traits on the same individuals by appropriately specifying the correlation matrix for the multivariate trait. Input arguments to this function are vectors/matrices of the estimated log-odds ratios, standard errors and number of cases and controls for each SNP and study. The function h.types is suitable for analysis of case-control studies when cases consist of distinct disease subtypes. This function assumes individual level data are available. The main input argument for h.types is a data frame containing the SNP variables, response variable and covariates for all subjects. <>= library(ASSET) @ \section*{Examples of h.traits} Get the path to the data. <>= datafile <- system.file("sampleData", "vdata.rda", package="ASSET") @ Load the data frames. There are 4 data frames, data1 - data4 for the 4 independent studies. Each study has the SNPs SNP1-SNP3 genotyped, and information on each subject's age and case-control status. Each SNP is coded as the number of copies of the minor allele or NA for missing genotypes. <>= load(datafile) data1[1:5, ] SNPs <- paste("SNP", 1:3, sep="") nSNP <- length(SNPs) studies <- paste("STUDY", 1:4, sep="") nStudy <- length(studies) @ Let us determine the number of non-missing cases and controls for each SNP and study. <>= case <- matrix(data=NA, nrow=nSNP, ncol=nStudy) control <- matrix(data=NA, nrow=nSNP, ncol=nStudy) for (i in 1:nStudy) { data <- eval(parse(text=paste("data", i, sep=""))) caseVec <- data[, "CC"] == 1 controlVec <- !caseVec for (j in 1:nSNP) { temp <- !is.na(data[, SNPs[j]]) case[j, i] <- sum(caseVec & temp, na.rm=TRUE) control[j, i] <- sum(controlVec & temp, na.rm=TRUE) } } case control @ Run a logistic regression for each SNP and study <>= beta <- matrix(data=NA, nrow=nSNP, ncol=nStudy) sigma <- matrix(data=NA, nrow=nSNP, ncol=nStudy) for (i in 1:nStudy) { data <- eval(parse(text=paste("data", i, sep=""))) for (j in 1:nSNP) { data[, "SNP"] <- data[, SNPs[j]] fit <- glm(CC ~ AGE + SNP, data=data, family=binomial()) coef <- summary(fit)$coefficients beta[j, i] <- coef["SNP", 1] sigma[j, i] <- coef["SNP", 2] } } beta sigma @ Call the h.traits function. Since the studies are independent, we do not need to specify the cor option. <>= res <- h.traits(SNPs, studies, beta, sigma, ncase=case, ncntl=control, meta=TRUE) @ Compute a summary table. Notice that in the Subset.2sided results, the first 2 SNPs have missing values for OR.2, CI.low.2, and CI.high.2 since the estimated betas were all positive for these SNPs. <

>= h.summary(res) @ Intead of searching over all possible subsets, let us define our own subset function to determine which nsubsets to search over. We will only consider subsets where the first m traits are in the subset (m = 1, 2, ...). The DLM p-value will also be computed using only these subsets. <>= sub.def <- function(logicalVec, args) { sum <- sum(logicalVec) ret <- all(logicalVec[1:sum]) ret } @ Call the h.traits function with the zmax.args pval.args options defined <>= res <- h.traits(SNPs, studies, beta, sigma, ncase=case, ncntl=control, meta=TRUE, zmax.args=list(sub.def=sub.def), pval.args=list(sub.def=sub.def)) @ <

>= h.summary(res) @ \section*{Examples of h.types} The disease subtype variable in each study data frame is called TYPE, which has values "TYPE\_1", "TYPE\_2", "TYPE\_3" and "CONTROL". First, we will combine the individual study data into a single data frame <>= data <- NULL for (i in 1:nStudy) { temp <- eval(parse(text=paste("data", i, sep=""))) temp[, "STUDY"] <- i data <- rbind(data, temp) } @ In addition to age, we will also adjust for study. Create the study indicator variables: <>= for (i in 1:nStudy) { dvar <- paste("STUDY_", i, sep="") data[, dvar] <- as.numeric(data[, "STUDY"] %in% i) } @ Define the SNP variables, adjustment variables and disease subtype labels. <>= snp.vars <- paste("SNP", 1:3, sep="") adj.vars <- c("AGE", "STUDY_1", "STUDY_2", "STUDY_3") types.lab <- paste("TYPE_", 1:3, sep="") @ Call the h.types function with option logit=TRUE to also run the overall logistic regression for each SNP. <>= ret <- h.types(dat=data, response.var="TYPE", snp.vars=snp.vars, adj.vars=adj.vars, types.lab=types.lab, cntl.lab="CONTROL", logit=TRUE) @ Summarize the results <>= h.summary(ret) @ \section*{Session Information} <>= sessionInfo() @ \end{document}