% % NOTE -- ONLY EDIT Biobase.Rnw!!! % Biobase.tex file will get overwritten. % %\VignetteIndexEntry{RSNPper Primer} %\VignetteDepends{XML} %\VignetteKeywords{SNP} %\VignettePackage{RSNPper} % % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % \documentclass[12pt]{article} %%\usepackage{amsmath} \usepackage[authoryear,round]{natbib} %%\usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \bibliographystyle{plainnat} %\setkeys{Gin}{width=0.55\textwidth} \begin{document} \title{{\it RSNPper}: utilities for SNP data} \author{VJ Carey {\tt stvjc@channing.harvard.edu}} \maketitle \tableofcontents \section{Introduction} This document describes {\tt RSNPper} version 1.0, added to Bioconductor in October of 2003. This first version focuses on SNP metadata, with functions that retrieve SNP-related data from the Boston Children's Hospital Informatics Program SNPper web service \cite{Riva:2002}. Earlier non-released versions of this package included considerable code for working with prettybase format and for conducting other tasks in SNP discovery projects. That material has been moved to {\tt inst/OLD} and may be re-introduced later. Users seeking legacy support should contact the author. \section{How it works} <>= library(RSNPper) @ The core of this package is the XML-RPC service at CHIP accessible through the following URL stub: <>= print(.SNPperBaseURL) @ The {\tt useSNPper} function allows you to work directly with the XML-RPC server by packing up appropriate command and argument strings. <>= dput(useSNPper) print(useSNPper("geneinfo","&name=CRP")[1:7]) @ The main functions of {\it RSNPper} attend to simplifying specification of parameters and parsing and packaging the XML results. {\bf Note on auditability.} All functions return textual information coupled with auditing information as a 'toolInfo' attribute, detailing the SNPper supplied information on the human genome sequence build, the dbSNP version, and the SNPper version from which the results are obtained. At present, there is one exception: when {\tt itemsInRange} is invoked with {\tt item='countsnps}, no toolInfo data is obtained. This will be corrected once the {\tt countsnps} command at SNPper returns valid XML element tags. \section{Overview of the functions} The current set of functions intended for investigative use is: \begin{itemize} \item {\tt geneInfo} -- general information about location and nomenclature \item {\tt geneLayout} -- information about exon locations \item {\tt geneSNPs} -- all SNPs associated with a given gene \item {\tt SNPinfo} -- detailed information on a SNP \item {\tt itemsInRange} -- supports chromosome scanning for genes, SNPs, or counts of SNPs \end{itemize} An omission: for SNP information, I have not collected information on submitter. \section{Demonstrations} \subsection{Obtaining information on genes} The {\tt geneInfo} function will collect some basic information on a gene. The gene may be specified by HUGO name, mRNA accession number, or SNPper id. <>= print(geneInfo("CRP")) @ The {\tt geneLayout} function provides information on exon locations. <>= print(geneLayout("546")) @ Information on all the genes catalogued in a certain chromosomal region can be obtained using {\tt itemsInRange}. <>= print(itemsInRange("genes", "chr1", "156400000", "156500000")) @ \subsection{Obtaining information on SNPs} Suppose you want information on the SNP with dbSNP id rs25. <>= print(SNPinfo("25")) @ Suppose instead you want information on all the SNPs cataloged in a certain chromosomal region. <>= ird <- itemsInRange("snps", "chr1", "156400000", "156500000") print(length(ird)) print(ird[1:3]) @ Note that the start and end locations are supplied as strings. This is to avoid coercion to textual scientific notation. Additional detail on the count of SNPs can be obtained more briefly: <>= print(itemsInRange("countsnps", "chr1", "156400000", "156500000")) @ To see all the SNPs associated with a given gene, use the {\tt geneSNPs} function. This requires knowledge of the SNPper gene id, which can be obtained using {\tt geneInfo}. <>= gs <- geneSNPs("546") print(length(gs)) print(gs[1:3]) @ \section{Application: SNP density on chr 1} Human chromosome 1 is approximately 300Mb, and 142,629 SNPs have been recorded as of dbSNP build 106, according to NCBI SNP/maplists/maplist-newmap.html on 13 Sep 03. Let's see if these facilities can recover this sort of information. Counting the number of SNPs on a long chromosomal region seems to take a long time for SNPper, so we will break up the task. <>= print(itemsInRange("countsnps", "chr1", "1", "100000")) system("sleep 2") print(itemsInRange("countsnps", "chr1", "100001", "200000")) system("sleep 2") print(itemsInRange("countsnps", "chr1", "200001", "300000")) system("sleep 2") @ These runs complete in a reasonable amount of time. Here we will just look at the first 2Mb in intervals of .1Mb. <>= starts <- as.character(as.integer(seq(1,2000001,100000))) ends <- as.character(as.integer(as.integer(starts)+99999)) out <- matrix(NA,nr=20,nc=3) for (i in 1:20) { cat(i) out[i,] <- itemsInRange("countsnps", "chr1", starts[i], ends[i]) system("sleep 2") } print(out) @ \end{document}