<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <title> Lab3 (RNW) - bioconductor.org </title> <base href="http://bioconductor.fhcrc.org/workshops/2002/Seattle02/Seattle/inst/doc/Lab3.Rnw" /> <meta name="generator" content="Plone - http://plone.org" /> <meta content="admin" name="DC.creator" /> <meta content="2005-07-01 10:23:17" name="DC.date.created" /> <meta content="2005-07-08 13:47:00" name="DC.date.modified" /> <meta content="Document" name="DC.type" /> <meta content="text/plain" name="DC.format" /> <!-- Column style sheet. --> <style type="text/css" media="screen"><!-- @import url(http://bioconductor.fhcrc.org/ploneColumns.css); --></style> <!-- Main style sheets for CSS2 capable browsers --> <style type="text/css" media="screen"><!-- @import url(http://bioconductor.fhcrc.org/plone.css); --></style> <!-- Style sheet used for printing --> <link rel="stylesheet" type="text/css" media="print" href="http://bioconductor.fhcrc.org/plonePrint.css" /> <!-- Custom style sheet if available --> <style type="text/css" media="all"><!-- @import url(http://bioconductor.fhcrc.org/ploneCustom.css); --></style> </head> <body class="section-workshops"> <div id="visual-portal-wrapper"> <div id="portal-top"> <h1 id="portal-logo"> <a href="http://bioconductor.fhcrc.org">bioconductor.org</a> </h1> <div id="portal-slogan"><p>Bioconductor is an open source and open development software project<br /> for the analysis and comprehension of genomic data.</p></div> </div> <div class="visualClear"></div> <!-- The wrapper div. It contains the three columns. 
--> <div id="portal-columns" class="visualColumnHideNone"> <!-- start of the main and left columns --> <div id="visual-column-wrapper"> <!-- start of main content block --> <div id="portal-column-content" class="topmargin1"> <div id="content" class=""> <div class="documentContent" id="region-content"> <a name="documentContent"></a> <h1 class="documentFirstHeading">Lab3 (RNW)</h1> <div class="documentDescription"></div> <div class="plain"> % <br />% NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is <br />% likely to be overwritten. <br />% <br />% \VignetteIndexEntry{Seattle Lab 3} <br />%\VignetteDepends{Biobase, golubEsets, ellipse, lattice, mva, MASS} <br />%\VignetteKeywords{Microarray} <br />\documentclass[12pt]{article} <br /> <br />\usepackage{amsmath,pstricks} <br />\usepackage[authoryear,round]{natbib} <br />\usepackage{hyperref} <br /> <br /> <br />\textwidth=6.2in <br />\textheight=8.5in <br />%\parskip=.3cm <br />\oddsidemargin=.1in <br />\evensidemargin=.1in <br />\headheight=-.3in <br /> <br />\newcommand{\scscst}{\scriptscriptstyle} <br />\newcommand{\scst}{\scriptstyle} <br /> <br />\bibliographystyle{plainnat} <br /> <br />\begin{document} <br /> <br />\section*{Clustering Using R and Bioconductor} <br /> <br />We continue our extended example involving the data set from \citet{Golub99}. <br />In this lab we will compute <br />a number of different distances and apply <br />several clustering techniques to these data. <br /> <br />We will look at distance calculations, followed by multidimensional scaling, <br />and then examine a number of different clustering methods. <br /> <br /><<loadlibs>>= <br /> library(Seattle) <br /> library(lattice) <br />@ <br /> <br />We will set up the data from scratch once again. 
<br /> <br /><<setup>>= <br />X<-exprs(golubTrain) <br />X[X<100]<-100 <br />X[X>16000]<-16000 <br /> <br />mmfilt <- function(r=5, d=500, na.rm=TRUE) { <br />function(x) { <br /> minval <- min(x, na.rm=na.rm) <br /> maxval <- max(x, na.rm=na.rm) <br /> (maxval/minval > r) && (maxval-minval > d) <br /> } <br />} <br /> <br />mmfun <- mmfilt() <br /> <br />ffun <- filterfun(mmfun) <br />sub <- genefilter(X, ffun ) <br />sum(sub) ## Should get 3051 <br /> <br />X <- X[sub,] <br />X <- log2(X) <br />golubTrainSub<-golubTrain[sub,] <br />golubTrainSub@exprs <- X <br /> <br />Y <- golubTrainSub$ALL.AML <br /> <br />Y<-paste(golubTrain$ALL.AML,golubTrain$T.B.cell) <br />Y<-sub("NA","",Y) <br /> <br />@ <br /> <br />Now that the data are set up, we are ready to start looking at them. <br /> <br />\section*{Distances} <br /> <br />%%FIXME: get the ellipse package <br /> <br />We first compute the correlation distance between all pairs of samples in <br />the training set, using the genes in the selected subset. <br /> <br />%%FIXME: other distances? <br /> <br /><<cordist>>= <br />r<-cor(X) <br />dimnames(r)<-list(as.vector(Y),as.vector(Y)) <br />d<-1-r <br /> <br />@ <br /> <br /> <br />We rely on the \textit{ellipse} package for plotting some of these <br />values. <br /><<plotcorr>>= <br /> <br />plotcorr(r, <br />main="Leukemia data: Correlation matrix for 38 mRNA samples\n All 3,051 genes") <br />plotcorr(r,numbers=TRUE, <br />main="Leukemia data: Correlation matrix for 38 mRNA samples\n All 3,051 genes") <br />levelplot(r,col.regions=heat.colors(50), <br />main="Leukemia data: Correlation matrix for 38 mRNA samples\n All 3,051 genes") <br /> <br />@ <br /> <br />We can next look at multidimensional scaling using these data. <br />Multidimensional scaling is a data reduction method that is appropriate for <br />distance data. It is much like principal components (but it is not the <br />same -- except for Euclidean distances). 
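<br /> <br />As a minimal sketch of the Euclidean special case, the following code compares
<br />classical multidimensional scaling with principal components on synthetic data.
<br />The matrix \texttt{Z} and the names \texttt{mdsPts}, \texttt{pcaPts} and \texttt{maxDiff}
<br />are assumptions made for illustration and are not part of the lab. The functions
<br />\texttt{dist}, \texttt{cmdscale} and \texttt{prcomp} are taken from the \textit{stats}
<br />package of current R, so no extra library call is made here.
<br />

```r
# Synthetic data (an assumption for illustration): 10 observations, 4 variables.
set.seed(1)
Z <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Classical MDS on the Euclidean distances between the rows of Z.
mdsPts <- cmdscale(dist(Z), k = 2)

# Principal component scores for the same rows (prcomp centers the columns).
pcaPts <- prcomp(Z)$x[, 1:2]

# Each axis is only defined up to its sign, so compare absolute values.
maxDiff <- max(abs(abs(mdsPts) - abs(pcaPts)))
print(maxDiff)
```

<br />If the two configurations agree, \texttt{maxDiff} should be zero up to rounding error.
<br />With the correlation distance \texttt{d} used in this lab, no such agreement
<br />should be expected.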
<br /> <br />An interesting question is whether the data in the distance matrix can be <br />reduced to a lower-dimensional space. <br />Note that if we are measuring distances between samples on the basis of <br />$N=3051$ probes, then we are essentially looking at points in 3,051-dimensional <br />space. <br /> <br /><<mds>>= <br />library(mva) <br /> <br />mds<- cmdscale(d, k=2, eig=TRUE) <br />plot(mds$points, type="n", xlab="", ylab="", main="MDS for ALL AML data, correlation matrix, G=3,051 genes, k=2") <br />text(mds$points[,1],mds$points[,2],Y, col=codes(factor(Y))+1, cex=0.8) <br /> <br />mds<- cmdscale(d, k=3, eig=TRUE) <br />pairs(mds$points, main="MDS for ALL AML data, correlation matrix, G=3,051 genes, k=3", pch=c("B","T","M")[codes(factor(Y))], col = codes(factor(Y))+1) <br /> <br /> <br /> <br />@ <br /> <br />To assess how many components are needed, a \textit{scree} plot, similar to that <br />used for principal components, can be created. <br /> <br />This plot of the eigenvalues suggests that much of the information is <br />contained in the first component. One might consider using either three <br />or four components as well. 
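<br /> <br />One hedged way to put numbers on a scree plot is to look at the cumulative share
<br />of the positive eigenvalues. The sketch below uses synthetic data and the
<br />hypothetical names \texttt{Z}, \texttt{D}, \texttt{fit}, \texttt{ev} and \texttt{cumShare},
<br />none of which are part of the lab. It also assumes a recent version of R, in which
<br />\texttt{cmdscale} with \texttt{eig=TRUE} returns all of the eigenvalues rather than
<br />only the first \texttt{k}.
<br />

```r
# Synthetic distances (an assumption for illustration): 12 points in 5 dimensions.
set.seed(2)
Z <- matrix(rnorm(60), nrow = 12, ncol = 5)
D <- dist(Z)

# eig = TRUE returns the eigenvalues alongside the fitted points.
fit <- cmdscale(D, k = 2, eig = TRUE)

# Keep the clearly positive eigenvalues; the rest are zero up to rounding here,
# and non-Euclidean distances can produce negative ones.
ev <- fit$eig[fit$eig > 1e-8]

# Cumulative share of the total accounted for by the first 1, 2, ... components.
cumShare <- cumsum(ev) / sum(ev)
print(round(cumShare, 3))
```

<br />A component count can then be chosen where \texttt{cumShare} levels off, which is
<br />the same judgement a scree plot supports visually.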
<br /> <br /><<scree>>= <br /> <br /> mdsScree <- cmdscale(d, k=8, eig=TRUE) <br /> <br /> plot(mdsScree$eig, pch=18, col="blue") <br /> <br />@ <br /> <br /> <br />\end{document} <br /> </div> <div class="discussion"> </div> </div> </div> </div> <!-- end of main content block --> </div> <!-- end of the main and left columns --> </div> <!-- end column wrapper --> <div class="visualClear"></div> <div id="portal-footer"> © 2003-2009 BioConductor. All Rights Reserved. </div> </div> </body> </html>