% \VignetteIndexEntry{CRImage Manual} % \VignetteDepends{EBImage, MASS, e1071, foreach} % \VignetteKeyword{CRImage} % \VignettePackage{CRImage} \documentclass[10pt]{article} \topmargin 0.0cm \oddsidemargin 0.5cm \evensidemargin 0.5cm \textwidth 16cm \textheight 21cm \usepackage[labelfont=bf,labelsep=period,justification=raggedright]{caption} \newcommand{\Rfunction}[1]{{\mbox{\normalfont\texttt{#1}}}} \begin{document} \begin{flushleft} \begin{center} {\LARGE \bf CRImage \\ a package for classifying cells and calculating tumour cellularity\\} \bigskip Henrik Failmezger$^{\ast}$, Yinyin Yuan, Oscar Rueda, Florian Markowetz \\ \bigskip $\ast$ E-mail: failmezger@mpipz.mpg.de \end{center} \end{flushleft} \tableofcontents \section{Load the package} The package is loaded by the following command: <<>>= library(CRImage) @ See the functions of the package EBImage to read, write and manipulate an image. \section{Image processing} \subsection{Color space conversion} Images can be converted to the HSV and LAB color space. <>= f = system.file("extdata", "exImg2.jpg", package="CRImage") img=readImage(f) #convert image to the HSV color space imgHSV=convertRGBToHSV(img) #convert image to the LAB color space imgLAB=convertRGBToLAB(img) #convert back to the RGB color space imgRGB=convertHSVToRGB(imgHSV) imgRGB=Image(imgRGB) colorMode(imgRGB)='color' #display(imgRGB) #convert back to the RGB color space imgRGB=convertLABToRGB(imgLAB) imgRGB=Image(imgRGB) colorMode(imgRGB)='color' #display(imgRGB) @ \subsection{Color correction} In order to correct for staining deviations the color space of one image can be adapted to the color space of a target image. The target image is converted to the LAB color space and the mean and standard deviation of every channel is calculated. Afterwards the image is standardized to the mean and standard deviation of the target image. \begin{figure} \centerline{\includegraphics[width=0.3\textwidth]{exImg2} \includegraphics[width=0.3\textwidth]{exImg3} \includegraphics[width=0.3\textwidth]{imgCorrected}} \caption{Color correction. The color values of the image in the middle are adapted to the color values of the left image.} \label{fig2} \end{figure} \section{Image thresholding} The function \Rfunction{createBinaryImage} can be used to create a binary image of a grayscale image. Either Otsu thresholding or Phansalkar thresholding can be used for this task. Otsu thresholding trys to separate the grayscale histogram in two parts. Phansalkar thresholding calculates the threshold based on the mean and standard deviation of the values in a certain window. The threshold is calculated on a RGB grayscale image and an Euclidean distance image of the LAB colour space, the results are ORed afterwards. This function calculates Otsu threshold of a grayscale image. <>= f = system.file("extdata", "exImg2.jpg", package="CRImage") img=readImage(f) #convert to grayscale imgG=EBImage::channel(img,"gray") #create a mask for white pixel whitePixelMask=img[,,1]>0.85 & img[,,2]>0.85 & img[,,3]>0.85 #create binary image imgB=createBinaryImage(imgG,img,method="otsu",numWindows=4,whitePixelMask=whitePixelMask) @ The image is read with \Rfunction{readImage}. \Rfunction{whitePixelMask} is TRUE for white pixels these pixels are considered as background and excluded from thresholding. The image is converted to grayscale by the function \Rfunction{channel(exImgB,"gray")} of EBImage. The function \Rfunction{createBinaryImage} creates a binary image using the thresholding method "otsu". \begin{figure} \centerline{\includegraphics[width=0.5\textwidth]{imgB}} \caption{Thresholded image. Cell nuclei are coloured white, background is coloured black.} \label{fig1} \end{figure} \section{Segment an image} An image can be segmented to find cells in the image, using the function \Rfunction{segmentImage}. <>= f = system.file("extdata", "exImg2.jpg", package="CRImage") segmentationValues=segmentImage(filename=f,maxShape=800,minShape=40,failureRegion=2000,threshold="otsu",numWindows=4) @ The image is converted to grayscale and thresholded. Morphological opening is used to delete clutter and to smooth the shapes of the cells. The watershed algorithm is used to separate clustered cells. The parameter \Rfunction{maxShape} defines the maximal shape of cell nuclei. Segmented nuclei which exceed this value will be thresholded and segmented again. The parameter \Rfunction{minShape} defines the minimum size if cell nuclei. Cell nuclei, which fall below this value will be deleted. The parameter \Rfunction{failureRegion} defines, when artifacts in the image should be deleted. Dark regions which exceed this value will be deleted. The list element \Rfunction{segmentationValues[[1]]} holds the original image, \Rfunction{segmentationValues[[2]]} holds the segmented image and \Rfunction{segmentationValues[[3]]} holds features, which were calculated for the segmented objects. The segmented objects can be drawn by the function \Rfunction{display} (see EBImage). \begin{figure} \centerline{\includegraphics[width=0.5\textwidth]{segmentedImageRaw} \includegraphics[width=0.5\textwidth]{segmentedImage}} \caption{SegmentedImage. Cell nuclei are labeled in green.} \label{fig2} \end{figure} \section{Creating a training set} To classify cells in images first an appropriate training set has to be created. \begin{verbatim} f = system.file("extdata", "exImg.jpg", package="CRImage") trainingValues=createTrainingSet(filename=f,maxShape=800,minShape=40,failureRegion=2000) \end{verbatim} The list trainingValues returns two values. The first value is an image in which every segmented cell is numbered. The second value is a table with features for every cell: \begin{verbatim} index class g.x g.y g.s g.p g.pdm g.pdsd g.effr... 1 148.2203 4.855932 118 36 5.721049 1.0130550 6.128668... 2 160.2719 4.763158 114 38 5.659780 1.0331399 6.023896... 3 183.7975 3.101266 79 35 4.593585 0.9575905 5.014627... 4 196.3500 4.242857 140 43 6.424591 1.4433233 6.675581... 5 271.5694 2.680556 72 29 4.504704 1.2490794 4.787307... 6 338.5221 6.530973 113 35 5.601531 1.0849651 5.997418... 7 456.0179 2.946429 112 39 5.726556 1.8101368 5.970821... 8 556.3778 9.018519 270 81 9.196894 2.4687870 9.270581... 9 575.1777 10.935950 484 101 11.959995 2.1378649 12.412171... 10 592.7391 1.724638 69 41 5.145094 2.2782539 4.686511... \end{verbatim} To create the training set, class values for the cells have to be inserted in the column "class": \begin{verbatim} index class g.x g.y g.s g.p g.pdm g.pdsd g.effr 1 normal 148.2203 4.855932 118 36 5.721049 1.0130550 6.128668 2 malignant 160.2719 4.763158 114 38 5.659780 1.0331399 6.023896 \end{verbatim} The values in the column "index" match the numbers for the cells in the image. You can save the table as tab seperated by: \begin{verbatim} write.table(trainingData,file="path",sep="\t",rownames=F) \end{verbatim} and open it for example in a spreadsheet program, however be careful not to shift the columns. You do not have to assign a class value to every cell, but do ensure that there are enough examples for every class. Class values can be numbers (e.g. 1,2,3) or strings (e.g. "normal", "malignant"). You can choose at most 10 classes. If you want to use the kernel smoothing approach for classification you can only choose two classes or you have to specifiy a cancer class, when classifying cells. \begin{figure} \centerline{\includegraphics[width=0.8\textwidth]{labeledImage}} \caption{The image with labeled cell nuclei. The number of the cell nuclei correspond to the index in the feature table.} \label{fig2.5} \end{figure} \section{Creating the classifier} The command: <>= f = system.file("extdata", "trainingData.txt", package="CRImage") #read training data trainingData=read.table(f,header=TRUE) #create classifier classifierValues=createClassifier(trainingData) classifier=classifierValues[[1]] #classifiedCells=classifierValues[[2]] #display(classifiedCells) @ creates the classifier. The classifier is a Support Vector Machine provided by the package e1017. \section{Classification of cells} After having created the classifier, cells in another image can be classified (Figure \ref{fig2}). <>= #classify cells f = system.file("extdata", "exImg2.jpg", package="CRImage") classValues=classifyCells(classifier,filename=f,KS=TRUE,maxShape=800,minShape=40,failureRegion=2000) @ \section{Calculation of cellularity} If tumour images are processed, the cellularity of the tumour can be calculated. The image is first segmented and the cell types are classified. Afterwards, the cellularity of the tumour is determined. The function needs to know, which class value should be the tumour class. <>= t = system.file("extdata", "trainingData.txt", package="CRImage") #read training data trainingData=read.table(t,header=TRUE) #create classifier classifier=createClassifier(trainingData)[[1]] #calculation of cellularity f = system.file("extdata", "exImg2.jpg", package="CRImage") cellularity=calculateCellularity(classifier=classifier,filename=f,KS=TRUE,maxShape=800,minShape=40,failureRegion=2000,classifyStructures=FALSE,cancerIdentifier="1",numDensityWindows=2,colors=c("green","red")) @ \begin{figure}[tpb] \centerline{\includegraphics[width=0.5\textwidth]{classifiedImage} \includegraphics[width=0.5\textwidth]{cellularity2}} \caption{Classified Image (left) and heatmap of cellularity values (right). Malignant cells are colored green, whereas other cells are coloured red. The heatmap of tumour cellularity indicates regions of large tumour cellularity (strong green values) and low tumour cellularity (white regions). } \label{fig3} \end{figure} \section{Classification of Aperio Image Slides} Large pathological images are often difficult to process due to their often large file size. Images of ScanScope TX scanner can be saved in the CWS file format. In this format the images are separated in smaller subimages. These images can be process with the function \Rfunction{processAperio}. <>= #create the classifier t = system.file("extdata", "trainingData.txt", package="CRImage") trainingData=read.table(t,header=TRUE) classifier=createClassifier(trainingData)[[1]] @ <>= dir.create("AperiOutput") f = system.file("extdata", package="CRImage") processAperio(classifier=classifier,inputFolder=f,outputFolder="AperiOutput",identifier="Da",numSections=2,cancerIdentifier="c",maxShape=800,minShape=40,failureRegion=2000) @ The function gets a classifier, which was for instance created with \Rfunction{createClassifier}. An input folder has to be specified, which includes the subimages. The common identifier of the subimages has tp specified (either "Da" or "Ss", depending on the adjustments of the Aperio software). If the image includes different sections for which cellularity has to be calculated separately, this can be specified with \Rfunction{numSections}, the function will afterwards do an automatical clustering of the subimages to their corresponding sections. The function needs to know, which class of the classifier is the tumour class, which is specified with \Rfunction{cancerIdentifier}. The function creates three folders in the output folder. The folder classifiedImage includes the classified subimages. The folder \Rfunction{Files} includes files with cellularity values for every section. The folder \Rfunction{tumourDensity} includes cancer heatmaps for every subimage. \section{Copy Number Correction:} Segmentation of copy number and correction of log-ratios (LRR) and beta allele frequency (BAF) values for cellularity. <>= LRR <- c(rnorm(100, 0, 1), rnorm(10, -2, 1), rnorm(20, 3, 1), rnorm(100,0, 1)) BAF <- c(rnorm(100, 0.5, 0.1), rnorm(5, 0.2, 0.01), rnorm(5, 0.8, 0.01), rnorm(10, 0.25, 0.1), rnorm(10, 0.75, 0.1), rnorm(100,0.5, 0.1)) Pos <- sample(x=1:500, size=230, replace=TRUE) Pos <- cumsum(Pos) Chrom <- rep(1, length(LRR)) z <- data.frame(Name=1:length(LRR), Chrom=Chrom, Pos=Pos, LRR=LRR, BAF=BAF) res <- correctCopyNumber(arr="Sample1", chr=1, p=0.75, z=z) @ The results of \Rfunction{correctCopyNumber} can be plotted using \Rfunction{plotCorrectedCN}: <>= plotCorrectedCN(res, chr=1) @ \end{document}