% \VignetteIndexEntry{CRImage Manual}
% \VignetteDepends{EBImage, MASS, e1071, foreach}
% \VignetteKeyword{CRImage}
% \VignettePackage{CRImage}
\documentclass[10pt]{article}
\topmargin 0.0cm
\oddsidemargin 0.5cm
\evensidemargin 0.5cm
\textwidth 16cm
\textheight 21cm
\usepackage[labelfont=bf,labelsep=period,justification=raggedright]{caption}
\newcommand{\Rfunction}[1]{{\mbox{\normalfont\texttt{#1}}}}
\begin{document}
\begin{flushleft}
\begin{center}
{\LARGE \bf CRImage \\ a package for classifying cells and calculating tumour cellularity\\}
\bigskip
Henrik Failmezger$^{\ast}$, Yinyin Yuan, Florian Markowetz \\
\bigskip
$\ast$ E-mail: failmezger@cip.ifi.lmu.de
\end{center}
\end{flushleft}
\tableofcontents

\section{Load the package}
The package is loaded by the following command:
<<>>=
library(CRImage)
@
See the functions of the package EBImage to read, write and manipulate an image.

\section{Threshold an image}
To create a binary image from a coloured image, the function \Rfunction{calculateThreshold} can be used. This function calculates the Otsu threshold of a grayscale image.
<<>>=
f = system.file("extdata", "exImg2.jpg", package="CRImage")
exImgB=readImage(f)
#find white pixels and exclude them from thresholding (if white is background)
indexWhitePixel=which(exImgB[,,1]>0.85 & exImgB[,,2]>0.85 & exImgB[,,3]>0.85)
#convert to grayscale
exImgB=channel(exImgB,"gray")
#calculate threshold
t=calculateThreshold(as.vector(exImgB[-indexWhitePixel]))
#create binary image
exImgB[which(exImgB>=t)]=1
exImgB[which(exImgB<t)]=0
@

An image can be segmented with the function \Rfunction{segmentImage}:
<<>>=
f = system.file("extdata", "exImg2.jpg", package="CRImage")
segmentationValues=segmentImage(f,maxShape=800,minShape=40,failureRegion=2000)
@
The image is converted to grayscale and thresholded. Morphological opening is used to delete clutter and to smooth the shapes of the cells. The watershed algorithm is used to separate clustered cells.
The parameter \Rfunction{maxShape} defines the maximum size of cell nuclei.
Segmented nuclei which exceed this value will be thresholded and segmented again.
The parameter \Rfunction{minShape} defines the minimum size of cell nuclei. Cell nuclei which fall below this value will be deleted.
The parameter \Rfunction{failureRegion} defines when artifacts in the image should be deleted: dark regions which exceed this value will be deleted.
The list element \Rfunction{segmentationValues[[1]]} holds the original image, \Rfunction{segmentationValues[[2]]} holds the segmented image and \Rfunction{segmentationValues[[3]]} holds the features that were calculated for the segmented objects. The segmented objects can be displayed with the function \Rfunction{display} (see EBImage).
\begin{figure}
\centerline{\includegraphics[width=0.5\textwidth]{segmentedImageRaw}
\includegraphics[width=0.5\textwidth]{segmentedImage}}
\caption{Segmented image. Cell nuclei are labeled in green.}
\label{fig2}
\end{figure}

\section{Creating a training set}
To classify cells in images, an appropriate training set first has to be created.
\begin{verbatim}
f = system.file("extdata", "exImg.jpg", package="CRImage")
trainingValues=createTrainingSet(filename=f,maxShape=800,minShape=40,failureRegion=2000)
\end{verbatim}
The list \Rfunction{trainingValues} contains two elements. The first element is an image in which every segmented cell is numbered. The second element is a table with features for every cell:
\begin{verbatim}
index  class       g.x        g.y  g.s  g.p      g.pdm     g.pdsd     g.effr...
    1         148.2203   4.855932  118   36   5.721049  1.0130550   6.128668...
    2         160.2719   4.763158  114   38   5.659780  1.0331399   6.023896...
    3         183.7975   3.101266   79   35   4.593585  0.9575905   5.014627...
    4         196.3500   4.242857  140   43   6.424591  1.4433233   6.675581...
    5         271.5694   2.680556   72   29   4.504704  1.2490794   4.787307...
    6         338.5221   6.530973  113   35   5.601531  1.0849651   5.997418...
    7         456.0179   2.946429  112   39   5.726556  1.8101368   5.970821...
    8         556.3778   9.018519  270   81   9.196894  2.4687870   9.270581...
    9         575.1777  10.935950  484  101  11.959995  2.1378649  12.412171...
   10         592.7391   1.724638   69   41   5.145094  2.2782539   4.686511...
\end{verbatim}
To create the training set, class values for the cells have to be inserted in the column ``class'':
\begin{verbatim}
index  class           g.x       g.y  g.s  g.p     g.pdm     g.pdsd    g.effr
    1  normal     148.2203  4.855932  118   36  5.721049  1.0130550  6.128668
    2  malignant  160.2719  4.763158  114   38  5.659780  1.0331399  6.023896
\end{verbatim}
The values in the column ``index'' match the numbers of the cells in the image. You can save the table as a tab-separated file with:
\begin{verbatim}
trainingData=trainingValues[[2]]
write.table(trainingData,file="path",sep="\t",row.names=FALSE)
\end{verbatim}
and open it, for example, in a spreadsheet program. Be careful not to shift the columns.
You do not have to assign a class value to every cell, but do ensure that there are enough examples for every class. Class values can be numbers (e.g. 1,2,3) or strings (e.g. ``normal'', ``malignant''). You can choose at most 10 classes. If you want to use the kernel smoothing approach for classification, you can either choose only two classes or you have to specify a cancer class when classifying cells.
\begin{figure}
\centerline{\includegraphics[width=0.8\textwidth]{labeledImage}}
\caption{The image with labeled cell nuclei. The numbers of the cell nuclei correspond to the indices in the feature table.}
\label{fig2.5}
\end{figure}

\section{Creating the classifier}
The command:
<<>>=
f = system.file("extdata", "trainingData.txt", package="CRImage")
#read training data
trainingData=read.table(f,header=TRUE)
#create classifier
classifierValues=createClassifier(trainingData,topo=FALSE)
classifier=classifierValues[[1]]
#classifiedCells=classifierValues[[2]]
#display(classifiedCells)
@
creates the classifier. The classifier is a Support Vector Machine provided by the package e1071.

\section{Classification of cells}
After having created the classifier, cells in another image can be classified (Figure \ref{fig3}).
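
Since the classifier depends on the labels in the training table, it can be worth checking how many cells were assigned to each class before classifying new images. The following is a minimal sketch in base R and is not part of CRImage: the toy table stands in for an edited training table, and representing unlabelled cells as \Rfunction{NA} is an assumption about how the file was saved.
\begin{verbatim}
# hypothetical toy table standing in for an edited training table
trainingData = data.frame(index=1:6,
    class=c("normal","malignant",NA,"malignant","normal","malignant"))
# count the labelled cells per class; unlabelled cells (NA) are counted separately
classCounts = table(trainingData$class, useNA="ifany")
print(classCounts)
# number of distinct classes, which should not exceed 10
nClasses = length(unique(na.omit(trainingData$class)))
print(nClasses)
\end{verbatim}
If a class has only a handful of examples, more cells of that class can be labelled before the classifier is created.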
<<>>=
#classify cells
f = system.file("extdata", "exImg2.jpg", package="CRImage")
classValues=classifyCells(classifier,filename=f,KS=TRUE,maxShape=800,minShape=40,failureRegion=2000)
@

\section{Calculation of cellularity}
If tumour images are processed, the cellularity of the tumour can be calculated. The image is first segmented and the cell types are classified. Afterwards, the cellularity of the tumour is determined. The function needs to know which class value represents the tumour class.
<<>>=
t = system.file("extdata", "trainingData.txt", package="CRImage")
#read training data
trainingData=read.table(t,header=TRUE)
#create classifier
classifier=createClassifier(trainingData,topo=FALSE)[[1]]
#calculation of cellularity
f = system.file("extdata", "exImg2.jpg", package="CRImage")
exImg=readImage(f)
cellularityValues=calculateCellularity(f,classifier=classifier,cancerIdentifier="1")
@
\begin{figure}[tpb]
\centerline{\includegraphics[width=0.5\textwidth]{classifiedImage}
\includegraphics[width=0.5\textwidth]{cellularity2}}
\caption{Classified image (left) and heatmap of cellularity values (right). Malignant cells are coloured green, whereas other cells are coloured red. The heatmap of tumour cellularity indicates regions of high tumour cellularity (strong green values) and low tumour cellularity (white regions).}
\label{fig3}
\end{figure}

\section{Classification of Aperio Image Slides}
Large pathological images are often difficult to process because of their large file size. Images from the ScanScope TX scanner can be saved in the CWS file format. In this format the images are split into smaller subimages. These images can be processed with the function \Rfunction{processAperio}.
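
Before processing a slide, it can help to check which subimages a folder contains. The following is a minimal sketch in base R and is not part of CRImage: the folder and the file names \Rfunction{Da0.jpg}, \Rfunction{Da1.jpg} and \Rfunction{Ss1.jpg} are hypothetical placeholders, and treating the identifier as a file name prefix is an assumption.
\begin{verbatim}
# hypothetical folder with empty placeholder files, for illustration only
inputFolder = file.path(tempdir(), "cwsExample")
dir.create(inputFolder, showWarnings=FALSE)
file.create(file.path(inputFolder, c("Da0.jpg", "Da1.jpg", "Ss1.jpg")))
# names of the files that start with the identifier "Da"
subimages = list.files(inputFolder, pattern="^Da")
print(subimages)
\end{verbatim}
For a real slide, \Rfunction{inputFolder} would point to the folder that holds the CWS subimages.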
<<>>=
#create the classifier
t = system.file("extdata", "trainingData.txt", package="CRImage")
trainingData=read.table(t,header=TRUE)
classifier=createClassifier(trainingData,topo=FALSE)[[1]]
@
<<>>=
dir.create("AperiOutput")
f = system.file("extdata", package="CRImage")
#the image in CWS format
f=file.path(f,"8905")
processAperio(classifier=classifier,inputFolder=f,outputFolder="AperiOutput",
              identifier="Da",numSections=2,cancerIdentifier="c",
              maxShape=800,minShape=40,failureRegion=2000)
@
The function takes a classifier, which can for instance be created with \Rfunction{createClassifier}. An input folder that contains the subimages has to be specified. The common identifier of the subimages has to be specified as well (either ``Da'' or ``Ss'', depending on the settings of the Aperio software). If the image includes different sections for which cellularity has to be calculated separately, the number of sections can be specified with \Rfunction{numSections}. The function will then automatically cluster the subimages into their corresponding sections. The function needs to know which class of the classifier is the tumour class; this is specified with \Rfunction{cancerIdentifier}.

The function creates three folders in the output folder. The folder \Rfunction{classifiedImage} includes the classified subimages. The folder \Rfunction{Files} includes files with cellularity values for every section. The folder \Rfunction{tumourDensity} includes cancer heatmaps for every subimage.
\end{document}