\name{run.crossvalidation}
\alias{run.crossvalidation}
\title{Assessment of Prediction Performance via Cross-validation}
\description{
  Evaluate the prediction performance of a gene2pathway model via a repeated cross-validation scheme.
}
\usage{
run.crossvalidation(nfolds=10, repeats=10, stratified=TRUE,
    signaltrans.only=FALSE, minnmap=ifelse(signaltrans.only, 15, 30),
    nbag=11, level1Only="Metabolism",
    level2Only="Genetic Information Processing", organism="hsa",
    gene2Domains=NULL, seed=1234, mc.cores=8, DIR=".")
}
\arguments{
  \item{nfolds}{number of cross-validation folds}
  \item{repeats}{number of repeats of the cross-validation procedure}
  \item{stratified}{ensure that each class is represented during bagging}
  \item{signaltrans.only}{perform the cross-validation for the model that predicts pathway components of signaling pathways}
  \item{minnmap}{prune hierarchy branches with fewer than \code{minnmap} mapping genes}
  \item{nbag}{number of models to average over}
  \item{level1Only}{for these hierarchy branches, only the first level is used}
  \item{level2Only}{for these hierarchy branches, only the first and second levels are used}
  \item{organism}{KEGG letter code describing an organism. Refer to KEGG for a complete list of supported organisms and their letter codes.}
  \item{gene2Domains}{By default, associations between genes and InterPro domains are retrieved from Ensembl via biomaRt. Alternatively, users can provide their own mapping of genes to InterPro domains here, in the form of a list (see details).}
  \item{seed}{seed value for the random number generator; influences how the data are split into training and test sets}
  \item{DIR}{directory in which to save the diagnostic plots}
  \item{mc.cores}{number of cores to use for parallelization; requires the package 'doMC' to be loaded}
}
\details{
  A gene2pathway model is trained and tested within a repeated cross-validation scheme.
  The method produces boxplots (saved as PDFs in the directory passed in the \code{DIR} argument) of the accuracy (1 - loss), sensitivity, specificity and F1-values summarized over all pathways. Additionally, it produces separate boxplots of F1-values for all pathways at the top KEGG hierarchy level, at the second KEGG hierarchy level, and for all pathways individually.
}
\value{
  \item{cv}{a matrix with \code{nfolds*repeats} rows and one column per label, containing the predictions of the model}
  \item{groups}{groups used in the cross-validation procedure}
  \item{used_domains}{InterPro domains used by the prediction model}
  \item{evaluation}{a list with the average loss, sensitivity, specificity and F1-value for each pathway}
}
\author{
  Holger Froehlich
}
\seealso{
  \code{\link{retrain}}, \code{\link{gene2pathway}}
}
\examples{
\dontrun{
# quick run: a single repeat of 2-fold cross-validation for the signaling pathway model
run.crossvalidation(signaltrans.only=TRUE, repeats=1, nfolds=2)
}
}
\keyword{file}