This page was generated on 2025-01-16 12:09 -0500 (Thu, 16 Jan 2025).
Hostname | OS | Arch (*) | R version | Installed pkgs |
---|---|---|---|---|
nebbiolo2 | Linux (Ubuntu 24.04.1 LTS) | x86_64 | 4.4.2 (2024-10-31) -- "Pile of Leaves" | 4746 |
palomino8 | Windows Server 2022 Datacenter | x64 | 4.4.2 (2024-10-31 ucrt) -- "Pile of Leaves" | 4489 |
merida1 | macOS 12.7.5 Monterey | x86_64 | 4.4.2 (2024-10-31) -- "Pile of Leaves" | 4517 |
kjohnson1 | macOS 13.6.6 Ventura | arm64 | 4.4.2 (2024-10-31) -- "Pile of Leaves" | 4469 |
taishan | Linux (openEuler 24.03 LTS) | aarch64 | 4.4.2 (2024-10-31) -- "Pile of Leaves" | 4387 |
(*) as reported by 'uname -p', except on Windows and Mac OS X
Package 888/2289: goSorensen 1.8.0, Pablo Flores
Hostname | OS / Arch | INSTALL | BUILD | CHECK | BUILD BIN |
---|---|---|---|---|---|
nebbiolo2 | Linux (Ubuntu 24.04.1 LTS) / x86_64 | OK | OK | OK | |
palomino8 | Windows Server 2022 Datacenter / x64 | OK | OK | OK | OK |
merida1 | macOS 12.7.5 Monterey / x86_64 | OK | OK | OK | OK |
kjohnson1 | macOS 13.6.6 Ventura / arm64 | OK | OK | OK | OK |
taishan | Linux (openEuler 24.03 LTS) / aarch64 | OK | OK | NA | |
To the developers/maintainers of the goSorensen package:
- Allow up to 24 hours (and sometimes 48 hours) for your latest push to git@git.bioconductor.org:packages/goSorensen.git to be reflected in this report. See Troubleshooting Build Report for more information.
- Use the following Renviron settings to reproduce errors and warnings.
- If 'R CMD check' started to fail recently on the Linux builder(s) over a missing dependency, add the missing dependency to 'Suggests:' in your DESCRIPTION file. See Renviron.bioc for more information.
Package: goSorensen |
Version: 1.8.0 |
Command: /Library/Frameworks/R.framework/Resources/bin/R CMD check --install=check:goSorensen.install-out.txt --library=/Library/Frameworks/R.framework/Resources/library --no-vignettes --timings goSorensen_1.8.0.tar.gz |
StartedAt: 2025-01-14 21:08:33 -0500 (Tue, 14 Jan 2025) |
EndedAt: 2025-01-14 21:17:47 -0500 (Tue, 14 Jan 2025) |
EllapsedTime: 554.0 seconds |
RetCode: 0 |
Status: OK |
CheckDir: goSorensen.Rcheck |
Warnings: 0 |
##############################################################################
##############################################################################
###
### Running command:
###
###   /Library/Frameworks/R.framework/Resources/bin/R CMD check --install=check:goSorensen.install-out.txt --library=/Library/Frameworks/R.framework/Resources/library --no-vignettes --timings goSorensen_1.8.0.tar.gz
###
##############################################################################
##############################################################################

* using log directory ‘/Users/biocbuild/bbs-3.20-bioc/meat/goSorensen.Rcheck’
* using R version 4.4.2 (2024-10-31)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.7.1
* using session charset: UTF-8
* using option ‘--no-vignettes’
* checking for file ‘goSorensen/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘goSorensen’ version ‘1.8.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘goSorensen’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
Examples with CPU (user + system) or elapsed time > 5s
                    user system elapsed
hclustThreshold  170.699 14.852 185.938
buildEnrichTable  51.950  3.434  55.529
enrichedIn        41.518  3.292  44.913
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘test_gosorensen_funcs.R’
 OK
* checking for unstated dependencies in vignettes ... NOTE
'library' or 'require' calls not declared from:
  ‘GO.db’ ‘ggplot2’ ‘ggrepel’
* checking package vignettes ... OK
* checking running R code from vignettes ... SKIPPED
* checking re-building of vignette outputs ... SKIPPED
* checking PDF version of manual ... OK
* DONE

Status: 1 NOTE
See
  ‘/Users/biocbuild/bbs-3.20-bioc/meat/goSorensen.Rcheck/00check.log’
for details.
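The single NOTE in the check log says the vignettes call 'library' or 'require' on GO.db, ggplot2 and ggrepel without declaring them in DESCRIPTION. The usual remedy is to declare them there (typically under 'Suggests:'). As a complement, the following base R sketch shows how a vignette or script could check that such optional packages are loadable before using them. The helper name `missing_pkgs` is hypothetical and is not part of goSorensen or of the build system.

```r
# Packages named in the NOTE of the check log.
vignette_pkgs <- c("GO.db", "ggplot2", "ggrepel")

# Hypothetical helper: return the packages that cannot be loaded in this session.
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

not_installed <- missing_pkgs(vignette_pkgs)
if (length(not_installed) > 0) {
  message("Install before building the vignettes: ",
          paste(not_installed, collapse = ", "))
}
```

Which packages are reported depends on the local library, so no fixed output is shown.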
goSorensen.Rcheck/00install.out
##############################################################################
##############################################################################
###
### Running command:
###
###   /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL goSorensen
###
##############################################################################
##############################################################################

* installing to library ‘/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library’
* installing *source* package ‘goSorensen’ ...
** using staged installation
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (goSorensen)
goSorensen.Rcheck/tests/test_gosorensen_funcs.Rout
R version 4.4.2 (2024-10-31) -- "Pile of Leaves"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(goSorensen)

Attaching package: 'goSorensen'

The following object is masked from 'package:utils':

    upgrade

>
> # A contingency table of GO terms mutual enrichment
> # between gene lists "atlas" and "sanger":
> data(tab_atlas.sanger_BP3)
> tab_atlas.sanger_BP3
                 Enriched in sanger
Enriched in atlas TRUE FALSE
            TRUE    38    31
            FALSE    2   452
> ?tab_atlas.sanger_BP3
tab_atlas.sanger_BP3        package:goSorensen        R Documentation

Cross-tabulation of enriched GO terms at level 3 of ontology BP in two gene lists

Description:

     From the "Cancer gene list" of Bushman Lab, a collection of gene lists related to cancer, for gene lists "Atlas" and "Sanger", this dataset is the cross-tabulation of all GO terms of ontology BP at level 3 which are: enriched in both lists, enriched in sanger but not in atlas, non-enriched in sanger but enriched in atlas and non-enriched in both lists. Take it just as an illustrative example, not up to date with changes in the gene lists or changes in the GO. The present version was obtained under Bioconductor 3.17.

Usage:

     data(tab_atlas.sanger_BP3)

Format:

     An object of class "table" representing a 2x2 contingency table.

Source:

     <http://www.bushmanlab.org/links/genelists>

> class(tab_atlas.sanger_BP3)
[1] "table"
>
> # Sorensen-Dice dissimilarity on this contingency table:
> ?dSorensen
dSorensen        package:goSorensen        R Documentation

Computation of the Sorensen-Dice dissimilarity

Description:

     Computation of the Sorensen-Dice dissimilarity

Usage:

     dSorensen(x, ...)

     ## S3 method for class 'table'
     dSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'matrix'
     dSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'numeric'
     dSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'character'
     dSorensen(x, y, check.table = TRUE, ...)

     ## S3 method for class 'list'
     dSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'tableList'
     dSorensen(x, check.table = TRUE, ...)

Arguments:

     x: either an object of class "table", "matrix" or "numeric" representing a 2x2 contingency table, or a "character" vector (a set of gene identifiers) or "list" or "tableList" object. See the details section for more information.

     ...: extra parameters for function 'buildEnrichTable'.

     check.table: Boolean. If TRUE (default), argument 'x' is checked to adequately represent a 2x2 contingency table, by means of function 'nice2x2Table'.

     y: an object of class "character" representing a vector of valid gene identifiers (e.g., ENTREZ).

Details:

     Given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

            n_{11}   n_{10}
            n_{01}   n_{00},

     this function computes the Sorensen-Dice dissimilarity

            (n_10 + n_01) / (2 n_11 + n_10 + n_01).

     The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not enriched in the second one and '00' corresponds to those GO terms non enriched in both gene lists, i.e., to the double negatives, a value which is ignored in the computations.

     In the "numeric" interface, if 'length(x) >= 3', the values are interpreted as (n_11, n_01, n_10, n_00), always in this order and discarding extra values if necessary. The result is correct regardless of whether the frequencies are absolute or relative.

     If 'x' is an object of class "character", then 'x' (and 'y') must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the dissimilarity between lists 'x' and 'y' is computed, after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function 'buildEnrichTable' and "valid gene identifiers (e.g., ENTREZ)" stands for the coherency of these gene identifiers with the arguments 'geneUniverse' and 'orgPackg' of 'buildEnrichTable', passed by the ellipsis argument '...' in 'dSorensen'.

     If 'x' is an object of class "list", the argument must be a list of "character" vectors, each one representing a gene list (character identifiers). Then, all pairwise dissimilarities between these gene lists are computed.

     If 'x' is an object of class "tableList", the Sorensen-Dice dissimilarity is computed over each one of these tables. Given k gene lists (i.e. "character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function 'buildEnrichTable') is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure:

     $l2
     $l2$l1$t(2,1)
     $l3
     $l3$l1$t(3,1), $l3$l2$t(3,2)
     ...
     $lk
     $lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)$t(k,k-1)

Value:

     In the "table", "matrix", "numeric" and "character" interfaces, the value of the Sorensen-Dice dissimilarity. In the "list" and "tableList" interfaces, the symmetric matrix of all pairwise Sorensen-Dice dissimilarities.
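The formula in the Details section can be checked by hand in base R. This is a minimal sketch, not the package code: the helper name `sorensen_dissimilarity` is hypothetical, and the counts are those of the `tab_atlas.sanger_BP3` table printed earlier in this transcript (n11 = 38, n10 = 31, n01 = 2).

```r
# Sorensen-Dice dissimilarity from the counts of a 2x2 enrichment table.
# Hypothetical helper; the double negatives (n00) play no role in the formula.
sorensen_dissimilarity <- function(n11, n10, n01) {
  (n10 + n01) / (2 * n11 + n10 + n01)
}

# Counts of tab_atlas.sanger_BP3 as printed earlier in this transcript.
d_atlas_sanger <- sorensen_dissimilarity(n11 = 38, n10 = 31, n01 = 2)
print(d_atlas_sanger)  # 33 / 109

# No GO term enriched in either list: 0 / 0, which R evaluates to NaN.
print(sorensen_dissimilarity(0, 0, 0))
```

The first result, 33/109, agrees with the 0.3027523 that `dSorensen(tab_atlas.sanger_BP3)` prints in this transcript, and the second mirrors the NaN reported for `dSorensen(c(0,0,0,45))`.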
Methods (by class):

     • 'dSorensen(table)': S3 method for class "table"
     • 'dSorensen(matrix)': S3 method for class "matrix"
     • 'dSorensen(numeric)': S3 method for class "numeric"
     • 'dSorensen(character)': S3 method for class "character"
     • 'dSorensen(list)': S3 method for class "list"
     • 'dSorensen(tableList)': S3 method for class "tableList"

See Also:

     'buildEnrichTable' for constructing contingency tables of mutual enrichment, 'nice2x2Table' for checking contingency tables validity, 'seSorensen' for computing the standard error of the dissimilarity, 'duppSorensen' for the upper limit of a one-sided confidence interval of the dissimilarity, 'equivTestSorensen' for an equivalence test.

Examples:

     # Gene lists 'atlas' and 'sanger' in 'allOncoGeneLists' dataset. Table of joint enrichment
     # of GO terms in ontology BP at level 3.
     data(tab_atlas.sanger_BP3)
     tab_atlas.sanger_BP3
     ?tab_atlas.sanger_BP3
     dSorensen(tab_atlas.sanger_BP3)

     # Table represented as a vector:
     conti4 <- c(56, 1, 30, 471)
     dSorensen(conti4)
     # or as a plain matrix:
     dSorensen(matrix(conti4, nrow = 2))

     # This function is also appropriate for proportions:
     dSorensen(conti4 / sum(conti4))

     conti3 <- c(56, 1, 30)
     dSorensen(conti3)

     # Sorensen-Dice dissimilarity from scratch, directly from two gene lists:
     # (These examples may be considerably time consuming due to many enrichment
     # tests to build the contingency tables of joint enrichment)
     # data(allOncoGeneLists)
     # ?allOncoGeneLists

     # Obtaining ENTREZ identifiers for the gene universe of humans:
     # library(org.Hs.eg.db)
     # humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID")

     # (Time consuming, building the table requires many enrichment tests:)
     # dSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
     #           onto = "BP", GOLevel = 3,
     #           geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

     # Essentially, the above code does the same as:
     # tab_atlas.sanger_BP3 <- buildEnrichTable(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
     #                                          onto = "BP", GOLevel = 3,
     #                                          geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
     # dSorensen(tab_atlas.sanger_BP3)

     # (Quite time consuming, all pairwise dissimilarities:)
     # dSorensen(allOncoGeneLists,
     #           onto = "BP", GOLevel = 3,
     #           geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

> dSorensen(tab_atlas.sanger_BP3)
[1] 0.3027523
>
> # Standard error of this Sorensen-Dice dissimilarity estimate:
> ?seSorensen
seSorensen        package:goSorensen        R Documentation

Standard error of the sample Sorensen-Dice dissimilarity, asymptotic approach

Description:

     Standard error of the sample Sorensen-Dice dissimilarity, asymptotic approach

Usage:

     seSorensen(x, ...)

     ## S3 method for class 'table'
     seSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'matrix'
     seSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'numeric'
     seSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'character'
     seSorensen(x, y, check.table = TRUE, ...)

     ## S3 method for class 'list'
     seSorensen(x, check.table = TRUE, ...)

     ## S3 method for class 'tableList'
     seSorensen(x, check.table = TRUE, ...)

Arguments:

     x: either an object of class "table", "matrix" or "numeric" representing a 2x2 contingency table, or a "character" (a set of gene identifiers) or "list" or "tableList" object. See the details section for more information.

     ...: extra parameters for function 'buildEnrichTable'.

     check.table: Boolean. If TRUE (default), argument 'x' is checked to adequately represent a 2x2 contingency table. This checking is performed by means of function 'nice2x2Table'.

     y: an object of class "character" representing a vector of gene identifiers (e.g., ENTREZ).
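The help page names the approach ("asymptotic") but does not print the formula. The sketch below uses a delta-method expression conditional on n = n11 + n10 + n01. This closed form is an assumption, chosen because it reproduces the standard errors printed in this transcript (0.05058655 for `tab_atlas.sanger_BP3` and 0.04818012 for `c(32, 21, 81, 1439)`); it is not taken from the package source, and the helper name `sorensen_se` is hypothetical.

```r
# Assumed closed form (not taken from the goSorensen source):
#   se = 2 * sqrt(p11 * (1 - p11) / (n - 1)) / (1 + p11)^2
# with n = n11 + n10 + n01 and p11 = n11 / n.
sorensen_se <- function(n11, n10, n01) {
  n <- n11 + n10 + n01
  p11 <- n11 / n
  2 * sqrt(p11 * (1 - p11) / (n - 1)) / (1 + p11)^2
}

# Counts of tab_atlas.sanger_BP3: n11 = 38, n10 = 31, n01 = 2.
se_atlas_sanger <- sorensen_se(n11 = 38, n10 = 31, n01 = 2)
print(se_atlas_sanger)

# Counts of the numeric vector c(32, 21, 81, 1439), read as (n11, n01, n10, n00).
print(sorensen_se(n11 = 32, n10 = 81, n01 = 21))
```

The expression follows from writing the dissimilarity as (1 - p11) / (1 + p11) and propagating the binomial variance of p11. The n - 1 denominator is the variant that matches the printed values.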
Details:

     This function computes the standard error estimate of the sample Sorensen-Dice dissimilarity, given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

            n_{11}   n_{10}
            n_{01}   n_{00},

     The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not enriched in the second one and '00' corresponds to those GO terms non enriched in both gene lists, i.e., to the double negatives, a value which is ignored in the computations.

     In the "numeric" interface, if 'length(x) >= 3', the values are interpreted as (n_11, n_01, n_10), always in this order.

     If 'x' is an object of class "character", then 'x' (and 'y') must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the standard error for the dissimilarity between lists 'x' and 'y' is computed, after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function 'buildEnrichTable' and "valid gene identifiers (e.g., ENTREZ)" stands for the coherency of these gene identifiers with the arguments 'geneUniverse' and 'orgPackg' of 'buildEnrichTable', passed by the ellipsis argument '...' in 'seSorensen'.

     In the "list" interface, the argument must be a list of "character" vectors, each one representing a gene list (character identifiers). Then, all pairwise standard errors of the dissimilarity between these gene lists are computed.

     If 'x' is an object of class "tableList", the standard error of the Sorensen-Dice dissimilarity estimate is computed over each one of these tables. Given k gene lists (i.e. "character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function 'buildEnrichTable') is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure:

     $l2
     $l2$l1$t(2,1)
     $l3
     $l3$l1$t(3,1), $l3$l2$t(3,2)
     ...
     $lk
     $lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)$t(k,k-1)

Value:

     In the "table", "matrix", "numeric" and "character" interfaces, the value of the standard error of the Sorensen-Dice dissimilarity estimate. In the "list" and "tableList" interfaces, the symmetric matrix of all standard error dissimilarity estimates.

Methods (by class):

     • 'seSorensen(table)': S3 method for class "table"
     • 'seSorensen(matrix)': S3 method for class "matrix"
     • 'seSorensen(numeric)': S3 method for class "numeric"
     • 'seSorensen(character)': S3 method for class "character"
     • 'seSorensen(list)': S3 method for class "list"
     • 'seSorensen(tableList)': S3 method for class "tableList"

See Also:

     'buildEnrichTable' for constructing contingency tables of mutual enrichment, 'nice2x2Table' for checking the validity of enrichment contingency tables, 'dSorensen' for computing the Sorensen-Dice dissimilarity, 'duppSorensen' for the upper limit of a one-sided confidence interval of the dissimilarity, 'equivTestSorensen' for an equivalence test.

Examples:

     # Gene lists 'atlas' and 'sanger' in 'Cangenes' dataset. Table of joint enrichment
     # of GO terms in ontology BP at level 3.
     data(tab_atlas.sanger_BP3)
     tab_atlas.sanger_BP3
     dSorensen(tab_atlas.sanger_BP3)
     seSorensen(tab_atlas.sanger_BP3)

     # Contingency table as a numeric vector:
     seSorensen(c(56, 1, 30, 47))
     seSorensen(c(56, 1, 30))

     # (These examples may be considerably time consuming due to many enrichment
     # tests to build the contingency tables of mutual enrichment)
     # data(allOncoGeneLists)
     # ?allOncoGeneLists

     # Standard error of the sample Sorensen-Dice dissimilarity, directly from
     # two gene lists, from scratch:
     # seSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
     #            onto = "BP", GOLevel = 3,
     #            geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
     # Essentially, the above code does the same as:
     # ctab_atlas.sanger_BP3 <- buildEnrichTable(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
     #                                           onto = "BP", GOLevel = 3,
     #                                           geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
     # ctab_atlas.sanger_BP3
     # seSorensen(ctab_atlas.sanger_BP3)
     # tab_atlas.sanger_BP3 and ctab_atlas.sanger_BP3 have exactly the same result.

     # All pairwise standard errors (quite time consuming):
     # seSorensen(allOncoGeneLists,
     #            onto = "BP", GOLevel = 3,
     #            geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

> seSorensen(tab_atlas.sanger_BP3)
[1] 0.05058655
>
> # Upper 95% confidence limit for the Sorensen-Dice dissimilarity:
> ?duppSorensen
duppSorensen        package:goSorensen        R Documentation

Upper limit of a one-sided confidence interval (0, dUpp] for the Sorensen-Dice dissimilarity

Description:

     Upper limit of a one-sided confidence interval (0, dUpp] for the Sorensen-Dice dissimilarity

Usage:

     duppSorensen(x, ...)

     ## S3 method for class 'table'
     duppSorensen(
       x,
       dis = dSorensen.table(x, check.table = FALSE),
       se = seSorensen.table(x, check.table = FALSE),
       conf.level = 0.95,
       z.conf.level = qnorm(1 - conf.level),
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'matrix'
     duppSorensen(
       x,
       dis = dSorensen.matrix(x, check.table = FALSE),
       se = seSorensen.matrix(x, check.table = FALSE),
       conf.level = 0.95,
       z.conf.level = qnorm(1 - conf.level),
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'numeric'
     duppSorensen(
       x,
       dis = dSorensen.numeric(x, check.table = FALSE),
       se = seSorensen.numeric(x, check.table = FALSE),
       conf.level = 0.95,
       z.conf.level = qnorm(1 - conf.level),
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'character'
     duppSorensen(
       x,
       y,
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'list'
     duppSorensen(
       x,
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'tableList'
     duppSorensen(
       x,
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

Arguments:

     x: either an object of class "table", "matrix" or "numeric" representing a 2x2 contingency table, or a "character" (a set of gene identifiers) or "list" or "tableList" object. See the details section for more information.

     ...: additional arguments for function 'buildEnrichTable'.

     dis: Sorensen-Dice dissimilarity value. Only required to speed computations if this value is known in advance.

     se: standard error estimate of the sample dissimilarity. Only required to speed computations if this value is known in advance.

     conf.level: confidence level of the one-sided confidence interval, a numeric value between 0 and 1.

     z.conf.level: standard normal (or bootstrap, see arguments below) distribution quantile at the '1 - conf.level' value. Only required to speed computations if this value is known in advance. Then, the argument 'conf.level' is ignored.

     boot: boolean. If TRUE, 'z.conf.level' is computed by means of a bootstrap approach instead of the asymptotic normal approach. Defaults to FALSE.

     nboot: numeric, number of initially planned bootstrap replicates. Ignored if 'boot == FALSE'. Defaults to 10000.

     check.table: Boolean. If TRUE (default), argument 'x' is checked to adequately represent a 2x2 contingency table. This checking is performed by means of function 'nice2x2Table'.

     y: an object of class "character" representing a vector of gene identifiers (e.g., ENTREZ).

Details:

     This function computes the upper limit of a one-sided confidence interval for the Sorensen-Dice dissimilarity, given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

            n_{11}   n_{10}
            n_{01}   n_{00},

     The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not enriched in the second one and '00' corresponds to those GO terms non enriched in both gene lists, i.e., to the double negatives, a value which is ignored in the computations, except if 'boot == TRUE'.

     In the "numeric" interface, if 'length(x) >= 4', the values are interpreted as (n_11, n_01, n_10, n_00), always in this order and discarding extra values if necessary.

     Arguments 'dis', 'se' and 'z.conf.level' are not required. If known in advance (e.g., as a consequence of previous computations with the same data), providing their values may speed the computations.

     By default, 'z.conf.level' corresponds to the 1 - conf.level quantile of a standard normal N(0,1) distribution, as the studentized statistic (^d - d) / ^se is asymptotically N(0,1). In the studentized statistic, d stands for the "true" Sorensen-Dice dissimilarity, ^d for its sample estimate and ^se for the estimate of its standard error. In fact, the normal is its limiting distribution but, for finite samples, the true sampling distribution may present departures from normality (mainly with some inflation in the left tail). The bootstrap method provides a better approximation to the true sampling distribution.
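The relation between 'z.conf.level' and the upper limit can be illustrated with the two numbers that this transcript prints for `tab_atlas.sanger_BP3`. This is a base R sketch under the normal approximation described above. It plugs in the printed values rather than calling the package, and the variable names are hypothetical.

```r
# Values printed in this transcript for tab_atlas.sanger_BP3.
d_hat  <- 0.3027523   # dSorensen(tab_atlas.sanger_BP3)
se_hat <- 0.05058655  # seSorensen(tab_atlas.sanger_BP3)
conf_level <- 0.95

# Default quantile in the usage section: qnorm(1 - conf.level), a negative number.
z <- qnorm(1 - conf_level)

# Upper limit of the one-sided interval (0, dUpp]:
# P((d_hat - d) / se_hat >= z) is about conf_level, hence d <= d_hat - z * se_hat.
d_upp <- d_hat - z * se_hat
print(d_upp)

# Same limit with a Student's t quantile; df = 38 + 2 + 31 - 2, as in the test script
# (the transcript itself flags this variant as not recommended).
df <- 38 + 2 + 31 - 2
d_upp_t <- d_hat - qt(1 - conf_level, df) * se_hat
print(d_upp_t)
```

Because `qnorm(1 - conf_level)` equals `-qnorm(conf_level)`, `d_upp` is the same quantity as the expression `dSorensen(...) + qnorm(0.95) * seSorensen(...)` used in the help page examples.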
     In the bootstrap approach, 'nboot' new bootstrap contingency tables are generated from a multinomial distribution with parameters 'size =' n11 + n01 + n10 + n00 and probabilities %. Sometimes, some of these generated tables may present such low frequencies of enrichment that they are unusable for Sorensen-Dice computations. As a consequence, the number of effective bootstrap samples may be lower than the number of initially planned bootstrap samples 'nboot'.

     Computing in advance the value of argument 'z.conf.level' may be a way to cope with these departures from normality, by means of a more adequate quantile function. Alternatively, if 'boot == TRUE', a bootstrap quantile is internally computed.

     If 'x' is an object of class "character", then 'x' (and 'y') must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the confidence interval for the dissimilarity between lists 'x' and 'y' is computed, after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function 'buildEnrichTable' and "valid gene identifiers (e.g., ENTREZ)" stands for the coherency of these gene identifiers with the arguments 'geneUniverse' and 'orgPackg' of 'buildEnrichTable', passed by the ellipsis argument '...' in 'duppSorensen'.

     In the "list" interface, the argument must be a list of "character" vectors, each one representing a gene list (character identifiers). Then, all pairwise upper limits of the dissimilarity between these gene lists are computed.

     In the "tableList" interface, the upper limits are computed over each one of these tables. Given k gene lists (i.e. "character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function 'buildEnrichTable') is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure:

     $l2
     $l2$l1$t(2,1)
     $l3
     $l3$l1$t(3,1), $l3$l2$t(3,2)
     ...
     $lk
     $lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)$t(k,k-1)

Value:

     In the "table", "matrix", "numeric" and "character" interfaces, the value of the upper limit of the confidence interval for the Sorensen-Dice dissimilarity. When 'boot == TRUE', this result also has an extra attribute: "eff.nboot", which corresponds to the number of effective bootstrap replicates, see the details section. In the "list" and "tableList" interfaces, the result is the symmetric matrix of all pairwise upper limits.

Methods (by class):

     • 'duppSorensen(table)': S3 method for class "table"
     • 'duppSorensen(matrix)': S3 method for class "matrix"
     • 'duppSorensen(numeric)': S3 method for class "numeric"
     • 'duppSorensen(character)': S3 method for class "character"
     • 'duppSorensen(list)': S3 method for class "list"
     • 'duppSorensen(tableList)': S3 method for class "tableList"

See Also:

     'buildEnrichTable' for constructing contingency tables of mutual enrichment, 'nice2x2Table' for checking contingency tables validity, 'dSorensen' for computing the Sorensen-Dice dissimilarity, 'seSorensen' for computing the standard error of the dissimilarity, 'equivTestSorensen' for an equivalence test.

Examples:

     # Gene lists 'atlas' and 'sanger' in 'Cangenes' dataset. Table of joint enrichment
     # of GO terms in ontology BP at level 3.
     data(tab_atlas.sanger_BP3)
     ?tab_atlas.sanger_BP3
     duppSorensen(tab_atlas.sanger_BP3)
     dSorensen(tab_atlas.sanger_BP3) + qnorm(0.95) * seSorensen(tab_atlas.sanger_BP3)
     # Using the bootstrap approximation instead of the normal approximation to
     # the sampling distribution of (^d - d) / se(^d):
     duppSorensen(tab_atlas.sanger_BP3, boot = TRUE)

     # Contingency table as a numeric vector:
     duppSorensen(c(56, 1, 30, 47))
     duppSorensen(c(56, 1, 30))

     # Upper confidence limit for the Sorensen-Dice dissimilarity, from scratch,
     # directly from two gene lists:
     # (These examples may be considerably time consuming due to many enrichment
     # tests to build the contingency tables of mutual enrichment)
     # data(allOncoGeneLists)
     # ?allOncoGeneLists

     # Obtaining ENTREZ identifiers for the gene universe of humans:
     # library(org.Hs.eg.db)
     # humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID")

     # Computing the Upper confidence limit:
     # duppSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
     #              onto = "CC", GOLevel = 5,
     #              geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
     # Even more time consuming (all pairwise values):
     # duppSorensen(allOncoGeneLists,
     #              onto = "CC", GOLevel = 5,
     #              geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

> duppSorensen(tab_atlas.sanger_BP3)
[1] 0.3859598
> # This confidence limit is based on an asymptotic normal N(0,1)
> # approximation to the distribution of (dSampl - d) / se, where
> # dSampl stands for the sample dissimilarity, d for the true dissimilarity
> # and se for the sample dissimilarity standard error estimate.
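The bootstrap variant described in the Details of `?duppSorensen` can be sketched in base R as a generic studentized (bootstrap-t) procedure. This is not the goSorensen implementation. The resampling probabilities are assumed to be the observed relative frequencies (the help text leaves them unreadable), the standard error uses an assumed delta-method formula, and the function name `boot_upper_limit` is hypothetical.

```r
# Generic studentized (bootstrap-t) sketch; NOT the goSorensen implementation.
# Assumption: resampling probabilities are the observed relative frequencies.
boot_upper_limit <- function(x, conf_level = 0.95, nboot = 10000) {
  # x = c(n11, n01, n10, n00), the order used by the "numeric" interface
  stopifnot(length(x) == 4)
  d_se <- function(n11, n01, n10) {
    n <- n11 + n01 + n10
    p11 <- n11 / n
    d <- (n01 + n10) / (2 * n11 + n01 + n10)
    # Assumed delta-method standard error, conditional on n
    se <- 2 * sqrt(p11 * (1 - p11) / (n - 1)) / (1 + p11)^2
    c(d = d, se = se)
  }
  obs <- d_se(x[1], x[2], x[3])
  # Each column of 'tabs' is one bootstrap table (n11, n01, n10, n00)
  tabs <- rmultinom(nboot, size = sum(x), prob = x / sum(x))
  stat <- apply(tabs, 2, function(tb) {
    b <- d_se(tb[1], tb[2], tb[3])
    (b[["d"]] - obs[["d"]]) / b[["se"]]
  })
  stat <- stat[is.finite(stat)]  # drop degenerate bootstrap tables
  z_boot <- quantile(stat, probs = 1 - conf_level, names = FALSE)
  structure(obs[["d"]] - z_boot * obs[["se"]], eff.nboot = length(stat))
}

set.seed(123)
upp <- boot_upper_limit(c(38, 2, 31, 452), nboot = 2000)
print(upp)
```

Dropping non-finite statistics mirrors the "effective bootstrap samples" idea from the help text. The printed value depends on the random stream and on the assumptions above, so it should not be expected to match the package output exactly.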
>
> # Upper confidence limit but using a Student's t instead of a N(0,1)
> # (just as an example, not recommended -no theoretical justification)
> df <- sum(tab_atlas.sanger_BP3[1:3]) - 2
> duppSorensen(tab_atlas.sanger_BP3, z.conf.level = qt(1 - 0.95, df))
[1] 0.3870921
>
> # Upper confidence limit but using a bootstrap approximation
> # to the sampling distribution, instead of a N(0,1)
> set.seed(123)
> duppSorensen(tab_atlas.sanger_BP3, boot = TRUE)
[1] 0.3941622
attr(,"eff.nboot")
[1] 10000
>
> # Some computations on diverse data structures:
> badConti <- as.table(matrix(c(501, 27, 36, 12, 43, 15, 0, 0, 0),
+                             nrow = 3, ncol = 3,
+                             dimnames = list(c("a1","a2","a3"),
+                                             c("b1", "b2","b3"))))
> tryCatch(nice2x2Table(badConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(badConti): Not a 2x2 table>
>
> incompleteConti <- badConti[1,1:min(2,ncol(badConti)), drop = FALSE]
> incompleteConti
    b1  b2
a1 501  12
> tryCatch(nice2x2Table(incompleteConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(incompleteConti): Not a 2x2 table>
>
> contiAsVector <- c(32, 21, 81, 1439)
> nice2x2Table(contiAsVector)
[1] TRUE
> contiAsVector.mat <- matrix(contiAsVector, nrow = 2)
> contiAsVector.mat
     [,1] [,2]
[1,]   32   81
[2,]   21 1439
> contiAsVectorLen3 <- c(32, 21, 81)
> nice2x2Table(contiAsVectorLen3)
[1] TRUE
>
> tryCatch(dSorensen(badConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
>
> # Apparently, the next command works fine, but returns a wrong value!
> dSorensen(badConti, check.table = FALSE)
[1] 0.05915493
>
> tryCatch(dSorensen(incompleteConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
> dSorensen(contiAsVector)
[1] 0.6144578
> dSorensen(contiAsVector.mat)
[1] 0.6144578
> dSorensen(contiAsVectorLen3)
[1] 0.6144578
> dSorensen(contiAsVectorLen3, check.table = FALSE)
[1] 0.6144578
> dSorensen(c(0,0,0,45))
[1] NaN
>
> tryCatch(seSorensen(badConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
> tryCatch(seSorensen(incompleteConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
> seSorensen(contiAsVector)
[1] 0.04818012
> seSorensen(contiAsVector.mat)
[1] 0.04818012
> seSorensen(contiAsVectorLen3)
[1] 0.04818012
> seSorensen(contiAsVectorLen3, check.table = FALSE)
[1] 0.04818012
> tryCatch(seSorensen(contiAsVectorLen3, check.table = "not"), error = function(e) {return(e)})
<simpleError in seSorensen.numeric(contiAsVectorLen3, check.table = "not"): Argument 'check.table' must be logical>
> seSorensen(c(0,0,0,45))
[1] NaN
>
> tryCatch(duppSorensen(badConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
> tryCatch(duppSorensen(incompleteConti), error = function(e) {return(e)})
<simpleError in nice2x2Table.table(x): Not a 2x2 table>
> duppSorensen(contiAsVector)
[1] 0.6937071
> duppSorensen(contiAsVector.mat)
[1] 0.6937071
> set.seed(123)
> duppSorensen(contiAsVector, boot = TRUE)
[1] 0.6922658
attr(,"eff.nboot")
[1] 10000
> set.seed(123)
> duppSorensen(contiAsVector.mat, boot = TRUE)
[1] 0.6922658
attr(,"eff.nboot")
[1] 10000
> duppSorensen(contiAsVectorLen3)
[1] 0.6937071
> # Bootstrapping requires full contingency tables (4 values)
> set.seed(123)
> tryCatch(duppSorensen(contiAsVectorLen3, boot = TRUE), error = function(e) {return(e)})
<simpleError in duppSorensen.numeric(contiAsVectorLen3, boot = TRUE): Bootstraping requires a numeric vector of 4 frequencies>
> duppSorensen(c(0,0,0,45))
[1] NaN
>
> # Equivalence test, H0: d >= d0 vs H1: d < d0 (d0 = 0.4444)
> ?equivTestSorensen
equivTestSorensen        package:goSorensen        R Documentation

Equivalence test based on the Sorensen-Dice dissimilarity

Description:

     Equivalence test based on the Sorensen-Dice dissimilarity, computed either by an asymptotic normal approach or by a bootstrap approach.

Usage:

     equivTestSorensen(x, ...)

     ## S3 method for class 'table'
     equivTestSorensen(
       x,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'matrix'
     equivTestSorensen(
       x,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'numeric'
     equivTestSorensen(
       x,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'character'
     equivTestSorensen(
       x,
       y,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'list'
     equivTestSorensen(
       x,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

     ## S3 method for class 'tableList'
     equivTestSorensen(
       x,
       d0 = 1/(1 + 1.25),
       conf.level = 0.95,
       boot = FALSE,
       nboot = 10000,
       check.table = TRUE,
       ...
     )

Arguments:

     x: either an object of class "table", "matrix", "numeric", "character", "list" or "tableList". See the details section for more information.

     ...: extra parameters for function 'buildEnrichTable'.

     d0: equivalence threshold for the Sorensen-Dice dissimilarity, d. The null hypothesis states that d >= d0, i.e., inequivalence between the compared gene lists and the alternative that d < d0, i.e., equivalence or dissimilarity irrelevance (up to a level d0).

     conf.level: confidence level of the one-sided confidence interval, a value between 0 and 1.

     boot: boolean.
If TRUE, the confidence interval and the test p-value are computed by means of a bootstrap approach instead of the asymptotic normal approach. Defaults to FALSE. nboot: numeric, number of initially planned bootstrap replicates. Ignored if 'boot == FALSE'. Defaults to 10000. check.table: Boolean. If TRUE (default), argument 'x' is checked to adequately represent a 2x2 contingency table (or an aggregate of them) or gene lists producing a correct table. This checking is performed by means of function 'nice2x2Table'. y: an object of class "character" representing a list of gene identifiers (e.g., ENTREZ). _D_e_t_a_i_l_s: This function computes either the normal asymptotic or the bootstrap equivalence test based on the Sorensen-Dice dissimilarity, given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object): n_{11} n_{10} n_{01} n_{00}, The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not enriched in the second one and '00' corresponds to those GO terms non enriched in both gene lists, i.e., to the double negatives, a value which is ignored in the computations. In the "numeric" interface, if 'length(x) >= 4', the values are interpreted as (n_11, n_01, n_10, n_00), always in this order and discarding extra values if necessary. If 'x' is an object of class "character", then 'x' (and 'y') must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the equivalence test is performed between 'x' and 'y', after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function 'buildEnrichTable' and "valid gene identifiers (e.g., ENTREZ)" stands for the coherency of these gene identifiers with the arguments 'geneUniverse' and 'orgPackg' of 'buildEnrichTable', passed by the ellipsis argument '...' in 'equivTestSorensen'. 
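To make the "numeric" interface concrete, here is a minimal base-R sketch of the standard Sorensen-Dice dissimilarity formula, d = (n10 + n01) / (2 * n11 + n10 + n01), applied to a vector in the (n_11, n_01, n_10, n_00) order described above. The helper name sorensen_d is hypothetical and is not a goSorensen function; it only illustrates why the double negatives n_00 play no role.

```r
# Hypothetical helper (not a goSorensen function): standard Sorensen-Dice
# dissimilarity from frequencies given as (n11, n01, n10[, n00]).
sorensen_d <- function(x) {
  n11 <- x[1]; n01 <- x[2]; n10 <- x[3]
  (n10 + n01) / (2 * n11 + n10 + n01)  # n00 is never used
}

sorensen_d(c(32, 21, 81, 1439))  # vector used as 'contiAsVector' in this log
sorensen_d(c(32, 21, 81))        # same value without the double negatives
sorensen_d(c(0, 0, 0, 45))       # 0/0, hence NaN
```

These inputs are the ones passed to dSorensen elsewhere in this log, so the results can be compared against the printed values.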
     If 'x' is an object of class "list", each of its elements must be
     a "character" vector of gene identifiers (e.g., ENTREZ). Then all
     pairwise equivalence tests are performed between these gene lists.

     Class "tableList" corresponds to objects representing all mutual
     enrichment contingency tables generated in a pairwise fashion:
     Given gene lists l1, l2, ..., lk, an object of class "tableList"
     (typically constructed by a call to function 'buildEnrichTable')
     is a list of lists of contingency tables tij generated from each
     pair of gene lists i and j, with the following structure:

     $l2
     $l2$l1$t21
     $l3
     $l3$l1$t31, $l3$l2$t32
     ...
     $lk$l1$tk1, $lk$l2$tk2, ..., $lk$l(k-1)tk(k-1)

     If 'x' is an object of class "tableList", the test is performed
     over each one of these tables.

     The test is based on the fact that the studentized statistic
     (^d - d) / ^se is approximately distributed as a standard normal.
     ^d stands for the sample Sorensen-Dice dissimilarity, d for its
     true (unknown) value and ^se for the estimate of its standard
     error. This result is asymptotically correct, but the true
     distribution of the studentized statistic is not exactly normal
     for finite samples: it has a heavier left tail than expected
     under the Gaussian model, which may produce some type I error
     inflation. The bootstrap method provides a better approximation
     to this distribution. In the bootstrap approach, 'nboot' new
     bootstrap contingency tables are generated from a multinomial
     distribution with parameters 'size =' (n11 + n01 + n10 + n00) and
     probabilities given by the observed relative frequencies of the
     four cells. Sometimes, some of these generated tables may have
     enrichment frequencies so low that Sorensen-Dice computations
     are not possible on them.
     As a consequence, the number of effective bootstrap samples may
     be lower than the number initially planned, 'nboot'. Our
     simulation studies concluded that this makes the test more
     conservative (less prone to reject a truly false null hypothesis
     of inequivalence), but in any case it protects against inflating
     the type I error.

     In a bootstrap test result, use 'getNboot' to access the number
     of initially planned bootstrap replicates and 'getEffNboot' to
     access the number of finally effective bootstrap replicates.

_V_a_l_u_e:

     For all interfaces (except for the "list" and "tableList"
     interfaces) the result is a list of class "equivSDhtest" which
     inherits from "htest", with the following components:

     statistic    the value of the studentized statistic
                  (dSorensen(x) - d0) / seSorensen(x)
     p.value      the p-value of the test
     conf.int     the one-sided confidence interval (0, dUpp]
     estimate     the Sorensen dissimilarity estimate, dSorensen(x)
     null.value   the value of d0
     stderr       the standard error of the Sorensen dissimilarity
                  estimate, seSorensen(x), used as denominator in the
                  studentized statistic
     alternative  a character string describing the alternative
                  hypothesis
     method       a character string describing the test
     data.name    a character string giving the names of the data
     enrichTab    the 2x2 contingency table of joint enrichment on
                  which the test was based

     For the "list" and "tableList" interfaces, the result is an
     "equivSDhtestList", a list of objects with all pairwise
     comparisons, each one being an object of "equivSDhtest" class.
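The 'statistic' and 'p.value' components can be checked by hand. The sketch below assumes that the asymptotic p-value is the lower-tail standard normal probability of the studentized statistic. That assumption is consistent with the numbers printed in this log, but it is not taken from the package source. The inputs are copied from the tab_atlas.sanger_BP3 test output in this log, and the variable names are illustrative.

```r
d_hat  <- 0.3027523       # estimate reported in this log
se_hat <- 0.05058655      # standard error reported in this log
d0     <- 1 / (1 + 1.25)  # default equivalence limit, about 0.4444444

stat  <- (d_hat - d0) / se_hat  # studentized statistic
p_val <- pnorm(stat)            # assumed: lower-tail N(0,1) probability
c(statistic = stat, p.value = p_val)
```

A strongly negative statistic gives a small p-value, which is the direction in which the null hypothesis d >= d0 is rejected in favour of equivalence.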
_M_e_t_h_o_d_s (_b_y _c_l_a_s_s): • 'equivTestSorensen(table)': S3 method for class "table" • 'equivTestSorensen(matrix)': S3 method for class "matrix" • 'equivTestSorensen(numeric)': S3 method for class "numeric" • 'equivTestSorensen(character)': S3 method for class "character" • 'equivTestSorensen(list)': S3 method for class "list" • 'equivTestSorensen(tableList)': S3 method for class "tableList" _S_e_e _A_l_s_o: 'nice2x2Table' for checking and reformatting data, 'dSorensen' for computing the Sorensen-Dice dissimilarity, 'seSorensen' for computing the standard error of the dissimilarity, 'duppSorensen' for the upper limit of a one-sided confidence interval of the dissimilarity. 'getTable', 'getPvalue', 'getUpper', 'getSE', 'getNboot' and 'getEffNboot' for accessing specific fields in the result of these testing functions. 'update' for updating the result of these testing functions with alternative equivalence limits, confidence levels or to convert a normal result in a bootstrap result or the reverse. _E_x_a_m_p_l_e_s: # Gene lists 'atlas' and 'sanger' in 'allOncoGeneLists' dataset. Table of joint enrichment # of GO terms in ontology BP at level 3. 
data(tab_atlas.sanger_BP3) tab_atlas.sanger_BP3 equivTestSorensen(tab_atlas.sanger_BP3) # Bootstrap test: equivTestSorensen(tab_atlas.sanger_BP3, boot = TRUE) # Equivalence tests from scratch, directly from gene lists: # (These examples may be considerably time consuming due to many enrichment # tests to build the contingency tables of mutual enrichment) # data(allOncoGeneLists) # ?allOncoGeneLists # Obtaining ENTREZ identifiers for the gene universe of humans: library(org.Hs.eg.db) humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID") # Computing the equivalence test: # equivTestSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger, # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", # onto = "BP", GOLevel = 3) # Bootstrap instead of normal approximation test: # equivTestSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger, # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", # onto = "BP", GOLevel = 3, # boot = TRUE) # Essentially, the above code makes: # ctab_atlas.sanger_BP3 <- buildEnrichTable(allOncoGeneLists$atlas, allOncoGeneLists$sanger, # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", # onto = "BP", GOLevel = 3) # ctab_atlas.sanger_BP3 # equivTestSorensen(ctab_atlas.sanger_BP3) # equivTestSorensen(ctab_atlas.sanger_BP3, boot = TRUE) # (Note that building first the contingency table may be advantageous to save time!) 
# The object tab_atlas.sanger_BP3 and ctab_atlas.sanger_BP3 are exactly the same # All pairwise equivalence tests: # equivTestSorensen(allOncoGeneLists, # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", # onto = "BP", GOLevel = 3) # Equivalence test on a contingency table represented as a numeric vector: equivTestSorensen(c(56, 1, 30, 47)) equivTestSorensen(c(56, 1, 30, 47), boot = TRUE) equivTestSorensen(c(56, 1, 30)) # Error: all frequencies are needed for bootstrap: try(equivTestSorensen(c(56, 1, 30), boot = TRUE), TRUE) > equiv.atlas.sanger <- equivTestSorensen(tab_atlas.sanger_BP3) > equiv.atlas.sanger Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab_atlas.sanger_BP3 (d - d0) / se = -2.801, p-value = 0.002547 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3859598 sample estimates: Sorensen dissimilarity 0.3027523 attr(,"se") standard error 0.05058655 > getTable(equiv.atlas.sanger) Enriched in sanger Enriched in atlas TRUE FALSE TRUE 38 31 FALSE 2 452 > getPvalue(equiv.atlas.sanger) p-value 0.002547349 > > tryCatch(equivTestSorensen(badConti), error = function(e) {return(e)}) <simpleError in nice2x2Table.table(x): Not a 2x2 table> > tryCatch(equivTestSorensen(incompleteConti), error = function(e) {return(e)}) <simpleError in nice2x2Table.table(x): Not a 2x2 table> > equivTestSorensen(contiAsVector) Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: contiAsVector (d - d0) / se = 3.5287, p-value = 0.9998 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6937071 sample estimates: Sorensen dissimilarity 0.6144578 attr(,"se") standard error 0.04818012 > equivTestSorensen(contiAsVector.mat) Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: contiAsVector.mat (d - d0) / se = 3.5287, 
p-value = 0.9998 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6937071 sample estimates: Sorensen dissimilarity 0.6144578 attr(,"se") standard error 0.04818012 > set.seed(123) > equivTestSorensen(contiAsVector.mat, boot = TRUE) Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: contiAsVector.mat (d - d0) / se = 3.5287, p-value = 0.9996 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6922658 sample estimates: Sorensen dissimilarity 0.6144578 attr(,"se") standard error 0.04818012 > equivTestSorensen(contiAsVectorLen3) Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: contiAsVectorLen3 (d - d0) / se = 3.5287, p-value = 0.9998 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6937071 sample estimates: Sorensen dissimilarity 0.6144578 attr(,"se") standard error 0.04818012 > > tryCatch(equivTestSorensen(contiAsVectorLen3, boot = TRUE), error = function(e) {return(e)}) <simpleError in equivTestSorensen.numeric(contiAsVectorLen3, boot = TRUE): Bootstraping requires a numeric vector of 4 frequencies> > > equivTestSorensen(c(0,0,0,45)) No test performed due non finite (d - d0) / se statistic data: c(0, 0, 0, 45) (d - d0) / se = NaN, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity NaN attr(,"se") standard error NaN > > # Sorensen-Dice computations from scratch, directly from gene lists > data(allOncoGeneLists) > ?allOncoGeneLists allOncoGeneLists package:goSorensen R Documentation _7 _g_e_n_e _l_i_s_t_s _p_o_s_s_i_b_l_y _r_e_l_a_t_e_d _w_i_t_h _c_a_n_c_e_r _D_e_s_c_r_i_p_t_i_o_n: An object of class "list" of length 7. 
Each one of its elements is a "character" vector of gene identifiers (e.g., ENTREZ). Only gene lists of length almost 100 were taken from their source web. Take these lists just as an illustrative example, they are not automatically updated. _U_s_a_g_e: data(allOncoGeneLists) _F_o_r_m_a_t: An object of class "list" of length 7. Each one of its elements is a "character" vector of ENTREZ gene identifiers . _S_o_u_r_c_e: <http://www.bushmanlab.org/links/genelists> > > library(org.Hs.eg.db) Loading required package: AnnotationDbi Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Loading required package: IRanges Loading required package: S4Vectors Attaching package: 'S4Vectors' The following object is masked from 'package:utils': findMatches The following objects are masked from 'package:base': I, expand.grid, unname > humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID") > # First, the mutual GO node enrichment tables are built, then computations > # proceed from these contingency tables. 
> # Building the contingency tables is a slow process (many enrichment tests) > normTest <- equivTestSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], + listNames = c("atlas", "sanger"), + onto = "BP", GOLevel = 5, + geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > normTest Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -7.6163, p-value = 1.306e-14 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3604408 sample estimates: Sorensen dissimilarity 0.3373016 attr(,"se") standard error 0.01406763 > > # To perform a bootstrap test from scratch would be even slower: > # set.seed(123) > # bootTest <- equivTestSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], > # listNames = c("atlas", "sanger"), > # boot = TRUE, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # bootTest > > # It is much faster to upgrade 'normTest' to be a bootstrap test: > set.seed(123) > bootTest <- upgrade(normTest, boot = TRUE) > bootTest Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -7.6163, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3604667 sample estimates: Sorensen dissimilarity 0.3373016 attr(,"se") standard error 0.01406763 > # To know the number of planned bootstrap replicates: > getNboot(bootTest) [1] 10000 > # To know the number of valid bootstrap replicates: > getEffNboot(bootTest) [1] 10000 > > # There are similar methods for dSorensen, seSorensen, duppSorensen, etc. to > # compute directly from a pair of gene lists. > # They are quite slow for the same reason as before (many enrichment tests). 
> # dSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], > # listNames = c("atlas", "sanger"), > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # seSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], > # listNames = c("atlas", "sanger"), > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # > # duppSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], > # listNames = c("atlas", "sanger"), > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # > # set.seed(123) > # duppSorensen(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], > # boot = TRUE, > # listNames = c("atlas", "sanger"), > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # etc. > > # To build the contingency table first and then compute from it, may be a more flexible > # and saving time strategy, in general: > ?buildEnrichTable buildEnrichTable package:goSorensen R Documentation _C_r_e_a_t_e_s _a _2_x_2 _e_n_r_i_c_h_m_e_n_t _c_o_n_t_i_n_g_e_n_c_y _t_a_b_l_e _f_r_o_m _t_w_o _g_e_n_e _l_i_s_t_s, _o_r _a_l_l _p_a_i_r_w_i_s_e _c_o_n_t_i_n_g_e_n_c_y _t_a_b_l_e_s _f_o_r _a "_l_i_s_t" _o_f _g_e_n_e _l_i_s_t_s. _D_e_s_c_r_i_p_t_i_o_n: Creates a 2x2 enrichment contingency table from two gene lists, or all pairwise contingency tables for a "list" of gene lists. _U_s_a_g_e: buildEnrichTable(x, ...) ## Default S3 method: buildEnrichTable( x, y, listNames = c("gene.list1", "gene.list2"), check.table = TRUE, geneUniverse, orgPackg, onto, GOLevel, showEnrichedIn = TRUE, pAdjustMeth = "BH", pvalCutoff = 0.01, qvalCutoff = 0.05, parallel = FALSE, nOfCores = 1, ... 
) ## S3 method for class 'character' buildEnrichTable( x, y, listNames = c("gene.list1", "gene.list2"), geneUniverse, orgPackg, onto, GOLevel, showEnrichedIn = TRUE, check.table = TRUE, pAdjustMeth = "BH", pvalCutoff = 0.01, qvalCutoff = 0.05, parallel = FALSE, nOfCores = 1, ... ) ## S3 method for class 'list' buildEnrichTable( x, check.table = TRUE, geneUniverse, orgPackg, onto, GOLevel, showEnrichedIn = TRUE, pAdjustMeth = "BH", pvalCutoff = 0.01, qvalCutoff = 0.05, parallel = FALSE, nOfCores = min(detectCores() - 1, length(x) - 1), ... ) _A_r_g_u_m_e_n_t_s: x: either an object of class "character" (or coerzable to "character") representing a vector of gene identifiers (e.g., ENTREZ) or an object of class "list". In this second case, each element of the list must be a "character" vector of gene identifiers (e.g., ENTREZ). Then, all pairwise contingency tables between these gene lists are built. ...: Additional parameters for internal use (not used for the moment) y: an object of class "character" (or coerzable to "character") representing a vector of gene identifiers (e.g., ENTREZ). listNames: a character(2) with the gene lists names originating the cross-tabulated enrichment frequencies. Only in the "character" or default interface. check.table: Logical The resulting table must be checked. Defaults to TRUE. geneUniverse: character vector containing the universe of genes from where gene lists have been extracted. This vector must be obtained from the annotation package declared in 'orgPackg'. For more details see README File. orgPackg: A string with the name of the genomic annotation package corresponding to a specific species to be analyzed, which must be previously installed and activated. For more details see README File. onto: string describing the ontology. Either "BP", "MF" or "CC". GOLevel: An integer, the GO ontology level. showEnrichedIn: Boolean. 
If TRUE (default), the cross-table of enriched and non-enriched GO terms vs Gene Lists names (obtained from the 'enrichedIn' function) is automatically saved in the Global Environment. pAdjustMeth: string describing the adjust method, either "BH", "BY" or "Bonf", defaults to 'BH'. pvalCutoff: adjusted pvalue cutoff on enrichment tests to report qvalCutoff: qvalue cutoff on enrichment tests to report as significant. Tests must pass i) pvalueCutoff on unadjusted pvalues, ii) pvalueCutoff on adjusted pvalues and iii) qvalueCutoff on qvalues to be reported parallel: Logical. Defaults to FALSE but put it at TRUE for parallel computation. nOfCores: Number of cores for parallel computations. Only in "list" interface. _D_e_t_a_i_l_s: The arguments 'parallel' and 'nOfCores' are ignored in the 'default' and "character" interfaces, but included for possible future developments; they only apply to the "list" interface. In the "list" interface, 'parallel' defaults to FALSE but there is the possibility of some time saving when the number of gene lists (the length of 'x' in the "list" interface) is high. The trade off between the time spent initializing parallel computing and the possible time gain due to parallelization must be considered in each application and computer. _V_a_l_u_e: in the "character" interface, an object of class "table". It represents a 2x2 contingency table, the cross-tabulation of the enriched GO terms in two gene lists: "Number of enriched GO terms in list 1 (TRUE, FALSE)" x "Number of enriched Go terms in list 2 (TRUE, FALSE)". In the "list" interface, the result is an object of class "tableList" with all pairwise tables. Class "tableList" corresponds to objects representing all mutual enrichment contingency tables generated in a pairwise fashion: Given gene lists (i.e. 
"character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure: $l2 $l2$l1$t(2,1) $l3 $l3$l1$t(3,1), $l3$l2$t(3,2) ... $lk $lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)t(K,k-1) _M_e_t_h_o_d_s (_b_y _c_l_a_s_s): • 'buildEnrichTable(default)': S3 default method • 'buildEnrichTable(character)': S3 method for class "character" • 'buildEnrichTable(list)': S3 method for class "list" _E_x_a_m_p_l_e_s: # Obtaining ENTREZ identifiers for the gene universe of humans: library(org.Hs.eg.db) humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID") # Gene lists to be explored for enrichment: data(allOncoGeneLists) ?allOncoGeneLists # Table of joint GO term enrichment between gene lists Vogelstein and sanger, # for ontology MF at GO level 6. vog.VS.sang <- buildEnrichTable(allOncoGeneLists[["Vogelstein"]], allOncoGeneLists[["sanger"]], geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", onto = "MF", GOLevel = 6, listNames = c("Vogelstein", "sanger")) vog.VS.sang # All tables of mutual enrichment: all.tabs <- buildEnrichTable(allOncoGeneLists, geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", onto = "MF", GOLevel = 6) all.tabs$waldman > tab <- buildEnrichTable(allOncoGeneLists[["atlas"]], allOncoGeneLists[["sanger"]], + listNames = c("atlas", "sanger"), + onto = "BP", GOLevel = 5, + geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > > tab Enriched in sanger Enriched in atlas TRUE FALSE TRUE 501 458 FALSE 52 9463 > > # (Here, an obvious faster possibility would be to recover the enrichment contingency > # table from the previous normal test result:) > tab <- getTable(normTest) > tab Enriched in sanger Enriched in atlas TRUE FALSE TRUE 501 458 FALSE 52 9463 > > tst <- equivTestSorensen(tab) > tst Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se 
= -7.6163, p-value = 1.306e-14 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3604408 sample estimates: Sorensen dissimilarity 0.3373016 attr(,"se") standard error 0.01406763 > set.seed(123) > bootTst <- equivTestSorensen(tab, boot = TRUE) > bootTst Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -7.6163, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3604667 sample estimates: Sorensen dissimilarity 0.3373016 attr(,"se") standard error 0.01406763 > > dSorensen(tab) [1] 0.3373016 > seSorensen(tab) [1] 0.01406763 > # or: > getDissimilarity(tst) Sorensen dissimilarity 0.3373016 attr(,"se") standard error 0.01406763 > > duppSorensen(tab) [1] 0.3604408 > getUpper(tst) dUpper 0.3604408 > > set.seed(123) > duppSorensen(tab, boot = TRUE) [1] 0.3604667 attr(,"eff.nboot") [1] 10000 > getUpper(bootTst) dUpper 0.3604667 > > # To perform from scratch all pairwise tests (or other Sorensen-Dice computations) > # is even much slower. For example, all pairwise... 
> # Dissimilarities: > # # allPairDiss <- dSorensen(allOncoGeneLists, > # # onto = "BP", GOLevel = 5, > # # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # # allPairDiss > # > # # Still time consuming but faster: build all tables computing in parallel: > # allPairDiss <- dSorensen(allOncoGeneLists, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", > # parallel = TRUE) > # allPairDiss > > # Standard errors: > # seSorensen(allOncoGeneLists, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # > # Upper confidence interval limits: > # duppSorensen(allOncoGeneLists, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # All pairwise asymptotic normal tests: > # allTests <- equivTestSorensen(allOncoGeneLists, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # getPvalue(allTests, simplify = FALSE) > # getPvalue(allTests) > # p.adjust(getPvalue(allTests), method = "holm") > # To perform all pairwise bootstrap tests from scratch is (slightly) > # even more time consuming: > # set.seed(123) > # allBootTests <- equivTestSorensen(allOncoGeneLists, > # boot = TRUE, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # Not all bootstrap replicates may conduct to finite statistics: > # getNboot(allBootTests) > > # Given the normal tests (object 'allTests'), it is much faster to upgrade > # it to have the bootstrap tests: > # set.seed(123) > # allBootTests <- upgrade(allTests, boot = TRUE) > # getPvalue(allBootTests, simplify = FALSE) > > # Again, the faster and more flexible possibility may be: > # 1) First, build all pairwise enrichment contingency tables (slow first step): > # allTabsBP.4 <- buildEnrichTable(allOncoGeneLists, > # onto = "BP", GOLevel = 5, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db") > # allTabsBP.4 > > # Better, directly 
use the dataset available at this package, goSorensen: > data(allTabsBP.4) > allTabsBP.4 $cangenes $cangenes$atlas Enriched in atlas Enriched in cangenes TRUE FALSE TRUE 0 0 FALSE 420 3383 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $cis $cis$atlas Enriched in atlas Enriched in cis TRUE FALSE TRUE 80 3 FALSE 340 3380 $cis$cangenes Enriched in cangenes Enriched in cis TRUE FALSE TRUE 0 83 FALSE 0 3720 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $miscellaneous $miscellaneous$atlas Enriched in atlas Enriched in miscellaneous TRUE FALSE TRUE 198 21 FALSE 222 3362 $miscellaneous$cangenes Enriched in cangenes Enriched in miscellaneous TRUE FALSE TRUE 0 219 FALSE 0 3584 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $miscellaneous$cis Enriched in cis Enriched in miscellaneous TRUE FALSE TRUE 70 149 FALSE 13 3571 $sanger $sanger$atlas Enriched in atlas Enriched in sanger TRUE FALSE TRUE 209 24 FALSE 211 3359 $sanger$cangenes Enriched in cangenes Enriched in sanger TRUE FALSE TRUE 0 233 FALSE 0 3570 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $sanger$cis Enriched in cis Enriched in sanger TRUE FALSE TRUE 68 165 FALSE 15 3555 $sanger$miscellaneous Enriched in miscellaneous Enriched in sanger TRUE FALSE TRUE 151 82 FALSE 68 3502 $Vogelstein $Vogelstein$atlas Enriched in atlas Enriched in Vogelstein TRUE FALSE TRUE 220 32 FALSE 200 3351 $Vogelstein$cangenes Enriched in cangenes Enriched in Vogelstein TRUE FALSE TRUE 0 252 FALSE 0 3551 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $Vogelstein$cis Enriched in cis Enriched in Vogelstein TRUE FALSE TRUE 68 184 FALSE 15 3536 $Vogelstein$miscellaneous Enriched in miscellaneous Enriched in Vogelstein TRUE FALSE TRUE 156 96 FALSE 63 3488 $Vogelstein$sanger Enriched in sanger Enriched in Vogelstein TRUE FALSE TRUE 217 35 FALSE 16 3535 $waldman $waldman$atlas Enriched in atlas Enriched in waldman TRUE FALSE TRUE 264 39 FALSE 156 3344 $waldman$cangenes Enriched in cangenes Enriched in waldman TRUE FALSE TRUE 0 303 FALSE 0 3500 
attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 $waldman$cis Enriched in cis Enriched in waldman TRUE FALSE TRUE 77 226 FALSE 6 3494 $waldman$miscellaneous Enriched in miscellaneous Enriched in waldman TRUE FALSE TRUE 203 100 FALSE 16 3484 $waldman$sanger Enriched in sanger Enriched in waldman TRUE FALSE TRUE 181 122 FALSE 52 3448 $waldman$Vogelstein Enriched in Vogelstein Enriched in waldman TRUE FALSE TRUE 192 111 FALSE 60 3440 attr(,"onto") [1] "BP" attr(,"GOLevel") [1] 4 attr(,"class") [1] "tableList" "list" > class(allTabsBP.4) [1] "tableList" "list" > # 2) Then perform all required computatios from these enrichment contingency tables... > # All pairwise tests: > allTests <- equivTestSorensen(allTabsBP.4) > allTests $cangenes $cangenes$atlas No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $cis $cis$atlas Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = 8.807, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.7262589 sample estimates: Sorensen dissimilarity 0.6819085 attr(,"se") standard error 0.02696312 $cis$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $miscellaneous $miscellaneous$atlas Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -2.8406, p-value = 0.002252 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 
0.4174355 sample estimates: Sorensen dissimilarity 0.3802817 attr(,"se") standard error 0.02258792 $miscellaneous$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $miscellaneous$cis Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = 2.5804, p-value = 0.9951 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.5950555 sample estimates: Sorensen dissimilarity 0.5364238 attr(,"se") standard error 0.03564549 $sanger $sanger$atlas Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -3.8566, p-value = 5.748e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3959452 sample estimates: Sorensen dissimilarity 0.3598775 attr(,"se") standard error 0.02192764 $sanger$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $sanger$cis Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = 3.5799, p-value = 0.9998 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6271347 sample estimates: Sorensen dissimilarity 0.5696203 attr(,"se") standard error 0.03496631 $sanger$miscellaneous Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -4.3974, p-value = 
5.479e-06 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3739718 sample estimates: Sorensen dissimilarity 0.3318584 attr(,"se") standard error 0.02560313 $Vogelstein $Vogelstein$atlas Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -4.6585, p-value = 1.593e-06 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3802668 sample estimates: Sorensen dissimilarity 0.3452381 attr(,"se") standard error 0.02129595 $Vogelstein$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $Vogelstein$cis Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = 4.4076, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6498536 sample estimates: Sorensen dissimilarity 0.5940299 attr(,"se") standard error 0.03393844 $Vogelstein$miscellaneous Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -4.2339, p-value = 1.148e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3790962 sample estimates: Sorensen dissimilarity 0.3375796 attr(,"se") standard error 0.02524032 $Vogelstein$sanger Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -23.128, p-value < 2.2e-16 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.1292852 sample estimates: Sorensen 
dissimilarity 0.1051546 attr(,"se") standard error 0.01467036 $waldman $waldman$atlas Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -9.3848, p-value < 2.2e-16 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3003348 sample estimates: Sorensen dissimilarity 0.2697095 attr(,"se") standard error 0.01861884 $waldman$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $waldman$cis Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = 4.9573, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6529946 sample estimates: Sorensen dissimilarity 0.6010363 attr(,"se") standard error 0.03158842 $waldman$miscellaneous Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -11.029, p-value < 2.2e-16 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.2553636 sample estimates: Sorensen dissimilarity 0.2222222 attr(,"se") standard error 0.02014852 $waldman$sanger Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / se = -5.1402, p-value = 1.372e-07 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3629683 sample estimates: Sorensen dissimilarity 0.3246269 attr(,"se") standard error 0.02330993 $waldman$Vogelstein Normal asymptotic test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity data: tab (d - d0) / 
se = -6.0739, p-value = 6.243e-10 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.000000 0.345029 sample estimates: Sorensen dissimilarity 0.3081081 attr(,"se") standard error 0.02244631 attr(,"class") [1] "equivSDhtestList" "list" > class(allTests) [1] "equivSDhtestList" "list" > set.seed(123) > allBootTests <- equivTestSorensen(allTabsBP.4, boot = TRUE) > allBootTests $cangenes $cangenes$atlas No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $cis $cis$atlas Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = 8.807, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.000000 0.725535 sample estimates: Sorensen dissimilarity 0.6819085 attr(,"se") standard error 0.02696312 $cis$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $miscellaneous $miscellaneous$atlas Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -2.8406, p-value = 0.004 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.000000 0.418077 sample estimates: Sorensen dissimilarity 0.3802817 attr(,"se") standard error 0.02258792 $miscellaneous$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true 
equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $miscellaneous$cis Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = 2.5804, p-value = 0.9933 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.5953221 sample estimates: Sorensen dissimilarity 0.5364238 attr(,"se") standard error 0.03564549 $sanger $sanger$atlas Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -3.8566, p-value = 3e-04 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3960626 sample estimates: Sorensen dissimilarity 0.3598775 attr(,"se") standard error 0.02192764 $sanger$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $sanger$cis Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = 3.5799, p-value = 0.9996 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6278561 sample estimates: Sorensen dissimilarity 0.5696203 attr(,"se") standard error 0.03496631 $sanger$miscellaneous Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -4.3974, p-value = 2e-04 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3765829 sample estimates: Sorensen dissimilarity 0.3318584 
attr(,"se") standard error 0.02560313 $Vogelstein $Vogelstein$atlas Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -4.6585, p-value = 2e-04 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3809157 sample estimates: Sorensen dissimilarity 0.3452381 attr(,"se") standard error 0.02129595 $Vogelstein$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $Vogelstein$cis Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = 4.4076, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6489965 sample estimates: Sorensen dissimilarity 0.5940299 attr(,"se") standard error 0.03393844 $Vogelstein$miscellaneous Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -4.2339, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3798191 sample estimates: Sorensen dissimilarity 0.3375796 attr(,"se") standard error 0.02524032 $Vogelstein$sanger Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -23.128, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.1312585 sample estimates: Sorensen dissimilarity 0.1051546 attr(,"se") standard error 0.01467036 $waldman $waldman$atlas Bootstrap test for 2x2 
contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -9.3848, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3006583 sample estimates: Sorensen dissimilarity 0.2697095 attr(,"se") standard error 0.01861884 $waldman$cangenes No test performed due not finite (d - d0) / se statistic data: tab (d - d0) / se = Inf, p-value = NA alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0 NaN sample estimates: Sorensen dissimilarity 1 attr(,"se") standard error 0 $waldman$cis Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = 4.9573, p-value = 1 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.6525683 sample estimates: Sorensen dissimilarity 0.6010363 attr(,"se") standard error 0.03158842 $waldman$miscellaneous Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -11.029, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.2577849 sample estimates: Sorensen dissimilarity 0.2222222 attr(,"se") standard error 0.02014852 $waldman$sanger Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / se = -5.1402, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3639666 sample estimates: Sorensen dissimilarity 0.3246269 attr(,"se") standard error 0.02330993 $waldman$Vogelstein Bootstrap test for 2x2 contingency tables based on the Sorensen-Dice dissimilarity (10000 bootstrap replicates) data: tab (d - d0) / 
se = -6.0739, p-value = 9.999e-05 alternative hypothesis: true equivalence limit d0 is less than 0.4444444 95 percent confidence interval: 0.0000000 0.3471766 sample estimates: Sorensen dissimilarity 0.3081081 attr(,"se") standard error 0.02244631 attr(,"class") [1] "equivSDhtestList" "list" > class(allBootTests) [1] "equivSDhtestList" "list" > getPvalue(allBootTests, simplify = FALSE) atlas cangenes cis miscellaneous sanger Vogelstein atlas 0.00000000 NaN 1.0000000 0.00399960 0.00029997 0.00019998 cangenes NaN 0 NaN NaN NaN NaN cis 1.00000000 NaN 0.0000000 0.99330067 0.99960004 1.00000000 miscellaneous 0.00399960 NaN 0.9933007 0.00000000 0.00019998 0.00009999 sanger 0.00029997 NaN 0.9996000 0.00019998 0.00000000 0.00009999 Vogelstein 0.00019998 NaN 1.0000000 0.00009999 0.00009999 0.00000000 waldman 0.00009999 NaN 1.0000000 0.00009999 0.00009999 0.00009999 waldman atlas 9.999e-05 cangenes NaN cis 1.000e+00 miscellaneous 9.999e-05 sanger 9.999e-05 Vogelstein 9.999e-05 waldman 0.000e+00 > getEffNboot(allBootTests) cangenes.atlas cis.atlas cis.cangenes NaN 10000 NaN miscellaneous.atlas miscellaneous.cangenes miscellaneous.cis 10000 NaN 10000 sanger.atlas sanger.cangenes sanger.cis 10000 NaN 10000 sanger.miscellaneous Vogelstein.atlas Vogelstein.cangenes 10000 10000 NaN Vogelstein.cis Vogelstein.miscellaneous Vogelstein.sanger 10000 10000 10000 waldman.atlas waldman.cangenes waldman.cis 10000 NaN 10000 waldman.miscellaneous waldman.sanger waldman.Vogelstein 10000 10000 10000 > > # To adjust for testing multiplicity: > p.adjust(getPvalue(allBootTests), method = "holm") cangenes.atlas.p-value cis.atlas.p-value NaN 1.00000000 cis.cangenes.p-value miscellaneous.atlas.p-value NaN 0.02399760 miscellaneous.cangenes.p-value miscellaneous.cis.p-value NaN 1.00000000 sanger.atlas.p-value sanger.cangenes.p-value 0.00209979 NaN sanger.cis.p-value sanger.miscellaneous.p-value 1.00000000 0.00179982 Vogelstein.atlas.p-value Vogelstein.cangenes.p-value 0.00179982 NaN 
Vogelstein.cis.p-value Vogelstein.miscellaneous.p-value 1.00000000 0.00149985 Vogelstein.sanger.p-value waldman.atlas.p-value 0.00149985 0.00149985 waldman.cangenes.p-value waldman.cis.p-value NaN 1.00000000 waldman.miscellaneous.p-value waldman.sanger.p-value 0.00149985 0.00149985 waldman.Vogelstein.p-value 0.00149985 > > # If only partial statistics are desired: > dSorensen(allTabsBP.4) atlas cangenes cis miscellaneous sanger Vogelstein atlas 0.0000000 1 0.6819085 0.3802817 0.3598775 0.3452381 cangenes 1.0000000 0 1.0000000 1.0000000 1.0000000 1.0000000 cis 0.6819085 1 0.0000000 0.5364238 0.5696203 0.5940299 miscellaneous 0.3802817 1 0.5364238 0.0000000 0.3318584 0.3375796 sanger 0.3598775 1 0.5696203 0.3318584 0.0000000 0.1051546 Vogelstein 0.3452381 1 0.5940299 0.3375796 0.1051546 0.0000000 waldman 0.2697095 1 0.6010363 0.2222222 0.3246269 0.3081081 waldman atlas 0.2697095 cangenes 1.0000000 cis 0.6010363 miscellaneous 0.2222222 sanger 0.3246269 Vogelstein 0.3081081 waldman 0.0000000 > duppSorensen(allTabsBP.4) atlas cangenes cis miscellaneous sanger Vogelstein atlas 0.0000000 NaN 0.7262589 0.4174355 0.3959452 0.3802668 cangenes NaN 0 NaN NaN NaN NaN cis 0.7262589 NaN 0.0000000 0.5950555 0.6271347 0.6498536 miscellaneous 0.4174355 NaN 0.5950555 0.0000000 0.3739718 0.3790962 sanger 0.3959452 NaN 0.6271347 0.3739718 0.0000000 0.1292852 Vogelstein 0.3802668 NaN 0.6498536 0.3790962 0.1292852 0.0000000 waldman 0.3003348 NaN 0.6529946 0.2553636 0.3629683 0.3450290 waldman atlas 0.3003348 cangenes NaN cis 0.6529946 miscellaneous 0.2553636 sanger 0.3629683 Vogelstein 0.3450290 waldman 0.0000000 > seSorensen(allTabsBP.4) atlas cangenes cis miscellaneous sanger atlas 0.00000000 0 0.02696312 0.02258792 0.02192764 cangenes 0.00000000 0 0.00000000 0.00000000 0.00000000 cis 0.02696312 0 0.00000000 0.03564549 0.03496631 miscellaneous 0.02258792 0 0.03564549 0.00000000 0.02560313 sanger 0.02192764 0 0.03496631 0.02560313 0.00000000 Vogelstein 0.02129595 0 0.03393844 0.02524032 
0.01467036 waldman 0.01861884 0 0.03158842 0.02014852 0.02330993 Vogelstein waldman atlas 0.02129595 0.01861884 cangenes 0.00000000 0.00000000 cis 0.03393844 0.03158842 miscellaneous 0.02524032 0.02014852 sanger 0.01467036 0.02330993 Vogelstein 0.00000000 0.02244631 waldman 0.02244631 0.00000000 > > > # Tipically, in a real study it would be interesting to scan tests > # along some ontologies and levels inside these ontologies: > # (which obviously will be a quite slow process) > # gc() > # set.seed(123) > # allBootTests_BP_MF_lev4to8 <- allEquivTestSorensen(allOncoGeneLists, > # boot = TRUE, > # geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db", > # ontos = c("BP", "MF"), GOLevels = 4:8) > # getPvalue(allBootTests_BP_MF_lev4to8) > # getEffNboot(allBootTests_BP_MF_lev4to8) > > proc.time() user system elapsed 135.209 11.673 147.279
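As a reading aid for the output above, the `cis`/`atlas` entries can be reproduced by hand. This is only a sketch: it assumes the standard Sorensen-Dice dissimilarity for a 2x2 enrichment table and a one-sided normal upper limit at the 95% level. The symbols $n_{11}$ (terms enriched in both lists), $n_{10}$ and $n_{01}$ (terms enriched in only one list) and $z_{0.95}$ are notation introduced here, and the standard error is taken as printed rather than derived.

```latex
\begin{align*}
\hat d &= \frac{n_{10}+n_{01}}{2n_{11}+n_{10}+n_{01}}
        = \frac{3+340}{2\cdot 80+3+340}
        = \frac{343}{503} \approx 0.6819085,\\[4pt]
\frac{\hat d - d_0}{\widehat{se}} &= \frac{0.6819085-0.4444444}{0.02696312} \approx 8.807,\\[4pt]
\hat d + z_{0.95}\,\widehat{se} &= 0.6819085 + 1.644854 \times 0.02696312 \approx 0.7262589.
\end{align*}
```

Under the same assumed formula, every pair involving `cangenes` has $n_{11} = 0$ (for `cangenes`/`atlas`, $\hat d = 420/420 = 1$). With a printed standard error of 0, the statistic is not finite, which is consistent with the "No test performed" entries.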
goSorensen.Rcheck/goSorensen-Ex.timings
name | user (s) | system (s) | elapsed (s) |
---|---|---|---|
allBuildEnrichTable | 0.000 | 0.001 | 0.001 |
allEquivTestSorensen | 0.304 | 0.029 | 0.336 |
allHclustThreshold | 0.058 | 0.003 | 0.062 |
allSorenThreshold | 0.056 | 0.005 | 0.061 |
buildEnrichTable | 51.950 | 3.434 | 55.529 |
dSorensen | 0.100 | 0.046 | 0.171 |
duppSorensen | 0.133 | 0.035 | 0.173 |
enrichedIn | 41.518 | 3.292 | 44.913 |
equivTestSorensen | 0.423 | 0.014 | 0.437 |
getDissimilarity | 0.252 | 0.098 | 0.360 |
getEffNboot | 1.301 | 0.079 | 1.384 |
getNboot | 1.301 | 0.044 | 1.350 |
getPvalue | 0.244 | 0.090 | 0.342 |
getSE | 0.242 | 0.091 | 0.344 |
getTable | 0.269 | 0.122 | 0.399 |
getUpper | 0.247 | 0.094 | 0.350 |
hclustThreshold | 170.699 | 14.852 | 185.938 |
nice2x2Table | 0.002 | 0.001 | 0.003 |
seSorensen | 0.001 | 0.002 | 0.003 |
sorenThreshold | 0.248 | 0.015 | 0.261 |
upgrade | 0.545 | 0.191 | 0.746 |