CHANGES IN VERSION 1.9.2
--------------------------

BUG FIXES:

    o Remove function overlap_GO because the VennDiagram package has issues with condition tests on vectors of length greater than 1.

CHANGES IN VERSION 1.9.1
--------------------------

BUG FIXES:

    o Remove duplicated plot.title argument in ggplot call.

GENERAL UPDATES:

    o Minor code cleaning.

CHANGES IN VERSION 1.7.1
--------------------------

GENERAL UPDATES:

    o Updated email address.

CHANGES IN VERSION 1.5.5
--------------------------

BUG FIX:

    o Function subEset also removes empty levels in factors not mentioned in the subset argument. Necessary for randomForest, as the factor of interest cannot have empty levels.

CHANGES IN VERSION 1.5.4
--------------------------

NEW FEATURES:

    o plot_design supports the subset argument to restrict the plot to certain subsets of samples.

CHANGES IN VERSION 1.5.3
--------------------------

GENERAL UPDATES:

    o Clarify that GOexpress is not a conventional gene set enrichment analysis tool. It is based on ranking, not enrichment.

CHANGES IN VERSION 1.5.2
--------------------------

BUG FIX:

    o Rename "namespace" column to "namespace_1003", not "namespace_1006".

CHANGES IN VERSION 1.5.1
--------------------------

GENERAL UPDATES:

    o Updated maintainer email address.

CHANGES IN VERSION 1.3.2
--------------------------

BUG FIX:

    o expression_plot() functions return an error if an empty gene name is given.

CHANGES IN VERSION 1.3.1
--------------------------

BUG FIX:

    o heatmap_GO() function no longer automatically resizes margins when using either gene names or gene identifiers. The automatic resizing caused the user-defined margins to be systematically ignored.

CHANGES IN VERSION 1.1.12
--------------------------

NEW FEATURES:

    o Summarisation function of scores and rank from feature-level to ontology-level also affects the calculation of P-values by pValue_GO. The default behaviour is to use the same function as used in the call to GO_analyse, stored in the result object.
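
NOTE (illustration for the 1.1.12 and 1.1.11 entries): the sketch below shows, in base R only, how a summarisation function might collapse feature-level scores into one ontology-level value, and how permuting gene labels can give an empirical P-value for that value. It does not call GOexpress. The helper perm_pvalue(), the toy scores, and the "proportion of permutations scoring at least as high" definition are assumptions made for illustration, not the package's actual implementation.

```r
set.seed(1)  # make the permutations reproducible

# Hypothetical feature-level scores: 20 genes, the first 5 annotated to one GO term.
scores <- c(rep(0.9, 5), rep(0.1, 15))
names(scores) <- paste0("gene", seq_along(scores))
term_genes <- names(scores)[1:5]

# Hypothetical helper (not a GOexpress function): permutation P-value for one term.
# FUN is the summarisation function; it must accept numeric values and
# return a single numeric value (for example mean or median).
perm_pvalue <- function(scores, term_genes, FUN = mean, n_perm = 1000) {
    observed <- FUN(scores[term_genes])
    stopifnot(is.numeric(observed), length(observed) == 1)
    permuted <- replicate(n_perm, {
        shuffled <- scores
        names(shuffled) <- sample(names(scores))  # permute the gene labels
        FUN(shuffled[term_genes])
    })
    # Proportion of permutations scoring at least as high as the observed value.
    mean(permuted >= observed)
}

p_mean <- perm_pvalue(scores, term_genes)                  # default summary: mean
p_median <- perm_pvalue(scores, term_genes, FUN = median)  # user-supplied summary
```

The same FUN is used for the observed and the permuted values, which mirrors the idea that the P-value calculation should reuse the summarisation function chosen for the analysis.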

CHANGES IN VERSION 1.1.11
--------------------------

NEW FEATURES:

    o Summarisation function of scores and rank from feature-level to ontology-level can be overridden from the default "mean" to any function specified by the user. The function needs to support a list of numeric values as an input, and return a single numeric value as an output.

CHANGES IN VERSION 1.1.10
--------------------------

BUG FIX:

    o margins argument in heatmap_GO() was not used in the call to heatmap.2().

CHANGES IN VERSION 1.1.9
--------------------------

TYPOS:

    o Missing space character in error message.

CHANGES IN VERSION 1.1.8
--------------------------

TYPOS:

    o Simone Coughlan, not Simone Coughland. Apologies!

CHANGES IN VERSION 1.1.7
--------------------------

BUG FIXES:

    o Changed the rank of annotated features absent from the given ExpressionSet to (number of features in ExpressionSet + 1) instead of max(rank) + 1 to match the documentation and the desired behaviour. This follows the logic of giving the same rank R to N tied-scored features, while the next feature receives a rank of R + N.

CHANGES IN VERSION 1.1.6
--------------------------

BUG FIXES:

    o overlap_GO() was crashing for 3-group Venn diagrams, unless the VennDiagram package was manually loaded in the workspace using library(VennDiagram). The function will now run seamlessly without that manual step, as loading GOexpress will immediately load VennDiagram in the workspace (stated as a dependency in the DESCRIPTION file).

NEW FEATURES:

    o New function pValue_GO() allows calculation of a P-value for each ontology using permutation of gene labels. This allows users to estimate the chance of seeing a GO term reach a particular rank (or score). Features a fancy progress bar shamelessly adapted from StackOverflow.
    o heatmap_GO now semi-automatically resizes the bottom and right margins to accommodate large gene and sample labels, respectively. The user may control those margins using the "margins" argument of the function.
    o heatmap_GO default call now shows the gene feature identifier for those missing an annotated gene name, when gene names are requested (also the default).
    o A rank.by slot is now created by the GO_analyse() function in the result object to state the metric used to order the result tables.
    o A filters.GO slot stating the filters and cutoffs applied to the result object is now created or updated by successive uses of the subset_scores() function. Warnings and notes are displayed if conflicting filters and cutoffs are applied on a previously filtered result object.
    o rerank() function now supports re-ordering by P-value. Note that this is only applicable to the output of the pValue_GO() function mentioned above.
    o rerank() function now updates the rank.by slot of the result object to state the current ordering metric.
    o subset_scores() function now allows filtering by P-value. Note that this is only applicable to the output of the pValue_GO() function mentioned above.
    o Backward compatibility with Ensembl annotation releases 75 and earlier, which used 'external_gene_id'; this was renamed to 'external_gene_name' in releases 76 and later.
    o table_genes() function defaults to sorting genes by decreasing score (equivalent to increasing rank). Gene feature name and identifier are supported as alternative sorting criteria.

UPDATED FEATURES:

    o Allow user to override row_labels in heatmap_GO. This way, the color-coding of the samples can be kept, while a better description of the samples can be used to label them, instead of the phenodata values.
    o In heatmap_GO(), if the labRow argument is of length 1, it is assumed to be the name of a column in the phenoData slot. Useful to re-label subsetted ExpressionSet objects.

GENERAL UPDATES:

    o Updated the AlvMac training dataset to include 'RPL36A', an example of multiple Ensembl gene identifiers annotated to the same gene name.
    o Updated the AlvMac example custom annotations to match the updated dataset.
    o Updated the example AlvMac_results to match the updated dataset.
    o Set the random seed prior to running the GO_analyse() example in the vignette. Hopefully, this should allow reproducible testing by the users.
    o In User Guide, load package before loading the attached data.
    o In User Guide, new sections and examples dealing with the re-labelling of heatmap samples, the use of P-values, and the re-ranking and subsetting of results using P-values. New sub-sections for clarity. Emphasis on the use and generation of local annotation, rather than use of the current online Ensembl annotation release.
    o No more code connecting to the Ensembl server in any of the help files and User Guide.
    o Help page examples with more consistent indentation of code.

CHANGES IN VERSION 1.1.5
--------------------------

BUG FIXES:

    o Forgot to commit image file of shiny screenshot in release 1.1.5

CHANGES IN VERSION 1.1.4
--------------------------

NEW FEATURES:

    o Custom annotations can be provided to the GO_analyse function using three new arguments: "GO_genes", "all_GO", and "all_genes". See below for individual descriptions.
    o The GO_genes argument allows the user to provide associations between feature identifiers in the ExpressionSet and gene ontology identifiers. This will skip all calls to the Ensembl BioMart server. Consequently, we recommend the use of the "all_GO" and "all_genes" arguments whenever "GO_genes" is used. Otherwise, some downstream functions may not work as expected. For instance, the expression_plot, expression_profiles, and heatmap_GO functions can be used to generate plots, although lacking gene and gene ontology names. See below for more details.
    o The all_GO argument allows the user to provide the name and namespace ("biological_process", "molecular_function", or "cellular_component") corresponding to gene ontology identifiers. This enables subsequent filtering of result tables by namespace, and annotation of heatmaps with the name of the gene ontology.
    o The all_genes argument allows the user to provide the name and description corresponding to feature identifiers in the ExpressionSet. This enables annotation of expression plots with the gene name associated with the feature.

UPDATED FEATURES:

    o In the GO_analyse function, the eSet argument is now formally checked to be of class ExpressionSet prior to any calculation. If not, the function returns an appropriate error message.
    o The error messages caused when the user gives a name that does not exist in the phenoData slot of the ExpressionSet were updated to use the word "column", rather than "factor".

GENERAL UPDATES:

    o Updated the help page for function GO_analyse to describe the new features described above and provide a code example.
    o Used the BiocStyle package to format the vignette.
    o Added a new section in the vignette to document the new features described above.
    o Added a new section in the vignette to mention the creation of shiny applications using the output of GOexpress. Included a screenshot of a shiny application developed from the original AlvMac dataset.
    o Added a new section in the vignette to highlight the availability of a "subset" argument to avoid the need for additional ExpressionSet objects to analyse or visualise subsets of samples.

CHANGES IN VERSION 1.1.3
--------------------------

NEW FEATURES:

    o Included private function used to generate the prefix2dataset table.
    o Included private function used to generate the microarray2dataset table.
    o GO_analyse states the number of gene features in the given dataset that were found in the BioMart dataset. This allows the user to interrupt the script if a suspiciously low number of mapped features suggests an incorrect BioMart dataset was used.

UPDATED FEATURES:

    o Updated the prefix2dataset table. Three more species ("Chlorocebus sabaeus", "Papio anubis", and "Poecilia formosa"), and two more columns ("species" and "sample").
    o Updated the microarray2dataset table.
      Probeset patterns are now hard-coded in the package, but dynamically defined as unique to a platform (or not) by the function building the microarray2dataset table. Species without microarray platforms were also included in the table with NA values. Code was added in the GO_analyse method to ignore those before trying to resolve the origin of the expression data, if not specified by the user.
    o Renamed column "prefix" to "pattern" in microarray2dataset table.
    o Disabled manual check of C. elegans and D. melanogaster, as the automated detection is doing the exact same thing. Only S. cerevisiae requires a manual check of the "Y" prefix instead of the automatically extracted "Y[:LETTERS:]{2}" pattern.
    o Methods expression_plot_symbol and expression_profiles_symbol now return the list of gene feature identifiers instead of NULL when multiple plots are produced. That may be confusing, as they return the list of close matches when an invalid gene symbol is given, and they return the ggplot when only one plot is produced.

GENERAL UPDATES:

    o Code layout. Lines of code over 80 characters were split around brackets and commas to use the built-in indentation defaulted to 4 space characters. Plus, this made the code more readable in many cases.

CHANGES IN VERSION 1.1.2
--------------------------

GENERAL UPDATES:

    o DESCRIPTION file minor correction: Inappropriate "Metagenomics" biocView removed.

CHANGES IN VERSION 1.1.1
--------------------------

BUG FIXES:

    o Ensembl BioMart has changed the column 'external_gene_id' to 'external_gene_name'. Updated the biomaRt queries accordingly to prevent GO_analyse() from crashing during the analysis step.
    o Updated the AlvMac_results variable containing sample results annotated with the deprecated 'external_gene_id'. It now includes the new 'external_gene_name'.

GENERAL UPDATES:

    o DESCRIPTION file minor correction.
    o User Guide's chunk about installation using biocLite is not evaluated anymore, as advised by Dan Tenenbaum.

CHANGES IN VERSION 0.99.17
--------------------------

BUG FIXES

    o NAs were introduced in the average score and rank of GO terms following analysis of microarray data (ANOVA and randomForest) for GO terms without associated features in the ExpressionSet. The problem was not found in any case for analysis of RNA-Seq data. This was causing issues in the subset_scores function and subsequent plots, such as main titles made of multiple lines and NAs. GO terms without associated probesets are now given an average score of 0 and an average rank equal to the maximum in the dataset plus 1.

CHANGES IN VERSION 0.99.16
--------------------------

GENERAL UPDATES

    o Included co-authors who participated in the generation and analysis of data used to test the package.

CHANGES IN VERSION 0.99.15
--------------------------

GENERAL UPDATES

    o Updated man page GO_analyse to describe the recently added subset slot in the result variable.

CHANGES IN VERSION 0.99.14
--------------------------

BUG FIXES

    o The anova method of the GO_analyse method had been broken since the introduction of ExpressionSet in release 0.99.4.

UPDATED FEATURES

    o The subset argument was also added to the GO_analyse method. This allows the identification of genes clustering a subset of groups, at a subset of time-points, etc. while plotting the expression profile of the entire dataset, if desired.

GENERAL UPDATES

    o Updated man page GO_analyse to allow only one example to be run. This saves time during CMD check, while making the man page more easily readable.

CHANGES IN VERSION 0.99.13
--------------------------

UPDATED FEATURES

    o All expression plots were updated to a default ylim range corresponding to the minimum and maximum expression values found in the entire ExpressionSet. This is meant to avoid misinterpretation of the amplitude of variation between sample groups, as suggested in the paper: Rougier, N.P., Droettboom, M., and Bourne, P.E. (2014). Ten simple rules for better figures.
      PLoS computational biology 10, e1003833.

CHANGES IN VERSION 0.99.12
--------------------------

UPDATED FEATURES

    o All expression plots were given new arguments for more control: xlab allows users to change the default title for the X-axis, ylim allows users to override the lower and upper boundaries of the Y-axis.

GENERAL UPDATES

    o Updated man pages AlvMac, AlvMac_results, microarray2dataset, and prefix2dataset. Replaced the LaTeX describe statement with an itemise statement to make the man page more readable in a terminal window.
    o Moved functions between the post_analysis script and the toolkit script. From now on, only functions accessible to the users should be present in the post_analysis script, while toolkit should contain methods called internally.
    o The User's Guide was updated to redirect the users to the new support site for Bioconductor rather than the bioc-devel mailing list.

CHANGES IN VERSION 0.99.11
--------------------------

NEW FEATURES

    o The new subEset() method subsets an ExpressionSet given a list where item names are column names of the phenoData slot and item values are vectors of values corresponding to samples to retain (e.g. list(Time=c("2H", "6H")) will retain samples with value "2H" or "6H" in the "Time" column of the phenoData slot).

UPDATED FEATURES

    o All relevant visualisation methods have been given an argument to restrict the samples plotted to those with a given set of values for a given column in the phenoData (uses the new function subEset described above).
    o The example analysis results object was renamed from "raw_results" to "AlvMac_results", so that it can now be loaded by running data(AlvMac_results). The new name is meant to be more specific to the package.
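
NOTE (illustration for the 0.99.11 entry): the sketch below mimics the described subsetting logic on a plain data frame, in base R only. It does not call subEset() and does not use an ExpressionSet. The helper subset_samples() and the toy phenotypic table are hypothetical. Dropping unused factor levels is included because the 1.5.5 entry describes that behaviour.

```r
# Hypothetical phenotypic table standing in for the phenoData slot.
pheno <- data.frame(
    Time = factor(c("0H", "2H", "6H", "24H", "2H", "6H")),
    Infection = factor(c("CN", "CN", "CN", "MB", "MB", "MB")),
    row.names = paste0("sample", 1:6)
)

# Hypothetical helper (not a GOexpress function): keep the samples whose
# value in each named column is one of the values listed for that column.
subset_samples <- function(pheno, subset) {
    keep <- rep(TRUE, nrow(pheno))
    for (column in names(subset)) {
        stopifnot(column %in% colnames(pheno))  # column must exist
        keep <- keep & pheno[[column]] %in% subset[[column]]
    }
    # Drop factor levels left without any sample after subsetting.
    droplevels(pheno[keep, , drop = FALSE])
}

kept <- subset_samples(pheno, list(Time = c("2H", "6H")))
```

Several columns can be combined in the same list, in which case a sample must satisfy every condition to be retained.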

CHANGES IN VERSION 0.99.10
--------------------------

GENERAL UPDATES

    o Replaced \format by \value sections in the following man pages: AlvMac.Rd, microarray2dataset.Rd, prefix2dataset.Rd, raw_results.Rd to avoid having an empty value section, as recommended by the Bioconductor package tracker. Hopefully, it does not require a non-empty \format section as well...
    o Restricted lines to less than 80 characters and indentation to multiples of four space characters.

CHANGES IN VERSION 0.99.9
-------------------------

UPDATED FEATURES

    o All post-analysis functions were given more sanity checks to verify that the "result" argument contains the required slots of a GO_analyse() output and that the arguments pointing at a phenotypic data column are valid column names.

BUG FIXES

    o expression_plot_symbol and expression_profiles_symbol used the default value of col.palette and colourF instead of forwarding the user-defined one to the expression_plot and expression_profiles functions.

GENERAL UPDATES

    o Added cross-reference in UsersGuide to point at examples of usage of factor and numeric values for the expression plots.

CHANGES IN VERSION 0.99.8
-------------------------

GENERAL UPDATES

    o Resaved R data files to reduce package disk size. No more WARNING in R CMD check.
    o Removed reference to GitHub in the README file. The weblink is given in the DESCRIPTION file anyway if users are interested.

CHANGES IN VERSION 0.99.7
--------------------------

UPDATED FEATURES

    o Fixed a typo in the code of heatmap_GO which made it crash for any dataset other than the example dataset.

CHANGES IN VERSION 0.99.6
-------------------------

UPDATED FEATURES

    o expression_profiles_symbol() method was missing the "index" argument to select the feature identifier to plot alone.

CHANGES IN VERSION 0.99.5
-------------------------

NEW FEATURES

    o expression_profiles() method plots the expression profiles of individual sample series, as opposed to grouped sample series handled by expression_plot functions.
    o expression_profiles_symbol() method plots the expression profiles of individual sample series using a gene name instead of an Ensembl gene identifier.

UPDATED FEATURES

    o overlap_GO can print to screen, if filename argument is set to NULL (Default).
    o heatmap_GO, cluster_GO and plot_design can resize title font and wrap the text on multiple lines.
    o expression_plot and expression_plot_symbol can orient X-axis labels at a given angle.
    o Replaced return(NULL) statement by stop() when no close match is found to a gene name in the family of expression_plot functions.

GENERAL UPDATES

    o User's Guide updated.
    o List of contributors updated in User's Guide and DESCRIPTION.

CHANGES IN VERSION 0.99.4
-------------------------

UPDATED FEATURES

    o Use of ExpressionSet instead of numeric named matrix and AnnotatedDataFrame. Better consistency with other Bioconductor packages.

GENERAL UPDATES

    o Implemented corrections requested following the Bioconductor review. Includes typos, consistent terminology through the package code and metafiles, additional information in help files, no reference to GitHub as an alternate installation option, use of arrow signs instead of equal signs for value assignment.
    o Restricted lines to 80 characters, and used 4-space tabulations.
    o Corrected out-of-date documentation.

CHANGES IN VERSION 0.99.3
-------------------------

UPDATED FEATURES

    o Control the size of the legend text in the two expression plot figures. Updated help files accordingly.
    o Updated vignette with new section "Statistics".
    o Complete cleaning of code files for lines shorter than 80 columns.
    o Cleanup of help files for lines shorter than 80 columns.
    o Enabled filtering of raw results on the average score of a GO term.

CHANGES IN VERSION 0.99.2
-------------------------

UPDATED FEATURES

    o Metadata lines in the preamble of the Sweave file.

CHANGES IN VERSION 0.99.1
-------------------------

UPDATED FEATURES

    o Sweave vignette implemented.
    o Replaced all message() statements by cat() to make Sweave output the full message in the vignette.
    o Updated a missed F into FALSE.
    o Updated an invalid biocViews (typo).
    o Date field added for a proper citation() method.

CHANGES IN VERSION 0.99.0
-------------------------

UPDATED FEATURES

    o Replaced all cat() statements by message() to match the Bioconductor guidelines.

CHANGES IN VERSION 0.6.2
------------------------

UPDATED FEATURES

    o Fixed typo in the example of the vignette of GO_anova().
    o Advice to use the subset_scores() function in the GO_anova() vignette.
    o Helpful warning message in the vignette of GO_anova() to make sure that the experimental factor to analyse is actually formatted as a factor in the true R language meaning.
    o Updated license to GPL (>= 3) instead of GPL-2.

CHANGES IN VERSION 0.6.1
------------------------

NEW FEATURES

    o GOexpress now implements the randomForest framework to estimate the importance of each gene on the clustering of the samples according to the different levels of the desired factor (see section UPDATED FEATURES below for more details).
    o rerank() function returns a re-ordered version of the given result variable, ordered using either the rank or the score metric.
    o Sample data raw_result was added to provide an example output of the analysis, and also to allow testing the visualisation functions on a readily available variable instead of having to run the analysis function every time (would have killed the R CMD check duration).

UPDATED FEATURES

    o GO_analyse() default has been changed to a random forest statistical framework.
      This framework has various advantages over the previous ANOVA approach; namely 1) the bootstrapping and re-sampling of gene subsets to measure the importance of each gene in predicting the desired factor over many iterations (Default: 1,000 trees), 2) a shorter analysis duration due to the Fortran code underlying the randomForest package, 3) randomForest does not make assumptions on the distribution of the expression level within and between genes, 4) randomForest deals implicitly with interactions in multifactorial experiments.
    o Genes are now initially ranked according to their score and GO terms according to the average rank of their associated genes. Genes associated with a GO term but absent from the dataset are assigned a rank of max(rank)+1 and a score of 0. Ties between genes are resolved by giving all tied genes the minimal rank, the next gene being given rank+length(tied_genes). The average_rank metric is expected to be more robust than the average "score". Depending on the statistical framework used, "score" may mean "importance" (randomForest) or "F.value" (ANOVA).
    o More sanity checks at the start of some methods to ensure smooth execution of the downstream code and more helpful error messages.
    o More synonyms allowed for subsetting and filtering of the result variable.
    o list_genes() and table_genes() methods now allow choosing whether to return all feature identifiers associated with a GO term, or only those also present in the expression dataset.

CHANGES IN VERSION 0.5.5
------------------------

UPDATED FEATURES

    o heatmap_GO() default colorscale has been changed to blue-white-red, better suited to the representation of expression level. The previous colorscale green-black-red is now suggested if differential expression data is used (e.g. log2FC).

CHANGES IN VERSION 0.5.4
------------------------

NEW FEATURES

    o Sample data accessible with the data() method.
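
NOTE (illustration for the tie-handling rule in the 0.6.1 entry): the sketch below uses base R rank() with ties.method = "min", which gives tied values the same minimal rank and lets the next value skip ahead by the number of tied items. The toy scores are hypothetical, and whether GOexpress itself calls rank() this way is an assumption. The last two lines contrast the max(rank)+1 rule of 0.6.1 with the (number of features + 1) rule adopted in 1.1.7.

```r
# Hypothetical gene-level scores, with a tie between the two lowest genes.
scores <- c(geneA = 5.2, geneB = 3.1, geneC = 0.4, geneD = 0.4)

# Rank by decreasing score; tied genes share the minimal rank.
ranks <- rank(-scores, ties.method = "min")

# Rank given to annotated genes that are absent from the dataset:
absent_old <- max(ranks) + 1      # rule described for 0.6.1
absent_new <- length(scores) + 1  # rule described for 1.1.7
```

When the tie sits at the bottom of the ranking, as here, the two rules give different values, which is the situation the 1.1.7 entry addresses.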

CHANGES IN VERSION 0.5.3
------------------------

NEW FEATURES

    o Method overlap_GO() produces a Venn diagram showing the overlap of gene sets associated with two to five GO terms. The Venn diagram is saved in a TIFF image file.

CHANGES IN VERSION 0.5.2
------------------------

UPDATED FEATURES

    o Method heatmap_GO() updated to color-code the samples by levels of a factor, and a better colormap is used to represent the level of expression of genes.

CHANGES IN VERSION 0.5.1
------------------------

UPDATED FEATURES

    o Package renamed GOexpress after verifying that no other package on Bioconductor uses this name.
    o Method GO_anova() renamed to GO_analyse(), as metrics other than ANOVA may be implemented in the future.

CHANGES IN VERSION 0.4.1
------------------------

NEW FEATURES

    o Supports microarray probeset identifiers as gene identifiers in the expression dataset.
    o Argument "result=" is no longer optional in all the post-analysis functions.

UPDATED FEATURES

    o expression_plot() functions now dynamically adapt the colormap to the number of groups of samples instead of using three hard-coded colors for the colormap.

CHANGES IN VERSION 0.3.1
------------------------

NEW FEATURES

    o expression_plot() plots the expression profile of the gene corresponding to an Ensembl identifier, given a valid variable name for the X-axis and a grouping factor for the Y-axis.
    o expression_plot_symbol() plots the expression profile of the gene(s) with the Ensembl identifier(s) corresponding to a gene symbol, given a valid variable name for the X-axis and a grouping factor for the Y-axis.
    o plot_design() plots the univariate effect of each level of each factor available in the AnnotatedDataFrame on the expression levels of genes associated with a GO term.

UPDATED FEATURES

    o The scoring function now uses the total count of genes associated with a GO term instead of the count of genes in the dataset and associated with the GO term.
      Genes absent from the expression dataset are assigned an F.value of 0, similarly to the genes with a non-significant ANOVA result. This was not found to significantly impact the ranking of GO terms, but is a more objective scoring of the GO terms if the expression data was generated through objective filtering (e.g. filtering out lowly expressed genes).
    o subset_scores() now also filters the $mapping and $anova slots to retain only mappings and gene information related to the GO terms left in the $scores slot after filtering. This significantly reduces the memory footprint of the filtered variable, while the object containing the raw results of GO_anova() will still contain the full information fetched from the Ensembl BioMart server if the user wants it.
    o ggplot2 and grid are now required for the proper installation of anovaGO, even though these packages are only required for a subset of non-essential features in anovaGO. This way, all features in anovaGO will be available as soon as the package is installed.
    o Messages printed to the user during the execution of the functions were made as clear as possible, particularly when fetching information from the Ensembl BioMart server.

CHANGES IN VERSION 0.2
----------------------

UPDATED FEATURES

    o subset_scores() can now filter and keep only GO terms of a given type, i.e. "Biological process", "Molecular function", or "Cellular Component".

CHANGES IN VERSION 0.1
----------------------

OVERVIEW

    o This package was designed for the analysis of bioinformatics-related data based on gene expression measurements. It requires 3 input values: (1) a gene-by-sample table providing the expression level of genes (rows) in each sample (columns), (2) an AnnotatedDataFrame from the Biobase package providing phenotypic information about the samples, grouping them by levels of the factor, (3) the name of the grouping factor to investigate, which must be a valid column name in the AnnotatedDataFrame.
    o The analysis will identify all Gene Ontology (GO) terms represented by at least one gene in the expression dataset. A one-way ANOVA will be performed on the grouping factor for each gene present in the expression dataset. Following multiple-testing correction, genes below the threshold for significance will be assigned an F.value of 0. GO terms will be scored and ranked on the average F.value of associated genes.
    o Functions are provided to investigate and visualise the results of the above analysis. The score table can be filtered for rows over given thresholds. The distribution of scores can be visualised. The quantiles of scores can be obtained. The genes associated with a given GO term can be listed, with or without descriptive information. Hierarchical clustering of the samples can be performed based on the expression levels of genes associated with a given GO term. Heatmaps accompanied by hierarchical clustering of samples and genes can be drawn and customised.

FEATURES

    o GO_anova() scores all Gene Ontology (GO) terms represented in the dataset based on the ability of their associated genes to cluster samples according to a predefined grouping factor. It also returns the table used to map genes to GO terms, the table summarising the one-way ANOVA results for each gene, and finally the predefined grouping factor used for ANOVA. Genes annotated to a GO term but absent from the expression dataset are ignored.
    o get_mart_dataset() returns a connection to the appropriate BioMart dataset of the Ensembl server based on the gene name of the first gene in the expression dataset. The choice of the dataset can be overridden by the user if a valid BioMart Ensembl dataset is specified.
    o subset_scores() filters the table of scored GO terms in the output of GO_anova() and returns a list formatted identically to the output of GO_anova() with the resulting filtered table of scores.
    o hist_scores() plots the distribution of average F scores in the output of GO_anova() or subset_scores().
    o quantiles_scores() returns the quantile values corresponding to defined percentiles.
    o list_genes() returns the list of Ensembl gene identifiers associated with a given GO term.
    o table_genes() returns a table of information about the Ensembl gene identifiers associated with a given GO term.
    o cluster_GO() plots a hierarchical clustering of the samples based on the expression levels of genes associated with a given GO term.
    o heatmap_GO() plots a heatmap with hierarchical clustering of the samples and genes based on the expression levels of genes associated with a given GO term.
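
NOTE (illustration for the scoring described in the 0.1 OVERVIEW): the sketch below shows, in base R only, how gene-level F values might be zeroed for non-significant genes after multiple-testing correction and then averaged per GO term. The toy F values and P-values, the Benjamini-Hochberg correction, the 0.05 threshold, and the GO identifiers are all assumptions made for illustration; the entry above does not state which correction or threshold the package used.

```r
# Hypothetical per-gene one-way ANOVA results.
f_values <- c(g1 = 12.4, g2 = 1.3, g3 = 8.9, g4 = 0.7)
p_values <- c(g1 = 0.001, g2 = 0.30, g3 = 0.004, g4 = 0.55)

# Multiple-testing correction (assumed: Benjamini-Hochberg).
adjusted <- p.adjust(p_values, method = "BH")

# Genes that are not significant (assumed threshold: 0.05) get an F value of 0.
f_values[adjusted >= 0.05] <- 0

# Hypothetical gene-to-GO mapping, using made-up placeholder identifiers.
go_terms <- list(
    "GO:0000001" = c("g1", "g2", "g3"),
    "GO:0000002" = c("g2", "g4")
)

# Score each GO term by the average F value of its associated genes,
# then order the terms by decreasing score.
term_scores <- sapply(go_terms, function(genes) mean(f_values[genes]))
ranked_terms <- sort(term_scores, decreasing = TRUE)
```

A term whose genes are all non-significant ends up with a score of 0 and falls to the bottom of the ordering.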