%\VignetteEngine{knitr::knitr} \documentclass[a4paper, 9pt]{article} <>= BiocStyle::latex() @ %\VignetteIndexEntry{OncoScore} \usepackage{hyperref} \usepackage{amsmath, amsthm, amssymb} \usepackage[utf8]{inputenc} \usepackage{graphicx} \usepackage{tcolorbox} \usepackage{authblk} \newcommand{\OncoScore}{\textsc{OncoScore}} \begin{document} \title{Using the \OncoScore{} package} \author[1]{Luca De Sano} \author[2]{Carlo Gambacorti Passerini} \author[2]{Rocco Piazza} \author[1,3]{Daniele Ramazzotti} \author[2]{Roberta Spinelli} \affil[1]{Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi Milano Bicocca Milano, Italy.} \affil[2]{Dipartimento di Medicina e Chirurgia, Università degli Studi Milano Bicocca Milano, Italy.} \affil[3]{Stanford University, Stanford, California, USA} \date{\today} \maketitle \begin{tcolorbox}{\bf Overview.} \OncoScore{} is a tool to measure the association of genes to cancer based on citation frequency in biomedical literature. The score is evaluated from PubMed literature by dynamically updatable web queries. \vspace{1.0cm} {\em In this vignette, we give an overview of the package by presenting some of its main functions.} \vspace{1.0cm} \renewcommand{\arraystretch}{1.5} \end{tcolorbox} \vspace{1.0cm} \renewcommand{\arraystretch}{2} \begin{center} \begin{tabular}{l | p{6.0cm} | l | p{6.0cm}} {\bf Acronym} & {\bf Extended name} & {\bf Reference}\\ \hline OncoScore & OncoScore: a novel, Internet-based tool to assess the oncogenic potential of genes & \href{https://www.nature.com/articles/srep46290}{Publication}. \\ \hline \end{tabular} \end{center} \vspace{1.0cm} The \OncoScore{} analysis consists of two parts. One can estimate a score to asses the oncogenic potential of a set of genes, given the lecterature knowledge, at the time of the analysis, or one can study the trend of such score over time. We next present the two analysis and we conclude with showing the capabilities of the tool to visualize the results. \paragraph{\large Requirements. } First we load the library. <<>>= library(OncoScore) @ \paragraph{\large OncoScore analysis.} The query that we show next retrieves from PubMed the citations, at the time of the query, for a list of genes in cancer related and in all the documents. <>= query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2")) @ \begin{verbatim} ### Starting the queries for the selected genes. ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 309 Number of papers found in PubMed for IDH1 was: 1450 Number of papers found in PubMed for IDH2 was: 557 Number of papers found in PubMed for SETBP1 was: 69 Number of papers found in PubMed for TET2 was: 560 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 350 Number of papers found in PubMed for IDH1 was: 1581 Number of papers found in PubMed for IDH2 was: 648 Number of papers found in PubMed for SETBP1 was: 89 Number of papers found in PubMed for TET2 was: 717 \end{verbatim} <>= data(query) @ OncoScore provides a function to merge gene names if requested by the user. This function is useful when there are aliases in the gene list. <<>>= combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene') @ OncoScore also provides a function to retireve the names of the genes in a given portion of a chromosome that can be exploited if we are dealing, e.g., with copy number alterations hitting regions rather than specific genes. <>= chr13 = get.genes.from.biomart(chromosome=13,start=54700000,end=72800000) head(chr13) @ \begin{verbatim} [1] "LINC00374" "RNA5SP30" "RNU7-87P" "HNF4GP1" "RN7SL375P" "BORA" \end{verbatim} Furthermore, one can also automatically perform the OncoScore analysis on chromosomic regions as follows: <>= result = compute.oncoscore.from.region(10, 100000, 500000) @ \begin{verbatim} ### Performing query on BioMart ### Performing web query on: RNA5SP297 RNA5SP298 RN7SL754P ZMYND11 DIP2C ### Starting the queries for the selected genes. ### Performing queries for cancer literature Number of papers found in PubMed for RNA5SP297 was: -1 Number of papers found in PubMed for RNA5SP298 was: -1 Number of papers found in PubMed for RN7SL754P was: -1 Number of papers found in PubMed for ZMYND11 was: 27 Number of papers found in PubMed for DIP2C was: 1 ### Performing queries for all the literature Number of papers found in PubMed for RNA5SP297 was: -1 Number of papers found in PubMed for RNA5SP298 was: -1 Number of papers found in PubMed for RN7SL754P was: -1 Number of papers found in PubMed for ZMYND11 was: 45 Number of papers found in PubMed for DIP2C was: 3 ### Processing data ### Computing frequencies scores ### Estimating oncogenes ### Results: RNA5SP297 -> 0 RNA5SP298 -> 0 RN7SL754P -> 0 ZMYND11 -> 49.07473 DIP2C -> 12.30234 \end{verbatim} We now compute a score for each of the genes, to estimate their oncogenic potential. <<>>= result = compute.oncoscore(query) @ \paragraph{\large OncoScore timeline analysis.} The query that we show next retrieves from PubMed the citations, at specified time points, for a list of genes in cancer related and in all the documents. <>= query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"), c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01")) @ \begin{verbatim} ### Starting the queries for the selected genes. ### Quering PubMed for timepoint 2012/03/01 ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 83 Number of papers found in PubMed for IDH1 was: 408 Number of papers found in PubMed for IDH2 was: 172 Number of papers found in PubMed for SETBP1 was: 5 Number of papers found in PubMed for TET2 was: 169 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 91 Number of papers found in PubMed for IDH1 was: 488 Number of papers found in PubMed for IDH2 was: 234 Number of papers found in PubMed for SETBP1 was: 10 Number of papers found in PubMed for TET2 was: 196 ### Quering PubMed for timepoint 2013/03/01 ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 132 Number of papers found in PubMed for IDH1 was: 662 Number of papers found in PubMed for IDH2 was: 267 Number of papers found in PubMed for SETBP1 was: 11 Number of papers found in PubMed for TET2 was: 254 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 149 Number of papers found in PubMed for IDH1 was: 753 Number of papers found in PubMed for IDH2 was: 336 Number of papers found in PubMed for SETBP1 was: 18 Number of papers found in PubMed for TET2 was: 302 ### Quering PubMed for timepoint 2014/03/01 ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 185 Number of papers found in PubMed for IDH1 was: 903 Number of papers found in PubMed for IDH2 was: 364 Number of papers found in PubMed for SETBP1 was: 29 Number of papers found in PubMed for TET2 was: 342 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 208 Number of papers found in PubMed for IDH1 was: 1002 Number of papers found in PubMed for IDH2 was: 439 Number of papers found in PubMed for SETBP1 was: 36 Number of papers found in PubMed for TET2 was: 430 ### Quering PubMed for timepoint 2015/03/01 ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 250 Number of papers found in PubMed for IDH1 was: 1188 Number of papers found in PubMed for IDH2 was: 467 Number of papers found in PubMed for SETBP1 was: 49 Number of papers found in PubMed for TET2 was: 452 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 283 Number of papers found in PubMed for IDH1 was: 1300 Number of papers found in PubMed for IDH2 was: 550 Number of papers found in PubMed for SETBP1 was: 64 Number of papers found in PubMed for TET2 was: 576 ### Quering PubMed for timepoint 2016/03/01 ### Performing queries for cancer literature Number of papers found in PubMed for ASXL1 was: 309 Number of papers found in PubMed for IDH1 was: 1446 Number of papers found in PubMed for IDH2 was: 557 Number of papers found in PubMed for SETBP1 was: 69 Number of papers found in PubMed for TET2 was: 558 ### Performing queries for all the literature Number of papers found in PubMed for ASXL1 was: 350 Number of papers found in PubMed for IDH1 was: 1576 Number of papers found in PubMed for IDH2 was: 648 Number of papers found in PubMed for SETBP1 was: 89 Number of papers found in PubMed for TET2 was: 715 \end{verbatim} <>= data(query.timepoints) @ We now compute a score for each of the genes, to estimate their oncogenic potential at specified time points. <<>>= result.timeseries = compute.oncoscore.timeseries(query.timepoints) @ \paragraph{\large Visualization of the results.} We next plot the scores measuring the oncogenetic potential of the considered genes as a barplot. <>= plot.oncoscore(result, col = 'darkblue') @ \incfig[ht]{figure/image-oncoscore-1}{0.5\textwidth}{Oncogenetic potential of the considered genes.}{} We finally plot the trend of the scores over the considered times as absolute and values and as variations. <>= plot.oncoscore.timeseries(result.timeseries) @ \incfig[ht]{figure/image-timeseries-1}{0.6\textwidth}{Absolute values of the oncogenetic potential of the considered genes over times.}{} <>= plot.oncoscore.timeseries(result.timeseries, incremental = TRUE, ylab='absolute variation') @ \incfig[ht]{figure/image-oncoscore-timeseries-1}{0.6\textwidth}{Variations of the oncogenetic potential of the considered genes over times.}{} <>= plot.oncoscore.timeseries(result.timeseries, incremental = TRUE, relative = TRUE, ylab='relative variation') @ \incfig[ht]{figure/image-oncoscore-timeseries-incremental-1}{0.6\textwidth}{Variations as relative values of the oncogenetic potential of the considered genes over times.}{} \paragraph{\Rcode{sessionInfo()}} <>= toLatex(sessionInfo()) @ \end{document}