RTCGA package to download mRNA data that are included in RTCGA.mRNA package
The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care.
The RTCGA package supports downloading and integrating the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may have a beneficial influence on the development of science and the improvement of patients' treatment. RTCGA is an open-source R package, available to download from Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("RTCGA")
Alternatively, use the code below to download the development version, which is likely to have fewer bugs than the release version on Bioconductor:
if (!require(devtools)) {
    install.packages("devtools")
    require(devtools)
}
install_github("RTCGA/RTCGA")
Furthermore, the RTCGA package transforms TCGA data into a form that is convenient to use in the R statistical environment. These data transformations can be part of a statistical analysis pipeline, which RTCGA can make more reproducible.
Use cases and examples are shown in the RTCGA package vignettes, which can be opened with browseVignettes("RTCGA").
Many release dates of TCGA data are available. To see them all, call checkTCGA('Dates') from the RTCGA package.
Version 1.0 of the RTCGA.mRNA package contains mRNA datasets that were released on 2015-11-01. They were downloaded in the following way (mainly copied from http://rtcga.github.io/RTCGA/):
All cohort names can be checked using infoTCGA() from RTCGA; the code below expects them in a character vector named cohorts.
For all cohorts the following code downloads the mRNA data.
# dir.create( "data2" )
releaseDate <- "2015-11-01"
sapply( cohorts, function(element){
  tryCatch({
    downloadTCGA( cancerTypes = element,
                  dataSet = "Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data.Level_3",
                  destDir = "data2",
                  date = releaseDate )},
    error = function(cond){
      # report the cohort and carry on with the next one
      cat("Error: Maybe there weren't mRNA data for ", element, " cancer.\n")
    }
  )
})
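The loop above wraps every download in tryCatch so that one cohort without the requested dataset does not stop the remaining downloads. A minimal sketch of that pattern follows. It uses a hypothetical stand-in, fake_download(), instead of downloadTCGA(), and the cohorts that "fail" are made up, so it runs without network access:

```r
# Hypothetical stand-in for downloadTCGA(): fails for cohorts listed in `missing`.
fake_download <- function(cohort, missing = c("ACC", "UVM")) {
  if (cohort %in% missing) stop("no dataset for ", cohort)
  paste0(cohort, ".tar.gz")
}

demo_cohorts <- c("BRCA", "ACC", "OV", "UVM")

# Each call is wrapped in tryCatch, so an error for one cohort is
# recorded as NA and the loop moves on to the next cohort.
downloaded <- vapply(demo_cohorts, function(element) {
  tryCatch(
    fake_download(element),
    error = function(cond) {
      message("Skipping ", element, ": ", conditionMessage(cond))
      NA_character_
    }
  )
}, character(1))

# Cohorts for which nothing was downloaded
failed <- names(downloaded)[is.na(downloaded)]
print(failed)
```

Returning NA from the error handler, rather than only printing a message, makes it possible to list the skipped cohorts afterwards.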
NA files from data2
There were no mRNA data for these cohorts.
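The NA files mentioned above could be removed with base R along the following lines. This is only a sketch: it assumes the placeholder files have names beginning with NA, and the temporary directory and file names are made up to stand in for data2.

```r
# Temporary directory standing in for "data2" (illustrative only).
demo_dir <- file.path(tempdir(), "data2_na_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
file.create(file.path(demo_dir, c("BRCA.tar.gz", "NA", "OV.tar.gz")))

# Assumption: placeholder downloads are the files whose names start with "NA".
na_files <- list.files(demo_dir, pattern = "^NA", full.names = TRUE)
removed <- file.remove(na_files)

# Only the real downloads should be left.
remaining <- list.files(demo_dir)
print(remaining)
```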
Below is the code that removes the unneeded "MANIFEST.txt" file from each mRNA cohort folder.
library(magrittr)  # provides the %>% pipe
list.files( "data2") %>%
  file.path( "data2", .) %>%
  sapply(function(x){
    file.path(x, list.files(x)) %>%
      grep(pattern = "MANIFEST.txt", x = ., value=TRUE) %>%
      file.remove()
  })
The next step automatically collects the paths to the files for all available mRNA cohort types downloaded to the data2 folder.
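One way to collect such paths is sketched here with base R on a made-up directory layout (one folder per cohort with one data file inside). The folder names, file names, and the COHORT.mRNA.path naming scheme are assumptions for illustration, not the package's actual code:

```r
# Made-up layout standing in for "data2": one folder per cohort.
demo_dir <- file.path(tempdir(), "data2_paths_demo")
unlink(demo_dir, recursive = TRUE)
for (cohort in c("BRCA", "OV")) {
  dir.create(file.path(demo_dir, cohort), recursive = TRUE)
  file.create(file.path(demo_dir, cohort, paste0(cohort, ".mRNA.data.txt")))
}

# Take the first .txt file found in each cohort folder.
cohort_dirs <- sort(list.dirs(demo_dir, recursive = FALSE))
mRNA_paths <- vapply(cohort_dirs, function(d) {
  list.files(d, pattern = "\\.txt$", full.names = TRUE)[1]
}, character(1))

# Name each path after its cohort, e.g. "BRCA.mRNA.path".
names(mRNA_paths) <- paste0(basename(cohort_dirs), ".mRNA.path")
print(names(mRNA_paths))
```

Keeping the paths in one named vector avoids creating a separate global variable per cohort, although separate variables would work equally well for a handful of cohorts.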
Reading mRNA data with readTCGA
Because mRNA data are transposed in the downloaded files, a special function, readTCGA, was prepared to read and transpose the data automatically.
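To illustrate what "read and transpose" means here, the following base R sketch reads a tiny made-up tab-separated table with genes in rows and samples in columns, and turns it into one row per sample. The file contents, barcodes, and column names are invented, and this is not the implementation of readTCGA. In RTCGA itself the call would take the form readTCGA(path, dataType = "mRNA"), assuming the installed version supports that data type.

```r
# Made-up expression table: genes in rows, sample barcodes in columns.
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tTCGA-AA-0001\tTCGA-AA-0002",
  "ELMO2\t0.5\t-1.2",
  "CREB3L1\t1.1\t0.3"
), demo_file)

raw <- read.delim(demo_file, check.names = FALSE, stringsAsFactors = FALSE)

# Transpose the numeric part so that each sample becomes a row
# and each gene becomes a column.
expr <- t(as.matrix(raw[, -1]))
colnames(expr) <- raw$gene

# Put the sample barcode in its own column (illustrative column name).
tidy <- data.frame(bcr_patient_barcode = rownames(expr), expr,
                   row.names = NULL, check.names = FALSE,
                   stringsAsFactors = FALSE)
print(tidy)
```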
Saving data to the RTCGA.mRNA package
grep( "mRNA", ls(), value = TRUE) %>%
  grep("path", x=., value = TRUE, invert = TRUE) %>%
  cat( sep="," ) # can this be done better? From the use_data documentation:
                 # ... Unquoted names of existing objects to save
# Note: in current devtools releases this function lives in usethis::use_data()
devtools::use_data(BRCA.mRNA,COAD.mRNA,COADREAD.mRNA,GBMLGG.mRNA,
                   KIPAN.mRNA,KIRC.mRNA,KIRP.mRNA,LGG.mRNA,LUAD.mRNA,
                   LUSC.mRNA,OV.mRNA,READ.mRNA,UCEC.mRNA,
                   overwrite = TRUE,
                   compress="xz")
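use_data stores each object as an .rda file in the package's data/ directory. The sketch below shows the corresponding base R round trip with save() and load() on a made-up data frame, using the same xz compression. The object name, its contents, and the temporary path are illustrative only.

```r
# Made-up object standing in for one of the cohort datasets.
TOY.mRNA <- data.frame(bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002"),
                       ELMO2 = c(0.5, -1.2))

# Save with xz compression, the same option passed to use_data above.
rda_file <- file.path(tempdir(), "TOY.mRNA.rda")
save(TOY.mRNA, file = rda_file, compress = "xz")

# Load into a fresh environment to confirm the object survives the round trip.
check_env <- new.env()
loaded_names <- load(rda_file, envir = check_env)
print(loaded_names)
print(identical(check_env$TOY.mRNA, TOY.mRNA))
```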