This package provides long descriptions of genes collected from the RefSeq database. The text in the "COMMENT" section that starts with "Summary:" is extracted as the description of the gene, as in the following example record:
LOCUS NM_012363 936 bp mRNA linear PRI 12-FEB-2021
DEFINITION Homo sapiens olfactory receptor family 1 subfamily N member 1
(OR1N1), mRNA.
ACCESSION NM_012363 XM_071152
VERSION NM_012363.1
KEYWORDS RefSeq; MANE Select.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 936)
AUTHORS Malnic B, Godfrey PA and Buck LB.
TITLE The human olfactory receptor gene family
JOURNAL Proc Natl Acad Sci U S A 101 (8), 2584-2589 (2004)
PUBMED 14983052
REMARK Erratum:[Proc Natl Acad Sci U S A. 2004 May 4;101(18):7205]
REFERENCE 2 (bases 1 to 936)
AUTHORS Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R,
Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach
H, Lancet D and Shamir R.
TITLE DEFOG: a practical scheme for deciphering families of genes
JOURNAL Genomics 80 (3), 295-302 (2002)
PUBMED 12213199
REFERENCE 3 (bases 1 to 936)
AUTHORS Rouquier S, Taviaux S, Trask BJ, Brand-Arpon V, van den Engh G,
Demaille J and Giorgi D.
TITLE Distribution of olfactory receptor genes in the human genome
JOURNAL Nat Genet 18 (3), 243-250 (1998)
PUBMED 9500546
REMARK Erratum:[Nat Genet 1998 May;19(1):102]
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AL359636.17.
On Apr 5, 2004 this sequence version replaced XM_071152.1.
Summary: Olfactory receptors interact with odorant molecules in the
nose, to initiate a neuronal response that triggers the perception
of a smell. The olfactory receptor proteins are members of a large
family of G-protein-coupled receptors (GPCR) arising from single
coding-exon genes. Olfactory receptors share a 7-transmembrane
domain structure with many neurotransmitter and hormone receptors
and are responsible for the recognition and G protein-mediated
transduction of odorant signals. The olfactory receptor gene family
is the largest in the genome. The nomenclature assigned to the
olfactory receptor genes and proteins for this organism is
independent of other organisms. [provided by RefSeq, Jul 2008].
##RefSeq-Attributes-START##
MANE Ensembl match :: ENST00000304880.2/ ENSP00000306974.2
RefSeq Select criteria :: based on single protein-coding transcript
##RefSeq-Attributes-END##
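The extraction rule described above can be illustrated with a small base R sketch. The helper name extract_summary() is hypothetical and is not part of GeneSummary; the package's actual parsing code may differ. It assumes the COMMENT lines are already available as a character vector and that the summary ends at the "[provided by ...]" tag.

```r
# Hypothetical helper (not from GeneSummary): pull the "Summary:" text
# out of the COMMENT lines of a GenBank-format RefSeq record.
extract_summary = function(comment_lines) {
    # Join the wrapped lines into one string, dropping the indentation
    txt = paste(trimws(comment_lines), collapse = " ")
    if (!grepl("Summary:", txt, fixed = TRUE)) return(NA_character_)
    # Drop everything up to and including the "Summary:" label
    s = sub("^.*?Summary:\\s*", "", txt, perl = TRUE)
    # Drop the trailing provenance tag and anything after it
    sub("\\s*\\[provided by.*$", "", s, perl = TRUE)
}

# COMMENT lines abbreviated from the OR1N1 record above
# (the summary is cut after its first sentence to keep the example short)
rec_comment = c(
    "REVIEWED REFSEQ: This record has been curated by NCBI staff. The",
    "reference sequence was derived from AL359636.17.",
    "Summary: Olfactory receptors interact with odorant molecules in the",
    "nose, to initiate a neuronal response that triggers the perception",
    "of a smell. [provided by RefSeq, Jul 2008]."
)
extract_summary(rec_comment)
```

Records without a "Summary:" entry return NA in this sketch.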
The function loadGeneSummary() extracts the gene summary table. Setting the organism argument to either the full organism name or the corresponding taxon ID returns a table of genes and their summaries:
library(GeneSummary)
tb = loadGeneSummary(organism = 9606)
# or use the full organism name
# tb = loadGeneSummary(organism = "Homo sapiens")
dim(tb)
## [1] 50550 6
head(tb)
## RefSeq_accession Organism Taxon_ID Gene_ID Review_status
## 1 NM_001368885.1 Homo sapiens 9606 1305 REVIEWED REFSEQ
## 2 NM_001368886.1 Homo sapiens 9606 1305 REVIEWED REFSEQ
## 3 NR_148047.2 Homo sapiens 9606 6867 REVIEWED REFSEQ
## 4 NR_148053.2 Homo sapiens 9606 6867 REVIEWED REFSEQ
## 5 NM_001374457.1 Homo sapiens 9606 6597 REVIEWED REFSEQ
## 6 NR_148052.2 Homo sapiens 9606 6867 REVIEWED REFSEQ
## Gene_summary
## 1 This gene encodes the alpha chain of one of the nonfibrillar collagens. The function of this gene product is not known, however, it has been detected at low levels in all connective tissue-producing cells so it may serve a general function in connective tissues. Unlike most of the collagens, which are secreted into the extracellular matrix, collagen XIII contains a transmembrane domain and the protein has been localized to the plasma membrane. The transcripts for this gene undergo complex and extensive splicing involving at least eight exons. Like other collagens, collagen XIII is a trimer; it is not known whether this trimer is composed of one or more than one alpha chain isomer. A number of alternatively spliced transcript variants have been described, but the full length nature of some of them has not been determined.
## 2 This gene encodes the alpha chain of one of the nonfibrillar collagens. The function of this gene product is not known, however, it has been detected at low levels in all connective tissue-producing cells so it may serve a general function in connective tissues. Unlike most of the collagens, which are secreted into the extracellular matrix, collagen XIII contains a transmembrane domain and the protein has been localized to the plasma membrane. The transcripts for this gene undergo complex and extensive splicing involving at least eight exons. Like other collagens, collagen XIII is a trimer; it is not known whether this trimer is composed of one or more than one alpha chain isomer. A number of alternatively spliced transcript variants have been described, but the full length nature of some of them has not been determined.
## 3 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Several transcript variants encoding different isoforms have been found for this gene.
## 4 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Several transcript variants encoding different isoforms have been found for this gene.
## 5 The protein encoded by this gene is a member of the SWI/SNF family of proteins and is similar to the brahma protein of Drosophila. Members of this family have helicase and ATPase activities and are thought to regulate transcription of certain genes by altering the chromatin structure around those genes. The encoded protein is part of the large ATP-dependent chromatin remodeling complex SNF/SWI, which is required for transcriptional activation of genes normally repressed by chromatin. In addition, this protein can bind BRCA1, as well as regulate the expression of the tumorigenic protein CD44. Mutations in this gene cause rhabdoid tumor predisposition syndrome type 2. Multiple transcript variants encoding different isoforms have been found for this gene.
## 6 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Several transcript variants encoding different isoforms have been found for this gene.
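The table has one row per RefSeq accession, so several transcripts of the same gene can carry the same summary (for example, both accessions of Gene_ID 1305 above). A minimal sketch of reducing it to one row per gene and searching the summaries is given below. To stay runnable without the package, it uses a toy data frame (tb_toy, a made-up name) that copies the first five rows printed above, with summaries shortened to their first sentence; the column types of the real table are an assumption.

```r
# Toy stand-in for the table returned by loadGeneSummary(); values are
# copied from the printed output, summaries cut to the first sentence.
tb_toy = data.frame(
    RefSeq_accession = c("NM_001368885.1", "NM_001368886.1", "NR_148047.2",
                         "NR_148053.2", "NM_001374457.1"),
    Organism = "Homo sapiens",
    Taxon_ID = 9606,
    Gene_ID = c(1305, 1305, 6867, 6867, 6597),
    Review_status = "REVIEWED REFSEQ",
    Gene_summary = c(
        "This gene encodes the alpha chain of one of the nonfibrillar collagens.",
        "This gene encodes the alpha chain of one of the nonfibrillar collagens.",
        "This locus may represent a breast cancer candidate gene.",
        "This locus may represent a breast cancer candidate gene.",
        "The protein encoded by this gene is a member of the SWI/SNF family of proteins and is similar to the brahma protein of Drosophila."
    ),
    stringsAsFactors = FALSE
)

# Keep the first accession of each gene, giving one summary per Gene_ID
by_gene = tb_toy[!duplicated(tb_toy$Gene_ID), c("Gene_ID", "Gene_summary")]
nrow(by_gene)

# Find the genes whose summary mentions a keyword
hits = by_gene[grepl("breast cancer", by_gene$Gene_summary, fixed = TRUE), ]
hits$Gene_ID
```

The same two lines should apply unchanged to the real table, provided the column names match those printed above.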
Setting organism to NULL returns a table covering all organisms:
tb = loadGeneSummary(organism = NULL)
sort(table(tb$Organism))
##
## Aotus nancymaae Aplysia californica
## 1 1
## Bison bison bison Callorhinchus milii
## 1 1
## Macaca nemestrina Mandrillus leucophaeus
## 1 1
## Rhinopithecus roxellana Aedes albopictus
## 1 2
## Anas platyrhynchos Cercocebus atys
## 2 2
## Chelonia mydas Colobus angolensis palliatus
## 2 2
## Crassostrea gigas Geospiza fortis
## 2 2
## Latimeria chalumnae Loxodonta africana
## 2 2
## Melopsittacus undulatus Nannospalax galili
## 2 2
## Python bivittatus Alligator sinensis
## 2 3
## Amphimedon queenslandica Chlorocebus sabaeus
## 3 3
## Columba livia Falco cherrug
## 3 3
## Falco peregrinus Oncorhynchus mykiss
## 3 3
## Orycteropus afer afer Pelodiscus sinensis
## 3 3
## Salmo salar Zonotrichia albicollis
## 3 3
## Alligator mississippiensis Bos mutus
## 4 4
## Ficedula albicollis Meleagris gallopavo
## 4 4
## Myotis brandtii Myotis davidii
## 4 4
## Pseudopodoces humilis Ailuropoda melanoleuca
## 4 5
## Astyanax mexicanus Balaenoptera acutorostrata scammoni
## 5 5
## Camelus ferus Elephantulus edwardii
## 5 5
## Panthera tigris altaica Poecilia formosa
## 5 5
## Chrysemys picta Heterocephalus glaber
## 6 6
## Otolemur garnettii Physeter catodon
## 6 6
## Saimiri boliviensis Sorex araneus
## 6 6
## Cavia porcellus Chinchilla lanigera
## 7 7
## Dasypus novemcinctus Leptonychotes weddellii
## 7 7
## Myotis lucifugus Octodon degus
## 7 7
## Tursiops truncatus Ceratotherium simum simum
## 7 8
## Condylura cristata Echinops telfairi
## 8 8
## Erinaceus europaeus Jaculus jaculus
## 8 8
## Mesocricetus auratus Mustela putorius furo
## 8 8
## Ochotona princeps Pteropus alecto
## 8 8
## Vicugna pacos Chrysochloris asiatica
## 8 9
## Felis catus Ictidomys tridecemlineatus
## 9 9
## Lipotes vexillifer Odobenus rosmarus divergens
## 9 9
## Orcinus orca Trichechus manatus latirostris
## 9 9
## Hydra vulgaris Microtus ochrogaster
## 10 10
## Papio anubis Bubalus bubalis
## 10 11
## Macaca fascicularis Nomascus leucogenys
## 11 11
## Peromyscus maniculatus bairdii Pongo abelii
## 11 14
## Callithrix jacchus Strongylocentrotus purpuratus
## 15 64
## Sarcophilus harrisii Xenopus laevis
## 65 84
## Brassica rapa Saccoglossus kowalevskii
## 89 90
## Cucumis melo Ovis aries
## 104 115
## Acyrthosiphon pisum Malus domestica
## 125 130
## Takifugu rubripes Citrus sinensis
## 140 146
## Solanum lycopersicum Vitis vinifera
## 152 156
## Oryzias latipes Zea mays
## 161 166
## Pan paniscus Tupaia chinensis
## 179 184
## Solanum tuberosum Cricetulus griseus
## 215 236
## Xenopus tropicalis Taeniopygia guttata
## 240 248
## Apis mellifera Capra hircus
## 254 277
## Anolis carolinensis Brachypodium distachyon
## 293 312
## Oryctolagus cuniculus Ciona intestinalis
## 319 331
## Nasonia vitripennis Tribolium castaneum
## 332 333
## Gorilla gorilla Ornithorhynchus anatinus
## 374 396
## Sus scrofa Bombyx mori
## 401 423
## Danio rerio Eptesicus fuscus
## 440 494
## Glycine max Macaca mulatta
## 670 677
## Pan troglodytes Monodelphis domestica
## 680 685
## Gallus gallus Canis lupus familiaris
## 956 1085
## Equus caballus Bos taurus
## 1463 1966
## Mus musculus Rattus norvegicus
## 5722 7893
## Homo sapiens
## 50550
sort(table(tb$Review_status))
##
## PREDICTED REFSEQ INFERRED REFSEQ VALIDATED REFSEQ PROVISIONAL REFSEQ
## 4 2350 8570 20727
## REVIEWED REFSEQ
## 49222
A specific review status can be selected via the status argument, e.g. only "reviewed":
tb = loadGeneSummary(organism = NULL, status = "reviewed")
sort(table(tb$Review_status))
## REVIEWED REFSEQ
## 49222
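The row count returned with status = "reviewed" matches the REVIEWED REFSEQ count in the unfiltered table, which suggests that the argument acts as a filter on the Review_status column. The sketch below mimics that filter by hand. It rebuilds a stand-in Review_status vector from the counts printed above rather than loading the package, and the mapping from "reviewed" to "REVIEWED REFSEQ" is an assumption inferred from the output, not a documented rule.

```r
# Review status counts copied from the output of the unfiltered table
status_counts = c(
    "PREDICTED REFSEQ"   = 4,
    "INFERRED REFSEQ"    = 2350,
    "VALIDATED REFSEQ"   = 8570,
    "PROVISIONAL REFSEQ" = 20727,
    "REVIEWED REFSEQ"    = 49222
)

# Stand-in for tb$Review_status: one entry per record
review_status = rep(names(status_counts), times = status_counts)
length(review_status)

# Assumed mapping from the short argument value to the column label
wanted = paste(toupper("reviewed"), "REFSEQ")

# Manual equivalent of the status filter
sum(review_status == wanted)
```

On the real table, the corresponding subset would be tb[tb$Review_status == wanted, ].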
sessionInfo()
## R version 4.1.1 Patched (2021-09-10 r80880)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/shepherd/R-Installs/bin/R-4-1-branch/lib/libRblas.so
## LAPACK: /home/shepherd/R-Installs/bin/R-4-1-branch/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GeneSummary_0.99.3
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.27 R6_2.5.1 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.11 stringi_1.7.4 jquerylib_0.1.4
## [9] bslib_0.3.0 rmarkdown_2.11 tools_4.1.1 stringr_1.4.0
## [13] xfun_0.26 yaml_2.2.1 fastmap_1.1.0 compiler_4.1.1
## [17] htmltools_0.5.2 knitr_1.34 sass_0.4.0