SomaScan.db 0.99.7
The SomaScan.db package provides extended biological annotations to be used
in conjunction with the results of SomaLogic’s SomaScan assay, a
fee-for-service proteomic technology platform designed to detect proteins
across numerous biological pathways.
This vignette describes how to use the SomaScan.db package to annotate SomaScan
data, that is, to add information to an ADAT file that gives biological context
(at the gene level) to the platform’s reagents and their protein targets.
SomaScan.db performs annotation by mapping SomaScan reagent IDs (SeqIds) to
their corresponding protein(s) and gene(s), as well as biological pathways
(GO, KEGG, etc.) and identifiers from other public data repositories.
SomaScan.db utilizes the same methods and setup as other Bioconductor
annotation packages, and therefore the methods should be familiar if you’ve
worked with such packages previously.
To begin, install the SomaScan.db package:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
# remotes::bioc_version() requires the remotes package to be installed
BiocManager::install("SomaScan.db", version = remotes::bioc_version())
Once installed, the package can be loaded as follows:
library(SomaScan.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
##
Loading this package will expose an annotation object with the same name
as the package, SomaScan.db. This object is a SQLite database
containing annotation data for the SomaScan assay derived from popular public
repositories. Viewing the object will present a metadata table containing
information about the annotations and where they were obtained:
SomaScan.db
## SomaDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: ChipDb
## | Supporting package: SomaScan.db
## | DBSCHEMA: HUMANCHIP_DB
## | ORGANISM: Homo sapiens
## | SPECIES: Human
## | MANUFACTURER: SomaLogic
## | CHIPNAME: SomaScan
## | MANUFACTURERURL: https://somalogic.com/somascan-platform/
## | EGSOURCEDATE: 2022-Sep12
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | CENTRALID: ENTREZID
## | TAXID: 9606
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2022-07-01
## | GOEGSOURCEDATE: 2022-Sep12
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
## | GPSOURCEURL:
## | GPSOURCEDATE: 2022-Aug31
## | ENSOURCEDATE: 2022-Jun28
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.UniProt.org/
## | UPSOURCEDATE: Fri Sep 23 16:26:35 2022
##
## Please see: help('select') for usage information
The same information can be retrieved as a data frame by calling
metadata(SomaScan.db).
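Because the metadata is an ordinary data frame, individual fields can be pulled out with standard subsetting. The sketch below assumes the data frame has columns called name and value, which is the usual layout for AnnotationDbi-based packages but is not shown above; check names(md) if the subsetting fails.

```r
library(SomaScan.db)

# Metadata as a data frame (assumed columns: "name" and "value")
md <- metadata(SomaScan.db)

# Keep only the rows recording the date of each annotation source
source_dates <- md[grepl("SOURCEDATE$", md$name), ]
source_dates
```

This can be a convenient way to record annotation source dates alongside analysis results.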
Moving forward, this database object (SomaScan.db) will be used throughout
the vignette to retrieve SomaScan annotations and map between identifiers.
For reference, the species information for the database can be retrieved directly with the following methods:
species(SomaScan.db)
## [1] "Homo sapiens"
taxonomyId(SomaScan.db)
## [1] 9606
It’s also possible to pull a more detailed summary of annotations and resource
identifiers (aka keys) by calling the package as a function, with the .db
extension removed:
SomaScan()
## Quality control information for SomaScan:
##
##
## This package has the following mappings:
##
## SomaScanALIAS2PROBE has 26576 mapped keys (of 258270 keys)
## SomaScanENSEMBL has 7173 mapped keys (of 7267 keys)
## SomaScanENSEMBL2PROBE has 7275 mapped keys (of 40467 keys)
## SomaScanENTREZID has 7178 mapped keys (of 7267 keys)
## SomaScanENZYME has 1311 mapped keys (of 7267 keys)
## SomaScanENZYME2PROBE has 666 mapped keys (of 975 keys)
## SomaScanGENENAME has 7178 mapped keys (of 7267 keys)
## SomaScanGO has 7126 mapped keys (of 7267 keys)
## SomaScanGO2ALLPROBES has 18821 mapped keys (of 22741 keys)
## SomaScanGO2PROBE has 14289 mapped keys (of 18809 keys)
## SomaScanMAP has 7174 mapped keys (of 7267 keys)
## SomaScanOMIM has 6726 mapped keys (of 7267 keys)
## SomaScanPATH has 3089 mapped keys (of 7267 keys)
## SomaScanPATH2PROBE has 227 mapped keys (of 229 keys)
## SomaScanPMID has 7175 mapped keys (of 7267 keys)
## SomaScanPMID2PROBE has 493341 mapped keys (of 778807 keys)
## SomaScanREFSEQ has 7178 mapped keys (of 7267 keys)
## SomaScanSYMBOL has 7178 mapped keys (of 7267 keys)
## SomaScanUNIPROT has 7165 mapped keys (of 7267 keys)
##
##
## Additional Information about this package:
##
## DB schema: HUMANCHIP_DB
## DB schema version: 2.1
## Organism: Homo sapiens
## Date for NCBI data: 2022-Sep12
## Date for GO data: 2022-07-01
## Date for KEGG data: 2011-Mar15
## Date for Golden Path data: 2022-Aug31
## Date for Ensembl data: 2022-Jun28
Note: Keys will be explained in greater detail later in this vignette.
The SomaScan.db package has 5 primary methods that can be used to query the
database:
keys
keytypes
columns
select
mapIds
This vignette will describe how each of these methods can be used to obtain
annotation data from SomaScan.db.
keys method
This annotation package is platform-based, meaning it was built around the
unique identifiers from a specific platform (in this case, SomaLogic’s
SomaScan platform). Each identifier corresponds to one of the assay’s
analytes, and therefore the analyte identifiers (SeqIds) are the primary
term used to query the database (aka “key”).
All keys in the database can be retrieved using keys:
# Short list of primary keys
keys(SomaScan.db) |> head(10L)
## [1] "10000-28" "10001-7" "10003-15" "10006-25" "10008-43" "10010-10"
## [7] "10011-65" "10012-5" "10014-31" "10015-119"
Each key retrieved in the output above corresponds to one of the assay’s unique analytes.
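A common first step is to confirm that the SeqIds in your own data are present among the database keys. The snippet below is a minimal sketch of that check; the first two SeqIds are taken from the output above, while "99999-1" is a made-up placeholder standing in for an identifier that may not exist in the database.

```r
library(SomaScan.db)

# SeqIds from a user's data; "99999-1" is a made-up placeholder
my_seqids <- c("10000-28", "10001-7", "99999-1")

# TRUE where the SeqId is a primary key in the database
in_db <- my_seqids %in% keys(SomaScan.db)
setNames(in_db, my_seqids)
```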
keytypes method
When querying the database, we can also specify the type of key (“keytype”)
being used. The keytype refers to the type of identifier that is used to
generate a database query. While the database is centered around the SomaLogic
SeqId, other identifiers can still be used to query the database.
We can list all available data types that can be used as query keys using
keytypes():
## List all of the supported key types.
keytypes(SomaScan.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
## [16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
## [21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [26] "UCSCKG" "UNIPROT"
Note: the SomaScan assay analyte identifiers (SeqIds) are stored as the
“PROBEID” keytype.
A keytype can also be passed to keys (via the keytype argument) to retrieve all
identifiers associated with the specified keytype. The example below
retrieves the UniProt IDs in SomaScan.db and shows the first 20:
keys(SomaScan.db, keytype = "UNIPROT") |> head(20L)
## [1] "P04217" "V9HWD8" "P01023" "P18440" "Q400J6"
## [6] "F5H5R8" "A4Z6T7" "P11245" "A0A024R6P0" "P01011"
## [11] "P22760" "A0A024R410" "Q13685" "C9JEH3" "F1T0I5"
## [16] "Q16613" "P49588" "P80404" "X5D8S1" "B2RUU2"
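A non-default keytype can also drive a query in the other direction, for example starting from a gene symbol and asking which SeqIds target it. The sketch below uses select (described later in this vignette) with the keytype argument from the standard AnnotationDbi interface; the symbol THY1 is chosen only because it appears in an example later in this vignette.

```r
library(SomaScan.db)

# Start from a gene symbol rather than a SeqId
res <- select(SomaScan.db,
    keys = "THY1",
    keytype = "SYMBOL",
    columns = "PROBEID"
)
res
```

More than one SeqId may be returned for a single symbol, so the result should be treated as a table rather than a single value.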
columns method
All available external annotations, corresponding to “columns” of the
database, can be listed using columns():
columns(SomaScan.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
## [16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
## [21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [26] "UCSCKG" "UNIPROT"
Note: the SomaScan assay analyte identifiers (SeqIds) are stored in the
“PROBEID” column.
This list may look very similar (or even identical) to the keytypes output.
If identical, all columns can be used as query keys. For a more in-depth
explanation of what each of these columns contains, consult the manual:
help("OMIM") # Example help call
Each columns entry also has a mapping object that contains the information
connecting SeqIds to the annotation’s identifiers. To read further
documentation about the object and the resource used to make it, check out the
manual page for the mapping itself:
?SomaScanOMIM
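Whether every column can also serve as a query key can be checked programmatically rather than by eye. This is a small sketch using base R set operations on the two character vectors; the variable names kt and cols are arbitrary.

```r
library(SomaScan.db)

kt <- keytypes(SomaScan.db)
cols <- columns(SomaScan.db)

# TRUE if every column can also be used as a keytype, and vice versa
setequal(kt, cols)

# Columns that cannot be used as query keys (empty if the sets match)
setdiff(cols, kt)
```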
select method
The list of columns returned by columns informs us as to what types of data
are available; therefore, the column values can be used to retrieve specific
pieces of information from the database. You can think of keys and columns as:
keys: the identifiers used to query the database (SeqIds/probe IDs), aka rows of the database
columns: the annotation data to retrieve for those keys, aka columns of the database
The SomaScan.db database can be queried via select, using both the keys
and columns.
When selecting columns and keys using the select method, the keys are
returned in the left-most column of the output, in the PROBEID column. The
results will be in the exact same order as the input keys:
# Randomly select a set of keys
example_keys <- withr::with_seed(101L, sample(keys(SomaScan.db),
    size = 5L,
    replace = FALSE
))
# Query keys in the database
select(SomaScan.db,
    keys = example_keys,
    columns = c("ENTREZID", "SYMBOL", "GENENAME")
)
## 'select()' returned 1:1 mapping between keys and columns
## PROBEID ENTREZID SYMBOL GENENAME
## 1 20564-53 7070 THY1 Thy-1 cell surface antigen
## 2 5481-16 5921 RASA1 RAS p21 protein activator 1
## 3 17792-158 7915 ALDH5A1 aldehyde dehydrogenase 5 family member A1
## 4 21760-22 7317 UBA1 ubiquitin like modifier activating enzyme 1
## 5 5508-62 1509 CTSD cathepsin D
Note: The message above (‘select()’ returned 1:1 mapping between keys and columns) will be described in detail in the next section of this vignette.
The data that is returned will always be in the same order as the provided
keys. If select cannot find a mapping for a specific key, an NA value will
be returned to retain the original query order.
# Inserting a new key that won't be found in the annotations ("TEST")
test_keys <- c(example_keys[1], "TEST")
select(SomaScan.db, keys = test_keys, columns = c("PROBEID", "ENTREZID"))
## 'select()' returned 1:1 mapping between keys and columns
## PROBEID ENTREZID
## 1 20564-53 7070
## 2 <NA> <NA>
In the example above, a “PROBEID” and “ENTREZID” value couldn’t be found for
the character string “TEST”, so an NA was returned in its place.
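Since unmapped keys surface as NA values, they can be collected after the query for follow-up. The snippet below is one possible way to do this with base R; the variable names are arbitrary, and "TEST" is the same deliberately invalid key used above.

```r
library(SomaScan.db)

query <- c("20564-53", "TEST")
res <- select(SomaScan.db, keys = query, columns = "ENTREZID")

# Keys with at least one non-missing ENTREZID
mapped <- unique(res$PROBEID[!is.na(res$ENTREZID)])

# Keys from the query that returned no annotation
unmapped <- setdiff(query, mapped)
unmapped
```

Comparing against the original query vector, rather than reading keys back from the result, keeps the check valid even when a key returns more than one row.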
When using select, a message indicating the relationship between query keys
and column data will be displayed along with the query results.
This message will describe one of the three relationships below:
1:1 mapping between keys and columns
1:many mapping between keys and columns
many:many mapping between keys and columns
These messages describe the very real possibility that there are multiple
identifiers associated with each key in a query. This can cause the number of
rows returned by select() to exceed the number of keys used to retrieve the
data; this is what is meant by the message “‘select()’ returned 1:many mapping
between keys and columns”.
In these cases, you will still see concordance between the order of the
provided keys and the output rows, but you should be aware that new
rows were inserted into the results. This message is not an error, merely an
informative notification to the user making it clear that more output rows
than input items should be expected. Importantly, this message also does not
relay information about the SomaScan menu itself or advice on how to handle
many-to-one relationships between SomaScan reagents and their corresponding
protein targets; rather, the message is directly related to this package’s
select method and how it retrieves information from the database.
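To build intuition for how quickly row counts can grow, the toy example below joins three small tables that each hold several values for one key. It is a base R analogy only, not the package's implementation, and the key and all values ("key-1", "GO:A", "p1", "U1", and so on) are made up for illustration.

```r
# Toy annotation tables for one hypothetical key; all values are made up
go <- data.frame(PROBEID = "key-1", GO = c("GO:A", "GO:B", "GO:C", "GO:D"))
path <- data.frame(PROBEID = "key-1", PATH = c("p1", "p2", "p3"))
uniprot <- data.frame(PROBEID = "key-1", UNIPROT = c("U1", "U2"))

# Joining on the key pairs every value of one column with every value
# of the others
combined <- Reduce(
    function(x, y) merge(x, y, by = "PROBEID"),
    list(go, path, uniprot)
)

# 4 GO terms x 3 pathways x 2 UniProt IDs
nrow(combined)
## [1] 24
```

A single key therefore produces 24 rows, even though no individual table has more than 4.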
Because some columns can return many values for a single key, it is generally best practice to retrieve the minimum number of columns needed for a query. Additionally, when retrieving a column that is known to have a many-to-one relationship to each key, like GO terms, it’s best to request that information in its own query, like so:
# Good
select(SomaScan.db, keys = example_keys[3L], columns = "GO")
## 'select()' returned 1:many mapping between keys and columns
## PROBEID GO EVIDENCE ONTOLOGY
## 1 17792-158 GO:0004777 IBA MF
## 2 17792-158 GO:0004777 IDA MF
## 3 17792-158 GO:0004777 ISS MF
## 4 17792-158 GO:0005739 HDA CC
## 5 17792-158 GO:0005739 IBA CC
## 6 17792-158 GO:0005739 IDA CC
## 7 17792-158 GO:0005739 ISS CC
## 8 17792-158 GO:0005759 TAS CC
## 9 17792-158 GO:0006105 ISS BP
## 10 17792-158 GO:0006536 ISS BP
## 11 17792-158 GO:0007417 IMP BP
## 12 17792-158 GO:0009450 IBA BP
## 13 17792-158 GO:0009450 IDA BP
## 14 17792-158 GO:0009450 IEA BP
## 15 17792-158 GO:0009450 IMP BP
## 16 17792-158 GO:0009791 IEA BP
## 17 17792-158 GO:0042135 ISS BP
## 18 17792-158 GO:0042802 IPI MF
# Bad
select(SomaScan.db,
    keys = example_keys[3L],
    columns = c("UNIPROT", "ENSEMBL", "GO", "PATH", "IPI")
)
## Warning: You have selected the following columns that can have a many to one
## relationship with the primary key: UNIPROT, ENSEMBL, GO, PATH, IPI .
## Because you have selected more than a few such columns there is a
## risk that this selection may balloon up into a very large result as
## the number of rows returned multiplies accordingly. To experience
## smaller/more manageable results and faster retrieval times, you might
## want to consider selecting these columns separately.
## 'select()' returned 1:many mapping between keys and columns
## PROBEID UNIPROT ENSEMBL GO EVIDENCE ONTOLOGY PATH
## 1 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 00250
## 2 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 00250
## 3 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 00650
## 4 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 00650
## 5 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 01100
## 6 17792-158 P51649 ENSG00000112294 GO:0004777 IBA MF 01100
## 7 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 00250
## 8 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 00250
## 9 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 00650
## 10 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 00650
## 11 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 01100
## 12 17792-158 P51649 ENSG00000112294 GO:0004777 IDA MF 01100
## 13 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 00250
## 14 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 00250
## 15 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 00650
## 16 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 00650
## 17 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 01100
## 18 17792-158 P51649 ENSG00000112294 GO:0004777 ISS MF 01100
## 19 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 00250
## 20 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 00250
## 21 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 00650
## 22 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 00650
## 23 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 01100
## 24 17792-158 P51649 ENSG00000112294 GO:0005739 HDA CC 01100
## 25 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 00250
## 26 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 00250
## 27 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 00650
## 28 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 00650
## 29 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 01100
## 30 17792-158 P51649 ENSG00000112294 GO:0005739 IBA CC 01100
## 31 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 00250
## 32 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 00250
## 33 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 00650
## 34 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 00650
## 35 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 01100
## 36 17792-158 P51649 ENSG00000112294 GO:0005739 IDA CC 01100
## 37 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 00250
## 38 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 00250
## 39 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 00650
## 40 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 00650
## 41 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 01100
## 42 17792-158 P51649 ENSG00000112294 GO:0005739 ISS CC 01100
## 43 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 00250
## 44 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 00250
## 45 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 00650
## 46 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 00650
## 47 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 01100
## 48 17792-158 P51649 ENSG00000112294 GO:0005759 TAS CC 01100
## 49 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 00250
## 50 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 00250
## 51 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 00650
## 52 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 00650
## 53 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 01100
## 54 17792-158 P51649 ENSG00000112294 GO:0006105 ISS BP 01100
## 55 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 00250
## 56 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 00250
## 57 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 00650
## 58 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 00650
## 59 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 01100
## 60 17792-158 P51649 ENSG00000112294 GO:0006536 ISS BP 01100
## 61 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 00250
## 62 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 00250
## 63 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 00650
## 64 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 00650
## 65 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 01100
## 66 17792-158 P51649 ENSG00000112294 GO:0007417 IMP BP 01100
## 67 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 00250
## 68 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 00250
## 69 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 00650
## 70 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 00650
## 71 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 01100
## 72 17792-158 P51649 ENSG00000112294 GO:0009450 IBA BP 01100
## 73 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 00250
## 74 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 00250
## 75 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 00650
## 76 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 00650
## 77 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 01100
## 78 17792-158 P51649 ENSG00000112294 GO:0009450 IDA BP 01100
## 79 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 00250
## 80 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 00250
## 81 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 00650
## 82 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 00650
## 83 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 01100
## 84 17792-158 P51649 ENSG00000112294 GO:0009450 IEA BP 01100
## 85 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 00250
## 86 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 00250
## 87 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 00650
## 88 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 00650
## 89 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 01100
## 90 17792-158 P51649 ENSG00000112294 GO:0009450 IMP BP 01100
## 91 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 00250
## 92 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 00250
## 93 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 00650
## 94 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 00650
## 95 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 01100
## 96 17792-158 P51649 ENSG00000112294 GO:0009791 IEA BP 01100
## 97 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 00250
## 98 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 00250
## 99 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 00650
## 100 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 00650
## 101 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 01100
## 102 17792-158 P51649 ENSG00000112294 GO:0042135 ISS BP 01100
## 103 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 00250
## 104 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 00250
## 105 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 00650
## 106 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 00650
## 107 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 01100
## 108 17792-158 P51649 ENSG00000112294 GO:0042802 IPI MF 01100
## 109 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 00250
## 110 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 00250
## 111 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 00650
## 112 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 00650
## 113 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 01100
## 114 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IBA MF 01100
## 115 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 00250
## 116 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 00250
## 117 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 00650
## 118 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 00650
## 119 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 01100
## 120 17792-158 X5DQN2 ENSG00000112294 GO:0004777 IDA MF 01100
## 121 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 00250
## 122 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 00250
## 123 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 00650
## 124 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 00650
## 125 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 01100
## 126 17792-158 X5DQN2 ENSG00000112294 GO:0004777 ISS MF 01100
## 127 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 00250
## 128 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 00250
## 129 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 00650
## 130 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 00650
## 131 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 01100
## 132 17792-158 X5DQN2 ENSG00000112294 GO:0005739 HDA CC 01100
## 133 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 00250
## 134 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 00250
## 135 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 00650
## 136 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 00650
## 137 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 01100
## 138 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IBA CC 01100
## 139 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 00250
## 140 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 00250
## 141 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 00650
## 142 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 00650
## 143 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 01100
## 144 17792-158 X5DQN2 ENSG00000112294 GO:0005739 IDA CC 01100
## 145 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 00250
## 146 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 00250
## 147 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 00650
## 148 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 00650
## 149 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 01100
## 150 17792-158 X5DQN2 ENSG00000112294 GO:0005739 ISS CC 01100
## 151 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 00250
## 152 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 00250
## 153 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 00650
## 154 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 00650
## 155 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 01100
## 156 17792-158 X5DQN2 ENSG00000112294 GO:0005759 TAS CC 01100
## 157 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 00250
## 158 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 00250
## 159 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 00650
## 160 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 00650
## 161 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 01100
## 162 17792-158 X5DQN2 ENSG00000112294 GO:0006105 ISS BP 01100
## 163 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 00250
## 164 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 00250
## 165 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 00650
## 166 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 00650
## 167 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 01100
## 168 17792-158 X5DQN2 ENSG00000112294 GO:0006536 ISS BP 01100
## 169 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 00250
## 170 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 00250
## 171 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 00650
## 172 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 00650
## 173 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 01100
## 174 17792-158 X5DQN2 ENSG00000112294 GO:0007417 IMP BP 01100
## 175 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 00250
## 176 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 00250
## 177 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 00650
## 178 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 00650
## 179 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 01100
## 180 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IBA BP 01100
## 181 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 00250
## 182 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 00250
## 183 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 00650
## 184 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 00650
## 185 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 01100
## 186 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IDA BP 01100
## 187 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 00250
## 188 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 00250
## 189 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 00650
## 190 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 00650
## 191 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 01100
## 192 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IEA BP 01100
## 193 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 00250
## 194 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 00250
## 195 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 00650
## 196 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 00650
## 197 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 01100
## 198 17792-158 X5DQN2 ENSG00000112294 GO:0009450 IMP BP 01100
## 199 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 00250
## 200 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 00250
## 201 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 00650
## 202 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 00650
## 203 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 01100
## 204 17792-158 X5DQN2 ENSG00000112294 GO:0009791 IEA BP 01100
## 205 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 00250
## 206 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 00250
## 207 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 00650
## 208 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 00650
## 209 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 01100
## 210 17792-158 X5DQN2 ENSG00000112294 GO:0042135 ISS BP 01100
## 211 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 00250
## 212 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 00250
## 213 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 00650
## 214 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 00650
## 215 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 01100
## 216 17792-158 X5DQN2 ENSG00000112294 GO:0042802 IPI MF 01100
## 217 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 00250
## 218 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 00250
## 219 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 00650
## 220 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 00650
## 221 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 01100
## 222 17792-158 X5D299 ENSG00000112294 GO:0004777 IBA MF 01100
## 223 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 00250
## 224 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 00250
## 225 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 00650
## 226 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 00650
## 227 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 01100
## 228 17792-158 X5D299 ENSG00000112294 GO:0004777 IDA MF 01100
## 229 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 00250
## 230 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 00250
## 231 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 00650
## 232 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 00650
## 233 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 01100
## 234 17792-158 X5D299 ENSG00000112294 GO:0004777 ISS MF 01100
## 235 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 00250
## 236 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 00250
## 237 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 00650
## 238 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 00650
## 239 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 01100
## 240 17792-158 X5D299 ENSG00000112294 GO:0005739 HDA CC 01100
## 241 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 00250
## 242 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 00250
## 243 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 00650
## 244 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 00650
## 245 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 01100
## 246 17792-158 X5D299 ENSG00000112294 GO:0005739 IBA CC 01100
## 247 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 00250
## 248 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 00250
## 249 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 00650
## 250 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 00650
## 251 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 01100
## 252 17792-158 X5D299 ENSG00000112294 GO:0005739 IDA CC 01100
## 253 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 00250
## 254 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 00250
## 255 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 00650
## 256 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 00650
## 257 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 01100
## 258 17792-158 X5D299 ENSG00000112294 GO:0005739 ISS CC 01100
## 259 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 00250
## 260 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 00250
## 261 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 00650
## 262 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 00650
## 263 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 01100
## 264 17792-158 X5D299 ENSG00000112294 GO:0005759 TAS CC 01100
## 265 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 00250
## 266 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 00250
## 267 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 00650
## 268 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 00650
## 269 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 01100
## 270 17792-158 X5D299 ENSG00000112294 GO:0006105 ISS BP 01100
## 271 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 00250
## 272 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 00250
## 273 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 00650
## 274 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 00650
## 275 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 01100
## 276 17792-158 X5D299 ENSG00000112294 GO:0006536 ISS BP 01100
## 277 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 00250
## 278 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 00250
## 279 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 00650
## 280 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 00650
## 281 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 01100
## 282 17792-158 X5D299 ENSG00000112294 GO:0007417 IMP BP 01100
## 283 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 00250
## 284 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 00250
## 285 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 00650
## 286 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 00650
## 287 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 01100
## 288 17792-158 X5D299 ENSG00000112294 GO:0009450 IBA BP 01100
## 289 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 00250
## 290 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 00250
## 291 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 00650
## 292 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 00650
## 293 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 01100
## 294 17792-158 X5D299 ENSG00000112294 GO:0009450 IDA BP 01100
## 295 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 00250
## 296 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 00250
## 297 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 00650
## 298 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 00650
## 299 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 01100
## 300 17792-158 X5D299 ENSG00000112294 GO:0009450 IEA BP 01100
## 301 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 00250
## 302 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 00250
## 303 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 00650
## 304 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 00650
## 305 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 01100
## 306 17792-158 X5D299 ENSG00000112294 GO:0009450 IMP BP 01100
## 307 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 00250
## 308 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 00250
## 309 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 00650
## 310 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 00650
## 311 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 01100
## 312 17792-158 X5D299 ENSG00000112294 GO:0009791 IEA BP 01100
## 313 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 00250
## 314 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 00250
## 315 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 00650
## 316 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 00650
## 317 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 01100
## 318 17792-158 X5D299 ENSG00000112294 GO:0042135 ISS BP 01100
## 319 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 00250
## 320 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 00250
## 321 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 00650
## 322 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 00650
## 323 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 01100
## 324 17792-158 X5D299 ENSG00000112294 GO:0042802 IPI MF 01100
## IPI
## 1 IPI00336008
## 2 IPI00019888
## 3 IPI00336008
## 4 IPI00019888
## 5 IPI00336008
## 6 IPI00019888
## 7 IPI00336008
## 8 IPI00019888
## 9 IPI00336008
## 10 IPI00019888
## 11 IPI00336008
## 12 IPI00019888
## 13 IPI00336008
## 14 IPI00019888
## 15 IPI00336008
## 16 IPI00019888
## 17 IPI00336008
## 18 IPI00019888
## 19 IPI00336008
## 20 IPI00019888
## 21 IPI00336008
## 22 IPI00019888
## 23 IPI00336008
## 24 IPI00019888
## 25 IPI00336008
## 26 IPI00019888
## 27 IPI00336008
## 28 IPI00019888
## 29 IPI00336008
## 30 IPI00019888
## 31 IPI00336008
## 32 IPI00019888
## 33 IPI00336008
## 34 IPI00019888
## 35 IPI00336008
## 36 IPI00019888
## 37 IPI00336008
## 38 IPI00019888
## 39 IPI00336008
## 40 IPI00019888
## 41 IPI00336008
## 42 IPI00019888
## 43 IPI00336008
## 44 IPI00019888
## 45 IPI00336008
## 46 IPI00019888
## 47 IPI00336008
## 48 IPI00019888
## 49 IPI00336008
## 50 IPI00019888
## 51 IPI00336008
## 52 IPI00019888
## 53 IPI00336008
## 54 IPI00019888
## 55 IPI00336008
## 56 IPI00019888
## 57 IPI00336008
## 58 IPI00019888
## 59 IPI00336008
## 60 IPI00019888
## 61 IPI00336008
## 62 IPI00019888
## 63 IPI00336008
## 64 IPI00019888
## 65 IPI00336008
## 66 IPI00019888
## 67 IPI00336008
## 68 IPI00019888
## 69 IPI00336008
## 70 IPI00019888
## 71 IPI00336008
## 72 IPI00019888
## 73 IPI00336008
## 74 IPI00019888
## 75 IPI00336008
## 76 IPI00019888
## 77 IPI00336008
## 78 IPI00019888
## 79 IPI00336008
## 80 IPI00019888
## 81 IPI00336008
## 82 IPI00019888
## 83 IPI00336008
## 84 IPI00019888
## 85 IPI00336008
## 86 IPI00019888
## 87 IPI00336008
## 88 IPI00019888
## 89 IPI00336008
## 90 IPI00019888
## 91 IPI00336008
## 92 IPI00019888
## 93 IPI00336008
## 94 IPI00019888
## 95 IPI00336008
## 96 IPI00019888
## 97 IPI00336008
## 98 IPI00019888
## 99 IPI00336008
## 100 IPI00019888
## 101 IPI00336008
## 102 IPI00019888
## 103 IPI00336008
## 104 IPI00019888
## 105 IPI00336008
## 106 IPI00019888
## 107 IPI00336008
## 108 IPI00019888
## 109 IPI00336008
## 110 IPI00019888
## 111 IPI00336008
## 112 IPI00019888
## 113 IPI00336008
## 114 IPI00019888
## 115 IPI00336008
## 116 IPI00019888
## 117 IPI00336008
## 118 IPI00019888
## 119 IPI00336008
## 120 IPI00019888
## 121 IPI00336008
## 122 IPI00019888
## 123 IPI00336008
## 124 IPI00019888
## 125 IPI00336008
## 126 IPI00019888
## 127 IPI00336008
## 128 IPI00019888
## 129 IPI00336008
## 130 IPI00019888
## 131 IPI00336008
## 132 IPI00019888
## 133 IPI00336008
## 134 IPI00019888
## 135 IPI00336008
## 136 IPI00019888
## 137 IPI00336008
## 138 IPI00019888
## 139 IPI00336008
## 140 IPI00019888
## 141 IPI00336008
## 142 IPI00019888
## 143 IPI00336008
## 144 IPI00019888
## 145 IPI00336008
## 146 IPI00019888
## 147 IPI00336008
## 148 IPI00019888
## 149 IPI00336008
## 150 IPI00019888
## 151 IPI00336008
## 152 IPI00019888
## 153 IPI00336008
## 154 IPI00019888
## 155 IPI00336008
## 156 IPI00019888
## 157 IPI00336008
## 158 IPI00019888
## 159 IPI00336008
## 160 IPI00019888
## 161 IPI00336008
## 162 IPI00019888
## 163 IPI00336008
## 164 IPI00019888
## 165 IPI00336008
## 166 IPI00019888
## 167 IPI00336008
## 168 IPI00019888
## 169 IPI00336008
## 170 IPI00019888
## 171 IPI00336008
## 172 IPI00019888
## 173 IPI00336008
## 174 IPI00019888
## 175 IPI00336008
## 176 IPI00019888
## 177 IPI00336008
## 178 IPI00019888
## 179 IPI00336008
## 180 IPI00019888
## 181 IPI00336008
## 182 IPI00019888
## 183 IPI00336008
## 184 IPI00019888
## 185 IPI00336008
## 186 IPI00019888
## 187 IPI00336008
## 188 IPI00019888
## 189 IPI00336008
## 190 IPI00019888
## 191 IPI00336008
## 192 IPI00019888
## 193 IPI00336008
## 194 IPI00019888
## 195 IPI00336008
## 196 IPI00019888
## 197 IPI00336008
## 198 IPI00019888
## 199 IPI00336008
## 200 IPI00019888
## 201 IPI00336008
## 202 IPI00019888
## 203 IPI00336008
## 204 IPI00019888
## 205 IPI00336008
## 206 IPI00019888
## 207 IPI00336008
## 208 IPI00019888
## 209 IPI00336008
## 210 IPI00019888
## 211 IPI00336008
## 212 IPI00019888
## 213 IPI00336008
## 214 IPI00019888
## 215 IPI00336008
## 216 IPI00019888
## 217 IPI00336008
## 218 IPI00019888
## 219 IPI00336008
## 220 IPI00019888
## 221 IPI00336008
## 222 IPI00019888
## 223 IPI00336008
## 224 IPI00019888
## 225 IPI00336008
## 226 IPI00019888
## 227 IPI00336008
## 228 IPI00019888
## 229 IPI00336008
## 230 IPI00019888
## 231 IPI00336008
## 232 IPI00019888
## 233 IPI00336008
## 234 IPI00019888
## 235 IPI00336008
## 236 IPI00019888
## 237 IPI00336008
## 238 IPI00019888
## 239 IPI00336008
## 240 IPI00019888
## 241 IPI00336008
## 242 IPI00019888
## 243 IPI00336008
## 244 IPI00019888
## 245 IPI00336008
## 246 IPI00019888
## 247 IPI00336008
## 248 IPI00019888
## 249 IPI00336008
## 250 IPI00019888
## 251 IPI00336008
## 252 IPI00019888
## 253 IPI00336008
## 254 IPI00019888
## 255 IPI00336008
## 256 IPI00019888
## 257 IPI00336008
## 258 IPI00019888
## 259 IPI00336008
## 260 IPI00019888
## 261 IPI00336008
## 262 IPI00019888
## 263 IPI00336008
## 264 IPI00019888
## 265 IPI00336008
## 266 IPI00019888
## 267 IPI00336008
## 268 IPI00019888
## 269 IPI00336008
## 270 IPI00019888
## 271 IPI00336008
## 272 IPI00019888
## 273 IPI00336008
## 274 IPI00019888
## 275 IPI00336008
## 276 IPI00019888
## 277 IPI00336008
## 278 IPI00019888
## 279 IPI00336008
## 280 IPI00019888
## 281 IPI00336008
## 282 IPI00019888
## 283 IPI00336008
## 284 IPI00019888
## 285 IPI00336008
## 286 IPI00019888
## 287 IPI00336008
## 288 IPI00019888
## 289 IPI00336008
## 290 IPI00019888
## 291 IPI00336008
## 292 IPI00019888
## 293 IPI00336008
## 294 IPI00019888
## 295 IPI00336008
## 296 IPI00019888
## 297 IPI00336008
## 298 IPI00019888
## 299 IPI00336008
## 300 IPI00019888
## 301 IPI00336008
## 302 IPI00019888
## 303 IPI00336008
## 304 IPI00019888
## 305 IPI00336008
## 306 IPI00019888
## 307 IPI00336008
## 308 IPI00019888
## 309 IPI00336008
## 310 IPI00019888
## 311 IPI00336008
## 312 IPI00019888
## 313 IPI00336008
## 314 IPI00019888
## 315 IPI00336008
## 316 IPI00019888
## 317 IPI00336008
## 318 IPI00019888
## 319 IPI00336008
## 320 IPI00019888
## 321 IPI00336008
## 322 IPI00019888
## 323 IPI00336008
## 324 IPI00019888
The example above illustrates why it is preferred to request as few columns as possible, especially when working with GO terms.
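The row count grows so quickly because each additional one-to-many column multiplies the number of rows. The following is a minimal sketch of that effect in base R, using toy tables whose values are copied from the output above. The tables and the use of merge are illustrative assumptions, not a description of how SomaScan.db stores or joins its data.

```r
# Toy tables standing in for three one-to-many annotation columns.
# Values are copied from the output above; the tables are hypothetical.
go <- data.frame(
    PROBEID = "17792-158",
    GO = c("GO:0005739", "GO:0005759", "GO:0006105")
)
path <- data.frame(
    PROBEID = "17792-158",
    PATH = c("00250", "00650", "01100")
)
ipi <- data.frame(
    PROBEID = "17792-158",
    IPI = c("IPI00336008", "IPI00019888")
)

# Joining on PROBEID yields every combination of the annotation values:
# 3 GO terms x 3 pathways x 2 IPI IDs
combined <- merge(merge(go, path, by = "PROBEID"), ipi, by = "PROBEID")
nrow(combined)
## [1] 18
```

Querying each annotation type in its own select call keeps each result at the size of the individual tables (3, 3 and 2 rows in this toy case).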
In SomaScan.db, the default keytype for select is the probe ID. This means
that when using a SeqId (aka “PROBEID”) to retrieve annotations, the keytype=
argument does not need to be defined, and can be left out of the select call
entirely. The default (PROBEID) will be used.
select(SomaScan.db, keys = example_keys, columns = c("ENTREZID", "UNIPROT"))
## 'select()' returned 1:many mapping between keys and columns
## PROBEID ENTREZID UNIPROT
## 1 20564-53 7070 B0YJA4
## 2 20564-53 7070 P04216
## 3 5481-16 5921 P20936
## 4 5481-16 5921 Q59GK3
## 5 17792-158 7915 P51649
## 6 17792-158 7915 X5DQN2
## 7 17792-158 7915 X5D299
## 8 21760-22 7317 A0A024R1A3
## 9 21760-22 7317 P22314
## 10 5508-62 1509 P07339
## 11 5508-62 1509 V9HWI3
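As a quick check of the default, the call can be repeated with the keytype stated explicitly. This is a sketch that assumes SomaScan.db is installed; example_keys is rebuilt here from the SeqIds visible in the output above rather than taken from the vignette's own definition.

```r
library(SomaScan.db)

# SeqIds copied from the output above (assumed to match example_keys)
example_keys <- c("20564-53", "5481-16", "17792-158", "21760-22", "5508-62")

implicit <- select(SomaScan.db,
    keys = example_keys,
    columns = c("ENTREZID", "UNIPROT")
)
explicit <- select(SomaScan.db,
    keys = example_keys,
    columns = c("ENTREZID", "UNIPROT"),
    keytype = "PROBEID"
)

# Expected to be TRUE if PROBEID is the default keytype
identical(implicit, explicit)

# keytypes() lists every identifier type that can be used as a key
keytypes(SomaScan.db)
```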
However, the database can be searched using more than just SeqIds.
For example, you may want to retrieve a list of SeqIds that are associated
with a specific gene of interest - let’s use SMAD2 as an example. You can work
“backwards” to retrieve the SeqIds associated with SMAD2 by setting
keytype="SYMBOL":
select(SomaScan.db,
columns = c("PROBEID", "ENTREZID"),
keys = "SMAD2", keytype = "SYMBOL"
)
## 'select()' returned 1:many mapping between keys and columns
## SYMBOL PROBEID ENTREZID
## 1 SMAD2 10364-6 4087
## 2 SMAD2 11353-143 4087
Sometimes, this may appear not to work. Let’s use CASC4 as an example:
select(SomaScan.db,
columns = c("PROBEID", "ENTREZID"),
keys = "CASC4", keytype = "SYMBOL"
)
## Error in .testForValidKeys(x, keys, keytype, fks): None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.
The error message above implies that CASC4 is not a valid key for the
“SYMBOL” keytype, which means that no entry for CASC4 was found in the
“SYMBOL” column. However, genes can be tricky to search, and in some cases
have many common names. We can improve the search using keytype="ALIAS"; this
data type contains the various aliases associated with gene names found in the
“SYMBOL” column. Using keytype="ALIAS", we can cast a wider net:
select(SomaScan.db,
columns = c("SYMBOL", "PROBEID", "ENTREZID"),
keys = "CASC4", keytype = "ALIAS"
)
## 'select()' returned 1:many mapping between keys and columns
## ALIAS SYMBOL PROBEID ENTREZID
## 1 CASC4 GOLM2 10613-33 113201
## 2 CASC4 GOLM2 8838-10 113201
This reveals the source of our problem! CASC4 is also known as GOLM2, and GOLM2 is the symbol used in the annotation database. Because of this, searching for CASC4 as a symbol returns no results, but the same query finds an entry when keytype="ALIAS" is specified.
Additionally, we can see that CASC4/GOLM2 is associated with two SeqIds -
10613-33 and 8838-10. How is this possible, and why does this happen?
For more information, please reference the Advanced Usage Examples
(vignette("advanced_usage_examples", package = "SomaScan.db")).
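The symbol-then-alias lookup described above can be wrapped in a small helper. The function below is hypothetical (seqids_for_gene is not part of SomaScan.db) and is only a sketch: it handles a single gene name at a time, and select will still raise an error if the name is not a valid alias either.

```r
library(SomaScan.db)

# Hypothetical helper, not part of SomaScan.db: query by official symbol,
# falling back to the ALIAS keytype when the name is not a valid symbol.
seqids_for_gene <- function(gene) {
    stopifnot(is.character(gene), length(gene) == 1L)
    is_symbol <- gene %in% keys(SomaScan.db, keytype = "SYMBOL")
    select(SomaScan.db,
        keys = gene,
        keytype = if (is_symbol) "SYMBOL" else "ALIAS",
        columns = c("SYMBOL", "PROBEID")
    )
}

seqids_for_gene("SMAD2") # found directly as a symbol
seqids_for_gene("CASC4") # resolved through its alias
```

Keep in mind that an alias is not guaranteed to be unique to one gene, so it is worth inspecting the “SYMBOL” column of the result before using the returned SeqIds.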
mapIds method
For situations in which you wish only to retrieve one data type from the
database, the mapIds method may be cleaner and more streamlined than using
select, and can help avoid problems with one-to-many mapping of keys.
For example, if you are only interested in the gene symbols associated with a
set of SomaScan analytes, they can be retrieved like so:
mapIds(SomaScan.db, keys = example_keys, column = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
## 20564-53 5481-16 17792-158 21760-22 5508-62
## "THY1" "RASA1" "ALDH5A1" "UBA1" "CTSD"
mapIds will return a named vector from a single column, while select
returns a data frame and can be used to retrieve data from multiple columns.
The primary difference between mapIds and select is how each method handles
one-to-many mapping, i.e. when the chosen key maps to > 1 entry in the
selected column. When this occurs, mapIds returns only the first value (by
default), whereas select returns one row per mapped value.
Compare the output in the examples below:
# Only 1 GO term per key
mapIds(SomaScan.db, keys = example_keys[3L], column = "GO")
## 'select()' returned 1:many mapping between keys and columns
## 17792-158
## "GO:0004777"
# All entries for chosen key
select(SomaScan.db, keys = example_keys[3L], columns = "GO")
## 'select()' returned 1:many mapping between keys and columns
## PROBEID GO EVIDENCE ONTOLOGY
## 1 17792-158 GO:0004777 IBA MF
## 2 17792-158 GO:0004777 IDA MF
## 3 17792-158 GO:0004777 ISS MF
## 4 17792-158 GO:0005739 HDA CC
## 5 17792-158 GO:0005739 IBA CC
## 6 17792-158 GO:0005739 IDA CC
## 7 17792-158 GO:0005739 ISS CC
## 8 17792-158 GO:0005759 TAS CC
## 9 17792-158 GO:0006105 ISS BP
## 10 17792-158 GO:0006536 ISS BP
## 11 17792-158 GO:0007417 IMP BP
## 12 17792-158 GO:0009450 IBA BP
## 13 17792-158 GO:0009450 IDA BP
## 14 17792-158 GO:0009450 IEA BP
## 15 17792-158 GO:0009450 IMP BP
## 16 17792-158 GO:0009791 IEA BP
## 17 17792-158 GO:0042135 ISS BP
## 18 17792-158 GO:0042802 IPI MF
Note that the message printed by mapIds states that select returned 1:many
mappings between keys and columns, yet only one value was returned for the
desired SeqId. This is because the additional mapped values were discarded
when the results were converted to a named vector.
This may not be a problem for some columns, like “SYMBOL” (typically there is
only one gene symbol per gene), but it may present a problem for others (like
GO terms or KEGG pathways). Think carefully when using mapIds, or consider
specifying the multiVals= argument to indicate what should be done
with multi-mapped output.
The default behavior of mapIds is to return the first available result:
# The default - returns the first available result
mapIds(SomaScan.db, keys = example_keys[3], column = "GO", multiVals = "first")
## 'select()' returned 1:many mapping between keys and columns
## 17792-158
## "GO:0004777"
Again, the select message here indicates that, while only 1 value was
returned, there were many more GO term matches. All of the matches can be
viewed by specifying multiVals="list":
# Returns a list object of results, instead of only returning the first result
mapIds(SomaScan.db, keys = example_keys[3], column = "GO", multiVals = "list")
## 'select()' returned 1:many mapping between keys and columns
## $`17792-158`
## [1] "GO:0004777" "GO:0004777" "GO:0004777" "GO:0005739" "GO:0005739"
## [6] "GO:0005739" "GO:0005739" "GO:0005759" "GO:0006105" "GO:0006536"
## [11] "GO:0007417" "GO:0009450" "GO:0009450" "GO:0009450" "GO:0009450"
## [16] "GO:0009791" "GO:0042135" "GO:0042802"
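The list output above contains repeated GO IDs, because the same term can be recorded under several evidence codes (compare the “EVIDENCE” column in the select output earlier). One possible way to reduce the list to distinct terms is sketched below; it assumes SomaScan.db is installed and uses the SeqId from the example above.

```r
library(SomaScan.db)

go_list <- mapIds(SomaScan.db,
    keys = "17792-158",
    column = "GO",
    multiVals = "list"
)

# Drop repeated GO IDs (the same term listed under several evidence codes)
go_unique <- lapply(go_list, unique)

# Number of distinct GO terms per SeqId
lengths(go_unique)
```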
Because the annotations in this package are compiled from public repositories,
information typically found in an ADAT may be missing. For example, in an ADAT
file, each SeqId is associated with a protein target, and
the name of that target is provided as both an abbreviated symbol (“Target”)
and full description (“Target Full Name”). The SomaScan.db
package does not contain data from a particular ADAT file; however, it does
contain a function to add the full protein target name to any data
frame obtained via select.
As an example, we will generate a data frame of Ensembl gene IDs and OMIM IDs:
ensg <- select(SomaScan.db,
keys = example_keys[1:3L],
columns = c("ENSEMBL", "OMIM")
)
## 'select()' returned 1:many mapping between keys and columns
ensg
## PROBEID ENSEMBL OMIM
## 1 20564-53 ENSG00000154096 188230
## 2 5481-16 ENSG00000145715 139150
## 3 5481-16 ENSG00000145715 605462
## 4 5481-16 ENSG00000145715 608354
## 5 17792-158 ENSG00000112294 271980
## 6 17792-158 ENSG00000112294 610045
We will now append the Target Full Name to this data frame:
addTargetFullName(ensg)
## PROBEID TARGETFULLNAME ENSEMBL
## 1 17792-158 Succinate-semialdehyde dehydrogenase, mitochondrial ENSG00000112294
## 2 17792-158 Succinate-semialdehyde dehydrogenase, mitochondrial ENSG00000112294
## 3 20564-53 Thy-1 membrane glycoprotein ENSG00000154096
## 4 5481-16 Ras GTPase-activating protein 1 ENSG00000145715
## 5 5481-16 Ras GTPase-activating protein 1 ENSG00000145715
## 6 5481-16 Ras GTPase-activating protein 1 ENSG00000145715
## OMIM
## 1 271980
## 2 610045
## 3 188230
## 4 139150
## 5 605462
## 6 608354
The full protein target name is added to the input data frame as a “TARGETFULLNAME” column, which is always placed immediately to the right of the “PROBEID” column. Note that, as the output above shows, the rows may be returned in a different order than in the input.
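The two steps can also be chained, which avoids keeping the intermediate data frame around. This is a sketch that assumes SomaScan.db is installed; the choice of keys and of the “SYMBOL” column is arbitrary.

```r
library(SomaScan.db)

# Query gene symbols, then append the full target name in one pipeline
annotated <- select(SomaScan.db,
    keys = c("20564-53", "5481-16"),
    columns = "SYMBOL"
) |>
    addTargetFullName()

# TARGETFULLNAME is expected to sit directly after PROBEID
names(annotated)
```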
In addition to the methods mentioned above, there is an R object that can be
used to retrieve SomaScan analytes from a specific menu version. The object is
a list, with each element in the list containing a character vector of
SeqIds that were available in the specified menu.
summary(somascan_menu)
## Length Class Mode
## v4.0 4966 -none- character
## v4.1 7267 -none- character
lapply(somascan_menu, head)
## $v4.0
## [1] "10000-28" "10001-7" "10003-15" "10006-25" "10008-43" "10011-65"
##
## $v4.1
## [1] "10000-28" "10001-7" "10003-15" "10006-25" "10008-43" "10010-10"
This object also provides a quick and easy way of comparing the available SomaScan menus:
setdiff(somascan_menu$v4.1, somascan_menu$v4.0) |> head(50L)
## [1] "10010-10" "10025-1" "10039-32" "10069-2" "10351-51" "10354-57"
## [7] "10379-19" "10382-1" "10398-110" "10420-30" "10439-57" "10457-3"
## [13] "10460-1" "10463-23" "10470-34" "10472-53" "10473-2" "10479-18"
## [19] "10480-33" "10505-12" "10528-2" "10576-7" "10626-116" "10631-9"
## [25] "10636-1" "10670-26" "10738-11" "10741-22" "10743-13" "10746-24"
## [31] "10780-10" "10801-11" "10819-108" "10855-55" "10870-32" "10894-25"
## [37] "10966-1" "10967-12" "10970-3" "10976-44" "10980-11" "11081-1"
## [43] "11083-23" "11121-56" "11150-3" "11159-14" "11184-51" "11200-52"
## [49] "11203-97" "11232-46"
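Other set operations work the same way. The sketch below assumes SomaScan.db is installed and reuses SeqIds from earlier output as an arbitrary set of analytes of interest; no particular result is implied.

```r
library(SomaScan.db)

# SeqIds present in both menus
shared <- intersect(somascan_menu$v4.0, somascan_menu$v4.1)
length(shared)

# SeqIds in v4.0 that are absent from v4.1 (possibly none)
dropped <- setdiff(somascan_menu$v4.0, somascan_menu$v4.1)
length(dropped)

# Check which analytes of interest are available in the v4.0 menu
example_keys <- c("20564-53", "5481-16", "17792-158", "21760-22", "5508-62")
in_v40 <- setNames(example_keys %in% somascan_menu$v4.0, example_keys)
in_v40
```

A logical vector such as in_v40 can then be used to subset the keys before passing them to select or mapIds.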
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SomaScan.db_0.99.7 AnnotationDbi_1.64.0 IRanges_2.36.0
## [4] S4Vectors_0.40.0 Biobase_2.62.0 BiocGenerics_0.48.0
## [7] withr_2.5.1 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.7 compiler_4.3.1
## [4] BiocManager_1.30.22 crayon_1.5.2 blob_1.2.4
## [7] bitops_1.0-7 Biostrings_2.70.0 jquerylib_0.1.4
## [10] png_0.1-8 yaml_2.3.7 fastmap_1.1.1
## [13] org.Hs.eg.db_3.18.0 R6_2.5.1 XVector_0.42.0
## [16] GenomeInfoDb_1.38.0 knitr_1.44 bookdown_0.36
## [19] GenomeInfoDbData_1.2.11 DBI_1.1.3 bslib_0.5.1
## [22] rlang_1.1.1 KEGGREST_1.42.0 cachem_1.0.8
## [25] xfun_0.40 sass_0.4.7 bit64_4.0.5
## [28] RSQLite_2.3.1 memoise_2.0.1 cli_3.6.1
## [31] zlibbioc_1.48.0 digest_0.6.33 vctrs_0.6.4
## [34] evaluate_0.22 RCurl_1.98-1.12 rmarkdown_2.25
## [37] httr_1.4.7 pkgconfig_2.0.3 tools_4.3.1
## [40] htmltools_0.5.6.1