--- title: "SQLDataFrame: Internal Implementation" author: - name: Qian Liu affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY - name: Martin Morgan affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY date: "last compiled: `r Sys.Date()`" output: BiocStyle::html_document: toc: true toc_float: true package: SQLDataFrame vignette: > %\VignetteIndexEntry{SQLDataFrame Internal Implementation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` date: "last edit: 9/30/2019" [dbplyr]: https://cran.r-project.org/web/packages/dbplyr/index.html Load packages into R session. It will automatically load the package of `dplyr` and `dbplyr`. ```{r Load, message=FALSE, eval = TRUE} library(SQLDataFrame) library(DBI) ``` # SQLDataFrame slots ```{r constructor} dbfile <- system.file("extdata/test.db", package = "SQLDataFrame") conn <- DBI::dbConnect(dbDriver("SQLite"), dbname = dbfile) obj <- SQLDataFrame( conn = conn, dbtable = "state", dbkey = "state") obj ``` ## Slot accessors To make the `SQLDataFrame` object as light and compact as possible, there are only 5 slots contained in the object: `tblData`, `dbkey`, `dbnrows`, `dbconcatKey`, `indexes`. Metadata information could be returned through these 5 slots using slot accessors or other utility functions. ```{r} slotNames(obj) dbtable(obj) dbkey(obj) ``` ## `tblData` slot The `tblData` slot saves the `dbplyr::tbl_dbi` version of the database table, which is a light-weight representation of the database table in _R_. Of note is that this lazy tbl only contains unique rows. It could also be sorted by the `dbkey(obj)` if the `SQLDataFrame` object was generated from `union` or `rbind`. So when the `saveSQLDataFrame()` function was called, a database table will be written into a physical disk space and have the unique records. Accessor function is made avaible for this slot: ```{r} tblData(obj) ``` ## `dbnrows` and `dbconcatKey` The `dbnrows` slot saves the number of rows corresponding to the `tblData`, and `dbconcatKey` saves the realized (concatenated if multiple) key columns corresponding to the `tblData`. Accessor functions are also available for these 2 slots: ```{r} dbnrows(obj) dbconcatKey(obj) ``` ## `indexes` slot The `indexes` slots is an unnamed list saving the row and column indexes respectively corresponding to the `tblData` slot, so that the `SQLDataFrame` could possibly have duplicate rows or only a subset of data records from the `tblData`, while the `tblData` slot doesn't need to be changed. To be consistent, the slots of `dbnrows` and `dbconcatKey` will also remain unchanged. ```{r} obj@indexes obj_sub <- obj[sample(5, 3, replace = TRUE), 2:3] obj_sub obj_sub@indexes identical(tblData(obj), tblData(obj_sub)) ``` With a `filter` or `select` function (which is similar to `[i, ]` subsetting), only the `indexes` slot will be updated for the row or column index pointing to the `tblData`. ```{r} obj_filter <- obj %>% filter(division == "South Atlantic" & size == "medium") obj_filter@indexes identical(tblData(obj), tblData(obj_filter)) obj_select <- obj %>% select(division, size) obj_select@indexes identical(tblData(obj), tblData(obj_select)) ``` # SQLDataFrame methods ## ROWNAMES The `ROWNAMES,SQLDataFrame` method was defined to return the (concatenated if multiple) key column(s) value, so that the row subsetting with character vector works for the `SQLDataFrame` objects. ```{r} rnms <- ROWNAMES(obj) obj[sample(rnms, 3), ] ``` For `SQLDataFrame` object with composite keys: ```{r} obj1 <- SQLDataFrame(conn = conn, dbtable = "state", dbkey = c("region", "population")) ROWNAMES(obj1[1:10,]) obj1[c("South:3615.0", "West:365.0"), ] ``` # SessionInfo() ```{r} sessionInfo() ```