---
title: "Transforming FHIR documents to tables with BiocFHIR"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Transforming FHIR documents to tables with BiocFHIR}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    highlight: pygments
    number_sections: yes
    theme: united
    toc: yes
---

```{r setup,results="hide",echo=FALSE}
suppressPackageStartupMessages({
    suppressMessages({
        library(BiocFHIR)
        library(DT)
        library(jsonlite)
    })
})
```

# Introduction

The purpose of this vignette is to describe how FHIR documents are transformed to tables in BiocFHIR.

The R commands in this text will work with R version 4.2 or later in which BiocFHIR (version 0.0.14 or later) has been installed. The source code is always available on [GitHub](https://github.com/vjcitn/BiocFHIR), and the package may also be installable by other means.

# Examining sample data, again

In the "Upper level FHIR concepts" vignette, we used the following code to peek at the information structure of a single document: a Bundle associated with one patient.

```{r takepeek}
# locate the sample JSON document shipped with BiocFHIR
tfile = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)
peek = jsonlite::fromJSON(tfile)
names(peek)
peek$resourceType
names(peek$entry)
# the resources carried in the Bundle's entries
length(names(peek$entry$resource))
class(peek$entry$resource)
dim(peek$entry$resource)
head(names(peek$entry$resource))
```

We perform a first stage of transformation with `process_fhir_bundle`:

```{r txpeek}
bu = process_fhir_bundle(tfile)
bu
```

# Bundle to data frames

Each processed bundle is a collection of data.frame instances, formed by splitting the input "entry" element by "resourceType". These data.frames are mostly filled with missing (NA) values, and some of their columns have been ingested as lists. The package therefore makes executive decisions about which columns are likely to hold useful information.
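To make the splitting step concrete, here is a minimal base-R sketch. It assumes that, as in the peek above, `jsonlite::fromJSON` has simplified the Bundle's entries into a data.frame with one row per entry and a `resourceType` column. The small data.frame `res` is hypothetical toy data standing in for `peek$entry$resource`, and dropping all-NA columns is only a crude stand-in for the package's own column-selection rules; this is not how `process_fhir_bundle` is implemented.

```r
# Hypothetical toy stand-in for peek$entry$resource: one row per entry and
# one column per field seen in any entry, so many cells are NA.
res <- data.frame(
  resourceType  = c("Patient", "Condition", "Observation", "Observation"),
  id            = c("p1", "c1", "o1", "o2"),
  gender        = c("female", NA, NA, NA),
  onsetDateTime = c(NA, "2010-01-01", NA, NA),
  value         = c(NA, NA, 72, 120),
  stringsAsFactors = FALSE
)

# Split the rows by resource type, giving a named list of data.frames
by_type <- split(res, res$resourceType)

# Within each resource type, drop columns that are entirely NA
# (a crude, hypothetical substitute for the package's retention rules)
drop_all_na <- function(df) df[, colSums(!is.na(df)) > 0, drop = FALSE]
tables <- lapply(by_type, drop_all_na)

names(tables)
tables$Observation
```

Real entries also contain list columns and nested data.frames produced by `fromJSON`, which this sketch does not attempt to handle.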
Thus, processing the Observation resources with `process_Observation` gives

```{r lkpo}
po1 <- process_Observation(bu$Observation)
dim(po1)
datatable(po1)
```

# Filtering FHIR elements

A list of vectors of field names serves as the basis for filtering JSON elements into records for tabulation.

```{r lksch}
FHIR_retention_schemas()
```

Each Blood Pressure observation includes a "component" element with two entries, one for the systolic and one for the diastolic reading. Special code is therefore required to map the metadata of each Blood Pressure observation to the values of its individual components.

# The resources extracted from a bundle

The `process_*` functions in BiocFHIR handle various resource types. As of version 0.0.15, they are

```{r listt}
# exported objects whose names contain "process_" followed by a capital letter
ls("package:BiocFHIR") |> grep(x=_, "process_[A-Z]", value=TRUE)
```

There is no guarantee that a given bundle will have resources of all these types.

# Accumulating resources across bundles

Because bundles are not guaranteed to contain any particular resource type, we must program defensively to assemble all the information on conditions recorded in the Synthea sample. We find the bundles that possess a "Condition" resource and then combine the resulting tables, which are designed to have a common set of columns.

```{r lkconds}
data("allin", package="BiocFHIR")
# flag bundles that include at least one Condition resource
hascond = vapply(allin, function(x) length(x$Condition) > 0, logical(1))
# process the Conditions of each such bundle and stack the tables row-wise
oo = do.call(rbind, lapply(allin[hascond], function(x) process_Condition(x$Condition)))
dim(oo)
# number of distinct subjects contributing conditions
length(unique(oo$subject.reference))
```

The most commonly reported conditions in the sample are:

```{r mostc}
table(oo$code.coding.display) |> sort() |> tail()
```

# Session information

```{r lksess}
sessionInfo()
```