---
title: "Handling FHIR documents with BiocFHIR"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Handling FHIR documents with BiocFHIR}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    highlight: pygments
    number_sections: yes
    theme: united
    toc: yes
---

```{r setup,results="hide",echo=FALSE}
suppressPackageStartupMessages({
suppressMessages({
library(BiocFHIR)
library(DT)
library(jsonlite)
library(rjsoncons)
})
})
```

# Introduction

This vignette describes how FHIR documents are
handled in BiocFHIR.

The R commands in this text work with R (version 4.2 or greater) in which
BiocFHIR (version 0.0.14 or greater) has been installed.  The source code is
always available at [GitHub](https://github.com/vjcitn/BiocFHIR) and may
be available for installation by other means.

We conclude this vignette with a very brief example of the
use of rjsoncons to interrogate the FHIR JSON documents directly.

# Examining sample data, again

In the "Upper level FHIR concepts" vignette, we used the
following code to get a peek at the information
structure in a single document representing a Bundle
associated with a patient.

```{r takepeek}
tfile = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)
peek = jsonlite::fromJSON(tfile)
names(peek)
peek$resourceType
names(peek$entry)
length(names(peek$entry$resource))
class(peek$entry$resource)
dim(peek$entry$resource)
head(names(peek$entry$resource))
```

# Choosing an approach to FHIR JSON ingestion

Some of the complexity of working with FHIR JSON in R in this way
can be seen in the following:

```{r lkcat}
head(which(vapply(peek$entry$resource$category, 
   function(x)!is.null(x), logical(1))))
peek$entry$resource$category[[6]]
peek$entry$resource$category[[6]]$coding
peek$entry$resource$category[[10]]
```

Elements of `category` can be data.frames or atomic vectors, so the
column cannot be processed uniformly.
This is a consequence of naive use of `jsonlite::fromJSON`.

When we reduce the transformations attempted by `fromJSON`
(here by setting `simplifyVector=FALSE`),
empty fields are not propagated.

```{r lkpeek2}
peek2 = jsonlite::fromJSON(tfile, simplifyVector=FALSE)
lapply(peek2$entry[1:5], function(x) names(x[["resource"]]))
```

Because the JSON ingestion does not attempt to simplify
table-like content, we have a list of lists with varying
depths of nesting.
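
The contrast between the two ingestion modes can be seen on a toy document.
The following is a minimal sketch, not BiocFHIR code: the JSON string and its
field values are invented for illustration, and only `jsonlite::fromJSON` and
its `simplifyVector` argument are taken from the text above.

```r
library(jsonlite)

# Invented two-record document: only the first record has a "category" field
toy = '[{"id": "a", "category": [{"text": "x"}]}, {"id": "b"}]'

# Default simplification: the records become rows of one data.frame,
# so every row gets a "category" slot whether or not the field was present
simplified = fromJSON(toy)
class(simplified)
names(simplified)

# No simplification: each record is a plain list holding only its own fields
raw = fromJSON(toy, simplifyVector = FALSE)
lapply(raw, names)
```

With simplification, the absent field has to be represented somehow in the
second row; without it, the second record simply has no `category` element.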

We can tabulate the resource types as follows:

```{r lktab}
rtyvec = vapply(peek2$entry, function(x) 
   x[["resource"]]$resourceType, character(1))
table(rtyvec)
```

# Working with a specific type

## List-based operations

Let's use `peek2` to extract the
Conditions recorded for the patient.

```{r getcond}
iscond = which(rtyvec == "Condition")
conds = peek2$entry[iscond]
length(conds)
str(conds[[1]])
```

Digging out the data and metadata on the first condition
recorded (Perennial allergic rhinitis) is somewhat
complex in R.  Direct operations on JSON with JMESPath
might be more effective, but we postpone that investigation
to the section on direct querying below.
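
To give a sense of the nesting involved, here is a minimal sketch of
list-based extraction in base R.  The `cond` object is a hand-built stand-in
for one element of `conds`, with invented values; its field names follow the
usual layout of a FHIR Condition resource, which is an assumption about what
a given bundle contains.  `pluck_path` is a hypothetical helper, not part of
BiocFHIR.

```r
# Toy stand-in for one parsed Condition entry; all values are illustrative
cond = list(
  fullUrl = "urn:uuid:example",
  resource = list(
    resourceType = "Condition",
    code = list(
      coding = list(list(system = "http://snomed.info/sct",
                         code = "0000", display = "Example condition")),
      text = "Example condition"),
    onsetDateTime = "2001-01-01T00:00:00Z"))

# Hypothetical helper: follow a path of names or integer positions,
# returning NA as soon as a step is missing
pluck_path = function(x, ...) {
  for (key in list(...)) {
    if (is.null(x)) return(NA_character_)
    if (is.character(key) && !(key %in% names(x))) return(NA_character_)
    if (is.numeric(key) && key > length(x)) return(NA_character_)
    x = x[[key]]
  }
  if (is.null(x)) NA_character_ else x
}

# Five levels of nesting to reach a display string
pluck_path(cond, "resource", "code", "coding", 1, "display")

# A field this record does not carry yields NA rather than an error
pluck_path(cond, "resource", "abatementDateTime")
```

Applying such a helper over all of `conds` with `vapply` would give one
column of a table at a time, which is workable but verbose.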

## Processing with BiocFHIR

In the `process_fhir_bundle` function of BiocFHIR
we allow `jsonlite::fromJSON` to conduct some
simplification of list structures amenable to representation
as tables (data.frames).

```{r chkb}
tbu = process_fhir_bundle(tfile)
tbu
```

For the reports of Conditions, we extract specific
fields that are commonly used in the Synthea examples.
Other bundle sets may use different fields.

```{r lktab2}
ctab = process_Condition(tbu$Condition)
dim(ctab)
datatable(ctab)
```

The fields collected in `process_Condition` are specified
in `FHIR_retention_schemas()`.  Eventually this will
need to become a user-specified element of ingestion and
transformation.
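
One way user-specified retention could look is sketched below.  This is
purely illustrative: `my_schema` and `retain_fields` are hypothetical names,
the field list is an assumption, and none of it reflects the actual structure
returned by `FHIR_retention_schemas()`.

```r
# Hypothetical schema: resource type -> top-level fields to keep
my_schema = list(Condition = c("id", "onsetDateTime", "recordedDate"))

# Keep only the schema fields of one resource (a named list), filling
# absent fields with NA so every record yields the same columns.
# Nested (list-valued) fields are not handled in this sketch and become NA.
retain_fields = function(resource, schema) {
  wanted = schema[[resource$resourceType]]
  vals = lapply(wanted, function(f) {
    v = resource[[f]]
    if (is.null(v) || is.list(v)) NA_character_ else as.character(v)[1]
  })
  as.data.frame(setNames(vals, wanted), stringsAsFactors = FALSE)
}

# Invented resources, standing in for parsed Condition entries
r1 = list(resourceType = "Condition", id = "c1",
          onsetDateTime = "2001-01-01", extra = "dropped")
r2 = list(resourceType = "Condition", id = "c2",
          recordedDate = "2002-02-02")

kept = do.call(rbind, lapply(list(r1, r2), retain_fields, schema = my_schema))
kept
```

Keeping the schema as plain data, separate from the extraction function,
means a user working with non-Synthea bundles would only need to supply a
different field list.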

# Direct querying of FHIR JSON

We've shown how we can operate on FHIR documents from the Synthea
project using specific schemas to select elements from lists
produced by parsing JSON.  The FHIR specification is very
flexible, and the `process_*` methods defined here may
not work for FHIR documents from other sources.

The jsoncons library provides C++ code for parsing and
filtering JSON, and the [rjsoncons](https://CRAN.R-project.org/package=rjsoncons)
package makes it available in R, including support for JMESPath queries.

In this example, we'll take four Synthea FHIR documents and
extract patient addresses to a data.frame.

```{r dojme}
z = make_test_json_set()
myl = lapply(z[1:4], jsonlite::fromJSON) # list that rjsoncons will convert to JSON
library(rjsoncons)
tmp = jmespath(myl, "[*].entry[0].resource.address") |> jsonlite::fromJSON()
do.call(rbind,lapply(tmp, function(x) x[,-(1:2)]))
```

The JMESPath query projects over all documents via the initial `[*]`.
For each document it then retrieves the `address` element from the
`resource` element of the first (`[0]`) entry.  An overview of the
hierarchical structure of
`myl` can be obtained using `listviewer::jsonedit`.
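
The same projection can be tried on a small hand-written document, which
makes it easier to see what each step of the path selects.  This is a
minimal sketch: the JSON below is invented and far simpler than a Synthea
bundle, and the path is extended with `[0].city` to pull out a single field.

```r
library(rjsoncons)
library(jsonlite)

# Invented stand-in for a set of bundles: a JSON array of two documents
docs = '[
  {"entry": [{"resource": {"address": [{"city": "Boston"}]}},
             {"resource": {"resourceType": "Condition"}}]},
  {"entry": [{"resource": {"address": [{"city": "Worcester"}]}}]}
]'

# [*] projects over documents; entry[0] takes the first entry of each;
# the rest of the path descends to the city of the first address
res = jmespath(docs, "[*].entry[0].resource.address[0].city")

# jmespath() returns JSON text, so parse it back into an R vector
cities = fromJSON(res)
cities
```

Because the query is evaluated on the JSON itself, the second entry of the
first document, which has no `address`, never needs to be handled in R.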

# Session information

```{r lksess}
sessionInfo()
```