---
title: "BiocFHIR -- infrastructure for parsing and analyzing FHIR data"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{BiocFHIR -- infrastructure for parsing and analyzing FHIR data}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    highlight: pygments
    number_sections: yes
    theme: united
    toc: yes
---

```{r setup,results="hide",echo=FALSE}
suppressPackageStartupMessages({
suppressMessages({
library(BiocFHIR)
library(DT)
library(jsonlite)
})
})
```

# Introduction

FHIR stands for Fast Healthcare Interoperability Resources.

The [Wikipedia article](https://en.wikipedia.org/wiki/Fast_Healthcare_Interoperability_Resources)
is a useful overview.  The official website is [fhir.org](http://fhir.org).

This R package addresses basic tasks in parsing FHIR R4 documents serialized as JSON.
The overall information model of FHIR documents is complex, and various
decisions are made to help extract and annotate fields presumed to have
high value.  Please submit GitHub issues if important fields are not being
propagated.

Install this package using
```{r dobioc, eval=FALSE}
BiocManager::install("BiocFHIR")
```

## The basic structure of FHIR R4 JSON

We use `jsonlite::fromJSON` to import a randomly selected
FHIR document from a collection simulated by the MITRE Corporation
with Synthea.  See the associated [site](https://synthea.mitre.org/downloads) for details.

After importing the JSON, we drill down through the hierarchy of
elements collected in a FHIR document using some base R commands.
```{r lkd1}
testf = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)
tt = fromJSON(testf)
names(tt)
tt[1:2]
tte = tt$entry
class(tte)
dim(tte)
head(names(tte))
tter = tte$resource
dim(tter)
head(names(tter))
table(tter$resourceType)
```
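The shape of `tte` and `tter` comes from `fromJSON`'s default simplification, which turns arrays of JSON objects into data.frames, nesting them where objects are nested.  The chunk below is a minimal sketch on a small, hypothetical bundle (not Synthea data) contrasting that with `simplifyVector = FALSE`, where each entry stays a plain R list and resource types can be tabulated by iterating over entries.

```{r sketch-simplify}
library(jsonlite)

# A tiny, hand-made FHIR R4 Bundle (hypothetical data, not from Synthea)
txt <- '{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Condition", "id": "c1"}},
    {"resource": {"resourceType": "Condition", "id": "c2"}}
  ]
}'

# Without simplification, each entry stays a nested list
raw <- jsonlite::fromJSON(txt, simplifyVector = FALSE)
types <- vapply(raw$entry, function(e) e$resource$resourceType, character(1))
table(types)

# With the default simplification, entry becomes a data.frame
# whose resource column is itself a nested data.frame
simp <- jsonlite::fromJSON(txt)
class(simp$entry)
class(simp$entry$resource)
```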

It is by filtering the data frame `tter` that we acquire
information that may be useful in data analysis.  The
data frame is sparse: many fields are missing (`NA`) in many records,
because different resource types use different fields.
Code in this package attempts to produce useful tables
from this sparse information.
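As an illustration of such filtering, the chunk below is a minimal sketch on a hypothetical toy bundle: it keeps the Condition rows and then drops columns that are entirely missing in that subset.  It uses only base R and `jsonlite`, and is not the code used inside BiocFHIR.

```{r sketch-filter}
library(jsonlite)

# Hypothetical toy bundle; fields vary across resource types
txt <- '{"resourceType": "Bundle", "entry": [
  {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female"}},
  {"resource": {"resourceType": "Condition", "id": "c1",
                "onsetDateTime": "2010-05-01T00:00:00Z"}},
  {"resource": {"resourceType": "Condition", "id": "c2",
                "onsetDateTime": "2015-02-11T00:00:00Z"}}
]}'
res <- jsonlite::fromJSON(txt)$entry$resource

# Keep only the Condition records
cond <- res[res$resourceType == "Condition", , drop = FALSE]

# Drop columns that are entirely NA within this subset (here: gender)
keep <- vapply(cond, function(col) !all(is.na(col)), logical(1))
cond <- cond[, keep, drop = FALSE]
names(cond)
```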

As a prologue to table extraction, we decompose the bundle by
resource type using `process_fhir_bundle`, which takes the path
to the JSON file.

```{r dobu1}
bu1 = process_fhir_bundle(testf) # just give file path
bu1
```
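A decomposition by resource type can be pictured with base R's `split`.  This is a minimal sketch on a hypothetical toy bundle and may differ from what `process_fhir_bundle` does internally; it only shows the idea of one data.frame per resource type.

```{r sketch-split}
library(jsonlite)

# Hypothetical toy bundle with two resource types
txt <- '{"resourceType": "Bundle", "entry": [
  {"resource": {"resourceType": "Patient", "id": "p1"}},
  {"resource": {"resourceType": "Condition", "id": "c1"}},
  {"resource": {"resourceType": "Condition", "id": "c2"}}
]}'
res <- jsonlite::fromJSON(txt)$entry$resource

# One data.frame per resource type, named by type (sorted alphabetically)
by_type <- split(res, res$resourceType)
names(by_type)
sapply(by_type, nrow)
```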

`bu1` is a list of data.frames, one for each FHIR resource type
present in the document, but with considerable nesting of
data.frames and lists within these data.frames.
"Flattening" such structures is not fully automatic.
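Part of the flattening can be done with `jsonlite::flatten`, which expands nested data.frame columns into columns named with a `.` separator.  The sketch below, on hypothetical Condition records, shows its limit: columns holding JSON arrays (here `coding`) remain list columns and still need case-by-case handling.

```{r sketch-flatten}
library(jsonlite)

# Hypothetical Condition resources with a nested "code" object
txt <- '[
  {"resourceType": "Condition", "id": "c1",
   "code": {"coding": [{"system": "http://snomed.info/sct", "code": "10509002"}],
            "text": "Acute bronchitis"}},
  {"resourceType": "Condition", "id": "c2",
   "code": {"coding": [{"system": "http://snomed.info/sct", "code": "36971009"}],
            "text": "Sinusitis"}}
]'
cond <- jsonlite::fromJSON(txt)
class(cond$code)          # nested data.frame column

flat <- jsonlite::flatten(cond)
names(flat)               # includes code.text and code.coding
class(flat$code.coding)   # the array is still a list column
```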

## Example: a table of Conditions recorded for the patient

We use `process_Condition` to extract information.
```{r dopro1}
cond1 = process_Condition(bu1$Condition)
datatable(cond1)
```
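To show the kind of work involved in building such a table, the chunk below is a hand-rolled sketch, not the implementation of `process_Condition`.  It reads hypothetical Condition records as plain lists and extracts a few fields, using a hypothetical helper `get_field` that returns `NA` when a field is absent.

```{r sketch-condtab}
library(jsonlite)

# Hypothetical helper: follow a path of field names, returning NA if absent
get_field <- function(x, path) {
  for (p in path) {
    if (!is.list(x) || is.null(x[[p]])) return(NA_character_)
    x <- x[[p]]
  }
  as.character(x)   # assumes the path ends at a scalar leaf
}

# Hypothetical Condition records; the second has no abatementDateTime
txt <- '[
  {"resourceType": "Condition", "id": "c1",
   "code": {"text": "Acute bronchitis"},
   "onsetDateTime": "2010-05-01T00:00:00Z",
   "abatementDateTime": "2010-05-20T00:00:00Z"},
  {"resourceType": "Condition", "id": "c2",
   "code": {"text": "Hypertension"},
   "onsetDateTime": "2015-02-11T00:00:00Z"}
]'
conds <- jsonlite::fromJSON(txt, simplifyVector = FALSE)

tab <- data.frame(
  id        = vapply(conds, get_field, character(1), path = "id"),
  condition = vapply(conds, get_field, character(1), path = c("code", "text")),
  onset     = vapply(conds, get_field, character(1), path = "onsetDateTime"),
  abatement = vapply(conds, get_field, character(1), path = "abatementDateTime")
)
tab
```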

## A family of documents

We have collected 50 documents from the Synthea resource,
obtained by random draws from the 1180 records provided.
`make_test_json_set` places them in a temporary folder and
returns their file paths:

```{r doextr}
tset = make_test_json_set()
tset[1]
```

We process the first ten documents with `process_fhir_bundle`, collecting the results in a list.
```{r getalli}
myl = lapply(tset[1:10], process_fhir_bundle)
myl[1:2]
sapply(myl,length)
```
The last command shows that documents can differ in the number
of components (resource types) present.
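To see which components differ, one could build a presence matrix of resource types by document.  This is a minimal sketch assuming, as `bu1$Condition` suggests, that each processed bundle is a list of data.frames named by resource type; `bundles` is a small hypothetical stand-in for `myl`.

```{r sketch-presence}
# Hypothetical stand-in for `myl`: named lists of data.frames
bundles <- list(
  a = list(Patient = data.frame(id = "p1"), Condition = data.frame(id = "c1")),
  b = list(Patient = data.frame(id = "p2"), Encounter = data.frame(id = "e1"),
           Observation = data.frame(id = "o1"))
)

# All resource types seen in any document
all_types <- sort(unique(unlist(lapply(bundles, names))))

# Rows: documents; columns: resource types; TRUE if present
presence <- t(sapply(bundles, function(b) all_types %in% names(b)))
colnames(presence) <- all_types
presence
colSums(presence)   # number of documents containing each type
```

Applied to the real data, `bundles` would be replaced by `myl`.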

# Session information

```{r lksess}
sessionInfo()
```