#' start plumber api
#' @param host the host address
#' @param port the port to use
#' @importFrom plumber pr pr_run pr_get
#' @export
run_api <- function(port = 5000, host = "127.0.0.1") {
  path <- system.file("plumber.r", package = "plumb.ex")
  path |>
    pr() |>
    pr_run(port = port, host = host)
}
Over the past couple of weeks, I’ve been sharing some code that I use for dynamic filtering and aggregation of data frames in R. The idea of these functions was to provide single-step methods for complex filter or aggregate queries. Each function works with a list of instructions, specifically formatted for the task. I adopted this approach to make the functions easy to use in web applications or APIs: if the filter or aggregate query can be expressed as a list, it can easily be converted to JSON and used in API calls.
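For example, a filter instruction list (the exact fields come from the previous posts) converts cleanly to JSON; a minimal sketch:

library(jsonlite)

# one filter instruction, formatted as in the previous posts
instr <- list(list(col = "Species", fun = "in", val = c("setosa", "versicolor")))
toJSON(instr, auto_unbox = TRUE)
#> [{"col":"Species","fun":"in","val":["setosa","versicolor"]}]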
In this post, I am going to demo how we can integrate the functions from the previous two posts into a {plumber} API and, after deployment, have /filter and /aggregate endpoints available as a “service”.
Plumber API for filter and aggregate
The {plumber} API is organized in the form of an R package.
├── DESCRIPTION
├── inst
│ └── plumber.r
├── man
│ ├── dot-aggregate.Rd
│ ├── dot-filter.Rd
│ └── run_api.Rd
├── NAMESPACE
├── plumb.ex.Rproj
├── R
│ ├── api.r
│ └── funs.r
└── test.r
The R directory contains the script funs.r, where I define the .filter and .aggregate functions I covered in the previous two posts. The api.r script contains a single function that starts the API on our preferred host and port (defaulting to localhost and port 5000).
This is similar to {golem}’s run_app() function, and it streamlines the development cycle. Whenever I make updates to the underlying code in funs.r or the {plumber} endpoints in inst/plumber.r, I can run:
devtools::document() # re-documents and loads all functions
run_api()
and that starts the API, making it accessible at http://127.0.0.1:5000, with the {swagger} interface for manual testing at http://127.0.0.1:5000/__docs__/. It is then ready for queries.
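To open the documentation page directly from R:

browseURL("http://127.0.0.1:5000/__docs__/")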
Endpoints
The package exports the .filter and .aggregate functions, and the API uses them when defining the endpoints. In this example API, we have three endpoints defined in inst/plumber.r:
# plumber.R
#* @plumber
#* @apiTitle Plumber filter or aggregate
#* @apiDescription Plumber example for dynamic filtering and aggregation
#* @apiVersion 1.0.0
function(pr) {
  pr |>
    plumber::pr_set_serializer(plumber::serializer_unboxed_json())
}
#* Hello
#* @get /hello
function() {
"Hello, there!"
}
#* Filter
#* @param data the name of the data frame to filter, `"iris"` or `"mtcars"`
#* @param instructions_list a list of instructions for filtering.
#* @post /filter
function(data, instructions_list) {
  d <- get(data)
  l <- jsonlite::fromJSON(instructions_list, simplifyVector = FALSE)
  .filter(d, l)
}
#* Aggregate
#* @param data the name of the data frame to aggregate, `"iris"` or `"mtcars"`
#* @param instructions_list a list of instructions for aggregation.
#* @post /aggregate
function(data, instructions_list) {
  d <- get(data)
  l <- jsonlite::fromJSON(instructions_list, simplifyVector = FALSE)
  .aggregate(d, l)
}
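The @plumber block sets an unboxed JSON serializer for all endpoints, so length-1 vectors are returned as JSON scalars rather than single-element arrays. A minimal sketch of the difference, using {jsonlite} directly:

library(jsonlite)

# default (boxed): length-1 vectors become JSON arrays
toJSON(list(msg = "Hello, there!"))
#> {"msg":["Hello, there!"]}

# unboxed: length-1 vectors become JSON scalars
toJSON(list(msg = "Hello, there!"), auto_unbox = TRUE)
#> {"msg":"Hello, there!"}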
The first is /hello, which is simply a health-check endpoint. Using httr2, we can make requests to it as follows:
library(jsonlite)
library(httr2)

request("http://127.0.0.1:5000/hello") |>
  req_perform() |>
  resp_body_json()
[1] "Hello, there!"
The other two endpoints are simple wrappers around .filter and .aggregate that perform two steps (see the sketch after this list):

1. Get the dataset requested by the user. For simplicity in this example, the data argument is passed as a string, and we use get() to retrieve that dataset, assuming it is one of the ones prepackaged with R. But it is easy to modify this and simply send JSON data to the endpoint.
2. Convert the JSON received by the API into an R list. This is needed because the instructions list for filtering or aggregation is sent with the request as JSON, while the validation within our functions expects a list. One could relax this requirement as well, by allowing a JSON string in the instructions_list argument and parsing it internally.
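To see what an endpoint works with, here is a minimal sketch of those two steps run locally, outside the API (the example JSON string is an assumption for illustration):

library(jsonlite)

# step 1: resolve the dataset name to one of R's built-in data frames
d <- get("iris")

# step 2: parse the JSON instructions into a nested list;
# simplifyVector = FALSE keeps it as the list our functions expect
json <- '[{"col": "Species", "fun": "in", "val": ["setosa"]}]'
l <- fromJSON(json, simplifyVector = FALSE)
str(l)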
Interactive demo
Start the API
During development, we would run:
devtools::document() # re-documents and loads all functions
run_api()
If the plumb.ex package is installed, we would instead run:

library(plumb.ex) # loads the installed package
plumb.ex::run_api(port = 6000)
If it is deployed (for one method, see this post), we use the code below to make requests.
Filter
From R and {shiny}, we can call the /filter endpoint using the following steps. First, we create our filter instructions, convert them to JSON, and finally use httr or httr2 to make a request to the API:
library(jsonlite)
library(httr2)

filter_data <- list(
  list(col = "Sepal.Length", fun = "between", min = 4.9, max = 5),
  list(col = "Species", fun = "in", val = c("setosa", "versicolor"))
)
filter_json <- jsonlite::toJSON(filter_data, auto_unbox = TRUE)
jsonlite::prettify(filter_json)
[
{
"col": "Sepal.Length",
"fun": "between",
"min": 4.9,
"max": 5
},
{
"col": "Species",
"fun": "in",
"val": [
"setosa",
"versicolor"
]
}
]
request("http://127.0.0.1:6000") |>
  req_url_path("filter") |>
  req_body_json(
    data = list(data = "iris", instructions_list = filter_json),
    auto_unbox = TRUE
  ) |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 5.0 3.6 1.4 0.2 setosa
3 5.0 3.4 1.5 0.2 setosa
4 4.9 3.1 1.5 0.1 setosa
5 5.0 3.0 1.6 0.2 setosa
6 5.0 3.4 1.6 0.4 setosa
7 4.9 3.1 1.5 0.2 setosa
8 5.0 3.2 1.2 0.2 setosa
9 4.9 3.6 1.4 0.1 setosa
10 5.0 3.5 1.3 0.3 setosa
11 5.0 3.5 1.6 0.6 setosa
12 5.0 3.3 1.4 0.2 setosa
13 4.9 2.4 3.3 1.0 versicolor
14 5.0 2.0 3.5 1.0 versicolor
15 5.0 2.3 3.3 1.0 versicolor
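For repeated calls from R or a {shiny} app, the request can be wrapped in a small helper; call_filter() below is a hypothetical convenience function, not part of the package, and the API URL is whatever host and port the API was started on:

# hypothetical wrapper around the /filter endpoint
call_filter <- function(api_url, data, instructions) {
  request(api_url) |>
    req_url_path("filter") |>
    req_body_json(
      data = list(
        data = data,
        instructions_list = jsonlite::toJSON(instructions, auto_unbox = TRUE)
      ),
      auto_unbox = TRUE
    ) |>
    req_perform() |>
    resp_body_json(simplifyVector = TRUE)
}

call_filter("http://127.0.0.1:6000", "iris", filter_data)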
Aggregate
Similarly, for the /aggregate endpoint:
library(jsonlite)
library(httr2)

aggregate_instructions <- list(
  groups = list(col = "Species"),
  aggregates = list(
    list(col = "Sepal.Length", fun = "mean"),
    list(col = "Sepal.Width", fun = "mean")
  )
)
aggregate_json <- jsonlite::toJSON(aggregate_instructions, auto_unbox = TRUE)
jsonlite::prettify(aggregate_json)
{
"groups": {
"col": "Species"
},
"aggregates": [
{
"col": "Sepal.Length",
"fun": "mean"
},
{
"col": "Sepal.Width",
"fun": "mean"
}
]
}
request("http://127.0.0.1:6000") |>
  req_url_path("aggregate") |>
  req_body_json(
    data = list(data = "iris", instructions_list = aggregate_json),
    auto_unbox = TRUE
  ) |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE)
Species Sepal.Length_mean Sepal.Width_mean
1 setosa 5.01 3.43
2 versicolor 5.94 2.77
3 virginica 6.59 2.97
How is this useful?
I struggle with this sometimes myself. Isn’t this why we have databases to begin with? Of course, having an API that queries a database would be preferable in most applications requiring large amounts of data. However, sometimes we don’t have the expertise, bandwidth, or budget for a database setup, or we could be working with local files in arrow, parquet, qs, or some other format, where it might make sense or be justifiable to run the queries in R itself.
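For instance, the get() lookup in the endpoints could be swapped for reading a local file. A sketch, assuming {arrow} is installed and the datasets ship as parquet files in a data/ directory (both assumptions of this example):

# hypothetical replacement for `d <- get(data)` inside an endpoint
get_dataset <- function(name) {
  path <- file.path("data", paste0(name, ".parquet"))
  if (!file.exists(path)) stop("unknown dataset: ", name)
  arrow::read_parquet(path)
}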
Summary
In this post, we demonstrated how to create a {plumber} API that provides endpoints for dynamic filtering and aggregation of datasets in R. We structured the API as an R package, defined the necessary functions, and set up the endpoints to handle JSON requests. We also provided examples of how to interact with the API using httr2 from R. This approach can be useful in scenarios where a database setup is not feasible and we need to work with data files directly in R.