Use {r2lambda} to download Tidytuesday dataset
In this exercise, we’ll create an AWS Lambda function that downloads the tidytuesday data set for the most recent Tuesday (or most recent Tuesday from a date of interest).
Required packages
library(r2lambda)
library(jsonlite)
library(magrittr)

Runtime function
The first step is to write the runtime function. This is the function that will be executed when we invoke the Lambda function after it has been deployed. To download the Tidytuesday data set, we will use the {tidytuesdayR} package. In the runtime script, we define a function called tidytuesday_lambda that takes one optional argument, date. If date is omitted, the function returns the data set(s) for the most recent Tuesday; otherwise, it looks up the most recent Tuesday from the date of interest and returns the corresponding data set(s).
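To make the date handling concrete, here is a minimal base-R sketch of what "most recent Tuesday from a date of interest" means. It is an illustration only: last_tuesday_sketch is a hypothetical helper, not how {tidytuesdayR} implements last_tuesday. It ignores time zones and assumes that a Tuesday counts as its own most recent Tuesday.

```r
# Hypothetical helper, for illustration only (not part of {tidytuesdayR})
last_tuesday_sketch <- function(date = Sys.Date()) {
  date <- as.Date(date)
  # "%u" is the ISO weekday number: 1 = Monday, 2 = Tuesday, ..., 7 = Sunday
  weekday <- as.integer(format(date, "%u"))
  # Days elapsed since the last Tuesday (0 when the date is a Tuesday)
  days_since_tuesday <- (weekday - 2) %% 7
  date - days_since_tuesday
}

# 2022-02-02 was a Wednesday, so this gives 2022-02-01
last_tuesday_sketch("2022-02-02")
```

The runtime function itself relies on tidytuesdayR::last_tuesday for this step: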
library(tidytuesdayR)
tidytuesday_lambda <- function(date = NULL) {
  # Default to today when no date is supplied
  if (is.null(date))
    date <- Sys.Date()
  # Find the most recent Tuesday and load that week's data set(s)
  most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
  tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)
  # Return the data sets as a plain (unnamed) list
  data_names <- names(tt_data)
  data_list <- lapply(data_names, function(x) tt_data[[x]])
  return(data_list)
}
tidytuesday_lambda("2022-02-02")

R script to build the lambda
To build the lambda image, we need an R script that sources any required code, loads any needed libraries, defines a runtime function, and ends with a call to lambdr::start_lambda(). The runtime function does not have to be defined in this file. We could, for example, source another script, or load a package and set a loaded function as the runtime function in the subsequent call to r2lambda::build_lambda (see below). We save this script to a file and record the path:
r_code <- "
  library(tidytuesdayR)
  tidytuesday_lambda <- function(date = NULL) {
    if (is.null(date))
      date <- Sys.Date()
    most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
    tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)
    data_names <- names(tt_data)
    data_list <- lapply(data_names, function(x) tt_data[[x]])
    return(data_list)
  }
  lambdr::start_lambda()
"
tmpfile <- tempfile(pattern = "ttlambda_", fileext = ".R")
write(x = r_code, file = tmpfile)

Build, test, and deploy the lambda function
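Before building, it can be worth confirming that the script on disk has the shape described above: it defines the runtime function and ends with a call to lambdr::start_lambda(). The sketch below is one hedged way to do that with base R only. check_runtime_script is a hypothetical helper, not part of {r2lambda}, and the demo script is a stand-in so the example runs without {tidytuesdayR} or {lambdr} installed (parse() reads the code without evaluating it).

```r
# Hypothetical helper, for illustration only (not part of {r2lambda})
check_runtime_script <- function(path, runtime_function) {
  exprs <- parse(file = path)  # stops with an error on invalid R syntax
  stopifnot(length(exprs) > 0)

  # Which top-level expressions are assignments made with `<-` or `=`?
  is_assign <- vapply(
    exprs,
    function(e) is.call(e) && as.character(e[[1]])[1] %in% c("<-", "="),
    logical(1)
  )
  # Names on the left-hand side of those assignments
  assigned <- vapply(
    exprs[is_assign],
    function(e) as.character(e[[2]])[1],
    character(1)
  )

  # Text of the last top-level expression in the script
  last_call <- paste(deparse(exprs[[length(exprs)]]), collapse = "")

  list(
    defines_runtime_function = runtime_function %in% assigned,
    ends_with_start_lambda = identical(last_call, "lambdr::start_lambda()")
  )
}

# Stand-in script so the example is self-contained
demo_path <- tempfile(fileext = ".R")
writeLines(c(
  "hello_lambda <- function(name = 'world') paste('hello', name)",
  "lambdr::start_lambda()"
), demo_path)

checks <- check_runtime_script(demo_path, "hello_lambda")
str(checks)
```

For the script written in this post, the equivalent call would be check_runtime_script(tmpfile, "tidytuesday_lambda").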
1. Build
- We set the runtime_function argument to the name of the function we wish the docker container to run when invoked. In this case, this is tidytuesday_lambda. This adds a CMD instruction to the Dockerfile.
- We set the runtime_path argument to the path where we stored the script defining our runtime function.
- We set the dependencies argument to c("tidytuesdayR") because we need to have the tidytuesdayR package installed within the docker container if we are to download the dataset. This step adds a RUN instruction to the Dockerfile that calls install.packages to install {tidytuesdayR} from CRAN.
- Finally, the tag argument sets the name of our Lambda function, which we’ll use later to test and invoke the function. The tag argument also becomes the name of the folder that {r2lambda} will create to build the image. This folder will have two files, Dockerfile and runtime.R. runtime.R is our script from runtime_path, renamed before it is copied into the docker image with a COPY instruction.
runtime_function <- "tidytuesday_lambda"
runtime_path <- tmpfile
dependencies <- "tidytuesdayR"
r2lambda::build_lambda(
  tag = "tidytuesday3",
  runtime_function = runtime_function,
  runtime_path = runtime_path,
  dependencies = dependencies
)

2. Test
To make sure our Lambda docker container works as intended, we start it locally and invoke it to test the response:
response <- r2lambda::test_lambda(tag = "tidytuesday3", payload = list(date = Sys.Date()))
The response is a list of three elements:
- status, which should be 0 if the test worked,
- stdout, the standard output stream of the invocation, and
- stderr, the standard error stream of the invocation.
stdout and stderr are raw vectors that we need to parse, for example:
rawToChar(response$stdout)
If the stdout slot of the response returns the correct output of our function, we are good to deploy to AWS.
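The check on status and the parsing of stdout can be wrapped in a small helper. The following is a sketch under stated assumptions: parse_test_response is a hypothetical function, not part of {r2lambda}, and it assumes that stdout contains only the JSON returned by the runtime function. The mock response mimics the three-element list described above so that the example runs without docker.

```r
library(jsonlite)

# Hypothetical helper, for illustration only (not part of {r2lambda})
parse_test_response <- function(response) {
  # A non-zero status means the local invocation failed; surface stderr
  if (!isTRUE(response$status == 0)) {
    stop("Local invocation failed: ", rawToChar(response$stderr))
  }
  # Assumption: stdout holds nothing but the JSON produced by the function
  jsonlite::fromJSON(rawToChar(response$stdout))
}

# Mock response with the same shape as the list described above
mock_response <- list(
  status = 0L,
  stdout = charToRaw('[{"week": 5, "dataset": "example"}]'),
  stderr = raw(0)
)

parsed <- parse_test_response(mock_response)
str(parsed)
```

If the real stdout carries anything besides the JSON body, the parsing step would need to extract the JSON first.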
3. Deploy
The deployment step is simple: all we need to do is specify the name (tag) of the Lambda function we wish to push to AWS ECR. The deploy_lambda function also accepts ..., which are named arguments ultimately passed on to paws.compute:::lambda_create_function. This is the function that calls the Lambda API. To see all available arguments, run ?paws.compute:::lambda_create_function.
The most important arguments are probably Timeout and MemorySize, which set the time our function will be allowed to run (in seconds) and the amount of memory it will have available (in MB). In many cases it will make sense to increase the defaults of 3 seconds and 128 MB.
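As a hedged sketch, the two settings can be collected and range-checked before they are handed to deploy_lambda. lambda_settings is a hypothetical helper, not part of {r2lambda}. The bounds it checks (1 to 900 seconds, 128 to 10240 MB) are the AWS Lambda limits as I understand them at the time of writing and may change. The actual deployment call is left commented out because it needs AWS credentials.

```r
# Hypothetical helper, for illustration only (not part of {r2lambda})
lambda_settings <- function(timeout = 3, memory_size = 128) {
  stopifnot(
    is.numeric(timeout), timeout >= 1, timeout <= 900,
    is.numeric(memory_size), memory_size >= 128, memory_size <= 10240
  )
  # Names match the Lambda API arguments mentioned in the text
  list(Timeout = as.integer(timeout), MemorySize = as.integer(memory_size))
}

settings <- lambda_settings(timeout = 30, memory_size = 512)
str(settings)

# Requires AWS credentials, so it is not run here:
# do.call(r2lambda::deploy_lambda, c(list(tag = "tidytuesday3"), settings))
```

The deployment in this post only raises the timeout and keeps the default memory: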
r2lambda::deploy_lambda(tag = "tidytuesday3", Timeout = 30)

4. Invoke
If all goes well, our function should now be available on the cloud awaiting requests. We can invoke it from R using invoke_lambda. The arguments are:
- function_name – the name of the function
- invocation_type – typically RequestResponse
- include_logs – whether to print the logs of the run on the console
- payload – a named list with arguments sent to the runtime_function. In this case, the runtime function, tidytuesday_lambda, has a single argument, date, so the corresponding list is list(date = Sys.Date()). As our function can be called without any argument, we can also send an empty list as the payload.
response <- r2lambda::invoke_lambda(
  function_name = "tidytuesday3",
  invocation_type = "RequestResponse",
  payload = list(),
  include_logs = TRUE
)
Just like in the local test, the response payload comes as a raw vector that needs to be parsed into a data.frame:
tidytuesday_dataset <- response$Payload %>%
  rawToChar() %>%
  jsonlite::fromJSON(simplifyDataFrame = TRUE)
tidytuesday_dataset[[1]][1:5, 1:5]

Summary
In this post, we went over some details about:
- how to prepare an R script before deploying it as a Lambda function,
- what roles several of the key arguments play,
- how to request a longer timeout or more memory for a Lambda function, and
- how to parse the response payload returned by the Lambda function.
Stay tuned for a follow-up post where we set this Lambda function to run on a weekly schedule!