Deploy an R script as an AWS Lambda function without leaving the R console
Overview and motivation
AWS Lambda is a serverless compute service that lets us deploy any code as a cloud
function without worrying about setting up a compute server (for example like discussed
here
and here).
The code can be deployed as a zip archive or a container image. In the R
case,
only the container option is available and enabled by the {lambdr}
package (see also this workflow).
The deployment using {lambdr}
procedure involves:
- writing an
R
script to serve as the runtime function - creating a Lambda Dockerfile and docker image locally
- pushing the docker image to AWS Elastic Container Registry (ECR)
- creating a Lambda function using the ECR image either with the AWS
command-line interface (
cli
) or in the web console
Going through this procedure is a rewarding experience, as it teaches many things
related to docker containers, AWS setup, using the AWS cli
, etc. But,
once you have gone through the this procedure a few times, it becomes a bit
cumbersome and time-consuming to navigate between an R
console, the shell
, and
aws cli
or the AWS web console in a browser to write and update the code,
re-create the docker image, push, and then re-create the Lambda function. It would be
nice to have a “one-call” solution, where we call a single function, point it to a
file that contains the code we wish to deploy, and sit back while our R
session
goes through the steps.
In this post, I will introduce an R
package, called {r2lambda}
, that aims to make
it easier to deploy R
code as AWS Lambda function by automating the above procedure.
{r2lambda}
Once everything is set up (see below), {r2lambda} will let you do the following:
- Write your
R
code:
library("x")
library("y")
my_fun <- function(arg1) {
# stuff happening to `arg`
}
lambdr::start_lambda()
and save to a file (e.g., ‘my_lambda.R’).
- Call
r2lambda::deploy_lambda()
to create the AWS Lambda function in one line of code:
deploy_lambda(
tag = "my-lambda",
runtime_function = "my_fun",
runtime_path = "path/to/script/my_lambda.R",
dependencies = c("x", "y")
)
Where,
tag
becomes the name of the docker image and lambda functionruntime_function
is the function we want the docker/lambda to run when invokedruntime_path
is the path to the scriptdependencies
is a character vector of dependencies to install in the docker image
This step usually takes a few minutes, because we are pushing a potentially large docker
image to the cloud.
- Test your Lambda using
r2lambda::invoke_lambda()
:
invoke_lambda(
function_name = "my-lambda",
payload = list(arg1 = 1),
invocation_type = "RequestResponse"
)
Where,
function_name
is the same as thetag
argument ofdeploy_lambda
payload
is a named list of arguments that the runtime functionmy_fun
takesinvocation_type
is the type of invocation (can also beDryRun
andEvent
)
The named list in the payload is converted to json
internally before sending the request.
That’s it! That is all you need to do to deploy a basic Lambda function from
your R
script using {r2lambda}
. I emphasized ‘basic’, because we don’t yet have
ways to customize the lambda environment (API gateway, events, granting access to
other services, etc). But some of this functionality is already planned and hopefully
coming soon.
Setup
With the main usage out of the way, lets go over some of the implementation details, environment setup, and installation.
System dependencies
The only system dependency is docker
, because R
lambdas are docker
images,
so we are not going anywhere unless docker is installed.
R
dependencies
The core R
dependencies are {lambdr}
,
{stevedore}
, and {paws}
.
{lambdr}
provides the R
runtime for AWS Lambda. In practice, the most important
points about using {lambdr}
are 1) to setup the Dockerfile:
- installation of system dependencies
- installation of R dependencies of the runtime function
- setting the runtime function to be run by the container via
CMD
,
and 2) to setup the R
script by adding lambdr::start_lambda()
at the bottom.
{stevedore}
is a docker
client for R
. It provides an interface to the docker
API.
In the context of deploiyng Lambda functions, it is used to list and tag local images,
and to login and push images to the AWS ECR repository. Using {stevedore}
simplifies
how {r2lambda}
works in two ways. First, we don’t need to use system calls to run
{docker}
commands, and second, we don’t depend on the aws cli
.
{paws}
is the R
software development kit for AWS. {r2lambda}
uses {paws}
to connect to the AWS services using your credentials, to create execution roles
for the Lambda, to create the Lambda it self, and to invoke the Lambda function.
The remaining R
dependencies provide features for input validation and testing
({checkmate}
), logging ({logger}
), and text interpolation ({glue}
).
Environmental variables for AWS configuration
Our AWS credentials are needed so that functions that use the paws
SDK can
authenticate with AWS
. This is a simple .Renviron
:
ACCESS_KEY_ID = "YOUR AWS ACCESS KEY ID"
SECRET_ACCESS_KEY = "YOUR AWS SECRET ACCESS KEY"
PROFILE = "YOUR AWS PROFILE"
REGION = "YOUR AWS REGION"
Installation
Once the prerequisites are ready, we can install {r2lambda}
from GitHub
using {remotes}
:
# install_packages("remotes")
remotes::install_github("discindo", "r2lambda")
Demo run with logs
A full deployment and invocation run with a demo runtime script should look like the code and output below.
> runtime_function <- "parity"
> runtime_path <- system.file("parity.R", package = "r2lambda")
> dependencies <- NULL
>
> deploy_lambda(
+ tag = "parity-test1",
+ runtime_function = runtime_function,
+ runtime_path = runtime_path,
+ dependencies = dependencies
+ )
INFO [2023-01-29 20:32:41] [deploy_lambda] Checking system dependencies (`aws cli`, `docker`).
/usr/bin/docker
INFO [2023-01-29 20:32:41] [deploy_lambda] Creating temporary working directory.
INFO [2023-01-29 20:32:41] [deploy_lambda] Creating Dockerfile.
WARN [2023-01-29 20:32:41] [deploy_lambda] Created Dockerfile and lambda runtime script in temporary folder.
INFO [2023-01-29 20:32:41] [deploy_lambda] Building Docker image.
Sending build context to Docker daemon 3.584kB
Step 1/13 : FROM public.ecr.aws/lambda/provided
---> ccae8d728af2
Step 2/13 : ENV R_VERSION=4.0.3
---> Using cache
---> bf3dd3c804f3
Step 3/13 : RUN yum -y install wget git tar
---> Using cache
---> 8b82b80771cf
Step 4/13 : RUN yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && wget https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm && yum -y install R-${R_VERSION}-1-1.x86_64.rpm && rm R-${R_VERSION}-1-1.x86_64.rpm
---> Using cache
---> c98bc560eff4
Step 5/13 : ENV PATH="${PATH}:/opt/R/${R_VERSION}/bin/"
---> Using cache
---> 4565b8100c39
Step 6/13 : RUN yum -y install openssl-devel
---> Using cache
---> d8e46fe52a6d
Step 7/13 : RUN Rscript -e "install.packages(c('httr', 'jsonlite', 'logger', 'remotes'), repos = 'https://packagemanager.rstudio.com/all/__linux__/centos7/latest')"
---> Using cache
---> 46ca4b6e95a0
Step 8/13 : RUN Rscript -e "remotes::install_github('mdneuzerling/lambdr')"
---> Using cache
---> 67283a940985
Step 9/13 : RUN mkdir /lambda
---> Using cache
---> d6762390f9a9
Step 10/13 : COPY runtime.R /lambda
---> Using cache
---> 94af1e345ecc
Step 11/13 : RUN chmod 755 -R /lambda
---> Using cache
---> cd15870ad843
Step 12/13 : RUN printf '#!/bin/sh\ncd /lambda\nRscript runtime.R' > /var/runtime/bootstrap && chmod +x /var/runtime/bootstrap
---> Using cache
---> 66d74d4de62e
Step 13/13 : CMD ["parity"]
---> Using cache
---> e47d4fea17a1
Successfully built e47d4fea17a1
Successfully tagged parity-test1:latest
WARN [2023-01-29 20:32:41] [deploy_lambda] Docker image built. This can take up substantial amount of disk space.
WARN [2023-01-29 20:32:41] [deploy_lambda] Use `docker image ls` in your shell to see the image size.
WARN [2023-01-29 20:32:41] [deploy_lambda] Use `docker rmi <image>` in your shell to remove an image.
INFO [2023-01-29 20:32:41] [deploy_lambda] Pushing Docker image to AWS ECR.
... [truncated]
Login Succeeded
The push refers to repository [*.dkr.ecr.us-east-1.amazonaws.com/parity-test1]
... [truncated]
latest: digest: sha256:9f38150cf89bf6a3f7d95c853105afe82616f85f3afbb65d7f71d2f1400dedeb size: 3045
WARN [2023-01-29 20:45:25] [deploy_lambda] Docker image pushed to ECR. This can take up substantial resources and incur cost.
WARN [2023-01-29 20:45:25] [deploy_lambda] Use `paws::ecr()`, the AWS CLI, or the AWS console to manage your images.
INFO [2023-01-29 20:45:25] [deploy_lambda] Creating Lambda role and basic policy.
WARN [2023-01-29 20:45:26] [deploy_lambda] Created AWS role with basic lambda execution permissions.
WARN [2023-01-29 20:45:26] [deploy_lambda] Use `paws::iam()`, the AWS CLI, or the AWS console to manage your roles, and permissions.
INFO [2023-01-29 20:45:36] [deploy_lambda] Creating Lambda function from image.
WARN [2023-01-29 20:45:37] [deploy_lambda] Lambda function created. This can take up substantial resources and incur cost.
WARN [2023-01-29 20:45:37] [deploy_lambda] Use `paws::lambda()`, the AWS CLI, or the AWS console to manage your functions.
WARN [2023-01-29 20:45:37] [deploy_lambda] Lambda function created successfully.
WARN [2023-01-29 20:45:37] [deploy_lambda] Pushed docker image to ECR with URI `*.dkr.ecr.us-east-1.amazonaws.com/parity-test1`
WARN [2023-01-29 20:45:37] [deploy_lambda] Created Lambda execution role with ARN `arn:aws:iam::*:role/parity-test1--261b7f62-a048-11ed-bd89-10c37b6dce99`
WARN [2023-01-29 20:45:37] [deploy_lambda] Created Lambda function `parity-test1` with ARN `arn:aws:lambda:us-east-1:*:function:parity-test1`
SUCCESS [2023-01-29 20:45:37] [deploy_lambda] Done.
>
> invoke_lambda(
+ function_name = "parity-test1",
+ payload = list(number = 3),
+ invocation_type = "RequestResponse"
+ )
INFO [2023-01-29 20:45:37] [invoke_lambda] Validating inputs.
INFO [2023-01-29 20:45:37] [invoke_lambda] Invoking function.
Error: ResourceConflictException (HTTP 409). The operation cannot be performed at this time. The function is currently in the following state: Pending
> invoke_lambda(
+ function_name = "parity-test1",
+ payload = list(number = 3),
+ invocation_type = "RequestResponse"
+ )
INFO [2023-01-29 21:32:53] [invoke_lambda] Validating inputs.
INFO [2023-01-29 21:32:53] [invoke_lambda] Invoking function.
Lambda response payload:
{"parity":"odd"}
SUCCESS [2023-01-29 21:33:02] [invoke_lambda] Done.
The code is purposely verbose to let
the user know what actions are being taken and what resources are being set up on
AWS. Of course, setting up and running services on AWS incurs costs, so it is
always a good idea to review your AWS console and disable or remove services that
are no longer needed. Actions like these can also be done with the paws
SDK, and
I hope that a future iteration of {r2lamdba}
might make some AWS clean up possible
from the R
console. For now, deploy_lambda
will specify the URI or ARN of each
service it creates both as a console log, and in the returned list. So the user
can easily find the created services and disable/remove/update as needed. As of now,
{r2lambda}
interacts only with IAM, ECR, and Lambda, so be sure at least to log
into AWS, and review the actions taken.
TODO list
{r2lambda}
is a not more than a week old, work-in-progress package. It fulfills a relatively simple
version of the initial goal –one-function deploy of an R
script with CRAN dependencies–
but it is not quite ready for wider usage. I think several important features would
add much more value to the user, including:
managing dependencies from different repositories. As of now, only CRAN packages are supported. The
deps
argument is passed onto a simple function that adds aninstall.packages
call to the Dockerfile, so any package hosted in a git repository or on Bioconductor, would not be able to be installed.detecting dependencies via
{attachment}
or{dockerfiler}
and populating the Dockerfile with the correct installation calls.adding functions that would allow removing of services from AWS. For example, if creating a Lambda function is a one function call, it would be quite useful if cleaning up after an erroneous deployment is a one function call as well. This would include removing the Lambda function, any IAM roles and attached policies to the Lambda, as well as the ECR image associated with the function.
setting up a way to test the Lambda function locally before deploying to AWS. This is described here and is something I do regularly to avoid premature pushing to ECR, as this is the most time-consuming step. But I am yet to add this routine to the package.
making it easier to customize the Lambda environment on deploy. To support at least basic configurations like memory and timeout limits but also adding additional policies (e.g., permissions to put objects in AWS S3 or interact with a database service like RDS or DynamoDB).
Summary
Please give {r2lambda}
a try and share your feedback by commenting in the
repository Discussions. Would
love to hear about your experience, any problems you might have encountered,
any similar tools you might know about, and of courses, improvement ideas.