current_time <- function() {
  print(paste("CURRENT TIME:", Sys.time()))
}
A common use of the AWS Lambda service is to set a function to run on a recurring schedule, e.g. to collect logs, move data, or perform some ETL process. In this post, we’ll see how we can set up an AWS Lambda function, running R, on a schedule.
A lambda runtime function
We start with a simple function that does not require any input and does not return anything. If this example lambda is to run on a schedule, we don’t want to worry about any input arguments. Also, we want this lambda function to simply have a side effect, like printing something to the logs, without returning any data or writing to a database. This will help us greatly with the setup, in that we’ll be able to deploy and schedule the lambda with minimal involvement from other AWS services.
With this in mind, we have the following function that simply prints the system time. Printing the current time makes sense because we can easily check that the lambda runs on the correct schedule from the logs.
Build, test, and deploy
Then, we follow the procedure described in the Tidy Tuesday dataset Lambda post. We write the code to a file that we’ll use to build the lambda docker image:
<- "
r_code current_time <- function() {
print(paste('CURRENT TIME:', Sys.time()))
}
lambdr::start_lambda()
"
<- tempfile(pattern = "current_time_lambda_", fileext = ".R")
tmpfile write(x = r_code, file = tmpfile)
And then build the docker image. Note that we don’t have any dependencies other than base R.
r2lambda::build_lambda(
  tag = "current_time",
  runtime_function = "current_time",
  runtime_path = tmpfile,
  dependencies = NULL
)
We test the lambda docker container locally before deploying it. The console output should include the log messages and the standard output string showing the current time.
r2lambda::test_lambda(tag = "current_time", payload = list())
Then, we deploy the lambda to AWS, leaving the lambda environment to its defaults, as 3 seconds should be enough to get and print the current time.
r2lambda::deploy_lambda(tag = "current_time")
Finally, to make sure everything went well, we invoke the cloud instance of our function. Be sure to include the logs, as this particular function does not return anything.
r2lambda::invoke_lambda(
  function_name = "current_time",
  invocation_type = "RequestResponse",
  payload = list(),
  include_logs = TRUE
)
Schedule to run every minute
To make a lambda function run on a recurring schedule, we need to update an already deployed function. This involves three steps and two AWS services, Lambda for serverless computing and EventBridge for serverless event routing:
- creating a scheduled event rule (EventBridge, paws::eventbridge)
- granting the rule permission to invoke the lambda function (Lambda, paws::lambda)
- adding our target lambda function as a target of the event rule (EventBridge, paws::eventbridge)
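For reference, the three steps roughly map onto the following paws calls. This is a non-runnable sketch, not the exact implementation inside r2lambda: the rule name, statement id, and the lambda function ARN are placeholders, and real calls require valid AWS credentials.

```r
# Sketch only: placeholder names/ARNs; requires AWS credentials to run.
events <- paws::eventbridge()
lambda <- paws::lambda()

# 1. create a rule that fires every minute
rule <- events$put_rule(
  Name = "current_time_rule",
  ScheduleExpression = "rate(1 minute)"
)

# 2. allow EventBridge to invoke the lambda function
lambda$add_permission(
  FunctionName = "current_time",
  StatementId = "current_time_rule_permission",
  Action = "lambda:InvokeFunction",
  Principal = "events.amazonaws.com",
  SourceArn = rule$RuleArn
)

# 3. attach the lambda function as the rule's target
events$put_targets(
  Rule = "current_time_rule",
  Targets = list(list(Id = "current_time", Arn = "<lambda-function-arn>"))
)
```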
Detailed instructions are available in the AWS documentation. The function schedule_lambda abstracts these three steps in one go. To set a Lambda on a schedule, we need the name of the function we wish to update, and the rate at which we want EventBridge to invoke it. Two expression formats for setting the rate are supported, cron and rate. For example, to schedule a lambda to run every Sunday at midnight, we could use execution_rate = "cron(0 0 ? * SUN *)" (note that AWS cron expressions have six fields). Alternatively, to schedule a lambda to run every 15 minutes, we might use execution_rate = "rate(15 minutes)". The details are in this AWS article.
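As a quick illustration, the two formats are easy to tell apart by their prefix. The helper below is my own, not part of r2lambda:

```r
# Hypothetical helper: classify an EventBridge schedule expression by prefix.
schedule_type <- function(expr) {
  if (grepl("^rate\\(", expr)) return("rate")
  if (grepl("^cron\\(", expr)) return("cron")
  stop("Unrecognized schedule expression: ", expr)
}

schedule_type("rate(15 minutes)")    # "rate"
schedule_type("cron(0 0 ? * SUN *)") # "cron"
```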
r2lambda::schedule_lambda(
  lambda_function = "current_time",
  execution_rate = "rate(1 minute)"
)
Checking the AWS logs
To see if our function runs every minute, we can take a look at the AWS logs. If the function was writing to a database, or dropping files in an S3 bucket, we could also check the contents of those resources for the effects of the scheduled lambda function. But as our example function only prints the current time, the only way to know that it indeed runs every minute is to check the logs.
To do this, we’ll use paws and r2lambda::aws_connect to create a CloudWatch Logs service client locally, and fetch the recent logs to look for traces of our lambda function.
In the first step, we connect to cloudwatchlogs and fetch the names of the log groups. Inspect the logGroups object below to find the name corresponding to the lambda function whose logs we want to fetch.
logs_service <- r2lambda::aws_connect(service = "cloudwatchlogs")
logs <- logs_service$describe_log_groups()
(logGroups <- sapply(logs$logGroups, "[[", 1))
Then, we can grab only the data for our scheduled lambda function:
current_time_lambda_logs <- logs_service$filter_log_events(
  logGroupName = "/aws/lambda/current_time"
)
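As an aside, the underlying FilterLogEvents API can also filter server-side via its filter pattern syntax, which avoids pulling every event. Something like the following should work, though it is untested here (note the escaped inner quotes, required for a pattern term containing a space):

```r
# Server-side filtering: only events containing the phrase are returned.
current_time_lambda_logs <- logs_service$filter_log_events(
  logGroupName = "/aws/lambda/current_time",
  filterPattern = "\"CURRENT TIME\""
)
```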
And pull only the message printed by our R function wrapped in the lambda:
messages <- sapply(current_time_lambda_logs$events, "[[", "message")
current_time_messages <- messages[grepl("CURRENT TIME", messages)]
data.frame(Current_time_lambda = current_time_messages)
#> Current_time_lambda
#> 1 [1] "CURRENT TIME: 2023-02-26 22:53:55"\n
#> 2 [1] "CURRENT TIME: 2023-02-26 22:54:41"\n
#> 3 [1] "CURRENT TIME: 2023-02-26 22:55:41"\n
#> 4 [1] "CURRENT TIME: 2023-02-26 22:56:41"\n
#> 5 [1] "CURRENT TIME: 2023-02-26 22:57:41"\n
Evidently, the Lambda function printed the system time every minute, as we intended!
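We can also check the spacing programmatically by parsing the timestamps out of the captured messages. A small sketch, using a subset of the sample output above as hard-coded strings:

```r
# Parse the timestamps out of the log messages and check their spacing.
msgs <- c(
  '[1] "CURRENT TIME: 2023-02-26 22:54:41"\n',
  '[1] "CURRENT TIME: 2023-02-26 22:55:41"\n',
  '[1] "CURRENT TIME: 2023-02-26 22:56:41"\n'
)
stamps <- regmatches(msgs, regexpr("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", msgs))
stamps <- as.POSIXct(stamps, tz = "UTC")
diff(stamps)  # differences of exactly 1 minute
```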
Clean up
We don’t want to leave this lambda firing every minute; even though it is trivial, it still uses resources and incurs some cost. So it’s wise to delete the event schedule rule and maybe even the lambda function itself.
To remove the event rule, we first need to remove the associated targets. In the code below, we connect to EventBridge, look up the names of all event rules, find the rule we wish to remove (in this case the most recent one, with index 1), and then first remove its target, followed by deleting the rule itself. (I’ll probably add a function to abstract this procedure in the {r2lambda} package.)
# connect to the EventBridge service
events_service <- r2lambda::aws_connect("eventbridge")

# find the names of all rules
schedule_rules <- events_service$list_rules()[[1]] %>% sapply("[[", 1)

# find the targets associated with the rule we want to remove
rule_to_remove <- schedule_rules[[1]]
target_arn_to_remove <- events_service$list_targets_by_rule(Rule = rule_to_remove)$Targets[[1]]$Id

# remove the target, then delete the rule
events_service$remove_targets(Rule = rule_to_remove, Ids = target_arn_to_remove)
events_service$delete_rule(Name = rule_to_remove)

# confirm the rule is gone
events_service$list_rules()[[1]] %>% sapply("[[", 1)
Finally, to remove the Lambda, we do something similar: look up the names of all deployed functions on our account, and then delete the one(s) we no longer need.
<- r2lambda::aws_connect("lambda")
lambda_service $list_functions()$Functions %>% sapply("[[","FunctionName")
lambda_service$delete_function(FunctionName = "current_time") lambda_service
Summary
In this post:
- we wrote a simple lambda runtime function,
- built a docker image locally,
- tested the lambda invocation,
- deployed it to AWS Lambda,
- updated it to run on a schedule,
- checked the AWS logs to confirm it executes at the correct times, and
- cleaned up our AWS environment.
I hope you found this tutorial useful, and that it will motivate you to try the {r2lambda} package. It is available on GitHub and can be installed with remotes::install_github. I am looking for feedback on whether or not the workflows from r2lambda are working for other people – not many have tried it so far. I am also interested in suggestions on how to improve the interface, what features to add, what additional documentation to include, and so on. Try it and share your experience!