vignettes/usecase-r-api-microservices.Rmd
usecase-r-api-microservices.Rmd
Being able to trigger R code via public HTTP requests opens up lots of possibilities using webhooks, API calls and so on. You can also make use of private HTTP requests so that services running multiple languages can also easily interface with one another using a concept called micro-services.
The use case described below uses R code deployed to a private as a strategy for running lots of parallel computation, taking advantage of Cloud Run’s autoscaling features to turn up and down compute resources as needed.
Hosting the computation behind a plumber API and hosted on Cloud Run, you can control what data is worked on via URL parameters, and loop calling the API.
The R computation behind plumber could call other services or compute directly, although keep in mind the 60min timeout for Cloud Run API responses.
The demo API queries a BigQuery database and computes a forecast for the time-series it gets back. The data is from the Covid public datasets hosted on BigQuery.
The plumber script below has two endpoints:
/
to test its running/covid_traffic
which is fetched with parameters region
and industry to specify what times-series to fetch a forecast for.We will want these to be private (e.g. only authenticated calls) since they contain potentially expensive BigQuery queries.
if(Sys.getenv("PORT") == "") Sys.setenv(PORT = 8000)
#auth via BQ_AUTH_FILE environment argument
library(bigQueryR)
library(xts)
library(forecast)
#' @get /
#' @html
function(){
"<html><h1>It works!</h1></html>"
}
#' @get /covid_traffic
#' @param industry the industry to filter results down to e.g "Software"
#' @param region the region to filter results down to e.g "Europe"
function(region=NULL, industry=NULL){
if(any(is.null(region), is.null(industry))){
stop("Must supply region and industry parameters")
}
# to handle spaces
region <- URLdecode(region)
industry <- URLdecode(industry)
sql <- sprintf("SELECT date, industry, percent_of_baseline FROM `bigquery-public-data.covid19_geotab_mobility_impact.commercial_traffic_by_industry` WHERE region = '%s' order by date LIMIT 1000", region)
message("Query: ", sql)
traffic <- bqr_query(
query = sql,
datasetId = "covid19_geotab_mobility_impact",
useLegacySql = FALSE,
maxResults = 10000
)
# filter to industry in R this time
test_data <- traffic[traffic$industry == industry,
c("date","percent_of_baseline")]
tts <- xts(test_data$percent_of_baseline,
order.by = test_data$date,
frequency = 7)
# replace with long running sophisticated analysis
model <- forecast(auto.arima(tts))
# output a list that can be turned into JSON via jsonlite::toJSON
o <- list(
params = c(region, industry),
x = model$x,
mean = model$mean,
lower = model$lower,
upper = model$upper
)
message("Return: ", jsonlite::toJSON(o))
o
}
The first step is to host the plumber API above.
The steps below:
library(googleCloudRunner)
bs <- c(
cr_buildstep_secret("my-auth",
decrypted = "inst/docker/parallel_cloudrun/plumber/auth.json"),
cr_buildstep_docker("cloudrun_parallel",
dir = "inst/docker/parallel_cloudrun/plumber",
kaniko_cache = TRUE),
cr_buildstep_run("parallel-cloudrun",
image = "gcr.io/$PROJECT_ID/cloudrun_parallel:$BUILD_ID",
allowUnauthenticated = FALSE,
env_vars = "BQ_AUTH_FILE=auth.json,BQ_DEFAULT_PROJECT_ID=$PROJECT_ID")
)
# make a buildtrigger pointing at above steps
cloudbuild_file <- "inst/docker/parallel_cloudrun/cloudbuild.yml"
by <- cr_build_yaml(bs)
cr_build_write(by, file = cloudbuild_file)
Now the cloudbuild.yml is available, a build trigger is created so that it will deploy upon each commit to this GitHub:
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner")
cr_buildtrigger(cloudbuild_file,
"parallel-cloudrun",
trigger = repo,
includedFiles = "inst/docker/parallel_cloudrun/**")
The files are then committed to GitHub and the build reviewed in the GCP Console.
Once the builds are done, you should have a plumber API deployed with
a URL similar to
https://parallel-cloudrun-ewjogewawq-ew.a.run.app/
Now the plumber API is up, but to call it with authentication will
need a JSON Web Token, or JWT. These are supported via the
cr_jwt_create()
functions:
cr <- cr_run_get("parallel-cloudrun")
# Interact with the authenticated Cloud Run service
the_url <- cr$status$url
jwt <- cr_jwt_create(the_url)
# needs to be recreated every 60mins
token <- cr_jwt_token(jwt, the_url)
# call Cloud Run with token using httr call
library(httr)
res <- cr_jwt_with_httr(
GET("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/"),
token)
content(res)
The JWT token needs renewing each hour.
The above shows you can call the API’s test endpoint, but now we wish to call the API many times with the parameters to filter to the data we want.
This is facilitated by a wrapper function to call our API:
# interact with the API we made
call_api <- function(region, industry, token){
api <- sprintf(
"https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s",
URLencode(region), URLencode(industry))
message("Request: ", api)
res <- cr_jwt_with_httr(httr::GET(api),token)
httr::content(res, as = "text", encoding = "UTF-8")
}
# test call
result <- call_api(region = "Europe", industry = "Software", token = token)
Once the test call is complete, you can loop over your data - the parameters for this particular dataset are shown below:
# the variables to loop over
regions <- c("North America", "Europe","South America","Australia")
industry <- c("Transportation (non-freight)",
"Software",
"Telecommunications",
"Manufacturing",
"Real Estate",
"Energy & Utilities",
"Education",
"Insurance",
"Media & Internet",
"Minerals & Mining",
"Healthcare Services & Hospitals",
"Organizations",
"Finance")
For parallel processing, ideally you don’t want R to block waiting
for the HTTP response for each URL you are sending. This is easiest
achieved through using curl
and its curl_fetch_multi()
function.
A helper function that takes care of adding multiple URLs to the same
curl pool is cr_jwt_with_curl()
- if you pass it a vector
of URL strings to call and the token, it will attempt to call all of
them at the same time.
An example calling each possible combination of the
industries/regions above is shown below, which utilises
expand.grid()
to create the URL variations.
## curl multi asynch
# function to make the URLs
make_urls <- function(regions, industry){
combos <- expand.grid(regions, industry, stringsAsFactors = FALSE)
unlist(mapply(function(x,y){sprintf("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s",URLencode(x), URLencode(y))},
SIMPLIFY = FALSE, USE.NAMES = FALSE,
combos$Var1 ,combos$Var2))
}
# make all the URLs
all_urls <- make_urls(regions = regions, industry = industry)
# calling them
cr_jwt_async(all_urls, token = token)
Some example output is shown below - note that some combinations didn’t have data so the API returned a 500 error, but later valid combinations completed and returned the JSON response for the forecast:
> # calling them
> cr_jwt_async(all_urls, token = token)
ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Europe&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=South%20America&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:13 > 500 failure for request https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software
ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Australia&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software
[[1]]
[1] "{\"params\":[\"North America\",\"Transportation (non-freight)\"],\"x\":[100,99,108,104,105,104,105,102,102,108,104,105,104,105,99,98,59,78,78,78,78,100,100,108,106,105,105,105,100,103,109,104,104,105,106,103,102,108,104,104,103,100,96,97,67,64,62,60,59],\"mean\":[64.031,68.3173,71.9691,75.0803,77.731,79.9894,81.9135,83.5527,84.9493,86.1392],\"lower\":[[52.8354,46.9089],[53.6094,45.8236],[55.1655,46.2703],[56.9063,47.2856],[58.6237,48.5089],[60.2322,49.7734],[61.6977,50.9961],[63.0104,52.136],[64.1733,53.1751],[65.1951,54.108]],\"upper\":[[75.2265,81.1531],[83.0251,90.8109],[88.7726,97.6679],[93.2543,102.8751],[96.8384,106.9531],[99.7466,110.2054],[102.1292,112.8308],[104.095,114.9694],[105.7254,116.7235],[107.0833,118.1704]]}"
[[2]]
[1] "{\"params\":[\"South America\",\"Transportation (non-freight)\"],\"x\":[14,13,12,9,13,17,19,15,11,14,15,16,16,14,11,15,19,18,16,16,12,18,21,18,18,12,18,19,18,19,16,15,17,20,21,20,18,24,26,25,18,22,19,17,21,24,20,19,19,23,27,27,24,22,17,24,26,24,20,18,21,24,23,20,26,19,20,17,21,26,23,19,17,27,23,20,21,22,17,26,23,20,18,21,19,18,28,22,18,24,15,21,27,21,21],\"mean\":[19.7145,21.4333,23.6604,24.1475,22.5433,20.6944,20.6384,22.468,24.3021,24.3207],\"lower\":[[15.9943,14.0249],[17.6591,15.6611],[19.8826,17.8828],[20.3669,18.3656],[18.7179,16.6929],[16.7392,14.6454],[16.566,14.4103],[18.3612,16.1872],[20.1922,18.0166],[20.2036,18.0241]],\"upper\":[[23.4346,25.404],[25.2075,27.2054],[27.4381,29.438],[27.928,29.9293],[26.3686,28.3936],[24.6495,26.7433],[24.7107,26.8665],[26.5747,28.7487],[28.412,30.5876],[28.4378,30.6173]]}"
[[3]]
[1] "{\"params\":[\"Europe\",\"Transportation (non-freight)\"],\"x\":[40,52,55,50,52,44,59,55,54,60,63,43,54,49,46,38,64,55,53,51,58,61,42,43,40,54,53,60,63,54,44,43,52,38,53,51,47,53,58,44,54,62,56,59,63,48,62,66,60,68,73,69,71,77,77,63,66],\"mean\":[68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688],\"lower\":[[58.5753,53.2851],[58.1036,52.5637],[57.6523,51.8735],[57.2189,51.2107],[56.8015,50.5723],[56.3984,49.9558],[56.0082,49.359],[55.6297,48.7802],[55.2621,48.2179],[54.9043,47.6707]],\"upper\":[[78.5622,83.8525],[79.0339,84.5738],[79.4852,85.2641],[79.9186,85.9269],[80.3361,86.5653],[80.7392,87.1818],[81.1294,87.7786],[81.5078,88.3573],[81.8755,88.9196],[82.2333,89.4668]]}"