Being able to trigger R code via public HTTP requests opens up lots of possibilities using webhooks, API calls and so on. You can also make use of private HTTP requests so that services running multiple languages can also easily interface with one another using a concept called micro-services.

The use case described below uses R code deployed to a private as a strategy for running lots of parallel computation, taking advantage of Cloud Run’s autoscaling features to turn up and down compute resources as needed.

Hosting the computation behind a plumber API and hosted on Cloud Run, you can control what data is worked on via URL parameters, and loop calling the API.

The R computation behind plumber could call other services or compute directly, although keep in mind the 60min timeout for Cloud Run API responses.

Creating the Cloud Run deployment

The demo API queries a BigQuery database and computes a forecast for the time-series it gets back. The data is from the Covid public datasets hosted on BigQuery.

The plumber script below has two endpoints:

  • / to test its running
  • /covid_traffic which is fetched with parameters region and industry to specify what times-series to fetch a forecast for.

We will want these to be private (e.g. only authenticated calls) since they contain potentially expensive BigQuery queries.

plumber API script

if(Sys.getenv("PORT") == "") Sys.setenv(PORT = 8000)

#auth via BQ_AUTH_FILE environment argument
library(bigQueryR)
library(xts)
library(forecast)

#' @get /
#' @html
function(){
  "<html><h1>It works!</h1></html>"
}


#' @get /covid_traffic
#' @param industry the industry to filter results down to e.g "Software"
#' @param region the region to filter results down to e.g "Europe"
function(region=NULL, industry=NULL){

  if(any(is.null(region), is.null(industry))){
    stop("Must supply region and industry parameters")
  }

  # to handle spaces
  region <- URLdecode(region)
  industry <- URLdecode(industry)

  sql <- sprintf("SELECT date, industry, percent_of_baseline FROM `bigquery-public-data.covid19_geotab_mobility_impact.commercial_traffic_by_industry`  WHERE region = '%s' order by date LIMIT 1000", region)

  message("Query: ", sql)

  traffic <- bqr_query(
    query = sql,
    datasetId = "covid19_geotab_mobility_impact",
    useLegacySql = FALSE,
    maxResults = 10000
  )

  # filter to industry in R this time
  test_data <- traffic[traffic$industry == industry,
                       c("date","percent_of_baseline")]

  tts <- xts(test_data$percent_of_baseline,
             order.by = test_data$date,
             frequency = 7)

  # replace with long running sophisticated analysis
  model <- forecast(auto.arima(tts))

  # output a list that can be turned into JSON via jsonlite::toJSON
  o <- list(
    params = c(region, industry),
    x = model$x,
    mean = model$mean,
    lower = model$lower,
    upper = model$upper
  )

  message("Return: ", jsonlite::toJSON(o))
  
  o

}

Deploy plumber API as a private micro-service

The first step is to host the plumber API above.

The steps below:

  • Download an authentication file for the script
  • Build the Docker container for the plumber API
  • Deploys the plumber API to a private Cloud Run instance
  • Creates the cloudbuild.yml and writes it to a file in the repo
library(googleCloudRunner)

bs <- c(
  cr_buildstep_secret("my-auth",
      decrypted = "inst/docker/parallel_cloudrun/plumber/auth.json"),
  cr_buildstep_docker("cloudrun_parallel",
                      dir = "inst/docker/parallel_cloudrun/plumber",
                      kaniko_cache = TRUE),
  cr_buildstep_run("parallel-cloudrun",
                   image = "gcr.io/$PROJECT_ID/cloudrun_parallel:$BUILD_ID",
                   allowUnauthenticated = FALSE,
                   env_vars = "BQ_AUTH_FILE=auth.json,BQ_DEFAULT_PROJECT_ID=$PROJECT_ID")
)

# make a buildtrigger pointing at above steps
cloudbuild_file <- "inst/docker/parallel_cloudrun/cloudbuild.yml"
by <- cr_build_yaml(bs)
cr_build_write(by, file = cloudbuild_file)

Create a build trigger so it builds upon each git commit

Now the cloudbuild.yml is available, a build trigger is created so that it will deploy upon each commit to this GitHub:

repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner")
cr_buildtrigger(cloudbuild_file,
                "parallel-cloudrun",
                trigger = repo,
                includedFiles = "inst/docker/parallel_cloudrun/**")

The files are then committed to GitHub and the build reviewed in the GCP Console.

Once the builds are done, you should have a plumber API deployed with a URL similar to https://parallel-cloudrun-ewjogewawq-ew.a.run.app/

Calling the private plumber API with a JWT

Now the plumber API is up, but to call it with authentication will need a JSON Web Token, or JWT. These are supported via the cr_jwt_create() functions:

cr <- cr_run_get("parallel-cloudrun")

# Interact with the authenticated Cloud Run service
the_url <- cr$status$url
jwt <- cr_jwt_create(the_url)

# needs to be recreated every 60mins
token <- cr_jwt_token(jwt, the_url)

# call Cloud Run with token using httr call
library(httr)
res <- cr_jwt_with_httr(
  GET("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/"),
  token)
content(res)

The JWT token needs renewing each hour.

Use your private micro-service

The above shows you can call the API’s test endpoint, but now we wish to call the API many times with the parameters to filter to the data we want.

This is facilitated by a wrapper function to call our API:

# interact with the API we made
call_api <- function(region, industry, token){
  api <- sprintf(
    "https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s",
    URLencode(region), URLencode(industry))

  message("Request: ", api)
  res <- cr_jwt_with_httr(httr::GET(api),token)

  httr::content(res, as = "text", encoding = "UTF-8")

}

# test call
result <- call_api(region = "Europe", industry = "Software", token = token)

Once the test call is complete, you can loop over your data - the parameters for this particular dataset are shown below:

# the variables to loop over
regions <- c("North America", "Europe","South America","Australia")
industry <- c("Transportation (non-freight)",
              "Software",
              "Telecommunications",
              "Manufacturing",
              "Real Estate",
              "Energy & Utilities",
              "Education",
              "Insurance",
              "Media & Internet",
              "Minerals & Mining",
              "Healthcare Services & Hospitals",
              "Organizations",
              "Finance")

Parallel calling of the API

For parallel processing, ideally you don’t want R to block waiting for the HTTP response for each URL you are sending. This is easiest achieved through using curl and its curl_fetch_multi() function.

A helper function that takes care of adding multiple URLs to the same curl pool is cr_jwt_with_curl() - if you pass it a vector of URL strings to call and the token, it will attempt to call all of them at the same time.

An example calling each possible combination of the industries/regions above is shown below, which utilises expand.grid() to create the URL variations.

## curl multi asynch

# function to make the URLs
make_urls <- function(regions, industry){

  combos <- expand.grid(regions, industry, stringsAsFactors = FALSE)

  unlist(mapply(function(x,y){sprintf("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s",URLencode(x), URLencode(y))},
                SIMPLIFY = FALSE, USE.NAMES = FALSE,
         combos$Var1 ,combos$Var2))
}

# make all the URLs
all_urls <- make_urls(regions = regions, industry = industry)

# calling them
cr_jwt_async(all_urls, token = token)

Some example output is shown below - note that some combinations didn’t have data so the API returned a 500 error, but later valid combinations completed and returned the JSON response for the forecast:

> # calling them
> cr_jwt_async(all_urls, token = token)
ℹ 2020-09-20 21:58:06 > Calling asynch:  https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch:  https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Europe&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch:  https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=South%20America&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:13 > 500 failure for request https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software
ℹ 2020-09-20 21:58:06 > Calling asynch:  https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Australia&industry=Transportation%20(non-freight)
ℹ 2020-09-20 21:58:06 > Calling asynch:  https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software
[[1]]
[1] "{\"params\":[\"North America\",\"Transportation (non-freight)\"],\"x\":[100,99,108,104,105,104,105,102,102,108,104,105,104,105,99,98,59,78,78,78,78,100,100,108,106,105,105,105,100,103,109,104,104,105,106,103,102,108,104,104,103,100,96,97,67,64,62,60,59],\"mean\":[64.031,68.3173,71.9691,75.0803,77.731,79.9894,81.9135,83.5527,84.9493,86.1392],\"lower\":[[52.8354,46.9089],[53.6094,45.8236],[55.1655,46.2703],[56.9063,47.2856],[58.6237,48.5089],[60.2322,49.7734],[61.6977,50.9961],[63.0104,52.136],[64.1733,53.1751],[65.1951,54.108]],\"upper\":[[75.2265,81.1531],[83.0251,90.8109],[88.7726,97.6679],[93.2543,102.8751],[96.8384,106.9531],[99.7466,110.2054],[102.1292,112.8308],[104.095,114.9694],[105.7254,116.7235],[107.0833,118.1704]]}"

[[2]]
[1] "{\"params\":[\"South America\",\"Transportation (non-freight)\"],\"x\":[14,13,12,9,13,17,19,15,11,14,15,16,16,14,11,15,19,18,16,16,12,18,21,18,18,12,18,19,18,19,16,15,17,20,21,20,18,24,26,25,18,22,19,17,21,24,20,19,19,23,27,27,24,22,17,24,26,24,20,18,21,24,23,20,26,19,20,17,21,26,23,19,17,27,23,20,21,22,17,26,23,20,18,21,19,18,28,22,18,24,15,21,27,21,21],\"mean\":[19.7145,21.4333,23.6604,24.1475,22.5433,20.6944,20.6384,22.468,24.3021,24.3207],\"lower\":[[15.9943,14.0249],[17.6591,15.6611],[19.8826,17.8828],[20.3669,18.3656],[18.7179,16.6929],[16.7392,14.6454],[16.566,14.4103],[18.3612,16.1872],[20.1922,18.0166],[20.2036,18.0241]],\"upper\":[[23.4346,25.404],[25.2075,27.2054],[27.4381,29.438],[27.928,29.9293],[26.3686,28.3936],[24.6495,26.7433],[24.7107,26.8665],[26.5747,28.7487],[28.412,30.5876],[28.4378,30.6173]]}"

[[3]]
[1] "{\"params\":[\"Europe\",\"Transportation (non-freight)\"],\"x\":[40,52,55,50,52,44,59,55,54,60,63,43,54,49,46,38,64,55,53,51,58,61,42,43,40,54,53,60,63,54,44,43,52,38,53,51,47,53,58,44,54,62,56,59,63,48,62,66,60,68,73,69,71,77,77,63,66],\"mean\":[68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688],\"lower\":[[58.5753,53.2851],[58.1036,52.5637],[57.6523,51.8735],[57.2189,51.2107],[56.8015,50.5723],[56.3984,49.9558],[56.0082,49.359],[55.6297,48.7802],[55.2621,48.2179],[54.9043,47.6707]],\"upper\":[[78.5622,83.8525],[79.0339,84.5738],[79.4852,85.2641],[79.9186,85.9269],[80.3361,86.5653],[80.7392,87.1818],[81.1294,87.7786],[81.5078,88.3573],[81.8755,88.9196],[82.2333,89.4668]]}"