Here are some example use cases for googleCloudRunner.

Since almost any code can be packaged into and run from Docker images, there are a lot of potential uses.

  • Scheduling R scripts in the cloud
  • R APIs to call from anywhere
  • Triggering R code to run on events, such as BigQuery table updates or GitHub pushes
  • Checking a package, creating a website, deploying it to GitHub
  • Running authenticated tests for R packages in a private environment
  • Creating an Rmarkdown powered website using Cloud Run
  • Integrating R with other language applications
  • Creating public and private Docker images on triggers

Some community contributed Cloud Build images are listed here, including hugo, make, and tar.

You can use any public Docker image if you know which arguments etc. to send to it; for example, the Slack messages use technosophos/slack-notify.

Deploy a pkgdown website for your R package

When creating an R package, pkgdown is a fantastic resource for creating a package website from your R function's documentation and vignettes.

This workflow uses Google's Key Management Service to securely hold your Git SSH login details, then uses those details to commit a built website on each Git commit. This means you do not have to build the website locally.

On each commit you make, a background task will build the website with your changes and commit it back to the repo - see the example for this website:

A suggested setup workflow to do this is below:

  1. Encrypt your git SSH key with gcloud using these instructions, noting your keyring and key for the next steps.
  2. This will create an encrypted file - add it to your R package’s directory.
  3. Use cr_deploy_pkgdown() to create a cloudbuild.yml file in your R package’s directory, giving it your decryption details from step 1.
cr_deploy_pkgdown(keyring = "your-keyring", key = "github-key")
  4. Add and commit the cloudbuild.yml file to your git repository
  5. Go to GCP console > Cloud Build > Triggers and link your git repo to Cloud Build.
  6. Create a Build Trigger for your git repository:
  • Point at the cloudbuild.yml file you committed (e.g. cloudbuild-pkgdown.yml)
  • Make sure to exclude the docs/** folder in the build trigger, else the trigger will fire again when the built website is pushed back to the repo!
  • You may also want to exclude other folders such as tests/**
  • Add a substitution variable _GITHUB_REPO with the Git address you want to push to, e.g. MarkEdmondson1234/googleCloudRunner

The below is an example for googleCloudRunner’s website:

The example above also adds other substitution variables to help run some of the examples.

You can customise the deployment more by using cr_buildstep_pkgdown() in your own custom build files. For instance, you could download other auth keys using cr_buildstep_decrypt() again, so that your website has working authenticated examples.

Run package tests and code coverage

This workflow will run the package tests you have upon each commit to your git repo.

You can also optionally submit those test results to Codecov via the excellent covr R package, helping you see which code your tests actually cover. This is what creates this badge for this package:

[codecov badge]

If you do not need online authentication for your tests, then this is only a case of deploying the premade default cloudbuild.yml file via cr_deploy_packagetests().

The below assumes you have created tests for your package.

  1. If you want to use Codecov, generate a Codecov token on its website and link it to your git repository
  2. Create the tests cloudbuild.yml file via cr_deploy_packagetests()
  3. Add and commit the cloudbuild.yml file to your git repository
  4. Go to GCP console > Cloud Build > Triggers and link your git repo to Cloud Build.
  5. Create a Build Trigger for your git repository:
  • Point at the cloudbuild.yml file you committed (e.g. cloudbuild-tests.yml)
  • Exclude any folders such as the docs/** folder where changes should not trigger a recheck
  • Add a substitution variable _CODECOV_TOKEN if you are using it

The below is an example for googleCloudRunner’s website:

The example above also adds other substitution variables to help run some of the examples.

Authenticated tests

You can customise the deployment more by using cr_buildstep_packagetests() in your own custom build files.

For googleCloudRunner and API packages in general, an authentication key is needed to run online tests. This authentication key can be encrypted via gcloud and added to Google Cloud KMS by adding a decryption step to your tests via cr_buildstep_decrypt().

In that case, the decryption step needs to occur before the tests run, which you can do by supplying cr_buildstep_decrypt() to cr_deploy_packagetests().

You will also want to use that auth file somehow; in the example below it is set in an environment variable that your tests use to find the authentication file:

cr_deploy_packagetests(
  steps = cr_buildstep_decrypt("auth.json.enc", "auth.json",
                               keyring = "my-keyring", key = "my-key"),
  env = c("NOT_CRAN=true", "MY_AUTH_FILE=auth.json")
)

Use the resulting cloudbuild.yml file in the same manner as unauthenticated tests.

Run R code in a background task

Sometimes you just want to have some R code running whilst you get on with something else. In this case you can use cr_buildstep_r() to run your R code even after you have turned your computer off, since it is running in the cloud.

When running R code, it needs to run in an environment that has the packages and resources it needs. If those are covered by an existing image such as rocker/verse (the tidyverse), then you can use it straight away. Otherwise you can first build a custom Docker image with the R packages you need, and then run your code against that.

Once you have your R code and have chosen the Docker image, you can use cr_deploy_r() to point at your R file and Docker image, and set the timeout to what you need (the maximum is 24 hours).
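A minimal sketch, where the script name, image, and timeout are placeholders for your own:

library(googleCloudRunner)

# run my_script.R in the cloud against the rocker/verse image
# timeout is in seconds - here one hour (the maximum is 24 hours)
cr_deploy_r(
  "my_script.R",
  r_image = "rocker/verse",
  timeout = 3600
)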

An RStudio gadget is also available to help you deploy:

If you want help creating the Docker image, try containerit to generate your Dockerfile, then use cr_deploy_docker() to build it.

Execute R scripts directly from Cloud Storage

In some cases you may hit the character limits of the cloudbuild file, in which case you will need to execute the R code from a file. The easiest way to do this is to upload your R code to a Cloud Storage bucket and then supply the gs:// URI to cr_buildstep_r():

For your own use case you would point at a bucket and R image that your Cloud Build service email has access to; the sketch below uses placeholder names:
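library(googleCloudRunner)

# the R code is read from the gs:// location at build time
# replace the bucket, script, and image names with your own
large_job <- cr_build_yaml(
  steps = cr_buildstep_r(
    "gs://your-bucket/your_script.R",
    name = "rocker/verse"
  )
)

build <- cr_build(large_job)
built <- cr_build_wait(build)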

Authentication within the R script on Cloud Build

The default metadata of the Cloud Build server is the same as googleComputeEngineR, so you can use googleAuthR::gar_gce_auth("default", scopes = "https://www.googleapis.com/auth/cloud-platform"). You can also use the Cloud Build service email (the one that looks like {project-number@cloudbuild.gserviceaccount.com}) if it is given the right access in Cloud IAM; in that case authentication in your R script will look like googleAuthR::gar_gce_auth("{project-number@cloudbuild.gserviceaccount.com}", scopes = "https://www.googleapis.com/auth/cloud-platform")

This enables the R script to, for example, download authentication files from Cloud Storage. A minimal example is shown below:
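A sketch of such a script, where the bucket and file names are placeholders:

# the_script.R - runs on Cloud Build
library(googleAuthR)
library(googleCloudStorageR)

# authenticate via the Cloud Build metadata service
gar_gce_auth("default", scopes = "https://www.googleapis.com/auth/cloud-platform")

# e.g. download a private authentication file or data from your own bucket
gcs_get_object("my-auth-file.json",
               bucket = "your-private-bucket",
               saveToDisk = "my-auth-file.json")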

This would then be saved to an R script and deployed via the gadget or:

cr_deploy_r("the_script.R", r_image = "gcr.io/gcer-public/googleauthr-verse")

…where the docker R image is one with googleCloudStorageR installed.

Create Docker image of a package each commit

If you just want a one-off Docker image, use cr_deploy_docker() or make your own build via cr_buildstep_docker().
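For example, a sketch of a one-off build, where the folder and image name are placeholders (the folder should contain a Dockerfile):

cr_deploy_docker("folder/with-dockerfile/",
                 image_name = "gcr.io/your-project/your-image")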

If you want the Docker image to rebuild on each git commit, this is what is used to deploy this package to gcr.io/gcer-public/googlecloudrunner; in this case the Dockerfile in the root of the repo is built. Simply place the Dockerfile in the root of your repo, then create a Build Trigger with similar settings:

Migrate an existing scheduled Docker container

You may have an existing Docker container holding code that doesn't need a public URL.

For example, a self-contained R script with googleAnalyticsR and bigQueryR pre-installed, that downloads data from Google Analytics and uploads it to BigQuery. This may be running on a VM from googleComputeEngineR, Kubernetes or Airflow KubernetesPodOperator.

If you give the cloud build service email the right permissions, then this can all be done on Cloud Build + Cloud Scheduler.

For example, say your existing container is called gcr.io/my-project/my-r-script. A Cloud Build could be as simple as this:

r_code <- cr_build_yaml(
  cr_buildstep("gcr.io/my-project/my-r-script")
)
build <- cr_build(r_code)
built <- cr_build_wait(build)

You could add other steps if you wanted, such as sending an email when done via cr_buildstep_mailgun() or cr_buildstep_slack():

r_code <- cr_build_yaml(
  steps = c(
    cr_buildstep("gcr.io/my-project/my-r-script"),
    cr_buildstep_slack("The script ran ok")
  )
)
build <- cr_build(r_code, `_SLACK_WEBHOOK` = "https://your_slack_webhook")
built <- cr_build_wait(build)

To set this up in a schedule, add it to the scheduler like so:

schedule_me <- cr_build_schedule_http(built)
cr_schedule("r-code-example", "15 8 * * *", httpTarget = schedule_me)

Trigger an R function from pub/sub

This uses cr_deploy_plumber to deploy an R API that can then be triggered from pub/sub events, such as Cloud logging events or when files get updated in a Cloud Storage bucket. This in turn can trigger other Cloud Builds.

A plumber API that accepts pub/sub messages is facilitated by cr_plumber_pubsub()

api.R:
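A minimal sketch of such a file, where the pub() function simply echoes the message back - swap in your own R code:

# example function that processes the Pub/Sub message data
pub <- function(x){
  paste("Echo:", x)
}

#* Receive a Pub/Sub push message
#* @post /pubsub
#* @param message The Pub/Sub message
function(message = NULL){
  googleCloudRunner::cr_plumber_pubsub(message, pub)
}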

Deploy the R API using an image that has plumber and googleCloudRunner installed. There is one available at gcr.io/gcer-public/googlecloudrunner

Dockerfile:

FROM gcr.io/gcer-public/googlecloudrunner:master

COPY [".", "./"]

ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=as.numeric(Sys.getenv('PORT')))"]
CMD ["api.R"]

Put both the Dockerfile and the api.R script in the same folder then deploy it to Cloud Run:

cr_deploy_plumber("the_folder")

Or use the RStudio gadget:

Once built, you will have a live endpoint that Pub/Sub can push to, triggering the function you defined. You can test Pub/Sub messages via cr_pubsub()

In this case the function only echoes, but you can modify it to call other libraries and functions, or operate on the data sent in the Pub/Sub message, such as bucket object names, BigQuery tableIds, etc.

If you would like to trigger a pub/sub message when a file arrives on Google Cloud Storage, use googleCloudStorageR::gcs_create_pubsub()

Run R code when a pub/sub message is created from a new file in Google Cloud Storage

This is a demo to show how to make use of Pub/Sub messages. Pub/Sub messages are used throughout the Google Cloud Platform; for this example we will use one that fires when a new object is created in a Google Cloud Storage bucket.

A video walkthrough of the below is available here

The plumber API with pub/sub

The plumber API needs an endpoint to receive Pub/Sub messages, and an R function to process and act on that message. In this case we shall email the file name to an end user. This could be expanded to do some R analysis, or whatever you need.
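A sketch of what such an api.R could look like, assuming Mailgun's HTTP API and two environment variables (MAILGUN_URL and MAILGUN_KEY - the names are illustrative) holding the Mailgun details:

library(googleCloudRunner)

# email the name of the new file via the Mailgun HTTP API
# MAILGUN_URL (e.g. https://api.mailgun.net/v3/your-domain) and MAILGUN_KEY
# are set as environment variables in the Cloud Run interface after deployment
send_email <- function(x){
  # x is the Pub/Sub message data; for Cloud Storage notifications this is
  # JSON holding the object metadata, including its name
  object_name <- jsonlite::fromJSON(x)$name

  httr::POST(
    paste0(Sys.getenv("MAILGUN_URL"), "/messages"),
    httr::authenticate("api", Sys.getenv("MAILGUN_KEY")),
    encode = "form",
    body = list(
      from = "pubsub@your-domain.com",
      to = "you@example.com",
      subject = "New file in Cloud Storage",
      text = paste("New file uploaded:", object_name)
    )
  )

  TRUE
}

#* Receive a Pub/Sub push message with the Cloud Storage object metadata
#* @post /pubsub
#* @param message The Pub/Sub message
function(message = NULL){
  cr_plumber_pubsub(message, send_email)
}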

The above script relies on two environment variables with the Mailgun keys, which can be set in the Cloud Run interface once a deployment is live. The environment variables persist between revisions so you only need to set this up once.

For this example, the Dockerfile is a standard one with plumber and googleCloudRunner installed - you may need to add your own dependencies:

FROM gcr.io/gcer-public/googlecloudrunner:master
COPY ["./", "./"]
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=as.numeric(Sys.getenv('PORT')))"]
CMD ["api.R"]

Deployment

This can be deployed via the RStudio Addin, or via the code below:
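For example (the folder name is a placeholder for wherever api.R and the Dockerfile are saved):

library(googleCloudRunner)

# deploy the folder holding api.R and the Dockerfile above to Cloud Run
cr_deploy_plumber("the_folder")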

After the first deployment, go to the web UI and Deploy New Revision - keep everything the same but update the environment variables. These will persist for future revisions:

You can test the Pub/Sub API is working from R with the below code:

cr_pubsub("https://your-cloudendpoint.run.app/pubsub",
          jsonlite::toJSON(list(name = "test_file_from_r")))

Set up Pub/Sub

To activate Pub/Sub messages from a Google Cloud Storage bucket, use a service account with Cloud Storage Admin rights (note not Cloud Storage Object Admin).

In the Web UI of Pub/Sub set up a Topic:

You can only set up Cloud Storage pub/sub messages to this topic using the API, which can be done using googleCloudStorageR::gcs_create_pubsub():

library(googleCloudStorageR)
gcs_create_pubsub("your-topic", "your-project", "your-bucket")

Each topic can have many subscriptions. One of these will push to the Cloud Run URL set up in the previous step:

Add files to Google Cloud Storage

Now when a new file arrives into the bucket, it will:

  1. Trigger a Pub/Sub message to the Topic
  2. Pass it on to the Pub/Sub Subscription
  3. Push to Cloud Run
  4. Execute the plumber R code to send an email
  5. Arrive in your inbox

You can test it by uploading files either in the web UI or via

googleCloudStorageR::gcs_upload(mtcars, bucket = "your-bucket", 
                                name = paste(Sys.time(), "test"))

You will need to set up Mailgun properly, by verifying your domain etc., to stop the emails appearing in your spam folder.

Build an Rmd on a schedule and host its HTML on Cloud Storage

Cloud Storage can host public HTML files like a website, which can be accessed via public or private viewers. This can be useful in setting up reporting infrastructure.

A video of the below workflow is here:

The Cloud Build below makes use of Cloud Build artifacts to send the built Rmd files to a public Cloud Storage bucket. The Rmd is built on a schedule to pull in new data from a Google Sheet:
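A sketch of such a build, assuming the Rmd sits in a Cloud Storage bucket, the rendering image is gcr.io/gcer-public/render_rmd, and the built HTML is sent to a public bucket via cr_build_yaml_artifact() - all names are placeholders:

library(googleCloudRunner)

rmd_build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      "gsutil",
      args = c("cp",
               "gs://your-bucket/scheduled_rmarkdown.Rmd",
               "scheduled_rmarkdown.Rmd"),
      id = "download Rmd template"
    ),
    cr_buildstep_r(
      "rmarkdown::render('scheduled_rmarkdown.Rmd')",
      name = "gcr.io/gcer-public/render_rmd",
      id = "render rmd"
    )
  ),
  artifacts = cr_build_yaml_artifact("scheduled_rmarkdown.html",
                                     bucket = "your-public-bucket")
)

# test the build before scheduling it
build <- cr_build(rmd_build)
built <- cr_build_wait(build)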

Once the build is tested, we now schedule the Rmd to build each morning, adapting to changes in the source data file.

# schedule the build
cr_schedule("rmd-demo", schedule = "15 5 * * *",
            httpTarget = cr_build_schedule_http(built))

The example Rmd file is built and available on a public Cloud Storage bucket - the above example is available at this link

Build and deploy Rmd files to Cloud Run on a schedule

Cloud Run scales from 0 to millions of hits, so it can be an option for hosting a website. In particular you may want a private internal website, want R code executing via a plumber API in the same container, or have traffic or data volumes that go beyond other free hosting options such as GitHub Pages. An nginx server configuration is included to host any HTML you provide via cr_deploy_html().

You may prefer using Cloud Storage public URLs if you don’t need any of Cloud Run’s features, like the previous example.

Coupled to that, a common use case is to render Rmd files and host them as a website. In that scenario, the Rmd has a setup block that reads from Google Sheets via code like the below:
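A sketch of such a setup chunk, assuming a public Google Sheet read via googlesheets4 (the sheet URL is a placeholder):

# setup chunk of the Rmd
library(flexdashboard)
library(googlesheets4)

# public sheet, so no authentication needed
gs4_deauth()

# read the latest data each time the Rmd is rendered
sheet_data <- read_sheet("https://docs.google.com/spreadsheets/d/your-sheet-id")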

The Docker image used to render the Rmd needs all the libraries listed (flexdashboard, googlesheets4, etc.) installed - this could be built beforehand in another Cloud Build. In this case the libraries are all in gcr.io/gcer-public/render_rmd.

For this example, the build is not reading from a git repo; instead the Rmd file is downloaded from a Cloud Storage bucket, which you may have uploaded to manually, via googleCloudStorageR, or perhaps copied over from a repo in another Cloud Build on a Build Trigger.

The scheduled build then can be enabled via:

  1. Uploading your Rmarkdown files to a Cloud Storage bucket
  2. Create a build that will:
  • download the Rmd file
  • render the Rmd creating the HTML files
  • configure nginx for Cloud Run,
  • build a Docker image of nginx with your HTML
  • serve it on Cloud Run.
  1. Run a test build to check it works.

You should see a Cloud Run URL in the logs, like this one: https://my-html-on-cloudrun-ewjogewawq-ew.a.run.app/scheduled_rmarkdown.html

  1. Schedule the build using cron syntax
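A sketch of step 2's build, using the same buildsteps as the polyglot example later in this article - the bucket, project, and service names are placeholders:

library(googleCloudRunner)

rmd_to_run <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      "gsutil",
      args = c("cp",
               "gs://your-bucket/scheduled_rmarkdown.Rmd",
               "scheduled_rmarkdown.Rmd"),
      id = "download Rmd"
    ),
    cr_buildstep_r(
      "rmarkdown::render('scheduled_rmarkdown.Rmd')",
      name = "gcr.io/gcer-public/render_rmd",
      id = "render rmd"
    ),
    cr_buildstep_bash(
      bash_script = system.file("docker", "nginx", "setup.bash",
                                package = "googleCloudRunner"),
      id = "setup nginx"
    ),
    cr_buildstep_docker(
      image = "gcr.io/your-project/my-html-on-cloudrun",
      tag = "latest"
    ),
    cr_buildstep_run(
      name = "my-html-on-cloudrun",
      image = "gcr.io/your-project/my-html-on-cloudrun",
      concurrency = 80
    )
  )
)

# step 3: run a test build to check it works
built <- cr_build_wait(cr_build(rmd_to_run))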

Do it all in R using cr_deploy_r()

An alternative if you only wanted to do a scheduled deployment would be to put all steps in an R script (downloading, building and uploading to Cloud Storage via googleCloudStorageR) and use the RStudio gadget or cr_deploy_r()
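A sketch, assuming cr_deploy_r()'s schedule argument and an R image with the needed packages installed (the names are placeholders):

cr_deploy_r(
  "render_and_upload.R",
  r_image = "gcr.io/gcer-public/render_rmd",
  schedule = "15 5 * * *"
)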

Polyglot Cloud Builds - integrating R code with other languages

Since Docker containers can hold any language within them, they offer a universal interface to combine languages. This offers opportunities to extend other languages with R features, and to give other languages access to R code without needing to know R.

The example below uses gsutil and Cloud KMS for fetching and decrypting an authentication file, the Go library gago for downloading unsampled data from Google Analytics, R for creating a statistical report of the data, nginx on Cloud Run for serving that report, and the bq command line tool for uploading the raw data to BigQuery for further analysis.

library(googleCloudRunner)
polygot <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download encrypted auth file",
      name = "gsutil",
      args = c("cp",
               "gs://marks-bucket-of-stuff/auth.json.enc",
               "/workspace/auth.json.enc"),
    ),
    cr_buildstep_decrypt(
      id = "decrypt file",
      cipher = "/workspace/auth.json.enc",
      plain = "/workspace/auth.json",
      keyring = "my-keyring",
      key = "ga_auth"
    ),
    cr_buildstep(
      id = "download google analytics",
      name = "gcr.io/gcer-public/gago:master",
      env = c("GAGO_AUTH=/workspace/auth.json"),
      args = c("reports",
               "--view=81416156",
               "--dims=ga:date,ga:medium",
               "--mets=ga:sessions",
               "--start=2014-01-01",
               "--end=2019-11-30",
               "--antisample",
               "--max=-1",
               "-o=google_analytics.csv"),
      dir = "build"
    ),
    cr_buildstep(
      id = "download Rmd template",
      name = "gsutil",
      args = c("cp",
               "gs://mark-edmondson-public-read/polygot.Rmd",
               "/workspace/build/polygot.Rmd")
    ),
    cr_buildstep_r(
      id="render rmd",
      r = "lapply(list.files('.', pattern = '.Rmd', full.names = TRUE),
             rmarkdown::render, output_format = 'html_document')",
      name = "gcr.io/gcer-public/packagetools:master",
      dir = "build"
      ),
    cr_buildstep_bash(
      id = "setup nginx",
      bash_script = system.file("docker", "nginx", "setup.bash",
                                package = "googleCloudRunner"),
      dir = "build"
      ),
    cr_buildstep_docker(
      # change to your own container registry
      image = "gcr.io/gcer-public/polygot_demo",
      tag = "latest",
      dir = "build"
      ),
    cr_buildstep_run(
      name = "polygot-demo",
      image = "gcr.io/gcer-public/polygot_demo",
      concurrency = 80),
    cr_buildstep(
      id = "load BigQuery",
      name = "gcr.io/cloud-builders/gcloud",
      entrypoint = "bq",
      args = c("--location=EU",
               "load",
               "--autodetect",
               "--source_format=CSV",
               "test_EU.polygot",
               "google_analytics.csv"
               ),
      dir = "build"
    )
  )
)

# build the polygot cloud build steps
build <- cr_build(polygot)
built <- cr_build_wait(build)

An example of the demo output is on this Cloud Run instance URL: https://polygot-demo-ewjogewawq-ew.a.run.app/polygot.html

It also uploads the data to a BigQuery table:

This constructed cloud build can also be used outside of R, by writing out the Cloud Build file via cr_build_write()
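For example, a minimal call assuming the default file name:

cr_build_write(polygot, file = "cloudbuild.yaml")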

This can then be scheduled as described in the Cloud Scheduler section on scheduled Cloud Builds.

schedule_me <- cr_build_schedule_http(built)
cr_schedule("polygot-example", "15 8 * * *", httpTarget = schedule_me)

An example of the cloudbuild.yaml is on GitHub here.