Evolving R with Docker and the Cloud

Mark Edmondson (@HoloMarkeD)

Feb 28th, 2020 - CelebRation 2020 Copenhagen

code.markedmondson.me

Credentials

My R Timeline

GA Effect

My CRAN packages

  • searchConsoleR
  • googleAnalyticsR
  • googleAuthR -> gargle
  • googleComputeEngineR (cloudyr)
  • googleCloudStorageR (cloudyr)
  • bigQueryR (cloudyr)
  • googleLanguageR (rOpenSci)
  • googleCloudRunner (New!)

googleAuthRverse

Slack: #googleAuthRverse

Agenda today

  • Abstracting R applications into the Cloud using Docker (10 min)
  • Demo on what that abstraction offers (10 min)
  • My evolving mindset for using R/Docker/Cloud (10 min)
  • Any questions? (10 min)

R in the Cloud using Docker

What is..?

Docker - a container system for building and sharing applications

Cloud - computing delivered via the internet, not locally

What does serverless offer?

Serverless - cloud services that often use containers to host applications without configuring servers

  • Focus on code, not dev-ops
  • Scale from 0 to billions
  • Reliability and security
  • Abstraction

Climbing up the R pyramid

Climbing up the Cloud pyramid

The keystone

  • R - abstraction of R environments
  • Cloud - run any code on cloud systems

Docker + R = R in Production

  • Flexible: no need to ask IT to install R everywhere, just use docker run; works across cloud platforms; an ascendant technology

  • Version controlled: no worries that new package releases will break code

  • Scalable: run multiple Docker containers at once; fits into an event-driven, stateless, serverless future

Docker levels the playing field between languages in the cloud

Useful R Docker images

  • rocker/r-ver
  • rocker/rstudio
  • rocker/tidyverse
  • rocker/shiny
  • rocker/ml-gpu

Thanks to Rocker Team

Dockerfiles

Demo

Schedule an R script in the Cloud

Create an R API that scales from 0 to 1 billion

googleCloudRunner - Use Cases

Making R use cases in the Cloud as easy as possible

  • Scheduled R scripts (API calls, data updates)
  • Long-running R scripts (Batched R scripts)
  • Scale to 0 R APIs (R events, R-a-a-Service)
  • Continuous development (build R website/packages upon Git commit)
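
As an illustration of the "Scale to 0 R APIs" use case, here is a minimal sketch of an R API that a container platform could host. It assumes the plumber package, which these slides do not name; the hello() handler, the /hello route and the RUN_API switch are all hypothetical. Cloud Run tells the container which port to listen on through the PORT environment variable.

```r
library(plumber)  # assumption: plumber >= 1.0 is installed

# Hypothetical handler: returns a greeting as a list (serialised to JSON by plumber)
hello <- function(name = "world") {
  list(message = paste0("Hello, ", name, "!"))
}

# Build a router programmatically and attach the handler to GET /hello
api <- pr_get(pr(), "/hello", hello)

# Cloud Run injects PORT; fall back to 8080 for local runs
port <- as.integer(Sys.getenv("PORT", unset = "8080"))

# RUN_API is a hypothetical switch so that sourcing this file does not block
if (identical(Sys.getenv("RUN_API"), "true")) {
  pr_run(api, host = "0.0.0.0", port = port)
}
```

Packaged into a Docker image, a script like this is the kind of workload a service such as Cloud Run can start on demand and scale back to zero when idle.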

My evolving mindset for using Docker/Cloud

A use-anywhere R/RStudio

  • Configure RStudio Server just like home…?
  • Doesn’t use the cloud to full potential

Tailored R environment

  • Workshops with material pre-loaded
  • ML training machine
  • Package sets (tidyverse, googleAuthRverse etc.)
  • Shiny

Scaling up R

  • library(future) for cluster parallel work
  • Run many of the same R environments at a time
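
A minimal sketch of the library(future) idea, assuming the future package is installed. A local multisession plan with two workers stands in for a real cluster here, and slow_square() is a hypothetical task. Pointing plan() at a cluster backend instead should leave the rest of the code unchanged.

```r
library(future)

# Assumption: two local background R sessions stand in for cluster nodes.
# On real machines this could be plan(cluster, workers = <vector of hostnames>).
plan(multisession, workers = 2)

# Hypothetical task that takes a little while to finish
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Launch one future per input; each is evaluated on an available worker
futures <- lapply(1:4, function(i) future(slow_square(i)))

# Collect the results, blocking until each future has resolved
results <- vapply(futures, value, numeric(1))

# Return to ordinary sequential evaluation and release the workers
plan(sequential)
```

The same pattern applies when each worker is a copy of one Docker image, which is what makes running many identical R environments at once practical.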

Code. Data. Config.

  • Good data science principles -> Good Cloud principles
  • The right tool for the data - BigQuery, Cloud Storage etc.
  • Switch configuration via environment variables, yaml etc.
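
Switching configuration through the environment can be sketched in base R as follows. The variable names APP_BUCKET and APP_DATASET and their defaults are hypothetical; the point is only that the code stays the same while the environment decides which data it talks to.

```r
# Read settings from environment variables, with development defaults.
# APP_BUCKET and APP_DATASET are hypothetical names used for illustration.
get_config <- function() {
  list(
    bucket  = Sys.getenv("APP_BUCKET",  unset = "dev-bucket"),
    dataset = Sys.getenv("APP_DATASET", unset = "dev_dataset")
  )
}

# Locally nothing is set, so the defaults apply
Sys.unsetenv(c("APP_BUCKET", "APP_DATASET"))
dev_cfg <- get_config()

# In the cloud, the platform (or docker run -e) would set the variables
Sys.setenv(APP_BUCKET = "prod-bucket")
prod_cfg <- get_config()
```

Because the script never hard-codes a bucket or dataset, the same Docker image can move between platforms with only its environment changing.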

Swap to new platforms

Kubernetes

Docker image originally for GCE, deployed to Kubernetes:

https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/

Shiny apps on k8s

Build R on a schedule

Cloud Build (like the demo earlier)

Trigger R on events

Cloud Run (like the demo earlier)

My current R / Cloud setup

Same R/Docker container, many options:

  • Docker on VM - GPU support, ML dev work
  • Kubernetes - Shiny apps
  • Cloud Build - Batched and scheduled jobs
  • Cloud Run - R APIs, event driven workflows

Ability to share R code with non-R users

Summary

Take-aways

  • Abstracting R using Docker opens up R horizons
  • The Cloud can make hard R tasks easy
  • Separating code, data and config lets you pick the best tool for each job

Gratitude

  • Thank you for listening
  • Thanks to Anne Petersen for inviting me
  • Thanks to R Core team for R v1.0.0 and beyond
  • Thanks to RStudio for all their cool things
  • Thanks again to Rocker
  • Thanks to Google for the Developer Expert programme

Say hello