Use Cases for R on the
Google Cloud Platform

Mark Edmondson (@HoloMarkeD)

October 14th, 2020

Credentials

My R Timeline

  • Digital agencies since 2007
  • useR since 2012 - Motive: how to use all this web data?
  • Shiny enthusiast e.g. https://app.iihnordic.dk/ga-effect/
  • Google Developer Expert - Google Analytics & Google Cloud
  • Several Google API themed packages on CRAN via googleAuthR
  • Part of cloudyr group (AWS/Azure/GCP R packages for the cloud) https://cloudyr.github.io/
  • Now: Data Engineer @ IIH Nordic

GA Effect

https://app.iihnordic.dk/ga-effect/


googleAuthRverse

  • searchConsoleR
  • googleAuthR
  • googleAnalyticsR
  • googleComputeEngineR (cloudyr)
  • bigQueryR (cloudyr)
  • googleCloudStorageR (cloudyr)
  • googleLanguageR (rOpenSci)
  • googleCloudRunner (NEW!)

Slack group for discussing the packages: #googleAuthRverse

Why R for digital marketing

Data Science programming

  • R has specialised tools for every stage of a data project - a sketch of the stages follows this list
  • Gathering data - standard data.frame objects
  • Cleaning data - tidyverse
  • Modelling data - many statistical packages
  • Presentation - R Markdown, Shiny, ggplot2, JavaScript viz
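
A minimal sketch of those four stages in one script, using the built-in mtcars data as a stand-in:

library(dplyr)
library(ggplot2)

# gather: a standard data.frame
raw <- mtcars

# clean: tidyverse verbs
clean <- raw %>%
  filter(!is.na(mpg)) %>%
  mutate(heavy = wt > 3)

# model: a statistical model from base R
fit <- lm(mpg ~ wt + hp, data = clean)
summary(fit)

# present: a ggplot2 visualisation
ggplot(clean, aes(wt, mpg, colour = heavy)) + geom_point()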

Where R sits

  • It's a data science language that changes the way you think about data
  • I love Python too, the 2nd best programming language for everything
  • SQL, Go and JavaScript round out 99% of data needs

Why R in the (Google) Cloud?

  • No need to migrate code from R to scale it in production
  • Use R’s UX to integrate with Cloud services
  • Level up R’s abilities
  • Share R micro-services with non-R users

Google Cloud Platform - Serverless Pyramid

Scale (almost) always starts with Docker containers

Dockerfiles from The Rocker Project

https://www.rocker-project.org/

They maintain useful R images (launching one from R is sketched after this list):

  • rocker/r-ver
  • rocker/rstudio
  • rocker/tidyverse
  • rocker/shiny
  • rocker/ml-gpu
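
One way to use them from R: googleComputeEngineR can launch a GCP VM from a Rocker-based template. A sketch following the package quickstart (assumes auth, project and zone are already configured):

library(googleComputeEngineR)

# launch an RStudio Server VM built on the rocker/rstudio image
vm <- gce_vm(template = "rstudio",
             name = "rstudio-server",
             username = "mark", password = "mark1234",
             predefined_type = "n1-standard-1")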

Thanks to Rocker Team


Dockerfiles

FROM rocker/tidyverse
LABEL maintainer="Mark Edmondson (r@sunholo.com)"

# install system dependencies needed by the R packages
RUN apt-get update && apt-get install -y \
    libssl-dev

## install packages from CRAN, then GitHub
RUN install2.r --error \
    -r 'https://cran.rstudio.com' \
    googleAuthR \
    googleComputeEngineR \
    googleAnalyticsR \
    searchConsoleR \
    googleCloudStorageR \
    bigQueryR \
    && installGithub.r MarkEdmondson1234/youtubeAnalyticsR

Docker + R = R in Production

  • Flexible - no need to ask IT to install R anywhere; docker run works across cloud platforms, and containers are the ascendant tech

  • Version controlled - no worries that new package releases will break your code

  • Scalable - run multiple Docker containers at once; fits the event-driven, stateless, serverless future

Scaling R scripts, Shiny apps and APIs

Strategies to scale R

  • Vertical scaling - increase the size and power of one machine (VMs)
  • Horizontal scaling - split your problem across lots of smaller machines (VM clusters) - see the sketch after this list
  • Serverless scaling - send your code + data to the cloud and let it work out how many machines to use
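
A hedged sketch of the horizontal route, using googleComputeEngineR's cluster helper with the future package (assumes configured auth; usage follows the package's parallel-processing vignette):

library(googleComputeEngineR)
library(future)
library(future.apply)

# start 3 VMs running an R container
vms <- gce_vm_cluster("r-cluster", cluster_size = 3)

# register the VMs as workers and farm out the work
plan(cluster, workers = as.cluster(vms))
res <- future_lapply(1:100, function(i) i^2)

# stop the VMs when finished
lapply(vms, gce_vm_stop)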

Google Cloud Platform - Serverless Pyramid

googleCloudRunner - serverless scaling

Cloud Run

  • Built on top of Kubernetes via Knative
  • Managed Container-as-a-Service for HTTP requests
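
googleCloudRunner wraps the deployment steps; a sketch using its cr_deploy_plumber(), assuming a folder containing an api.R plus Dockerfile, and a configured project and bucket:

library(googleCloudRunner)

# build the container and deploy the plumber API to Cloud Run
cr <- cr_deploy_plumber("my_api_folder/")

# the returned service object should carry the https endpoint (assumption)
cr$status$url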

Cloud Run Pros/Cons

Good for R APIs

Pros

  • Auto-scaling
  • Scale from 0
  • Simple to deploy
  • https / authentication embedded

Cons

  • Needs stateless, idempotent workflows
  • Limited support for Shiny

plumber APIs

https://www.rplumber.io/

Make an API out of your script:

#' @get /hello
#' @html
function(){
  "<html><h1>hello world</h1></html>"
}
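
To test the API locally before containerising it (plumber's standard workflow):

library(plumber)

# serve api.R on port 8000, then visit http://localhost:8000/hello
plumb("api.R")$run(host = "0.0.0.0", port = 8000)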

Adapt the plumber API to your R needs

#' Echo the parameter that was sent in
#' @param msg The message to echo back.
#' @get /echo
function(msg=""){
  list(msg = paste0("The message is: '", msg, "'"))
}

#' Plot out data from the iris dataset
#' @param spec If provided, filter the data to only this species (e.g. 'setosa')
#' @get /plot
#' @png
function(spec){
  myData <- iris
  title <- "All Species"

  # Filter if the species was specified
  if (!missing(spec)){
    title <- paste0("Only the '", spec, "' Species")
    myData <- subset(iris, Species == spec)
  }

  plot(myData$Sepal.Length, myData$Petal.Length,
       main=title, xlab="Sepal Length", ylab="Petal Length")
}
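
With the API served locally on port 8000 as above, these endpoints can be exercised directly:

http://localhost:8000/echo?msg=hello
http://localhost:8000/plot?spec=setosa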

Cloud Run Docker file

Based on:

FROM trestletech/plumber

COPY [".", "./"]

ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=as.numeric(Sys.getenv('PORT')))"]
CMD ["api.R"]

Demo Cloud Run R application

  • R API

It can scale to billions of requests and be called from other languages.

Cloud Run - R Use Cases

  • Data modelling as a service via API call
  • Parallel processing via multiple API calls - see the sketch after this list
  • Dynamic plots hosted in iframes for data-viz products like Data Studio/Tableau
  • JavaScript/HTML rendering of data
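
For the parallel-processing use case, a hedged sketch: fan concurrent requests out to a deployed endpoint (the URL is hypothetical) and let Cloud Run autoscale containers to serve them:

library(httr)

# hypothetical URL of a deployed Cloud Run plumber API
api <- "https://my-r-api-abcdef-ew.a.run.app/echo"

# fire off many concurrent calls (mclapply forks; runs sequentially on Windows)
res <- parallel::mclapply(1:20, function(i) {
  content(GET(api, query = list(msg = i)))
}, mc.cores = 4)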

Cloud Build

  • Cloud Build runs docker commands in sequence
  • Triggered via API call or git commit
  • Useful for batched services
  • As any code can run in a container, you can combine R with other languages
  • Cloud Build runs cloudbuild.yaml scripts that call Docker containers

Example cloudbuild.yaml

steps:
- name: 'gcr.io/cloud-builders/docker'
  id: Docker Version
  args: ["version"]
- name: 'alpine'
  id:  Hello Cloud Build
  args: ["echo", "Hello Cloud Build"]
- name: 'rocker/r-base'
  id: Hello R
  args: ["Rscript", "-e", "paste0('1 + 1 = ', 1+1)"]

Polyglot cloudbuild.yaml

steps:
- name: gcr.io/gcer-public/gago:master
  args:
  - reports
  - --view=81416156
  - --dims=ga:date,ga:medium
  - --mets=ga:sessions
  - --start=2014-01-01
  - --end=2019-11-30
  - --antisample
  - --max=-1
  - -o=google_analytics.csv
  id: download google analytics
  dir: build
  env:
  - GAGO_AUTH=/workspace/auth.json
- name: gcr.io/cloud-builders/gsutil
  args:
  - cp
  - gs://mark-edmondson-public-read/polygot.Rmd
  - /workspace/build/polygot.Rmd
  id: download Rmd template
- name: gcr.io/gcer-public/packagetools:master
  args:
  - Rscript
  - -e
  - |-
    lapply(list.files('.', pattern = '.Rmd', full.names = TRUE),
                 rmarkdown::render, output_format = 'html_document')
  id: render rmd
  dir: build

Continuous Development with Cloud Build

Set up a build trigger for the GitHub repo you commit the Dockerfile to:
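
The same can be scripted with googleCloudRunner's trigger helpers (a sketch; the repo name is a placeholder and the exact arguments may differ by package version):

library(googleCloudRunner)

# run the repo's cloudbuild.yaml on each commit to GitHub
repo <- cr_buildtrigger_repo("your-github-user/your-repo")
cr_buildtrigger("cloudbuild.yaml",
                name = "build-on-commit",
                trigger = repo)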

Cloud Build Use Cases

  • Scheduled R batch scripts - a sketch follows this list
  • Continuous development/integration
  • Docker image builds
  • Long-running processes (up to 24 hrs)
  • Language-neutral yaml format to share dataflows
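
For scheduled R batch scripts, googleCloudRunner's README shows a one-call deploy; a sketch (the script name is a placeholder):

library(googleCloudRunner)

# build and schedule an R script to run daily at 06:15 via Cloud Scheduler
cr_deploy_r("batch_script.R", schedule = "15 6 * * *")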

Cloud Build app

  • Pre-authenticated APIs for the IIH Team
  • Shiny App running on Google Kubernetes Engine
  • Share cloudbuild.yaml files with pre-made jobs like GA import into BigQuery

R and GCP Community

GoogleNext19 - Data Science at Scale with R on GCP

A 40-minute talk at Google Next19 with lots of new things to try!

https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be


New concepts

A great video that goes deeper into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes - things I haven't tried yet


bigrquery integration with dplyr

Use dplyr R code across datasets including BigQuery (from https://rpubs.com/shivanandiyer/BigRQuery)

library(bigrquery) # R Interface to Google BigQuery API  
library(dplyr) # Grammar for data manipulation  
library(DBI) # Interface definition to connect to databases 

bq_conn <-  dbConnect(bigquery(), 
                      project = "project-id",
                      dataset = "dataset-id", 
                      use_legacy_sql = FALSE)
                      
bq_table <- dplyr::tbl(bq_conn, "my-table") 

Use standard dplyr code that translates to BigQuery SQL behind the scenes

top_10 <-
  bq_table %>% 
    group_by(my_column) %>% 
    summarise_all(sum) %>% 
    arrange(desc(offence)) %>% 
    top_n(10)
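
dbplyr builds the query lazily - nothing runs in BigQuery until the results are requested:

# inspect the SQL that dplyr/dbplyr generate
show_query(top_10)

# execute the query in BigQuery and pull the results into a local tibble
top_10_df <- collect(top_10)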

Conclusions

Take-aways

Gratitude

  • Thank you for listening
  • Thanks to Moe for inviting me
  • Thanks to RStudio for all their cool things. Support them by buying their stuff.
  • Thanks again to Rocker
  • Thanks to Google for the Developer Expert programme and for building cool stuff.

Say hello afterwards
