googleAuthR
searchConsoleR
googleAnalyticsR
googleComputeEngineR
(cloudyr) bigQueryR
(cloudyr) googleCloudStorageR
(cloudyr) googleLanguageR
(rOpenSci) googleCloudRunner
(NEW!) Slack group to talk about the packages: #googleAuthRverse
data.frame objects
The Rocker Project (https://www.rocker-project.org/) maintains useful R images:
rocker/r-ver
rocker/rstudio
rocker/tidyverse
rocker/shiny
rocker/ml-gpu
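As a quick sketch, one way to try these images locally (assuming Docker is installed; the password value is illustrative):

```shell
# pull and run the RStudio image locally;
# RStudio Server then listens on http://localhost:8787 (user: rstudio)
docker pull rocker/rstudio
docker run -d -p 8787:8787 -e PASSWORD=mypassword rocker/rstudio
```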
FROM rocker/tidyverse
LABEL maintainer="Mark Edmondson <r@sunholo.com>"

# install system dependencies
RUN apt-get update && apt-get install -y \
    libssl-dev

## Install packages from CRAN
RUN install2.r --error \
    -r 'http://cran.rstudio.com' \
    googleAuthR \
    googleComputeEngineR \
    googleAnalyticsR \
    searchConsoleR \
    googleCloudStorageR \
    bigQueryR \
    ## install GitHub packages
    && installGithub.r MarkEdmondson1234/youtubeAnalyticsR
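A hedged sketch of building and pushing this image so it can be used in the cloud (the image name and project ID are illustrative):

```shell
# build the Dockerfile in the current folder, then push
# to Google Container Registry for use by cloud services
docker build -t gcr.io/my-project/my-r-image .
docker push gcr.io/my-project/my-r-image
```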
Flexible: no need to ask IT to install R everywhere; just use docker run; works across cloud platforms; an ascendant technology
Version controlled: no worries that new package releases will break your code
Scalable: run multiple Docker containers at once; fits into an event-driven, stateless, serverless future
Good for R APIs
Pros
Auto-scaling
Scales from zero
Simple to deploy
HTTPS and authentication built in
Cons
Needs stateless, idempotent workflows
Limited support for Shiny
Make an API out of your script:
#' Echo the parameter that was sent in
#' @param msg The message to echo back.
#' @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#' Plot out data from the iris dataset
#' @param spec If provided, filter the data to only this species (e.g. 'setosa')
#' @get /plot
#' @png
function(spec) {
  myData <- iris
  title <- "All Species"

  # Filter if the species was specified
  if (!missing(spec)) {
    title <- paste0("Only the '", spec, "' Species")
    myData <- subset(iris, Species == spec)
  }

  plot(myData$Sepal.Length, myData$Petal.Length,
       main = title, xlab = "Sepal Length", ylab = "Petal Length")
}
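A sketch of running the API locally before containerising it, assuming the functions above are saved as api.R:

```r
library(plumber)

# parse the annotations in api.R and start a local server
pr <- plumb("api.R")
pr$run(host = "0.0.0.0", port = 8000)

# then in another terminal, e.g.:
# curl "http://localhost:8000/echo?msg=hello"
```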
Based on:
FROM trestletech/plumber
COPY [".", "./"]
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=as.numeric(Sys.getenv('PORT')))"]
CMD ["api.R"]
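A hedged sketch of getting this container onto Cloud Run (project, service, and region names are illustrative):

```shell
# build the image with Cloud Build, then deploy it to Cloud Run
gcloud builds submit --tag gcr.io/my-project/plumber-demo
gcloud run deploy plumber-demo \
  --image gcr.io/my-project/plumber-demo \
  --platform managed \
  --region europe-west1 \
  --allow-unauthenticated
```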
It can scale to billions of requests, and the same workflow is available for other languages.
steps:
- name: gcr.io/gcer-public/gago:master
  args:
  - reports
  - --view=81416156
  - --dims=ga:date,ga:medium
  - --mets=ga:sessions
  - --start=2014-01-01
  - --end=2019-11-30
  - --antisample
  - --max=-1
  - -o=google_analytics.csv
  id: download google analytics
  dir: build
  env:
  - GAGO_AUTH=/workspace/auth.json
- name: gcr.io/cloud-builders/gsutil
  args:
  - cp
  - gs://mark-edmondson-public-read/polygot.Rmd
  - /workspace/build/polygot.Rmd
  id: download Rmd template
- name: gcr.io/gcer-public/packagetools:master
  args:
  - Rscript
  - -e
  - |-
    lapply(list.files('.', pattern = '.Rmd', full.names = TRUE),
           rmarkdown::render, output_format = 'html_document')
  id: render rmd
  dir: build
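A sketch of running the build manually before wiring up a trigger (assuming the YAML above is saved as cloudbuild.yml and gcloud is authenticated):

```shell
# submit the current folder and config to Cloud Build
gcloud builds submit --config cloudbuild.yml .
```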
Set up a build trigger for the GitHub repo you commit the Dockerfile to:
A 40-minute talk at Google Next '19 with lots of new things to try!
https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be
A great video that goes deeper into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes, which I haven't tried yet.
Use dplyr R code across datasets, including BigQuery (from https://rpubs.com/shivanandiyer/BigRQuery)
library(bigrquery)  # R interface to the Google BigQuery API
library(dplyr)      # grammar for data manipulation
library(DBI)        # interface definition to connect to databases

bq_conn <- dbConnect(bigquery(),
                     project = "project-id",
                     dataset = "dataset-id",
                     use_legacy_sql = FALSE)

bq_table <- dplyr::tbl(bq_conn, "my-table")
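A sketch of what the lazy table enables, assuming the connection above; the column names (medium, date, sessions) are illustrative GA-style fields:

```r
library(dplyr)

# dplyr verbs are translated to SQL and run inside BigQuery;
# no data is downloaded until collect() is called
result <- bq_table %>%
  filter(medium == "organic") %>%
  group_by(date) %>%
  summarise(sessions = sum(sessions)) %>%
  collect()

# show_query() prints the generated SQL without executing it
bq_table %>%
  filter(medium == "organic") %>%
  show_query()
```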
See library(googleCloudRunner) for the latest thinking: https://code.markedmondson.me/googleCloudRunner/
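For example, googleCloudRunner wraps the Dockerfile build and Cloud Run deployment into one call; a minimal sketch, assuming authentication is set up per the package docs and the folder holds a plumber api.R:

```r
library(googleCloudRunner)

# build the folder's Dockerfile via Cloud Build
# and deploy the result to Cloud Run
cr_deploy_plumber("my_folder/")
```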