googleAuthR (demo: https://app.iihnordic.dk/ga-effect/)
searchConsoleR
googleAnalyticsR
googleComputeEngineR (cloudyr)
bigQueryR (cloudyr)
googleCloudStorageR (cloudyr)
googleLanguageR (rOpenSci)
googleCloudRunner (NEW!)

Slack group to talk about the packages: #googleAuthRverse
The packages return plain data.frame objects, as in the sketch below.
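For example, a minimal googleAnalyticsR sketch (assuming you have access to a Google Analytics view; the viewId below is illustrative):

library(googleAnalyticsR)

# authenticate via googleAuthR (opens a browser OAuth flow)
ga_auth()

# fetch sessions by date and medium; the viewId is illustrative
web_data <- google_analytics(81416156,
                             date_range = c("2019-01-01", "2019-11-30"),
                             metrics = "sessions",
                             dimensions = c("date", "medium"))

class(web_data)
#> [1] "data.frame"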
The Rocker project (https://www.rocker-project.org/)
Maintains useful R images:
rocker/r-ver
rocker/rstudio
rocker/tidyverse
rocker/shiny
rocker/ml-gpu

FROM rocker/tidyverse
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleComputeEngineR \
googleAnalyticsR \
searchConsoleR \
googleCloudStorageR \
bigQueryR \
## install Github packages
&& installGithub.r MarkEdmondson1234/youtubeAnalyticsR

Flexible: no need to ask IT to install R everywhere; use docker run on any cloud platform; an ascendant technology (see the sketch after this list)
Version controlled: no worries that new package releases will break your code
Scalable: run multiple Docker containers at once; fits into an event-driven, stateless, serverless future
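As one sketch of that flexibility, the custom image can be launched on a Google Compute Engine VM from R with googleComputeEngineR. The project, credentials and zone are assumed to be set via its environment variables, and the image name here is hypothetical:

library(googleComputeEngineR)
# assumes GCE_AUTH_FILE, GCE_DEFAULT_PROJECT_ID and GCE_DEFAULT_ZONE are set

# launch an RStudio Server VM running the custom image (image name hypothetical)
vm <- gce_vm("my-r-analysis",
             template = "rstudio",
             dynamic_image = "gcr.io/my-project/my-tidyverse",
             username = "me", password = "mypassword",
             predefined_type = "n1-standard-1")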
Good for R APIs
Pros
Auto-scaling
Scale from 0
Simple to deploy
https / authentication embedded
Cons
Needs stateless, idempotent workflows (see the sketch after this list)
Limited support for Shiny
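Stateless means the container keeps nothing between requests; anything worth keeping is written to external storage. A hypothetical helper using googleCloudStorageR (the bucket name is an assumption):

# inside an API handler, persist results outside the container
save_result <- function(df, name) {
  tmp <- tempfile(fileext = ".csv")
  write.csv(df, tmp, row.names = FALSE)
  # any container instance can serve the next request, since state lives in GCS
  googleCloudStorageR::gcs_upload(tmp, bucket = "my-bucket", name = name)
  name
}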
Make an API out of your script:
#' Echo the parameter that was sent in
#' @param msg The message to echo back.
#' @get /echo
function(msg=""){
list(msg = paste0("The message is: '", msg, "'"))
}
#' Plot out data from the iris dataset
#' @param spec If provided, filter the data to only this species (e.g. 'setosa')
#' @get /plot
#' @png
function(spec){
myData <- iris
title <- "All Species"
# Filter if the species was specified
if (!missing(spec)){
title <- paste0("Only the '", spec, "' Species")
myData <- subset(iris, Species == spec)
}
plot(myData$Sepal.Length, myData$Petal.Length,
main=title, xlab="Sepal Length", ylab="Petal Length")
}

Based on:
FROM trestletech/plumber
COPY [".", "./"]
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=as.numeric(Sys.getenv('PORT')))"]
CMD ["api.R"]
This can scale to a billion requests, and the resulting API is available to other languages.
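googleCloudRunner wraps this pattern up; a sketch (assuming the project, region and build email have been configured with cr_project_set(), cr_region_set() and cr_email_set(), and that the folder name is hypothetical) that builds the Dockerfile and deploys the plumber API to Cloud Run:

library(googleCloudRunner)

# folder containing api.R and the Dockerfile above
cr_deploy_plumber("my-api/")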
A Cloud Build pipeline (cloudbuild.yml) that downloads Google Analytics data via gago, fetches an R Markdown template, and renders it:

steps:
- name: gcr.io/gcer-public/gago:master
  args:
  - reports
  - --view=81416156
  - --dims=ga:date,ga:medium
  - --mets=ga:sessions
  - --start=2014-01-01
  - --end=2019-11-30
  - --antisample
  - --max=-1
  - -o=google_analytics.csv
  id: download google analytics
  dir: build
  env:
  - GAGO_AUTH=/workspace/auth.json
- name: gcr.io/cloud-builders/gsutil
  args:
  - cp
  - gs://mark-edmondson-public-read/polygot.Rmd
  - /workspace/build/polygot.Rmd
  id: download Rmd template
- name: gcr.io/gcer-public/packagetools:master
  args:
  - Rscript
  - -e
  - |-
    lapply(list.files('.', pattern = '.Rmd', full.names = TRUE),
           rmarkdown::render, output_format = 'html_document')
  id: render rmd
  dir: build

Set up a build trigger for the GitHub repo you commit the Dockerfile to:
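The same build can also be submitted straight from R via googleCloudRunner (a sketch assuming the YAML above is saved as cloudbuild.yml and authentication is configured):

library(googleCloudRunner)

build <- cr_build("cloudbuild.yml")
cr_build_status(build)

# cr_buildtrigger() can attach the same build to GitHub commits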
A 40-minute talk at Google Next '19 with lots of new things to try!
https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be
A great video that goes further into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes, which I haven't tried yet.
Use the same dplyr R code across datasets, including BigQuery (from https://rpubs.com/shivanandiyer/BigRQuery):
library(bigrquery) # R Interface to Google BigQuery API
library(dplyr) # Grammar for data manipulation
library(DBI) # Interface definition to connect to databases
bq_conn <- dbConnect(bigquery(),
                     project = "project-id",
                     dataset = "dataset-id",
                     use_legacy_sql = FALSE)
bq_table <- dplyr::tbl(bq_conn, "my-table")

See library(googleCloudRunner) for the latest thinking: https://code.markedmondson.me/googleCloudRunner/
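To round off the bigrquery sketch above: dplyr verbs on bq_table are translated to BigQuery SQL, and only collect() pulls results into a local data.frame (the column names below are hypothetical):

bq_table %>%
  group_by(medium) %>%
  summarise(sessions = sum(sessions, na.rm = TRUE)) %>%
  arrange(desc(sessions)) %>%
  collect()   # query runs in BigQuery; results land in R here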