vignettes/usecase-slackbot-google-analytics.Rmd
This use case will use Cloud Run to create an R API that Slack can interact with.
The application will download some information from Google Analytics and send it to Slack for review. I’m most interested in which new websites are linking to my blog, so I will fetch that data for the endpoint, but this can be adapted for your own needs.
The workflow needed is:

1. Get the Google Analytics data locally in an R script
2. Wrap the script in a plumber API and Dockerfile
3. Deploy the API to Cloud Run
4. Set up a build trigger so each git commit rebuilds and redeploys the API
5. Add an endpoint that posts the data to a Slack webhook
6. Schedule the endpoint via Cloud Scheduler
This will use googleAnalyticsR
to get the data. I want
to see what sessions came in the last 30 days with a referrer I haven’t
seen before in the previous year.
```r
library(googleAnalyticsR)
library(dplyr)

ga_id <- 12345678

# get the last year's referrer data
two_years <- google_analytics(
  ga_id,
  date_range = c(Sys.Date() - 365, Sys.Date()),
  dimensions = c("date", "fullReferrer", "landingPagePath"),
  metrics = "sessions",
  rows_per_call = 50000,
  max = -1)

last30Days <- two_years %>% filter(date >= Sys.Date() - 30)
previousDays <- two_years %>% filter(date < Sys.Date() - 30)

# the referrers seen in last30Days but not previously
new_refs <- setdiff(unique(last30Days$fullReferrer),
                    unique(previousDays$fullReferrer))

last_30_new_refs <- last30Days %>% filter(fullReferrer %in% new_refs)
```
Now I’ll put the call behind a plumber API endpoint. The script assumes authentication comes from a local JSON auth file that will be included in the deployment. To make it a bit more general, the GA viewId can be sent as a parameter, so it will work for any GA account the auth file has access to.
```r
# api.R
library(googleAnalyticsR)
library(dplyr)

# the function that will be called from the endpoints
do_ga <- function(ga_id){

  # get the last year's referrer data
  two_years <- google_analytics(
    ga_id,
    date_range = c(Sys.Date() - 365, Sys.Date()),
    dimensions = c("date", "fullReferrer", "landingPagePath"),
    metrics = "sessions",
    rows_per_call = 50000,
    max = -1)

  last30Days <- two_years %>% filter(date >= Sys.Date() - 30)
  previousDays <- two_years %>% filter(date < Sys.Date() - 30)

  # the referrers seen in last30Days but not previously
  new_refs <- setdiff(unique(last30Days$fullReferrer),
                      unique(previousDays$fullReferrer))

  last_30_new_refs <- last30Days %>%
    filter(fullReferrer %in% new_refs)

  last_30_new_refs
}

#' @get /
#' @serializer html
function(){
  "<html><h1>It works!</h1></html>"
}

#' @get /last-30-days
#' @serializer csv
function(ga_id){
  # return the new referrers from the last 30 days as CSV
  do_ga(ga_id)
}
```
To support the plumber API, a server.R file is also created:
```r
# server.R
pr <- plumber::plumb("api.R")
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)
```
And a Dockerfile that installs googleAnalyticsR and readr on top of the rstudio/plumber base image:
```
FROM rstudio/plumber

RUN install2.r --error \
    -r 'http://cran.rstudio.com' \
    googleAnalyticsR readr

COPY ["./", "./"]

ENTRYPOINT ["Rscript", "server.R"]
```
The files above are put into a folder “slackbot”:

```
|
|- api.R
|- Dockerfile
|- server.R
```
The app is now deployed to Cloud Run via `cr_deploy_plumber()`:
```r
library(googleCloudRunner)

cr_deploy_plumber("slackbot/")
#ℹ 2021-03-19 10:09:23 > Using existing Dockerfile found in folder
#ℹ 2021-03-19 10:09:23 > Uploading inst/slackbot/ folder for Cloud Run
#ℹ 2021-03-19 10:09:23 > Dockerfile found in inst/slackbot/
#
#── #Deploy docker build for image: gcr.io/your-project/slackbot ─────────────────
#
#── #Upload inst/slackbot/ to gs://your-bucket/slackbot.tar.gz ───────
#ℹ 2021-03-19 10:09:23 > Uploading slackbot.tar.gz to your-bucket/slackbot.tar.gz
#ℹ 2021-03-19 10:09:23 > File size detected as 789 bytes
#ℹ 2021-03-19 10:09:23 > Google Cloud Storage Source enabled: /workspace/deploy/
#ℹ 2021-03-19 10:09:24 > Cloud Build started - logs:
# https://console.cloud.google.com/cloud-build/builds/0f57a7ae-d64d-4d96-85ec-9e2ad64de52d?project=1080525199262
#ℹ Starting Cloud Build
#→ Status: WORKING
# ...
#
# ── #> Launching CloudRun image: gcr.io/your-project/slackbot:6a97870b-038b-4fe3-a3ea-4e5e6876c1aa ──────────────────────
# ...
# ── #> Running at: https://slackbot-asfkbkdf-ew.a.run.app ───────────────────────────────────────────────────────────────────
#==CloudRunService==
#name: slackbot
#location: europe-west1
#lastModifier: 12345@cloudbuild.gserviceaccount.com
#containers: gcr.io/your-project/slackbot:6a97870b-038b-4fe3-a3ea-4e5e6876c1aa
#creationTimestamp: 2021-03-19T09:14:48.727922Z
#observedGeneration: 2
#url: https://slackbot-asfkbkdf-ew.a.run.app
```
It will take a few minutes to first build the Docker image (cup of tea time) and then deploy it to Cloud Run. The Docker image will build more quickly on subsequent runs because the `kaniko_cache=TRUE` default lets the build skip the R library installation steps via the cache.

Once finished, the build should return the Cloud Run URL. The homepage should work, but the Google Analytics endpoint (`/last-30-days?ga_id=1234567`) won't yet, as we haven't included the authentication file.
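A quick way to check the deployment is to call the homepage from R. A minimal sketch, with a placeholder URL standing in for the one your own deployment returned:

```r
library(httr)

# placeholder: use the URL returned by cr_deploy_plumber()
my_app <- "https://slackbot-xxxx-ew.a.run.app"

# the root endpoint should return the "It works!" HTML
res <- GET(my_app)
status_code(res) # expect 200
content(res, as = "text")
```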
To make working on and updating the API easier, a build trigger will build the Dockerfile and deploy the API upon each git commit.

The project files above are added to a git repository and published on, say, GitHub, and the GitHub repo is added to Cloud Build as described in the Build Triggers docs. In my case it's googleCloudRunner's own repo and I put everything in the `inst/slackbot` folder, so that alters the `dir` used in the buildsteps below.
Once the files are on GitHub, a build trigger is created which will mimic what `cr_deploy_plumber()` did earlier - it will build the Dockerfile and re-deploy the Cloud Run app. To have more control, we now use `cr_buildstep()` templates to make the build ourselves.
If you try to run the GA endpoint, you can see in the logs that the default client ID for Google Analytics won't allow non-interactive use - we need to use our own clientId and auth file. These sensitive files are uploaded to Secret Manager so they can be used more securely within builds via `cr_buildstep_secret()`:
```r
library(googleCloudRunner)

bs <- c(
  # get secret files
  cr_buildstep_secret("mark-edmondson-gde-clientid", "client.json",
                      dir = "inst/slackbot/"),
  cr_buildstep_secret("googleanalyticsr-tests", "auth.json",
                      dir = "inst/slackbot/"),
  # build with the same name as deployed
  cr_buildstep_docker(
    image = "slackbot",
    kaniko_cache = TRUE,
    dir = "inst/slackbot/"
  ),
  # deploy the app
  cr_buildstep_run(
    "slackbot",
    image = "gcr.io/$PROJECT_ID/slackbot:$BUILD_ID",
    memory = "1Gi",
    env_vars = c("GAR_CLIENT_JSON=client.json",
                 "GA_AUTH_FILE=auth.json")
  )
)

# increase timeout to 40 mins
build_yml <- cr_build_yaml(bs, timeout = 2400)
build_obj <- cr_build_make(build_yml)

# setup the build trigger
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner")

cr_buildtrigger(
  build_obj, name = "slackbot-trigger", trigger = repo,
  description = "Deploying the Slackbot example",
  includedFiles = "inst/slackbot/**"
)
```
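To confirm the trigger exists without waiting for a commit, you can inspect it from R - a quick sketch, assuming `cr_buildtrigger_list()` behaves as in the package docs:

```r
# list the project's build triggers to confirm "slackbot-trigger" was created
cr_buildtrigger_list()
```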
I now reference Slack's dev docs on how to create a Webhook URL, which is of the form `https://hooks.slack.com/services/XXXX6/B0YYYYYY/3rNZZZZZ`.

There is also useful info on formatting the messages, and you can make test calls with curl and `httr::POST()`.

I will use `httr::POST()` in the API to pass along the GA data, so now is a good time to test with the Slack format.
```r
library(httr)

slack_url <- "https://hooks.slack.com/services/XXXX6/B0YYYYYY/3rNZZZZZ"

# it works!
POST(slack_url, verbose(), body = list(text = "Hello World"), encode = "json")

# what does a table of data look like?
the_body <- list(
  text = paste0("```\n",
                paste0(collapse = "\n", knitr::kable(mtcars)),
                "```\n")
)

POST(slack_url, verbose(), body = the_body, encode = "json")
```
I played around with what exactly to send to Slack for a data.frame and ended up with the above, which rendered as a preformatted code block in Slack.

It won't work with large tables though, so limit the table down if using kable (see the sketch below) - in the end I settled on just a list of the referral URLs.
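If you do want to send a table, here is a minimal sketch of trimming it before posting, using `head()` to stay under Slack's message size limits, with `mtcars` standing in for the real data and `post_table_to_slack()` being a hypothetical helper (not part of any package):

```r
library(httr)

# hypothetical helper: post the first few rows of a data.frame
# to a Slack webhook as a preformatted code block
post_table_to_slack <- function(df, slack_url, max_rows = 10) {
  msg <- paste0("```\n",
                paste0(collapse = "\n", knitr::kable(head(df, max_rows))),
                "\n```\n")
  POST(slack_url, body = list(text = msg), encode = "json")
}

post_table_to_slack(mtcars, slack_url)
```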
To call the Slack bot, the API needs to collect the data and then make an HTTP request itself to pass the data on to the Slack URL. I don't want to include the Slack URL in the code, so I'll also alter the Build Trigger to accept a substitution variable in the build that will hold the Slack URL - this way I can swap in other Slack webhooks more easily.
```r
library(googleCloudRunner)

bs <- c(
  # get secret files
  cr_buildstep_secret("mark-edmondson-gde-clientid", "client.json",
                      dir = "inst/slackbot/"),
  cr_buildstep_secret("googleanalyticsr-tests", "auth.json",
                      dir = "inst/slackbot/"),
  # build with the same name as deployed
  cr_buildstep_docker(
    image = "slackbot",
    kaniko_cache = TRUE,
    dir = "inst/slackbot/"
  ),
  # deploy the app
  cr_buildstep_run(
    "slackbot",
    image = "gcr.io/$PROJECT_ID/slackbot:$BUILD_ID",
    memory = "1Gi", # make the Cloud Run app a bit bigger
    env_vars = c("GAR_CLIENT_JSON=client.json",
                 "GA_AUTH_FILE=auth.json",
                 "SLACK_URL=$_SLACK_URL") # will come from a substitution variable
  )
)

# increase timeout to 40 mins
build_yml <- cr_build_yaml(bs, timeout = 2400)
build_obj <- cr_build_make(build_yml)

# setup the build trigger
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner")

# your own Slack webhook URL
slack_url <- "https://hooks.slack.com/services/XXXX6/B0YYYYYY/3rNZZZZZ"

cr_buildtrigger(
  build_obj, name = "slackbot-trigger", trigger = repo,
  description = "Deploying the Slackbot example",
  includedFiles = "inst/slackbot/**",
  substitutions = list(
    `_SLACK_URL` = slack_url # used in the builds, can be modified in the build Web UI
  ),
  overwrite = TRUE
)
```
The R API is now also modified to include the new endpoint `/trigger-slack`:
```r
# api.R (additions)
library(googleAnalyticsR)
library(dplyr)
library(httr)

# ...as above...

#' @get /trigger-slack
#' @serializer json
function(ga_id){
  # get the last 30 days of new referrers
  last_30_new_refs <- do_ga(ga_id)
  the_data <- unique(last_30_new_refs$fullReferrer)

  the_body <- list(
    text = paste0("GoogleAnalyticsLast30DaysNewReferrals\n```\n",
                  paste0(collapse = "\n", the_data),
                  "\n```\n")
  )

  # get the Slack URL from an env var
  slack_url <- Sys.getenv("SLACK_URL")

  res <- POST(slack_url, body = the_body, encode = "json")

  list(slack_http_response = res$status_code)
}
```
We also add knitr to the Dockerfile to help with rendering the output (httr is already a dependency of googleAnalyticsR):
```
FROM rstudio/plumber

RUN install2.r --error \
    -r 'http://cran.rstudio.com' \
    googleAnalyticsR readr knitr

COPY ["./", "./"]

ENTRYPOINT ["Rscript", "server.R"]
```
After making the changes to the Dockerfile and api.R, we commit them to redeploy via the Build Trigger (~5 mins).

After deployment the app should be able to respond to the GA endpoint:
```r
library(httr)

ga_id <- 1234557
my_app <- "https://slackbot-xxxx-ew.a.run.app"

res <- GET(paste0(my_app, "/last-30-days?ga_id=", ga_id))
content(res)
```
Check the logs for Cloud Build and Cloud Run to see if it's all working. The first time I tried it the memory was too low for the app, which is why the buildsteps above specify "1Gi" for the Cloud Run size. But if all goes well you should see the CSV file come back (yay!).
The URL endpoint `/trigger-slack?ga_id=12345` should send the same data you see at `/last-30-days?ga_id=12345`, but send it to Slack:
```r
library(httr)

ga_id <- 1234557
my_app <- "https://slackbot-xxxx-ew.a.run.app"

res <- GET(paste0(my_app, "/trigger-slack?ga_id=", ga_id))
content(res)
```
Now we can schedule the URL via Cloud Scheduler. I’d like it every Friday with my coffee but before my tea.
```r
library(googleCloudRunner)

ga_id <- 1233456
my_app <- "https://slackbot-xxxxxxx-ew.a.run.app"
my_endpoint <- paste0(my_app, "/trigger-slack?ga_id=", ga_id)

http_target <- HttpTarget(
  uri = my_endpoint,
  httpMethod = "GET"
)

# every Friday at 08:30
cr_schedule("slackbot-schedule",
            schedule = "30 8 * * 5",
            httpTarget = http_target,
            overwrite = TRUE)
```
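To check the job was created (and to trigger a test run without waiting until Friday), the `cr_schedule_*` helpers can be used - a sketch, assuming they behave as in the package documentation:

```r
# list Cloud Scheduler jobs in the project
cr_schedule_list()

# fetch the job we just created
cr_schedule_get("slackbot-schedule")

# force an immediate run to test the Slack message
cr_schedule_run("slackbot-schedule")
```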
Once the app is deployed you can spend most of your time developing the R code rather than worrying about the deployment, which will recreate itself upon each commit.

You could try adding other endpoints for different GA functions, or making the API/webhook compatible with services other than Slack.
You may also want to make the Cloud Run API private, and you should if it controls private or expensive operations - in that case look at `cr_jwt_create()` or `cr_run_schedule_http()` as your private API options.
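As a rough sketch of the JWT route (see the package's documentation on authenticated Cloud Run services for the exact workflow, which the `cr_jwt_*` calls below assume), calling a private `/trigger-slack` endpoint could look something like this:

```r
library(googleCloudRunner)
library(httr)

# placeholder URL for the now-private Cloud Run app
my_app <- "https://slackbot-xxxx-ew.a.run.app"

# create a JWT for the service and exchange it for a token
jwt <- cr_jwt_create(my_app)
token <- cr_jwt_token(jwt, my_app)

# call the protected endpoint with the token attached
res <- cr_jwt_with_httr(
  GET(paste0(my_app, "/trigger-slack?ga_id=12345")),
  token
)
content(res)
```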