EARL London - 13th-15th September 2016

Hello

  • I'm Mark Edmondson (@HoloMarkeD)
  • Englishman living in Copenhagen since 2010
  • Data Insight Developer for IIH Nordic
  • Google Developer Expert for Google Analytics
  • RStudio Shiny Advocate

Slow websites lose money

Google and Bing study: ~1% loss of revenue for every 500 ms of extra load time

Why is Chrome so fast?

One reason is Google Chrome uses prefetch

Google Chrome's secret - it predicts your browsing and prefetches URLs

The prefetch tag makes Chrome (and websites) faster by loading the next page in advance.

<link rel="dns-prefetch" href="//widget.com">
<link rel="preconnect" href="//cdn.example.com">
<link rel="prefetch" href="//example.com/next-page.html">
<link rel="prerender" href="//example.com/thankyou.html">

And we can use it too - can be inserted dynamically:

var hint = document.createElement("link")
hint.setAttribute("rel","prerender")
hint.setAttribute("href","next-page.html")
document.getElementsByTagName("head")[0].appendChild(hint)

Prefetch browser coverage

Can R predict the next page?

Can we use R to supply 'href' and 'pr'?

var hint = document.createElement("link")
hint.setAttribute("rel","prerender")
hint.setAttribute("href", "next-page.html")
document.getElementsByTagName("head")[0].appendChild(hint)

Can we predict quickly enough to dynamically add it to the page before a user clicks?

  • Yes - using R, OpenCPU and Google Tag Manager

The deployment

OpenCPU and Google Tag Manager

OpenCPU

Google Tag Manager

  • Free tag management system on Google infrastructure
  • JavaScript container you can edit remotely
  • DataLayer object to manage data in a centralised manner
  • Deploy analytics tracking, beacons or any JavaScript
  • See also: DTM for Adobe Analytics, Tealium for paid features

OpenCPU helps productionise your R code

  • OpenCPU - turns the models your R team comes up with into a form a web developer can use
  • Google Tag Manager lets you deploy OpenCPU output
  • This is only one method of deployment; a more stable setup would have the client's web development team work directly with the OpenCPU JSON
  • But Google Tag Manager is great for situations where you can't do that, or for a proof of concept

Data flow diagram

data architecture

R - Creating the model

Get Google Analytics Data

Extracting data per user using googleAnalyticsR

library(googleAnalyticsR)
ga_auth()
gaId <- xxxx # Your View ID

## In this case, dimension3 contains userId in format:
## u={cid}&t={hit-timestamp}
raw <- google_analytics_4(gaId,
                          date_range = c("2016-02-01","2016-02-01"),
                          metrics = c("pageviews"),
                          dimensions = c("dimension3", "pagePath"),
                          max = -1)

Or, if you have Google Analytics 360, extract via BigQuery with google_analytics_bq()

Raw Data

dimension3                              pagePath       pageviews
u=100116318.1454322382&t=1454322382033  /example/809   1
u=100116318.1454322382&t=1454322412130  /example/1212  1
u=100116318.1454322382&t=1454322431492  /example/339   1
u=100116318.1454322382&t=1454322441120  /example/1494  1
u=100116318.1454322382&t=1454322450156  /example/339   1
u=100116318.1454322382&t=1454322461871  /example/1703  1

GA data - after processing

cid                    sessionLen  timestamp            pagePath       pageviews
1005103157.1454327958  2           2016-02-01 12:59:18  /example/1     1
1005103157.1454327958  2           2016-02-01 13:02:42  /example/155   1
1010303050.1454327644  2           2016-02-01 12:54:03  /example/144   1
1010303050.1454327644  2           2016-02-01 13:00:03  /example/80    1
1011007665.1454333263  2           2016-02-01 14:27:43  /example/1359  1
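
The deck does this processing step in R and does not show it. Purely to illustrate the "u={cid}&t={hit-timestamp}" format of dimension3, here is a minimal JavaScript sketch of the split. The helper name parseUserDimension is hypothetical, and it assumes the hit timestamp is in milliseconds since the Unix epoch:

```javascript
// Hypothetical helper: split a "u={cid}&t={hit-timestamp}" value into its parts.
function parseUserDimension(value) {
  const match = /^u=([^&]+)&t=(\d+)$/.exec(value);
  if (match === null) {
    return null; // not in the expected format
  }
  return {
    cid: match[1],
    // Assumption: the hit timestamp is milliseconds since the Unix epoch.
    timestamp: new Date(Number(match[2]))
  };
}

// First row of the raw data above.
const hit = parseUserDimension("u=100116318.1454322382&t=1454322382033");
console.log(hit.cid, hit.timestamp.toISOString());
```

Values that do not match the pattern come back as null, so they can be filtered out before building sequences.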

GA data - fit for model

Our model library markovchain needs a vector of sequential pageviews per userId.

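library(dplyr)  # select, group_by, arrange, distinct, mutate, filter
library(tidyr)  # spread

## for each cid, split pagePath in timestamp order
sequenceVD <- processed %>% select(cid, timestamp, pagePath) %>%
  group_by(cid) %>% arrange(timestamp) %>%
  ## keep the first visit to each page per cid
  distinct(pagePath) %>%
  ## number the steps and drop single-page visitors
  mutate(step = row_number(), n = n()) %>% arrange(cid) %>%
  filter(n > 1) %>% select(-n) %>%
  ## one row per cid, one column per step
  spread(step, pagePath)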

Model data

cid                    1                   2                   3                   4
1000641120.1465683551  /da/a-z/6236/2665   /da/a-z/6236/2670   NA                  NA
1001334948.1469706364  /da/a-z/6236/2660   /da/a-z/6236        NA                  NA
1003589990.1471286236  /da/a-z/6236/2707   /da/a-z/6236/2660   NA                  NA
1003723352.1470269948  /da/a-z/6236/2707   /da/a-z/6236/2660   NA                  NA
1004139521.1469437411  /da/a-z/6236/2660   /da/a-z/6236/2707   NA                  NA
1004647640.1468402554  /da/a-z/6236/2678   /da/a-z/6236/2714   /da/a-z/6236/2670   NA

Create Model

Create a Markov chain model of first order

library(markovchain)

model <- markovchainListFit(sequenceVD[,2:10], name = "seq")

## save model for use on OpenCPU
save(model, file="./data/model.rda")
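
To show what a first-order Markov chain captures, here is a minimal JavaScript sketch of the idea: count how often each page is followed by each other page, then predict the most frequent follower. This is not the markovchain package's implementation, the function names fitFirstOrderChain and predictNext are hypothetical, and the sequences are made up for illustration:

```javascript
// Count page-to-page transitions across all visitor sequences.
function fitFirstOrderChain(sequences) {
  const counts = new Map();
  for (const pages of sequences) {
    for (let i = 0; i + 1 < pages.length; i++) {
      const from = pages[i];
      const to = pages[i + 1];
      if (!counts.has(from)) counts.set(from, new Map());
      const row = counts.get(from);
      row.set(to, (row.get(to) || 0) + 1);
    }
  }
  return counts;
}

// Return the most frequently observed next page, or null for an unseen page.
function predictNext(counts, currentPage) {
  const row = counts.get(currentPage);
  if (!row) return null;
  let best = null;
  let bestCount = 0;
  for (const [page, count] of row) {
    if (count > bestCount) {
      best = page;
      bestCount = count;
    }
  }
  return best;
}

// Made-up sequences for illustration only.
const chain = fitFirstOrderChain([
  ["/a", "/b", "/c"],
  ["/a", "/b"],
  ["/a", "/c"]
]);
console.log(predictNext(chain, "/a"));
```

"First order" means the prediction depends only on the current page, not on the pages visited before it.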

Using model for predictions

Now that we have built the model object, we can make predictions.

library(markovchain)

## make predictions
predict(model$estimate, newdata = "/da/a-z/6236/2665")

## prediction output
Sequence: /example/251

Visualisation of the model

Deploying to OpenCPU via GitHub

OpenCPU supports GitHub webhooks: the model is updated every time you push to GitHub

Create a small custom package with the model data and the function to predict pageviews

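predictNextPage <- function(current_url){
  ## model$estimate holds the fitted chain, as in the prediction example
  out <- try(predict(model$estimate, newdata = current_url), silent = TRUE)
  ## unseen URLs make predict() fail: fall back to "None"
  if(inherits(out, "try-error")){
    out <- "None"
  }
  out
}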

Example package for loading model prediction into OpenCPU

Calling OpenCPU
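
A minimal sketch of how a page could call predictNextPage over OpenCPU's HTTP API, assuming the package is installed in the server's library so the function is reachable under /ocpu/library/{package}/R/{function}/json. The server URL, the package name "mypackage" and the helper buildOpenCpuRequest are hypothetical placeholders:

```javascript
// Build the URL and fetch() options for calling an R function on OpenCPU.
// serverUrl and packageName are placeholders for your own deployment.
function buildOpenCpuRequest(serverUrl, packageName, functionName, args) {
  const base = serverUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: base + "/ocpu/library/" + encodeURIComponent(packageName) +
         "/R/" + encodeURIComponent(functionName) + "/json",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Named R arguments are sent as a JSON object.
      body: JSON.stringify(args)
    }
  };
}

const request = buildOpenCpuRequest(
  "https://ocpu.example.com/",   // hypothetical server
  "mypackage",                   // hypothetical package name
  "predictNextPage",
  { current_url: "/example/809" }
);
console.log(request.url);
// In a browser: fetch(request.url, request.options).then(r => r.json())
```

Building the request separately from sending it keeps the sketch runnable outside a browser; the actual network call is left as the commented fetch() line.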

Using GTM

GTM calling OpenCPU

GTM Add Prefetch
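
A minimal sketch of the prefetch step, assuming the predicted URL has already been fetched from OpenCPU. The helper name addPrerenderHint is hypothetical; the document object is passed in as an argument so the function can also be exercised outside a browser:

```javascript
// Append a <link rel="prerender"> hint for the predicted page.
// `doc` is the page's document object (or a stand-in when testing).
function addPrerenderHint(doc, predictedUrl) {
  // "None" is the fallback value returned by predictNextPage().
  if (!predictedUrl || predictedUrl === "None") {
    return null;
  }
  const hint = doc.createElement("link");
  hint.setAttribute("rel", "prerender");
  hint.setAttribute("href", predictedUrl);
  doc.getElementsByTagName("head")[0].appendChild(hint);
  return hint;
}
// In a GTM Custom HTML tag this would be called as
// addPrerenderHint(document, predictedUrl).
```

OpenCPU's JSON output for an R character vector is most likely a JSON array, so the caller would typically pass its first element as predictedUrl.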

Putting it into practice

Demo website

Test prerender

Client website

Preliminary results

Summary

  • R can speed up websites via prefetch prediction
  • OpenCPU turns R into JSON, GTM deploys JSON onto websites easily
  • R can be quickly deployed through GTM to act upon realtime website data
  • Page prediction is just one application of this data infrastructure

Further work

  • Tune the model
  • Feed R forecasts into DMPs, AdWords, Display
  • Client auto-segmentation for content recommendations
  • Have you any ideas of applications?

Thank You