RUM collection done right

Arto Kylmanen

Nov. 22, 2023


Even today, PageSpeed is a somewhat misunderstood concept among SEO professionals. I was deeply engaged with Core Web Vitals even before the term was coined, primarily to improve conversion, and because I firmly believe that Google relies heavily on engagement when judging the quality of a given site.

I've been building RUM (real user monitoring) collection and reporting setups for at least 7 years. This has given me a lot of experience in what should be collected and how this data can be used.

Today, I'm going to share a general framework I use for websites that need RUM tracking. If you want to follow along, you'll need GTM (publish rights) and GA4 (editor rights), plus a working understanding of JavaScript and browser APIs.

Outcome

Learn how to collect RUM data together with auxiliary information that gives you actionable insight into the conditions under which your product performs poorly.

Scopes

First, we must establish some scopes, because it's important. Performance is not as straightforward to track as, e.g., conversion, because many deterministic and non-deterministic variables strongly affect the final metrics. Because of this, we want to collect a bit more data than just the raw Core Web Vitals.
This brings us back to my first point: collecting raw LCP without data about its surrounding conditions is not very helpful if we're looking to pinpoint (and subsequently solve) the problem.

Here is a list of scopes that I believe give us a deeper understanding of what's going on:

Here is the full list of datapoints we want to collect:

  1. Performance Metric Name (string)
  2. Performance Metric Value (float)
  3. CDN server used (string)
  4. CDN Cache Hit (string)
  5. Device RAM (float)
  6. User Agent (string)
  7. Connection Type (string)
  8. Estimated network speed (float)
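
To make the list concrete, the eight data points can be thought of as one record per measurement. The sketch below is only an illustration: the field names and the `isValidRumRecord` helper are hypothetical (neither GTM nor GA4 requires this shape), and the sample values are made up.

```javascript
// Hypothetical field names mapped to the types from the list above.
var EXPECTED_TYPES = {
  metric_name: 'string',
  metric_value: 'number',
  cdn_server: 'string',
  cdn_cache_status: 'string',
  device_memory_gb: 'number',
  user_agent: 'string',
  connection_type: 'string',
  connection_speed_mbps: 'number'
};

// Returns true when every expected field is present with the expected type.
function isValidRumRecord(record) {
  return Object.keys(EXPECTED_TYPES).every(function (key) {
    return typeof record[key] === EXPECTED_TYPES[key];
  });
}

// Made-up sample values, for illustration only.
var sampleRecord = {
  metric_name: 'LCP',
  metric_value: 2350.5,
  cdn_server: 'AMS',
  cdn_cache_status: 'HIT',
  device_memory_gb: 4,
  user_agent: 'Mozilla/5.0 (example)',
  connection_type: '4g',
  connection_speed_mbps: 9.5
};
```

A check like this can be handy when reviewing exported rows, since missing auxiliary fields are common in browsers that do not expose the device and network properties.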

Tech needed

There are 4 technologies we need to set up all of the tracking:

Setup

We'll begin by setting up the collection of the metrics themselves. This has been made easy for us, as there is an excellent tag template available:
https://tagmanager.google.com/gallery/#/owners/gtm-templates-simo-ahava/templates/core-web-vitals

It's made by a great technical SEO, Simo Ahava. He has an implementation tutorial for it on his website, so please follow that tutorial on tracking Core Web Vitals in GA4.


After completion, you should have something like this (I don't add all the variables he adds in the tutorial; that's a personal choice).

CWV Tag After core template install in GTM

Run a quick debug to make sure that the collection is working.

Adding additional data

Now let's move on to the additional performance data.

We start with CDN data collection - in this case we look at the Cloudflare CDN. It returns some information about the connection in the response headers. Using JavaScript, we can collect this data and push it to the data layer.

JavaScript to collect RUM Data
<script type="text/javascript">
  (function() {
    // Request only the headers of the homepage; Cloudflare adds its headers to the response.
    var xhr = new XMLHttpRequest();
    xhr.open('HEAD', '/', true);
    xhr.onreadystatechange = function() {
      // readyState 4 means the request has completed.
      if (xhr.readyState === 4 && xhr.status === 200) {
        var cfRay = xhr.getResponseHeader('cf-ray');
        var cfCacheStatus = xhr.getResponseHeader('cf-cache-status');

        if (cfRay) {
          // The cf-ray value ends with the data center code after the last hyphen.
          var cfRayParts = cfRay.split('-');
          var cfDataCenter = cfRayParts[cfRayParts.length - 1];
          window.dataLayer = window.dataLayer || [];
          window.dataLayer.push({
            'event': 'cf_data_center',
            'cf-ray-check': {
              'data_center_id': cfDataCenter,
              'cache_status': cfCacheStatus
            }
          });
        }
      }
    };
    xhr.send(null);
  })();
</script>

Do a quick debug to make sure it's all working. After that, we follow a very similar structure to the Core Web Vitals collection to push this data to GA4.
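
One way to run that check from the browser console is to filter the data layer for the event name the script pushes. The `findEvents` helper below is a hypothetical convenience, not part of GTM; it only assumes the standard `window.dataLayer` array.

```javascript
// Hypothetical helper: return every data layer entry with the given event name.
function findEvents(dataLayer, eventName) {
  return (dataLayer || []).filter(function (entry) {
    // Skip empty entries and anything that does not carry an event name.
    return entry && entry.event === eventName;
  });
}

// In the browser console you would call:
//   findEvents(window.dataLayer, 'cf_data_center');
```

An empty result after the page has finished loading suggests that the request failed or that the response had no `cf-ray` header.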

We're going to create 2 new Data Layer variables as follows:

"DLV - Cloudflare Cache Status" with value of

cf-ray-check.cache_status

AND

"DLV - Cloudflare Datacenter ISO Code" with value of

cf-ray-check.data_center_id
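
The dotted names tell GTM to walk into the nested object that the script pushed. The sketch below mimics that lookup in plain JavaScript so the mapping is easy to see. `resolvePath` is a hypothetical helper, the sample values are made up, and GTM's real Data Layer Variable reads from its own internal data model rather than from a single pushed object.

```javascript
// Hypothetical helper: follow a dot-separated path into a nested object.
function resolvePath(obj, path) {
  return path.split('.').reduce(function (current, key) {
    // Stop early (returning undefined) if any level is missing.
    return current == null ? undefined : current[key];
  }, obj);
}

// Same shape as the object pushed by the Cloudflare script; values are made up.
var pushed = {
  'event': 'cf_data_center',
  'cf-ray-check': {
    'data_center_id': 'AMS',
    'cache_status': 'HIT'
  }
};

var cacheStatus = resolvePath(pushed, 'cf-ray-check.cache_status');   // 'HIT'
var dataCenter = resolvePath(pushed, 'cf-ray-check.data_center_id');  // 'AMS'
```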

Then, let's create a new tag called Cloudflare Data and add the variables. Set the trigger to Window Loaded.

Cloudflare data collection tag setup

Rest of the performance data

Next, let's move on to collecting the device and network data. This will also be done with a JavaScript tag and pushed to the data layer.

However, before we move on, let's quickly clarify how this data works and how to interpret it. Due to anti-fingerprinting measures, it's not possible to get 100% accurate data on these auxiliary points.

The "Effective Connection Type" can have one of the following values: slow-2g, 2g, 3g, or 4g.

On top of this, the value is not based on the actual connection type but on the browser's estimate of the connection quality. So a 3G output can also be a poor 4G connection.

Exact RAM measurements are considered personal information. So the reported value is rounded down to the nearest power of 2 and then converted to GB by dividing by 1024. Afterward, it's confined within a lower (0.25 GB) and an upper (8 GB) limit to safeguard the privacy of users at the extreme ends of the spectrum. The same concept applies to estimated download speed.
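
The rounding and clamping described above can be sketched as follows. This only illustrates the arithmetic as described here and is not the browser's actual implementation; `reportedDeviceMemory` and its input in megabytes are hypothetical.

```javascript
// Hypothetical sketch: turn an actual RAM size (in MB) into the coarse
// value a browser would report, following the steps described above.
function reportedDeviceMemory(actualMb) {
  // Round down to the nearest power of 2.
  var power = 1;
  while (power * 2 <= actualMb) {
    power *= 2;
  }
  // Convert to GB.
  var gb = power / 1024;
  // Confine to the 0.25 GB lower and 8 GB upper limits.
  return Math.min(8, Math.max(0.25, gb));
}

reportedDeviceMemory(6000);   // 4
reportedDeviceMemory(16384);  // 8
reportedDeviceMemory(200);    // 0.25
```

In reporting terms, a device with 6 GB and a device with 4 GB land in the same bucket, and everything above 8 GB is indistinguishable.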

Make sure that you understand the implications of this in terms of reporting and insight mining.


The code itself is straightforward: we read several browser properties and push them to the data layer. Add a new Custom HTML tag to GTM with the following contents, again with Window Loaded as the trigger.

<script type="text/javascript">
  (function() {
    // navigator.connection and navigator.deviceMemory are not available in every browser,
    // so fall back to an empty object instead of throwing on a missing property.
    var connection = navigator.connection || {};
    var perf_memory = navigator.deviceMemory;
    var perf_connection_type = connection.effectiveType;
    var perf_connection_speed = connection.downlink;
    window.dataLayer = window.dataLayer || [];
    window.dataLayer.push({
      'event': 'aux_performance',
      'aux_performance_data': {
        'perf_memory': perf_memory,
        'perf_connection_type': perf_connection_type,
        'perf_connection_speed': perf_connection_speed
      }
    });
  })();
</script>

A quick QA confirms that the tag is firing and pushing to the data layer. Then we set everything up the same way as with the Cloudflare tracker: create the variables and push them to GA4 with an event tag.

Auxiliary Performance Data tag setup

Please note that we are skipping the custom dimension and metric configurations in GA4. This is because they would primarily be used if you intend to link GA4 directly to Looker Studio. On June 16th, 2023, there was a rather poorly documented update to Looker Studio, in which all direct GA4 data was changed to auto aggregation. This makes it impossible to re-aggregate metrics, which in turn makes it pretty much impossible to link these dimensions into reports that make sense. This change broke all of my reports that had a direct GA4 <> Looker Studio connection, and I had to switch to alternative configurations.

If you're looking to implement this kind of tracking for your products and need help getting insightful reporting set up, contact me on LinkedIn.
