Automate Performance Laboratory Testing

Arto Kylmanen

Nov. 8, 2023


While I'm not a huge fan of laboratory testing, I have to admit it can be useful for benchmarking. But who wants to run an endless number of PSI tests by hand and save the results in a Google Sheets file? Even when working with a single product, you might want to run performance tests on all pages instead of just the homepage.

Here's a small tutorial on how to (pretty much) fully automate this process. Here is an example. The implementation is dead simple, and as we will be relying on Cloudflare here, it's free as well. Do I have to say more?

Let's dive in.

Outcome

A system that automates the monitoring of Core Web Vitals laboratory tests using the PageSpeed Insights API, with results deployed and displayed via a Cloudflare Worker + Pages combo.

Prerequisites:

Step 1: Get your PSI API Key

To get started with the PageSpeed Insights API, you'll need to obtain an API key from the Google Cloud Platform. If you have an existing key, for example from Screaming Frog integration, that will work just as well here.

If you don't, here's how to get one:

Navigate to the Google Cloud Console. Register if needed.

Click on the project drop-down at the top of the page and then click on "New Project".

Give your project a name, such as "Core Web Vitals Checker", and click "Create".

Creating a new project in Google Cloud Platform

Enable the PageSpeed Insights API:

Click the hamburger menu in the top left corner and navigate to APIs & Services.

Click on "Enable APIs and Services".

Enable APIs menu

In the API Library, search for "PageSpeed Insights API"

Click on the API from the results and then click "Enable".

Enable PageSpeed Insights API

After enabling the API, click on "Create Credentials" on the API page.

Choose "API key" from the options presented.

Your new API key will be created and displayed to you.

Create a new API Key for the project

Restrict the API Key (Optional but Recommended):

For security, it's a good practice to restrict your API key so that it can only be used by your applications.

Click on "Restrict Key" after your API key is created.

Under "API restrictions", select "Restrict key" and choose "PageSpeed Insights API" from the dropdown.

Step 2: Set Up Your Python Environment

Fire up the IDE of your choice - I'll be using PyCharm.

Here's a preview of the directory structure we will be using for this project

/psi_automator/
|-- psi_automator.py
|-- urls.txt
|-- .env
|-- /results/
|----- cwv_results.json

Install dependencies
pip install requests python-dotenv

Let's add the API key we made into the .env file like so:

.env
API_KEY=YOURAPIKEY123123123123

Note that in a .env file, we don't wrap strings in quotes as we would in Python files.

A word about .env files. You might ask: why are we using one instead of putting the API key directly into the code? While hard-coding might be easier, we want to keep sensitive information out of the code.

Also, if you use GitHub, it's easy to gitignore .env files.

Overall, keeping sensitive and environment-specific values (e.g. separate API keys for testing and production) is much easier with a .env file, as we don't have to change code when deploying.
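Just to make the flow concrete, here's a minimal sketch of how the key gets picked up at runtime (this mirrors what the script below does, with an added sanity check; API_KEY is simply the variable name we chose above):

Load the key from .env (sketch)
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into environment variables

api_key = os.getenv('API_KEY')
if not api_key:
    raise SystemExit('API_KEY not found - did you create the .env file?')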

Step 3: Prepare Your URL List

Create a text file named urls.txt.

Populate it with the URLs you want to monitor, one per line.
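For example, a urls.txt might look like this (these happen to be the URLs used for the sample report later in this article):

urls.txt
https://developers.google.com
https://www.instagram.com/
https://fonts.google.com/
https://families.google.com/
https://store.google.com/
https://classroom.google.com/
https://wallet.google.com/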

List the URLs in the urls.txt file

At the time of writing this article, the API was having some uptime issues - the final URLs you will see in the report differ from this image - just FYI.

Step 4: Write the first version

And here's the boilerplate code. With a script this simple, we will only be making small improvements along the way.

V1.0
import requests
import os

from dotenv import load_dotenv
# Import the PSI API key from the .env file
load_dotenv()


def check_core_web_vitals(api_key, url_list):
    with open(url_list, 'r') as file:
        url_list = file.read().splitlines()

    # This is where the magic happens
    for url in url_list:
        try:
            response = requests.get(
                f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&key={api_key}'
            )
            response.raise_for_status()  # Will raise an exception for HTTP errors
            result = response.json()
            print(result)

        except requests.exceptions.RequestException as e:
            print(f"Failed to retrieve data for {url}: {e}")


if __name__ == '__main__':
    url_file = 'urls.txt'
    check_core_web_vitals(os.environ['API_KEY'], url_file)
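A quick side note: the f-string above drops the URL straight into the query string, which works for simple URLs. If your URLs contain query parameters or special characters, you may want to let requests build and encode the query string instead - a minimal variant of the same call:

Optional: let requests build the query string
response = requests.get(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
    params={'url': url, 'key': api_key},  # requests handles the URL encoding
)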
Run it
python psi_automator.py
Large JSON response object

And now our terminal gets filled with data. The response that PSI sends (on a successful call) is a massive and detailed JSON object - for the purposes of this article, we will focus only on the metrics themselves. Let's take a look at the response object:

PSI API Response Object simplified
{
  "captchaResult": string,
  "kind": "pagespeedonline#result",
  "id": string,
  "loadingExperience": {
    "id": string,
    "metrics": {
      (key): {
        "percentile": integer,
        "distributions": [
          {
            "min": integer,
            "max": integer,
            "proportion": double
          }
        ],
        "category": string
      }
    },
    "overall_category": string,
    "initial_url": string
  },
  "originLoadingExperience": {
    "id": string,
    "metrics": {
      (key): {
        "percentile": integer,
        "distributions": [
          {
            "min": integer,
            "max": integer,
            "proportion": double
          }
        ],
        "category": string
      }
    },
    "overall_category": string,
    "initial_url": string
  },
  "lighthouseResult": {
    "requestedUrl": string,
    "finalUrl": string,
    "lighthouseVersion": string,
    "userAgent": string,
    "fetchTime": string,
    "environment": {
      "networkUserAgent": string,
      "hostUserAgent": string,
      "benchmarkIndex": double
    },
    "runWarnings": [
      (value)
    ],
    "configSettings": {
      "emulatedFormFactor": string,
      "locale": string,
      "onlyCategories": (value),
      "onlyCategories": (value)
    },
    "audits": {
      (key): {
        "id": string,
        "title": string,
        "description": string,
        "score": (value),
        "score": (value),
        "scoreDisplayMode": string,
        "displayValue": string,
        "explanation": string,
        "errorMessage": string,
        "warnings": (value),
        "warnings": (value),
        "details": {
          (key): (value)
        }
      }
    },
    "categories": {
      (key): {
        "id": string,
        "title": string,
        "description": string,
        "score": (value),
        "score": (value),
        "manualDescription": string,
        "auditRefs": [
          {
            "id": string,
            "weight": double,
            "group": string
          }
        ]
      }
    },
    "categoryGroups": {
      (key): {
        "title": string,
        "description": string
      }
    },
    "runtimeError": {
      "code": string,
      "message": string
    },
    "timing": {
      "total": double
    },
    "i18n": {
      "rendererFormattedStrings": {
        "varianceDisclaimer": string,
        "opportunityResourceColumnLabel": string,
        "opportunitySavingsColumnLabel": string,
        "errorMissingAuditInfo": string,
        "errorLabel": string,
        "warningHeader": string,
        "auditGroupExpandTooltip": string,
        "passedAuditsGroupTitle": string,
        "notApplicableAuditsGroupTitle": string,
        "manualAuditsGroupTitle": string,
        "toplevelWarningsMessage": string,
        "scorescaleLabel": string,
        "crcLongestDurationLabel": string,
        "crcInitialNavigation": string,
        "lsPerformanceCategoryDescription": string,
        "labDataTitle": string
      }
    }
  },
  "analysisUTCTimestamp": string,
  "version": {
    "major": integer,
    "minor": integer
  }
}

So, right now we want the Core Web Vitals metrics, so let's focus on those. They are found under lighthouseResult > audits.
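If you prefer raw numbers over the formatted displayValue strings we'll parse below, each of these audits also carries a numericValue field (milliseconds for LCP and TBT, a unitless score for CLS). A minimal sketch, assuming result holds the parsed JSON from the call above:

Reading numericValue instead of displayValue (sketch)
audits = result['lighthouseResult']['audits']

# numericValue is unformatted: milliseconds for LCP and TBT, unitless for CLS
lcp_ms = audits['largest-contentful-paint']['numericValue']
tbt_ms = audits['total-blocking-time']['numericValue']
cls = audits['cumulative-layout-shift']['numericValue']

print(f"LCP: {lcp_ms / 1000:.1f} s | TBT: {tbt_ms:.0f} ms | CLS: {cls:.3f}")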

So now we want to extract those metrics and save them into a JSON file.

Upgraded code
import requests
import json
from datetime import datetime
import os

from dotenv import load_dotenv
# Import the PSI API key from the .env file
load_dotenv()


def check_core_web_vitals(api_key, url_list, results_file):
    # Check if the results file already exists
    if not os.path.exists("results/"):
        os.makedirs('results/')
    results_path = f"results/{results_file}"
    if os.path.isfile(results_path):
        with open(results_path, 'r') as json_file:
            try:
                results = json.load(json_file)
            except json.JSONDecodeError:
                results = {}
    else:
        results = {}

    date_of_audit = datetime.now().strftime('%Y-%m-%d')
    if date_of_audit not in results:
        results[date_of_audit] = {}

    with open(url_list, 'r') as file:
        url_list = file.read().splitlines()

    for url in url_list:
        try:
            response = requests.get(
                f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&key={api_key}'
            )
            response.raise_for_status()  # Will raise an exception for HTTP errors
            result = response.json()
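            # displayValue includes the unit separated by a non-breaking space (e.g. "1.2\u00a0s"), so it is stripped below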
            lcp = result['lighthouseResult']['audits']['largest-contentful-paint']['displayValue'].replace('\u00a0s', '')
            tbt = result['lighthouseResult']['audits']['total-blocking-time']['displayValue'].replace('\u00a0ms', '')
            cls = result['lighthouseResult']['audits']['cumulative-layout-shift']['displayValue'].replace('\u00a0', '')

            # Update the results dictionary
            results[date_of_audit][url] = {
                'LCP': lcp,
                'TBT': tbt,
                'CLS': cls
            }
        except requests.exceptions.RequestException as e:
            print(f"Failed to retrieve data for {url}: {e}")

    # Write the updated results back into the JSON file
    with open(results_path, 'w') as json_file:
        json.dump(results, json_file, indent=4)

    return results

if __name__ == '__main__':
    url_file = 'urls.txt'
    save_results = 'cwv_results.json'
    check_core_web_vitals(os.environ['API_KEY'], url_file, save_results)
Run it again
python psi_automator.py

Wait for the script to complete and take a look at the results file

results
{
    "2023-11-09": {
        "https://developers.google.com": {
            "LCP": "1.0",
            "TBT": "290",
            "CLS": "0.011"
        },
        "https://www.instagram.com/": {
            "LCP": "2.6",
            "TBT": "200",
            "CLS": "0"
        },
        "https://fonts.google.com/": {
            "LCP": "2.0",
            "TBT": "410",
            "CLS": "0.449"
        },
        "https://families.google.com/": {
            "LCP": "1.0",
            "TBT": "10",
            "CLS": "0.002"
        },
        "https://store.google.com/": {
            "LCP": "1.2",
            "TBT": "140",
            "CLS": "0"
        },
        "https://classroom.google.com/": {
            "LCP": "1.7",
            "TBT": "30",
            "CLS": "0.039"
        },
        "https://wallet.google.com/": {
            "LCP": "0.9",
            "TBT": "50",
            "CLS": "0.168"
        }
    }
}

That's all with the scripting for now - let's start configuring Cloudflare so we can upload these results automatically for reporting.

Step 5: Set Up Cloudflare Worker

Create a Worker:

Navigate to the Workers section in your Cloudflare dashboard and select "Create Application"

Create a new Cloudflare Edge Application

Click on "Create a Worker", follow instructions and finally select to edit the worker to enter the online editing interface. Copy the code below into the worker and press "Save and Deploy" in the top right corner of the editor.

Cloudflare Worker online editing interface
Worker boilerplate
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  if (request.method === 'GET' && new URL(request.url).pathname === '/') {
    return new Response(html, {
      headers: { 'Content-Type': 'text/html' }
    })
  }

  return new Response('Not found', { status: 404 })
}

const html = `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Performance Laboratory Report</title>
  <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" rel="stylesheet">
  <style>

  #main-content {
    min-height: 100%;
    /* Equal to footer height */
    padding-bottom: 50px; 
  }

  .footer {
    height: 50px;
    background: #333;
    color: #fff;
    text-align: center;
    line-height: 50px; /* Same as height to vertically center the text */
    position: fixed;
    bottom: 0;
    width: 100%;
  }
  
    body {
      background: #f4f7f6;
      margin-top: 20px;
      height: 100%;
      margin: 0;
    }
    .container {
      background: #ffffff;
      padding: 20px;
      border-radius: 8px;
      box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
    }
    .table-responsive {
      margin-top: 20px;
    }
    h1 {
      color: #333333;
      font-size: 24px;
      font-weight: 600;
      margin-bottom: 30px;
    }
    .table {
      margin-top: 20px;
    }
    .table thead th {
      background-color: #4b79a1;
      color: #ffffff;
    }
    .table-hover tbody tr:hover {
      background-color: #f5f5f5;
    }
    #search-input {
      background: #e3e3e3;
      border: none;
      padding: 10px;
      border-radius: 20px;
      margin-bottom: 20px;
      box-shadow: inset 0 0 5px rgba(0, 0, 0, 0.1);
    }
    .footer {
      text-align: center;
      padding: 10px 0;
      background: #333;
      color: #fff;
      position: fixed;
      bottom: 0;
      width: 100%;
      font-size: 0.8em;
    }
  </style>
</head>
<body>
<div id="main-content">
  <div class="container mt-5">
    <h1>Performance Laboratory Report</h1>
    <input type="text" id="search-input" class="form-control" placeholder="Search URLs...">
    <div class="table-responsive">
      <table class="table table-hover">
        <thead class="thead-dark">
          <tr>
            <th scope="col">URL</th>
            <th scope="col">LCP</th>
            <th scope="col">TBT</th>
            <th scope="col">CLS</th>
            <th scope="col">Date</th>
          </tr>
        </thead>
        <tbody id="cwv-table-body">
          <!-- Data will be inserted here by jQuery -->
        </tbody>
      </table>
    </div>
  </div>
</div>
  <div class="footer">
    © 2023 Performance Laboratory. All rights reserved.
  </div>
  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
  <script>
    $(document).ready(function(){
      $.getJSON('https://simple-psi-automator-data.pages.dev/cwv_results.json', function(data) {
        let rows = '';
        $.each(data, function(date, urls) {
          $.each(urls, function(url, metrics) {
            rows += '<tr>' +
                      '<td>' + url + '</td>' +
                      '<td>' + metrics.LCP + '</td>' +
                      '<td>' + metrics.TBT + '</td>' +
                      '<td>' + metrics.CLS + '</td>' +
                      '<td>' + date + '</td>' +
                    '</tr>';
          });
        });
        $('#cwv-table-body').html(rows);
      });

      $('#search-input').on('keyup', function() {
        var value = $(this).val().toLowerCase();
        $("#cwv-table-body tr").filter(function() {
          $(this).toggle($(this).text().toLowerCase().indexOf(value) > -1)
        });
      });
    });
  </script>
</body>
</html>`;

Step 6: Configure Cloudflare Pages

Create a Pages Project:

Go to the Cloudflare Pages section.

Click on "Create a project".

Choose a source where your results directory will be deployed from. This could be a GitHub repository or directly uploading files.

Deploy the results Directory:

If using a GitHub repository, push the results directory to it.

Set up the build settings in Cloudflare Pages to deploy the results directory.

Start the deployment process.

Step 7: Automate Deployment

To automate deployment, we will be using Wrangler. It streamlines the process of deploying code to Cloudflare's edge network, which spans numerous global locations, allowing for faster content delivery and execution of serverless code.

By installing Wrangler globally using npm (Node Package Manager), you gain the ability to interact with your Cloudflare account directly from your terminal. The wrangler login command securely authenticates your machine with Cloudflare, eliminating the need for manual API key handling. Once authenticated, you can deploy your projects with ease.

The os.system call we add to the Python script automates the deployment: after the Core Web Vitals check completes, the script triggers Wrangler to deploy the results to Cloudflare Pages, a JAMstack platform for frontend projects. The deployment goes to the main environment of the project - named simple-psi-automator-data in my case - ensuring that the latest performance data is always available and served efficiently from the edge. This integration between the Python automation and Cloudflare's deployment tools creates a seamless workflow for monitoring and displaying website performance metrics.

npm install -g wrangler

Run wrangler login and follow the instructions to authenticate with your Cloudflare account.

Add the deployment command to the end of your __main__ block:

Add to __main__ block
os.system("npx wrangler pages deploy results/ --project-name <YOUR_PROJECT_NAME> --env main")

Step 8: Verify Everything Works

Run the script again, and check from the command line that the deployment was successful. Wrangler will output a preview URL that you can open to make sure it all works as it should.

PSI results report shown on cloudflare worker

Step 9: Hook the worker to a domain and set up Zero Trust Access

Now we are ready to move to production - we'll need a domain we control for this. We will attach a custom domain to the worker.

Configure custom domain
Add new custom domain to the worker

Next, let's see that the domain is working:
https://simple-psi-report.sentienttechnology.pro/

And for me, it is! Amazing, we are almost there.

Finally, you might want to limit access to this report to specific stakeholders, perhaps only inside your organization or team. We can use Cloudflare Zero Trust Access to accomplish this easily.

Find "Zero Trust" on the left-hand menu and click it to enter the Zero Trust dashboard.

Follow the screenshots and leave the other options at their defaults.

Add new Zero Trust Application
Select self-hosted application


Configure the domain

Configure domain or subdomain you want to use

Set the access rules

Set the authentication rules to fit your needs

From there, just continue with default settings and add the application.

Step 9.5 (OPTIONAL): Set noindex meta tags

If you do not configure Zero Trust Access, you might want to add a noindex meta tag to the report so it doesn't get picked up by Google - just add <meta name="robots" content="noindex"> to the <head> of the boilerplate HTML. This is the case for me, as I will keep the page public.

Step 10: Schedule the Python Script

Before we do this, let's add a small modification to the __main__ block of the code. We might not want to run a huge number of URLs every time, but rather break them into sets. Additionally, we might want more than one reporting platform, perhaps scoped to a specific team.

So, let's import the sys module first:

Add to top of your script
import sys
Modify the __main__ block
if __name__ == '__main__':
    url_file = sys.argv[1] if len(sys.argv) > 1 else 'urls.txt'
    save_results = sys.argv[2] if len(sys.argv) > 2 else 'cwv_results.json'
    check_core_web_vitals(os.environ['API_KEY'], url_file, save_results)
    os.system("npx wrangler pages deploy results/ --project-name simple-psi-automator-data --env main")

Now we are able to run a job like so:

python psi_automator.py input.txt output.json

If you don't pass any arguments, it will default to urls.txt and cwv_results.json. You're free to change the defaults to suit you, but remember to update the Worker boilerplate accordingly or spin up new workers.

Now, you can schedule the Python script to run at your desired frequency (e.g., daily).

Linux/Mac

On UNIX platforms, we can use cronjobs:

0 * * * * /usr/bin/python3 /path/to/psi_automator.py >> /path/to/log.txt 2>&1
30 3 * * * /usr/bin/python3 /path/to/psi_automator.py /path/to/custom_urls.txt >> /path/to/log.txt 2>&1
0 22 * * 1 /usr/bin/python3 /path/to/psi_automator.py /path/to/custom_urls.txt /path/to/custom_cwv_results.json >> /path/to/log.txt 2>&1

The examples above run the script hourly with the defaults, daily at 03:30 with a custom URL list, and every Monday at 22:00 with both a custom URL list and a custom results file. Remember to point at your virtual environment's Python interpreter if you're using one, and use absolute paths for all the files.

Windows

On Windows, you can use the Task Scheduler to run this script periodically.

Step 11: Deploy to remote server

Quite obviously, the machine has to be on for the cron job to run. This script is very lightweight, so it doesn't take many resources. However, let's make it run on a Linode so we don't need to worry about keeping a machine on. You're free to use a hosting provider of your choice, but I prefer Linode: they offer cheap, reliable servers with fixed pricing.

Sign in to Linode. Using this link will give you $100 of free credit and a referral to yours truly.

Now let's create a new Linode:

Creating a new Linode from the cloud dashboard

From there, select Ubuntu 22.04 as your operating system and pick the region closest to you.

For this project, we can select the smallest Nanode (Shared CPU) instance available, as this script is very lightweight.

Select the smallest Nanode available

Fill in a strong root password and choose whether you want a private IP and backups from the add-ons. When done, click "Create Linode" to launch your server.

Once it's completed, you will see the instance on your home page under Linodes. Under "Access" you can find the SSH command for connecting to the server. Log in through your command prompt with that SSH command. You may need to install an SSH client to do that - search for instructions specific to your operating system.

Once logged in, it's highly recommended to create a new user. Generally, avoid using the root user for everyday tasks, as it's far too powerful an account. Instead, create a new account and give it "sudo" rights - in other words, admin rights. Run the following command to create a new user and follow the prompts.

Create new user
adduser <YOUR_NEW_USERNAME>

And add the user to "sudo" group

Add user to sudo group
usermod -aG sudo <YOUR_NEW_USERNAME>

Once done, log out from the Linode with the command exit and log back in, replacing "root" in the SSH command with your new username.

Now the server has been set up successfully. Let's move the code there.

Step 11.1 Publish your code to Github

Go to GitHub and create a new repository. Also create a new personal access token if you don't have one - when using Git from the command line, GitHub uses access tokens instead of passwords.

Navigate back to your IDE and run the following commands in order:

Commands for Linux system
git init
echo .env > .gitignore
echo results/ >> .gitignore
git remote add origin <YOUR_REPO_URL>
git checkout -B main
pip freeze > requirements.txt
git add psi_automator/ .gitignore requirements.txt
git commit -m "First Commit"
git push -u origin main

Remember to replace <YOUR_REPO_URL> with your repository's address.

Step 11.2 Pull into the remote server

Login to your server if you logged out. Run the following commands:

Run line by line
mkdir myapps/ && cd myapps/
git clone <YOUR_REPO_URL>
cd <NAME_OF_REPO>
pip install -r requirements.txt
echo "API_KEY=<YOUR_PSI_API_KEY>" > .env
npm install -g wrangler

You can check with the command ls which directories are available if you have issues with the cd command. Note that a fresh Ubuntu server may not have pip or npm installed yet; sudo apt install python3-pip nodejs npm takes care of both.

This is where things get a bit different, as we want to log in to Wrangler. Running wrangler login on the remote machine prints a login URL, but the login has to be completed in a browser, which we don't have access to from the terminal.

Copy the URL and paste it into your local browser (in which you are logged in to Cloudflare). Once you authenticate, you'll end up on an error page whose URL starts with "localhost" followed by some parameters - copy that full URL.

Next, open another terminal and log in to your remote machine again, keeping the original session where you ran wrangler login active. If you close that session, you will have to start over.

With the new session open, run in command prompt:

curl <LOCALHOST_URL>

Replace <LOCALHOST_URL> with the URL you copied from your local browser. Once done, the login in the first window should go through. You can now close the second window.

From here, just run the script normally from the command line to test it. Once you confirm it's working, set up the cron jobs as explained in Step 10.

You can set up as many scheduled runs as you want, but it's good practice to space them apart. The code above is not well equipped to handle two instances running at the same time, mainly because of the shared output file: if two instances read the file at the same time, the one finishing later will not have the results of the former. We could refactor the code a bit to make it more stable, but making it truly concurrency-safe would need a surprising amount of extra code - see the sketch below for one small mitigation.
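If you do want to harden it a little, here's a minimal sketch of one option (Unix-only, using the standard-library fcntl module; the lock file name is just an example): hold an exclusive lock around the whole read-modify-write of the results file, so a second instance simply waits its turn.

Locking the results file (sketch)
import fcntl
import json

def save_results_locked(results_path, new_entries):
    # Take an exclusive lock for the whole read-modify-write cycle so that
    # two concurrent runs cannot overwrite each other's results.
    with open(results_path + '.lock', 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            with open(results_path, 'r') as json_file:
                results = json.load(json_file)
        except (FileNotFoundError, json.JSONDecodeError):
            results = {}
        results.update(new_entries)
        with open(results_path, 'w') as json_file:
            json.dump(results, json_file, indent=4)
        # The lock is released when lock_file is closed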

Next Steps

This is, yet again, a highly extensible script - you can add dynamic logic to the worker to display multiple projects, for example scoping them to separate tabs.

The design of the reporting platform could perhaps be improved, or comparison views added.

You can also read up on the PSI API documentation and collect more metrics - whatever is relevant to your use case and needs - and report on those.

Sky's the limit, really. Thanks for reading!

