While I'm not a huge fan of laboratory testing, I have to admit it can be useful for benchmarking. But who wants to run an endless number of PSI tests and save the results in a Google Sheets file? Even when working with a single product, you might want to run performance tests on all pages instead of just the homepage.
Here's a small tutorial on how to (pretty much) 100% automate this process. Here is an example. The implementation is dead simple, and as we will be relying on Cloudflare here, it's free as well. Do I have to say more?
Let's dive in.
We're building a system that automates the monitoring of CWV laboratory tests using the PageSpeed Insights API, with results deployed and displayed via a Cloudflare Worker + Pages combo.
To get started with the PageSpeed Insights API, you'll need to obtain an API key from the Google Cloud Platform. If you have an existing key, for example from Screaming Frog integration, that will work just as well here.
If you don't, here's how to get one:
Navigate to the Google Cloud Console. Register if needed.
Click on the project drop-down at the top of the page and then click on "New Project".
Give your project a name, such as "Core Web Vitals Checker", and click "Create".
Enable the PageSpeed Insights API:
Click the burger menu in the top left corner and navigate to APIs & Services.
Click on "Enable APIs and Services".
In the API Library, search for "PageSpeed Insights API"
Click on the API from the results and then click "Enable".
After enabling the API, click on "Create Credentials" in the API page.
Choose "API key" from the options presented.
Your new API key will be created and displayed to you.
Restrict the API Key (Optional but Recommended):
For security, it's a good practice to restrict your API key so that it can only be used by your applications.
Click on "Restrict Key" after your API key is created.
Under "API restrictions", select "Restrict key" and choose "PageSpeed Insights API" from the dropdown.
Fire up the IDE of your choice - I'll be using PyCharm.
Here's a preview of the directory structure we will be using for this project:
/psi_automator/
|-- psi_automator.py
|-- urls.txt
|-- .env
|-- /results/
|----- cwv_results.json
pip install requests python-dotenv
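If you like to keep dependencies isolated from your system Python, you can create and activate a virtual environment before installing - a quick sketch (the environment name venv is arbitrary):

python3 -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate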
Let's add the API key we made into the .env file like so:
API_KEY=YOURAPIKEY123123123123
Note that in the .env file we don't wrap strings in quotes like we would in Python files.
A word about .env files. You might ask, why are we using those instead of putting the API key directly into code? While it might be easier, we want to keep sensitive information away from the code.
Also, if you use GitHub, it's easy to gitignore .env files.
Overall, keeping sensitive and environment-specific values (e.g. you might want separate API keys for testing and production) is much easier with a .env file, as we don't have to change any code when deploying.
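For reference, this is roughly how the key travels from the .env file into the script with python-dotenv - a minimal sketch of what the full script below does:

import os
from dotenv import load_dotenv

load_dotenv()                      # Reads .env from the working directory
api_key = os.environ["API_KEY"]    # Raises a KeyError if the key is missing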
Create a text file named urls.txt.
Populate it with the URLs you want to monitor, one per line.
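For illustration, urls.txt could look something like this (the URLs here are just examples - use your own):

https://www.example.com/
https://www.example.com/blog/
https://www.example.com/pricing/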
At the time of writing this article, the API was having some uptime issues - the final URLs you will see in the report differ from this image - just FYI.
And here's the boilerplate code. With a dirt-simple script like this, we will only be making small improvements along the way.
import requests
import os
from dotenv import load_dotenv

# Import the PSI API key from the .env file
load_dotenv()


def check_core_web_vitals(api_key, url_list):
    with open(url_list, 'r') as file:
        url_list = file.read().splitlines()

    # This is where the magic happens
    for url in url_list:
        try:
            response = requests.get(
                f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&key={api_key}'
            )
            response.raise_for_status()  # Will raise an exception for HTTP errors
            result = response.json()
            print(result)
        except requests.exceptions.RequestException as e:
            print(f"Failed to retrieve data for {url}: {e}")


if __name__ == '__main__':
    url_file = 'urls.txt'
    check_core_web_vitals(os.environ['API_KEY'], url_file)
python psi_automator.py
And now our terminals got filled with data. The response that PSI sends (on a successful call) is quite a massive and detailed JSON object - for the purposes of this article, we will only focus on the metrics themselves. Let's take a look at the response object:
{ "captchaResult": string, "kind": "pagespeedonline#result", "id": string, "loadingExperience": { "id": string, "metrics": { (key): { "percentile": integer, "distributions": [ { "min": integer, "max": integer, "proportion": double } ], "category": string } }, "overall_category": string, "initial_url": string }, "originLoadingExperience": { "id": string, "metrics": { (key): { "percentile": integer, "distributions": [ { "min": integer, "max": integer, "proportion": double } ], "category": string } }, "overall_category": string, "initial_url": string }, "lighthouseResult": { "requestedUrl": string, "finalUrl": string, "lighthouseVersion": string, "userAgent": string, "fetchTime": string, "environment": { "networkUserAgent": string, "hostUserAgent": string, "benchmarkIndex": double }, "runWarnings": [ (value) ], "configSettings": { "emulatedFormFactor": string, "locale": string, "onlyCategories": (value), "onlyCategories": (value) }, "audits": { (key): { "id": string, "title": string, "description": string, "score": (value), "score": (value), "scoreDisplayMode": string, "displayValue": string, "explanation": string, "errorMessage": string, "warnings": (value), "warnings": (value), "details": { (key): (value) } } }, "categories": { (key): { "id": string, "title": string, "description": string, "score": (value), "score": (value), "manualDescription": string, "auditRefs": [ { "id": string, "weight": double, "group": string } ] } }, "categoryGroups": { (key): { "title": string, "description": string } }, "runtimeError": { "code": string, "message": string }, "timing": { "total": double }, "i18n": { "rendererFormattedStrings": { "varianceDisclaimer": string, "opportunityResourceColumnLabel": string, "opportunitySavingsColumnLabel": string, "errorMissingAuditInfo": string, "errorLabel": string, "warningHeader": string, "auditGroupExpandTooltip": string, "passedAuditsGroupTitle": string, "notApplicableAuditsGroupTitle": string, "manualAuditsGroupTitle": string, "toplevelWarningsMessage": string, "scorescaleLabel": string, "crcLongestDurationLabel": string, "crcInitialNavigation": string, "lsPerformanceCategoryDescription": string, "labDataTitle": string } } }, "analysisUTCTimestamp": string, "version": { "major": integer, "minor": integer } }
Right now we want the Core Web Vitals metrics, so let's focus on those. They are found under lighthouseResult > audits.
So now we want to extract those metrics and save them into a JSON file.
import requests
import json
from datetime import datetime
import os
from dotenv import load_dotenv

# Import the PSI API key from the .env file
load_dotenv()


def check_core_web_vitals(api_key, url_list, results_file):
    # Create the results directory if it doesn't exist yet
    if not os.path.exists("results/"):
        os.makedirs('results/')

    # Check if the results file already exists
    results_path = f"results/{results_file}"
    if os.path.isfile(results_path):
        with open(results_path, 'r') as json_file:
            try:
                results = json.load(json_file)
            except json.JSONDecodeError:
                results = {}
    else:
        results = {}

    date_of_audit = datetime.now().strftime('%Y-%m-%d')
    if date_of_audit not in results:
        results[date_of_audit] = {}

    with open(url_list, 'r') as file:
        url_list = file.read().splitlines()

    for url in url_list:
        try:
            response = requests.get(
                f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&key={api_key}'
            )
            response.raise_for_status()  # Will raise an exception for HTTP errors
            result = response.json()

            lcp = result['lighthouseResult']['audits']['largest-contentful-paint']['displayValue'].replace('\u00a0s', '')
            tbt = result['lighthouseResult']['audits']['total-blocking-time']['displayValue'].replace('\u00a0ms', '')
            cls = result['lighthouseResult']['audits']['cumulative-layout-shift']['displayValue'].replace('\u00a0', '')

            # Update the results dictionary
            results[date_of_audit][url] = {
                'LCP': lcp,
                'TBT': tbt,
                'CLS': cls
            }
        except requests.exceptions.RequestException as e:
            print(f"Failed to retrieve data for {url}: {e}")

    # Write the updated results back into the JSON file
    with open(results_path, 'w') as json_file:
        json.dump(results, json_file, indent=4)

    return results


if __name__ == '__main__':
    url_file = 'urls.txt'
    save_results = 'cwv_results.json'
    check_core_web_vitals(os.environ['API_KEY'], url_file, save_results)
python psi_automator.py
Wait for the script to complete and take a look at the results file:
{ "2023-11-09": { "https://developers.google.com": { "LCP": "1.0", "TBT": "290", "CLS": "0.011" }, "https://www.instagram.com/": { "LCP": "2.6", "TBT": "200", "CLS": "0" }, "https://fonts.google.com/": { "LCP": "2.0", "TBT": "410", "CLS": "0.449" }, "https://families.google.com/": { "LCP": "1.0", "TBT": "10", "CLS": "0.002" }, "https://store.google.com/": { "LCP": "1.2", "TBT": "140", "CLS": "0" }, "https://classroom.google.com/": { "LCP": "1.7", "TBT": "30", "CLS": "0.039" }, "https://wallet.google.com/": { "LCP": "0.9", "TBT": "50", "CLS": "0.168" } }
That's all with the scripting for now - let's start configuring Cloudflare so we can upload these results automatically for reporting.
Create a Worker:
Navigate to the Workers section in your Cloudflare dashboard and select "Create Application"
Click on "Create a Worker", follow instructions and finally select to edit the worker to enter the online editing interface. Copy the code below into the worker and press "Save and Deploy" in the top right corner of the editor.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  if (request.method === 'GET' && new URL(request.url).pathname === '/') {
    return new Response(html, {
      headers: { 'Content-Type': 'text/html' }
    })
  }
  return new Response('Not found', { status: 404 })
}

const html = `<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Performance Laboratory Report</title>
    <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" rel="stylesheet">
    <style>
        #main-content {
            min-height: 100%; /* Equal to footer height */
            padding-bottom: 50px;
        }
        .footer {
            height: 50px;
            background: #333;
            color: #fff;
            text-align: center;
            line-height: 50px; /* Same as height to vertically center the text */
            position: fixed;
            bottom: 0;
            width: 100%;
        }
        body {
            background: #f4f7f6;
            margin-top: 20px;
            height: 100%;
            margin: 0;
        }
        .container {
            background: #ffffff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
        }
        .table-responsive {
            margin-top: 20px;
        }
        h1 {
            color: #333333;
            font-size: 24px;
            font-weight: 600;
            margin-bottom: 30px;
        }
        .table {
            margin-top: 20px;
        }
        .table thead th {
            background-color: #4b79a1;
            color: #ffffff;
        }
        .table-hover tbody tr:hover {
            background-color: #f5f5f5;
        }
        #search-input {
            background: #e3e3e3;
            border: none;
            padding: 10px;
            border-radius: 20px;
            margin-bottom: 20px;
            box-shadow: inset 0 0 5px rgba(0, 0, 0, 0.1);
        }
        .footer {
            text-align: center;
            padding: 10px 0;
            background: #333;
            color: #fff;
            position: fixed;
            bottom: 0;
            width: 100%;
            font-size: 0.8em;
        }
    </style>
</head>
<body>
<div id="main-content">
    <div class="container mt-5">
        <h1>Performance Laboratory Report</h1>
        <input type="text" id="search-input" class="form-control" placeholder="Search URLs...">
        <div class="table-responsive">
            <table class="table table-hover">
                <thead class="thead-dark">
                <tr>
                    <th scope="col">URL</th>
                    <th scope="col">LCP</th>
                    <th scope="col">TBT</th>
                    <th scope="col">CLS</th>
                    <th scope="col">Date</th>
                </tr>
                </thead>
                <tbody id="cwv-table-body">
                <!-- Data will be inserted here by jQuery -->
                </tbody>
            </table>
        </div>
    </div>
</div>
<div class="footer">
    © 2023 Performance Laboratory. All rights reserved.
</div>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script>
    $(document).ready(function(){
        $.getJSON('https://simple-psi-automator-data.pages.dev/cwv_results.json', function(data) {
            let rows = '';
            $.each(data, function(date, urls) {
                $.each(urls, function(url, metrics) {
                    rows += '<tr>' +
                        '<td>' + url + '</td>' +
                        '<td>' + metrics.LCP + '</td>' +
                        '<td>' + metrics.TBT + '</td>' +
                        '<td>' + metrics.CLS + '</td>' +
                        '<td>' + date + '</td>' +
                        '</tr>';
                });
            });
            $('#cwv-table-body').html(rows);
        });

        $('#search-input').on('keyup', function() {
            var value = $(this).val().toLowerCase();
            $("#cwv-table-body tr").filter(function() {
                $(this).toggle($(this).text().toLowerCase().indexOf(value) > -1)
            });
        });
    });
</script>
</body>
</html>`;
Create a Pages Project:
Go to the Cloudflare Pages section.
Click on "Create a project".
Choose a source where your results directory will be deployed from. This could be a GitHub repository or directly uploading files.
Deploy the results Directory:
If using a GitHub repository, push the results directory to it.
Set up the build settings in Cloudflare Pages to deploy the results directory.
Start the deployment process.
To automate deployment, we will be using Wrangler. It streamlines the process of deploying code to Cloudflare's edge network, which spans numerous global locations, allowing for faster content delivery and execution of serverless code.
By installing Wrangler globally using npm (Node Package Manager), you gain the ability to interact with your Cloudflare account directly from your terminal. The wrangler login
command securely authenticates your machine with Cloudflare, eliminating the need for manual API key handling. Once authenticated, you can deploy your projects with ease.
The Python script modification with the os.system
command allows for the automation of deployment. After the Core Web Vitals check is completed, the script triggers Wrangler to deploy the results to Cloudflare Pages, a JAMstack platform for frontend projects. This deployment is done in the main
environment of the project named simple-psi-automator-data
, ensuring that the latest performance data is always available and served efficiently from the edge. This integration between Python automation and Cloudflare's deployment tools creates a seamless workflow for monitoring and displaying website performance metrics.
npm install -g @cloudflare/wrangler
Run wrangler login
and follow the instructions to authenticate with your Cloudflare account.
Add the deployment command to the end of your __main__ block:
os.system("npx wrangler pages deploy results/ --project-name <YOUR_PROJECT_NAME> --env main")
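os.system is the simplest option; if you want the script to actually notice a failed deployment, a subprocess-based variant is one alternative - a sketch, using the same placeholder project name:

import subprocess

# check=True raises CalledProcessError if Wrangler exits with a non-zero code
subprocess.run(
    ["npx", "wrangler", "pages", "deploy", "results/",
     "--project-name", "<YOUR_PROJECT_NAME>", "--env", "main"],
    check=True,
)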
Run the script again, and check from the command line that the deployment was successful. Wrangler will output a preview URL that you can open to make sure everything works as it should.
Now we are ready to move to production - we'll need a domain we control for this, and we will set a custom domain for the report.
Next, let's see that the domain is working:
https://simple-psi-report.sentienttechnology.pro/
And for me, it is! Amazing, we are almost there.
Finally, you might want to limit access to this report to only specific stakeholders, perhaps inside your organization or team only. We can use CF Zero Trust firewall to accomplish this easily.
Find "Zero Trust" on the left-hand menu and click it to enter the Zero Trust dashboard.
Follow the screenshots and leave the other options at their defaults.
Select self-hosted application
Configure domain or subdomain you want to use
Set the authentication rules to fit your needs
From there, just continue with default settings and add the application.
If you do not configure the Zero Trust firewall, you might want to set a noindex meta tag for the report so it doesn't get picked up by Google. Just update the boilerplate HTML for that. This is the case for me, as I will keep the page public.
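If you go that route, the tag itself is just one line added to the <head> of the boilerplate HTML:

<meta name="robots" content="noindex">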
Before we do this, let's add a small modification to the __main__ block of the code. We might not want to run a huge number of URLs all the time, but rather break them into sets. Additionally, we might want more than one reporting platform, perhaps scoped to a specific team.
So, let's import the sys module first:
import sys
if __name__ == '__main__':
    url_file = sys.argv[1] if len(sys.argv) > 1 else 'urls.txt'
    save_results = sys.argv[2] if len(sys.argv) > 2 else 'cwv_results.json'
    check_core_web_vitals(os.environ['API_KEY'], url_file, save_results)
    os.system("npx wrangler pages deploy results/ --project-name simple-psi-automator-data --env main")
Now we are able to run a job like so:
python psi_automator.py input.txt output.json
If you don't pass arguments, it will default to urls.txt and cwv_results.json. You're free to change the defaults to suit you, but remember to update the Worker boilerplate accordingly (the $.getJSON URL has to point at the right results file) or spin up new Workers.
Now, you can schedule the Python script to run at your desired frequency (e.g., daily).
On UNIX platforms, we can use cronjobs:
0 * * * * /usr/bin/python3 /path/to/psi_automator.py >> /path/to/log.txt 2>&1
30 3 * * * /usr/bin/python3 /path/to/psi_automator.py /path/to/custom_urls.txt >> /path/to/log.txt 2>&1
0 22 * * 1 /usr/bin/python3 /path/to/psi_automator.py /path/to/custom_urls.txt /path/to/custom_cwv_results.json >> /path/to/log.txt 2>&1
Remember to specify your virtual environment if you're using one, and use absolute paths for all the files.
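For example, if the project lives in a virtual environment, the crontab entry can point at the venv's interpreter directly - a sketch with made-up paths:

0 6 * * * /path/to/venv/bin/python /path/to/psi_automator.py >> /path/to/log.txt 2>&1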
On Windows, you can use Task Scheduler to set this script up to run periodically.
Quite obviously, the machine has to be on for the cronjob to run. This script is very lightweight, so it doesn't exactly take many resources to run. However, in this case, let's make it run on a Linode so we don't need to worry about it. You're free to use a hosting provider of your choice, but I prefer Linode: they offer quite cheap and reliable servers with fixed pricing.
Sign in to Linode. Using this link will give you $100 of free credit and a referral to yours truly.
Now let's create a new linode:
From there, select Ubuntu 22.04 as your operating system and pick the region closest to you.
For this project, we can select the smallest Nanode (Shared CPU) instance available, as this script is very lightweight.
Fill in a strong root password for yourself and select if you want a private IP and backups from the add-ons. When done, click on "Create Linode" to launch your server.
Once completed, you will see the instance on your home page under Linodes. Under "Access" you can see the SSH command for connecting to the server. Log in through your command prompt with that SSH command. You may need to install SSH to do that - search for instructions specific to your operating system.
Once logged in, it's highly recommended to create a new user. Generally, avoid using the root user for regular work, as it's far too powerful an account. Instead, we create a new account and give it "sudo" (admin) rights. Run the following command to create a new user and follow the prompts.
adduser <YOUR_NEW_USERNAME>
And add the user to the "sudo" group:
usermod -aG sudo <YOUR_NEW_USERNAME>
Once done, log out from the Linode with the command exit and log back in, but replace "root" in the SSH command with your new username.
Now the server has been set up successfully. Let's move the code there.
Go to GitHub and create a new repository. Also create a new access token if you don't have one - when using git from the command line, we use access tokens instead of passwords.
Navigate back to your IDE and run the following commands in order:
git init
echo .env > .gitignore
echo results/ >> .gitignore
git remote add origin <YOUR_REPO_URL>
git checkout -B main
pip freeze > requirements.txt
git add psi_automator/ .gitignore requirements.txt
git commit -m "First Commit"
git push -u origin main
Remember to replace <YOUR_REPO_URL> with whatever your repository address is.
Login to your server if you logged out. Run the following commands:
mkdir myapps/ && cd myapps/
git clone <YOUR_REPO_URL>
cd <NAME_OF_REPO>
pip install -r requirements.txt
echo API_KEY=<YOUR_PSI_API_KEY> > .env
npm install -g @cloudflare/wrangler
You can check what directories are available with the ls command if you have issues with cd.
This is where things get a bit different, as we want to log in to Wrangler. The login URL is printed on the remote machine, but we need to complete the login in a browser, which we don't have access to from the terminal.
Copy the URL and paste it into your local browser (in which you are logged in to Cloudflare). Once you authenticate, you'll land on an error page whose URL starts with "localhost" followed by some parameters. Copy that URL.
Next, open another terminal and log in to your remote machine again, keeping the original session where you ran wrangler login still active. If you close that session, you will have to start over.
With the new session open, run in command prompt:
curl <LOCALHOST_URL>
Replace <LOCALHOST_URL> with the URL you copied from your local browser. Once done, the login session in the first window should go through, and you can close the second window.
From here, just run the script normally from the command line to test it. Once you confirm it's working, set up cronjobs as previously explained in Step 10.
You can set up as many files as you want, but it's good practice to space the jobs apart. The code above is not well equipped to handle two instances running at the same time, mainly because of the output file: if two instances read the file at the same time, the one that finishes later will not have the results of the other. We could refactor the code a bit to make it more stable, but to make it truly concurrency-safe, we would need a surprising amount of extra code.
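If you do want two scheduled jobs writing to the same results file, one lightweight option on Linux is an advisory file lock around the read-modify-write section. A rough sketch with fcntl (UNIX-only, and the lock file path is arbitrary):

import fcntl

# Hold an exclusive lock while reading and rewriting the results file,
# so a second instance waits instead of clobbering the first one's data.
with open("results/.cwv_results.lock", "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)   # Blocks until the lock is free
    try:
        pass  # read cwv_results.json, update it, write it back
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)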
This is, yet again, a highly scalable script - you can add dynamic logic to the Worker to display multiple projects, for example scoping them to separate tabs.
The design of the reporting platform could perhaps be improved, or comparison methods added.
You can also dig into the PSI API documentation, collect more metrics relevant to your use case and needs, and report on those.
Sky's the limit, really. Thanks for reading!