Rethinking Lighthouse Performance

Arto Kylmanen

Sept. 1, 2023


Testing websites' technical performance in a laboratory setting is a complicated task. Many of the technologies that affect how fast a webpage loads are highly non-deterministic, and that fact renders many lab-based tools inconsistent and somewhat unreliable.

For as long as I have been a Technical SEO Expert, I've battled misconceptions about technical performance, since well before Core Web Vitals made these concepts surface in the SEO world.

Considering the Lighthouse performance score (or any single SEO score, for that matter) as gospel is a risky endeavor. From an organizational perspective, where we want to measure progress as a single integer, sum, or percentage, it makes sense: it's easier to demonstrate progress to management or clients without SEO knowledge using a simple score. But that score does not reveal the intricacies, the pain points, or the correlation with conversion.

By this time, it's fairly well known that Google does not run lab performance tests against websites for ranking purposes. That is precisely because they understand the highly non-deterministic nature of the internet and the unreliability it brings.

Problem 1: Playing the score / overoptimizing

Especially when it comes to some SEO agencies and freelancers, this is a common and unfortunate situation. When Lighthouse is used as the only way to measure performance improvements, more often than not the site is optimized for the tool, not for the users. This translates to success on paper but not in the real world. A near-perfect Lighthouse score is no guarantee of a solid Page Experience pass and the subsequent ranking benefit.

This leads companies and webmasters to waste money just to receive a prettier score, sometimes with little to none of the actual ranking, traffic, and conversion benefits that true speed optimization brings.

Problem 2: The Non-deterministic Nature of the Web

Non-determinism means that the outcome of a process is not predictable. While it can be statistically averaged out, each individual instance turns out differently.

Loading a website is somewhat akin to rolling a die. There are countless invisible factors that affect network speed and device performance. While some can be accounted for, others are virtually impossible to mitigate.
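
Since each run is a single roll, one practical mitigation is to run the test several times and report the median rather than trusting any individual score. Here is a minimal sketch using the lighthouse and chrome-launcher npm packages; the URL and run count are placeholders:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const URL = 'https://example.com'; // placeholder, swap in your page
const RUNS = 5;

async function medianPerformanceScore(): Promise<number> {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const scores: number[] = [];
  try {
    for (let i = 0; i < RUNS; i++) {
      const result = await lighthouse(URL, {
        port: chrome.port,
        onlyCategories: ['performance'],
      });
      // The category score is 0..1; scale it to the familiar 0..100.
      scores.push((result?.lhr.categories.performance.score ?? 0) * 100);
    }
  } finally {
    await chrome.kill();
  }
  scores.sort((a, b) => a - b);
  return scores[Math.floor(scores.length / 2)]; // median of the rolls
}

medianPerformanceScore().then((m) => console.log(`Median score: ${m}`));

The median damps single-run outliers, but it cannot remove a bottleneck that persists across every run - keep that in mind for the variables below.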

Let's take a look at the different aspects that affect page loading, how Lighthouse attempts to mitigate them, and how successful those attempts are.

Local network variability

Local network variability refers to the connection from the device to the gateway, e.g. a router or a mobile network tower. Where you are in relation to the gateway affects the speed and reliability of the connection. We all have that one spot in the house where Wi-Fi just doesn't reach for some reason.

Another example is simply the difference in connection type. If one person is using a 3G connection and another a 5G connection, the page experience will be wildly different between them. Different countries also have different average network speeds, which Lighthouse does not account for.

Lighthouse attempts to mitigate this by analyzing network traffic aspects such as changes in speed and signal strength as well as packet loss, and by simulating a fixed connection profile. Mitigation for this variable can be considered sufficient.
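
With simulated throttling, Lighthouse models one fixed connection instead of relying on whatever your Wi-Fi happens to be doing. Here is a sketch of pinning that profile explicitly; the values roughly approximate a slow 4G connection and are assumptions you should tune:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function runWithFixedNetwork(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(url, { port: chrome.port }, {
    extends: 'lighthouse:default',
    settings: {
      throttlingMethod: 'simulate', // model the network instead of using the live one
      throttling: {
        rttMs: 150,               // simulated round-trip time
        throughputKbps: 1638.4,   // roughly 1.6 Mbps downlink
        cpuSlowdownMultiplier: 4, // relevant to the hardware section below
      },
    },
  });
  await chrome.kill();
  return result?.lhr;
}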

Tier-1 network variability

In the simplest terms, the Tier-1 network refers to the backbone of the general internet we use most of the time. Several aspects of it are largely invisible and uncontrollable: routing, DNS server availability (for cold loads), differences between CDN endpoints, network maintenance, weather, and even conditions in space.

Lighthouse claims to have mitigated this successfully, but I remain sceptical of the accuracy of a single test, simply because of its duration. If you run a test while a major network point is congested or under maintenance (or there is a solar storm going on), Lighthouse will not be able to compensate if the bottleneck doesn't change during the test.

Web server variability

A web server, in the end, is just a computer, susceptible to the same downfalls as your local device. It has limits on its CPU power.

A great example of this is peak traffic hours. From a DevOps perspective, we mitigate this with automatic scaling and by spreading the load using CDNs or sharding.

This variable is particularly pronounced on small and shared servers.

Lighthouse is not able to mitigate this at all: it has no access to server status, so it cannot take it into account.

Client hardware variability

This is probably the biggest one. We all have different devices. Loading a web app on a Galaxy S8 and on an iPhone 13 is going to produce wildly different results. We're talking not just about raw CPU power, but also upgrades to network chips and improvements in loss correction on the software side.

While websites are considerably lighter than computer games and desktop software, the concept is the same. Try running the latest big-name game or Adobe After Effects on a five-year-old laptop.

Lighthouse can partially mitigate this aspect with some simple algorithmic approaches, such as applying a CPU slowdown multiplier. However, it's only a partial mitigation because it cannot reliably account for the age of the device. Devices tend to slow down over time due to aging, hardware degradation, and increasingly resource-intensive software updates.
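
Lighthouse also records a benchmarkIndex for the host machine in every report, and logging it next to the score is a cheap way to spot when a "slower site" was really a slower or busier test machine. A sketch, using the same packages as above:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function scoreWithHostContext(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(url, {
    port: chrome.port,
    onlyCategories: ['performance'],
  });
  await chrome.kill();
  const lhr = result?.lhr;
  console.log({
    score: (lhr?.categories.performance.score ?? 0) * 100,
    benchmarkIndex: lhr?.environment.benchmarkIndex, // rough proxy for host CPU speed
  });
}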

Client resource contention

This is the client-side counterpart of web server variability. Run Screaming Frog at max threads while trying to have a smooth browsing experience. Especially on slow networks, the difference is easily noticeable.

Lighthouse can partially mitigate this by reading hardware statistics, but this aspect is highly non-deterministic by nature: if a heavy program is running, CPU usage oscillates, which leaves things to luck - can the browser squeeze in the page rendering work at just the right time? Additionally, for security reasons, apps cannot simply get access to the internals of your hardware by default.

Browser nondeterminism

This is basically the same as above, but scoped to the browser. Did another website just update itself in the background? Perhaps the browser is playing music, or running something heavier, such as chat apps? Are the installed (SEO) plugins heavy, analyzing the page on each load and skewing the results?

This is one of the reasons many SEOs know that it's better to run Lighthouse in incognito mode.

While Lighthouse can access some statistics about the browser, overall the mitigation for this variable is at best partial. As with client resource contention, for security reasons apps don't get unchecked access to the internals of your browser.
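
In scripted runs you can make the incognito advice explicit by launching a throwaway Chrome with no extensions or profile state. A sketch; note that chrome-launcher already uses a temporary profile by default, so the flags below merely make the isolation explicit:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function cleanProfileRun(url: string) {
  const chrome = await chromeLauncher.launch({
    // fresh temp profile, no extensions, no other tabs competing for resources
    chromeFlags: ['--headless', '--incognito', '--disable-extensions'],
  });
  const result = await lighthouse(url, { port: chrome.port });
  await chrome.kill();
  return result?.lhr;
}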

Page nondeterminism

This affects every dynamic page, which is most pages in this day and age. Things such as A/B tests or geo-specific content serving affect the results.

Lighthouse can partially mitigate this IF it detects the non-determinism, such as an A/B test loading on one run and not on the next. For geo-based results, however, it is locked to the location it is run from and cannot compensate for dynamic aspects.
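
If your testing tool lets you force a variant via a cookie or header, you can pin it so the A/B swap stops moving the numbers between runs. A sketch using Lighthouse's extraHeaders setting - the cookie and header names below are hypothetical, so use whatever your A/B tool actually reads:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function runPinnedVariant(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(url, { port: chrome.port }, {
    extends: 'lighthouse:default',
    settings: {
      extraHeaders: {
        // Hypothetical names: force variant A on every request.
        Cookie: 'ab_variant=A',
        'X-AB-Variant': 'A',
      },
    },
  });
  await chrome.kill();
  return result?.lhr;
}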

Problem 3: Poor user representation

Now that we've discussed the general aspects of the unreliability of laboratory settings in Lighthouse testing, let's move on to the next challenge: Typically, only one page is tested, and more often than not, it's the homepage.

That would be accurate (if we forget the non-determinism we just talked about) if you had a single-page website. That is not true for the majority of websites, and it's another area where I sometimes see lapses in thinking: the homepage is tested and optimized, while the other pages go untested and forgotten.

This hardly gives us any insight into how users receive and perceive the website. Basically, what we are doing is loading the homepage, doing nothing, and leaving. In essence, we're only assessing what bounced sessions look like.

Users do a lot more than bounce, though. They navigate, browse, and interact with the site. While the initial load and Core Web Vitals shape the user's first impression, it's generally the experience of actually using the site that makes it valuable and pleasant.
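
At minimum, audit one representative page per template rather than the homepage alone. A sketch - the URL list is a placeholder for your own templates:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const PAGES = [
  'https://example.com/',                  // homepage
  'https://example.com/category/example',  // category template
  'https://example.com/product/example',   // product template
  'https://example.com/blog/example-post', // article template
];

async function auditTemplates() {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  for (const url of PAGES) {
    const result = await lighthouse(url, {
      port: chrome.port,
      onlyCategories: ['performance'],
    });
    const score = (result?.lhr.categories.performance.score ?? 0) * 100;
    console.log(`${url}: ${score}`);
  }
  await chrome.kill();
}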

Another consideration in this scope is the CLS score - layout shifts can happen well below the viewport too, which is a factor Lighthouse cannot accommodate. If you've ever hit a stone wall where the analysis tool shows no CLS but Search Console's Page Experience report is lit up like a Christmas tree, this is one of the possible reasons.

So, what's the solution?

If you are in a position where you consider the Lighthouse performance score an important SEO metric or development KPI, don't. Whether you are a product owner, a client reviewing work, or a developer working on a site, by itself it is a poor metric for demonstrating real success.

Take the same approach as with SEO in general - changes are not instant, and time will tell. If you revamp the content on your homepage and optimize for new keywords, keyword density or some arbitrary content score is not going to tell you much. What you do is wait for rankings on the relevant keywords. That is your success metric.

When it comes to technical performance, you should focus your KPI benchmarks on two things: the Page Experience report in Google Search Console, and a custom RUM (real user monitoring) setup if you want to see results faster.

With the former, you will see improvement within 28 days of a deploy at most, since the underlying field data is a 28-day rolling window; with the latter, you should see changes faster. Nevertheless, keep an eye on whether your RUM data correlates with the Page Experience report, especially after a fresh setup.
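
For the RUM option, a minimal client-side collector can be built on the web-vitals library, which reports the same metrics Page Experience is based on. A sketch - the /rum endpoint is hypothetical, so point it at wherever you store field data:

import { onCLS, onFID, onLCP, type Metric } from 'web-vitals';

function send(metric: Metric) {
  const body = JSON.stringify({
    name: metric.name,   // 'CLS' | 'FID' | 'LCP'
    value: metric.value, // measured on this real user's device and network
    id: metric.id,       // unique per page load, useful for deduplication
    page: location.pathname,
  });
  // sendBeacon survives page unload; fall back to fetch with keepalive.
  if (!navigator.sendBeacon('/rum', body)) {
    fetch('/rum', { method: 'POST', body, keepalive: true });
  }
}

onCLS(send);
onFID(send);
onLCP(send);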

In the end, monitor engagement and conversion on your site for any improvements or degradations. If you're able to isolate performance as a variable well enough, that is.

Should I stop using Lighthouse?

In short, no. Just don't treat it as gospel, even if it's easy.

If you want to use Lighthouse effectively for development and benchmarking, utilize the information in this article to locally mitigate as many of these factors as possible.

For all the critique that I've given Lighthouse, let's talk about how we can make it better. For this, I will write up another article on creating customized, use-case-specific Lighthouse tests, which can serve as a good development benchmark for understanding the full curve of user interaction on your site. It will be hyperlinked from here when ready.

Thanks for reading!
