Brittle tests

It’s that time again: I’m taking a discussion I had at work a while ago and turning the debate loose on the internet. This article comes directly from a conversation my partner-in-crime Tech Lead and I had about the best way to support our design system consumers when they test apps that use our design system web components. Shocking no one, we had differing opinions on what constitutes a brittle test, though we both agreed we didn’t want our consumers writing them.

Gif of a wine glass shattering

So let’s get to the bottom of what brittle tests are, shall we?

Spoiler: I still don’t know. After you skim this article, let’s continue the discussion over on Bluesky.

What is a brittle test?

I think the simplest definition of a brittle test is that it fails when you don’t want it to, or when you don’t expect it to. We’ve all seen flaky tests that depend on third-party systems or APIs, and sometimes those systems are down when we’re trying to run our tests and push releases to production. It’s why there are whole companies devoted to mocking test data and whole testing strategies designed to help mitigate test failures caused by integrating disparate systems.

But in my design system, we don’t really have any third-party dependencies or services, so the type of test we picture is pretty standard. We pictured devs pulling our design system components into their applications, then running unit tests and expecting their applications to behave properly with and around our web components. The fact that our design system is made of web components and not framework components is particularly relevant here.

So let me explain the perspective that my coworker and I each had.

A test that you have to change when implementation details change

My coworker’s idea of a brittle test is one that needs constant updating whenever implementation details change in the application. In his view, a test should only check the desired results, such as the proper text being rendered to the screen, without any knowledge of how that text actually got there. His mindset centered on usage of the popular testing-library package that everyone is using these days.

The goal of testing-library is to let you query your page for elements and text by things like their accessible roles, their label text, their test ids, etc. The library itself is framework independent (i.e., it runs on more than just React), and the aim is to remove the need to check implementation details like framework component state values in your tests and just test the effect of those state updates on the rendered page. It’s a great approach and I love the library, but web components present a problem.

Querying the shadow root in a test

gif of prisoner behind bars saying "i will find you"

In order to run a test the testing-library way against a web component that uses shadow dom, the locators need to be able to query into the shadow dom for the elements being located. Testing-library doesn’t solve this need by default, but the web components community has jumped in to provide a tool that enables shadow root queries in testing-library.

Check out shadow-dom-testing-library

Thanks Konnor!

So if you use that tool, your testing-library tests can query the shadow dom of your web components like any other light dom element and everything works! Your tests can just check to see if the desired text was rendered on the page and you can go about your day!
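For a rough picture of what "querying into the shadow dom" involves, here is a minimal sketch of a recursive lookup that descends into open shadow roots. The function name deepQuerySelectorAll is made up for illustration, and this is not a description of how shadow-dom-testing-library is implemented. It only assumes the standard querySelectorAll, matches, and shadowRoot DOM APIs.

```javascript
// Hypothetical helper: collect every element under `root` that matches
// `selector`, including elements inside open shadow roots.
function deepQuerySelectorAll(root, selector) {
  const found = [];
  // querySelectorAll("*") stops at shadow boundaries, so we recurse by hand.
  for (const el of root.querySelectorAll("*")) {
    if (el.matches(selector)) found.push(el);
    // shadowRoot is null for closed shadow roots and for elements without one.
    if (el.shadowRoot) found.push(...deepQuerySelectorAll(el.shadowRoot, selector));
  }
  return found;
}
```

In a browser or jsdom, deepQuerySelectorAll(document, "button") would also return a button that lives inside a host element's open shadow root, which a plain document.querySelectorAll("button") would not.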

Testing library has no knowledge of web components

My coworker’s point of view is that using the shadow root tool with testing-library would make tests naturally not brittle, because the tests would not need to know anything about web components being used in order for devs to test their applications. Here’s an example test:

import { render } from "@testing-library/react"
import { screen } from "shadow-dom-testing-library"

test("Lets test some rendering", async () => {
  render(<Button />)
  screen.getByShadowRole("button")
  await screen.findByShadowLabelText(/Car Manufacturer/i)
  screen.queryAllByShadowTitle("delete")
})

The test just queries elements on the page, checking shadow roots when needed. Therefore if the dev changed the button component to some other element, they would not necessarily have to change their tests. In my coworker’s mind, this approach makes said tests more stable.

A test that breaks without the code changing at all

My point of view about brittle tests is that a test is brittle if it can suddenly break without the application code changing at all. I take issue with tests querying the shadow roots of web components at will, because those shadow root internals are considered private and can change in a package bump that is otherwise non-breaking. It is entirely possible that the internal structure of a web component changes in a patch version bump in a way that completely breaks tests that were querying for specific elements in shadow dom. It all depends on exactly what the test was querying for.

And there is no real way to determine which queries are safe to use. Take another look at the simple button example:

import { render } from "@testing-library/react"
import { screen } from "shadow-dom-testing-library"

test("Lets test some rendering", async () => {
  render(<App />)
  screen.getByShadowRole("button")
  await screen.findByShadowLabelText(/Car Manufacturer/i)
  screen.queryAllByShadowTitle("delete")
})

Querying by label text for a button element in a button component that presumably uses a button web component is probably pretty safe. There is not likely to be a reason to change the web component internals in such a way that there is no longer a label. So that test above is likely safe.

But what if the example from above was more like this:

import { render } from "@testing-library/react"
import { screen } from "shadow-dom-testing-library"

test("Lets test some rendering", async () => {
  render(<App />)
  screen.getByShadowRole("button")
  await screen.findByShadowLabelText(/Car Manufacturer/i, {
    // breaks if the selector is removed
    selector: '.element-where-the-text-is'
  })
  screen.queryAllByShadowTitle("delete")
})

This example doesn’t just query by text, it also filters queried elements by a selector, in this case a CSS class. If the dev updates their web component and .element-where-the-text-is no longer resolves to an element, or no longer resolves to the element where the text is, the test above would fail.
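To make the failure mode concrete, here is a small model of a patch bump. Everything in it is an assumption for illustration: the shadow trees are plain objects rather than real DOM nodes, the two "versions" of the button internals are invented, and the query helpers are simplified stand-ins for role and selector-filtered text queries, not testing-library functions.

```javascript
// Hypothetical internals of a button web component at two patch versions.
const shadowV100 = {
  tag: "button", role: "button", classes: ["element-where-the-text-is"],
  text: "Car Manufacturer", children: [],
};

// The patch release wraps the text in a span and renames the class.
// Nothing about the public behavior of the component has changed.
const shadowV101 = {
  tag: "button", role: "button", classes: [], text: "",
  children: [
    { tag: "span", role: null, classes: ["label"], text: "Car Manufacturer", children: [] },
  ],
};

// Depth-first list of a node and all of its descendants.
const flatten = (node) => [node, ...node.children.flatMap(flatten)];

// Text of a node including the text of its descendants.
const textOf = (node) => node.text + node.children.map(textOf).join("");

// Stand-in for a role query: depends only on the accessible role.
const queryByRole = (root, role) => flatten(root).filter((n) => n.role === role);

// Stand-in for a text query filtered by a class: depends on internal markup.
const queryByTextAndClass = (root, text, className) =>
  flatten(root).filter((n) => n.classes.includes(className) && textOf(n) === text);

console.log(queryByRole(shadowV100, "button").length); // 1
console.log(queryByRole(shadowV101, "button").length); // 1
console.log(queryByTextAndClass(shadowV100, "Car Manufacturer", "element-where-the-text-is").length); // 1
console.log(queryByTextAndClass(shadowV101, "Car Manufacturer", "element-where-the-text-is").length); // 0
```

The role query survives the patch bump, while the selector-filtered query silently stops matching even though the consuming application did not change.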

Not all locator queries are the same

Testing-library has a number of ways to query elements. You can query ByRole, ByPlaceholderText, and ByTestId, among others. In my opinion, some of these queries are safer than others when talking about querying into “private” shadow root templates. Not all of the queries enable filtering by css selectors, but some do: ByText and ByLabelText accept a selector option. Querying ByRole checks the accessible role on an element for a match and does not offer a filter by selector. I think ByRole is one of the safer ones, because it is unlikely that design system components would be shifting around ARIA roles in a way that breaks those queries. But still, it could happen.

Playwright also has locators similar to testing-library’s, and the same issue applies.

Using testable access properties instead of direct queries

My idea for tests would take advantage of what I call Internal Access Properties instead of directly querying the shadow root. I wrote an article about them a while back, so give that a read for the details.

The short version is that Internal Access Properties provide access to predictable elements in the shadow root that tests can depend on to always be accurate, and to not change across patch bumps. The drawback is that using IAPs means tests need to contain some implementation details, because the tests now “know” that a web component is involved with a particular API and particular properties. Here is an example of a test using IAPs:

import { render, screen } from "@testing-library/react"

test("Lets test some rendering", () => {
  render(<App />)
  const el = screen.getByText("Car Manufacturer")
  expect(el.innerButton.innerText).toBe("Car Manufacturer");
})

In this version, the test does not query the shadow root directly. Tests access web component host elements only, and interact with internal elements through the Internal Access Property innerButton, which is guaranteed to always return the actual <button> inside the <x-button> web component inside the <App />. This test knows there is an element with an innerButton property involved in rendering the text to the screen. If that property name were to change, or the web component were switched out, the test would need to be updated.
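On the component side, an Internal Access Property can be as small as a getter. The sketch below is a hypothetical x-button, not the actual design system component: the template, the wrapper span, and the fallback base class (there only so the file loads outside a browser) are all assumptions for illustration.

```javascript
// Outside a browser HTMLElement is undefined, so fall back to an empty
// base class. This fallback exists only so the sketch can be loaded anywhere.
const BaseElement = typeof HTMLElement === "undefined" ? class {} : HTMLElement;

// Hypothetical <x-button> web component.
class XButton extends BaseElement {
  connectedCallback() {
    if (this.shadowRoot) return; // already rendered
    const root = this.attachShadow({ mode: "open" });
    // Internal markup: free to change between releases.
    root.innerHTML = `<span class="wrapper"><button><slot></slot></button></span>`;
  }

  // Internal Access Property: the supported way for tests to reach the
  // real <button>. Returns null until the shadow root has been rendered.
  get innerButton() {
    return this.shadowRoot?.querySelector("button") ?? null;
  }
}

// Register the element only where custom elements are available.
if (typeof customElements !== "undefined" && !customElements.get("x-button")) {
  customElements.define("x-button", XButton);
}
```

The wrapper span could be removed or renamed in a later release, and as long as the getter is kept up to date, a test that reads innerButton would keep working.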

The debate

The debate was really about what makes a brittle test, and, more specifically, what testing recommendation should we give to our design system consumers?

Should we advise them to use testing-library queries and the shadow root plugin so that their tests don’t have to know that web components are involved? If we do, then we expose them to the possibility of tests breaking unexpectedly as we iterate on our web component internals. That possibility entirely depends on what kinds of tests and queries devs will write.

Or should we recommend that they not query shadow dom directly, and instead use IAPs with a guaranteed API? If we do, we introduce implementation details into tests, which would need to be refactored anytime the structure of the IAPs changes or devs switch out one component for another. Not testing implementation details is a fantastic approach, and the IAP version bends that rule a bit.

Or is there a super secret third option my coworker and I are totally missing?

Tell me what you think over on Bluesky

The End