Test smells are signs that something has gone wrong in your code. Plenty of great stuff has been written about them, and our team has contributed practical examples of how to spot smelly test code (here and here).
While test smells may arise for a bunch of different reasons, there is one recurring theme we'd like to cover today, and it has to do with team structure. The point we'd like to make is that a good automated test sits at the intersection of several domains of knowledge:
- what the user wants, interpreted by the management as requirements
- knowledge of all the technical kinks and potential weak spots of the SUT, known by developers and manual testers
- test theory, known by testers
- the implementation of tests in a particular language and framework, known by SDETs
Getting all those domains to play together nicely is no mean feat, and when it doesn't happen, you get many of the test smells that have been discussed in the community. We'll go through the particular causes of test smells and see how they might be related to team structure.
In fact, test code might be even more sensitive to these issues than production code. Why is that?
How is test code special?
Test code is used differently from production code.
- Readability is very important for production code. However, the user of production code doesn't have to read it; it should work out of the box. The user of a test, on the other hand, reads it whenever it fails - that's part of their job description. This means test readability matters not just to the authors of tests, but also to their users.
- Also, unlike production code, test code isn't itself checked by tests. So the correctness of test code should be self-evident - which means it should be really simple. Again, readability.
- Another important difference is institutional: test code is not what the customer is paying for, so it is often treated as a second-class citizen, and there is more pressure to make maintaining it as cheap as possible. Which means that, again, it has to be simple and easy to read.
Here's a simple test for your test code. In your IDE, zoom in so that only the test method fits on the screen. Can you understand what it's doing immediately, without peeking into its functions, dependencies, etc.? Level two: can another person understand it? If yes, then the time it takes to sort out test results and to maintain the test is cut severalfold.
This simplicity requires a clear understanding of what is being tested and who will use the code. What obstacles make it harder to write simple and readable tests?
Root: problems with underlying code
Many test smells tell us there are problems with the underlying code, specifically excessive coupling, not enough modularity, and bad separation of concerns.
In another article, our team has already talked about how writing tests in and of itself can push you to write better-structured code. Here, we're getting to the same idea from a different angle: bad tests indicate problems with structure.
Let's start with several related test smells:
- Eager Test: your test tries to verify too much functionality.
- Assertion Roulette: too many assertions (guessing which one failed the test becomes a roulette).
- The Giant: just a really big test.
These are often signs that the system we're testing is too large and doing too many things at once - a god object. And they represent a very real problem: it's been measured that the Eager test smell is statistically associated with more change- and defect-prone production code.
If we return to the example from our article above, we were trying to improve a service that:
- checks the user's IP,
- determines the city based on the IP,
- retrieves current weather,
- compares it to previous measurements,
- checks how much time has passed since the last measurement,
- if it's above a certain threshold, records the new measurement,
- and prints the result to the console.
In the first, bad version of this code, all of this is done inside one function, like this:
import json
from datetime import datetime, timedelta
from pathlib import Path

import requests


def local_weather():
    # First, get the IP
    url = "https://api64.ipify.org?format=json"
    response = requests.get(url).json()
    ip_address = response["ip"]

    # Using the IP, determine the city
    url = f"https://ipinfo.io/{ip_address}/json"
    response = requests.get(url).json()
    city = response["city"]

    with open("secrets.json", "r", encoding="utf-8") as file:
        owm_api_key = json.load(file)["openweathermap.org"]

    # Hit up a weather service for weather in that city
    url = (
        "https://api.openweathermap.org/data/2.5/weather?q={0}&"
        "units=metric&lang=ru&appid={1}"
    ).format(city, owm_api_key)
    weather_data = requests.get(url).json()
    temperature = weather_data["main"]["temp"]
    temperature_feels = weather_data["main"]["feels_like"]

    # If past measurements have already been taken, compare them to current results
    has_previous = False
    history = {}
    history_path = Path("history.json")
    if history_path.exists():
        with open(history_path, "r", encoding="utf-8") as file:
            history = json.load(file)
        record = history.get(city)
        if record is not None:
            has_previous = True
            last_date = datetime.fromisoformat(record["when"])
            last_temp = record["temp"]
            last_feels = record["feels"]
            diff = temperature - last_temp
            diff_feels = temperature_feels - last_feels

    # Write down the current result if enough time has passed
    now = datetime.now()
    if not has_previous or (now - last_date) > timedelta(hours=6):
        record = {
            "when": datetime.now().isoformat(),
            "temp": temperature,
            "feels": temperature_feels
        }
        history[city] = record
        with open(history_path, "w", encoding="utf-8") as file:
            json.dump(history, file)

    # Print the result
    msg = (
        f"Temperature in {city}: {temperature:.0f} °C\n"
        f"Feels like {temperature_feels:.0f} °C"
    )
    if has_previous:
        formatted_date = last_date.strftime("%c")
        msg += (
            f"\nLast measurement taken on {formatted_date}\n"
            f"Difference since then: {diff:.0f} (feels {diff_feels:.0f})"
        )
    print(msg)
(Example and its improvements below courtesy of Maksim Stepanov)
Now, if we were to test this monstrosity, our test would have to mock all the external services and probably fiddle with file writing, so it's already quite a setup. And what exactly would we check in such a test? If there is a mistake and all we can look at is the console output, how do we figure out where the mistake occurred? The function under test is 60 lines long and talks to multiple external systems; we'd definitely need some checks for intermediate values along the way. As you can see, we've got a recipe for an overgrown test.
Of course, this brings us to another test smell:
Excessive setup: the test needs a lot of work to get up and running.
Like, in our case, calling up three real services and touching the file with measurement history. Dave Farley gives an even funnier example of excessive setup: creating and executing a Jenkins instance just to check that a URL doesn't contain a certain string. Talk about overkill! This is a clear sign that parts of our code are too entangled with each other.
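To make that concrete, here is a rough sketch (pytest plus unittest.mock; the module name and fixture choices are ours, not the article's) of what a test for local_weather() has to arrange before it can assert anything at all - and even then, the console output is the only thing left to check:

from unittest.mock import MagicMock, patch

from weather import local_weather  # hypothetical module holding the function above

def test_local_weather_prints_temperature(tmp_path, monkeypatch, capsys):
    # Three different services answer through the same requests.get call,
    # so all of them have to be faked, in exactly the right order.
    fake_responses = [
        MagicMock(json=lambda: {"ip": "1.2.3.4"}),                           # ipify
        MagicMock(json=lambda: {"city": "Berlin"}),                          # ipinfo
        MagicMock(json=lambda: {"main": {"temp": 3.7, "feels_like": 1.0}}),  # openweathermap
    ]
    # Keep secrets.json and history.json in a throwaway directory.
    monkeypatch.chdir(tmp_path)
    (tmp_path / "secrets.json").write_text('{"openweathermap.org": "fake-key"}')
    with patch("requests.get", side_effect=fake_responses):
        local_weather()
    # After all that setup, the only observable result is the printed text.
    assert "Temperature in Berlin" in capsys.readouterr().out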
Our example with the weather service brings up another smell:
Indirect testing: when you have to do weird things to get to the system you want to test.
In our example with a long chain of calls, the first of those calls determines the user's IP. How do we test that particular call? The IP is in a local variable, and there is no way to get to it. Similar problems arise in more complex code, where we see private fields storing important stuff hidden away where the sun doesn't shine.
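By way of a hypothetical counter-example (the function and test names are ours, not the article's): as soon as the IP lookup lives in its own function, its result is a return value rather than a local variable, and that call can be checked directly:

import requests
from unittest.mock import MagicMock

def get_ip() -> str:
    response = requests.get("https://api64.ipify.org?format=json").json()
    return response["ip"]

def test_get_ip_returns_ip_reported_by_service(monkeypatch):
    fake_response = MagicMock(json=lambda: {"ip": "1.2.3.4"})
    monkeypatch.setattr(requests, "get", lambda url: fake_response)
    assert get_ip() == "1.2.3.4"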
As we've said, all of these test smells often point to production code that is too tightly coupled and monolithic. So what do you do with it?
You refactor:
- Implement proper separation of concerns: split the different kinds of I/O from the application logic.
- Introduce dependency injection so you can use test doubles instead of dragging half the application into your test to check one variable.
- If the system under test is inherently difficult or costly to test (like a UI), make it as slim as possible: extract as much logic from it as you can, leaving behind a Humble Object.
As an example, our single long function can be refactored into this to make it more testable.
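We won't reproduce that refactoring here, but a minimal sketch (all names ours, not the actual solution) shows the direction: once the collaborators are injected, the orchestration can be unit-tested with plain lambdas instead of a pile of mocks.

def local_weather(ip_source, city_source, weather_source, report=print):
    # Pure orchestration: every collaborator can be replaced by a test double.
    city = city_source(ip_source())
    temperature = weather_source(city)
    report(f"Temperature in {city}: {temperature:.0f} °C")

def test_local_weather_reports_rounded_temperature_for_city():
    lines = []
    local_weather(
        ip_source=lambda: "1.2.3.4",
        city_source=lambda ip: "Berlin",
        weather_source=lambda city: 3.7,
        report=lines.append,  # the humble printing part stays out of the test's way
    )
    assert lines == ["Temperature in Berlin: 4 °C"]

In production, the real get_ip()-style functions become the default collaborators, while formatting and the "has enough time passed" logic can live in pure functions that need no doubles at all.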
There is one more thing you can do to improve testability, and it brings us back to the original point of the article. Talk to the testers on your team! Work together, and share your concerns.
Maybe even work in a single repository, with tests written in the same language as production code? Our team has been practicing this approach, and we're seeing great results.
Root: lack of attention to test code
Test code demands the same level of thought and care as production code. In another article, we've shown some common mistakes that occur when writing even simple unit tests. It took us six iterations to get from our initial version to one we were content with, and the test increased from one line to six in the process. Making things simple and obvious takes effort.
Many of the mistakes we've noticed are just cutting corners - like, for instance, naming tests test0, test1, test2, etc., or creating a test with no assertions just to "see if the thing runs" and doesn't throw an exception (the Secret catcher smell).
The same is true of hard-coding data - it's always easier to write string and number values directly in place than to hide them away in variables with (another extra effort) meaningful names. Here's an example of hard-coding:
@Test
void shouldReturnHelloPhrase() {
    String a = "John";
    String b = hello("John");
    assertThat(b)
        .matches("Hello John!");
}
Is the "John" in the input the same as the "John" in the output? Do we know that for sure? If we want to test a different name, do we have to change both Johns? Now we're tempted to go into the hello()
method to make sure. We wouldn't need to do that if the test was written like this:
@Test
void shouldReturnHelloPhrase() {
    String name = "John";
    String result = hello(name);
    assertThat(result)
        .contains("Hello " + name + "!");
}
As you can see, cutting corners results in extra work when analyzing results and maintaining the tests.
One more example is taking care to hide unnecessary details (something also covered by Gerard Meszaros). Compare two examples (from this article):
@Test
public void shouldAuthorizeUserWithValidCredentials() {
    TestUser user = new TestUser();
    openAuthorizationPage();
    $("#user-name").setValue(user.username);
    $("#password").setValue(user.password);
    $("#login-button").click();
    checkUserAuthorized();
}
Versus:
@Test
public void shouldAuthorizeUserWithValidCredentials() {
    authorize(trueUsername, truePassword);
    checkUserAuthorized();
}
Clearly, the second test is much more readable. But it took some work to get it there:
- we hid openAuthorizationPage() in a fixture (it was called by all test methods in that class);
- we created the username and password class fields;
- we moved the authorization into a step method - authorize(username, password).
Choosing and consistently implementing the right abstractions for your tests takes time and thought - though it does save time and thought in the long run.
Mistreating test code will lead to all the smells we've mentioned. Why does it happen? Maybe because people work under time pressure and it's tempting to cut corners.
Or maybe it is because there are different definitions of done: when asked about delivery dates, people often answer in terms of "when can you run the happy path" rather than "when will it have tests and documentation." Tests have to be an explicit part of the definition of done.
Of course, there are other things that can cause this mistreatment, and very often, it's a simple lack of testing skills.
Root: not knowing test theory
This may sound obvious, but when writing tests, it's a good idea to know test theory. Stuff like:
- one test should check one thing (see the sketch after this list);
- tests should not depend on each other;
- do the checks at lower levels if possible (API vs E2E, unit vs API);
- don't check the same thing at different levels, etc.
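As a toy illustration of the first two points (the function here is a stand-in, not code from the article): each behaviour gets its own test, a failure names exactly what broke, and neither test relies on the other having run first.

def format_report(city: str, temperature: float) -> str:
    return f"Temperature in {city}: {temperature:.0f} °C"

def test_report_names_the_city():
    assert "Berlin" in format_report("Berlin", 3.7)

def test_report_rounds_the_temperature():
    assert "4 °C" in format_report("Berlin", 3.7)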
Test theory has been honed in manual testing, and it is becoming increasingly clear that test automation cannot isolate itself as just another branch of programming; people need the full-stack experience of both testing and coding.
A common mistake we've seen among developers with little E2E experience (or little testing experience of any kind) is stuffing everything into E2E tests and making them really long, checking a thousand little things at once. This produces the Giant and Assertion Roulette smells we've already discussed.
On the other hand, manual testers with little automation experience might do the opposite: write a test with no assertions (the aforementioned Secret catcher). Think about how you check manually that a page opens: you open it - bam, that's it, your wonderful human eyes have confirmed it's open. But an automated test needs an explicit check.
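A quick sketch of the difference (pytest with a Selenium-style browser fixture; the fixture name and the URL are hypothetical):

def test_weather_page_opens(browser):
    # Secret catcher: passes as long as nothing throws, but checks nothing.
    browser.get("https://example.com/weather")

def test_weather_page_shows_forecast(browser):
    browser.get("https://example.com/weather")
    # The explicit check that a human eye was doing implicitly.
    assert "Forecast" in browser.title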
All of this means it's important to share expertise and review each other's work.
Root: direct translation into automated tests
The final problem we'll discuss today is overstuffing the E2E level - something we touched upon in the previous section. Naturally, we can't survive without UI tests, but generally speaking, it's enough to have the happy paths tested there. Digging into corner cases (which are by nature more numerous) is better done at lower levels, where it's much less costly.
This problem, when your test suite is shaped not like a pyramid but like an ice cream cone, can also have organizational causes.
People have criticized "automation factories", where the SDET and the manual tester live in two parallel worlds: one knows how to automate, the other knows what to test. This results in test cases being translated into automated tests unthinkingly, and you end up with nothing but a pile of E2E tests, because all manual testing is done through the UI. These tests end up being:
- less atomic, and thus less stable and more difficult to analyze when they fail;
- more costly to run.
There is another reason why this bloating of the E2E level can happen. The E2E tests are the ones that correspond directly to user needs and requirements. Which, as Google devs tell us, makes them particularly attractive to decision-makers ("Focus on the user and all else will follow").
The conclusion is that we can't have SDETs and manual testers live in separate worlds. They need to understand each other's work; manual testers need to provide guidance and testing expertise to automation engineers, but the step-by-step implementation of automated tests has to be left to SDETs.
Bringing it together
Test code can smell for many different reasons, but one theme keeps coming up. A major lesson of the last two decades is that quality should be a concern for every role on the team, not just QA engineers. It's been found that "having automated tests primarily created and maintained either by QA or an outsourced party is not correlated with IT performance."
When that doesn't happen, certain test smells can be the outcome. Some smells point toward poor testability of the underlying code, which, in turn, means testing and development are too far apart.
Poorly written, un-refactored test code, missing abstractions that would make tests more readable, and an excess of hard-coded data can all be signs that test code is treated as a formality. Here, again, quality is treated as the exclusive responsibility of testers.
Problems with the test pyramid, a bloated E2E level, and overgrown tests suggest trouble with sharing expertise on test theory; they can also point to a rigid separation between manual and automated testing.
All in all, pay attention to problems with your tests - they can be signs of more serious faults within your team.