AI sure is advancing in leaps and bounds lately, isn't it? We're planning to introduce AI-driven features in our tools, and we'll be discussing applications of AI in QA on this blog. For now, though, let's do something more lighthearted. We've probably all been asking ourselves if AI can do our jobs, right? So let's give ChatGPT a test on automation in QA and see how it does!
This isn't serious, of course, for many reasons. It won't actually test if this neural network can do the job of a QA engineer... But let's do it anyway!
First test - just ticking boxes
I've taken the current free version of ChatGPT available on the OpenAI website and fed it the questions from an online course on API test automation in Java posted on the Test Automation University website.
As might be expected from an automated mind trained on the collective wisdom of the internet - it did rather well! Without any access to the course materials, it gave surprisingly deep answers:
In fact, it only made two mistakes in the entire course, and one of them was kind of funny. The question was:
Which is NOT a consideration when determining who will write the automated tests? Choose one:
- Person's capacity
- Person's skill set
- Person's seniority
- Person's desire
ChatGPT answered unequivocally:
A person's desire is NOT a consideration when determining who will write the automated tests.
I know this is not indicative of anything, and yet in my mind, I can't help but picture this:
Let's try some Cucumber
So far, we've basically been automating googling. Now, instead of taking a test, let's see if our bot can learn from a course on, say, Cucumber. To get a baseline, I wanted to see how ChatGPT would handle a task without the course, and asked it to write some tests in Gherkin for an imaginary search engine:
Well, this is sort of an answer. But since I've got the TAU course on the subject fresh in my mind, I remember that lumping too much stuff into one scenario isn't a good practice for writing in Gherkin. Let's see if ChatGPT could learn from that course.
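A quick aside before we do: here's a simplified sketch of what "lumping too much stuff into one scenario" looks like for a search box. This is my own illustration of the antipattern, not ChatGPT's verbatim output:

```gherkin
Feature: Search

  Scenario: Search and open a result
    Given I am on the home page
    When I type "cucumber" into the search box
    And I press Enter
    Then I see a list of search results
    When I click the first result
    Then the result page opens
    And the page title contains "cucumber"
```

One scenario, seven steps, and it checks two different behaviours at once - searching and opening a result.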
Working on style
Long story short, that wasn't easy. ChatGPT can correctly describe what the course is about when asked directly. But applying the recommendations from the course to the search box example took some effort. Specifically, there was this part at the end about the three golden rules of Gherkin. One of those rules is not to put multiple "Given", "When", or "Then" statements in one scenario, and another prohibits scenarios of more than five steps. And as you can see from the screenshot, the bot broke those rules.
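For contrast, here's how the same checks could be split so that each scenario keeps a single "Given", "When", and "Then" and stays under five steps - again, my own sketch, not taken from the course or from the bot:

```gherkin
Feature: Search

  Scenario: Searching returns results
    Given I am on the home page
    When I search for "cucumber"
    Then I see a list of search results for "cucumber"

  Scenario: Opening a search result
    Given I have searched for "cucumber"
    When I open the first result
    Then the result page for "cucumber" is displayed
```

Each scenario now checks exactly one behaviour, which is what the golden rules are really after.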
At first, I tried to lead it to the problem indirectly, asking it to improve the answer based on the course. The bot would change things in the code, sometimes related to the course, sometimes not. Then I told it those rules explicitly and asked it to follow them. "Sure, no problem", it replied, and wrote an amended scenario that still had the exact same issue as before. We went at it for some time, me pointing out the problem, it swearing that it understood me and fixing something unrelated. At one point, it said explicitly - yup, I've done it, there is only one of each statement - and still wrote a scenario with multiple Whens and Thens:
I know, I know, there is a simple reason for this misunderstanding, but there was a moment when my mammal brain was certain the soulless machine was messing with me.
Turns out, the bot wasn't quite certain what a "When" or a "Then" statement was - when asked directly, it picked the wrong lines. So, after a few more rounds of educational exchanges, it finally fixed the scenario. Just to be sure, I gave it a final task:
Victory at last, human and machine working together!
Well, that was a fun afternoon
Like I said, none of this is serious. I probably could have done better as a teacher, and if I wanted help with code, I'd go for a neural network built specifically for that purpose.
And yet, if you really squint, this story can be seen as a metaphor for potential applications of neural networks in testing today. If a task is phrased in a very general manner, like "cover this search box with tests", a good human tester will definitely do better, because they know the software they are testing. If you want the network to do better, that human tester has to be the one to state the requirements.
Of course, you could just train a network on your own data. But that means dealing with NDA issues and, naturally, investing time and money. In the end, you'll probably get an exceptionally useful tool. But it's not likely that it will make you run out of work.