As ChatGPT and AI in general have surged to the forefront of discussion in tech spheres, here at Trymata we started wondering if AI could help our users with their usability testing. So we gave it a spin ourselves, experimenting with different prompts for writing usability testing tasks, and observing what works, what doesn’t, and where the AI tends to need a little more guidance.
Here’s what we found:
The bottom line is, you can use ChatGPT to help you write your user test script – but you’ll still need to put the finishing touches on it yourself.
Try it: Generate your AI test script now >
Fortunately, we can tell you exactly what wording to use in your prompts to the AI, to minimize the amount of fine-tuning required before you copy and paste into your Trymata test setup. Read on for our recommendations.
Article contents:
Our recommended AI prompt/query
What ChatGPT does well when writing task scripts
Common problems with ChatGPT usability test scripts
Conclusion
Best wording for generating your AI task script
We’ll start off with the short answer, before getting into the what-ifs, howevers, and don’t-forgets.
After trying out many different prompts, the wording that we found to generally work the best for a remote unmoderated usability test of a live website is:
Write a usability test scenario and tasks for [website URL]
Example:
Write a usability test scenario and tasks for www.misfitsmarket.com
Alternatively, you can replace the URL with a description of what your website is or does:
Write a usability test scenario and tasks for a website that… [describe your site]
Example:
Write a usability test scenario and tasks for a website that sells monthly subscription boxes of fruits and vegetables.
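If you generate prompts for several sites, the two wordings above can be filled in from a small template. The sketch below is only an illustration: the function name build_prompt and its parameters are our own invention, not part of any ChatGPT or Trymata tooling.

```python
def build_prompt(url=None, description=None):
    """Return a prompt in one of the two wordings recommended above.

    Hypothetical helper: pass either a site URL or a short description
    of what the site is or does.
    """
    if url:
        return f"Write a usability test scenario and tasks for {url}"
    if description:
        return (
            "Write a usability test scenario and tasks for a website that "
            f"{description}"
        )
    raise ValueError("Provide either a URL or a site description")


print(build_prompt(url="www.misfitsmarket.com"))
print(build_prompt(
    description="sells monthly subscription boxes of fruits and vegetables."
))
```

The output of each call is simply the text you would paste into ChatGPT.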
Both of the wording options above work quite well, and mostly generate pretty similar test scripts. Next, we’ll briefly discuss the kind of outputs you get from these prompts, what’s good about them, and the common problems you’ll want to double-check for before running a test with your ChatGPT script.
Quick link: Jump straight to the common problems
What works well about ChatGPT’s test scripts
Overall, the user test scripts that ChatGPT generates from these prompts closely follow the best practices for writing remote unmoderated usability tests, and are stylistically well-suited to a platform like Trymata. Here are some of the specific aspects it gets right:
- Understands your site and writes an applicable scenario
- Tasks approximate a natural user flow
- Task wording and length are good
- Total test length is also in the right range
- Script format is a perfect match for Trymata
1. ChatGPT is able to accurately surmise what a website is about from the URL, and then make good assumptions about the site’s audience and their goals.
Even if you don’t provide any kind of description of your website in your initial prompt, the AI engine is good at understanding what your site is for. This is why our first prompt suggestion works: you don’t really need to describe your website to the AI to get decent results.
Using what it can figure out about your website, ChatGPT creates a very good scenario to kick off your test. For example, this was the scenario it generated for our example query above (for www.misfitsmarket.com):
As a first-time user of the website, you are looking for a way to buy fresh and organic produce at a discounted price. You have heard about Misfits Market and want to see if they have what you’re looking for.
The engine was able to recognize that the site sells produce, and that discount prices are one of its selling points.
Note that using the site description method instead can yield even better results, especially if your platform or user flow has some unique aspects that AI may not easily predict. Here’s the scenario we got when using the description-style query instead – as you can see, this scenario is tailored more specifically to the audience’s identity and experiences:
You are a busy professional who wants to eat healthy but often finds it difficult to make time to shop for fresh fruits and vegetables. You have heard about a website that sells monthly subscription boxes of fresh fruits and vegetables and decide to check it out.
Either of these scenarios would work well for a user testing study of the Misfits Market website. We encourage you to try both prompt styles yourself and see what you get!
2. The tasks that ChatGPT generates are also based on good assumptions about the steps that would constitute a natural user flow on the site.
In addition to accurately identifying a website’s main purpose and a typical visitor’s main goals, ChatGPT also does a good job of mapping out a series of tasks that approximate a natural user journey through the site.
When we generated testing tasks for www.misfitsmarket.com above, for example, the script included steps like finding subscription options and picking one, checking the delivery schedule and information, and learning how to change or cancel a subscription.
This held true when we tried generating scripts for non-ecommerce platforms as well. (We wanted to check, since ecommerce sites tend to be some of the simplest and most predictable out there.)
For example, when we asked for tasks for a cryptocurrency wallet, they consisted of things like setting up a new wallet, adding funds, sending a test amount to another wallet, and locating the transaction history. Tasks for a user test of Wikipedia included finding a section of an article from the table of contents, looking for related articles, and identifying sources from the citations.
3. ChatGPT tasks are short, simple, and concise, with straightforward wording and small, sequential goals to work towards.
The writing style we observed in the ChatGPT-generated tasks when using these prompts was in line with best practices for writing remote unmoderated user testing tasks.
Word choices in every task were plain and easy to understand (instead of using advanced, difficult, or industry/brand-specific words that users may not be familiar with).
Sentences were short and uncomplicated, and the total length of each task was generally just 12-15 words.
Additionally, each task was crafted around a single small goal for the user to accomplish before moving to the next one.
In our experience at Trymata, shorter tasks focusing on just one goal tend to be the most successful for remote unmoderated user testing. Users can easily feel overwhelmed by long or multi-part tasks. Not only that, but the more details or subsections a task includes, the more likely it is that a user accidentally misses or skips part of it.
4. The total length of the test scripts generated by ChatGPT is also good for a typical user test.
When creating a user test, it’s important neither to overload it with too many steps nor to provide too few instructions to fill the allotted time.
The scripts we got back from ChatGPT when using the recommended prompts were all within the range of 6-10 tasks per test, which is the range that we usually suggest to our users.
Note that the ideal number of tasks can vary based on a variety of factors, including how long each individual task is, the kind of interface you’re testing, the amount of time you have with each tester, and more. We’re not saying that you should never have fewer than 6 tasks or more than 10; only that for a pretty standard website test, that tends to be the ‘sweet spot.’
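If you want a quick sanity check on a generated script, the numbers from points 3 and 4 can be turned into a small review function. This is a rough sketch under our own assumptions: the function name is hypothetical, the 6-10 range comes from this article, and the 15-word limit is just a default taken from the upper end of the task lengths we observed, not a hard rule.

```python
def review_script_length(tasks, min_tasks=6, max_tasks=10, max_words=15):
    """Return warnings for a script that falls outside the usual ranges.

    The defaults are assumptions based on the ranges discussed in this
    article; adjust them for your own tests.
    """
    warnings = []
    if not min_tasks <= len(tasks) <= max_tasks:
        warnings.append(
            f"Script has {len(tasks)} tasks; "
            f"the usual range is {min_tasks}-{max_tasks}."
        )
    for number, task in enumerate(tasks, start=1):
        word_count = len(task.split())
        if word_count > max_words:
            warnings.append(
                f"Task {number} has {word_count} words; consider shortening it."
            )
    return warnings
```

An empty list means nothing was flagged. A warning is a prompt to take a second look, not a sign that the script is wrong.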
5. The format of the ChatGPT test scripts when using this style of prompt is a perfect match for Trymata’s test creation format.
If you’re using Trymata (or even another scenario-and-tasks-based remote user testing tool), the script you’ll get from ChatGPT using our recommended queries goes perfectly with the test setup model.
This makes it easy to map the AI’s output directly onto your Trymata test, and copy-paste each piece directly into the corresponding setup area.
The script format you get back from ChatGPT is not always exactly the same, but in our experience it always includes a brief scenario blurb (40-45 words) and a set of 6-10 numbered tasks.
In some cases, but not all, it also includes a numbered set of post-test questions (which can be easily mapped to our custom post-test written survey), and/or a final note to the researcher. This final note often mentions that the tasks may need to be adjusted based on the specific functionalities of your website, and sometimes suggests how long the test might take and what the researcher should look out for during test sessions.
The final note is not a part of the script, so make sure not to include it in your test setup; just read it for your own reference and understanding.
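If you would rather split the output programmatically than by hand, a short parser can separate the pieces described above. This is only a sketch, and it rests on assumptions about the layout that will not always hold: the scenario appears before the first numbered line, each task sits on a single numbered line, and a second list whose numbering restarts at 1 contains the post-test questions. Everything else after the tasks begin, including the final note, is ignored. The sample tasks are made up for illustration.

```python
import re

# A line such as "1. Do something" or "2) Do something".
NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def split_script(text):
    """Split raw output into (scenario, tasks, post_test_questions)."""
    scenario_lines, tasks, questions = [], [], []
    current = None  # the list currently being filled
    for line in text.splitlines():
        match = NUMBERED.match(line)
        if match:
            number, body = int(match.group(1)), match.group(2)
            if current is None:
                current = tasks
            elif number == 1:
                # Numbering restarted: assume post-test questions follow.
                current = questions
            current.append(body)
        elif current is None and line.strip():
            scenario_lines.append(line.strip())
    return " ".join(scenario_lines), tasks, questions


sample = """You have heard about Misfits Market and want to see if they have what you're looking for.

1. Find the subscription options and pick one.
2. Check the delivery schedule.

Post-test questions:
1. What was the hardest part of the test?

Note: tasks may need to be adjusted for your website.
"""
scenario, tasks, questions = split_script(sample)
print(scenario)
print(tasks)
print(questions)
```

Always compare the parsed pieces against the raw output before pasting them into your test setup, since the format ChatGPT returns is not always the same.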
Common problems with ChatGPT-generated user testing scripts
While the AI-generated scripts are broadly good for all of the above reasons, there are also specific flaws with them that tend to crop up.
Make sure to check your script for any of these problems before using your ChatGPT script for a real test. Pay special attention to items #2 and #3.
- May include unnecessary link for first task
- May reference items that don’t match your site contents
- May ask users to complete a real purchase
- Tasks can be overly leading
1. Sometimes the first task in the test script is a link to the test website, or an instruction to navigate to the site – which is not necessary for Trymata tests.
This one, of course, mainly applies if you’re running your user tests with Trymata.
When setting up your test with our usability testing platform, there’s a separate section for inputting your website URL. At the start of the test session, we automatically load up this URL in the tester’s browser, so there’s no need to provide it to them in a task too.
If your ChatGPT test script includes a step like this, you can feel free to delete it.
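If you are cleaning up scripts in bulk, this check is easy to automate. The sketch below uses a hypothetical pattern of our own that looks for a URL, or for wording like "navigate to ... website", in the first task only. It is a heuristic, so glance at whatever it removes.

```python
import re

# Hypothetical pattern for a "go to the site" step: a URL, or
# "navigate to / go to" followed later by "site" or "website".
OPENING_STEP = re.compile(
    r"(https?://|www\.)\S+|\b(?:navigate|go) to\b.*\b(?:web)?site\b",
    re.IGNORECASE,
)


def drop_opening_link_task(tasks):
    """Return the tasks without a first step that only sends users to the site."""
    if tasks and OPENING_STEP.search(tasks[0]):
        return tasks[1:]
    return list(tasks)
```

A first task such as "Go to the subscription page" is left alone, because it does not mention a URL or the website itself.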
2. ChatGPT does NOT read specific titles, navigation options, or text from your site, so the tasks may reference pages/contents that don’t actually exist.
It’s very important to note that while ChatGPT can surmise what your website is about on its own, it does not actually read from the site’s real contents when crafting the tasks. Instead, it seems to make best-guess assumptions about the kinds of contents such a site is likely to have, and rely on those guesses.
This means that some tasks may instruct a user to click a certain navigation option, or visit a certain page, or locate a certain heading, which your site does not actually have.
For example, when we asked ChatGPT to write a user testing scenario and tasks for www.bananarepublic.com, one of the tasks it generated was:
Use the top menu to find the “Dresses” category, and narrow down the options by selecting “Occasion” and then “Special Occasion”.
However, the Banana Republic website does not in fact have a “Dresses” category – instead it groups “Dresses & Jumpsuits” together. Within that category, there are no “Occasion” or “Special Occasion” options at all, nor even anything similar to them.
Obviously this specific task would have to be modified with real navigation options from the site before running this test, or the users would end up confused and unable to complete the task.
If this happens in your own ChatGPT script, just update the wording to reflect existing contents from your site. In this example, we might rewrite it to say:
Use the top menu to find the category with dresses, and narrow down the options by adding filters for your preferred style, size, and color.
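One way to catch these mismatches before a test goes live is to compare every quoted label in a task against a list of labels that really exist on your site. The sketch below assumes you maintain that list yourself; the function name and the one-item label list are hypothetical, and the check only sees labels that ChatGPT put in quotation marks.

```python
import re

# Matches text inside straight or curly double quotes.
QUOTED = re.compile(r'["“]([^"”]+)["”]')


def unknown_labels(task, site_labels):
    """Return quoted labels in a task that are not among the site's real labels."""
    known = {label.casefold() for label in site_labels}
    return [
        label for label in QUOTED.findall(task)
        if label.casefold() not in known
    ]


task = (
    "Use the top menu to find the “Dresses” category, and narrow down the "
    "options by selecting “Occasion” and then “Special Occasion”."
)
# Hypothetical list of real navigation labels, maintained by you.
print(unknown_labels(task, ["Dresses & Jumpsuits"]))
# prints ['Dresses', 'Occasion', 'Special Occasion']
```

Any label the function returns is a cue to reword the task using contents your site actually has.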
3. ChatGPT task scripts for ecommerce websites often tell users to complete a purchase. Users should NOT be asked to make a real transaction in a user test.
This is another very important flaw. Almost every time we generated a user testing script for an ecommerce website using these prompt styles, there was a task telling users to actually complete a purchase.
For obvious reasons, this is not something that can or should be asked of users during a test. If your ChatGPT script includes a task like this, be sure to change it.
The easiest fix is to change the task to have users stop once asked for credit card information. When writing tasks from scratch, we typically use a wording like:
Proceed through the steps to purchase your item, up until you are asked for credit card information. You may use made-up information for any of the steps along the way.
A task like this allows you to see most parts of the checkout flow, while keeping things comfortable and secure for the users.
If you really need to see them complete the checkout process and onwards, we recommend setting up a promo code they can use which flags the transaction as a test case on your site’s backend, and allows them to proceed without entering a real credit card. Obviously, this solution requires some extra engineering work to achieve.
Another good option is to test on a prototype or beta/staging site, instead of a live website.
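The simplest fix, swapping in the stop-before-payment wording, can also be sketched in code. The keyword pattern below is our own guess at how a purchase task might be phrased, so treat it as a starting point and extend it with phrases you see in your own scripts. The function name is hypothetical.

```python
import re

# The replacement wording recommended in this article.
SAFE_CHECKOUT_TASK = (
    "Proceed through the steps to purchase your item, up until you are asked "
    "for credit card information. You may use made-up information for any of "
    "the steps along the way."
)

# Hypothetical phrases that suggest a real transaction is being requested.
PURCHASE_HINTS = re.compile(
    r"\b(complete (the|your) (purchase|order|checkout)|"
    r"place (the|your) order|"
    r"enter (your )?(payment|credit card))\b",
    re.IGNORECASE,
)


def soften_purchase_tasks(tasks):
    """Replace any task that appears to ask for a real purchase."""
    return [
        SAFE_CHECKOUT_TASK if PURCHASE_HINTS.search(task) else task
        for task in tasks
    ]
```

Because the pattern is only a heuristic, it is still worth reading the final script yourself to confirm that no task asks for a real transaction.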
4. The tasks that ChatGPT writes are often somewhat ‘leading’ – you may want to make instructions a little broader.
If you’re not sure what we mean by ‘leading,’ it’s when the task tells the user exactly what to click on or where to navigate to. This is not necessarily wrong in and of itself, but it does minimize some of the user’s agency as they decide how best to complete a task, and may skew how they go about it.
Take, for example, the Banana Republic task from section 2 above:
Use the top menu to find the “Dresses” category, and narrow down the options by selecting “Occasion” and then “Special Occasion”.
This task tells the user exactly what to click on, which means they aren’t making that decision for themselves. Of course, if you aren’t actually trying to investigate how users interact with the navigation menus for this specific test, that’s fine. Additionally, if you are interested in getting feedback on a specific interaction, you probably do want to tell users exactly what element they should interact with.
Otherwise, it’s best to frame a task like this with slightly more general wording, such as:
Find where the dresses are on this website, and narrow down your options to look for a dress that you could wear for a special occasion.
Conclusion
By using the conversational prompts we recommend in this article with ChatGPT, and correcting for any of the common issues listed above, you should be able to easily get a ready-to-use script for your Trymata usability test.
Ready to run your usability test? Sign up for a free 2-week trial now! >
One thing we haven’t touched on in this article is how to use ChatGPT for other kinds of tests, such as prototype tests, mobile app tests, or moderated user tests. To keep things focused and easily digestible, the recommendations here cover writing tasks for unmoderated website tests only.
However, we did include some of these variations in our research, so check back on our blog later for follow-up articles. Let us know in the comments what kinds of tests you’d like to write tasks for with ChatGPT!
Create your AI user testing script now >