You have a year of support tickets. You also have a year of doc updates. There is no easy way to tell whether the docs have improved enough to answer the questions that came in. Replaying those tickets against the current docs gives you a number.
The setup
Take resolved tickets from the last twelve months. For each one:
- Give a model only the current docs and the user's original question. No ticket history, no human reply.
- Have the model write an answer.
- Compare the model's answer to the actual resolution.
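Here is a minimal Python sketch of the replay loop. The Ticket fields, the prompt wording, and ask_model are all hypothetical. ask_model stands in for whatever model client you use: it takes a prompt string and returns the answer text. The point the code makes is structural. The resolution is stored on the ticket, but it never reaches the prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    question: str    # the user's original message, verbatim
    resolution: str  # what actually resolved it; never shown to the model
    topic: str       # used later to group misses


def build_prompt(docs: str, question: str) -> str:
    # The model gets exactly two inputs: the current docs and the question.
    # No ticket history and no human reply go in.
    return (
        "Answer the user's question using only the documentation below.\n"
        "If the documentation does not contain the answer, say so.\n\n"
        f"Documentation:\n{docs}\n\n"
        f"Question: {question}\n"
    )


def replay(tickets: list[Ticket], docs: str,
           ask_model: Callable[[str], str]) -> dict[str, str]:
    """Return {ticket_id: model_answer} for every ticket."""
    return {t.ticket_id: ask_model(build_prompt(docs, t.question)) for t in tickets}
```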
Score each ticket as one of:
- Match: the model answered correctly from the docs.
- Partial: the answer covers part of the resolution.
- Miss: the docs do not have enough information.
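One way to keep the comparison strict is to reduce each resolution to a short list of key facts, then check which of those facts appear in the model's answer. This is an assumption, not the only approach: a second model call or a human grader can do the same job. In this sketch, grade and the key-fact lists are hypothetical, and matching is a plain case-insensitive substring check.

```python
def grade(answer: str, key_facts: list[str]) -> str:
    """Score one answer as "match", "partial", or "miss".

    key_facts is a hypothetical list of short phrases taken from the real
    resolution. A fact counts as covered only if it appears verbatim
    (ignoring case) in the model's answer, which keeps the check strict.
    """
    if not key_facts:
        raise ValueError("need at least one key fact to grade against")
    text = answer.lower()
    covered = sum(1 for fact in key_facts if fact.lower() in text)
    if covered == len(key_facts):
        return "match"    # every part of the resolution is in the answer
    if covered > 0:
        return "partial"  # some of it is
    return "miss"         # none of it is: the docs did not get the model there
```

Substring matching scores a correctly reworded answer as a miss. That means it errs toward undercounting matches, which is the safer direction for an upper-bound number.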
For a hundred tickets, this can run in about an hour and cost a few dollars. For a thousand, score a random sample instead of every ticket.
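If you sample, a seeded random sample keeps the selection reproducible, so a later run can replay exactly the same tickets. This is a sketch, assuming the ticket list is loaded in a stable order (for example, sorted by ticket ID). sample_tickets is a hypothetical helper.

```python
import random


def sample_tickets(tickets, k: int, seed: int = 0) -> list:
    """Return up to k tickets chosen uniformly at random, without replacement.

    The same seed and the same input order give the same sample every time,
    so repeat runs compare like with like.
    """
    tickets = list(tickets)
    if k >= len(tickets):
        return tickets  # small enough to score every ticket
    return random.Random(seed).sample(tickets, k)
```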
What the numbers mean
The headline number is the match rate. If 60% of past tickets could have been answered from the current docs, that is an upper bound on what self-serve could resolve today. The real number is lower, because users do not always find the right page. The upper bound still matters, because it measures what the docs contain, separately from whether users can find it.
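Computing the rates is simple arithmetic. This sketch assumes each ticket's score is one of the strings "match", "partial", or "miss". Only full matches count toward the headline number.

```python
from collections import Counter


def score_rates(scores: list[str]) -> dict[str, float]:
    """Share of tickets in each bucket.

    Partials are reported on their own rather than folded into the match
    rate, so the headline number stays a strict upper bound.
    """
    if not scores:
        raise ValueError("no scored tickets")
    counts = Counter(scores)
    return {label: counts[label] / len(scores) for label in ("match", "partial", "miss")}
```

For example, with 60 matches, 10 partials, and 30 misses out of 100, the match rate is 0.6. That figure is a ceiling on self-serve, not a forecast.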
The more useful output is the list of misses, grouped by topic. That tells you which pages have the highest support cost.
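Grouping the misses takes one collections.Counter. The sketch assumes each scored ticket is a dict with at least "topic" and "score" keys. How tickets get a topic (a help-desk tag, a manual pass) is up to you.

```python
from collections import Counter


def misses_by_topic(results: list[dict]) -> list[tuple[str, int]]:
    """Count misses per topic, most-missed topic first."""
    return Counter(r["topic"] for r in results if r["score"] == "miss").most_common()
```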
A few things to watch
- Tickets are not a random sample of questions. They are the questions hard enough that the user wrote in. Easy questions are already answered by the docs, so they never become tickets.
- Some misses are intentional. Pricing, account-specific issues, and security questions are often not in the docs by design. Filter those out before reporting.
- A model that confabulates can produce confident answers that a loose comparison scores as matches, which makes the match rate look better than it is. Use a strict comparison, or have a human spot-check a sample.
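The last two cautions can be applied in code. In this sketch, the out-of-scope topic set and the record shape (dicts with "topic" and "score" keys) are assumptions. The spot check draws only from matches, because confabulation shows up as false matches.

```python
import random

# Hypothetical: topics your team keeps out of the docs on purpose.
OUT_OF_SCOPE = frozenset({"pricing", "account", "security"})


def reportable(results: list[dict], out_of_scope=OUT_OF_SCOPE) -> list[dict]:
    """Drop tickets whose topic is intentionally undocumented."""
    return [r for r in results if r["topic"] not in out_of_scope]


def spot_check(results: list[dict], n: int, seed: int = 0) -> list[dict]:
    """Pick up to n graded matches for a human to re-read.

    Seeded, so two reviewers asked for the same check get the same tickets.
    """
    matches = [r for r in results if r["score"] == "match"]
    return random.Random(seed).sample(matches, min(n, len(matches)))
```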
Running it again
The point is not the first run; it is running the same evaluation every month or quarter. The match rate over time measures how much the docs are doing for support. As long as the tickets, model, and prompt stay the same between runs, you can attribute changes to specific doc updates.
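Attributing a change to the docs only works if nothing else moved between runs. One way to enforce that is to fingerprint everything except the docs and refuse to compare runs whose fingerprints differ. This sketch assumes you record the ticket IDs, model name, and prompt for each run. Run, config_key, and changes are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str        # e.g. "2024-Q3"
    config_key: str   # fingerprint of tickets, model, and prompt
    match_rate: float


def config_key(ticket_ids: list[str], model: str, prompt: str) -> str:
    """Hash everything except the docs, so only the docs can vary."""
    blob = json.dumps({"tickets": sorted(ticket_ids), "model": model, "prompt": prompt})
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def changes(runs: list[Run]) -> list[tuple[str, float]]:
    """Return (label, change in match rate) for each run versus the one before."""
    out = []
    for prev, cur in zip(runs, runs[1:]):
        if prev.config_key != cur.config_key:
            raise ValueError(f"{prev.label} and {cur.label} used different setups")
        out.append((cur.label, cur.match_rate - prev.match_rate))
    return out
```

If a fingerprint changes (a new model, a revised prompt, a fresh ticket sample), start a new baseline rather than comparing across the break.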