What the output looks like

Actual scores, actual deductions. Nothing hypothetical.

Maria Gonzalez

Medical Interpretation · English → Spanish

84

Deductions

-5 · Dropped the dosage instruction entirely · Accuracy
-4 · Only partially interpreted patient question · Completeness
-3 · Said 'she says' instead of first person · Protocol
-2 · Stumbled on medical terminology · Fluency
-2 · Skipped clarifying an ambiguous term · Terminology

Generated in 18s · 3-agent consensus

100 - 16 = 84
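The arithmetic behind that report is easy to check by hand. Here is a minimal sketch in Python, assuming a base score of 100 and simple subtraction of each deduction. The `Deduction` class and `final_score` function are illustrative names, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deduction:
    points: int      # points removed from the base score
    category: str    # rubric category shown in the report
    note: str        # what the interpreter actually did


BASE_SCORE = 100  # assumed starting score, matching "100 - 16 = 84"


def final_score(deductions, base=BASE_SCORE):
    """Subtract every deduction from the base score."""
    return base - sum(d.points for d in deductions)


# The five deductions from the sample report above.
deductions = [
    Deduction(5, "Accuracy", "Dropped the dosage instruction entirely"),
    Deduction(4, "Completeness", "Only partially interpreted patient question"),
    Deduction(3, "Protocol", "Said 'she says' instead of first person"),
    Deduction(2, "Fluency", "Stumbled on medical terminology"),
    Deduction(2, "Terminology", "Skipped clarifying an ambiguous term"),
]

total = sum(d.points for d in deductions)
print(f"{BASE_SCORE} - {total} = {final_score(deductions)}")
```

Every point in the final number traces back to a named deduction, which is what makes the report auditable.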

Before Agent One

12 calls reviewed this week

all done by hand

3 reviewers, scores all over the place

up to 18 pts apart

45 minutes per report

that is 9 hrs/week just on QA

After Agent One

487 calls reviewed this week

zero manual effort

Same scoring logic on every call

no reviewer variance

Report done in 18 seconds

0.15 hrs/week on QA

Mistakes we catch on every call

These show up constantly. The AI flags them automatically.

Summarizing instead of interpreting

critical

The interpreter paraphrases what was said instead of rendering it faithfully. This changes meaning.

-50

Role confusion

critical

Speaking as themselves ("I think...") instead of rendering the speaker's words in the first person.

-10

Dropping medical or legal terms

moderate

Leaving out dosages, diagnoses, or legal terms. The kind of detail that matters most.

-5

Inserting personal opinions

critical

The interpreter adds their own commentary or advice. Not their role.

-10

Skipping parts of the conversation

moderate

Cherry-picking what to interpret. Side comments, overlapping speech, and quiet remarks get left out.

-5

Not flagging communication breakdowns

moderate

When a speaker is unclear or uses idioms, the interpreter should step in. Often they do not.

-3
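The six penalties above amount to a lookup table. A rough sketch in Python follows, using the severities and point values listed. The key names, the per-occurrence counting, and the floor at zero are assumptions made for illustration, not documented product behavior.

```python
# Severity and penalty for each error type, taken from the list above.
# The dictionary keys are hypothetical identifiers.
RUBRIC = {
    "summarizing": ("critical", 50),
    "role_confusion": ("critical", 10),
    "dropped_terms": ("moderate", 5),
    "personal_opinions": ("critical", 10),
    "skipped_content": ("moderate", 5),
    "unflagged_breakdown": ("moderate", 3),
}


def score_call(flags, base=100):
    """Score one call from a list of flagged error keys.

    Assumption: each flagged occurrence is deducted once, and the
    score never drops below zero.
    """
    total = sum(RUBRIC[flag][1] for flag in flags)
    return max(0, base - total)


def critical_flags(flags):
    """Return the distinct critical errors among the flags, sorted by name."""
    return sorted({f for f in flags if RUBRIC[f][0] == "critical"})


flags = ["summarizing", "dropped_terms"]
print(score_call(flags))       # 100 - 50 - 5
print(critical_flags(flags))
```

A single instance of summarizing already costs half the available points, so no combination of smaller errors on one call outweighs it unless they repeat.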

Things we have learned about interpretation QA

Short reads. Straight to the point.

Patterns

The 5 interpreter errors that keep showing up

Summarizing instead of interpreting is still number one. It also carries the biggest penalty.

2 min read

Metrics

Pass rate is a vanity metric

An 80% pass rate sounds fine until you realize the passing threshold is 70. That is a low bar. QA depth tells you more.

1 min read

Industry

Most QA systems score the wrong thing

They score the call. We score the interpretation. Big difference. One measures outcomes, the other measures the interpreter.

2 min read

Scoring

Why summarizing costs 50 points

It is not a minor slip. When an interpreter summarizes instead of interpreting, the meaning changes. Automatic 50-point deduction.

1 min read

Technology

Why we use three AI agents, not one

A single model can make things up. Three independent evaluations reach consensus. That is harder to fool than any single reviewer.

2 min read

Scale

Going from 5% to 100% call coverage

Most teams review a handful of calls. When you review all of them, you start seeing patterns that were completely hidden.

1 min read
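The three-agent consensus mentioned under Technology can be illustrated with a small example. This page does not say how the agents reach consensus, so the sketch below swaps in one plausible approach: take the median of three independent scores and flag the call when the agents disagree widely. The function name and the `max_spread` threshold are hypothetical.

```python
from statistics import median


def consensus_score(agent_scores, max_spread=10):
    """Combine three independent scores into one.

    Assumption: consensus is the median score, and the result is only
    trusted when the agents land within `max_spread` points of each other.
    Returns (score, agreed).
    """
    if len(agent_scores) != 3:
        raise ValueError("expected exactly three agent scores")
    spread = max(agent_scores) - min(agent_scores)
    return median(agent_scores), spread <= max_spread


print(consensus_score([84, 86, 79]))  # agents roughly agree
print(consensus_score([84, 60, 90]))  # one outlier, flagged as disagreement
```

With a median, one agent that makes something up cannot drag the final score on its own. It takes two agents being wrong in the same direction.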

Done listening to calls manually?

Score every call with your rubric. You can be up and running in minutes. No integrations required.