Why most low-scoring AI responses were incomplete, not incorrect
949 GenAI responses across seven product teams were scored with a structured rubric that separates completeness failures from accuracy failures. The dominant failure mode was missing context, present in 71% of low-scoring responses.
949 responses evaluated across 7 product teams · 71% of low scores: incomplete context, not incorrect output