Case Studies in AI Evaluation, Retrieval Readiness, and Workflow Design

The case studies below are grouped by capability. Each documents a specific engagement: the context, the method applied, and what the evidence showed.

AI Response Quality

Case Study · AI Response Quality

Why most low-scoring AI responses were incomplete, not incorrect

Evaluation of 949 GenAI responses across seven product teams. Responses were scored using a structured rubric separating completeness failures from accuracy failures. The dominant failure mode was missing context — present in 71% of low-scoring responses.

949 responses evaluated across 7 product teams · 71% of low scores: incomplete context, not incorrect output
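
As a rough illustration of how a rubric can separate completeness failures from accuracy failures, the sketch below tallies failure modes among low-scoring responses. The 1-5 scale, the cut-off of 3, and every name in it (ScoredResponse, failure_mode, failure_mode_shares) are assumptions made for this example; the rubric used in the engagement is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cut-off: a dimension scored below this counts as a failure.
LOW_SCORE_THRESHOLD = 3


@dataclass
class ScoredResponse:
    response_id: str
    completeness: int  # assumed 1-5 rubric scale
    accuracy: int      # assumed 1-5 rubric scale


def failure_mode(response):
    """Classify one response; returns None when neither dimension fails."""
    incomplete = response.completeness < LOW_SCORE_THRESHOLD
    incorrect = response.accuracy < LOW_SCORE_THRESHOLD
    if incomplete and incorrect:
        return "both"
    if incomplete:
        return "incomplete"
    if incorrect:
        return "incorrect"
    return None


def failure_mode_shares(responses):
    """Share of each failure mode among the low-scoring responses only."""
    modes = [m for m in map(failure_mode, responses) if m is not None]
    if not modes:
        return {}
    counts = Counter(modes)
    return {mode: n / len(modes) for mode, n in counts.items()}


# Made-up sample data, not taken from the evaluation.
sample = [
    ScoredResponse("r1", completeness=2, accuracy=5),
    ScoredResponse("r2", completeness=1, accuracy=4),
    ScoredResponse("r3", completeness=5, accuracy=2),
    ScoredResponse("r4", completeness=5, accuracy=5),
]
shares = failure_mode_shares(sample)
```

Scoring the two dimensions separately is what makes a finding like "incomplete, not incorrect" measurable: a single overall score would merge both failure modes into one number.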

Read case study →

Retrieval Readiness

Related: Retrieval Readiness Assessment — the method applied in this work.

Workflow Scoping

Case Study · Workflow Scoping

Scoping an AI-supported content workflow for a product documentation team

Applied the Smallest Reliable System framework to define automation boundaries and maintenance requirements for a content team moving to an AI-assisted authoring workflow.

Read case study →

Related: Smallest Reliable System Scoping — the method applied in this work.

Measurement

Case Study · Measurement

Measuring knowledge quality improvement across a workflow pilot

Usability scoring was applied at the start and end of a structured knowledge improvement cycle during a product workflow pilot. Scoring tracked completeness, accuracy, and actionability across scoring periods.

Usability score: 39 → 57 across pilot cycles · improvement attributed to source content changes, not model changes

Read case study →