Case Studies in AI Evaluation, Retrieval Readiness, and Workflow Design

Case studies are grouped by capability. Each documents a specific engagement: the context, the method applied, and what the evidence showed.

AI Response Quality

Case Study · AI Response Quality

Why most low-scoring AI responses were incomplete, not incorrect

Evaluation of 949 GenAI responses across seven product teams. Responses were scored using a structured rubric separating completeness failures from accuracy failures. The dominant failure mode was missing context — present in 71% of low-scoring responses.

949 responses evaluated across 7 product teams · 71% of low scores: incomplete context, not incorrect output

Retrieval Readiness

Case Study · Retrieval Readiness

Running a retrieval readiness assessment across five content workstreams

Structured audit of existing knowledge bases before deploying retrieval-augmented generation (RAG) across five parallel workstreams. Each workstream was assessed for source quality, coverage gaps, and structural consistency.

5 RAG readiness workstreams coordinated

Related: Retrieval Readiness Assessment — the method applied in this work.

Workflow Scoping

Case Study · Workflow Scoping

Scoping an AI-supported content workflow for a product documentation team

Applied the Smallest Reliable System framework to define automation boundaries and maintenance requirements for a content team moving to an AI-assisted authoring workflow.

Related: Smallest Reliable System Scoping — the method applied in this work.

Measurement

Case Study · Measurement

Measuring knowledge quality improvement across a workflow pilot

Usability scoring applied at the start and end of a structured knowledge improvement cycle across a product workflow pilot. Tracked completeness, accuracy, and actionability across scoring periods.

Usability score: 39 → 57 across pilot cycles · improvement attributed to source content changes, not model changes