Case Study · RAG Content Guidance

Developed shared guidance for improving RAG content

Scale: 20–30 contributors across 5 workstreams

Role: Content framework design, evaluation, and program coordination

Context: Cross-functional working group, pre-RAG deployment

I led a cross-functional working group that identified what documentation needs in order to work well in retrieval-augmented generation (RAG) systems, and turned the findings into shared content guidance. The work revealed that important context must appear in the content itself, not only in metadata.

The problem

Teams were experimenting with GenAI and retrieval-based systems, but results were inconsistent and difficult to explain. Content written for sequential human reading often lost meaning when it was retrieved as a fragment.

The organization had no shared way to identify which content characteristics supported retrieval, assess current documentation, or turn the findings into concrete writing guidance.

The approach

I organized 20–30 contributors into five workstreams covering content structure, metadata and tagging, clarity and style, metrics and audit, and adoption and training. Each workstream had a defined lead, scope, and deliverables so contributors could make progress alongside their regular work.

I coordinated the workstreams, supported contributors, tracked progress, and brought the findings together into a shared content framework.

1. Identify content factors: Determine which characteristics may affect retrieval and response quality.
2. Assess current content: Review documentation for recurring gaps in structure, context, metadata, and clarity.
3. Develop shared guidance: Turn the findings into practical guidance that documentation teams can apply.
4. Revise sample content: Apply the guidance to an initial content set and identify remaining problems.
5. Prepare for validation: Define how revised content should be tested against live retrieval once the required infrastructure is available.

The framework treated retrieval as a content-design problem. Structure, visible context, and clarity determined whether a retrieved fragment could stand on its own.

Structure

Write topics as self-contained units.
Use clear hierarchy and scannable formatting.
Balance granularity with enough context to preserve meaning.

Metadata

Use descriptive titles and explicit product or feature context.
Do not rely on metadata fields alone to carry meaning.
Put important retrieval signals in the content that is embedded.
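To illustrate the metadata guidance, here is a minimal Python sketch. It assumes a hypothetical `Topic` record and sample content invented for illustration; it is not the pipeline the working group used. In many vector stores, tags and other metadata fields are stored beside the vector (for filtering) rather than embedded, so title and product context only influence similarity search if they are part of the embedded text.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A documentation topic. Field names are hypothetical, for illustration only."""
    title: str
    product: str
    body: str
    # Tags typically live beside the vector as metadata, not inside the embedded text.
    tags: list = field(default_factory=list)


def body_only_text(topic: Topic) -> str:
    """Text that gets embedded when context is carried only by metadata fields."""
    return topic.body


def contextualized_text(topic: Topic) -> str:
    """Prepend product and title so that context becomes part of the embedded text."""
    header = f"{topic.product}: {topic.title}"
    return f"{header}\n\n{topic.body}"


# Hypothetical sample topic.
topic = Topic(
    title="Rotate an API key",
    product="Example Gateway",
    body="Select the key, then choose Rotate.",
    tags=["security", "api-keys"],
)

print(contextualized_text(topic))
```

Embedding `body_only_text(topic)` would leave "the key" with no product or task context in the vector, while the contextualized version carries that context into the embedding even though the tags are unchanged.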

Clarity

Use precise terminology and define unfamiliar terms.
Make sentence relationships explicit.
Remove ambiguity that becomes harder to resolve when content is retrieved out of context.

What the work revealed

The work showed that vector retrieval depended on context visible in the embedded content, not only on traditional metadata fields or tags.

Key finding: For vector-based retrieval, meaning must live in the content itself, not only in metadata.

Three patterns emerged consistently across the assessment:

Content must stand alone. A fragment can be factually correct and still fail if it lacks the context needed to answer the question.
Over-chunking removes meaning. Content divided too finely may work for navigation but produce fragments that are too small to support a useful answer.
Retrieval quality begins with content design. Missing context, ambiguous terms, and incomplete topic scope often traced back to authoring decisions rather than retrieval tuning alone.
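The over-chunking pattern can be sketched in a few lines of Python. This is a minimal illustration, assuming Markdown-style `#` headings and word count as a rough size proxy; the function names and the `min_words` threshold are hypothetical and do not describe a chunker used in this project. Undersized sections are merged forward instead of becoming bare fragments, and each chunk is prefixed with the document title so it carries standalone context.

```python
def split_sections(markdown: str) -> list:
    """Split text on lines starting with '#' into (heading, body) pairs."""
    sections = []
    heading, lines = "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            # Close the previous section, skipping an empty preamble.
            if heading or any(ln.strip() for ln in lines):
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if heading or any(ln.strip() for ln in lines):
        sections.append((heading, "\n".join(lines).strip()))
    return sections


def chunk(markdown: str, doc_title: str, min_words: int = 40) -> list:
    """Merge undersized sections so no chunk falls below min_words (except a
    document that is short overall), and prefix each chunk with the title."""
    chunks = []
    buffer = []
    for heading, body in split_sections(markdown):
        buffer.append(f"{heading}\n{body}".strip())
        if sum(len(part.split()) for part in buffer) >= min_words:
            chunks.append(f"{doc_title}\n\n" + "\n\n".join(buffer))
            buffer = []
    if buffer:
        if chunks:  # attach a trailing small fragment to the previous chunk
            chunks[-1] += "\n\n" + "\n\n".join(buffer)
        else:
            chunks.append(f"{doc_title}\n\n" + "\n\n".join(buffer))
    return chunks


# Hypothetical three-section document with very short sections.
doc = "# Install\nRun the installer.\n# Configure\nSet the port.\n# Verify\nOpen the status page."
print(len(chunk(doc, "Widget guide", min_words=40)))  # 1: small sections merged
print(len(chunk(doc, "Widget guide", min_words=1)))   # 3: one chunk per heading
```

The threshold trades navigation-friendly granularity against fragments too small to answer a question; in practice it would need tuning against real retrieval tests.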

How the framework was applied

I turned the framework into a structured evaluation model that assessed documentation topics and produced prioritized improvement suggestions. Applying it to an initial content set revealed recurring gaps in structure, clarity, and contextual completeness.

The team released a first draft of the shared guidance and prepared content for follow-up testing. Full validation against live retrieval required search infrastructure and test environments that did not become available before the project ended.