OD Agent Team — Consulting Methodology, Agent System, and Technical Vocabulary
A planned, evidence-based approach to improving an organization's capacity to change and achieve greater effectiveness. OD practitioners diagnose organizational systems, design interventions, and facilitate learning at individual, team, and organizational levels.
A structured diagnostic process using multiple data-collection methods — survey, interview, observation — to develop an integrated picture of organizational functioning. Assessments produce findings, not just data; they interpret patterns across multiple lenses before recommending action.
In this system, a full assessment typically combines Agent 01 (survey), Agent 02 (interview protocol), Agent 04 (quant analysis), and Agent 06 (synthesis).
A discrete domain of organizational functioning being investigated — e.g., Leadership Effectiveness, Psychological Safety, Change Readiness. Dimensions define the scope of inquiry and are the organizing frame for survey items, interview probes, and synthesis findings.
Selected at run creation. The system generates instruments and protocols calibrated to the declared dimensions.
The population of people included in the assessment — e.g., all civilian personnel, senior leadership only, a single department. Scope affects sampling strategy, instrument design, and the generalizability of findings.
The confidentiality guarantee offered to survey respondents. Anonymous — no identifying data collected; responses cannot be traced. Confidential — identity known to the researcher but not shared; reporting aggregated. Named — responses linked to individuals; used when attribution is required.
Mode directly influences candor. Federal engagements default to anonymous or confidential.
A latent variable representing a theoretical concept that cannot be directly observed — e.g., Psychological Safety, Organizational Commitment. Constructs are measured indirectly through survey items; reliability is assessed via Cronbach's Alpha.
In Agent 04, constructs are defined as named groupings of survey item numbers with expected scoring formulas.
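Such a grouping can be sketched as a plain TypeScript shape — the field names here are illustrative assumptions, not the project's actual schema:

```typescript
// Hypothetical shape of a construct definition as consumed by Agent 04.
// All field names are assumptions for illustration.
interface ConstructDefinition {
  name: string;                     // e.g. "Psychological Safety"
  itemNumbers: number[];            // survey item numbers in the scale
  scoringFormula: "mean" | "sum";   // how item responses roll up to a score
  reverseScored?: number[];         // items whose scale direction is flipped
}

const psychSafety: ConstructDefinition = {
  name: "Psychological Safety",
  itemNumbers: [4, 9, 12, 17],
  scoringFormula: "mean",
  reverseScored: [12],
};
```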
A structured questionnaire administered to a defined population to collect quantifiable data on organizational dimensions. Items use Likert-scale, semantic differential, or forced-choice formats. Instruments are validated for internal consistency before deployment.
A semi-structured guide for conducting one-on-one or focus group interviews. Contains opening framing, dimension-aligned probe questions, follow-up probes, and closing. Protocols are calibrated to interview type (individual, focus group, leadership panel) and session duration.
An established, validated OD measurement tool used to inform or benchmark a custom instrument design. Examples: Denison Organizational Culture Survey, Gallup Q12, OCAI. The system holds reference instruments in its content library and uses them to calibrate item wording and construct coverage.
A survey item or disaggregation variable used to break down results by subgroup — e.g., Department, Tenure, Pay Grade. Filters enable comparative analysis across organizational segments. Must be defined before instrument design to ensure they are captured in the data.
Statistical examination of survey data to identify patterns, score constructs, assess reliability, and test distributions. Outputs include descriptive statistics, construct scores, Cronbach's Alpha values, and factor-level summaries.
Thematic examination of interview and open-response data. Produces coded themes, representative quotes, and narrative descriptions of organizational experience. Complements quantitative findings by adding context, nuance, and voice.
A research design that combines quantitative and qualitative data collection and analysis. The OD assessment workflow is a mixed-methods design — survey data and interview data are collected independently, then triangulated in synthesis.
The process of comparing findings across two or more data sources to assess convergence, complementarity, or divergence. A finding is convergent when quant and qual data point in the same direction. Complementary when they address different facets. Divergent when they contradict — which itself is a finding requiring explanation.
Agent 06 builds a triangulation map as its first deterministic stage before generating the narrative synthesis.
A statistical measure of internal consistency for a construct — i.e., whether the items in a scale measure the same underlying concept. Alpha typically ranges from 0 to 1; values above 0.70 are generally acceptable, and values above 0.80 are strong. Low alpha signals that items should be revised or removed.
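The alpha computation itself is small enough to sketch. This assumes a complete respondents-by-items matrix with no missing data:

```typescript
// Cronbach's Alpha: (k / (k - 1)) * (1 - sum(item variances) / variance(totals)).
// Population variance is used; the n vs n-1 choice cancels in the ratio.
function variance(xs: number[]): number {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
}

function cronbachAlpha(data: number[][]): number {
  const k = data[0].length; // number of items in the scale
  const itemVariances = Array.from({ length: k }, (_, j) =>
    variance(data.map((row) => row[j]))
  );
  const totals = data.map((row) => row.reduce((a, b) => a + b, 0));
  const sumItemVar = itemVariances.reduce((a, b) => a + b, 0);
  return (k / (k - 1)) * (1 - sumItemVar / variance(totals));
}
```

Perfectly redundant items yield α = 1; items that vary independently of one another drive α toward 0.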
A pointer from a synthesis claim back to the source data — e.g., a survey item number, an interview theme code, or a statistical finding. Evidence references are validated by Agent 06 to ensure every recommendation is traceable to observed data.
A specific, observable outcome that a learning program is designed to produce — e.g., "Participants will demonstrate at least two adaptive leadership behaviors in post-program simulations." Objectives drive content selection, activity design, and evaluation instrument construction.
A detailed operational document for workshop facilitators. Contains session timeline, learning objectives, activity scripts, transition language, materials lists, contingency notes, and debrief prompts. Produced by Agent 05 in DOCX and Markdown formats.
A set of instruments used to measure program effectiveness at multiple points — pre-assessment (baseline), post-assessment (learning gain), session feedback (reaction), and observation checklist (behavior). Based on Kirkpatrick's four levels. Produced by Agent 07 in DOCX and XLSX formats.
The format in which a learning program is delivered. In-person — face-to-face in a physical space. Virtual — fully remote via video platform. Hybrid — some participants in-person, others remote simultaneously. Self-paced — asynchronous individual completion. Modality affects activity design, timing, and technology requirements.
A specialized AI module that executes a defined OD task. Each agent takes a structured input, runs deterministic preparation stages, calls the LLM for reasoning and generation, merges outputs, validates quality, and returns a structured result. Agents are stateless — all state persists in Supabase.
The TypeScript implementation of an agent, conforming to the AgentDefinition<I, O> interface. Declares the agent key, input/output schemas, deterministic stages, LLM stages, output merge function, and quality checks.
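A minimal sketch of what such a definition might look like — member names are inferred from this document, not copied from the real interface:

```typescript
// Illustrative skeleton of AgentDefinition<I, O>; schemas are stubbed as
// `unknown`/null here (the real system uses Zod).
interface AgentDefinition<I, O> {
  agentKey: string;
  inputSchema: unknown;
  outputSchema: unknown;
  deterministicStages: Array<(input: I, state: Record<string, unknown>) => void>;
  llmStages: Array<(input: I, state: Record<string, unknown>) => Promise<void>>;
  mergeOutputs: (state: Record<string, unknown>) => O;
  qualityChecks: Array<(output: O) => { pass: boolean; issue?: string }>;
}

// A trivial agent exercising the shape.
const echoAgent: AgentDefinition<{ text: string }, { upper: string }> = {
  agentKey: "echo",
  inputSchema: null,
  outputSchema: null,
  deterministicStages: [(input, state) => { state.upper = input.text.toUpperCase(); }],
  llmStages: [],
  mergeOutputs: (state) => ({ upper: String(state.upper) }),
  qualityChecks: [(o) => ({ pass: o.upper.length > 0 })],
};
```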
The shared runner (framework.ts) that executes any registered agent. Handles input validation, Supabase state persistence, stage sequencing, cost logging, and status transitions. Individual agents declare what to do; the framework handles how to run it safely.
A map of agent keys to agent definitions (registry.ts). Agents are registered via side-effect imports in agents/index.ts. Any Worker that runs agents must import index.ts to trigger registration before calling runAgent().
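The registration pattern can be sketched in a few lines (names assumed; the real registry.ts may differ):

```typescript
// Side-effect registration: each agent module calls registerAgent() at import
// time, so importing agents/index.ts populates the registry.
type RegisteredAgent = { run: (input: unknown) => unknown };
const registry = new Map<string, RegisteredAgent>();

function registerAgent(key: string, def: RegisteredAgent): void {
  registry.set(key, def);
}

function getAgent(key: string): RegisteredAgent {
  const def = registry.get(key);
  if (!def) {
    throw new Error(`Agent not registered: ${key} — did you import agents/index.ts?`);
  }
  return def;
}

// What an agent module's top level effectively does when imported:
registerAgent("survey_designer", { run: (input) => ({ ok: true, input }) });
```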
An agent stage that runs pure computation — data preparation, validation, scoring, mapping — without calling the LLM. Deterministic stages run first and produce structured outputs that inform the LLM stages. Their outputs are predictable, fast, and cost-free.
An agent stage that calls the Anthropic API to generate content — survey items, interview probes, narrative analysis, recommendations. LLM stages receive prepared context from deterministic stages and return structured JSON parsed from the model response.
Default model: claude-haiku-4-5 for speed and cost. Heavy reasoning tasks (Agent 06) use claude-sonnet-4-6.
The final function in an agent definition that combines results from all deterministic and LLM stages into the agent's final typed output object. Runs after all stages complete. May call Supabase Storage to upload generated artifacts (DOCX, XLSX, Markdown).
A post-merge validation function that inspects the agent's output against defined criteria — coverage thresholds, required fields, structural constraints. A failed quality check sets step status to partial and surfaces issues for operator review.
A top-level workflow execution record in the as_runs table. Each run has a unique ID, a workflow key, a run name, input payload, status, and timestamp. A run contains one or more steps.
A per-agent execution record within a run, stored in as_steps. Contains the agent key, step order, input payload, output payload, status, dependency step IDs, and error detail. Steps persist all outputs to Supabase after execution — no in-memory state survives across requests.
The structured JSON object passed to an agent at run creation. Validated against the agent's Zod input schema before execution begins. Contains all parameters the agent needs — client context, dimensions, data paths, configuration. Immutable after run creation.
The structured JSON object produced by a completed agent step, stored in as_steps.output_payload. Conforms to the agent's Zod output schema. Downstream agents in a chain receive this object as the basis for their input construction.
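Both payloads are gated by schema validation before any agent work begins. The fail-fast principle can be illustrated without Zod — the real system validates against Zod schemas; this hand-rolled check and its field names are only for illustration:

```typescript
// Reject a malformed input payload before the agent runs.
type SurveyInput = { clientName: string; dimensions: string[] };

function validateSurveyInput(payload: unknown): SurveyInput {
  const p = payload as Partial<SurveyInput> | null;
  if (typeof p?.clientName !== "string" || p.clientName.trim() === "") {
    throw new Error("input_payload.clientName must be a non-empty string");
  }
  if (!Array.isArray(p.dimensions) || p.dimensions.length === 0) {
    throw new Error("input_payload.dimensions must be a non-empty array");
  }
  return { clientName: p.clientName, dimensions: p.dimensions };
}
```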
The mechanism by which an upstream agent's output is automatically mapped into the input of a downstream agent upon approval. Implemented in chain.ts. When a step is approved and has downstream dependencies, the review controller builds the chained input and fires the next agent without operator intervention.
Active in the Leadership Development Pipeline (03 → 05 → 07).
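A hypothetical sketch of the mapping step — function names, map keys, and field names here are assumptions, not the real chain.ts:

```typescript
// Map an approved upstream step's output into the downstream agent's input.
type StepOutput = Record<string, unknown>;

const chainMap: Record<string, (upstream: StepOutput) => StepOutput> = {
  // 03 -> 05: curated modules feed the facilitation guide
  "content_curator->facilitation_guide": (out) => ({
    curatedModules: out.modules,
    programName: out.programName,
  }),
};

function buildChainedInput(fromKey: string, toKey: string, upstream: StepOutput): StepOutput {
  const mapper = chainMap[`${fromKey}->${toKey}`];
  if (!mapper) throw new Error(`No chain mapping for ${fromKey} -> ${toKey}`);
  return mapper(upstream);
}
```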
A step that must reach complete status before a downstream step can begin. Declared in the workflow template as dependsOn logical IDs, resolved to real Supabase UUIDs at run creation. Sequential workflows enforce execution order through dependency tracking.
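The logical-ID-to-UUID resolution at run creation might look like this (names illustrative):

```typescript
// Resolve template-level dependsOn logical IDs to the concrete step UUIDs
// minted for this run.
type TemplateStep = { logicalId: string; agentKey: string; dependsOn: string[] };
type RunStep = { id: string; agentKey: string; dependsOn: string[] };

function resolveDependencies(
  steps: TemplateStep[],
  uuidFor: (logicalId: string) => string
): RunStep[] {
  return steps.map((s) => ({
    id: uuidFor(s.logicalId),
    agentKey: s.agentKey,
    dependsOn: s.dependsOn.map(uuidFor),
  }));
}
```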
Every step moves through a defined status lifecycle — e.g., pending, complete, partial, or failed. The operator console reflects the current status in real time.
Operator action that marks a step complete and records a review decision in as_reviews. If the step has downstream dependents with output chaining configured, they are automatically triggered.
Operator action that marks a step failed and records a rejection decision. Downstream steps are not triggered. The step can be rerun after the underlying issue is addressed (typically by adjusting the input payload and creating a new run).
Re-executes a failed or partial step using its original input payload. Resets the step to pending and triggers the agent framework again. Optionally resets downstream steps if resetDownstream: true.
Each workflow corresponds to one or more agents. Single-agent workflows run one step; sequential workflows run steps in order gated by operator approval.
| Workflow Key | Name | Agents | Mode |
|---|---|---|---|
| survey_designer | Survey Instrument Designer | Agent 01 | Single |
| interview_protocol | Interview Protocol Developer | Agent 02 | Single |
| quant_analysis | Quantitative Analysis | Agent 04 | Single |
| synthesis_report | Synthesis Report | Agent 06 | Single |
| content_curator | Content Curation Engine | Agent 03 | Single |
| facilitation_guide | Facilitation Guide Generator | Agent 05 | Single |
| evaluation_builder | Evaluation Package Builder | Agent 07 | Single |
| assessment_standard | Assessment Standard | Agent 01 → 02 | Sequential |
| leadership_development_pipeline | Leadership Development Pipeline | Agent 03 → 05 → 07 | Sequential + Chained |
Generates a validated survey instrument from declared diagnostic dimensions, reference instruments, and contextual parameters. Produces item text, response scales, construct groupings, and a Qualtrics-ready export format.
Inputs: client name, industry, assessment scope, diagnostic dimensions, anonymity mode, demographic filters, reference instruments.
Outputs: survey items organized by construct, response scale definitions, demographic items, Qualtrics block structure, reliability notes.
Builds structured interview protocols aligned to assessment dimensions and session logistics. Produces opening framing, dimension-aligned probe questions, follow-up probes, and debrief guidance calibrated to interview type and duration.
Inputs: client context, diagnostic dimensions, interview types (individual, focus group, leadership panel), total sessions, session duration.
Scores and selects content library items by objective alignment, construct match, reading level, and source quality. Produces a curated content module set optimized for the program's learning objectives and delivery constraints.
Currently relies on the 15-item internal content library. External search is stubbed and always returns a degraded result; expansion is planned.
Inputs: program name, client, industry, delivery modality, module count, total duration, program objectives.
Runs statistical analysis on a CSV survey export stored in Supabase Storage. Scores constructs, computes descriptive statistics, calculates Cronbach's Alpha per construct, identifies distribution patterns, and surfaces highest and lowest scoring items.
Inputs: client context, Supabase Storage path to the survey export CSV, construct definitions (name + item numbers + scoring formula).
Produces a detailed facilitation guide with session timeline, activity scripts, transition language, debrief prompts, and contingency notes. Uploads DOCX and Markdown outputs to the guides/ Supabase Storage bucket.
When chained from Agent 03, receives curated content modules as input and builds the guide around them. In standalone mode, generates without content curation upstream.
Integrates quantitative and qualitative findings into a triangulated, executive-ready report. Builds a triangulation map, generates dimension-level narratives, identifies cross-cutting themes, writes recommendations with evidence references, and produces an executive summary.
Uses claude-sonnet-4-6 for its three LLM stages due to reasoning complexity. All recommendations are validated against evidence references before output is finalized.
Completed synthesis runs display a "View Report" button in the operator console. The report viewer at /report.html renders the full output and provides DOCX download.
Builds a full evaluation package including pre/post assessments, session feedback forms, and observation checklists derived from the facilitation guide's activities and objectives. Uploads DOCX and XLSX outputs to Supabase Storage.
Controls which instruments are generated: pre_assessment, post_assessment, session_feedback, observation_checklist. Any combination can be requested.
Runs Agent 01 and Agent 02 in sequence from a single form submission. Agent 01 runs first; upon approval, Agent 02 fires automatically using the same organizational context. Produces a matched survey instrument and interview protocol for the same engagement.
Runs Agent 03, 05, and 07 in sequence with output chaining. Agent 03 curates content; upon approval, Agent 05 receives those modules and generates the facilitation guide; upon approval, Agent 07 uses the guide to build the evaluation package. One form submission, one complete L&D deliverable set.
Chaining logic is implemented. End-to-end validation run pending — first live test of the full 03 → 05 → 07 sequence on the Cloudflare deployment.
The static hosting layer. Serves files from the public/ directory — index.html, report.html, terminology.html, and static assets. No build step required; files are deployed as-is.
Edge functions that execute server-side logic. In this project, Workers are defined as TypeScript files in the functions/ directory and compiled by Wrangler at deploy time. They run on Cloudflare's global edge network, not a centralized server.
Cloudflare's file-based routing system for Workers within a Pages project. A TypeScript file at functions/api/agents/run.ts automatically becomes the handler for POST /api/agents/run. No routing configuration required.
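A sketch of such a handler — `onRequestPost` is Cloudflare's naming convention for POST handlers in Pages Functions; the Env fields and response shape here are assumptions:

```typescript
// Sketch of functions/api/agents/run.ts (body handling is illustrative).
interface Env { SUPABASE_URL: string; SUPABASE_SERVICE_ROLE_KEY: string }

export const onRequestPost = async (context: { env: Env; request: Request }) => {
  const body = await context.request.json();
  // ... validate input, create the run, invoke the agent framework ...
  return new Response(JSON.stringify({ ok: true, received: body }), {
    headers: { "Content-Type": "application/json" },
  });
};
```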
Cloudflare's CLI and build tool. Used to run local development (wrangler pages dev), compile Workers, and deploy the project (wrangler pages deploy public). Reads configuration from wrangler.toml.
The backend database and storage layer. PostgreSQL database holds all run and step state in as_-prefixed tables. Supabase Storage holds generated artifacts (DOCX, XLSX, Markdown) in named buckets. The same Supabase project is shared between the original Next.js build and this Cloudflare deployment.
Supabase's database-level access control. Enabled on all as_ tables. The service role key bypasses RLS for all agent writes; the anon key is restricted to permitted operations only. Never use the anon key for agent execution.
The TypeScript interface (src/lib/types/env.ts) that declares all secrets and bindings available to Workers. Cloudflare injects this object into every function via context.env. Never use process.env in this project — it does not exist in the Workers runtime.
The pattern of explicitly passing env through every function that needs secrets — from the Worker's context.env into RunAgentParams.env, into AgentExecutionContext.env, and into every createAdminClient(env) call. No global state; every secret is explicitly scoped.
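A condensed sketch of the threading — names mirror the description above but are illustrative, and the Supabase client is stubbed:

```typescript
// Secrets flow explicitly: Worker context.env -> run params -> every client.
interface Env { SUPABASE_URL: string; SUPABASE_SERVICE_ROLE_KEY: string }

function createAdminClient(env: Env) {
  // Real code would call Supabase's createClient(env.SUPABASE_URL, env.SUPABASE_SERVICE_ROLE_KEY).
  return { url: env.SUPABASE_URL };
}

async function runAgent(params: { agentKey: string; env: Env }) {
  const db = createAdminClient(params.env); // env is passed down, never read globally
  return { agentKey: params.agentKey, dbUrl: db.url };
}
```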
A local secrets file read by Wrangler during wrangler pages dev. Equivalent to .env.local in Next.js. Never committed to git — listed in .gitignore. For reference, see .dev.vars.example.
A Cloudflare compatibility flag declared in wrangler.toml. Required to use Node.js built-in APIs — specifically Buffer — which are needed by the docx and xlsx packages for document generation. Do not remove this flag.
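A minimal wrangler.toml fragment showing where the flag lives (the date shown is illustrative; other keys are omitted):

```toml
# wrangler.toml — fragment only
compatibility_date = "2024-09-23"           # illustrative date
compatibility_flags = ["nodejs_compat"]     # enables Buffer for docx/xlsx
```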
Top-level workflow run records. Columns: id, run_name, workflow_key, input_payload, status, created_at.
Per-agent step records. Columns: id, run_id, agent_key, step_order, status, input_payload, output_payload, depends_on, error_detail.
Human review records. Columns: id, step_id, decision (approved/rejected), reviewed_by, review_notes, created_at.
JSONB workflow definitions seeded into Supabase. Each template declares the workflow key, family, step sequence, agent keys, and dependency map. The /api/agents/templates endpoint reads active templates to populate the operator console workflow selector.
Append-only audit log. Every significant state transition appends an event record — run created, step started, step complete, review recorded. Used for debugging and run history.
Shared cost logging table (pre-existing). Every LLM call writes input tokens, output tokens, estimated cost, model, and workflow context. Enables cost tracking across all agent executions.
Default model for all agent LLM stages. Fast, cost-efficient, and suitable for structured generation tasks where the prompt provides strong context. Used in Agents 01, 02, 03, 05, 07.
Heavy reasoning model. Used in Agent 06 (Synthesis Report) for its three LLM stages — triangulation narrative, recommendations, and executive summary — where multi-step analytical reasoning across large contexts is required.
A TypeScript string constant in src/lib/agents/prompts/templates.ts that defines the system prompt or user prompt for an LLM stage. Prompts are interpolated with run-time values using the interpolate() utility before being sent to the API.
Do not import raw .md files — no TypeScript declaration exists for them. All prompts must be TypeScript constants.
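The pattern might look like the following; the interpolate() shown here is a minimal stand-in for the project's utility, and the `{{placeholder}}` syntax is an assumption:

```typescript
// A prompt constant with {{placeholder}} slots and a minimal interpolation
// helper (the real interpolate() may use different delimiters).
const SURVEY_ITEM_PROMPT = `You are designing a survey instrument for {{clientName}}.
Generate Likert-scale items covering these dimensions: {{dimensions}}.`;

function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key: string) => values[key] ?? `{{${key}}}`);
}

const prompt = interpolate(SURVEY_ITEM_PROMPT, {
  clientName: "Acme Corp",
  dimensions: "Leadership Effectiveness, Psychological Safety",
});
```

Unresolved placeholders are left intact here so a missing value is visible in the generated prompt rather than silently dropped.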