Operator reference documentation for all seven agents
The Survey Instrument Designer generates structured, psychometrically defensible survey instruments for organizational diagnostic engagements. Given a set of constructs the practitioner wants to measure — such as leadership effectiveness, psychological safety, or organizational culture — this agent produces a complete item set, organizes it into a deployable instrument, and outputs a format ready for upload to Qualtrics or equivalent platforms.
This agent eliminates the drafting burden that consumes significant time in the early stages of an OD engagement. It applies construct-level item generation, reverse-scoring conventions, and response-format standards drawn from validated instrument development practice.
The agent validates that all required inputs are present and internally consistent: construct names are unique, demographic filters are recognized, and the delivery platform is supported. This stage runs without calling an LLM.
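
A minimal sketch of what this deterministic gate could look like, assuming the inputs arrive as a single object; the field names, supported-platform list, and recognized-filter set below are illustrative assumptions, not the agent's actual configuration:

```typescript
// Hypothetical input shape and constants; the real field names live in the agent config.
interface DesignerInputs {
  diagnosticDimensions: string[];
  demographicFilters: string; // e.g. "pay_grade, directorate"
  platform: string;
}

const SUPPORTED_PLATFORMS = ['Qualtrics']; // assumed
const KNOWN_FILTERS = new Set(['pay_grade', 'directorate', 'tenure']); // assumed

function validateInputs(inputs: DesignerInputs): string[] {
  const errors: string[] = [];
  const names = inputs.diagnosticDimensions.map(d => d.trim().toLowerCase());
  if (new Set(names).size !== names.length) {
    errors.push('Construct names must be unique.');
  }
  for (const f of inputs.demographicFilters.split(',').map(s => s.trim()).filter(Boolean)) {
    if (!KNOWN_FILTERS.has(f)) errors.push(`Unrecognized demographic filter: ${f}`);
  }
  if (!SUPPORTED_PLATFORMS.includes(inputs.platform)) {
    errors.push(`Unsupported platform: ${inputs.platform}`);
  }
  return errors; // an empty array means the inputs pass and generation can proceed
}
```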
Using Claude Haiku, the agent generates 4–6 Likert-scale items per construct, with one reverse-scored item per construct where applicable. The prompt includes the engagement context (industry, organization size, anonymity mode), reference instruments, and any practitioner-defined sensitivity flags. Output includes a full item set with construct assignments, a recommended response scale, demographic filter questions, and deployment-ready formatting guidance.
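
The resulting item set plausibly deserializes to a shape like the following; the field names here are assumptions for illustration, not the agent's published schema:

```typescript
// Hypothetical shape of one generated item.
interface SurveyItem {
  construct: string;      // construct assignment, e.g. "Psychological Safety"
  text: string;           // item wording
  reverseScored: boolean; // at most one reverse-scored item per construct
  scale: string;          // recommended response scale, e.g. "5-point Likert"
}

const example: SurveyItem = {
  construct: 'Psychological Safety',
  text: 'People on my team are comfortable raising problems and tough issues.',
  reverseScored: false,
  scale: '5-point Likert',
};
```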

| Field | Type | Description |
|---|---|---|
| clientName | Text | Name of the client organization — used for instrument header and branding context |
| engagementName | Text | Engagement or project name — used for internal tracking and document headers |
| industryContext | Select | Sector and workforce description, e.g. "Federal Defense — GS-12 to SES" |
| organizationSize | Number | Approximate headcount — informs item language and sampling guidance |
| diagnosticDimensions | Multiselect | Constructs to measure, e.g. "Leadership Effectiveness," "Team Cohesion" |
| referenceInstruments | Checkboxes | Known validated instruments to align with (Denison, Gallup Q12, OCAI) |
| anonymityMode | Select | "Anonymous" or "Attributed" — affects item sensitivity and disclosure language |
| demographicFilters | Text | Demographic breakdowns desired, e.g. "pay_grade, directorate" |
| platform | Text | Survey delivery platform, e.g. "Qualtrics" — affects formatting conventions |

| Output | Format | Description |
|---|---|---|
| Survey Instrument | Structured doc | Complete item set organized by construct with response scale and instructions |
| Item Mapping | Table | Maps each item to its construct, reverse-scoring flag, and sensitivity classification |
| Deployment Notes | Text | Platform-specific guidance for upload, branching logic, and completion estimates |
| Demographic Block | Item list | Recommended demographic questions aligned to the specified filters |

| Theory / Framework | Source | Application |
|---|---|---|
| Classical Test Theory | Spearman (1904); Lord & Novick (1968) | Underpins item reliability logic, internal consistency targets, and reverse-scoring conventions |
| Construct Validity | Cronbach & Meehl (1955) | Guides item-construct alignment — items must represent the latent construct they purport to measure |
| Likert Scaling | Likert (1932) | Establishes the 5-point agree/disagree response format and balanced anchor phrasing |
| Denison Organizational Culture Survey | Denison & Mishra (1995) | Reference framework for culture-domain items covering involvement, consistency, adaptability, and mission |
| Gallup Q12 Employee Engagement | Buckingham & Coffman (1999) | Reference framework for engagement items — informs construct coverage for team-level dynamics |
| Survey Design Best Practices | Dillman, Smyth & Christian (2014) | Governs item phrasing rules: avoiding double-barreled items, anchoring questions in behavior, and deliberate use of the neutral midpoint |
The Interview Protocol Builder develops structured and semi-structured interview protocols for qualitative data collection in OD engagements. Given the assessment dimensions, role hierarchy, and session logistics, this agent generates complete, role-differentiated interview guides — including warm-up questions, main probes, follow-up prompts, and closing sequences — calibrated to the seniority and organizational knowledge of each participant group.
This agent addresses a consistent bottleneck in mixed-methods OD work: protocol development is time-intensive, and off-the-shelf templates rarely reflect the specific constructs under investigation or the power dynamics of the audience.
The agent builds a protocol structure for each role group specified in the inputs, calculating time allocations, sequencing question types, and mapping each probe to the assessment dimensions it covers.
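
A sketch of how the time-allocation step might work, assuming fixed fractions for the warm-up and closing blocks; the percentages and helper names are illustrative, not the agent's actual constants:

```typescript
// Hypothetical allocation: 10% warm-up, 10% close, remainder split
// evenly across the assessment dimensions.
interface ProtocolSection {
  name: string;
  minutes: number;
}

function allocateSections(totalMinutes: number, dimensions: string[]): ProtocolSection[] {
  const warmUp = Math.round(totalMinutes * 0.1);
  const close = Math.round(totalMinutes * 0.1);
  const perDimension = Math.floor((totalMinutes - warmUp - close) / dimensions.length);
  return [
    { name: 'Warm-up', minutes: warmUp },
    ...dimensions.map(d => ({ name: `Core probes: ${d}`, minutes: perDimension })),
    { name: 'Close', minutes: close },
  ];
}
```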
Using Claude Haiku, the agent generates role-appropriate questions for each session type (executive interview, focus group, listening session, stakeholder interview). Senior leaders receive strategic, forward-looking prompts. Front-line participants receive experience-level, operationally grounded questions. The LLM is constrained to produce questions that are behaviorally anchored, open-ended, and non-leading.

| Field | Type | Description |
|---|---|---|
| clientName / engagementName | Text | Engagement context for document headers and LLM framing |
| organizationContext | Text | Narrative description of the organization — workforce composition, culture notes, sensitivities |
| assessmentDimensions | Multiselect | Constructs to probe qualitatively — must align with survey dimensions where applicable |
| roleHierarchy | Text | Roles from highest to lowest — used to calibrate language register and question depth |
| interviewTypes | Checkboxes | Session formats: executive interview, focus group, stakeholder interview, listening session |
| sessionCount | Number | Total number of sessions planned — informs sampling guidance in the protocol |
| targetRoles | Dynamic table | Role name, seniority level, and expected organizational knowledge — one row per role group |
| constraints | Options | Max protocol length in minutes; recording and transcription guidance flags |

| Output | Format | Description |
|---|---|---|
| Interview Guides | Per-role docs | Complete facilitator script for each role group — warm-up, core probes, follow-ups, close |
| Question Bank | Table | All generated questions mapped to assessment dimension and session type |
| Sampling Guidance | Text | Recommended participant counts and selection criteria per role group |
| Consent & Documentation | Text | Consent language, recording protocol, and note-taking guidance |
| Time Allocation Map | Table | Breakdown of minutes per protocol section for each session type |

| Theory / Framework | Source | Application |
|---|---|---|
| Semi-Structured Interviewing | Kvale & Brinkmann (2009) | Guides question sequencing: open-ended openers, focused probes, hypothetical follow-ups, and member-checking prompts |
| Appreciative Inquiry | Cooperrider & Srivastva (1987) | Informs the affirmative question frame — protocols include strength-based probes alongside deficit-oriented ones |
| Organizational Sense-Making | Weick (1995) | Shapes questions that surface how participants interpret ambiguous or changing situations |
| Phenomenological Interviewing | Moustakas (1994) | Grounds the lived-experience question structure — asking participants to describe specific events rather than general opinions |
| Power-Aware Facilitation | Freire (1970); Schein (2009) | Drives role differentiation — questions for executives differ structurally from those for front-line staff to account for positional dynamics |
| Grounded Theory Sampling | Glaser & Strauss (1967) | Informs the sampling guidance — theoretical saturation logic drives recommended participant counts per group |
The Content Curation Engine selects and packages learning content from the OD system's internal content library to support leadership development programs. Given a set of learning objectives, participant characteristics, and delivery constraints, the agent scores each available content item against the program requirements and returns a curated, relevance-ranked set of materials organized by module.
This agent addresses the content sourcing problem in program design: practitioners typically spend hours searching for materials, evaluating fit, and organizing them into a coherent sequence.
The agent iterates through all items in the content library index and computes a relevance score for each against the specified learning objectives. Scoring considers: objective keyword alignment, content type appropriateness for the delivery modality, participant seniority calibration, and industry context relevance. Items scoring below the minimum relevance threshold are excluded.
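
A sketch of what this scoring pass could look like; the component weights and the keyword heuristic are assumptions for illustration, not the engine's actual tuning:

```typescript
// Hypothetical library item and program spec shapes.
interface ContentItem {
  id: string;
  type: string;        // article, case study, video, ...
  keywords: string[];
  targetLevel: string; // e.g. "Mid Manager"
  industries: string[];
}

interface ProgramSpec {
  objectives: string[];
  contentTypes: string[];
  participantLevel: string;
  industryContext: string;
  minimumRelevanceScore: number;
}

function relevanceScore(item: ContentItem, spec: ProgramSpec): number {
  // Objective keyword alignment: fraction of objectives sharing a keyword with the item.
  const hits = spec.objectives.filter(obj =>
    item.keywords.some(kw => obj.toLowerCase().includes(kw.toLowerCase()))
  ).length;
  const keywordAlignment = hits / Math.max(spec.objectives.length, 1);
  const typeFit = spec.contentTypes.includes(item.type) ? 1 : 0;
  const levelFit = item.targetLevel === spec.participantLevel ? 1 : 0.5;
  const industryFit = item.industries.includes(spec.industryContext) ? 1 : 0.5;
  // Weighted blend; the weights are assumptions.
  return 0.4 * keywordAlignment + 0.2 * typeFit + 0.2 * levelFit + 0.2 * industryFit;
}

// Score every item, drop those below the threshold, rank the rest.
const shortlist = (library: ContentItem[], spec: ProgramSpec) =>
  library
    .map(item => ({ item, score: relevanceScore(item, spec) }))
    .filter(({ score }) => score >= spec.minimumRelevanceScore)
    .sort((a, b) => b.score - a.score);
```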
Using Claude Haiku, the agent selects the highest-scoring items up to the requested module count, writes a brief curatorial rationale for each selected item, and drafts facilitator notes explaining how to introduce each piece of content in the context of the program.
A degraded: true flag in the metadata is expected and is not a failure condition.

| Field | Type | Description |
|---|---|---|
| programName | Text | Name of the development program — used for document headers and LLM context |
| learningObjectives | Textarea | What participants should know, do, or value upon completion — one objective per line |
| industryContext | Select | Sector and audience description — used to weight content relevance for the specific context |
| participantLevel | Select | Emerging Leader, Mid Manager, Senior Leader, or Executive — calibrates content sophistication |
| contentTypes | Checkboxes | Acceptable content formats: article, case study, video, framework, exercise, assessment |
| deliveryMode | Select | In Person, Virtual, or Hybrid — affects content type weighting |
| moduleCount | Number | Number of program modules — the agent selects content to populate each module |
| sessionDuration | Number | Total session length in minutes — used to estimate content volume appropriateness |
| minimumRelevanceScore | Number | 0.0–1.0 threshold below which items are excluded. Default: 0.6 |

| Output | Format | Description |
|---|---|---|
| Curated Content Package | Structured list | Selected items with relevance scores, rationale, and module assignments |
| Facilitator Notes | Per-item text | How to introduce and debrief each content item in the context of the program objectives |
| Coverage Report | Table | Which learning objectives are covered by which content items — identifies gaps |
| Curation Metadata | Summary | Total items evaluated, items selected, average relevance score, coverage completeness |

| Theory / Framework | Source | Application |
|---|---|---|
| Bloom's Taxonomy (Revised) | Anderson & Krathwohl (2001) | Content is matched to the cognitive level required: remember, understand, apply, analyze, evaluate, create |
| Adult Learning Theory (Andragogy) | Knowles (1980) | Content selection favors materials that are experience-based, problem-centered, and immediately applicable |
| 70-20-10 Development Model | McCall, Lombardo & Morrison (1988) | Experiential exercises weighted above passive reading; reflection prompts included to activate social learning |
| Situated Learning Theory | Lave & Wenger (1991) | Case studies and context-specific examples are weighted higher than abstract frameworks when industry context is specified |
| Cognitive Load Theory | Sweller (1988) | Content volume recommendations are calibrated to session duration to avoid overloading participants |
| Transfer of Training | Baldwin & Ford (1988) | Content selection includes application exercises that bridge learning context to job context |
The Quantitative Analysis agent ingests raw survey export data from Supabase Storage, computes construct-level scores and reliability statistics, identifies statistically significant subgroup differences, and produces a narrative interpretation of the findings. It transforms a raw CSV file into a structured, analyst-ready quantitative findings package.
This agent removes the analytical bottleneck that occurs between data collection and synthesis. Practitioners no longer need to run manual SPSS or Excel calculations for standard psychometric outputs.
The agent retrieves the survey export from the specified Supabase Storage path, parses the CSV, validates row counts and header integrity, and checks that all declared construct item IDs exist in the data. Errors at this stage halt execution with a specific, actionable error message.
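
A minimal sketch of this stage with supabase-js, assuming a comma-delimited export whose header row carries the item IDs; the function name and error wording are illustrative:

```typescript
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Download the export from the survey-exports bucket and fail fast with
// actionable messages, mirroring the validation rules described above.
async function loadExport(path: string, declaredItemIds: string[]): Promise<string[][]> {
  const { data, error } = await supabase.storage.from('survey-exports').download(path);
  if (error || !data) {
    throw new Error(`Export not found at ${path}: ${error?.message ?? 'empty file'}`);
  }
  // Naive split for illustration; a real parser must handle quoted fields.
  const rows = (await data.text()).trim().split('\n').map(line => line.split(','));
  if (rows.length < 2) throw new Error('Export has a header row but no respondent rows.');
  const headers = rows[0];
  const missing = declaredItemIds.filter(id => !headers.includes(id));
  if (missing.length > 0) {
    throw new Error(`Declared construct items missing from export: ${missing.join(', ')}`);
  }
  return rows;
}
```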
For each construct, the agent: (1) applies reverse scoring to flagged items, (2) computes respondent-level mean or sum scores per the specified scoring formula, (3) computes construct-level mean, median, standard deviation, min, and max, (4) estimates Cronbach's alpha as a reliability indicator, and (5) computes benchmark deltas where a benchmark set is specified.
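
Steps (1) through (4) reduce to a few lines of arithmetic. A compact sketch for a single construct, assuming rows of numeric responses keyed by item ID on a 1-5 Likert scale; helper and parameter names are illustrative:

```typescript
type Row = Record<string, number>;

const reverse = (v: number, scaleMax = 5) => scaleMax + 1 - v; // (1)

function variance(xs: number[]): number {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1);
}

function scoreConstruct(rows: Row[], itemIds: string[], reversed: Set<string>) {
  // (1) reverse-score flagged items
  const adjusted = rows.map(r => {
    const out: Row = {};
    for (const id of itemIds) out[id] = reversed.has(id) ? reverse(r[id]) : r[id];
    return out;
  });
  // (2) respondent-level mean scores
  const respondentMeans = adjusted.map(
    r => itemIds.reduce((a, id) => a + r[id], 0) / itemIds.length
  );
  // (3) construct-level mean (median, SD, min, max follow the same pattern)
  const constructMean =
    respondentMeans.reduce((a, b) => a + b, 0) / respondentMeans.length;
  // (4) Cronbach's alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(totals))
  const k = itemIds.length;
  const itemVars = itemIds.map(id => variance(adjusted.map(r => r[id])));
  const totals = adjusted.map(r => itemIds.reduce((a, id) => a + r[id], 0));
  const alpha =
    (k / (k - 1)) * (1 - itemVars.reduce((a, v) => a + v, 0) / variance(totals));
  return { respondentMeans, constructMean, alpha };
}
```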
Using Claude Haiku, the agent produces a 2–3 sentence narrative for each construct summarizing the statistical pattern, notable subgroup differences, and any reliability concerns. These narratives feed directly into the Synthesis Report agent.

| Field | Type | Description |
|---|---|---|
| surveyExportStoragePath | Text | Path to the CSV file within the survey-exports Supabase Storage bucket |
| exportFormat | Select | Currently: csv only. XLSX support is planned. |
| constructDefinitions | Dynamic table | One row per construct: name, ID, item IDs (CSV column headers), reverse-scored items |
| demographicFilters | Text | Demographic fields to use for subgroup comparison, e.g. "pay_grade, directorate" |
| significanceThreshold | Number | p-value threshold for reporting subgroup differences. Default: 0.05 |
| organizationalContext | Select | Federal, Corporate, Nonprofit, Healthcare — calibrates narrative interpretation language |
| engagementId | UUID | Links this analysis run to the engagement record in the database |

| Output | Format | Description |
|---|---|---|
| Construct Scores | Table | Mean, median, SD, min, max, Cronbach's alpha, and benchmark delta per construct |
| Subgroup Differences | Table | Statistically significant demographic differences with p-value, effect size, and practical significance flag |
| Response Quality Flags | List | Respondents flagged for straight-lining, all-extreme responding, or low completion |
| Narrative Summaries | Per-construct text | LLM-generated interpretation for each construct — feeds into Synthesis Report |
| Analysis Metadata | Summary | Total respondents, valid respondents, data quality flags, and analysis parameters used |

| Theory / Framework | Source | Application |
|---|---|---|
| Classical Test Theory | Lord & Novick (1968) | Foundation for item scoring, construct mean computation, and reliability estimation via Cronbach's alpha |
| Cronbach's Coefficient Alpha | Cronbach (1951) | Internal consistency reliability estimate — constructs below 0.70 are flagged as potentially unreliable |
| Cohen's Effect Size Conventions | Cohen (1988) | Interprets magnitude of subgroup differences: small (d=0.2), medium (d=0.5), large (d=0.8) |
| Nonparametric Significance Testing | Mann & Whitney (1947); Kruskal & Wallis (1952) | Applied for small-n subgroups where normality assumptions fail — the agent selects tests automatically |
| Straight-Lining Detection | Meade & Craig (2012) | Response quality flagging algorithm identifies respondents who selected the same response for every item |
The Facilitation Guide Generator produces complete, practitioner-ready facilitation guides for leadership development and organizational learning sessions. Given the program objectives, content modules, session logistics, and facilitator experience level, the agent generates a structured guide covering: session timeline, detailed activity instructions, facilitator scripts, debrief questions, contingency plans, and modality-specific notes.
This agent resolves the last-mile problem in program delivery: even when content is designed and approved, practitioners without deep facilitation experience often lack the scaffolding to run high-stakes sessions confidently.
The agent computes a minute-by-minute session timeline from the module time allocations, inserting standard buffer time, breaks, and orientation blocks. The timeline is validated for mathematical completeness before LLM generation begins.
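
A sketch of the timeline computation, assuming a fixed orientation block and a short buffer between modules; those constants are illustrative, not the agent's actual rules:

```typescript
interface Module { title: string; minutes: number; }
interface TimelineEntry { start: number; end: number; title: string; }

// Assumed constants for illustration.
const ORIENTATION_MIN = 10;
const BUFFER_MIN = 5;
const CLOSE_MIN = 10;

function buildTimeline(modules: Module[], sessionMinutes: number): TimelineEntry[] {
  const entries: TimelineEntry[] = [];
  let t = 0;
  const push = (title: string, minutes: number) => {
    entries.push({ start: t, end: t + minutes, title });
    t += minutes;
  };
  push('Orientation', ORIENTATION_MIN);
  modules.forEach((m, i) => {
    push(m.title, m.minutes);
    if (i < modules.length - 1) push('Buffer / break', BUFFER_MIN);
  });
  push('Close & commitments', CLOSE_MIN);
  // Mathematical completeness check before any LLM generation begins.
  if (t > sessionMinutes) {
    throw new Error(`Timeline needs ${t} min but the session is ${sessionMinutes} min.`);
  }
  return entries;
}
```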
Using Claude Sonnet with a 32,000-token output budget (streaming required), the agent generates: overview with session goals and success criteria, preparation checklist, materials list, full activity detail blocks (purpose, setup, process steps, debrief questions, contingency options), facilitator tips, and appendices.
A deterministic validator checks the generated guide for structural completeness: all timeline entries must have corresponding activity detail blocks, all learning objectives must be covered, and all debrief question sets must meet the minimum count. If validation fails, a correction prompt is issued to the LLM before the guide is finalized.
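
One way to structure that correction pass is a bounded validate-then-retry loop; the issue type, callbacks, and retry budget below are hypothetical stand-ins for the agent's internals:

```typescript
interface ValidationIssue { kind: string; detail: string; }

async function finalizeGuide(
  draft: string,
  validate: (guide: string) => ValidationIssue[],
  correct: (guide: string, issues: ValidationIssue[]) => Promise<string>, // correction prompt to the LLM
  maxAttempts = 2, // assumed retry budget
): Promise<string> {
  let guide = draft;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const issues = validate(guide);
    if (issues.length === 0) return guide;
    guide = await correct(guide, issues);
  }
  return guide; // any remaining issues are recorded in the validation report
}
```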
The finalized guide is exported as a KH-branded DOCX and a Markdown file, both uploaded to the guides/ Supabase Storage bucket.

| Field | Type | Description |
|---|---|---|
| programName | Text | Name of the development program — appears in guide header and all exported documents |
| programObjectives | Textarea | Overall program learning objectives — one per line. Drive debrief question generation. |
| sessionDuration | Number | Total session length in minutes (30–480). Determines timeline architecture. |
| participantCount | Number | Number of participants — affects activity instructions, room setup notes, and group sizing guidance |
| deliveryMode | Select | In Person, Virtual, or Hybrid — generates modality-specific facilitation notes for each activity |
| facilitatorExperience | Select | Novice, Intermediate, or Expert — calibrates script depth and contingency guidance |
| organizationalContext | Select | Federal, Corporate, Nonprofit, Healthcare — calibrates language register and example selection |
| contentModules | Dynamic list | One card per module: title, learning objectives, time allocation, activities, summary, takeaways, discussion prompts, application exercise |

| Output | Format | Description |
|---|---|---|
| Facilitator Guide (DOCX) | KH-branded file | Complete practitioner guide with all sections — uploaded to guides/ bucket |
| Facilitator Guide (Markdown) | Text file | Plain-text version for digital sharing or LMS upload — uploaded to guides/ bucket |
| Session Timeline | Structured data | Minute-by-minute schedule with activity titles and objective mappings |
| Activity→Objective Map | Structured data | Feeds directly into the Evaluation Package Builder as chained input |
| Validation Report | Metadata | Records whether structural validation passed and any issues identified and corrected |

| Theory / Framework | Source | Application |
|---|---|---|
| Kolb's Experiential Learning Cycle | Kolb (1984) | Each activity block follows Concrete Experience → Reflective Observation → Abstract Conceptualization → Active Experimentation sequence |
| Transformative Learning Theory | Mezirow (1991) | Debrief questions are designed to surface and challenge assumptions — the "disorienting dilemma" is deliberately built into higher-stakes activities |
| Psychological Safety | Edmondson (1999) | Facilitator scripts include explicit psychological safety framing at session open and after high-disclosure activities |
| Action Learning | Revans (1982) | Application exercise at end of each module operationalizes Revans' principle: learning requires real problems and reflective questioning |
| Scaffolded Instruction | Wood, Bruner & Ross (1976) | The facilitatorExperience parameter controls scaffolding depth — novice guides include more prescriptive scripts |
| Kirkpatrick Level 3 (Behavior) | Kirkpatrick (1959) | Discussion prompts and application exercises are forward-facing — they ask participants to commit to specific behavioral changes |
The Synthesis Report agent integrates quantitative survey findings and qualitative interview findings into a unified organizational diagnostic report. It identifies convergent patterns (where both data sources agree), complementary patterns (where each source adds distinct information), and divergent patterns (where the sources are in tension), then generates validated recommendations with urgency, impact, and feasibility ratings.
This agent produces the primary client deliverable: the integrated OD assessment report. It replaces the synthesis step that practitioners typically spend the most time on — manually comparing two data sets, resolving discrepancies, and drafting a coherent narrative that holds both sources of evidence simultaneously.
Both the quantitative findings (from Agent 04) and qualitative findings must have approvalStatus: "approved" before synthesis begins. This gate prevents synthesis on unapproved or potentially flawed upstream data.
The agent builds a triangulation map: for each assessed dimension, it classifies the relationship between quantitative and qualitative evidence as convergent, complementary, or divergent. Divergent findings trigger an interpretive note generation step.
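
A sketch of the classification rule, assuming each dimension's evidence has first been reduced to a coarse directional signal per source; the signal vocabulary is an assumption for illustration:

```typescript
type Signal = 'positive' | 'negative' | 'mixed' | 'absent';
type Pattern = 'convergent' | 'complementary' | 'divergent';

function classifyDimension(quant: Signal, qual: Signal): Pattern {
  // One source silent: the other adds distinct information.
  if (quant === 'absent' || qual === 'absent') return 'complementary';
  if (quant === qual) return 'convergent';
  // Directly opposed signals are in tension and trigger an interpretive note.
  const opposed =
    (quant === 'positive' && qual === 'negative') ||
    (quant === 'negative' && qual === 'positive');
  return opposed ? 'divergent' : 'complementary';
}
```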
Using Claude Sonnet, the agent generates three LLM outputs in sequence: (1) the integrated report narrative, (2) recommendations with ratings, and (3) the executive summary package. Separating these calls ensures each section receives adequate token budget.
A claims validator checks that all recommendations are grounded in stated findings and that all divergent findings have been addressed.

| Field | Type | Description |
|---|---|---|
| clientName / industry | Text | Client identity and sector — used in report header and LLM context framing |
| assessmentScope | Select | Scope description for the report methodology section |
| clientPriorities | Textarea | Strategic priorities the client has communicated — recommendations are ranked partly on alignment with these |
| reportFormat | Select | "Comprehensive" (full sections) or "Executive" (condensed) — controls report depth |
| quantitativeFindings | Structured data | Approved output from Agent 04 — contains construct scores and narratives by dimension |
| qualitativeFindings | Structured data | Approved qualitative data — contains dimension-level narrative summaries from interview analysis |
| engagementId | UUID | Links the synthesis to the engagement record |

| Output | Format | Description |
|---|---|---|
| Triangulation Map | Structured data | Convergent, complementary, and divergent finding classifications per dimension |
| Findings by Dimension | Report sections | Integrated narrative for each assessed dimension — with caveats where data quality requires them |
| Cross-Cutting Themes | Report sections | Patterns that appeared consistently across multiple dimensions |
| Recommendations | Rated table | Actionable recommendations with urgency, impact, and feasibility ratings — plus rationale and implementation considerations |
| Executive Summary | Doc section | Priority actions and leadership implications — designed for C-suite or SES-level audience |
| Report Preview (UI) | Web view | Operator preview accessible from the Reports tab |
| Report DOCX | Download | Fully formatted Word document — download from the Reports tab |

| Theory / Framework | Source | Application |
|---|---|---|
| Mixed Methods Research Design | Creswell & Plano Clark (2011) | Convergent parallel design: quantitative and qualitative strands collected independently, merged at interpretation |
| Triangulation | Denzin (1978) | Data triangulation, methodological triangulation, and investigator triangulation applied — agent explicitly codes convergence, complementarity, and divergence |
| Organizational Diagnosis | Nadler & Tushman (1980) | The Congruence Model informs recommendation framing: findings are interpreted as misalignments between inputs, strategy, work, people, and structure |
| Force Field Analysis | Lewin (1951) | Recommendations are structured as driving forces to amplify and restraining forces to reduce |
| Evidence-Based OD | Rousseau (2006) | The claims validator enforces that every recommendation is explicitly grounded in stated findings |
The Evaluation Package Builder generates a complete set of evaluation instruments for leadership development programs. Anchored to the program's learning objectives and facilitated activities, the agent produces pre-session baseline instruments, post-session learning gain instruments, session reaction surveys, and facilitator observation checklists — all aligned to the specific objectives of the program rather than generic course evaluation templates.
This agent solves a persistent gap in program evaluation practice: most organizations deploy generic "smile sheets" that measure satisfaction rather than learning. The agent generates instruments that directly trace back to program objectives, enabling practitioners to demonstrate learning gain and support Kirkpatrick Level 2 and Level 3 evaluation claims.
The agent validates that the facilitation guide input has approvalStatus: "approved" and builds an objective map from the activityObjectiveMap structure — linking each program objective to the activities designed to develop it. Every instrument item is traceable to a specific objective.
Using Claude Sonnet, the agent generates four instruments simultaneously: (1) pre-session baseline, (2) post-session outcome instrument, (3) session reaction survey, and (4) facilitator observation checklist.
A coverage validator checks that every program objective has at least one item in both the pre and post instruments. A structure validator checks item formatting, response type consistency, and instruction completeness.
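
The coverage rule reduces to set membership. A minimal sketch, assuming each generated item records the objective it traces to; the field names are hypothetical:

```typescript
interface EvalItem { objectiveId: string; text: string; }

// Returns objectives lacking at least one item in both instruments;
// an empty array means coverage validation passes.
function uncoveredObjectives(objectives: string[], pre: EvalItem[], post: EvalItem[]): string[] {
  const ids = (items: EvalItem[]) => new Set(items.map(i => i.objectiveId));
  const preIds = ids(pre);
  const postIds = ids(post);
  return objectives.filter(o => !preIds.has(o) || !postIds.has(o));
}
```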
The evaluation package is exported as a KH-branded DOCX (all four instruments in one document) and an XLSX file (all instruments as separate tabs), both uploaded to the reports/ Supabase Storage bucket.

| Field | Type | Description |
|---|---|---|
| facilitationGuide | Structured data | Approved output from Agent 05 — provides program name, objectives, delivery mode, participant count, and activityObjectiveMap |
| evaluationUse | Checkboxes | Intended evaluation purposes: Learning Gain, Participant Reaction, Application Intent |
| organizationalContext | Select | Federal, Corporate, Nonprofit, Healthcare — calibrates language register and item framing |
| engagementId | UUID | Links the evaluation package to the engagement record |

| Output | Format | Description |
|---|---|---|
| Pre-Session Instrument | Survey doc | Baseline knowledge and attitude items — administered before the session begins |
| Post-Session Instrument | Survey doc | Items parallel to the pre-session instrument — learning gain is computed as the difference between pre- and post-session responses |
| Session Reaction Survey | Survey doc | Participant experience items: relevance, facilitator effectiveness, environment, and overall value |
| Facilitator Observation Checklist | Checklist doc | Behavioral indicators the facilitator or observer monitors during delivery |
| Evaluation Package DOCX | KH-branded file | All four instruments in a single formatted document — uploaded to reports/ bucket |
| Evaluation Package XLSX | Spreadsheet | All four instruments as separate tabs — ready for data collection and analysis |
| Strategy Summary | Text | Narrative describing the evaluation approach, instrument rationale, and scoring guidance |

| Theory / Framework | Source | Application |
|---|---|---|
| Kirkpatrick's Four Levels | Kirkpatrick (1959); Kirkpatrick & Kirkpatrick (2016) | Package measures Level 1 (Reaction), Level 2 (Learning), and lays groundwork for Level 3 (Behavior) |
| Pre-Post Quasi-Experimental Design | Campbell & Stanley (1963) | Pre and post instruments are structurally parallel to enable learning gain calculation. Design limitation (no control group) noted in strategy summary. |
| Transfer of Training | Baldwin & Ford (1988) | Application intent items operationalize the motivation-to-transfer construct: "I intend to use X within Y weeks" format |
| Objective-Referenced Assessment | Popham (1978) | Every item in the pre/post instruments maps directly to a stated learning objective |
| Brinkerhoff's Success Case Method | Brinkerhoff (2003) | The facilitator observation checklist surfaces best-case and worst-case behavioral indicators during delivery |