Embed eval in your product.

One JS tag and one element. Capture human or AI evaluations of any output your product generates - directly inside your app, with your design system intact.

The three pillars

PILLAR 1

Who is rating

Every eval carries the Eval Army rater's identity, certification, and specialties - so calibration drift and inter-rater reliability are measurable.

PILLAR 2

How to integrate

Single-tag JS SDK. iframe + postMessage. Renders inline in your SaaS or AI product. Same schema as the standalone form.

PILLAR 3

The form

Schema-driven, progressive 3-tier disclosure, six market templates. Ten seconds for a quick read, three minutes for a full audit.

1. The 30-second integration

Drop these two snippets into any page in your product. That's it.

Step 1 - load the SDK

<!-- somewhere in <head> or before </body> -->
<script src="https://eval.qa/embed.js"></script>

Step 2 - render the form

<div id="eval-here"></div>

<script>
  EvalQA.embed({
    container: "#eval-here",
    template: "saas",           // or foundation | agent | rag | robotics | enduser | universal
    taskId: "ticket-9012",      // any stable identifier
    prompt: "User asked: …",    // the input given to your AI
    reference: "Expected: …",   // optional ground-truth answer
    raterId: "rater@example.com",   // optional Eval Army rater_id (placeholder email)
    onSave: (payload) => {
      console.log("saved", payload.eval_id);
      // payload.payload contains the full eval record
    }
  });
</script>

2. Live demo

Adjust the template and context fields, click Embed it, and the form renders below using the same code as steps 1 and 2. Submitting the form fires onSave, and a green confirmation panel appears.

Demo inputs: Template, Task ID, System under test, Rater id (optional), Prompt (the AI's input), Reference answer (optional).

3. API reference

`EvalQA.embed(options)`

Option	Type	Required	Description
`container`	string or Element	Yes	CSS selector or DOM element to render the iframe into.
`template`	string	No	One of: `foundation`, `agent`, `rag`, `robotics`, `saas`, `enduser`, `universal`. Default `universal`.
`taskId`	string	No	Stable task identifier so multiple raters can be matched on the same task.
`prompt`	string	No	The input the AI received. Pre-fills the task.prompt field.
`reference`	string	No	Gold / reference answer. Strongly recommended for LLM-judge mode.
`systemUnderTest`	string	No	e.g. `"Acme Copilot v3.2"`. Pre-fills subject.system_under_test.
`raterId`	string	No	Eval Army `rater_id` (email). If known, the form pre-loads the rater profile.
`evalId` + `mode:"review"`	string	No	Hybrid mode - load an existing LLM-judge eval and let a human confirm/override.
`onSave`	function	No	Called with `{eval_id, payload}` after a successful save.
`onClose`	function	No	Called when `destroy()` is invoked.
`height`	number	No	Initial iframe min-height in pixels. Default 640. Auto-resizes as content grows.
`baseUrl`	string	No	Override the EvalQA base URL (default `https://eval.qa/demo/aif/`).
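
Only `container` is required and `template` takes a fixed set of values, so a small pre-flight check can catch typos before the iframe is created. The sketch below is illustrative only: `checkEmbedOptions` is a hypothetical helper, not part of embed.js, and it covers just the constraints listed in the table above. The rule that review mode needs an `evalId` is inferred from the `evalId` + `mode:"review"` row.

```javascript
// Hypothetical helper (not part of embed.js): checks options against the table above.
const TEMPLATES = ["foundation", "agent", "rag", "robotics", "saas", "enduser", "universal"];

function checkEmbedOptions(options) {
  const problems = [];
  const opts = options || {};
  if (!opts.container) {
    problems.push("container is required");
  }
  if (opts.template !== undefined && !TEMPLATES.includes(opts.template)) {
    problems.push("unknown template: " + opts.template);
  }
  // Inferred rule: review mode loads an existing eval, so it needs an id.
  if (opts.mode === "review" && !opts.evalId) {
    problems.push('mode "review" needs an evalId');
  }
  if (opts.height !== undefined && typeof opts.height !== "number") {
    problems.push("height must be a number of pixels");
  }
  return problems; // an empty array means the options look usable
}
```

Calling it just before `EvalQA.embed(options)` and logging any returned problems keeps a misspelled template name from silently falling back to a default.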

Returns

An object with { iframe, destroy(), reload() }.
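
In a single-page app, the returned handle is what lets you tear down one form before mounting the next. A minimal sketch, assuming `destroy()` removes the iframe as its name suggests; `createEvalSlot` is a hypothetical wrapper, and the SDK object is passed in as a parameter so the logic can be exercised without a browser.

```javascript
// Hypothetical wrapper: keeps at most one embedded form alive per container.
function createEvalSlot(sdk, container) {
  let handle = null;
  return {
    // Replace whatever is showing with a form for a new task.
    show(options) {
      if (handle) handle.destroy();
      handle = sdk.embed({ ...options, container });
      return handle;
    },
    // Remove the current form, if there is one.
    clear() {
      if (handle) handle.destroy();
      handle = null;
    }
  };
}

// In the browser (EvalQA is the global loaded by embed.js):
//   const slot = createEvalSlot(EvalQA, "#eval-here");
//   slot.show({ template: "saas", taskId: "ticket-9012" });
```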

`EvalQA.postEval(payload, opts?)`

For LLM-judge pipelines that already have a JSON payload conforming to the schema - skip the UI entirely. Returns a Promise resolving to {ok, eval_id}.

// prompt, response, and refAnswer come from your own pipeline
const verdict = await myLLMJudge.grade(prompt, response);
const payload = {
  schema_version: "1.0.0",
  rater: { type: "llm_judge", model: "claude-sonnet-4-6", n_samples: 5 },
  subject: { system_under_test: "Acme Copilot", modality_tags: ["chat"] },
  task:    { task_id: "ticket-9012", modality: "chat", prompt: prompt, reference: refAnswer },
  universal: {
    overall_quality:       { score: verdict.score, rationale: verdict.reason },
    instruction_following: { score: 4 },
    faithfulness:          { score: 5 },
    helpfulness:           { score: 5 },
    safety_overall:        { score: 5 }
  }
};
const r = await EvalQA.postEval(payload);
console.log(r.eval_id);
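
The resolved value carries an `ok` flag, so a pipeline should check it rather than assume the save succeeded. A minimal sketch: `evalIdOrThrow` and `postChecked` are hypothetical helper names, and since nothing here documents what a failed response contains beyond `ok`, the error only reports that the save was not confirmed.

```javascript
// Hypothetical helpers; only `ok` and `eval_id` come from the docs above.
function evalIdOrThrow(result) {
  if (!result || result.ok !== true) {
    throw new Error("EvalQA.postEval did not report ok");
  }
  return result.eval_id;
}

// `sdk` is the EvalQA global in a browser; it is a parameter so it can be stubbed.
async function postChecked(sdk, payload) {
  const result = await sdk.postEval(payload);
  return evalIdOrThrow(result);
}
```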

postMessage protocol

The iframe communicates with the parent via window.postMessage. All messages have source: "evalqa".

Message type	Direction	Payload
`eval:saved`	iframe → parent	`{ eval_id, payload }`
`eval:resize`	iframe → parent	`{ height }` - auto-handled by embed.js
`eval:close`	iframe → parent	sent when the rater clicks "Submit another" in the embedded form or when destroy() runs
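
If you listen for these messages yourself instead of relying on `onSave`, the handler might look like the sketch below. It assumes the message type is carried in a `type` field and that payload fields sit alongside it on `event.data`; the table does not confirm either, so inspect a real message before relying on this. `routeEvalMessage` and the `handlers` object are hypothetical names. The origin comparison is standard practice for any `message` listener.

```javascript
// Hypothetical router for messages from the EvalQA iframe.
// Assumes event.data looks like { source: "evalqa", type: "eval:saved", ... }.
function routeEvalMessage(event, expectedOrigin, handlers) {
  const data = event.data;
  if (event.origin !== expectedOrigin) return false;    // sent by some other site
  if (!data || data.source !== "evalqa") return false;  // not an EvalQA message
  // Own-property check so inherited names like "toString" are never treated as handlers.
  if (!Object.prototype.hasOwnProperty.call(handlers, data.type)) return false;
  handlers[data.type](data);
  return true;
}

// In the browser:
//   window.addEventListener("message", (e) =>
//     routeEvalMessage(e, "https://eval.qa", {
//       "eval:saved": (d) => console.log("saved", d.eval_id)
//     }));
```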

4. Recipes

A. SaaS app reviewing AI-generated content

You generate an email draft. Show the eval form next to it. When the reviewer submits, save the eval_id back to your DB.

EvalQA.embed({
  container: "#review-pane",
  template:  "saas",
  taskId:    "draft-" + draftId,
  prompt:    user.lastMessage,
  reference: "",  // no gold; production review
  raterId:   currentUser.email,
  onSave: ({ eval_id }) => {
    fetch("/api/drafts/" + draftId + "/eval", {
      method: "POST",
      headers: { "Content-Type": "application/json" },  // body is JSON, not plain text
      body: JSON.stringify({ eval_id })
    });
  }
});

B. AI agent CI gate

After each agent run in CI, an LLM judge POSTs an eval. Block the deploy if safety.is_violating_any is true.

const r = await EvalQA.postEval(buildEvalPayload(agentRun));
if (r.ok && r.violating) process.exit(1);
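
A gate like this is easier to test when the decision is a pure function. The sketch below fails closed when the save does not report `ok`, and reads the flag from the payload that was posted rather than from the response, since the documented return shape is `{ok, eval_id}`. It assumes `safety.is_violating_any` sits at the top level of the payload, which may not match the schema; `shouldBlockDeploy` is a hypothetical name.

```javascript
// Hypothetical gate: returns true when the deploy should be blocked.
// Assumes the judge's flag lives at payload.safety.is_violating_any.
function shouldBlockDeploy(result, payload) {
  const saved = Boolean(result && result.ok);
  const violating = Boolean(payload && payload.safety && payload.safety.is_violating_any);
  return !saved || violating;
}

// In the CI script:
//   const payload = buildEvalPayload(agentRun);
//   const r = await EvalQA.postEval(payload);
//   if (shouldBlockDeploy(r, payload)) process.exit(1);
```

Failing closed means an outage of the eval endpoint also blocks deploys; whether that trade-off is acceptable depends on how strict the gate needs to be.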

C. Hybrid review queue

LLM judge writes evals all night. Each morning, a human opens ?eval_id=...&mode=review and confirms or overrides. Disagreement deltas land on the dashboard.

EvalQA.embed({
  container: "#review",
  evalId:    pendingEvalId,
  mode:      "review",
  raterId:   currentUser.email
});
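
For the standalone route, the review link can be built with the standard URL API. A sketch assuming the query parameters attach directly to the default base URL from the API reference; `reviewUrl` is a hypothetical helper and `ev_123` is a made-up id.

```javascript
// Hypothetical helper: builds a ?eval_id=...&mode=review link.
// Assumes the parameters attach directly to the form's base URL.
function reviewUrl(evalId, baseUrl = "https://eval.qa/demo/aif/") {
  const url = new URL(baseUrl);
  url.searchParams.set("eval_id", evalId); // searchParams handles the escaping
  url.searchParams.set("mode", "review");
  return url.toString();
}

// reviewUrl("ev_123") returns "https://eval.qa/demo/aif/?eval_id=ev_123&mode=review"
```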

D. End-user thumbs feedback inside your chat UI

Tiny iframe shown beside each AI response. Saves with rater type end_user.

EvalQA.embed({
  container: "#feedback-" + messageId,
  template:  "enduser",
  taskId:    "msg-" + messageId,
  prompt:    msg.userText,
  height:    320
});

5. Security & data handling

Cross-origin safe. The form runs in an iframe - your app's DOM and cookies stay isolated.
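postMessage source check. Always verify e.data.source === "evalqa" before acting on messages; embed.js does this for you. In your own listeners, also compare e.origin with the host serving the form, because any window can send a message with that source value.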
No PII required. The form never asks for the end user's data - only for the AI output being evaluated. Rater identity is opt-in via raterId.
Data residency. Self-hostable. Drop demo/aif/ into your own PHP host; point baseUrl at it.
Schema. The full contract is published at eval-form.schema.json. Validate before posting to save_eval.php.
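
Full validation belongs to a JSON Schema validator run against eval-form.schema.json. As a cheap first check before that, a pipeline can confirm the top-level sections are present. The key list below is an assumption copied from the `postEval` example in section 3, not from the schema itself, and `missingSections` is a hypothetical helper.

```javascript
// Assumed from the postEval example, NOT from eval-form.schema.json.
const EXPECTED_SECTIONS = ["schema_version", "rater", "subject", "task"];

// Returns the names of expected top-level sections the payload lacks.
function missingSections(payload) {
  if (payload === null || typeof payload !== "object") {
    return EXPECTED_SECTIONS.slice();
  }
  return EXPECTED_SECTIONS.filter((key) => payload[key] === undefined);
}
```

An empty result only means the payload has the expected outline; it says nothing about whether the field values satisfy the schema.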
