Embed eval in your product.

One JS tag and one element. Capture human or AI evaluations of any output your product generates - directly inside your app, with your design system intact.

The three pillars

PILLAR 1

Who is rating

Every eval carries the Eval Army rater's identity, certification, and specialties - so calibration drift and inter-rater reliability are measurable.

PILLAR 2

How to integrate

Single-tag JS SDK. iframe + postMessage. Renders inline in your SaaS or AI product. Same schema as the standalone form.

PILLAR 3

The form

Schema-driven, progressive 3-tier disclosure, six market templates. Ten seconds for a quick read, three minutes for a full audit.


1. The 30-second integration

Drop these two snippets into any page in your product. That's it.

Step 1 - load the SDK

<!-- somewhere in <head> or before </body> -->
<script src="https://eval.qa/embed.js"></script>

Step 2 - render the form

<div id="eval-here"></div>

<script>
  EvalQA.embed({
    container: "#eval-here",
    template: "saas",           // or foundation | agent | rag | robotics | enduser | universal
    taskId: "ticket-9012",      // any stable identifier
    prompt: "User asked: …",    // the input given to your AI
    reference: "Expected: …",   // optional ground-truth answer
    raterId: "rater@example.com",   // optional Eval Army rater_id (placeholder address)
    onSave: (payload) => {
      console.log("saved", payload.eval_id);
      // payload.payload contains the full eval record
    }
  });
</script>

2. Live demo

Tune the template and context, click Embed it, and the form renders in place using the exact code from Steps 1 and 2. Submitting the form fires onSave, and a green confirmation panel appears.

3. API reference

EvalQA.embed(options)

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| container | string or Element | Yes | CSS selector or DOM element to render the iframe into. |
| template | string | No | One of: foundation, agent, rag, robotics, saas, enduser, universal. Default universal. |
| taskId | string | No | Stable task identifier so multiple raters can be matched on the same task. |
| prompt | string | No | The input the AI received. Pre-fills the task.prompt field. |
| reference | string | No | Gold / reference answer. Strongly recommended for LLM-judge mode. |
| systemUnderTest | string | No | e.g. "Acme Copilot v3.2". Pre-fills subject.system_under_test. |
| raterId | string | No | Eval Army rater_id (email). If known, the form pre-loads the rater profile. |
| evalId + mode: "review" | string | No | Hybrid mode - load an existing LLM-judge eval and let a human confirm/override. |
| onSave | function | No | Called with {eval_id, payload} after a successful save. |
| onClose | function | No | Called when destroy() is invoked. |
| height | number | No | Initial iframe min-height in pixels. Default 640. Auto-resizes as content grows. |
| baseUrl | string | No | Override the EvalQA base URL (default https://eval.qa/demo/aif/). |

Returns

An object with { iframe, destroy(), reload() }.
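
The returned handle matters most in single-page apps, where a form has to be removed when the user navigates away. Here is a minimal sketch, assuming the handle behaves as described above. `mountEval` is a hypothetical helper, not part of the SDK, and the SDK object is passed in as an argument so the function can be exercised without a browser.

```javascript
// Hypothetical helper: mounts a form and returns a cleanup function.
// `sdk` is expected to be the global EvalQA object loaded by embed.js.
function mountEval(sdk, options) {
  let handle = sdk.embed(options);
  return function unmount() {
    if (handle) {
      handle.destroy(); // per the options list, this is also when onClose is called
      handle = null;    // make repeated calls harmless
    }
  };
}

// In a browser this might be used as:
//   const unmount = mountEval(EvalQA, { container: "#eval-here", template: "saas" });
//   ...later, on route change...
//   unmount();
```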

EvalQA.postEval(payload, opts?)

For LLM-judge pipelines that already have a JSON payload conforming to the schema - skip the UI entirely. Returns a Promise resolving to {ok, eval_id}.

const verdict = await myLLMJudge.grade(prompt, response);
const payload = {
  schema_version: "1.0.0",
  rater: { type: "llm_judge", model: "claude-sonnet-4-6", n_samples: 5 },
  subject: { system_under_test: "Acme Copilot", modality_tags: ["chat"] },
  task:    { task_id: "ticket-9012", modality: "chat", prompt: prompt, reference: refAnswer },
  universal: {
    overall_quality:       { score: verdict.score, rationale: verdict.reason },
    instruction_following: { score: 4 },
    faithfulness:          { score: 5 },
    helpfulness:           { score: 5 },
    safety_overall:        { score: 5 }
  }
};
const r = await EvalQA.postEval(payload);
console.log(r.eval_id);

postMessage protocol

The iframe communicates with the parent via window.postMessage. All messages have source: "evalqa".

| Message type | Direction | Payload |
| --- | --- | --- |
| eval:saved | iframe → parent | { eval_id, payload } |
| eval:resize | iframe → parent | { height } - auto-handled by embed.js |
| eval:close | iframe → parent | fires when an embedded "Submit another" or destroy() runs |
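
embed.js already listens for these messages, so most integrations never touch them directly. For teams that host the iframe themselves, a parent-side listener might look like the sketch below. It assumes each message is a flat object such as `{ source: "evalqa", type: "eval:saved", eval_id, payload }`. The `type` key, the flat layout, and the `routeEvalQAMessage` helper are assumptions for illustration; only the `source: "evalqa"` tag and the message names come from the protocol above.

```javascript
// Origin of the default baseUrl (https://eval.qa/demo/aif/); adjust if baseUrl is overridden.
const EVALQA_ORIGIN = "https://eval.qa";

// Hypothetical router: returns true when a message was accepted and dispatched.
function routeEvalQAMessage(event, handlers) {
  const msg = event.data;
  // Drop anything that did not come from the EvalQA origin.
  if (event.origin !== EVALQA_ORIGIN) return false;
  // Drop anything without the documented source tag.
  if (!msg || typeof msg !== "object" || msg.source !== "evalqa") return false;
  // Only dispatch to handlers the caller registered explicitly.
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) return false;
  handlers[msg.type](msg);
  return true;
}

// Browser-only wiring; skipped when no window exists (for example under Node).
if (typeof window !== "undefined") {
  window.addEventListener("message", (event) =>
    routeEvalQAMessage(event, {
      "eval:saved": (m) => console.log("saved", m.eval_id),
      "eval:close": () => console.log("form closed"),
    })
  );
}
```

Checking `event.origin` in addition to the `source` tag is a deliberate choice: any page can post a message that claims `source: "evalqa"`, but it cannot forge the origin.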

4. Recipes

A. SaaS app reviewing AI-generated content

You generate an email draft. Show the eval form next to it. When the reviewer submits, save the eval_id back to your DB.

EvalQA.embed({
  container: "#review-pane",
  template:  "saas",
  taskId:    "draft-" + draftId,
  prompt:    user.lastMessage,
  reference: "",  // no gold; production review
  raterId:   currentUser.email,
  onSave: ({ eval_id }) => {
    fetch("/api/drafts/" + draftId + "/eval", {
      method: "POST",
      headers: { "Content-Type": "application/json" },  // the body is JSON
      body: JSON.stringify({ eval_id })
    });
  }
});

B. AI agent CI gate

After each agent run in CI, an LLM judge POSTs an eval. Block deploy if safety.is_violating_any.

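const payload = buildEvalPayload(agentRun);
const r = await EvalQA.postEval(payload);
// postEval resolves to {ok, eval_id}; fail on a failed save or a safety flag in the payload.
if (!r.ok || payload.safety?.is_violating_any) process.exit(1);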
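
Recipe B calls buildEvalPayload without defining it. One possible shape is sketched below. It assumes a hypothetical `agentRun` object (with `taskId`, `prompt`, `system`, `judgeModel`, and `verdict` fields) and assumes the safety flag lives at `payload.safety.is_violating_any`, as the recipe text suggests. The other payload field names follow the postEval example; the real schema may require more.

```javascript
// Hypothetical helper for Recipe B. The agentRun fields used here are
// illustrative and are not defined by the EvalQA SDK.
function buildEvalPayload(agentRun) {
  const { verdict } = agentRun;
  const violations = Array.isArray(verdict.violations) ? verdict.violations : [];
  return {
    schema_version: "1.0.0",
    rater: { type: "llm_judge", model: agentRun.judgeModel },
    subject: { system_under_test: agentRun.system, modality_tags: ["chat"] },
    task: { task_id: agentRun.taskId, modality: "chat", prompt: agentRun.prompt },
    universal: {
      overall_quality: { score: verdict.score, rationale: verdict.reason },
      safety_overall: { score: verdict.safetyScore },
    },
    // Assumed location of the flag the CI gate reads.
    safety: { is_violating_any: violations.length > 0 },
  };
}

// Decides whether CI should block, independent of the network call.
function shouldBlockDeploy(payload) {
  return Boolean(payload.safety && payload.safety.is_violating_any);
}
```

Keeping the gate decision in a pure function makes it easy to unit-test without posting anything.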

C. Hybrid review queue

An LLM judge writes evals overnight. Each morning, a human opens ?eval_id=...&mode=review and confirms or overrides each one. Disagreement deltas appear on the dashboard.

EvalQA.embed({
  container: "#review",
  evalId:    pendingEvalId,
  mode:      "review",
  raterId:   currentUser.email
});
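
To work through a whole backlog rather than a single eval, the same call can be chained so that each save loads the next pending item. This is a minimal sketch under stated assumptions: `runReviewQueue` is a hypothetical helper, the SDK object is passed in as `sdk`, and it assumes that calling destroy() from inside onSave is safe, which the docs above do not confirm.

```javascript
// Hypothetical helper: shows pending evals one at a time in the same container.
// `onDone` (optional) runs once every eval in the list has been reviewed.
function runReviewQueue(sdk, evalIds, { container, raterId, onDone }) {
  const pending = [...evalIds]; // copy so the caller's array is not mutated
  const results = [];
  let handle = null;

  function next() {
    if (handle) handle.destroy(); // remove the previous form
    handle = null;
    if (pending.length === 0) {
      if (onDone) onDone(results);
      return;
    }
    const evalId = pending.shift();
    handle = sdk.embed({
      container,
      evalId,
      mode: "review",
      raterId,
      onSave: ({ eval_id }) => {
        // Record which LLM-judge eval was reviewed and the id of the human save.
        results.push({ reviewed: evalId, saved: eval_id });
        next();
      },
    });
  }

  next();
}
```

In a browser this might be started with `runReviewQueue(EvalQA, pendingIds, { container: "#review", raterId: currentUser.email })`, where `pendingIds` comes from your own backend.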

D. End-user thumbs feedback inside your chat UI

Tiny iframe shown beside each AI response. Saves with rater type end_user.

EvalQA.embed({
  container: "#feedback-" + messageId,
  template:  "enduser",
  taskId:    "msg-" + messageId,
  prompt:    msg.userText,
  height:    320
});

5. Security & data handling
