One JS tag and one element. Capture human or AI evaluations of any output your product generates - directly inside your app, with your design system intact.
Every eval carries the Eval Army rater's identity, certification, and specialties - so calibration drift and inter-rater reliability are measurable.
Set up rater profile →
Single-tag JS SDK. iframe + postMessage. Renders inline in your SaaS or AI product. Same schema as the standalone form.
Embed snippet ↓
Schema-driven, progressive 3-tier disclosure, six market templates. Ten seconds for a quick read, three minutes for a full audit.
Open form →
Drop these two snippets into any page in your product. That's it.
    <!-- somewhere in <head> or before </body> -->
    <script src="https://eval.qa/embed.js"></script>

    <div id="eval-here"></div>
    <script>
      EvalQA.embed({
        container: "#eval-here",
        template: "saas",               // or foundation | agent | rag | robotics | enduser | universal
        taskId: "ticket-9012",          // any stable identifier
        prompt: "User asked: …",        // the input given to your AI
        reference: "Expected: …",       // optional ground-truth answer
        raterId: "[email protected]",   // optional Eval Army rater_id
        onSave: (payload) => {
          console.log("saved", payload.eval_id);
          // payload.payload contains the full eval record
        }
      });
    </script>
Tune the template + context, click Embed it, and the form renders below using the exact code from snippet 1+2. Submit → fires onSave and the green panel appears.
EvalQA.embed(options)

| Option | Type | Required | Description |
|---|---|---|---|
| container | string or Element | Yes | CSS selector or DOM element to render the iframe into. |
| template | string | No | One of: foundation, agent, rag, robotics, saas, enduser, universal. Default universal. |
| taskId | string | No | Stable task identifier so multiple raters can be matched on the same task. |
| prompt | string | No | The input the AI received. Pre-fills the task.prompt field. |
| reference | string | No | Gold / reference answer. Strongly recommended for LLM-judge mode. |
| systemUnderTest | string | No | e.g. "Acme Copilot v3.2". Pre-fills subject.system_under_test. |
| raterId | string | No | Eval Army rater_id (email). If known, the form pre-loads the rater profile. |
| evalId + mode:"review" | string | No | Hybrid mode - load an existing LLM-judge eval and let a human confirm/override. |
| onSave | function | No | Called with {eval_id, payload} after a successful save. |
| onClose | function | No | Called when destroy() is invoked. |
| height | number | No | Initial iframe min-height in pixels. Default 640. Auto-resizes as content grows. |
| baseUrl | string | No | Override the EvalQA base URL (default https://eval.qa/demo/aif/). |
Returns an object with { iframe, destroy(), reload() }.
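One possible use of the handle from EvalQA.embed is to tear down the previous form before mounting the next one, for example when a reviewer moves to another task. The sketch below is not part of embed.js: `createEvalMounter` is a hypothetical helper name, and it relies only on `embed(options)` handing back an object with `destroy()`, as described above. The SDK object is passed in as an argument so the helper can be exercised with a stub.

```javascript
// Hypothetical helper (not part of embed.js): keeps at most one embedded form alive.
// `sdk` is the global EvalQA object, injected so the helper can be tested with a stub.
function createEvalMounter(sdk) {
  let current = null; // handle returned by the most recent sdk.embed() call

  return {
    // Destroy the previous form (if any), then embed a new one.
    mount(options) {
      if (current) current.destroy();
      current = sdk.embed(options);
      return current;
    },
    // Remove the active form, if there is one.
    unmount() {
      if (current) current.destroy();
      current = null;
    },
  };
}

// Usage in the browser, after embed.js has loaded:
// const mounter = createEvalMounter(EvalQA);
// mounter.mount({ container: "#eval-here", taskId: "ticket-9012" });
// mounter.mount({ container: "#eval-here", taskId: "ticket-9013" }); // replaces the first form
```

Since onClose is documented as firing when destroy() is invoked, any onClose callback passed in the options would presumably run on each replacement as well.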
EvalQA.postEval(payload, opts?)

For LLM-judge pipelines that already have a JSON payload conforming to the schema - skip the UI entirely. Returns a Promise resolving to {ok, eval_id}.
    const verdict = await myLLMJudge.grade(prompt, response);
    const payload = {
      schema_version: "1.0.0",
      rater: { type: "llm_judge", model: "claude-sonnet-4-6", n_samples: 5 },
      subject: { system_under_test: "Acme Copilot", modality_tags: ["chat"] },
      task: { task_id: "ticket-9012", modality: "chat", prompt: prompt, reference: refAnswer },
      universal: {
        overall_quality: { score: verdict.score, rationale: verdict.reason },
        instruction_following: { score: 4 },
        faithfulness: { score: 5 },
        helpfulness: { score: 5 },
        safety_overall: { score: 5 }
      }
    };
    const r = await EvalQA.postEval(payload);
    console.log(r.eval_id);
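For pipelines that run outside a browser, where embed.js is not loaded, the same payload could in principle be sent as a plain HTTP request. EvalQA's FAQ states that postEval POSTs to /api/save_eval.php with the header X-Rater-Type: llm_judge; the sketch below assumes that is all the endpoint needs. `buildSaveEvalRequest` is a hypothetical helper, the JSON Content-Type header is an assumption, and the full endpoint URL is left as a parameter because the host and path prefix are not spelled out here.

```javascript
// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildSaveEvalRequest(endpointUrl, payload) {
  if (!payload || typeof payload !== "object") {
    throw new TypeError("payload must be an object");
  }
  return {
    url: endpointUrl,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumption: the endpoint accepts a JSON body
        "X-Rater-Type": "llm_judge",        // header named in the EvalQA FAQ
      },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (Node 18+ ships a global fetch); SAVE_EVAL_URL is a placeholder you must supply:
// const { url, init } = buildSaveEvalRequest(SAVE_EVAL_URL, payload);
// const res = await fetch(url, init);
// const { ok, eval_id } = await res.json();
```

Keeping request construction separate from the network call makes it easy to log or unit-test exactly what a judge pipeline would send.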
The iframe communicates with the parent via window.postMessage. All messages have source: "evalqa".
| Message type | Direction | Payload |
|---|---|---|
| eval:saved | iframe → parent | { eval_id, payload } |
| eval:resize | iframe → parent | { height } - auto-handled by embed.js |
| eval:close | iframe → parent | fires when an embedded "Submit another" or destroy() runs |
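embed.js handles these messages for you, but if you attach your own listener, a dispatcher along the following lines is one way to do it. This is a sketch under assumptions: `createEvalQAListener` is a hypothetical name, and the message is assumed to be an object carrying `source: "evalqa"` plus a `type` field holding one of the message types from the table. The exact field layout is not specified here, so the handler receives the whole message object.

```javascript
// Hypothetical dispatcher for messages from the EvalQA iframe.
// Assumption: event.data looks like { source: "evalqa", type: "eval:saved", ... }.
function createEvalQAListener(handlers, expectedOrigin) {
  return function onMessage(event) {
    const data = event.data;
    // Ignore anything that is not an object tagged by EvalQA.
    if (!data || typeof data !== "object" || data.source !== "evalqa") return false;
    // Optional extra check: only accept messages from the expected origin.
    if (expectedOrigin && event.origin !== expectedOrigin) return false;
    // Look up only handlers the caller registered (no inherited properties).
    if (!Object.prototype.hasOwnProperty.call(handlers, data.type)) return false;
    const handler = handlers[data.type];
    if (typeof handler !== "function") return false;
    handler(data); // hand the whole message over; its field layout is not assumed here
    return true;
  };
}

// Usage in the browser:
// window.addEventListener("message", createEvalQAListener({
//   "eval:saved": (msg) => console.log("saved", msg),
//   "eval:close": () => console.log("closed"),
// }, "https://eval.qa"));
```

The origin comparison is an extra safeguard on top of the source check; if you self-host the form, pass your own host's origin instead.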
You generate an email draft. Show the eval form next to it. When the reviewer submits, save the eval_id back to your DB.
    EvalQA.embed({
      container: "#review-pane",
      template: "saas",
      taskId: "draft-" + draftId,
      prompt: user.lastMessage,
      reference: "",              // no gold; production review
      raterId: currentUser.email,
      onSave: ({ eval_id }) => {
        fetch("/api/drafts/" + draftId + "/eval", {
          method: "POST",
          body: JSON.stringify({ eval_id })
        });
      }
    });
After each agent run in CI, an LLM judge POSTs an eval. Block deploy if safety.is_violating_any.
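    const payload = buildEvalPayload(agentRun);
    const r = await EvalQA.postEval(payload);
    // postEval resolves to {ok, eval_id}, so the violation flag is read from the payload itself
    if (r.ok && payload.safety?.is_violating_any) process.exit(1);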
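A slightly stricter gate might also fail the build when the eval could not be saved at all. The function below is a sketch, not EvalQA's own logic: `shouldBlockDeploy` is a hypothetical name, the flag is assumed to live at `payload.safety.is_violating_any`, and `result` is assumed to be the `{ok, eval_id}` object that postEval resolves to.

```javascript
// Hypothetical CI gate. Assumption: the violation flag sits at payload.safety.is_violating_any.
function shouldBlockDeploy(payload, result) {
  // A failed save means the run was not recorded, so treat it as blocking.
  if (!result || result.ok !== true) return true;
  const safety = payload && payload.safety;
  return Boolean(safety && safety.is_violating_any);
}

// Usage in a CI script:
// const payload = buildEvalPayload(agentRun);
// const result = await EvalQA.postEval(payload);
// if (shouldBlockDeploy(payload, result)) process.exit(1);
```

Treating an unrecorded eval as a failure is a design choice; a team that prefers to deploy when the eval service is unreachable would return false in that branch.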
LLM judge writes evals all night. Each morning, a human opens ?eval_id=...&mode=review and confirms or overrides. Disagreement deltas land on the dashboard.
    EvalQA.embed({
      container: "#review",
      evalId: pendingEvalId,
      mode: "review",
      raterId: currentUser.email
    });
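To get a feel for what a disagreement delta could look like, the sketch below compares the scores of an LLM-judge record with those of the human review derived from it. It is an illustration only: how the dashboard actually computes deltas is not documented here, `scoreDeltas` is a hypothetical name, and both records are assumed to hold scores under `universal.<dimension>.score`, as in the postEval payload example.

```javascript
// Hypothetical helper: per-dimension score difference (human minus LLM judge).
// Assumption: both records carry scores under `universal.<dimension>.score`.
function scoreDeltas(llmEval, humanEval) {
  const llm = (llmEval && llmEval.universal) || {};
  const human = (humanEval && humanEval.universal) || {};
  const deltas = {};
  for (const dim of Object.keys(llm)) {
    const a = llm[dim] && llm[dim].score;
    const b = human[dim] && human[dim].score;
    // Only compare dimensions that both raters scored numerically.
    if (typeof a === "number" && typeof b === "number") deltas[dim] = b - a;
  }
  return deltas;
}

// Usage:
// const deltas = scoreDeltas(llmJudgeRecord, humanReviewRecord);
// A negative value means the human scored that dimension lower than the LLM judge.
```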
Tiny iframe shown beside each AI response. Saves with rater type end_user.
    EvalQA.embed({
      container: "#feedback-" + messageId,
      template: "enduser",
      taskId: "msg-" + messageId,
      prompt: msg.userText,
      height: 320
    });
Check e.data.source === "evalqa" before acting on messages. embed.js does this for you.
… raterId.
… demo/aif/ into your own PHP host; point baseUrl at it.
… save_eval.php.