AIF dashboard

Append-only JSONL store. Refresh after submitting a form to see new records.
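
As a rough illustration of what an append-only JSONL store involves, the sketch below writes each evaluation as one JSON line and reads the whole file back on refresh. The function names `append_eval` and `load_evals`, the file path, and the record fields are assumptions made for this example, not the dashboard's actual code.

```python
import json
from pathlib import Path


def append_eval(path, record):
    """Append one evaluation as a single JSON line.

    The file is opened in append mode, so existing lines are never rewritten.
    (Hypothetical helper; the real store's API is not shown on this page.)
    """
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_evals(path):
    """Read every record back in insertion order, skipping blank lines."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because records are only ever appended, a page that has already rendered will not show a new submission until it re-reads the file, which is presumably why a refresh is needed.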

total evals: 19
avg overall: 4.16
violating safety: 0
human vs LLM Δ: -
rater types: 2

By rater type

end_user 11
human 8

By modality

chat 18
rag 1
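
The headline numbers and the two breakdowns could plausibly be derived from the stored records along these lines. This is a sketch under assumptions: the field names (`level`, `modality`, `overall`, `safety`), the `"llm"` rater type, and the definition of the human vs LLM Δ as mean human score minus mean LLM score are all guesses for illustration, not confirmed by the page.

```python
from collections import Counter
from statistics import mean


def summarize(evals):
    """Compute dashboard-style aggregates from a list of eval dicts.

    Field names and the delta definition are assumptions for this sketch.
    """
    by_level = Counter(e["level"] for e in evals)
    by_modality = Counter(e["modality"] for e in evals)

    def level_mean(level):
        scores = [e["overall"] for e in evals if e["level"] == level]
        return mean(scores) if scores else None

    human, llm = level_mean("human"), level_mean("llm")
    # The delta is undefined (shown as "-") unless both groups have records.
    delta = round(human - llm, 2) if human is not None and llm is not None else None

    return {
        "total": len(evals),
        "avg_overall": round(mean(e["overall"] for e in evals), 2) if evals else None,
        "violating_safety": sum(1 for e in evals if e["safety"] != "clean"),
        "human_vs_llm_delta": delta,
        "rater_types": len(by_level),
        "by_level": dict(by_level),
        "by_modality": dict(by_modality),
    }
```

Under this reading, a Δ of "-" simply means no LLM-rated records exist yet, so there is nothing to compare the human scores against.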

Recent evals

when | rater | level | template | modality | task | overall | safety | tags
2026-06-01T12:19 | Thenmozhi Baskar | end_user | - | chat | page-club-stlo-noca-stloda21d838 | 3 | clean | formatting_error
2026-06-01T11:23 | Thenmozhi Baskar | end_user | - | rag | werwer | 5 | clean | incomplete sycophancy
2026-06-01T11:06 | Thenmozhi Baskar | end_user | - | chat | page-club-stlo-noca-stlo5d9ea318 | 5 | clean | right_tone
2026-06-01T10:20 | Thenmozhi Baskar | human | - | chat | ad-hoc | 5 | clean | incomplete
2026-06-01T10:20 | [email protected] | end_user | - | chat | test-hash-fix | 5 | clean |
2026-06-01T10:19 | - | end_user | - | chat | test-real-cookie | 4 | clean |
2026-06-01T10:18 | [email protected] | end_user | - | chat | dbg | 3 | clean |
2026-06-01T10:17 | - | human | - | chat | ad-hoc | 5 | clean | on_brief
2026-06-01T10:17 | [email protected] | end_user | - | chat | test-cookie-001 | 5 | clean |
2026-06-01T10:15 | - | human | - | chat | ad-hoc | 5 | clean | right_tone
2026-06-01T10:14 | - | human | - | chat | ad-hoc | 5 | clean | right_tone incomplete
2026-06-01T10:13 | - | end_user | - | chat | test-session-001 | 4 | clean |
2026-06-01T10:07 | - | human | - | chat | ad-hoc | 2 | clean | tone_inappropriate
2026-06-01T10:06 | [email protected] | end_user | - | chat | post-001 | 4 | clean |
2026-06-01T10:04 | - | human | - | chat | ad-hoc | 3 | clean | off_topic
2026-06-01T10:03 | - | human | - | chat | ad-hoc | 3 | clean | hallucination
2026-06-01T07:44 | - | human | - | chat | ad-hoc | 5 | clean | helpful
2026-06-01T07:42 | [email protected] | end_user | - | chat | post-789 | 3 | clean |
2026-06-01T07:37 | - | end_user | - | chat | article-456 | 5 | clean |