Join the Eval Army - a domain-trained, certified workforce that evaluates AI for the world's top labs and startups. Weekly cash. Frontier-model access. A public credential you can take anywhere. Equity for the first cohort.
Open contracts right now from our customer roster. Pay ranges shown are per hour. Apply with one click once you've passed L1.
Every contract maps to one of these. Your specialty tags decide which contracts show up in your feed.
Read what the model produced, click the right score on a behaviorally anchored 1-5 scale, and write a one-line rationale. Ten seconds to three minutes per item, depending on stakes.
Hybrid mode: an LLM judge pre-fills the eval, you confirm or override. Faster than scoring from scratch - and where most of the L2+ volume lives.
For L4 and L5. Define the anchors, write gold-standard reference items, adjudicate disputes between other raters. The work that shapes the field.
Pay scales with your level and task complexity. Training time is compensated. Direct deposit, PayPal, or Wise.
Work when you want, where you want. Most active raters do 5-20 hours per week. No quota. No shifts.
L1-L5 certification is public, portable, and verifiable. Listed on your profile, citable on your résumé.
50+ countries. Async-first. Tax-aware payouts. We handle the W-9 / 1099 / international equivalents.
Free access to frontier model APIs through the workbench. Your eval skill compounds into prompt-engineering, red-teaming, and rubric-design fluency.
Disagreements are escalated to L5 adjudicators. Transparent reasons. No silent ghosting, no opaque QA strikes.
The Friday payout is real. So is the credential that goes on your résumé, the frontier-model access you'd otherwise pay for, and the equity in the field you're literally helping define.
Every level you earn is publicly verifiable at eval.qa/r/your-handle. LinkedIn badge. Résumé bullet. Citable on grant applications. Portable to any future employer in the AI ecosystem.
Portable, verifiable, public. Listed in the EvalQA registry. LinkedIn badge included.
$200/mo in workbench credits for Claude Opus, GPT-5, and Gemini 3 at L2+. Pays for itself.
Prompt engineering · Red-teaming · Rubric design · AILuminate safety. Earned from the work, not a test.
L4+ raters who contribute to a public rubric are credited by name. Build a citable research footprint.
First 1,000 L3+ raters receive a tokenized stake in the EvalQA upside. SAFE-equivalent, vested over 24 months.
L4+ raters get fast-tracked introductions to AI Eval Engineer roles at our customer companies. Six placements in 2026 so far.
Annual gathering of the Eval Army. Sponsored attendance for L3+. Regional meetups in 12 cities.
Top-decile raters get a yearly $500 gear/compute stipend. Branded swag drops quarterly for everyone.
Every eval you submit is calibrated against gold items. Sustain high agreement and you're invited to the next level.
"I'm a paralegal by day. Two evenings a week on EvalQA pays my rent. The work is genuinely interesting - I never feel like I'm wasting my brain."
"Onboarding was 27 minutes. I was on a paying contract by the next morning. After three months I'm L3 and earning more than my old contract QA job."
"What I love: when I disagree with an LLM judge, my override actually changes the training signal. I can see my impact in the dashboard."
Start now. We respond within 48 hours of a passing exam.
Name, email, specialty tags. No résumé required.
20 gold-standard items across your specialties. Pass at κ ≥ 0.6 against our reference raters.
Contracts in your specialties appear in your feed within 48 hours. Apply with one click.
Submit evals, earn, climb the ladder. Every Friday by direct deposit, PayPal, or Wise.
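For the curious: the κ in the exam step is an inter-rater agreement statistic. This page does not say which variant EvalQA uses, so the sketch below assumes plain (unweighted) Cohen's kappa between your scores and a single reference score per item. The function names and the sample scores are illustrative, not EvalQA's actual scoring code.

```python
from collections import Counter

# Assumption: the 0.6 pass mark is applied to unweighted Cohen's kappa.
PASS_THRESHOLD = 0.6


def cohens_kappa(rater, reference):
    """Unweighted Cohen's kappa between two equal-length lists of labels."""
    if len(rater) != len(reference) or not rater:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater)
    # Observed agreement: share of items where both scores match.
    observed = sum(a == b for a, b in zip(rater, reference)) / n
    # Chance agreement: probability both pick the same label independently,
    # estimated from each side's own label frequencies.
    rater_counts = Counter(rater)
    ref_counts = Counter(reference)
    labels = rater_counts.keys() | ref_counts.keys()
    expected = sum(rater_counts[l] * ref_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sides used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def passes_exam(rater, reference, threshold=PASS_THRESHOLD):
    """True if agreement with the reference meets the (assumed) pass mark."""
    return cohens_kappa(rater, reference) >= threshold


# Hypothetical 20-item exam on a 1-5 scale: four items differ from the reference.
reference_scores = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4
my_scores = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

kappa = cohens_kappa(my_scores, reference_scores)
print(f"kappa = {kappa:.2f}, pass = {passes_exam(my_scores, reference_scores)}")
```

In this made-up run you match the reference on 16 of 20 items. Raw agreement is 80%, and kappa comes out near 0.75 once chance agreement is removed, which would clear a 0.6 bar.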
No. Most raters come from professional backgrounds - writing, code, medicine, law, language, design - and learn the eval workflow during onboarding. The L1 exam takes about 30 minutes.
Every Friday by direct deposit, PayPal, or Wise. Pay scales by certification level and task complexity. Training time is compensated. We handle the tax paperwork (W-9 / 1099 in the US, equivalents internationally).
Eight non-cash levers: (1) a public, verifiable L1-L5 credential that travels with you (LinkedIn badge, profile page, registry listing), (2) $200/month in frontier-model API credits for L2+, (3) micro-certs in prompt engineering, red-teaming, rubric design, and AILuminate safety, (4) co-authorship on published rubrics for L4+, (5) tokenized equity in EvalQA for the first 1,000 L3+ raters, (6) a direct fast-track to AI Eval Engineer roles at customer companies for L4+, (7) sponsored EvalCon attendance for L3+, (8) a $500/year hardware/compute stipend for top-decile performers. Cash is the floor, not the ceiling.
Zero minimum. Work when you want, where you want. Most active raters do 5-20 hours per week. There are no shifts and no quotas - your feed of contracts is yours to pick from.
You can retake it after seven days. Most failed attempts come from rushing - slow down, read the anchors, and you'll usually pass on the second try. We score blind, so a failed first attempt doesn't affect the retake.
L2 in weeks, L3 in months for most raters. L4 and L5 require sustained high κ plus invitation. The fastest documented climb is L1→L5 in 11 months. The ladder is meritocratic - your evals do the talking.
Yes - once you're L1+, you'll see a referral link in your dashboard. We pay $50 when a referral passes L1, $200 when they reach L3.
Thirty minutes from now you'll either be a certified L1 rater - or you'll be back here wondering "what if."
Create your profile