🛡️ Now accepting raters worldwide

Become the expert
AI learns from.

Join the Eval Army - a domain-trained, certified workforce that evaluates AI for the world's top labs and startups. Weekly cash. Frontier-model access. A public credential you can take anywhere. Equity for the first cohort.

Take the L1 exam · See open contracts
Weekly cash + frontier-model credits · Public L1 - L5 credential · Founding-cohort equity · Work from anywhere
L1 - L5 portable credential
$200/mo frontier-model credits
1,000 founding-cohort equity slots
8 specialty tracks
Live contracts · refreshed hourly

Real work, real pay rates.

Open contracts right now from our customer roster. Pay range shown is per hour. Apply with one click once you're past L1.

Foundation model red-teaming
$45 - $70/hr
42 hired this week · Apply →
Coding agent transcript review
$55 - $90/hr
28 hired this week · Apply →
Medical RAG faithfulness audit
$80 - $120/hr
11 hired this week · Apply →
Legal copilot citation accuracy
$95 - $130/hr
9 hired this week · Apply →
Multilingual safety eval (FR/DE/HI)
$30 - $55/hr
67 hired this week · Apply →
Robotics task-success review
$50 - $85/hr
14 hired this week · Apply →
Customer-support copilot QA
$25 - $40/hr
119 hired this week · Apply →
Image generation aesthetic rating
$20 - $35/hr
87 hired this week · Apply →
What you do

Three kinds of work. Pick yours.

Every contract maps to one of these. Your specialty tags decide which contracts show up in your feed.

A

Score AI output

Read what the model produced, pick the score that fits on a behaviorally anchored 1 - 5 scale, and write a one-line rationale. Ten seconds to three minutes per item, depending on the stakes.

B

Confirm AI judges

Hybrid mode: an LLM judge pre-fills the eval, you confirm or override. Faster than scoring from scratch - and where most of the L2+ volume lives.

C

Write rubrics & gold

For L4 and L5. Define the anchors, write gold-standard reference items, adjudicate disputes between other raters. The work that shapes the field.
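
For the technically curious, the confirm-or-override flow in hybrid mode (B) can be pictured as a small record that carries the judge's pre-filled score and the rater's decision. This is only a minimal Python sketch: `JudgedItem`, `review`, and `override_rate` are hypothetical names made up for illustration, not the EvalQA workbench API, and the batch below is invented data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JudgedItem:
    """One eval item pre-filled by an LLM judge (hypothetical structure)."""
    item_id: str
    judge_score: int                   # score proposed by the LLM judge, 1-5
    human_score: Optional[int] = None  # stays None until a rater reviews it
    rationale: str = ""                # one-line reason written by the rater

    @property
    def overridden(self) -> bool:
        # True only when a rater reviewed the item and changed the score.
        return self.human_score is not None and self.human_score != self.judge_score


def review(item: JudgedItem, score: Optional[int] = None, rationale: str = "") -> JudgedItem:
    """Confirm the judge's score (score=None) or override it with a new one."""
    final = item.judge_score if score is None else score
    if not 1 <= final <= 5:
        raise ValueError("score must be on the 1-5 scale")
    item.human_score = final
    item.rationale = rationale
    return item


def override_rate(items: List[JudgedItem]) -> float:
    """Share of reviewed items where the rater changed the judge's score."""
    reviewed = [i for i in items if i.human_score is not None]
    if not reviewed:
        return 0.0
    return sum(i.overridden for i in reviewed) / len(reviewed)


# Invented batch: the rater confirms three items and overrides one.
batch = [JudgedItem("a", 4), JudgedItem("b", 2), JudgedItem("c", 5), JudgedItem("d", 3)]
review(batch[0])
review(batch[1], score=4, rationale="Answer is correct; judge penalized formatting only.")
review(batch[2])
review(batch[3])
print(override_rate(batch))  # 0.25
```

Keeping the judge's score and the rater's score side by side is one way to make overrides measurable. How the real workbench stores them is not stated here.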

Why join

Built for the way you actually work.

$

Weekly Friday payouts

Pay scales with your level and task complexity. Training time is compensated. Direct deposit, PayPal, or Wise.

Zero minimum hours

Work when you want, where you want. Most active raters do 5 - 20 hours per week. No quota. No shifts.

📈

Portable credential

L1 - L5 certification is public, portable, and verifiable. Listed on your profile, citable on your résumé.

🌍

Anywhere on earth

50+ countries. Async-first. Tax-aware payouts. We handle the W-9 / 1099 / international equivalents.

🧠

Learn while you earn

Free access to frontier model APIs through the workbench. Your eval skill compounds into prompt-engineering, red-teaming, and rubric-design fluency.

⚖️

Fair disputes

Disagreements are escalated to L5 adjudicators. Transparent reasons. No silent ghosting, no opaque QA strikes.

Beyond the paycheck

Cash is the floor.
Compound value is the ceiling.

The Friday payout is real. So is the credential that goes on your résumé, the frontier-model access you'd otherwise pay for, and the equity in the field you're literally helping define.

Your EvalQA credential, on chain.

Every level you earn is publicly verifiable at eval.qa/r/your-handle. LinkedIn badge. Résumé bullet. Citable on grant applications. Portable to any future employer in the AI ecosystem.

  • Public profile page with κ history, specialties, contracts shipped
  • LinkedIn-importable badge + verifiable shareable URL
  • Optional on-chain attestation (EAS schema) for L3+ - anyone can verify, you can revoke
  • Listed in the EvalQA registry - businesses search and hire from it directly
Credential 🎖️

L1 - L5 EvalQA cert

Portable, verifiable, public. Listed in the EvalQA registry. LinkedIn badge included.

Model access 🧠

Frontier API credits

$200/mo workbench credits to Claude Opus, GPT-5, Gemini 3 for L2+. Pays for itself.

Worth $2,400/yr
Specialties 📜

Micro-certifications

Prompt engineering · Red-teaming · Rubric design · AILuminate safety. Earned from the work, not a test.

Authorship ✍️

Co-author published rubrics

L4+ raters who contribute to a public rubric are credited by name. Build a citable research footprint.

Early equity 🪙

Founding rater stake

First 1,000 L3+ raters receive a tokenized stake in the EvalQA upside. SAFE-equivalent, vested over 24 months.

Limited cohort
Career 🚀

Direct placement

L4+ raters get fast-tracked introductions to AI Eval Engineer roles at our customer companies. Six placements in 2026 so far.

Community 🎟️

EvalCon access

Annual gathering of the Eval Army. Sponsored attendance for L3+. Regional meetups in 12 cities.

Gear & compute

Hardware allowance

Top-decile raters get a yearly $500 gear/compute stipend. Branded swag drops quarterly for everyone.

The ladder

L1 to L5. Climb in months, not years.

Every eval you submit is calibrated against gold items. Sustain high agreement and you're invited to the next level.

L1 · Trainee · Pass the calibration exam at κ ≥ 0.6
L2 · Associate · 200+ evals, κ ≥ 0.7 sustained
L4 · Senior · Adjudicate disputes; mentor L1/L2
L5 · Adjudicator · Set rubrics; gate certifications
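
To make the κ thresholds concrete, here is a rough Python sketch of Cohen's kappa, a standard chance-corrected agreement statistic, computed between a rater's scores and gold scores. It rests on assumptions: this page does not say which variant of κ EvalQA uses (weighted variants are common for ordinal 1 - 5 scales), so the unweighted form below is illustrative only, and the 20 scores are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater: Sequence[int], reference: Sequence[int]) -> float:
    """Unweighted Cohen's kappa between two equal-length lists of labels."""
    if len(rater) != len(reference) or not rater:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater)

    # Observed agreement: fraction of items where both scores match.
    observed = sum(a == b for a, b in zip(rater, reference)) / n

    # Expected agreement by chance, from each side's label frequencies.
    rater_counts = Counter(rater)
    ref_counts = Counter(reference)
    expected = sum(rater_counts[k] * ref_counts[k] for k in rater_counts) / (n * n)

    if expected == 1.0:
        return 1.0  # both sides used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up example: 20 gold items on a 1-5 scale, rater disagrees on 4 of them.
gold = [1, 2, 3, 4, 5] * 4
mine = list(gold)
mine[0], mine[6], mine[12], mine[18] = 2, 3, 4, 5

kappa = cohens_kappa(mine, gold)
print(round(kappa, 2))  # 0.75
print(kappa >= 0.6)     # True
```

In this example the rater matches gold on 16 of 20 items (80% raw agreement), and chance agreement is 20%, so κ = (0.8 - 0.2) / (1 - 0.2) = 0.75. Raw agreement overstates skill because κ discounts the matches you would get by guessing.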
Voices

Raters from 50+ countries already in.

"I'm a paralegal by day. Two evenings a week on EvalQA pays my rent. The work is genuinely interesting - I never feel like I'm wasting my brain."

MR
Maya R. · L3 · Legal copilot specialty · Manila

"Onboarding was 27 minutes. I was on a paying contract by the next morning. After three months I'm L3 and earning more than my old contract QA job."

JD
Jakob D. · L3 · Foundation model specialty · Berlin

"What I love: when I disagree with an LLM judge, my override actually changes the training signal. I can see my impact in the dashboard."

AR
Aditi R. · L4 · Multilingual safety · Bengaluru
How to apply

Four steps. Thirty minutes.

Start now. We respond to passing exams within 48 hours.

Create your profile

Name, email, specialty tags. No résumé required.

~2 min

Take the L1 exam

20 gold-standard items across your specialties. Pass at κ ≥ 0.6 against our reference raters.

~25 min

Get matched

Contracts in your specialties appear in your feed within 48 hours. Apply with one click.

~48 hrs

Get paid Friday

Submit evals, earn, climb the ladder. Every Friday by direct deposit, PayPal, or Wise.

weekly
Create your profile
FAQ

Honest answers.

Do I need AI experience to join?

No. Most raters come from professional backgrounds - writing, code, medicine, law, language, design - and learn the eval workflow during onboarding. The L1 exam takes about 25 minutes.

How do I get paid?

Every Friday by direct deposit, PayPal, or Wise. Pay scales by certification level and task complexity. Training time is compensated. We handle the tax paperwork (W-9 / 1099 in the US, equivalents internationally).

What do I get besides cash?

Eight non-cash levers:

  • A public verifiable L1 - L5 credential that travels with you (LinkedIn badge, profile page, registry listing)
  • $200/month frontier-model API credits for L2+
  • Micro-certs in prompt engineering, red-teaming, rubric design, and AILuminate safety
  • Co-authorship on published rubrics for L4+
  • Tokenized equity in EvalQA for the first 1,000 L3+ raters
  • Direct fast-track to AI Eval Engineer roles at customer companies for L4+
  • Sponsored EvalCon attendance for L3+
  • A $500/year hardware/compute stipend for top-decile performers

Cash is the floor, not the ceiling.

How many hours do I have to work?

Zero minimum. Work when you want, where you want. Most active raters do 5 - 20 hours per week. There are no shifts and no quotas - your feed of contracts is yours to pick from.

What if I fail the L1 exam?

You can retake it after seven days. Most failed attempts come from rushing - slow down, read the anchors, and you'll usually pass on the second try. We score blind, so a failed first attempt doesn't affect the retake.

How fast can I get from L1 to L5?

L2 in weeks, L3 in months for most raters. L4 and L5 require sustained high κ plus invitation. The fastest documented climb is L1→L5 in 11 months. The ladder is meritocratic - your evals do the talking.

Can I refer friends?

Yes - once you're L1+, you'll see a referral link in your dashboard. We pay $50 when a referral passes L1, $200 when they reach L3.

Stop scrolling. Start earning.

Thirty minutes from now you'll either be a certified L1 rater - or you'll be back here wondering "what if".

Create your profile