Do I need AI experience to join the Eval Army?

No. Most raters come from professional backgrounds (writing, code, medicine, law, language) and learn the eval workflow during onboarding. The L1 certification exam takes about 30 minutes and shows us how you align with our gold-standard raters.

How do I get paid?

Every Friday by direct deposit, PayPal, or Wise. Pay scales by certification level (L1 lowest, L5 highest) and task complexity. Training time is also compensated.

What in-kind benefits do raters get beyond cash?

Eight non-cash benefits: (1) a publicly verifiable L1 - L5 EvalQA credential portable to any future employer, (2) $200/month frontier-model API credits to Claude, GPT, and Gemini for L2+, (3) micro-certifications in prompt engineering, red-teaming, rubric design, and AILuminate safety earned from the work, (4) co-authorship credit on published rubrics for L4+ raters, (5) tokenized founding-cohort equity for the first 1,000 L3+ raters, (6) direct placement into AI Eval Engineer roles at customer companies for L4+, (7) sponsored attendance at the annual EvalCon for L3+, (8) a $500/year hardware and compute stipend for top performers.

How long does onboarding take?

Under 30 minutes - profile creation plus the L1 calibration exam. Pass at κ ≥ 0.6 against gold raters and you're matched to contracts within 48 hours.
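For readers curious what "κ ≥ 0.6 against gold raters" means in practice, here is a minimal sketch. It assumes κ is unweighted Cohen's kappa computed between your scores and the gold scores on the same items; the page does not say which kappa variant EvalQA actually uses, and the function name, the sample scores, and the `PASS_THRESHOLD` constant are illustrative only.

```python
from collections import Counter

# Pass mark quoted on this page; the constant name is hypothetical.
PASS_THRESHOLD = 0.6


def cohens_kappa(rater, gold):
    """Unweighted Cohen's kappa between two equal-length score lists."""
    if len(rater) != len(gold) or not rater:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater)

    # Observed agreement: share of items where both scores match exactly.
    p_o = sum(r == g for r, g in zip(rater, gold)) / n

    # Chance agreement: for each score value, the product of how often
    # each side used it, summed over all values.
    rater_counts = Counter(rater)
    gold_counts = Counter(gold)
    p_e = sum(rater_counts[s] * gold_counts[s] for s in rater_counts) / (n * n)

    if p_e == 1.0:
        # Both sides used one identical score throughout; kappa is
        # undefined here, so this sketch treats it as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up 20-item exam on a 1 - 5 scale (not real exam data).
gold = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
rater = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4, 1, 2, 3, 5, 5, 1, 3, 3, 4, 5]

kappa = cohens_kappa(rater, gold)
print(f"kappa = {kappa:.4f}, pass = {kappa >= PASS_THRESHOLD}")
```

In this made-up exam the rater matches gold on 17 of 20 items, which works out to a κ of roughly 0.81 once chance agreement is subtracted. Kappa is stricter than raw percent agreement because it discounts matches that two raters would produce by guessing alone.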

What if I fail the certification exam?

You can retake the L1 exam after 7 days. Most failed attempts happen because the rater rushed through the rubric - slow down, read the anchors, and you'll usually pass on the second try.

Can I climb from L1 to L5?

Yes. Every eval you submit is scored against gold items. Sustain high agreement and you're invited to take the L2 exam, then L3, etc. L5 adjudicators set the rubrics and resolve disputes.

Join the Eval Army - Get certified, evaluate AI, earn weekly

How many hours do I have to work?

Zero minimum. Work when you want, where you want, as much or as little as fits your life. Most active raters do 5 - 20 hours per week.

Live contracts · refreshed hourly

Real work, real pay rates.

Open contracts right now from our customer roster. Pay range shown is per hour. Apply with one click once you're past L1.

Foundation model red-teaming

$45 - $70/hr

42 hired this week

Coding agent transcript review

$55 - $90/hr

28 hired this week

Medical RAG faithfulness audit

$80 - $120/hr

11 hired this week

Legal copilot citation accuracy

$95 - $130/hr

9 hired this week

Multilingual safety eval (FR/DE/HI)

$30 - $55/hr

67 hired this week

Robotics task-success review

$50 - $85/hr

14 hired this week

Customer-support copilot QA

$25 - $40/hr

119 hired this week

Image generation aesthetic rating

$20 - $35/hr

87 hired this week

What you do

Three kinds of work. Pick yours.

Every contract maps to one of these. Your specialty tags decide which contracts show up in your feed.

A

Score AI output

Read what the model produced, click the right score on a behaviorally anchored 1 - 5 scale, and write a one-line rationale. Ten seconds to three minutes per item, depending on the stakes.

B

Confirm AI judges

Hybrid mode: an LLM judge pre-fills the eval, you confirm or override. Faster than scoring from scratch - and where most of the L2+ volume lives.

C

Write rubrics & gold

For L4 and L5. Define the anchors, write gold-standard reference items, adjudicate disputes between other raters. The work that shapes the field.

Why join

Built for the way you actually work.

$

Weekly Friday payouts

Pay scales with your level and task complexity. Training time is compensated. Direct deposit, PayPal, or Wise.

⏱

Zero minimum hours

Work when you want, where you want. Most active raters do 5 - 20 hours per week. No quota. No shifts.

📈

Portable credential

L1 - L5 certification is public, portable, and verifiable. Listed on your profile, citable on your résumé.

🌍

Anywhere on earth

50+ countries. Async-first. Tax-aware payouts. We handle the W-9 / 1099 / international equivalents.

🧠

Learn while you earn

Free access to frontier model APIs through the workbench. Your eval skill compounds into prompt-engineering, red-teaming, and rubric-design fluency.

⚖️

Fair disputes

Disagreements are escalated to L5 adjudicators. Transparent reasons. No silent ghosting, no opaque QA strikes.

Beyond the paycheck

Cash is the floor.
Compound value is the ceiling.

The Friday payout is real. So is the credential that goes on your résumé, the frontier-model access you'd otherwise pay for, and the equity in the field you're literally helping define.

Your EvalQA credential, on chain.

Every level you earn is publicly verifiable at eval.qa/r/your-handle. LinkedIn badge. Résumé bullet. Citable on grant applications. Portable to any future employer in the AI ecosystem.

Public profile page with κ history, specialties, contracts shipped
LinkedIn-importable badge + verifiable shareable URL
Optional on-chain attestation (EAS schema) for L3+ - anyone can verify, you can revoke
Listed in the EvalQA registry - businesses search and hire from it directly

EQ

L4

Senior

Jane Doe

Foundation · RAG · SaaS · since Mar 2026

κ 0.84 · 1,842 evals · eval.qa/r/jane-doe

Credential 🎖️

L1 - L5 EvalQA cert

Portable, verifiable, public. Listed in the EvalQA registry. LinkedIn badge included.

Model access 🧠

Frontier API credits

$200/mo workbench credits to Claude Opus, GPT-5, Gemini 3 for L2+. Pays for itself.

Worth $2,400/yr

Specialties 📜

Micro-certifications

Prompt engineering · Red-teaming · Rubric design · AILuminate safety. Earned from the work, not a test.

Authorship ✍️

Co-author published rubrics

L4+ raters who contribute to a public rubric are credited by name. Build a citable research footprint.

Early equity 🪙

Founding rater stake

First 1,000 L3+ raters receive a tokenized stake in the EvalQA upside. SAFE-equivalent, vested over 24 months.

Limited cohort

Career 🚀

Direct placement

L4+ raters get fast-tracked introductions to AI Eval Engineer roles at our customer companies. Six placements in 2026 so far.

Community 🎟️

EvalCon access

Annual gathering of the Eval Army. Sponsored attendance for L3+. Regional meetups in 12 cities.

Gear & compute ⚡

Hardware allowance

Top-decile raters get a yearly $500 gear/compute stipend. Branded swag drops quarterly for everyone.

The ladder

L1 to L5. Climb in months, not years.

Every eval you submit is calibrated against gold items. Sustain high agreement and you're invited to the next level.

L1

Trainee

Pass the calibration exam at κ ≥ 0.6

L2

Associate

200+ evals, κ ≥ 0.7 sustained

L3

Certified

Most contracts gate here. Pay ↑.

L4

Senior

Adjudicate disputes; mentor L1/L2

L5

Adjudicator

Set rubrics; gate certifications

Voices

Raters from 50+ countries already in.

"I'm a paralegal by day. Two evenings a week on EvalQA pays my rent. The work is genuinely interesting - I never feel like I'm wasting my brain."

Maya R. · L3 · Legal copilot specialty · Manila

"Onboarding was 27 minutes. I was on a paying contract by the next morning. After three months I'm L3 and earning more than my old contract QA job."

Jakob D. · L3 · Foundation model specialty · Berlin

"What I love: when I disagree with an LLM judge, my override actually changes the training signal. I can see my impact in the dashboard."

Aditi R. · L4 · Multilingual safety · Bengaluru

How to apply

Four steps. Thirty minutes.

Start now. We respond to passing exams within 48 hours.

Create your profile

Name, email, specialty tags. No résumé required.

~2 min

Take the L1 exam

20 gold-standard items across your specialties. Pass at κ ≥ 0.6 against our reference raters.

~25 min

Get matched

Contracts in your specialties appear in your feed within 48 hours. Apply with one click.

~48 hrs

Get paid Friday

Submit evals, earn, climb the ladder. Every Friday by direct deposit, PayPal, or Wise.

weekly

FAQ

Honest answers.

Do I need AI experience to join?

No. Most raters come from professional backgrounds - writing, code, medicine, law, language, design - and learn the eval workflow during onboarding. The L1 exam takes about 30 minutes.

How do I get paid?

Every Friday by direct deposit, PayPal, or Wise. Pay scales by certification level and task complexity. Training time is compensated. We handle the tax paperwork (W-9 / 1099 in the US, equivalents internationally).

What do I get besides cash?

Eight non-cash levers: (1) a publicly verifiable L1 - L5 credential that travels with you (LinkedIn badge, profile page, registry listing), (2) $200/month frontier-model API credits for L2+, (3) micro-certs in prompt engineering, red-teaming, rubric design, and AILuminate safety, (4) co-authorship on published rubrics for L4+, (5) tokenized equity in EvalQA for the first 1,000 L3+ raters, (6) direct fast-track to AI Eval Engineer roles at customer companies for L4+, (7) sponsored EvalCon attendance for L3+, (8) a $500/year hardware/compute stipend for top-decile performers. Cash is the floor, not the ceiling.

How many hours do I have to work?

Zero minimum. Work when you want, where you want. Most active raters do 5 - 20 hours per week. There are no shifts and no quotas - your feed of contracts is yours to pick from.

What if I fail the L1 exam?

You can retake it after seven days. Most failed attempts come from rushing - slow down, read the anchors, and you'll usually pass on the second try. We score blind, so a failed first attempt doesn't affect the retake.

How fast can I get from L1 to L5?

L2 in weeks, L3 in months for most raters. L4 and L5 require sustained high κ plus invitation. The fastest documented climb is L1→L5 in 11 months. The ladder is meritocratic - your evals do the talking.

Can I refer friends?

Yes - once you're L1+, you'll see a referral link in your dashboard. We pay $50 when a referral passes L1, $200 when they reach L3.

Become the expert
AI learns from.

Stop scrolling. Start earning.
