Deciding when to hire your first eval person is a critical part of building an evaluation function. This section explores the key principles, common pitfalls, and best practices for making that decision.
Core Principles
The foundation of when to hire your first eval person rests on several core principles that have been validated across organizations. First, clarity of purpose ensures that every evaluation decision serves a strategic goal. Second, consistency in methodology enables meaningful comparisons over time. Third, transparency in processes builds stakeholder trust.
When working through this decision, organizations often discover that investing time upfront in design saves months later. A poorly designed eval creates confusion, consumes resources without producing actionable insights, and erodes stakeholder confidence.
Practical Implementation
Begin with a clear definition of success. What will this evaluation accomplish? Who will use the results? What decisions will be informed by the findings?
Next, establish baselines and standards. What constitutes good, acceptable, and poor performance? How will you measure progress? These benchmarks should be documented and communicated to all stakeholders.
Implementation requires careful planning. Timeline: How long will the evaluation take? Resources: Who will conduct it? Budget: What will it cost? Success metrics: How will you know you succeeded?
Common Challenges and Solutions
Organizations frequently encounter predictable challenges here. Stakeholder disagreement about standards is common; resolve it through calibration sessions where stakeholders align on what "good" looks like. Resource constraints often emerge; address them by prioritizing the most critical evaluations. Quality drift occurs in long-running studies; combat it with regular re-calibration and consistency checks.
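The consistency checks mentioned above can be made concrete by measuring inter-rater agreement during calibration sessions. A minimal pure-Python sketch using Cohen's kappa; the labels, sample data, and 0.6 threshold are illustrative assumptions, not prescriptions from this chapter:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Two graders labeling the same 8 outputs during a calibration session.
rater_1 = ["good", "good", "poor", "good", "acceptable", "poor", "good", "good"]
rater_2 = ["good", "good", "poor", "acceptable", "acceptable", "poor", "good", "poor"]
kappa = cohens_kappa(rater_1, rater_2)

# Re-calibrate when agreement drops below a threshold you choose; 0.6 is a
# common rule of thumb for "substantial" agreement, not a universal standard.
needs_recalibration = kappa < 0.6
```

Tracking this number over the life of a long-running study turns "quality drift" from a vague worry into a metric you can alert on.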
Advanced Techniques
Once you've mastered the basics, several advanced techniques improve results. Bayesian approaches incorporate prior knowledge and uncertainty. Multi-dimensional analysis breaks down complex judgments into component parts. Continuous evaluation adapts to changing conditions rather than using fixed criteria.
Integration with Organizational Workflow
Evaluation should integrate seamlessly with existing processes. Build eval into the product development cycle. Make results easily accessible to decision-makers. Create feedback loops where eval findings drive product improvements. Document lessons learned for future evals.
Scaling Your Evaluation Practice
As organizations mature in evaluation, they scale from initial manual implementations to systematic, efficient processes. This scaling involves: (1) Building reusable infrastructure, (2) creating templates and playbooks, (3) training teams on best practices, (4) establishing standards that persist across projects.
When NOT to Hire (And What to Do Instead)
Not every organization should hire eval specialists. Small teams: your ML engineer should do eval. Medium teams (10-50 engineers): hire 1-2 eval specialists to build infrastructure and guide engineers. Large teams (100+): hire a full eval team. Before hiring, ask: what problem are we solving? Often the problem is infrastructure or tooling, not headcount; sometimes you need a tool more than a person. Invest in good eval tooling and education before hiring specialists.
Interview Questions for Eval Hires
When hiring eval engineers, ask: "Tell me about a time you found a bug in a metric. How did you debug it?" (Tests debugging and statistical thinking.) "How would you evaluate a system where ground truth is ambiguous?" (Tests thinking about hard problems.) "You discover that a model has 20% lower accuracy on one subgroup. What do you do?" (Tests responsibility to fairness and stakeholder communication.) "Describe an eval you're proud of. What made it good?" (Tests values and vision for evaluation work.) Avoid asking only technical questions or only system design questions. Eval engineers need technical skills, but also communication, integrity, and judgment.
Onboarding and Ramp-Up Timeline
Month 1: Learn the systems. Understand the product, the infrastructure, and the current evals; read code and docs. In weeks 2-3, run an existing eval end-to-end to understand the flow. Month 2: Design a new eval. Work with an experienced teammate to design, run, and present results. Month 3: Lead an eval independently. Your work is still reviewed, but you're the driver. Months 4-6: Contribute to infrastructure improvements. Month 6+: Full productivity.
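One way to operationalize this timeline is as a shared checklist the manager and new hire review together. A small sketch; the milestone wording is illustrative:

```python
# Hypothetical ramp-up plan mirroring the timeline above: (month due, milestone).
RAMP_UP_PLAN = [
    (1, "Learn the systems: product, infrastructure, current evals; read code and docs"),
    (1, "Run an existing eval end-to-end and understand the flow (weeks 2-3)"),
    (2, "Design a new eval with an experienced teammate; run it and present results"),
    (3, "Lead an eval independently (still reviewed)"),
    (6, "Contribute to infrastructure improvements (months 4-6)"),
    (6, "Full productivity"),
]

def milestones_through(month):
    """Milestones a new hire should have completed by the end of the given month."""
    return [task for due, task in RAMP_UP_PLAN if due <= month]

done_by_month_3 = milestones_through(3)
```

Reviewing `milestones_through(n)` in each monthly one-on-one makes ramp-up expectations explicit instead of implied.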
Career Development and Promotion Paths
Eval engineers typically progress: IC1 (entry) → IC2 (mid-level) → IC3 (senior) → IC4 (staff). Each level should have clear expectations and growth opportunities. IC1 focuses on executing evals well; IC2 on designing evals and infrastructure improvements; IC3 on strategy and mentorship; IC4 on organizational impact. If someone reaches IC3 and doesn't want to move to IC4 (which involves more politics and less technical depth), they should stay at IC3 and become world-class there. Not everyone should be a manager, and not everyone should aim for the C-suite. Great organizations have strong non-manager career paths.
Building Team Culture Around Quality
The best eval teams have a culture where: (1) Bad news is welcome (finding problems is good, not shameful), (2) rigor is valued (taking time to do it right matters), (3) collaboration is normal (team members help each other), (4) experimentation is encouraged (trying new eval approaches is OK, failure is learning), (5) perspective-taking is practiced (you care about how other teams use your evals and try to make it easy for them).
Key Takeaways
Clarity is essential: Every part of building an eval function requires clear thinking and communication.
Start with foundations: Master basics before advancing to complex implementations.
Iterate and improve: Evaluation is not a one-time activity; continuously refine your approach.
Involve stakeholders: Different perspectives improve evaluation quality and adoption.
Document everything: Clear documentation enables scaling and institutional knowledge transfer.
Measure impact: Track whether evaluations drive the decisions and improvements you expect.
Compensation for Eval Engineers
Eval engineers typically earn (illustrative bands at companies valued over $1B): IC2 = $160-200K + equity, IC3 = $200-280K + equity, IC4 = $280-380K + equity. Eval talent is in high demand relative to supply, so compensation is competitive with ML engineering. Pro tip: candidates with eval backgrounds may have been underpaid at previous employers (for example, in academia or at small companies). When hiring, benchmark against the market rate for the role, not previous salary, which can perpetuate underpayment. And ask: is your eval team underpaid compared to your ML teams? If yes, fix it; eval talent will leave for better pay.
Building a World-Class Eval Function
As your eval team matures, push for: (1) Eval published papers/blog posts (establish reputation), (2) open source tools (give back to community, hire through GitHub), (3) industry conference talks (speak at NeurIPS, FAccT, ACL), (4) partnerships with universities (research collaborations), (5) vendor relationships (shape tool development). These activities build your organization's brand in eval and make it easier to hire and retain strong talent.
Eval Team Scaling Patterns
How teams scale: Size 1 (solo practitioner): you do all evals, build infrastructure, set standards, and evangelize. Very hands-on. Size 3-5: you start specializing. One person focuses on infrastructure, one on running evals, one on research. Size 10-15: you have managers (eval eng manager, research manager). Specialization increases, and you run evals across 3-5 product areas. Size 25+: you have multiple teams, possibly spread across locations. Now you're managing managers, focused less on technical details and more on strategy and organization. Size 50+: you're a VP or director-level leader managing a substantial organization. Very little hands-on eval work; lots of strategy, hiring, budget, and politics.
Technical vs. Management Career Paths
Not everyone should become a manager. The best technical leaders often specialize: they become the world expert in LLM eval, or fairness testing, or annotation methodology. They influence without managing. Compensation can be equivalent to managers. The career path matters: IC3 → IC4 (distinguished engineer) vs. IC3 → Manager → Director. Both are valuable. A world-class IC4 can drive more impact than an average Director. Organizations that value and pay both paths appropriately attract the best people.
Successful Eval Team Patterns
Common patterns of successful eval teams: (1) Central CoE (center of excellence) model: One team serves the whole company. Works for: organizations where eval is mature and standardized. Fails for: organizations with very diverse eval needs. (2) Embedded model: Eval engineers embedded in product teams. Works for: fast-moving organizations where eval needs are heterogeneous. Weak at: standardization and knowledge sharing. (3) Hub-and-spoke: Central team plus embedded specialists. A hybrid approach: it requires more coordination but captures benefits of both. (4) Federated: Multiple independent teams with loose coordination. Works for: large decentralized organizations (e.g., multiple divisions). Requires strong standards to stay aligned.
Common Hiring Mistakes for Eval Teams
Mistake 1: Hiring ML engineers and expecting them to do eval. ML engineers are trained to optimize metrics, not critically examine them; eval engineers need a different mindset. Mistake 2: Hiring only quants. Eval also requires domain expertise, communication, and judgment. Mistake 3: Hiring only specialists. You need some people who know eval broadly, not just specialists in one domain. Mistake 4: Not checking cultural fit. Eval work requires integrity (willingness to report bad news), rigor (willingness to be thorough), and collaboration. Hire for these traits. Mistake 5: Underpaying eval engineers. Eval is in demand; if you underpay, good people leave. Make sure comp is competitive with general ML.
Retention and Development of Eval Talent
Eval engineering is still relatively new, and talent is hard to find. Once hired, invest in retention: (1) a clear career path (IC progression with defined expectations at each level); (2) meaningful work (evals that matter, findings that drive decisions); (3) learning opportunities (conferences, training, time for research); (4) competitive comp (regularly benchmark and adjust); (5) psychological safety (bad news is welcome, intellectual honesty is valued); (6) community (connection with other eval engineers inside and outside the company). Organizations that do this well keep their eval talent for years; those that don't see high turnover.
Growing Your Eval Function
From 1 to 3 to 10: Scaling Your Eval Team
Size 1: You're a generalist. You do all evals, build infrastructure, and set strategy; you're the bottleneck for everything. Size 3: You need specialists. One handles eval research (designing new metrics), one handles infrastructure (databases, platforms), and you focus on strategy and key evals. Efficiency increases. Size 5: Add an eval program manager (coordinates evals across teams) and a QA specialist (quality oversight). Processes become more formal. Size 10: Add a manager; you can't manage 10 people directly. The manager handles IC engineers. Grow from there. Key principle: structure should match work. Don't hire people you don't have work for.
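As a sketch, the size thresholds above can be encoded as a lookup that suggests which roles exist at each stage. The role names and cutoffs mirror this section's narrative; they are assumptions, not universal rules:

```python
def suggested_roles(team_size):
    """Suggest an eval-team composition for a given headcount (per the stages above)."""
    if team_size <= 1:
        # Solo practitioner: one generalist does everything.
        return ["generalist (all evals, infrastructure, strategy)"]
    # Size ~3: specialization begins.
    roles = ["strategy lead", "eval researcher", "infrastructure engineer"]
    if team_size >= 5:
        # Size ~5: coordination and quality oversight become dedicated roles.
        roles += ["eval program manager", "QA specialist"]
    if team_size >= 10:
        # Size ~10: a manager takes over direct reports.
        roles += ["engineering manager"]
    return roles

roles_at_ten = suggested_roles(10)
```

The point of writing it down, even informally, is the key principle above: each new role should map to work that already exists.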
Compensation Philosophy for Your Team
How much should eval engineers make? (1) Market rate. Look at ML engineer salaries in your market; eval engineers should be similar. (2) Adjust for supply. Eval talent is scarcer, so you might pay a 10-20% premium. (3) Adjust for role. Senior roles pay more; IC4 pays more than IC2. (4) Equity. Startups should offer generous equity (0.05-0.1%+); public companies might offer less equity and more base salary. (5) Bonus. Performance-based bonus for hitting team goals. (6) Benefits. Healthcare, 401k, PTO, professional development. (7) Non-monetary. Flexible hours, remote work, learning opportunities. The whole package matters.
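The cash portion of these adjustments reduces to simple arithmetic. A hedged sketch: the base figures are midpoints of the illustrative IC bands quoted earlier in this chapter, and the 10-20% scarcity premium is the range mentioned in point (2); none of these are market data:

```python
# Illustrative base cash by level: midpoints of the IC bands quoted earlier
# in this chapter ($160-200K, $200-280K, $280-380K). Not market data.
BASE_BY_LEVEL = {"IC2": 180_000, "IC3": 240_000, "IC4": 330_000}

def offer_range(level, scarcity_premium=(0.10, 0.20)):
    """Return a (low, high) cash range: base for the level, adjusted upward
    by the scarcity premium discussed above. Equity/bonus handled separately."""
    base = BASE_BY_LEVEL[level]
    low, high = scarcity_premium
    return (round(base * (1 + low)), round(base * (1 + high)))

ic3_cash_range = offer_range("IC3")
```

Re-benchmark `BASE_BY_LEVEL` against your own market regularly, per the retention advice above; the structure matters more than the specific numbers.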
Building Diverse Eval Teams
Eval teams benefit from diversity. Different backgrounds bring different perspectives. A team of people who all worked at the same company will have blind spots; a team with people from academia and industry, from different companies, countries, and disciplines, will be more thoughtful. How to build diversity: (1) Deliberate recruiting; don't just hire your network. (2) Interviews designed to surface different perspectives. (3) Psychological safety, so people feel comfortable disagreeing. (4) Valuing different backgrounds: ask what people did at previous companies and learn from it. (5) Rotating projects so people learn from each other. Diverse teams make better eval decisions and catch more problems.