AI is Backed by Humans

TL;DR: Despite all the talk about "artificial" intelligence, the biggest names in AI are spending billions of dollars on human labor. 

From the nearly $500M that Mercor is making connecting PhDs to AI labs, to Scale AI's $2B+ in revenue from human data workers, to Surge AI crossing $1.4B with just 121 employees managing human annotators—the AI revolution is actually powered by an invisible army of human experts doing the grunt work that makes these "intelligent" systems possible.

Roughly 80% of the explosive revenue growth across these companies is STAFFING revenue. The AI industry has a dirty little secret, and it's hiding in plain sight.

While Silicon Valley VCs throw around terms like "artificial general intelligence" and "autonomous systems," the reality is far more human than anyone wants to admit. Behind every breakthrough language model, every impressive AI assistant, and every mind-blowing demonstration lies an army of human workers—annotating data, providing feedback, and essentially teaching machines how to think.

The numbers tell a story that the AI hype machine doesn't want you to hear: the companies making the most money in AI aren't the ones building the flashy chatbots—they're the ones managing the human workforce that makes those chatbots possible.

The Human Infrastructure Companies Quietly Building AI Empires

Mercor: From College Recruiting to AI Gold Rush

Mercor's story reads like a Silicon Valley fever dream. Founded by three 21-year-old college dropouts in 2023, the company started as a recruiting platform for college students. Fast forward to September 2025, and Mercor is reportedly approaching a $450 million annual run rate with investors eyeing a $10+ billion valuation.

What changed? AI labs discovered that Mercor was sitting on exactly what they desperately needed: access to thousands of domain experts with advanced degrees.

The Numbers Behind Mercor's Explosion:

Here's what Mercor actually does for AI companies: they connect graduate-level experts—physics PhDs, biology researchers, legal experts, medical doctors—with AI companies that need specialized knowledge to train their models. When OpenAI needs someone who understands quantum mechanics to help improve GPT's physics reasoning, or when Anthropic needs constitutional law experts to help Claude understand legal nuances, they turn to Mercor.

Mercor's business model is brutally simple: charge a 30% fee on every expert they place. With AI companies paying premium rates for specialized talent (often $50-200+ per hour), Mercor's take per placement is substantial. The company has been profitable since early 2025, generating $1M+ in profit just in February alone.
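The take-rate math is worth seeing concretely. Here is a back-of-envelope sketch of a 30% marketplace fee on an expert placement; the specific rate, hours, and engagement length are illustrative assumptions, not Mercor's actual contract terms:

```python
# Back-of-envelope sketch of a marketplace take on expert placements.
# All figures below are illustrative assumptions, not Mercor's real economics.

def platform_take(hourly_rate: float, hours_per_week: float,
                  weeks: float, fee: float = 0.30) -> float:
    """Platform revenue from one placement at a given fee rate."""
    return hourly_rate * hours_per_week * weeks * fee

# A hypothetical physics PhD at $150/hr, 20 hrs/week, 12-week engagement:
billings = 150 * 20 * 12           # $36,000 billed to the AI lab
take = platform_take(150, 20, 12)  # the platform keeps 30% of that
print(f"Billed: ${billings:,.0f}, platform take: ${take:,.0f}")
```

At those assumed numbers, a single placement nets the platform five figures—which is why a modest headcount can support a nine-figure run rate.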

The kicker? CEO Brendan Foody recently posted that their ARR is actually higher than $450 million—suggesting they're on track to hit the $500 million milestone faster than almost any enterprise software company in history.

Scale AI: The $29 Billion Human-Powered Machine

Scale AI tells perhaps the most revealing story about AI's human dependency. Founded in 2016, Scale positioned itself as the infrastructure layer for AI training data—but what they really built was the world's most sophisticated human workforce management system.

Scale AI's Staggering Numbers:

Here's what's wild about Scale: Meta just paid $14.3 billion for a 49% stake in what is essentially a human resources company. Think about that for a moment. The company that owns Facebook, Instagram, and WhatsApp—with all their technical expertise—decided they needed to pay nearly $15 billion to access Scale's network of human data workers.

What Scale's Army Actually Does:

  • Data Labeling: Humans look at millions of images, videos, and text samples to teach AI what's what
  • RLHF (Reinforcement Learning from Human Feedback): Humans rate AI responses to teach models what "good" looks like
  • Specialized Tasks: Doctors review medical AI outputs, lawyers check legal reasoning, scientists verify technical content

Scale operates massive facilities in Southeast Asia and Africa through their Remotasks subsidiary, employing hundreds of thousands of workers who spend their days training tomorrow's AI systems. They've built the McDonald's of AI training—standardized, scalable human intelligence that AI companies can't replicate internally.

The Meta acquisition reveals the secret: Scale's real value isn't their technology—it's their ability to coordinate hundreds of thousands of humans to improve AI systems at massive scale.

Surge AI: The Stealth $25 Billion Giant

While everyone was watching OpenAI and Anthropic, Surge AI quietly built the most profitable human intelligence operation in AI history. Founded in 2020 by former Google and Meta engineer Edwin Chen, Surge took a different approach: bootstrap profitability from day one.

Surge AI's Incredible Economics:

This might be the most impressive business in all of AI. Surge generates over $11 million in revenue per employee—a number that makes even the most successful SaaS companies look inefficient. How? They've perfected the art of human intelligence arbitrage.
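That revenue-per-employee figure follows directly from the numbers cited above:

```python
# Revenue per employee, using the figures cited in this article.
revenue = 1.4e9   # ~$1.4B in revenue
employees = 121   # reported headcount
print(f"${revenue / employees / 1e6:.1f}M per employee")  # ≈ $11.6M
```

For comparison, even elite SaaS businesses rarely clear $1M per employee.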

Surge's Secret Sauce:

  • Premium Positioning: They sign 8- and 9-figure contracts with top AI labs
  • Quality Focus: Smaller, higher-skilled workforce compared to Scale
  • Bootstrap Approach: No outside funding means no pressure to burn cash on growth
  • Direct Relationships: Major contracts with Google, OpenAI, Anthropic

Unlike Scale's volume approach, Surge focuses on premium, specialized data work that requires deep expertise. When AI labs need the absolute highest quality human feedback—the kind that can make or break a model's performance—they pay Surge's premium prices.

Chen's anti-VC approach has created something rare: a massively profitable company with complete control over its destiny. While other AI companies burn billions chasing growth, Surge prints money by connecting highly skilled humans with AI companies willing to pay top dollar for quality.

Handshake: The Unexpected AI Workforce Play

Handshake's transformation story might be the most surprising of all. Started in 2014 as a career network for college students, Handshake spent a decade building what they didn't realize was the perfect infrastructure for the AI boom.

Handshake's AI Pivot:

What makes Handshake unique: they already had the trust and relationships with universities and students that other companies would spend years building. When AI labs started desperately seeking PhD-level experts for training data, Handshake realized they were sitting on a goldmine.

Handshake AI's business model is straightforward: connect their verified network of graduate students and recent PhD recipients with AI companies that need domain expertise. Physics students help improve AI reasoning about quantum mechanics. Biology PhDs help models understand complex molecular interactions. Legal scholars help AI understand constitutional law.

The beauty of Handshake's position is trust and verification. While anyone can claim to be an expert online, Handshake's university partnerships mean they can verify credentials and academic standing. AI companies pay premium rates for this level of verification.

The Seven AI Giants and Where Their Money Goes

To understand why companies like Mercor, Scale, Surge, and Handshake are growing so fast, you need to look at where the big AI companies get their money—and how much they're willing to spend on human intelligence.

The Big Seven AI Players:

1. OpenAI

2. Anthropic

3. Google DeepMind

  • Part of: Alphabet ($2+ trillion market cap)
  • Revenue: Estimated $2-3 billion (based on traffic/usage comparisons)
  • Parent Funding: Alphabet spends $30+ billion annually on R&D
  • Human Data Spend: Estimated $400M+ annually (including internal projects)

4. Meta AI

  • Part of: Meta ($1+ trillion market cap)
  • Investment: $14.3 billion stake in Scale AI alone
  • Total AI Spend: $20+ billion annually
  • Human Data Spend: $1+ billion annually (including Scale investment)

5. xAI (Elon Musk)

6. Microsoft (through OpenAI partnership)

  • Market Cap: $3.8+ trillion
  • OpenAI Investment: $13+ billion
  • AI Revenue: Azure AI generates $13+ billion annualized
  • Human Data Spend: Flows through OpenAI partnership

7. Amazon (Bedrock + Anthropic)

  • Market Cap: $1.8+ trillion
  • Anthropic Investment: $8 billion
  • AI Revenue: AWS AI services, estimated $5+ billion
  • Human Data Spend: Flows through Anthropic and internal teams

Total Market Math: These seven companies have a combined market cap/valuation of over $7 trillion and are collectively spending an estimated $3+ billion annually on human data work. That's enough to support the massive growth we're seeing in companies like Mercor, Scale, Surge, and Handshake.

What Humans Actually Do for AI: The RLHF Revolution

The secret to understanding AI's human dependency lies in a technical concept that sounds boring but is absolutely critical: Reinforcement Learning from Human Feedback (RLHF).

Why RLHF Matters

Here's the dirty secret about large language models: they don't actually understand anything. They're essentially extremely sophisticated autocomplete systems that predict what word should come next based on patterns they've seen in training data.

The problem? Raw prediction doesn't create useful AI assistants. A model trained only on internet text might complete "How do I cook chicken?" with accurate information—or with a conspiracy theory, a joke, or instructions for something dangerous. RLHF is how AI companies teach models to be helpful, harmless, and honest.
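The "sophisticated autocomplete" idea is easier to see with a toy model. This sketch predicts the next word purely from counts of which word followed which in a tiny made-up corpus—a radically simplified stand-in for what a language model does with billions of parameters:

```python
# A toy illustration of "sophisticated autocomplete": predict the next word
# from counts of which word followed which in a tiny, made-up corpus.
from collections import Counter, defaultdict

corpus = ("how do i cook chicken safely . "
          "how do i cook chicken quickly . "
          "how do i cook rice slowly .").split()

# Count bigrams: for each word, how often each successor appears.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent successor seen in training."""
    return successors[word].most_common(1)[0][0]

print(predict_next("cook"))  # "chicken": seen after "cook" more often than "rice"
```

Nothing in those counts knows whether the completion is accurate, safe, or a joke—which is exactly the gap RLHF exists to close.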

The RLHF Process:

  1. Generate Responses: AI model produces multiple responses to the same prompt
  2. Human Evaluation: Human experts rank these responses from best to worst
  3. Reward Model Training: AI learns to predict which responses humans prefer
  4. Model Fine-tuning: Original model is updated to generate more "human-preferred" responses

This process is labor-intensive and requires skilled human judgment. You can't just hire anyone—you need people who understand the domain, can spot subtle errors, and can make consistent quality judgments.
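Step 3 above—turning human rankings into a reward model—typically uses a pairwise (Bradley-Terry-style) loss: the model is penalized unless the human-preferred response scores higher than the rejected one. A minimal sketch, with illustrative reward values:

```python
# A minimal sketch of reward-model training (step 3 above): the standard
# Bradley-Terry pairwise loss. Reward values here are illustrative.
import math

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when chosen >> rejected."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the model already scores the human-preferred answer higher, loss is small:
print(pairwise_loss(2.0, -1.0))  # ≈ 0.049
# If it scores them backwards, loss is large, pushing the scores apart:
print(pairwise_loss(-1.0, 2.0))  # ≈ 3.049
```

Every one of those chosen/rejected pairs comes from a human judgment—which is why the comparison counts below translate directly into hours of expert labor.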

The Scale of Human Feedback

The numbers around RLHF are staggering:

Training GPT-4 Level Models Requires:

  • 10-100 million human preference comparisons
  • Thousands of hours of expert evaluation time
  • Multiple rounds of feedback as models improve
  • Ongoing evaluation as models are deployed and updated

Types of Human Experts Needed:

  • Safety Evaluators: Can the model be tricked into harmful outputs?
  • Domain Experts: Does the model understand physics, law, medicine correctly?
  • Writing Quality Experts: Does the model write clearly and engagingly?
  • Cultural Experts: Does the model understand cultural nuances and avoid bias?
  • Technical Experts: Can the model code, reason mathematically, solve complex problems?

This is why companies like Mercor (PhD experts), Scale (massive workforce), Surge (premium specialists), and Handshake (verified academics) are growing so fast—they've built the infrastructure to deliver human expertise at the scale AI companies need.

RLHF Is Just the Beginning

Here's what most people don't realize: RLHF isn't a one-time process. As AI models get more sophisticated, they need more sophisticated human feedback. Consider what's coming:

Next-Generation Feedback Needs:

  • Multi-modal RLHF: Teaching AI to understand images, video, audio, and text together
  • Long-term Reasoning: Teaching AI to plan and reason over longer time horizons
  • Tool Use: Teaching AI when and how to use external tools and APIs
  • Safety Alignment: Teaching AI to refuse dangerous requests across increasingly subtle scenarios

Each of these advances requires new types of human expertise and even more human feedback. The companies that can deliver this feedback will only become more valuable.

AI Evaluations: The $10 Billion Testing Industry

Beyond training AI models, there's another massive human-powered industry growing: AI evaluation and testing.

Why AI Evaluation Matters

Every AI company needs to answer the same questions:

  • Is our model better than the competition?
  • What can our model do that others can't?
  • Where does our model fail or produce dangerous outputs?
  • How do we prove to customers and regulators that our model is safe?

The answer requires human evaluation at massive scale.

The Evaluation Landscape

Current AI Benchmarks and What They Test:

  • MMLU (General Knowledge): ~16,000 multiple-choice questions across 57 academic subjects
  • HumanEval (Coding): Programming problems that require working code solutions
  • MATH (Mathematical Reasoning): High school and college level math problems
  • HellaSwag (Common Sense): Predicting what happens next in everyday scenarios
  • GPQA (Expert-Level Science): PhD-level questions in physics, chemistry, biology
  • SWE-bench (Software Engineering): Real GitHub issues that need to be resolved

The Human Element: Every one of these benchmarks required hundreds or thousands of hours of human expert time to create, validate, and score. And they need to be constantly updated as AI models improve.
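The scoring side of a multiple-choice benchmark like MMLU is mechanically simple—the expensive part is the human work of writing and validating the questions. A sketch, where both the questions and the `model` stub are made up for illustration:

```python
# A minimal sketch of how an MMLU-style benchmark is scored: the model picks
# one option per question, and accuracy is the fraction answered correctly.
# The questions and the `model` stub below are made up for illustration.

questions = [
    {"q": "What force keeps planets in orbit?",
     "choices": ["A) Magnetism", "B) Gravity", "C) Friction", "D) Inertia"],
     "answer": "B"},
    {"q": "Which molecule carries genetic information?",
     "choices": ["A) ATP", "B) RNA only", "C) DNA", "D) Glucose"],
     "answer": "C"},
]

def model(question: str, choices: list[str]) -> str:
    """Stand-in for a real model call; always answers 'B' here."""
    return "B"

def accuracy(items) -> float:
    correct = sum(model(it["q"], it["choices"]) == it["answer"] for it in items)
    return correct / len(items)

print(accuracy(questions))  # the stub gets 1 of 2 right -> 0.5
```

Swap the stub for a real API call and you have the skeleton of a benchmark harness; the hard, human part is that every answer key had to be authored and verified by an expert.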

The Evaluation Arms Race

As AI models get better, evaluation becomes more challenging and expensive:

Evolution of AI Benchmarks:

  • 2020: Simple multiple choice questions
  • 2022: Complex reasoning problems
  • 2024: Expert-level domain knowledge
  • 2025: Multi-step problem solving, tool use, safety evaluation
  • Future: Real-world task completion, long-term planning

Cost Escalation: Running a single model through a comprehensive benchmark suite now costs $1,000-10,000. With dozens of major models and constant updates, the evaluation market is easily worth hundreds of millions annually and growing.

Key Players in AI Evaluation:

  • Epoch AI: Building mathematical reasoning benchmarks that cost thousands per model to run
  • Apollo Research: Specializes in AI safety evaluation
  • METR: Focuses on autonomous AI capability evaluation
  • Academic Institutions: Stanford HAI, MIT, etc. creating new benchmarks

The 10-Year Evaluation Outlook

Why evaluations will keep growing:

  1. Regulatory Requirements: Governments are starting to require AI safety testing
  2. Model Complexity: More capable models need more sophisticated tests
  3. Risk Assessment: As AI becomes more powerful, safety evaluation becomes critical
  4. Competitive Intelligence: Companies need to benchmark against competitors
  5. Customer Assurance: Enterprise customers demand proof that AI systems work correctly

Conservative estimates suggest the AI evaluation market will reach $10+ billion annually by 2030, with the majority of that spending going to human experts who design, run, and interpret these evaluations.

The Uncomfortable Truth: Humans Aren't Going Anywhere

The AI industry's dirty secret isn't just that humans are powering current AI—it's that humans will likely be essential to AI development for the next decade or more.

Why Human Feedback Scales with AI Capability

As AI systems become more capable, they actually require more sophisticated human feedback, not less:

The Scaling Challenge:

  • More Modalities: Video, audio, robotics require new types of human evaluation
  • Longer Horizons: AI agents that plan over days/weeks need human feedback on long-term goals
  • Higher Stakes: More capable AI requires more careful safety evaluation
  • New Domains: AI expanding into specialized fields needs domain experts
  • Cultural Adaptation: Global deployment requires feedback from diverse human populations

Each advance multiplies the need for human expertise.

The Economics Are Locked In

The companies we've examined aren't temporary solutions—they're building sustainable economic moats:

Mercor's Moat: A vetted network of thousands of PhD-level domain experts and direct relationships with the top AI labs. New competitors would need years to build similar trust and scale.

Scale's Moat: 300,000+ trained workers, operational infrastructure across multiple countries, and enterprise relationships with every major AI company.

Surge's Moat: Premium positioning with top AI labs and a proven ability to deliver quality at massive scale with minimal overhead.

Handshake's Moat: University partnerships and verified credential systems that competitors can't easily replicate.

The Investment Reality

The numbers don't lie. AI companies are doubling down on human infrastructure:

  • Meta: $14.3 billion for a 49% stake in Scale AI
  • Microsoft: $13+ billion invested in OpenAI
  • Amazon: $8 billion invested in Anthropic

These aren't temporary investments—they're strategic bets that human intelligence will remain essential to AI development.

What This Means for the Future

The AI Industry's Real Structure

Strip away the hype, and the AI industry looks like this:

Layer 1: Foundation Models (OpenAI, Anthropic, Google)

  • Burn billions developing base AI technology
  • Entirely dependent on human feedback for practical usefulness

Layer 2: Human Intelligence Platforms (Mercor, Scale, Surge, Handshake)

  • Actually profitable businesses with sustainable unit economics
  • Control the critical resource (human expertise) that Layer 1 needs

Layer 3: AI Applications (Everything else)

  • Build on top of Layer 1 models
  • Success depends on Layer 1 quality, which depends on Layer 2

The money flows through the stack: application companies pay foundation model companies, which in turn pay human intelligence platforms. The most profitable layer isn't the one with the most hype.

Jobs and Economic Impact

For Workers: The AI revolution isn't destroying knowledge work—it's creating massive demand for human expertise. PhD graduates, domain experts, and skilled evaluators are in higher demand than ever.

For Companies: Success in AI increasingly depends on access to human intelligence at scale. Companies that can coordinate human expertise will have sustainable advantages over those that can't.

For Investors: The "picks and shovels" play in AI isn't semiconductors or cloud computing—it's human intelligence platforms.

The 10-Year Outlook

Three scenarios for human involvement in AI:

Scenario 1: Continued Growth (Most Likely)

  • Demand for human feedback keeps scaling with AI capability: new modalities, longer horizons, higher stakes
  • Human intelligence platforms grow in lockstep with the AI labs they serve

Scenario 2: Gradual Automation (Possible)

  • AI systems slowly learn to provide their own feedback
  • Human involvement shifts from data work to oversight and verification
  • Human intelligence platforms evolve into AI-human hybrid systems

Scenario 3: AI Self-Sufficiency (Unlikely in 10 years)

  • AI systems become fully self-improving without human feedback
  • Human intelligence platforms pivot to other markets or become obsolete

Most experts believe Scenario 1 is most likely because the complexity of human values and the pace of AI advancement suggest that sophisticated human feedback will remain essential for much longer than most people realize.

Conclusion: The Real AI Revolution

The real AI revolution isn't happening in the sleek labs of OpenAI or the data centers of Google. It's happening in the distributed network of human experts who are teaching machines how to think.

Behind every impressive AI demo, every breakthrough capability, and every billion-dollar valuation lies an invisible army of humans:

  • PhD students improving GPT's understanding of quantum physics
  • Legal experts teaching Claude about constitutional law
  • Medical professionals helping AI understand diagnostic criteria
  • Writers and editors showing AI what good communication looks like
  • Safety researchers testing whether AI can be manipulated into harmful outputs

The companies that have figured out how to coordinate this human intelligence at scale—Mercor, Scale, Surge, Handshake—aren't just service providers. They're building the nervous system of the AI economy.

The dirty little secret is out: AI isn't replacing humans—it's creating unprecedented demand for human expertise. The companies that embrace this reality, rather than fighting it, will be the ones that capture the real value in the AI revolution.

The future of AI isn't artificial intelligence replacing human intelligence. It's human intelligence and artificial intelligence working together at previously unimaginable scale. The companies that master this combination won't just participate in the AI revolution—they'll control it.

And that might be the most human outcome of all.