Long Thoughts

October 7, 2025

How We Built ReelCV: A Behind-the-Scenes Look


Hiring hasn’t changed much in decades. Recruiters wade through endless resumes, candidates struggle to showcase who they truly are, and companies rely on paid demos and clunky trial processes. We wanted to flip the script: create a platform that surfaces exclusive, vetted talent, highlights personality and skills, and streamlines the recruiting process. That vision became ReelCV.

Here’s how we built it, month by month, behind the scenes.

March – First Steps Toward ReelCV

March marked the true beginning of building ReelCV. What started as a rough vision began to take on structure, functionality, and form. The team focused on the foundational elements: profile scoring, recruiter workflow improvements, and the first working version of the platform.

Profile Scoring Discussions

Early in the month, conversations circled around how recruiters actually think about candidates. Chris and Ryan explored the idea of profile scoring — a way to quantify both “red flags” and positive signals in a candidate’s career journey.

  • Red flags: unexplained employment gaps, frequent job-hopping, short stints in multiple roles.
  • Positive signals: long-term employment, steady progression, MSP-relevant experience.

The vision was clear: instead of recruiters manually scanning for these patterns, ReelCV would apply weighted scoring automatically, surfacing candidates most likely to succeed while flagging potential concerns. These discussions planted the seed for one of ReelCV’s most important differentiators: a data-driven way to evaluate people, not just paper resumes.
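The post doesn't disclose ReelCV's actual signals or weights, but the weighted-scoring idea can be sketched. In this hypothetical example, every name, weight, and threshold is invented: each stint in a candidate's history nudges a neutral baseline score up or down, and the result is clamped to a 0–100 range.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Stint:
    title: str
    start: date
    end: date

# Hypothetical weights -- ReelCV's real model and values are not public.
WEIGHTS = {
    "employment_gap": -15,  # gap longer than 6 months between roles
    "short_stint": -10,     # role held for under a year
    "long_tenure": +20,     # role held for 3+ years
}

def score_profile(history: list[Stint]) -> int:
    """Apply weighted red flags and positive signals to a work history."""
    score = 50  # neutral baseline
    ordered = sorted(history, key=lambda s: s.start)
    # Red flag: unexplained gaps between consecutive roles.
    for prev, cur in zip(ordered, ordered[1:]):
        if (cur.start - prev.end).days > 180:
            score += WEIGHTS["employment_gap"]
    # Per-stint signals: short stints vs. long tenure.
    for stint in ordered:
        tenure_days = (stint.end - stint.start).days
        if tenure_days < 365:
            score += WEIGHTS["short_stint"]
        elif tenure_days >= 3 * 365:
            score += WEIGHTS["long_tenure"]
    return max(0, min(100, score))
```

A steady two-role history scores well, while a history of short stints separated by a long gap scores poorly — exactly the kind of pattern a recruiter would otherwise have to spot by eye.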

UX & Admin Improvements

Ryan had begun improving the user and admin experience inside the Hirexe platform. The focus was on precision over volume: rather than overwhelming recruiters with dozens of irrelevant candidates, the system would present a handful of highly relevant profiles.

The work included:

  • Flexible search filters combining multiple criteria.
  • Mapping recruiter workflows through recordings to understand real usage patterns.
  • Admin-side UX updates designed to save time without breaking existing recruiter habits.
  • Early profile designs aimed at being both functional for recruiters and engaging for candidates.

This was the first step toward shaping ReelCV as a practical tool recruiters could adopt quickly, rather than a flashy product that disrupted workflows.

Layout Changes & Recommendations

Chris provided critical feedback on design and layout. His recommendations emphasized clarity, structured experiences, and small usability enhancements like CV view tracking. These insights were vital in keeping the product grounded in what recruiters actually needed, not just what looked good on paper.

Version 1 Goes Live 🎉

By March 15, ReelCV hit its first major milestone: V1 was officially ready. This was more than a prototype: it was the foundation on which every future iteration would be built. Recruiters could log in, create candidate profiles, and begin to see how ReelCV might reshape the hiring process.

Bottleneck: Video Uploads

A week later, however, a problem surfaced. Despite video being central to the ReelCV vision, candidates weren’t uploading their videos. Technical friction, strict validation, and unclear guidance caused drop-offs. Without videos, the “Careel” (career reel) concept couldn’t deliver on its promise of dynamic storytelling.

Chris flagged this as a bottleneck, and on March 20, Ryan implemented a solution: video upload became a required step in profile creation.

This change forced adoption while ensuring every profile included the key differentiator that set ReelCV apart.

Skills and Future Improvements

Toward the end of March, discussions turned to how candidate skills should be captured and displayed. The team began planning for V2 improvements, including more intelligent skill extraction, tagging, and presentation. These early conversations foreshadowed the refinements that would come in April and May.

Takeaway

March was all about building the bedrock of ReelCV. From scoring models to recruiter workflows, from layout tweaks to the first working version, the month ended with a product that was both usable and promising. Challenges like video uploads revealed the gaps, but they also pushed the team toward solutions that strengthened the platform.

ReelCV had officially moved from idea to implementation.

April – Humanizing the CV

If March was about laying the foundation, April was about giving ReelCV its personality. The focus shifted from raw functionality to human-centered design, finding ways to showcase candidates as more than static resumes and making the recruiter experience feel smooth, modern, and intuitive.

Early Homepage & Signup Flow

At the start of April, the team reviewed the original homepage layout. It felt functional, but more like a traditional database than a modern hiring platform. Chris and Ryan began redesigning the flow, emphasizing search as the entry point. Instead of navigating menus, recruiters would land directly on a search-driven homepage with prompts to start finding candidates right away.

Ryan demonstrated updates that made the experience more engaging:

  • Highlights during uploads kept users occupied while waiting, reducing friction.
  • A revamped signup flow added steps for profile images, socials, CV uploads, and video capture, ensuring richer candidate data from the start.
  • Company dashboards were redesigned around search functionality, making it faster for recruiters to move from AI Candidates Search → shortlist → interview.

The shift was subtle but significant: ReelCV started to feel less like a form-filling system and more like a dynamic tool that guided recruiters and candidates forward.

Personality Capture

By April 8, the conversation turned to one of the most defining elements of ReelCV: capturing personality.

Chris, Ryan, and the team debated how to move beyond credentials and showcase people as individuals. The result was a new profile structure that highlighted hobbies, passions, and personality traits alongside professional skills. Design elements included:

  • Dedicated sections for hobbies, “About Me”, and “My Future”.
  • Structured sections like “Who I Am” to surface values and motivations.
  • The potential to search by personality traits in the future, aligning culture as well as skills.

This was a turning point: ReelCV wasn’t just solving efficiency problems for recruiters, it was creating a more authentic, human-centered hiring experience.

Career Profile Evolution

Mid-April saw refinements to the Career Profile, which became the centerpiece of the candidate experience.

Key updates included:

  • Skills-first highlighting at the top, weighted against job matches.
  • A streamlined layout with video prominently displayed and sections simplified into Background, Technical, and Achievements.
  • A single left-hand timeline for career progression, replacing fragmented sections.

The team also debated the value of experimental fields like “About Me” and “My Future”, weighing authenticity against clutter. These discussions reflected ReelCV’s constant balancing act: depth vs. clarity, personality vs. professionalism.

Visual & Design Enhancements

As April progressed, design refinements gave the product a modern polish. Ryan rolled out updates that introduced:

  • A black-and-white interface with orange accents reserved for critical highlights like “Top Match.”
  • Progress bars, icons, and hover interactions for cleaner navigation.
  • A consistent design language that bridged the candidate side and company side, with subtle adjustments for each audience.

Even small choices, like limiting orange usage to preserve its impact, showed how carefully the team was shaping ReelCV’s identity.

Takeaway

April was the month ReelCV began to feel alive. The platform evolved from a functional V1 into something with warmth, personality, and a clearer voice. Profiles were no longer just digital CVs; they became windows into a candidate’s story. Recruiters weren’t just searching; they were experiencing a more intuitive, guided workflow.

With these changes, ReelCV moved closer to its core promise: making hiring both efficient and human.

May – From Vision to Reality

By May, ReelCV began crossing the line from concept to working product. The month was defined by breakthroughs: long-discussed features finally became operational, and for the first time, the team could experience the platform end-to-end as a user would.

A More Dynamic First Impression

On the homepage, Ryan introduced a responsive globe animation, adding a sense of scale and motion. It wasn’t just a design flourish; it visually reinforced Hirexe’s mission of connecting South African professionals with global opportunities.

Building a Complete Profile

Work also continued on the profile creation process. Early iterations of the “Create a Profile” page were simplified and refined, ensuring that candidates were guided step by step without feeling overwhelmed.

One of the major wins was solving the Candidate Title challenge. Jurgen and Ryan collaborated on an API fix that allowed accurate job titles to pull through consistently, a small but crucial detail for candidate credibility and recruiter trust.

Another milestone was enabling a functional CV upload with live data. For the first time, the system could parse, process, and store actual candidate CVs, a leap forward from placeholders and test data.

The Video Breakthrough

Perhaps the biggest technical achievement came mid-month: video upload, capture, conversion, transcription, and AI analysis all became fully operational.

This was a defining feature for ReelCV. Video made it possible to showcase not just qualifications, but presence, communication style, and personality. AI-driven transcription and analysis turned these videos into structured, searchable data. The team celebrated this as a big win.

Smarter Profiles & Admin Controls

With the basics in place, attention turned to enhancing depth and usability:

  • Ryan unveiled a profile rating system with skills assessment and experience editing, allowing recruiters to quickly benchmark candidates. The internal reaction captured the excitement: it was a true “wow moment.”
  • On the backend, the admin dashboard advanced with tools for monitoring, editing, and managing profiles at scale. This ensured recruiters and the Hirexe team would have the control needed to maintain high standards.

V2 Walkthrough

By the end of May, Ryan presented a walk-through of V2. The platform had moved far beyond its early wireframes: there was now a coherent candidate journey, from signup to video, from CV upload to recruiter-facing search and rating.

Internal Testing & Feedback

With so many core features working, May also marked the start of internal sign-up testing. Members of the Hirexe team created their own profiles, uploading CVs, recording videos, and stepping into the shoes of future users. Feedback was shared openly in Slack, often captured in quick videos or notes. These test runs revealed pain points, validated design choices, and built confidence that ReelCV was ready for wider trials.

Takeaway

May was the month ReelCV truly came to life. What began as scattered features in development matured into a functioning platform, tested internally and celebrated for its breakthrough on video. For the first time, the team could see, not just imagine, the product they had been building toward for months.

June – Testing the Edges

By June, ReelCV was no longer just an internal build; it was being tested in real-world conditions. This meant progress, but also the inevitable roadblocks that appear when theory meets practice.

Roadblock: Video Uploads

On June 12, a critical issue surfaced: the system was not accepting videos reliably. This was a major concern, since video was at the heart of ReelCV’s value proposition. The team quickly identified the problem and began working through fixes, but it was a reminder that even celebrated features from May needed ongoing resilience under load.

Applicant Feedback

June also marked the first wave of external applicant reactions. A handful of candidates tried the new signup process, and their feedback — screenshots, notes, and direct comments — was captured and discussed by Ryan and Nakita in Slack. While not all reactions were glowing, the input was invaluable. It validated what was working while highlighting areas for improvement.

The Search Feature

On June 20, Ryan demoed an early version of the search functionality. Recruiters could now filter and find candidates more dynamically, surfacing profiles not just by job title but by a mix of skills, experience, and traits. It was an exciting glimpse of how ReelCV would empower MSPs to move beyond keyword-matching toward holistic talent discovery.

Another Roadblock: Google Errors

The month closed with a new technical hurdle. On June 30, the team hit a Google Console error that disrupted parts of the platform. While frustrating, the issue became another example of ReelCV’s resilience-in-progress: each roadblock was tackled quickly, with fixes rolled into subsequent updates.

Takeaway

June was the month of stress-testing. The product was in users’ hands, the team was watching real reactions, and the system’s weak points were being exposed. The setbacks, especially around video uploads and Google integration, were balanced by major wins: authentic feedback, working search, and proof that ReelCV could handle the messy reality of live use.

July – Smoothing the Edges

July was all about stability, refinement, and incremental wins. The product was now being used more consistently, but real-world usage continued to expose friction points, and the team responded with targeted improvements.

Ongoing Roadblocks

The month kicked off with persistent issues in Google Console, limiting production capacity. The team was actively troubleshooting quotas and production limits, ensuring that new signups and CV uploads could proceed without interruption.

Product Reviews & Positive Feedback

Despite the technical hurdles, the product review sessions held throughout July (on the 15th, 22nd, and 28th) showed encouraging results. Feedback highlighted improvements in usability and overall experience, reinforcing the work the team had done to stabilize ReelCV.

Production Deployments

A key focus for July was deploying solutions to recurring issues from prior months:

  • ReelCV API Retry Logic: Introduced exponential-backoff retries (up to three attempts) for profile-rating calls. Any errors now trigger a Slack notification, allowing the team to proactively monitor failures.
  • Account Deletion: Users gained the ability to delete their profiles from the dashboard, complete with confirmation popups, admin API integration, and automatic Firebase sign-out.
  • Video Upload / Re-capture: The platform allowed users to upload a new video or re-capture an existing one during setup. Backup transcripts were automatically generated to prevent data loss due to AI or network issues.
  • Signup Path Adjustments: Production updates included minor tweaks to ensure the sign-up process was smoother and less prone to errors.
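The retry behavior described above can be sketched in a few lines. This is a hypothetical illustration, not ReelCV's actual code: `notify_slack` is a stand-in for a real webhook post, and the rating call is passed in as a plain function.

```python
import time

MAX_ATTEMPTS = 3   # matches the "up to three attempts" described above
BASE_DELAY = 0.5   # seconds; doubled after each failed attempt (value invented)

def notify_slack(message: str) -> None:
    """Stand-in for posting to a Slack webhook (invented for this sketch)."""
    print(f"[slack] {message}")

def rate_profile_with_retries(rate_call, profile_id: str):
    """Call a profile-rating function, retrying with exponential backoff.

    Every failure triggers a Slack notification so the team can monitor
    errors proactively; after the final attempt the error is re-raised.
    """
    delay = BASE_DELAY
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return rate_call(profile_id)
        except Exception as exc:
            last_error = exc
            notify_slack(f"rating attempt {attempt} failed for {profile_id}: {exc}")
            if attempt < MAX_ATTEMPTS:
                time.sleep(delay)
                delay *= 2  # exponential backoff: 0.5s, 1s, ...
    raise RuntimeError(
        f"profile rating failed after {MAX_ATTEMPTS} attempts"
    ) from last_error
```

A call that fails transiently twice still succeeds on the third attempt, while the Slack channel records each failure along the way.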

Takeaway

July was a month of resilience and refinement. Roadblocks like Google Console limitations persisted, but proactive monitoring, enhanced retry logic, and thoughtful feature updates ensured users could continue building their ReelCVs without major disruption. Positive feedback during product reviews validated these improvements, highlighting that ReelCV was becoming a stable and dependable tool for both candidates and MSPs.

August 2025 – ReelCV Launch and Completion

Overview

August marked the official launch of ReelCV, completing the platform’s development and making it fully live. The team focused on finalizing workflows, enhancing the user experience, and ensuring all key features were functional for both candidates and companies.

Product Enhancements & Deployment

Significant improvements were implemented across onboarding, user registration, company creation, and search workflows:

  • Onboarding: the onboarding modal and related dialogs were refined for a smoother user experience, with consistent styling and dark/light mode harmonization.
  • Registration and migration: user registration and company migration were strengthened with Firebase Auth integration, company data migration, and enforced corporate email and profile requirements.
  • Search: the FindUsers and Candidate Search features were upgraded with a new two-panel layout, server-side filtering, sortable tables, and editable recruiter ratings.
  • Platform: deliverable sharing, data migration, and Admin tools received security and functionality updates, ReelCV was fully integrated with the SQL backend, and minor bug fixes and styling tweaks were applied throughout.

Testing & Feedback

The team conducted internal testing and monitored user sign-ups, capturing feedback through Slack. Errors and edge cases were reviewed and resolved quickly, ensuring the platform was stable and ready for public use. Positive product reviews confirmed the platform’s usability and overall experience improvements.

Takeaway

By the end of August, ReelCV was fully operational, providing a seamless experience for candidates and companies. While the Admin side is still being fine-tuned, all core features are functional and live, marking a major milestone for the platform.

October 3, 2025

AI is Backed by Humans


TL;DR: Despite all the talk about "artificial" intelligence, the biggest names in AI are spending billions of dollars on human labor. 

From the $500M+ that Mercor is making connecting PhDs to AI labs, to Scale AI's $2B+ in revenue from human data workers, to Surge AI crossing $1.4B with just 121 employees managing human annotators—the AI revolution is actually powered by an invisible army of human experts doing the grunt work that makes these "intelligent" systems possible.

About 80% of the explosive revenue growth we’ve seen across the above companies is coming from STAFFING revenue. The AI industry has a dirty little secret, and it's hiding in plain sight.

While Silicon Valley VCs throw around terms like "artificial general intelligence" and "autonomous systems," the reality is far more human than anyone wants to admit. Behind every breakthrough language model, every impressive AI assistant, and every mind-blowing demonstration lies an army of human workers—annotating data, providing feedback, and essentially teaching machines how to think.

The numbers tell a story that the AI hype machine doesn't want you to hear: the companies making the most money in AI aren't the ones building the flashy chatbots—they're the ones managing the human workforce that makes those chatbots possible.

The Human Infrastructure Companies Quietly Building AI Empires

Mercor: From College Recruiting to AI Gold Rush

Mercor's story reads like a Silicon Valley fever dream. Founded by three 21-year-old college dropouts in 2023, the company started as a recruiting platform for college students. Fast forward to September 2025, and Mercor is reportedly approaching a $450 million annual run rate with investors eyeing a $10+ billion valuation.

What changed? AI labs discovered that Mercor was sitting on exactly what they desperately needed: access to thousands of domain experts with advanced degrees.

The Numbers Behind Mercor's Explosion:

Here's what Mercor actually does for AI companies: they connect graduate-level experts—physics PhDs, biology researchers, legal experts, medical doctors—with AI companies that need specialized knowledge to train their models. When OpenAI needs someone who understands quantum mechanics to help improve GPT's physics reasoning, or when Anthropic needs constitutional law experts to help Claude understand legal nuances, they turn to Mercor.

Mercor's business model is brutally simple: charge a 30% fee on every expert they place. With AI companies paying premium rates for specialized talent (often $50-200+ per hour), Mercor's take per placement is substantial. The company has been profitable since early 2025, generating $1M+ in profit in February alone.

The kicker? CEO Brendan Foody recently posted that their ARR is actually higher than $450 million—suggesting they're on track to hit the $500 million milestone faster than almost any enterprise software company in history.

Scale AI: The $29 Billion Human-Powered Machine

Scale AI tells perhaps the most revealing story about AI's human dependency. Founded in 2016, Scale positioned itself as the infrastructure layer for AI training data—but what they really built was the world's most sophisticated human workforce management system.

Scale AI's Staggering Numbers:

Here's what's wild about Scale: Meta just paid $14.3 billion for a 49% stake in what is essentially a human resources company. Think about that for a moment. The company that owns Facebook, Instagram, and WhatsApp—with all their technical expertise—decided they needed to pay nearly $15 billion to access Scale's network of human data workers.

What Scale's Army Actually Does:

  • Data Labeling: Humans look at millions of images, videos, and text samples to teach AI what's what
  • RLHF (Reinforcement Learning from Human Feedback): Humans rate AI responses to teach models what "good" looks like
  • Specialized Tasks: Doctors review medical AI outputs, lawyers check legal reasoning, scientists verify technical content

Scale operates massive facilities in Southeast Asia and Africa through their Remotasks subsidiary, employing tens of thousands of workers who spend their days training tomorrow's AI systems. They've built the McDonald's of AI training—standardized, scalable human intelligence that AI companies can't replicate internally.

The Meta acquisition reveals the secret: Scale's real value isn't their technology—it's their ability to coordinate hundreds of thousands of humans to improve AI systems at massive scale.

Surge AI: The Stealth $25 Billion Giant

While everyone was watching OpenAI and Anthropic, Surge AI quietly built the most profitable human intelligence operation in AI history. Founded in 2020 by former Google and Meta engineer Edwin Chen, Surge took a different approach: bootstrap profitability from day one.

Surge AI's Incredible Economics:

This might be the most impressive business in all of AI. Surge generates over $11 million in revenue per employee—a number that makes even the most successful SaaS companies look inefficient. How? They've perfected the art of human intelligence arbitrage.

Surge's Secret Sauce:

  • Premium Positioning: They charge 8-9 figure contracts to top AI labs
  • Quality Focus: Smaller, higher-skilled workforce compared to Scale
  • Bootstrap Approach: No outside funding means no pressure to burn cash on growth
  • Direct Relationships: Major contracts with Google, OpenAI, Anthropic

Unlike Scale's volume approach, Surge focuses on premium, specialized data work that requires deep expertise. When AI labs need the absolute highest quality human feedback—the kind that can make or break a model's performance—they pay Surge's premium prices.

Chen's anti-VC approach has created something rare: a massively profitable company with complete control over its destiny. While other AI companies burn billions chasing growth, Surge prints money by connecting highly skilled humans with AI companies willing to pay top dollar for quality.

Handshake: The Unexpected AI Workforce Play

Handshake's transformation story might be the most surprising of all. Started in 2014 as a career network for college students, Handshake spent a decade building what they didn't realize was the perfect infrastructure for the AI boom.

Handshake's AI Pivot Numbers:

What makes Handshake unique: they already had the trust and relationships with universities and students that other companies would spend years building. When AI labs started desperately seeking PhD-level experts for training data, Handshake realized they were sitting on a goldmine.

Handshake AI's business model is straightforward: connect their verified network of graduate students and recent PhD recipients with AI companies that need domain expertise. Physics students help improve AI reasoning about quantum mechanics. Biology PhDs help models understand complex molecular interactions. Legal scholars help AI understand constitutional law.

The beauty of Handshake's position is trust and verification. While anyone can claim to be an expert online, Handshake's university partnerships mean they can verify credentials and academic standing. AI companies pay premium rates for this level of verification.

The Seven AI Giants and Where Their Money Goes

To understand why companies like Mercor, Scale, Surge, and Handshake are growing so fast, you need to look at where the big AI companies get their money—and how much they're willing to spend on human intelligence.

The Big Seven AI Players:

1. OpenAI

2. Anthropic

3. Google DeepMind

  • Part of: Alphabet ($2+ trillion market cap)
  • Revenue: Estimated $2-3 billion (based on traffic/usage comparisons)
  • Parent Funding: Alphabet spends $30+ billion annually on R&D
  • Human Data Spend: Estimated $400M+ annually (including internal projects)

4. Meta AI

  • Part of: Meta ($1+ trillion market cap)
  • Investment: $14.3 billion stake in Scale AI alone
  • Total AI Spend: $20+ billion annually
  • Human Data Spend: $1+ billion annually (including Scale investment)

5. xAI (Elon Musk)

6. Microsoft (through OpenAI partnership)

  • Market Cap: $3.8+ trillion
  • OpenAI Investment: $13+ billion
  • AI Revenue: Azure AI generates $13+ billion annualized
  • Human Data Spend: Flows through OpenAI partnership

7. Amazon (Bedrock + Anthropic)

  • Market Cap: $1.8+ trillion
  • Anthropic Investment: $8 billion
  • AI Revenue: AWS AI services, estimated $5+ billion
  • Human Data Spend: Flows through Anthropic and internal teams

Total Market Math: These seven companies have a combined market cap/valuation of over $7 trillion and are collectively spending an estimated $3+ billion annually on human data work. That's enough to support the massive growth we're seeing in companies like Mercor, Scale, Surge, and Handshake.

What Humans Actually Do for AI: The RLHF Revolution

The secret to understanding AI's human dependency lies in a technical concept that sounds boring but is absolutely critical: Reinforcement Learning from Human Feedback (RLHF).

Why RLHF Matters

Here's the dirty secret about large language models: they don't actually understand anything. They're essentially extremely sophisticated autocomplete systems that predict what word should come next based on patterns they've seen in training data.

The problem? Raw prediction doesn't create useful AI assistants. A model trained only on internet text might complete "How do I cook chicken?" with accurate information—or with a conspiracy theory, a joke, or instructions for something dangerous. RLHF is how AI companies teach models to be helpful, harmless, and honest.

The RLHF Process:

  1. Generate Responses: AI model produces multiple responses to the same prompt
  2. Human Evaluation: Human experts rank these responses from best to worst
  3. Reward Model Training: AI learns to predict which responses humans prefer
  4. Model Fine-tuning: Original model is updated to generate more "human-preferred" responses

This process is labor-intensive and requires skilled human judgment. You can't just hire anyone—you need people who understand the domain, can spot subtle errors, and can make consistent quality judgments.
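The ranking step above can be made concrete. In this toy sketch (everything here is invented for illustration; real RLHF pipelines train a neural reward model on millions of such comparisons), one annotator's best-to-worst ranking is expanded into pairwise training data, and a candidate reward function is checked against it.

```python
from itertools import combinations

def preference_pairs(ranked_responses: list[str]) -> list[tuple[str, str]]:
    """Expand one best-to-worst ranking into (preferred, rejected) pairs:
    every response beats every response ranked below it."""
    return list(combinations(ranked_responses, 2))

def pairwise_accuracy(score, pairs) -> float:
    """How often does a reward function agree with the human pairs?
    This agreement rate is the standard way reward models are evaluated."""
    agree = sum(1 for better, worse in pairs if score(better) > score(worse))
    return agree / len(pairs)
```

Ranking three responses yields three training pairs; in practice each additional response in a ranking multiplies the usable comparisons, which is one reason expert annotation time is so valuable.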

The Scale of Human Feedback

The numbers around RLHF are staggering:

Training GPT-4 Level Models Requires:

  • 10-100 million human preference comparisons
  • Thousands of hours of expert evaluation time
  • Multiple rounds of feedback as models improve
  • Ongoing evaluation as models are deployed and updated

Types of Human Experts Needed:

  • Safety Evaluators: Can the model be tricked into harmful outputs?
  • Domain Experts: Does the model understand physics, law, medicine correctly?
  • Writing Quality Experts: Does the model write clearly and engagingly?
  • Cultural Experts: Does the model understand cultural nuances and avoid bias?
  • Technical Experts: Can the model code, reason mathematically, solve complex problems?

This is why companies like Mercor (PhD experts), Scale (massive workforce), Surge (premium specialists), and Handshake (verified academics) are growing so fast—they've built the infrastructure to deliver human expertise at the scale AI companies need.

RLHF Is Just the Beginning

Here's what most people don't realize: RLHF isn't a one-time process. As AI models get more sophisticated, they need more sophisticated human feedback. Consider what's coming:

Next-Generation Feedback Needs:

  • Multi-modal RLHF: Teaching AI to understand images, video, audio, and text together
  • Long-term Reasoning: Teaching AI to plan and reason over longer time horizons
  • Tool Use: Teaching AI when and how to use external tools and APIs
  • Safety Alignment: Teaching AI to refuse dangerous requests across increasingly subtle scenarios

Each of these advances requires new types of human expertise and even more human feedback. The companies that can deliver this feedback will only become more valuable.

AI Evaluations: The $10 Billion Testing Industry

Beyond training AI models, there's another massive human-powered industry growing: AI evaluation and testing.

Why AI Evaluation Matters

Every AI company needs to answer the same questions:

  • Is our model better than the competition?
  • What can our model do that others can't?
  • Where does our model fail or produce dangerous outputs?
  • How do we prove to customers and regulators that our model is safe?

The answer requires human evaluation at massive scale.

The Evaluation Landscape

Current AI Benchmarks and What They Test:

  • MMLU (General Knowledge): 16,000 multiple choice questions across academic domains
  • HumanEval (Coding): Programming problems that require working code solutions
  • MATH (Mathematical Reasoning): High school and college level math problems
  • HellaSwag (Common Sense): Predicting what happens next in everyday scenarios
  • GPQA (Expert-Level Science): PhD-level questions in physics, chemistry, biology
  • SWE-bench (Software Engineering): Real GitHub issues that need to be resolved

The Human Element: Every one of these benchmarks required hundreds or thousands of hours of human expert time to create, validate, and score. And they need to be constantly updated as AI models improve.
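The scoring side of a multiple-choice benchmark like these is mechanically simple, which underlines the point above: the expense is in the expert time spent writing and validating items, not in running them. A minimal sketch, with invented questions:

```python
# Minimal multiple-choice eval harness in the spirit of MMLU-style
# benchmarks. The questions and answer keys here are invented toys;
# real benchmark items take hours of expert time each to write and vet.
QUESTIONS = [
    {"q": "What is 2 + 2?",
     "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"q": "H2O is commonly known as?",
     "choices": ["salt", "water", "air", "gold"], "answer": "B"},
]

def accuracy(model_answer, questions) -> float:
    """Exact-match accuracy: fraction of items where the model's chosen
    letter matches the expert-validated answer key."""
    correct = sum(
        1 for item in questions
        if model_answer(item["q"], item["choices"]) == item["answer"]
    )
    return correct / len(questions)
```

Here `model_answer` is any callable that maps a question and its choices to a letter, which is roughly how harnesses wrap an LLM behind a fixed prompt template.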

The Evaluation Arms Race

As AI models get better, evaluation becomes more challenging and expensive:

Evolution of AI Benchmarks:

  • 2020: Simple multiple choice questions
  • 2022: Complex reasoning problems
  • 2024: Expert-level domain knowledge
  • 2025: Multi-step problem solving, tool use, safety evaluation
  • Future: Real-world task completion, long-term planning

Cost Escalation: Evaluating a single AI model on comprehensive benchmarks now costs $1,000-10,000 per model. With dozens of major models and constant updates, the evaluation market is easily worth hundreds of millions annually and growing.

Key Players in AI Evaluation:

  • Epoch AI: Building mathematical reasoning benchmarks that cost thousands per model to run
  • Apollo Research: Specializes in AI safety evaluation
  • METR: Focuses on autonomous AI capability evaluation
  • Academic Institutions: Stanford HAI, MIT, etc. creating new benchmarks

The 10-Year Evaluation Outlook

Why evaluations will keep growing:

  1. Regulatory Requirements: Governments are starting to require AI safety testing
  2. Model Complexity: More capable models need more sophisticated tests
  3. Risk Assessment: As AI becomes more powerful, safety evaluation becomes critical
  4. Competitive Intelligence: Companies need to benchmark against competitors
  5. Customer Assurance: Enterprise customers demand proof that AI systems work correctly

Conservative estimates suggest the AI evaluation market will reach $10+ billion annually by 2030, with the majority of that spending going to human experts who design, run, and interpret these evaluations.

The Uncomfortable Truth: Humans Aren't Going Anywhere

The AI industry's dirty secret isn't just that humans are powering current AI—it's that humans will likely be essential to AI development for the next decade or more.

Why Human Feedback Scales with AI Capability

As AI systems become more capable, they actually require more sophisticated human feedback, not less:

The Scaling Challenge:

  • More Modalities: Video, audio, robotics require new types of human evaluation
  • Longer Horizons: AI agents that plan over days/weeks need human feedback on long-term goals
  • Higher Stakes: More capable AI requires more careful safety evaluation
  • New Domains: AI expanding into specialized fields needs domain experts
  • Cultural Adaptation: Global deployment requires feedback from diverse human populations

Each advance multiplies the need for human expertise.

The Economics Are Locked In

The companies we've examined aren't temporary solutions—they're building sustainable economic moats:

Mercor's Moat: Exclusive relationships with 1,500+ universities and 18M+ students. New competitors would need years to build similar trust and scale.

Scale's Moat: 300,000+ trained workers, operational infrastructure across multiple countries, and enterprise relationships with every major AI company.

Surge's Moat: Premium positioning with top AI labs and a proven ability to deliver quality at massive scale with minimal overhead.

Handshake's Moat: University partnerships and verified credential systems that competitors can't easily replicate.

The Investment Reality

The numbers don't lie: AI companies are doubling down on human infrastructure.

These aren't temporary investments—they're strategic bets that human intelligence will remain essential to AI development.

What This Means for the Future

The AI Industry's Real Structure

Strip away the hype, and the AI industry looks like this:

Layer 1: Foundation Models (OpenAI, Anthropic, Google)

  • Burn billions developing base AI technology
  • Entirely dependent on human feedback for practical usefulness

Layer 2: Human Intelligence Platforms (Mercor, Scale, Surge, Handshake)

  • Actually profitable businesses with sustainable unit economics
  • Control the critical resource (human expertise) that Layer 1 needs

Layer 3: AI Applications (Everything else)

  • Build on top of Layer 1 models
  • Success depends on Layer 1 quality, which depends on Layer 2

The money flows up: Application companies pay foundation model companies, who pay human intelligence platforms. The most profitable layer isn't the one with the most hype.

Jobs and Economic Impact

For Workers: The AI revolution isn't destroying knowledge work—it's creating massive demand for human expertise. PhD graduates, domain experts, and skilled evaluators are in higher demand than ever.

For Companies: Success in AI increasingly depends on access to human intelligence at scale. Companies that can coordinate human expertise will have sustainable advantages over those that can't.

For Investors: The "picks and shovels" play in AI isn't semiconductors or cloud computing—it's human intelligence platforms.

The 10-Year Outlook

Three scenarios for human involvement in AI:

Scenario 1: Continued Growth (Most Likely)

Scenario 2: Gradual Automation (Possible)

  • AI systems slowly learn to provide their own feedback
  • Human involvement shifts from data work to oversight and verification
  • Human intelligence platforms evolve into AI-human hybrid systems

Scenario 3: AI Self-Sufficiency (Unlikely in 10 years)

  • AI systems become fully self-improving without human feedback
  • Human intelligence platforms pivot to other markets or become obsolete

Most experts believe Scenario 1 is most likely because the complexity of human values and the pace of AI advancement suggest that sophisticated human feedback will remain essential for much longer than most people realize.

Conclusion: The Real AI Revolution

The real AI revolution isn't happening in the sleek labs of OpenAI or the data centers of Google. It's happening in the distributed network of human experts who are teaching machines how to think.

Behind every impressive AI demo, every breakthrough capability, and every billion-dollar valuation lies an invisible army of humans:

  • PhD students improving GPT's understanding of quantum physics
  • Legal experts teaching Claude about constitutional law
  • Medical professionals helping AI understand diagnostic criteria
  • Writers and editors showing AI what good communication looks like
  • Safety researchers testing whether AI can be manipulated into harmful outputs

The companies that have figured out how to coordinate this human intelligence at scale—Mercor, Scale, Surge, Handshake—aren't just service providers. They're building the nervous system of the AI economy.

The dirty little secret is out: AI isn't replacing humans—it's creating unprecedented demand for human expertise. The companies that embrace this reality, rather than fighting it, will be the ones that capture the real value in the AI revolution.

The future of AI isn't artificial intelligence replacing human intelligence. It's human intelligence and artificial intelligence working together at previously unimaginable scale. The companies that master this combination won't just participate in the AI revolution—they'll control it.

And that might be the most human outcome of all.

January 23, 2024

3 Important Traits


Here are the key aspects to keep in mind when you are applying for an entry level sales role. This advice is for someone young in their career (a couple years out of college) and actively interviewing for an entry level sales role.

Sales roles have the same fundamentals, regardless of the industry you’re in.

Sales leadership is looking for a handful of core traits (especially in candidates earlier in their careers who lack extensive experience).

Here is what hiring managers are looking for, and what to highlight, when you’re interviewing for sales positions.

3 key traits to get across in your interview:

  1. EFFORT
  2. CURIOSITY
  3. PASSION

Everything else can be taught.

For the interview itself, come prepared with the following stories.

Talk through an ‘upsell’ you have done in the past

  • figured out a need
  • figured out why the other players would also be interested in that need
  • made them think they needed it and it was their idea
  • sold it / got their buy in
  • boom that’s an upsell (you will be doing this in your role lol)

Demonstrate that you care deeply about relationships

  • story about how you bring people together
  • help people get along
  • facilitate that sort of thing (can be professional or even personal)

Demonstrate you are a details person (and care about products/services)

  • not only are you a people person
  • you truly love/understand the product/solution that you're selling
  • example of a complex solution you have put together
  • demonstrate industry knowledge / connecting dots that yield a positive outcome

Demonstrate you are a great listener

  • sales is literally:
    • figuring out what people need
    • then giving it to them
      • to figure out what people need, you have to ask them
        • THEN LISTEN TO WHAT THEY SAY
  • demonstrate you are a great listener, really understand people/the root cause of what they are asking/getting at

Lastly = Follow Up / Be Diligent

  • I've never heard 'no' in sales
  • I get told ‘not yet’ every day
  • demonstrate that you are respectful, but persistent when attempting to get things you want!
December 21, 2023

Iteration, Not Perfection


Building a business is constantly about making small changes.

You have to focus on the big stuff, and then let the little stuff fall into place.

This is especially true at startups, and especially true with software.

Take Loom as an Example:

They started off as a user testing marketplace.

Initially, they were selling the feedback from experts. But their users didn't care about 'expert' feedback.

They cared about REAL feedback from REAL users that were using their product!

Now, Loom would not have been able to make this pivot if:

Quick pivot, and BOOM, Product Market Fit.

Naval nailed this in a tweet from a few years ago that I haven't forgotten since.

The faster you get to 10,000 iterations, the faster you have an outlier product.

The faster you get to 10,000 iterations, the better.

Necessities for productive iteration:

Action Items:

  • SHIP QUICK
  • GET REAL USERS
  • BE OPEN TO FEEDBACK
  • IMPLEMENT FEEDBACK
  • CONTINUE ITERATION CYCLE

November 28, 2023

Crawl - Walk - Run - Fly


Momentum is everything - the hardest thing is to get started from nothing.

That’s crawling (this is where we are with Hirexe)

But crawling isn't the goal - you want to go fast!

Getting to the crawl level (shipping the prototype) was important for us.

Here are our primary focus areas:

1 - User experience

UX is going to be king for this product. We are not reinventing the wheel. We know what the market is looking for: quality candidates (if hiring) or a great job (if looking for full-time work). There are other companies/tools out there that 'help' get you there right now, but they do a bad job of it.

So, the experience you get as a hiring manager/candidate is everything.

We are aiming to make it as simple/valuable as possible.

2 - Ease of onboarding

Nothing is worse than signing up for a new product or service and spending 20+ minutes filling out a profile immediately.

So we are doing everything in our power to automate this, perform 80% of the work for you, and let you complete the 20% to fine-tune.

3 - Specificity

Our search functionality is badass right out of the gate!

We have architected and will continue to ensure searchability, allowing you to find/highlight exactly what type of roles you're looking for.

And that's where things stand currently!

Remember, if you try to run right out of the gate, you'll fall and hurt yourself, or at the very least, pull a muscle!

As we continue to walk/run/fly:

  • AI will be baked in to analyze profiles and make smart recommendations
  • transparency will be implemented for salaries, as well as who is viewing your profile/jobs, etc
  • notifications will be implemented to see when new jobs/candidates are added to the system that fit your criteria

So you start slow, and as you get more comfortable, increase speed/difficulty.

Action item:

When building software:

  • Ship prototype (we are here!)
  • Get feedback, see what people like/are most interested in
  • Implement

Have the long-term vision in mind, and ALWAYS build towards that (automatically match up the best candidates with the best roles for them)

  • But listen/get help with the fine details getting you there

Everyone wins.

November 26, 2023

How To Actually Help


'Helping' is doing the things that need to be done without being told to do so.

There are layers here:

1 - being able to figure out what actually needs to be done

2 - being able to make an impact on whatever that thing is

A simple ‘how to help scenario’ can apply to preparing the Thanksgiving meal in the kitchen at home.

  • Not helpful: asking the person who is doing all of the cooking how you can help
  • Helpful: setting the table and taking out the trash (because the table wasn’t set, trash can was overflowing)

But it can also apply to your manager. Or the CEO. Or the VC. (an ongoing joke in the startup community)

  • Asking if you can help is a vanity metric and not valuable. Asking makes the asker feel like they are doing something of value. Sometimes it's genuine, but sometimes it's lazy.

Helping is seeing the future. And then executing. An example:

  • At Thanksgiving dinner, if you weren't cooking any of the meal, instead of asking people, how can I help, you would set the table and take out the overflowing trash.

A more complicated business example:

  • You're the leader of a sales organization. You came up short of the monthly growth target last month. You're going to have to present to the board.
  • You could look directly at your AEs and ask them why. They will explain that a handful of deals got pushed to next month, etc. Not ultra-valuable or helpful to you, the board, or anyone else involved.

A great leader would do the following:

Look at the lead flow:

  • Do we have enough leads
  • Are they quality
  • Are we routing them in the correct direction

Look at the product:

  • Is the product working
  • Why are we losing deals? Do we have the right features
  • Have we shipped what we said we would on the roadmap

Outlier or standard

  • Is this a one-time thing? Or happening regularly?

Collateral

  • do we have all of the tools necessary to close deals?

Price

  • Are we priced correctly?
  • Can we be creative here?

Support

  • Is there adequate support
  • Are people getting answers on time/can they reach the support staff with a call/email
  • Is the technical team there when issues arise

And, of course, talk to your AEs (and the rest of the team) / get their input.

But often, the most important thing you can do for your team is impact all of the other inputs I highlighted before!

Going into a conversation with the above breakdown level is much more helpful.

Put yourself in a position to understand and address the root cause, then implement change!

That’s helpful.

Action item:

Instead of asking someone, 'How can I help':

Put yourself in the shoes of the person you are asking

  • think thoughtfully about what their current problems are
  • if easy, go ahead and just do stuff for them (you don't need to ask permission)
  • if more complicated
    • highlight problems / propose how you'd go about finding a solution
    • if able, implement changes
    • if unable, at the least, this will spur a much more productive dialog for future change

Happy Thanksgiving.

October 22, 2023

Experience is Everything


Experience is Everything.

This is a topic I write/talk/preach about regularly - How you make someone feel will never be automated.

The 'Zorus Experience' was a huge reason we were successful at my last company.

The Zorus Experience was our mantra on everything customer-related.

Sometimes, technology doesn't work as you want it, especially when you're a new company. Sometimes, you can’t ship updates as quickly as you'd like. Building is tough; there are always delays. That stuff was outside of our control.

What's inside our control is the experience we can give people when they interact with us personally.

And that, when done right, makes all the difference.

Our primary areas of focus were always:

  • Listen
  • Be Positive
  • Stay Upbeat
  • Be Transparent/Honest
  • Do the Right Thing for the Partner
  • Say what you'll do, then do what you've said

And it worked really well for us. People gave us many second and third chances.

More here: Why You Should Embrace Crisis.

That’s software, though, especially early on. But that isn't the goal.

Our goal is to build an 11-star experience.

An 11-star experience is all-encompassing: it ranges from fantastic UI/UX, to customer service, to product quality, to branding, everything.

Brian Chesky put this together, and it’s what Airbnb models itself after:

You will win if you can curate this experience for your users.

Work in that direction.

October 21, 2023

Become a Master Persuader


Robert Cialdini’s book, INFLUENCE, is one of the most important books you can read in your life. Here is a breakdown of the book's most important highlights, plus personal notes.

Explanation Principle

  • I have to use the copier vs
  • I have to use the copier because I am running late

Contrast Principle

  • When two things are seen in sequence, the second will seem drastically different than it would on its own
  • Light object, then heavy object - the heavy one feels REALLY heavy
  • This is why $500 seems like nothing after you buy a $3,500 suit!
  • Real estate sales guy -> show a dumpy house, then a really nice house!

Rule 1

RECIPROCATION

"We should try to repay, in kind, what another person has provided us"

The Free Sample

  • but they must be made aware that you're doing something for free

Strong Cultural pressure to reciprocate a gift, even an unwanted one, **but there is no such pressure to purchase an unwanted commercial product**

  • so, you need to have already built some rapport with your client for this to really work!

Unfair exchanges

  • woman accepts a drink = seen as more sexually available!?

Concessions

  • Once you say no once, it's tough to say no twice
  • Boy Scout selling tickets
  • then swaps to chocolate bars for a lesser price!
  • Rejection-Then-Retreat Technique
  • The gifted negotiator will ask for something way out there, then come back around
  • Makes everyone feel like a winner
  • Responsibility: "We were able to meet in the middle"
  • Satisfaction: "I got him to come way down"

Perceptual Contrast Principle

  • after being exposed to the price of something larger, a less expensive offer appears WAY smaller
  • buy a $2k suit, and the $150 tie is nothing

Rule 2

Commitment and Consistency

Human beings have a (often subconscious) nearly obsessive desire to be consistent with what we have already done.

Horse bettors at a racetrack

  • once the bet is placed -> far more confident it will win

Beach Blanket story

  • if someone gets up and walks away, and has items stolen, no one reacts
  • if the person asks you to watch their things, and then someone tries to steal them, 19/20 people chase after the thief!

Consistent Decision making

  • made up mind on a topic - easier to stay there

Automatic consistency oftentimes hides us from troubling realities

  • it "shields us against thought"

COMMITMENT IS THE KEY

If I can get you to make a commitment (that is, to take a stand, go on record), I will have set the stage for your automatic and ill-considered consistency with that earlier commitment.

  • once the stand is taken, you're locked in 🙂

Cold Calling Technique: "How are you doing today?" (pg 51)

  • people generally respond with fine, which makes it much harder to appear grouchy on the back end, when you transition to the ask!

Start Small and Build

  • Foot-In-The-Door Technique
  • In sales, lock down any sort of small transaction, then begin to scale up.
  • you've got one commitment, keep em coming

The Magic Act

  • Watch what people do
  • NOT what they say

Writing things down

  • Physical evidence that the act happened
  • Can be shown to other people
  • Written commitments simply require more work than verbal ones
  • more work = more commitment
  • severity of initiation = higher commitment to the group
  • frats/military
  • Pro tip - WRITE GOALS DOWN

The Inner Choice

  • commitments most effective when
  • active
  • public
  • effortful
  • Last part is biggest though:
  • inner responsibility
  • Freedman Study

Approach with children:

This is the only way to get people to buy into decision making long term - they need to BELIEVE themselves

Lowball method:

  • offer great deal, get to the finish line, deal changes
  • already so committed that they take the deal anyway
  • also works in human relationships: "I promise I'll change"

Rule 3

SOCIAL PROOF

We determine what is correct by finding out what other people think is correct

This one is so ingrained in the subconscious it's silly.

The Social Proof Phenomenon:

The greater the number of people who find an idea correct, the more correct the idea will seem!

When we are unsure of ourselves, when the situation is unclear, or when uncertainty reigns

  • that is when social proof is king
  • Uncertainty is the right-hand man of social proof!!
  • when in doubt -> well, this is what the crowd is doing, so I am too
  • this leads directly to the herd mindset

Remove Ambiguity when dealing with groups

  • make specific assignments to specific people

Social Proof also strongest when we view others as similar to ourselves

  • this is why the 'relate' step is such a massive part of the sales process
  • you need to be just like the person on the other side of the transaction!

Rule 4

LIKING

We most prefer to say yes to the requests of someone we know and like

This is why warm intros play so hugely in sales:

  • “So-and-so suggested I call on you”
  • turning the person down is like rejecting the friend!

Halo Effects

  • physical attractiveness -> good looking = good

Similarity

  • being aware of and capitalizing here is VITAL whenever possible in sales

Compliments

  • Actor McLean Stevenson once described how his wife tricked him into marriage:
  • she said she liked me!
  • Simply telling someone you like them is effective
  • We're absolute suckers for flattery across the board

  • Even fake compliments flatter us on a subconscious level!!

Contact and Cooperation

  • We like stuff we're familiar with
  • this is why customer touches are crucial
  • increases familiarity!!

Association

  • weatherman being yelled at for bad weather!

  • luncheon technique
  • people become fond of things / items they are talking about while eating
  • why lunch meetings are so big - not by accident!
  • Sports another massive example
  • lucky fans, can only wear certain jerseys, none of this matters at all

Rule 5

AUTHORITY - Follow An Expert

Most of us put an alarmingly high level of faith in the 'expert' point of view

  • example = Milgram's experiment
  • Subjects were simply unable to defy the lab-coated 'expert' who told them to keep shocking the learner on the other end

We are literally trained from the second we're born that obedience to proper authority is RIGHT and obedience to improper authority is WRONG

  • its absolutely drilled into our subconscious so deeply
  • RELIGION is a big part of this as well
  • but also school, stories, the military, the government, everything

Connotation, not Content

  • if we're getting "click, whirr"-ed by authority figures:
  • we're just as vulnerable to the symbols of authority as we are to the substance!
  • AKA - DONALD TRUMP
  • Scott Adams talks about this nonstop

Titles

  • titles w/ authority status lead to height distortions!

Clothing

  • subconscious triggers of authority figures
  • "uniforms" are big here - hospitals, suit and tie, police uniforms, etc

Interestingly, w/ Trump in the White House and all of the back and forth w/ COVID-19, a lot of authority rhetoric has since been brought to light!

Rule 6

SCARCITY - The Rule of the Few

"The way to love anything is to realize that it might be lost"

The scarcity principle simply states that opportunities seem more valuable to us when their availability is limited.

  • Think about it! This is most definitely true for me - I tend to remember pain much more than gain

Limited Number tactic

  • product is running out - won't last long!
  • whether true or not, it's going to increase the immediate value of the item in the customer's eyes!

This works with any sort of item:

  • Written items appear more persuasive if we can only get them from one source
  • EXCLUSIVITY

Going from abundance to scarcity produces the most dramatic effects of the scarcity complex. It's not even close.

  • People see things as more desirable when they have recently become less available, as opposed to being scarce all the time
  • Ultra strict parents -> more rebellious children
  • Chocolate chip cookie experiment
  • less available, the more delicious

Highlights our Competitive Nature as well

  • not only do we want the same item more when scarce, we want it most when we have to compete for it 🙂


October 20, 2023

Early SaaS Investing (2023)


Stumbled on the napkin math investing breakdown for SaaS companies in 2023. (Shoutout to Luke Sophinos @ Linear)

Check it out:

This does a great job of explaining the industry expectations from Pre-Seed through Series B.

Some additional context (from my experience personally)

In order to secure funding:

1 - Team matters

  • Having experience being a founder/knowing your space inside and out is crucial

2 - Momentum

  • More specifically: TRACTION
  • Opportunity
  • Users
  • Revenue

For early investment purposes (Seed/Pre-Seed). After a SEED investment is locked in:

Emphasis on capital efficiency is at an all-time high. (understandably so, given macro environment)

= KEEP BURN LOW

Metrics and financial models only get you so far in early-stage investing with tech companies. You don’t have a ton of real data. Many of the projections are a bit of a shot in the dark.

However, what you can directly control is your burn rate. Your burn rate is simply the amount of $$ you spend on a monthly basis.

The lower your burn rate, the longer the runway you have. (aka - the less money you spend, the more time you have till your bank account goes to $0 :)
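That runway relationship is simple enough to sketch in a few lines. The dollar figures in this example are hypothetical, purely for illustration:

```python
# Runway math: months of runway = cash on hand / monthly burn.
# All figures below are made up for illustration only.

def runway_months(cash_on_hand: float, monthly_burn: float) -> float:
    """Months until the bank account hits $0 at the current burn rate."""
    if monthly_burn <= 0:
        raise ValueError("monthly burn must be positive")
    return cash_on_hand / monthly_burn

# A $1.5M raise at $50k/month of burn buys 30 months of runway;
# doubling burn to $100k/month cuts that to 15.
print(runway_months(1_500_000, 50_000))   # 30.0
print(runway_months(1_500_000, 100_000))  # 15.0
```

Same raise, half the runway, just by doubling spend. That's why capital efficiency dominates early-stage conversations right now.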

Only spend money on technical folks until true PMF is actually achieved. This is why you stick with founder-led sales for as long as possible, until you’re ready to really start to scale.

This is where Naval’s famous quote comes from:

Learn to Sell, Learn to Build.

If you can do both, you will be unstoppable.

In conclusion:

Early-stage companies need small teams with dynamic founders who can do the roles of multiple people simultaneously. This allows them to keep their expenses low while they are building the initial version of their product and getting the traction necessary to justify institutional investment.

October 12, 2023

Angel Investing (Loom Example)


Loom just sold for 975M.

They currently have:

  • 25 Million Users
  • 1.8M Companies on the platform
  • Estimated revenue of 40M or so (information not public)
    • sold for a 24x revenue multiple

They were founded 8 years ago, in 2015.

Their first round of funding was 500K at a 6.5M valuation cap and a 10% discount.

That means a 25K investment would have bought you 0.3846% of the company... which, before any dilution, would be worth $3,750,000 today.
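As a sanity check, the napkin math can be run directly. All figures are as quoted in this post (revenue is an estimate, not public), and dilution from later rounds is ignored:

```python
# Back-of-the-napkin angel return math for the Loom example above.
# Figures are as quoted in the post; dilution from later rounds is ignored.

investment = 25_000          # angel check into the first round
valuation_cap = 6_500_000    # valuation cap on that round
exit_price = 975_000_000     # reported sale price
est_revenue = 40_000_000     # estimated revenue (not public)

ownership = investment / valuation_cap    # fraction of the company bought
exit_value = ownership * exit_price       # what that stake is worth at exit
multiple = exit_value / investment        # return multiple on the check
rev_multiple = exit_price / est_revenue   # implied revenue multiple

print(f"{ownership:.4%}")       # 0.3846%
print(f"${exit_value:,.0f}")    # $3,750,000
print(f"{multiple:.0f}x")       # 150x
print(f"{rev_multiple:.1f}x")   # 24.4x
```

A 150x return on a $25K check, which is exactly why early-stage angel checks into breakout companies are so coveted.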

Additionally, if you got in early, you would have gotten the right of first refusal for subsequent fundraising rounds at that above discount %. So, if you kept putting in $$, you could easily have a much higher return %. (Hint: because Loom was killing it, you would have kept maxing out the money you could put in... Your risk goes down with every fundraising round.)

You aren't getting returns like that anywhere besides angel investments in rapidly expanding technology companies 🙂 .

Here is an example from Greg Isenberg, who actually passed on the round!

Today, he's wishing he was a part of it!
