Why HireVue Has New Challengers in 2026
HireVue, Sapia, Harver, and myInterview collectively handle more than 30 million interviews every year. Together they have raised over $800M in venture capital, and yet the category is still wide open, because AI video interviewing looks nothing like it did in 2020. Until 2023, the pitch was "screen candidates faster with prerecorded video." That product was brittle, candidates hated it, and bias lawsuits piled up. The new pitch is different: LLM-powered scoring grounded in job-specific competencies, real-time conversational AI interviews (not prerecorded), and automated EEOC bias audits baked into the pipeline.
The modern wave of interview platforms (Mercor, Sapia v3, Metaview, Moonhub) is rebuilding the category on foundation models plus modern speech stacks. That makes the technical moat shallower than it was and the vertical opportunity huge. A platform that nails interview ops for healthcare (clinical competencies plus EEOC) or trades (skill demonstrations plus OSHA) or tech (take-home evaluation plus live coding) can take tens of millions in revenue from the horizontal incumbents.
Before you start, you need to accept one hard truth: AI interview platforms are high-stakes regulated software. Get bias audits wrong, get your data privacy wrong, or ship a system that hallucinates candidate data, and you get sued, lose contracts, and tarnish your brand. This is not a weekend project. It is a 12 to 18 month engineering program if you want to do it right.
Architecture Overview: Recording, STT, Scoring
An AI interview platform has five distinct subsystems. Each can be built in isolation but they all talk to each other through a job and candidate graph:
- Question bank and competency model: Library of questions tagged by competency, seniority, role family. Recruiter-configurable.
- Interview recording layer: WebRTC for live and prerecorded capture, browser-side MediaRecorder fallback, chunked upload to blob storage.
- Transcription pipeline: Streaming STT during recording, async batch STT for final transcripts, speaker diarization, PII redaction.
- Scoring engine: LLM-based scoring against competency rubrics, structured JSON outputs, confidence scores, explainability traces.
- Bias audit and reporting: Adverse impact ratio calculation, demographic breakdowns, audit logs, regulator-ready reports.
You wire these through an orchestration layer (we usually default to Inngest or Trigger.dev for async workflows, or Temporal at scale). The candidate-facing UI is a single-page app. The recruiter dashboard is a second SPA. The bias audit and admin tools are a third. Do not try to squeeze all three into one codebase; separation of concerns pays back quickly.
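To make the wiring concrete, here is a minimal TypeScript sketch of the shared job-and-candidate graph the subsystems communicate through. Every type and field name here is an illustrative assumption, not a prescribed schema:

```typescript
// Illustrative types for the job/candidate graph that ties the five
// subsystems together. Names are assumptions, not a prescribed schema.
interface Competency {
  id: string;
  name: string; // e.g. "communication clarity"
  rubricAnchors: Record<1 | 3 | 5, string>; // what a 1 / 3 / 5 answer looks like
}

interface Question {
  id: string;
  text: string;
  competencyIds: string[]; // which competencies this question probes
  seniority: "junior" | "mid" | "senior";
  roleFamily: string;
}

interface Interview {
  id: string;
  candidateId: string;
  jobId: string;
  questionIds: string[];
  recordingUrl?: string; // set by the recording layer
  transcriptId?: string; // set by the transcription pipeline
  scoreReportId?: string; // set by the scoring engine
}

// Each subsystem reads and writes only its own fields, so the pieces can
// be built and deployed independently while sharing one graph.
function isReadyForScoring(i: Interview): boolean {
  return Boolean(i.recordingUrl && i.transcriptId && !i.scoreReportId);
}
```

The orchestration layer's job is then just moving each `Interview` through these states and fanning out the async work.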
For complementary architecture, see our AI recruiting platform guide.
Video Capture and WebRTC Setup
Video capture is deceptively hard. Candidates use old laptops, bad cameras, 4G connections, and every browser version in existence. Your recording layer has to degrade gracefully.
Default architecture: MediaRecorder API for prerecorded interviews (one-way capture, push to blob), WebRTC with an SFU (LiveKit, Daily, or 100ms) for live conversational interviews where the candidate talks to an AI interviewer. Use WebCodecs where available for higher-quality encoding at lower CPU cost. Fall back to VP8 or VP9 on older browsers.
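As one illustration of graceful degradation, here is a sketch of codec selection for `MediaRecorder`. The preference list and bitrate are assumptions to tune for your quality targets, and `uploadChunk` is a hypothetical helper:

```typescript
// Pick the best supported recording format, preferring modern codecs.
// The candidate list is an assumption; adjust it to your quality targets.
const PREFERRED_MIME_TYPES = [
  "video/webm;codecs=av1,opus",
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus", // older-browser fallback
  "video/mp4", // Safari
];

// The support check is a parameter so the logic can be unit-tested outside
// a browser; in production, pass MediaRecorder.isTypeSupported.
function pickMimeType(isSupported: (t: string) => boolean): string | undefined {
  return PREFERRED_MIME_TYPES.find(isSupported);
}

// Browser usage (sketch; uploadChunk is a hypothetical helper):
// const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
// const recorder = new MediaRecorder(stream, {
//   mimeType: pickMimeType((t) => MediaRecorder.isTypeSupported(t)),
//   videoBitsPerSecond: 2_500_000,
// });
// recorder.ondataavailable = (e) => uploadChunk(e.data);
// recorder.start(5_000); // emit a chunk every 5 seconds
```

Emitting timed chunks rather than one giant blob is what makes the resumable upload strategy below possible.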
Upload strategy: chunked upload with resumable semantics. A 30-minute interview at 720p is 100 to 300 MB. You cannot afford to drop the whole file because the candidate's wifi blipped at minute 28. Use the tus resumable upload protocol or a similar library. Store chunks in object storage (S3, R2, GCS), run a combiner job after upload, and queue the combined file for processing.
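The resumable bookkeeping can be sketched as a pure function. The 8 MB chunk size, the `Upload-Offset` header, and `uploadUrl` are illustrative, loosely modeled on how tus-style protocols behave, not an exact wire format:

```typescript
// Compute resumable byte ranges for a chunked upload. If the connection
// drops, resume from the last server-acknowledged offset instead of
// restarting the whole file.
interface ChunkRange {
  start: number;
  end: number; // exclusive
}

function chunkRanges(fileSize: number, chunkSize: number, resumeFrom = 0): ChunkRange[] {
  const ranges: ChunkRange[] = [];
  for (let start = resumeFrom; start < fileSize; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

// Browser usage (sketch; uploadUrl and the header are illustrative):
// for (const { start, end } of chunkRanges(file.size, 8 * 1024 * 1024, savedOffset)) {
//   await fetch(uploadUrl, {
//     method: "PATCH",
//     headers: { "Upload-Offset": String(start) },
//     body: file.slice(start, end),
//   });
// }
```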
Hardware checks: run a pre-interview hardware check flow. Mic level, camera, bandwidth test. Log the results. You will need this evidence when candidates complain their audio was bad (and they will, even when it was their wifi).
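The mic-level portion of such a check can be sketched with the Web Audio API's time-domain samples. The 0.01 threshold is an assumption to tune against real devices:

```typescript
// Compute the RMS mic level from an audio sample frame (values in [-1, 1]),
// as captured from an AnalyserNode's getFloatTimeDomainData(). A near-zero
// RMS during the hardware check means the mic is muted or broken.
function rmsLevel(samples: Float32Array): number {
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Browser usage (sketch):
// const ctx = new AudioContext();
// const analyser = ctx.createAnalyser();
// ctx.createMediaStreamSource(micStream).connect(analyser);
// const buf = new Float32Array(analyser.fftSize);
// analyser.getFloatTimeDomainData(buf);
// const micOk = rmsLevel(buf) > 0.01; // threshold is an assumption; tune it
```

Persist the measured levels alongside the interview record; that log is the evidence you will reach for when an audio complaint comes in.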
Storage and retention: video data is sensitive. Encrypt at rest (KMS), encrypt in transit (TLS 1.3), set retention policies (typically 180 to 365 days), and build a candidate deletion flow for GDPR and CCPA compliance. Use Mux or Cloudflare Stream if you want a managed video CDN to avoid rolling your own.
Speech-to-Text and Transcript Processing
You will transcribe every interview twice: once live (for real-time scoring and cheating detection) and once async (for final transcripts, high accuracy, and diarization).
Streaming STT: Deepgram Nova-3 is our default ($0.0043 per minute, 300ms latency, strong accuracy). AssemblyAI Universal-1 is a fine second choice. OpenAI Whisper large-v3 hosted on Groq is cheap but higher latency than Deepgram.
Batch STT: for post-processing, we re-run transcripts through a higher-accuracy model with forced alignment, speaker diarization, and named entity tagging. This is where you catch misrecognitions from the streaming output, align timestamps to the video, and extract phrases to score.
PII redaction is mandatory. Redact social security numbers, dates of birth, credit card numbers, driver license numbers, and addresses before scoring, before showing transcripts to recruiters, and before any analytics pipeline. Use a two-stage approach: regex for known formats, then an LLM detection pass for ambiguous, spoken-form cases (birthdays phrased as "I was born in 1992"). Log every redaction event for audit.
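A sketch of the regex stage with audit logging follows. The patterns are illustrative and deliberately minimal; a real system needs many more, plus locale-specific variants:

```typescript
// Stage 1 of PII redaction: regex for known formats. Stage 2 (not shown)
// sends remaining text through an LLM classifier for spoken-form PII.
// These patterns are illustrative, not exhaustive.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // 123-45-6789
  [/\b(?:\d[ -]?){13,16}\b/g, "[CARD]"], // card-like digit runs
  [/\b\d{1,2}\/\d{1,2}\/\d{2,4}\b/g, "[DOB]"], // 04/12/1992
];

interface RedactionEvent {
  label: string;
  count: number;
}

function redactPII(text: string): { redacted: string; events: RedactionEvent[] } {
  const events: RedactionEvent[] = [];
  let redacted = text;
  for (const [pattern, label] of PII_PATTERNS) {
    const matches = redacted.match(pattern);
    if (matches) {
      events.push({ label, count: matches.length }); // log for audit trail
      redacted = redacted.replace(pattern, label);
    }
  }
  return { redacted, events };
}
```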
Subtitles and captions: generate WebVTT with timestamps and word-level alignment. Candidates with hearing impairments need captions on question playback. Recruiters want clickable transcripts that jump to video moments. Your caption pipeline is ADA infrastructure, not a nice-to-have.
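Generating WebVTT from aligned segments is mechanical once the alignment exists; a sketch (the segment shape is an assumption about your STT output):

```typescript
// Generate a WebVTT caption file from time-aligned transcript segments.
interface CaptionSegment {
  startMs: number;
  endMs: number;
  text: string;
}

// WebVTT timestamps look like HH:MM:SS.mmm.
function toVttTimestamp(ms: number): string {
  const pad = (n: number, w: number) => String(n).padStart(w, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1_000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1_000, 3)}`;
}

function toWebVtt(segments: CaptionSegment[]): string {
  const cues = segments.map(
    (seg, i) =>
      `${i + 1}\n${toVttTimestamp(seg.startMs)} --> ${toVttTimestamp(seg.endMs)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

The same timestamps drive the recruiter's clickable transcript: clicking a cue seeks the video player to `startMs`.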
LLM-Powered Scoring and Evaluation Prompts
The scoring engine is the heart of your product. It is also the place where bias, hallucination, and legal liability live. Build it carefully.
Approach: take the structured transcript, the job-specific rubric, the competency definitions, and a scoring prompt. Feed them to an LLM (Claude Opus or GPT-4o) and request a structured JSON output with per-competency scores, quoted evidence, and confidence. Run the scoring twice with different seeds and compare. Flag high-variance cases for human review.
Prompt design tips: use explicit rubric anchors (what does a 5 vs a 3 vs a 1 look like for "communication clarity"?). Require the model to quote transcript evidence. Forbid free-text judgments. Use temperature 0.1 to 0.3 for scoring. Log the full prompt and response for every scoring event.
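A sketch of the structured output shape and the double-run variance check described above (field names and the variance threshold are assumptions):

```typescript
// Shape of the structured JSON the scoring prompt must return.
interface CompetencyScore {
  competencyId: string;
  score: number; // 1-5 against the rubric anchors
  evidence: string[]; // verbatim transcript quotes the model must supply
  confidence: number; // 0-1
}

// Run the same scoring prompt twice; flag competencies where the two runs
// disagree by more than a threshold, and route those to human review.
function flagHighVariance(
  runA: CompetencyScore[],
  runB: CompetencyScore[],
  maxDelta = 1
): string[] {
  const byId = new Map<string, number>();
  for (const s of runB) byId.set(s.competencyId, s.score);
  return runA
    .filter((s) => Math.abs(s.score - (byId.get(s.competencyId) ?? s.score)) > maxDelta)
    .map((s) => s.competencyId);
}
```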
Do not use LLMs to screen demographic or protected characteristics. Do not ask the LLM to predict hireability or performance. Scope scoring to competency-aligned behaviors ("demonstrated structured problem solving when asked about a past project"). Keep the hiring decision squarely with the human recruiter. This is both ethically correct and legally safer under NYC AEDT, Illinois AIVIA, and EU AI Act rules.
Related reading: our AI resume builder guide covers complementary NLP patterns.
Bias Audits, EEOC Compliance, and Explainability
This section will save you from lawsuits. Read it twice.
EEOC and the four-fifths rule: if your AI scoring produces selection rates where a protected group is selected at less than 80% of the rate of the group with the highest selection rate, you have adverse impact. This applies to race, sex, age, and disability. Your platform must report adverse impact ratios to customers before they base hiring decisions on your tool.
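The four-fifths calculation itself is simple; a sketch:

```typescript
// Four-fifths rule: compute each group's selection rate and divide by the
// highest group's rate. Any ratio below 0.8 indicates adverse impact.
interface GroupOutcome {
  group: string;
  selected: number;
  total: number;
}

function adverseImpactRatios(groups: GroupOutcome[]): Record<string, number> {
  const rates = groups.map((g) => ({ group: g.group, rate: g.selected / g.total }));
  const maxRate = Math.max(...rates.map((r) => r.rate));
  const ratios: Record<string, number> = {};
  for (const r of rates) ratios[r.group] = r.rate / maxRate;
  return ratios;
}

// Example: group A selected 30/100 (rate 0.30), group B 18/100 (rate 0.18).
// B's ratio is 0.18 / 0.30 = 0.6, below 0.8 — adverse impact that must be
// surfaced to the customer before the tool informs hiring decisions.
```

The hard part is not the arithmetic; it is collecting demographic data lawfully (voluntary self-identification, stored separately from scoring inputs) so the ratios can be computed at all.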
NYC Automated Employment Decision Tools (AEDT) law: requires an annual independent bias audit by a third party, published publicly on the employer's website. You must provide the data and tooling to support this. Build an audit export feature from day one.
Illinois Artificial Intelligence Video Interview Act: requires notification and consent from candidates before using AI to analyze video interviews, with specific disclosure about what the AI does.
EU AI Act: classifies AI interview systems as "high-risk." You need a risk management system, data governance plan, logging, human oversight, transparency disclosures, and conformity assessment. Budget $50K to $150K just for EU AI Act compliance if you operate in Europe.
Explainability requirements: any candidate (and sometimes regulator) can ask "why was I scored a 3 on communication?" Your system must produce a human-readable explanation citing specific transcript evidence. Build this early. Adding it after launch is 5x the effort.
Proctoring and Anti-Cheating Controls
ChatGPT and freely available AI assistants made traditional anti-cheating controls obsolete almost overnight. Candidates Google answers, read from teleprompters, and use AI assistants during interviews. Your platform needs defense in depth.
Standard controls: tab/window focus detection (flags when candidate switches windows), webcam presence checks (continuous face detection), audio anomaly detection (flag voice not matching registered candidate), randomized question order, time-limited recordings, IP and device fingerprinting.
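As one example of these controls, focus-loss tracking can be sketched as follows. The grace window is an assumption to tune; flags are evidence for recruiter review, never an auto-reject signal:

```typescript
// Record tab/window focus losses during the interview. Flags are evidence
// for recruiter review only — never an automatic rejection signal.
interface FocusFlag {
  type: "blur" | "hidden";
  atMs: number;
}

// Collapse rapid consecutive events within a grace window (e.g. an OS
// notification stealing focus briefly). The 2s window is an assumption.
function summarizeFocusFlags(flags: FocusFlag[], graceMs = 2_000): number {
  let count = 0;
  let lastAt = -Infinity;
  for (const f of flags) {
    if (f.atMs - lastAt > graceMs) count++;
    lastAt = f.atMs;
  }
  return count;
}

// Browser usage (sketch):
// const flags: FocusFlag[] = [];
// document.addEventListener("visibilitychange", () => {
//   if (document.visibilityState === "hidden") {
//     flags.push({ type: "hidden", atMs: performance.now() });
//   }
// });
// window.addEventListener("blur", () => flags.push({ type: "blur", atMs: performance.now() }));
```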
Second-layer controls: passive detection of reading vs speaking (eye movement analysis), screen recording consent for higher-stakes roles, lip sync analysis (does the mouth movement match the audio?), multi-camera setup for executive or compliance-sensitive roles.
Aggressive controls to avoid unless legally required: face recognition tied to government ID (privacy nightmare), keystroke analysis (invasive), constant screen sharing (candidates reject). Err on the side of less invasive. Let employers stack their own controls on top through your platform configuration.
One nuance: anti-cheating signals must never drive hiring decisions directly. Flags go to the recruiter for review and a potential re-interview invite; they should never auto-reject a candidate. Auto-rejection is both an EEOC land mine and a terrible candidate experience. For the UX patterns, see our video calling app guide.
ATS Integration, Launch, and Scale
No recruiter will use your platform if they have to copy candidate data in and out of Greenhouse, Lever, Ashby, or Workday. ATS integrations are table stakes.
Start with Greenhouse (best developer docs, 7,500+ customers). Then Lever, Ashby, and SmartRecruiters. Add Workday and iCIMS last because their APIs are painful and enterprise sales cycles move slowly. Budget 80 to 160 engineering hours per ATS integration.
Integration patterns: inbound webhooks when a candidate enters a stage, outbound API calls to push interview results back, OAuth or API key auth, webhook signature verification, rate limit handling. Keep integrations in a dedicated service so one ATS outage does not cascade.
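A sketch of webhook signature verification with Node's crypto module. Header names and signing schemes vary per ATS, so check each vendor's docs; the HMAC-SHA256-over-raw-body shape shown here is a common convention, not any specific ATS's exact format:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an inbound ATS webhook signature (HMAC-SHA256 over the raw body).
// Always verify against the raw bytes, before any JSON parsing.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Reject unverifiable payloads with a 401 and alert on spikes; a burst of bad signatures usually means a rotated secret on the ATS side, not an attack.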
Launch checklist: SOC 2 Type 2, GDPR DPA template, CCPA compliance, NYC AEDT audit readiness, model cards for every scoring model, candidate data export, deletion workflows, recruiter training videos, bias audit sample reports, security questionnaire responses (SIG, CAIQ).
Pricing: horizontal platforms charge $5K to $50K per month. Vertical platforms can charge $50 to $150 per interview or per-seat tiered pricing. Enterprise deals ($100K plus) typically close in 4 to 6 months with security and procurement review.
If you are scoping a vertical AI interview platform, book a free strategy call and we will walk through competitive positioning and build sequencing for your market.
Need help building this?
Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.