Case Study

Sequoia Capital: a bias-audited, 23-signal scoring model integrated into startup sourcing

A year-long INSITE Fellowship engagement that distilled 73 candidate signals — across Crunchbase, Pitchbook, LinkedIn, and Product Hunt — into a 23-signal scoring model for early-stage founder and company identification. Integrated into Sequoia's sourcing algorithms after testing on 50,000 companies.

Company Background

Early-stage investing is increasingly a signal-extraction problem. Sequoia wanted a data pipeline that surfaced promising founders and companies earlier — and a scoring model that didn't just reinforce the biases already baked into venture outcomes (Ivy League, specific tech employers, certain networks).

Engagement

How we approached this project

AppRocket's CEO joined as an INSITE Fellow and worked directly with Sequoia's head of data science and head of people operations. We analyzed 73 candidate signals across Crunchbase, Pitchbook, LinkedIn, and Product Hunt, manually reviewing 700–800 companies to ground-truth the model. We ran explicit bias-elimination tests — sensitivity analysis on credential and network features — and iteratively refined the signal set down to 23 signals forming a unified scoring model, then tested it across 50,000 companies.

The challenge

Sequoia engaged AppRocket's CEO as an INSITE Fellow with a specific brief: build a data pipeline that uses public signals to identify promising founders and companies earlier. The non-obvious hard part wasn't the data; it was the bias. A scoring model that rewards Ivy League credentials or employment at a specific set of tech companies just re-concentrates capital where it has always flowed. That was the failure mode we had to design against.

What we built

Signal inventory. 73 candidate signals across Crunchbase, Pitchbook, LinkedIn, and Product Hunt — founders, teams, traction, market.
Manual ground truth. 700–800 companies hand-analyzed to calibrate the model against real outcomes.
Bias auditing. Explicit sensitivity tests on credential and network features to surface and neutralize concentration effects.
Signal reduction. Iterative refinement distilled the 73 signals down to 23 that carried the predictive weight without the bias load.
Scale test. The refined model was tested across 50,000 companies via Crunchbase to validate that it held up at sourcing scale.
Visualization dashboard. Actionable insights surfaced for the investment team.
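
To make the bias-auditing and signal-reduction steps above concrete, here is a minimal sketch of one kind of sensitivity test. It is an illustration under stated assumptions, not the model built for Sequoia: the signal names, weights, and sample companies are all hypothetical, and the real scoring model may not be a simple weighted sum. The idea is to flip a credential feature for each company, measure how far the score moves, and then check that dropping the feature removes that dependence.

```python
# Hypothetical signals and weights, for illustration only.
WEIGHTS = {
    "team_growth": 0.5,
    "product_launches": 0.3,
    "ivy_league_founder": 0.2,  # credential feature under audit
}


def score(signals, weights):
    """Weighted sum of signal values; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


def credential_sensitivity(companies, weights, feature):
    """Mean absolute score change when a binary `feature` is flipped."""
    deltas = []
    for signals in companies:
        flipped = dict(signals)
        flipped[feature] = 1.0 - signals.get(feature, 0.0)
        deltas.append(abs(score(flipped, weights) - score(signals, weights)))
    return sum(deltas) / len(deltas)


# Made-up example companies (not real data).
companies = [
    {"team_growth": 0.8, "product_launches": 1.0, "ivy_league_founder": 1.0},
    {"team_growth": 0.6, "product_launches": 0.0, "ivy_league_founder": 0.0},
]

# Sensitivity with the credential signal included in the model.
before = credential_sensitivity(companies, WEIGHTS, "ivy_league_founder")

# Signal reduction: drop the credential signal and re-run the same test.
reduced = {k: v for k, v in WEIGHTS.items() if k != "ivy_league_founder"}
after = credential_sensitivity(companies, reduced, "ivy_league_founder")

print(f"sensitivity before: {before:.2f}, after: {after:.2f}")
```

In a linear model like this one, the sensitivity simply equals the feature's weight, so the test is trivial. It becomes informative for non-linear models, or when other signals act as proxies for the credential and need to be audited in the same way.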

The outcome

The 23-signal scoring model is integrated into Sequoia Capital's startup sourcing algorithms — proven at scale, audited for bias, and precise enough to change what comes to the top of the pipeline.

What changed for Sequoia Capital

The final 23-signal model was integrated into Sequoia Capital's startup sourcing algorithms, proven scalable across tens of thousands of companies and materially more precise in identifying promising investment targets.

01 Venture Capital
02 Data Science
03 Scoring Model
04 Bias Auditing

Initial signals: 73
Final model signals: 23
Companies tested: 50,000
Manual ground truth: 700–800 companies

Have the same problem Sequoia Capital had?

Start with the two-week audit. We'll scope it against your firm specifically.

Book the audit