We Recomputed Our IMDb Correlation Three Ways. At the Show Level, It's Negative.

Last month we ran a quick correlation of episode-level IMDb ratings against the Humor Index (HI) across three shows and reported a pooled r = -0.005. The headline at the time was "audience ratings and comedy craft are essentially unrelated." Commenters asked what that meant. We dug back in, this time across all eight scored shows (1,105 episodes), and the three-level answer is much more interesting than the one-line headline.

Three findings, going in the same direction.

Finding 1: Top-10 episode overlap is at chance

For each show, take HI's top 10 episodes and IMDb's top 10 episodes. Count how many appear in both lists.

Show	HI top-10 ∩ IMDb top-10	HI top-20 ∩ IMDb top-20
The Office	0 / 10	4 / 20
Seinfeld	0 / 10	1 / 20
Friends	2 / 10	3 / 20
Parks and Recreation	1 / 10	5 / 20
Arrested Development	2 / 10	7 / 20
Schitt's Creek	1 / 10	2 / 20
30 Rock	0 / 10	3 / 20
Taxi	0 / 10	5 / 20
Pooled	6 / 80 (7.5%)	30 / 160 (18.8%)

Two independent random picks of 10 episodes out of roughly 100 would share about one title, a 10% overlap. HI and IMDb agree on "the best episodes" at about chance level. Four shows (The Office, Seinfeld, 30 Rock, and Taxi) have no overlapping titles in their top 10 at all, and Seinfeld adds only one in its top 20.
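A minimal Python sketch of that chance check, not the site's actual pipeline. It counts top-k overlap for one show and computes the baseline expected from two unrelated top-k lists, which is k²/N for a show with N episodes. Episode counts come from the per-show table in Finding 2; the function names and the tie-breaking rule are assumptions made for illustration.

```python
# Episode counts per show, copied from the per-show table in Finding 2.
EPISODES = {
    "The Office": 186, "Seinfeld": 170, "Friends": 235,
    "Parks and Recreation": 98, "Arrested Development": 84,
    "Schitt's Creek": 80, "30 Rock": 138, "Taxi": 114,
}


def top_k_overlap(episodes, k):
    """Count titles that appear in both the HI top-k and the IMDb top-k.

    `episodes` is a list of (title, hi, imdb) tuples. Ties keep input
    order, which is an assumption, not necessarily the site's rule.
    """
    top_hi = {t for t, _, _ in sorted(episodes, key=lambda e: e[1], reverse=True)[:k]}
    top_imdb = {t for t, _, _ in sorted(episodes, key=lambda e: e[2], reverse=True)[:k]}
    return len(top_hi & top_imdb)


def expected_overlap(n_episodes, k):
    """Expected overlap of two independent random k-subsets of n items."""
    return k * k / n_episodes


def pooled_chance_rate(episode_counts, k):
    """Pooled expected overlap as a fraction of all top-k slots."""
    expected = sum(expected_overlap(n, k) for n in episode_counts.values())
    return expected / (k * len(episode_counts))


if __name__ == "__main__":
    # Observed pooled rates are the "Pooled" row of the overlap table.
    for k, observed in ((10, 6 / 80), (20, 30 / 160)):
        chance = pooled_chance_rate(EPISODES, k)
        print(f"top-{k}: chance {chance:.1%}, observed {observed:.1%}")
```

On these episode counts the pooled baseline comes out at roughly 8% for top-10 and 17% for top-20, next to the observed 7.5% and 18.8%.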

Arrested Development has the strongest agreement (2/10, tied with Friends, and 7/20 on its own), consistent with its also being the only show with a moderate per-episode correlation. We'll come back to AD.

Finding 2: Per-show correlations are a mixed-sign cluster

Show	N	HI mean	IMDb mean	Pearson r	Spearman ρ
Arrested Development	84	81.92	8.02	+0.392	+0.379
Parks and Recreation	98	78.32	7.99	+0.201	+0.184
Taxi	114	77.52	7.49	+0.182	+0.176
The Office	186	78.20	8.08	+0.158	+0.172
30 Rock	138	84.93	7.93	+0.007	+0.022
Friends	235	73.14	8.29	-0.013	-0.059
Seinfeld	170	77.25	8.25	-0.058	-0.004
Schitt's Creek	80	77.93	7.97	-0.115	-0.077

The range is -0.115 to +0.392. Five shows are positive, three are negative, and 30 Rock's +0.007 is effectively zero. The weighted mean of within-show correlations is about +0.07. Even within a single show, where everything except the individual episode is held constant, the AI and the audience are measuring almost-uncorrelated things, and the sign of the relationship flips from show to show.
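The weighted mean can be reproduced from the table alone. The sketch below assumes the weights are episode counts, which the post does not state explicitly, and adds a Fisher z variant as a common alternative for averaging correlations. Function names are illustrative.

```python
from math import atanh, tanh

# (episode count N, Pearson r) per show, copied from the table above.
WITHIN_SHOW = {
    "Arrested Development": (84, 0.392),
    "Parks and Recreation": (98, 0.201),
    "Taxi": (114, 0.182),
    "The Office": (186, 0.158),
    "30 Rock": (138, 0.007),
    "Friends": (235, -0.013),
    "Seinfeld": (170, -0.058),
    "Schitt's Creek": (80, -0.115),
}


def weighted_mean_r(rows):
    """Average the per-show r values, weighting each by its episode count."""
    total_n = sum(n for n, _ in rows.values())
    return sum(n * r for n, r in rows.values()) / total_n


def fisher_weighted_mean_r(rows):
    """Average in Fisher z space with weights N - 3, then map back to r."""
    total_w = sum(n - 3 for n, _ in rows.values())
    mean_z = sum((n - 3) * atanh(r) for n, r in rows.values()) / total_w
    return tanh(mean_z)


if __name__ == "__main__":
    print(f"N-weighted mean r: {weighted_mean_r(WITHIN_SHOW):+.3f}")
    print(f"Fisher-z mean r:   {fisher_weighted_mean_r(WITHIN_SHOW):+.3f}")
```

With the table's values the N-weighted mean lands at about +0.07, and the Fisher z variant should land close to it.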

Why does AD have a clearly positive correlation while Schitt's Creek has a negative one? We're not entirely sure, but a working hypothesis: AD is unusually joke-dense, and its IMDb voters are self-selected fans who weight craft heavily. Schitt's Creek is more emotion-driven, and IMDb voters reward late-series finales and emotional reveals that the joke-density rubric doesn't see. That hypothesis is testable; we'll come back to it in a future post.

Finding 3: At the show level, the correlation is negative

This is the strangest finding and probably the headline.

Plot each show as one point: mean HI on the x-axis, mean IMDb on the y-axis. Across the 8 shows, Pearson r = -0.287 (Spearman ρ = -0.476).
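A minimal sketch of the show-level calculation, using the rounded HI and IMDb means from the Finding 2 table as the eight points. This is a from-scratch illustration rather than the code behind the reported numbers; the rank helper assumes no ties, which holds for these eight means.

```python
from math import sqrt

# (mean HI, mean IMDb) per show, copied from the Finding 2 table.
SHOW_MEANS = {
    "Arrested Development": (81.92, 8.02),
    "Parks and Recreation": (78.32, 7.99),
    "Taxi": (77.52, 7.49),
    "The Office": (78.20, 8.08),
    "30 Rock": (84.93, 7.93),
    "Friends": (73.14, 8.29),
    "Seinfeld": (77.25, 8.25),
    "Schitt's Creek": (77.93, 7.97),
}
HI = [hi for hi, _ in SHOW_MEANS.values()]
IMDB = [imdb for _, imdb in SHOW_MEANS.values()]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ascending ranks. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(xs, ys):
    """Spearman rho as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    print(f"Pearson r  = {pearson(HI, IMDB):+.3f}")
    print(f"Spearman rho = {spearman(HI, IMDB):+.3f}")
```

Run on the table means, Spearman reproduces the reported -0.476. Pearson lands close to the reported -0.287; a small difference is expected because the table means are rounded to two decimals.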

The shows the AI rates funniest tend to be shows IMDb audiences rate lower on average. 30 Rock is the cleanest example: highest HI in the dataset (84.9), middling IMDb (7.93). Friends is the cleanest counter: lowest HI (73.1), highest mean IMDb (8.29). Seinfeld and Friends sit at the top of the IMDb axis while sitting at the bottom of the HI axis; 30 Rock and Arrested Development sit at the top of the HI axis while sitting in the middle of the IMDb axis.

This is consistent with a known divide in comedy: audience love is built on relationship and emotional payoff, which is a different thing from per-joke craft. The AI is measuring per-joke craft. IMDb is measuring how-much-this-show-meant-to-me. Across our catalog, those two things are mildly anti-correlated.

The disagreement is structured

The disagreement between the two rating systems isn't random — it's interpretable. Here are the 10 episodes where HI rates much higher than IMDb (within-show z-score gap):

Show	Season	Episode	Title	HI	IMDb
Seinfeld	6	14	The Highlights Of 100 (1)	91.1	6.9
Seinfeld	1	4	Male Unbonding	91.6	7.2
Friends	7	21	The One With The Vows	83.0	7.2
Friends	4	21	The One With The Invitation	77.9	6.9
The Office	8	21	Angry Andy	89.5	6.7
The Office	4	13	Dinner Party	98.0	7.6
Seinfeld	1	1	The Seinfeld Chronicles	86.4	7.3
30 Rock	1	1	Pilot	90.1	7.2
The Office	9	17	The Farm	91.9	7.3
Schitt's Creek	1	3	Don't Worry, It's His Sister	88.6	7.5

Pattern: clip shows, pilots, recap episodes, and episodes that fans actively disliked. The AI is reading them as joke-dense without seeing that the audience was already over them. Note "Dinner Party" appearing here even at a 7.6 IMDb: A.V. Club gave it an A, HI agrees (98.0), but its IMDb is "only" 7.6, plausibly because many viewers find it too painful to enjoy.

Now the reverse — the 10 episodes where IMDb rates much higher than HI:

Show	Season	Episode	Title	HI	IMDb
Friends	6	25	The One With The Proposal (2)	59.0	9.2
Schitt's Creek	6	12	The Pitch	66.2	8.7
Taxi	3	20	Latka The Playboy	67.2	8.5
Seinfeld	9	8	The Betrayal	62.9	8.9
30 Rock	5	4	Live Show	77.9	8.6
Schitt's Creek	6	13	Start Spreading The News	73.2	9.2
30 Rock	1	12	Black Tie	77.0	8.5
30 Rock	6	14	Kidnapped By Danger	74.5	8.3
Seinfeld	5	21	The Hamptons	73.4	9.5
30 Rock	7	12	Hogcock!	83.1	8.9

Pattern: series-ending episodes, emotional payoff episodes, format-experiment episodes, and stunt-character showcases. "Start Spreading The News" is the penultimate Schitt's Creek episode. "The One With The Proposal" is the Chandler-Monica proposal. "The Betrayal" is Seinfeld's reverse-chronology episode. "Live Show" is 30 Rock's live-broadcast experiment. "Hogcock!" is the first half of 30 Rock's two-part series finale.

The audience is rewarding what the AI cannot see: ending, weight, format risk, and the cumulative emotional debt of a long-running show paying off. These are events, not jokes. HI scores them as merely competent on craft because they're not joke-dense; IMDb scores them as transcendent because of what they represent.
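The within-show z-score gap behind both lists can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`show`, `title`, `hi`, `imdb`) are hypothetical names, population standard deviation is an arbitrary choice, and the three toy episodes are made-up values that only exercise the function.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_show_gaps(episodes):
    """Return (gap, show, title) tuples, largest HI-over-IMDb gap first.

    Each score is z-scored against its own show's episodes, so the gap
    compares how unusual an episode is on each axis within that show.
    """
    by_show = defaultdict(list)
    for ep in episodes:
        by_show[ep["show"]].append(ep)

    gaps = []
    for show, eps in by_show.items():
        hi = [e["hi"] for e in eps]
        imdb = [e["imdb"] for e in eps]
        hi_mu, hi_sd = mean(hi), pstdev(hi)
        imdb_mu, imdb_sd = mean(imdb), pstdev(imdb)
        if hi_sd == 0 or imdb_sd == 0:
            continue  # z-scores are undefined when a show has no spread
        for e in eps:
            z_hi = (e["hi"] - hi_mu) / hi_sd
            z_imdb = (e["imdb"] - imdb_mu) / imdb_sd
            gaps.append((z_hi - z_imdb, show, e["title"]))
    return sorted(gaps, reverse=True)


# Hypothetical toy data, not taken from the real dataset.
TOY = [
    {"show": "Show A", "title": "ep1", "hi": 90.0, "imdb": 7.0},
    {"show": "Show A", "title": "ep2", "hi": 80.0, "imdb": 8.0},
    {"show": "Show A", "title": "ep3", "hi": 70.0, "imdb": 9.0},
]

if __name__ == "__main__":
    for gap, show, title in within_show_gaps(TOY):
        print(f"{show} {title}: gap {gap:+.2f}")
```

The head of the sorted list corresponds to the "HI much higher than IMDb" table and the tail to the reverse table.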

What this means for what we publish

The site has been positioned around the claim that HI measures "comedy craft, not popularity." This analysis gives that claim empirical backbone instead of leaving it as a hedge:

At the show level, HI is mildly anti-correlated with audience popularity (r = -0.287 across eight shows). Funnier-per-craft and more-loved are not the same thing.
At the episode level within a show, the two axes are at chance to weakly positive — and the disagreement is structured: audiences reward emotional climax; HI rewards joke density.
Our earlier r = -0.005 pooled result understated this. It averaged opposing show-level and episode-level effects to a single deceptively quiet number.

The right way to read the Humor Index is now this: it measures what a writers' room would mean by "this episode is well-crafted comedy." It does not measure what an audience means by "this is my favorite episode." Those two things diverge in interpretable ways, and we can show you exactly where.

Methodology caveats

IMDb ratings drift over time. This snapshot is whatever was in our episode JSONs at scoring time; ratings on long-running shows like Friends and Seinfeld have probably shifted slightly upward since their early days.
IMDb's per-episode rating mixes signal sources. Pilots and finales get massively more votes than mid-season filler. That sample-size imbalance is real but doesn't change the direction of the findings.
The pre-1985 era is represented by only one show (Taxi). The show-level negative correlation could partly reflect era effects we haven't measured yet. As we score Mary Tyler Moore, All in the Family, MAS*H, and Barney Miller, this comparison will get more robust.
This is one external source. Audience ratings on IMDb are a signal of episode quality, not the truth about it. A complementary critic-side validation (A.V. Club letter grades) is coming next. The two together are the right calibration.

Data

If you want to pull the underlying joined dataset — every episode with its HI, IMDb rating, and within-show z-scores — it's available on request. Email hello@thehumorindex.com and we'll send it. The full per-show breakdown is also visible on each show's page on the site.

---

This post extends the earlier [IMDb vs Humor Index](/blog/imdb-vs-humor-index) analysis from April with the full eight-show dataset. The methodology is documented at [our methodology page](/methodology). Questions or pushback: hello@thehumorindex.com.
