October 11, 2024

Polls Are For Strippers: An Explanation

One of the most famous statisticians of the 20th century, George Box, had a saying that I've known since high school (my dad is a world-famous statistician himself who has won the Shewhart Medal, though his area is industrial engineering and experimental design): "All models are wrong. Some models are useful." I say this: polling, as it currently exists in the American political space, is not useful.

I've been extremely dismissive of polling in general for a while, and there are lingering questions from commenters I interact with about where that dismissiveness comes from. Well, here is my perspective.

Polling has several innate problems, and they get worse when the polling is funded by people with agendas who won't spend the money to do it right. Polling is expensive. Response rates on phone calls for polling are minute. Something like 1 in 10 calls (according to a research paper, it's actually just under that, at 9%) gets a response that a firm can actually use. So, to get a sample of 1,000, you have to call roughly 10,000 numbers. That's a lot of time if you're using real people to make the calls (reportedly the more accurate way to do it). It requires a lot of resources, and that gets expensive. Your fly-by-night polling firms, are they investing in their efforts at that level? The news organizations that pay for polling, the same ones seeing contractions at even the highest "talent" levels, are they investing at that level? Most likely not.
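
To put a number on it, the arithmetic is trivial. A quick sketch (the 9% figure is the one from that research paper; everything else is just division):

    import math

    def dials_needed(target_sample: int, usable_rate: float) -> int:
        """How many numbers you have to dial to bank a usable sample."""
        return math.ceil(target_sample / usable_rate)

    print(dials_needed(1000, 0.10))  # 10000 dials at a 1-in-10 usable rate
    print(dials_needed(1000, 0.09))  # 11112 dials at the ~9% rate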

And then you get to the question of incentives: What incentive is there for polling firms to "try and get it right"?

"Well," you say, "that's obvious. You're dumb TJM for not figuring this out. If they're not accurate, then no one will believe them in the future."

That's a great idea, but it doesn't actually hold up against reality. Let's take Quinnipiac as a case study. In 2016, their final poll of the presidential race was Clinton 50, Trump 44. The race ended 48 to 46. In 2020, their final poll was Biden 50, Trump 39. The final result was 51 to 47. Their pattern over the last two cycles is to land within a reasonable (we'll get to that) distance of the Democrat's number and to regularly (sometimes vastly) underestimate Trump's support. They're not very accurate. They're not very good. So what is the generalized opinion of the firm? Well, 538 calls them the 17th most accurate pollster and rates them 2.8/3 stars. How? Quinnipiac is terrible.
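
You can quantify the miss with nothing but the numbers above. A minimal sketch (the toplines are the rounded final polls and results I just cited):

    # Signed error on the margin: positive means the Democrat was overstated.
    # (dem_poll, rep_poll), (dem_result, rep_result) -- rounded toplines from above.
    cycles = {
        2016: ((50, 44), (48, 46)),
        2020: ((50, 39), (51, 47)),
    }

    for year, ((dp, rp), (dr, rr)) in cycles.items():
        miss = (dp - rp) - (dr - rr)
        print(f"{year}: polled D+{dp - rp}, actual D+{dr - rr}, off by {miss} points toward the Democrat")

Four points one cycle, seven the next, both in the same direction. That's a lean, not bad luck.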

I'm not interested in digging into whatever esoteric reasons 538 has for rating Quinnipiac so highly, but it's obvious that regularly getting things wrong isn't actually a stain on the firm's reputation. Getting it wrong doesn't hurt them. So, what incentive is there to be right?


Another aspect that needs to be talked about is the difference between a firm's final poll and every other poll it releases along the way. Which poll is the firm rated on, if there's even a consideration for accuracy? The final poll. What do we mostly end up talking about over the course of an election season? Not the final poll. If the final poll is the only one that counts (again, assuming it actually counts), then is there even an incentive for the earlier polls to be accurate? Who can tell whether they are or not?

You know, we've been fed the line for literally decades that the polls "tighten up" after Labor Day, that the American people don't start paying attention to politics until the final month or two of an election, which is why the polls suddenly and regularly change. Is that even true? How would we know if it's true or not? We can point to election results and say, "this pollster was close and that one was not," about the final poll, but what about the polls in June? Or July? "Well, the electorate completely changed in September" is honestly not a good excuse, and accepting it is silly. The polling industry has gotten the American political commentariat to parrot its excuses for not being anywhere close to accurate for decades now, and it'd be nice if that would stop.

So, all that being said, let's assume, just for argument's sake, that polling is at least trying to be generally reflective of the facts on the ground, that the firms are trying to be accurate whatever their limitations. Should we still accept them?

Nope, because once you get into the nitty-gritty of their methods, you realize how much is smoke and mirrors.

What is the one thing you need to do statistical analysis? It's not special software. It's not even that much of an education in statistics. It's a good sample. What's a good sample? Size is important, but so is randomness. The sample needs to be large enough, and truly random, before you can apply statistics to it with any kind of reliability. How to actually get a random sample is a problem the polling industry has been very publicly wrestling with for a few years now. Long gone are the days when you could rely on the phone book to provide some kind of random population to pull a sample from. What happens when area codes don't matter and certain segments of the population are FAR less likely to answer the phone? What about using online tools? Don't those exclude large segments of the population as well? Without that randomness, which polling firms openly admit to struggling with, you have to rely on modeling the data before you even begin the statistical analysis.
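
Here's a toy simulation of the problem, with invented response rates; the only point is that random dialing plus nonrandom answering produces a sample that looks nothing like the electorate:

    import random

    random.seed(0)

    # Hypothetical 50-50 race, but side A answers the phone twice as often.
    response_rate = {"A": 0.10, "B": 0.05}  # rates invented for illustration

    sample = []
    for _ in range(200_000):                        # random dials...
        voter = random.choice(["A", "B"])           # ...into a 50-50 population
        if random.random() < response_rate[voter]:  # ...with nonrandom response
            sample.append(voter)

    share_a = sample.count("A") / len(sample)
    print(f"raw sample: A {share_a:.1%}, B {1 - share_a:.1%}")  # roughly 67%-33%, not 50-50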

What do I mean by modeling? I mean that you take your raw data set, realize that it probably doesn't reflect reality because it's not actually random and the breakouts don't match the population in any form, and decide something has to be done. So, you say, "Well, we got 50% Democrats, 35% Republicans, and 15% Independents, but we know that's not right. So, we have to reweight this to 35% Democrat, 35% Republican, and 30% Independent." Except...what mix do you actually choose? I chose that mix at random because it "seemed" reasonable. Is it accurate? Where do we go for accurate information about the partisan (could be racial, could be income, could be education) makeup of the electorate? There are sources you can pull, like registration numbers or a larger survey like Gallup's, but then you also have to assume what percentage of each group will vote and how much of the electorate that will be.
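
To make the sensitivity concrete, a minimal reweighting sketch using the made-up mixes from the paragraph above (the within-group support numbers are also invented; nothing here is a real poll):

    # Raw party mix from the phones vs. an assumed turnout mix.
    raw_mix    = {"D": 0.50, "R": 0.35, "I": 0.15}
    target_mix = {"D": 0.35, "R": 0.35, "I": 0.30}

    # Hypothetical share of each group backing the Democrat.
    dem_support = {"D": 0.92, "R": 0.05, "I": 0.45}

    raw_topline      = sum(raw_mix[g] * dem_support[g] for g in raw_mix)
    weighted_topline = sum(target_mix[g] * dem_support[g] for g in target_mix)
    print(f"raw: {raw_topline:.1%}  reweighted: {weighted_topline:.1%}")  # 54.5% -> 47.5%

Seven points of movement in the topline, and every bit of it came from my choice of target mix, not from anything the respondents said.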

We haven't even gotten to the statistics, and we're already piling assumptions on top of each other. What's the point of gathering data at this point if you're just going to change everything around?

This brings me to the margin of error. I seriously, seriously doubt that the posted margins of error are anywhere close to accurate. 3.5% on data that you've had to massage just to get to a place where you can begin to analyze it? Please. But let's give in and say that everything is fine with the process and the margin of error is accurate. Do we even know what that means?

The margin of error is usually interpreted (at least through implication) as the overall margin of error of everything. It's not. It's the margin of error of each data point individually, at the poll's stated confidence level (conventionally 95 percent). So, the poll showing 48-48 with an MoE of 3.5%? The actual range of plausible results given the data runs from 44.5-51.5 to 51.5-44.5. The idea is that the most likely scenario is somewhere between those results...somewhere. That's extremely far from precise. It also expands the idea of what's "outside the margin of error." When you see a poll saying 45-49 with a 3.5% margin of error, the natural implication in our innumerate society is that the lead is outside the margin of error. It's not. It's comfortably inside it, because both numbers can move. And some of these polls have margins of error admitted to be at least 5%. Think of that: the firm is ADMITTING that the gap between the two main numbers could swing by as much as 10 points.
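
For reference, here's the textbook formula behind those published figures, assuming a simple random sample (which, per everything above, these are not):

    import math

    def moe(p: float, n: int, z: float = 1.96) -> float:
        """95% margin of error for one proportion from a simple random sample."""
        return z * math.sqrt(p * (1 - p) / n)

    n = 1000
    single = moe(0.50, n)
    print(f"MoE on one candidate's number: +/-{single:.1%}")  # about +/-3.1%
    # The gap between two near-complementary numbers moves about twice as far,
    # because when one number goes up the other goes down.
    print(f"effective MoE on the lead:     +/-{2 * single:.1%}")  # about +/-6.2%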

"Well, they all agree around the same point, and they're kind of right some of the times."

When you don't know what you're doing, when you don't know the actual result, but you have an inclination that it's generally a 50-50 race, how hard is it to massage the data during modeling to consistently land on results like 48-49? Is there an incentive to stick to that kind of result if everyone else is doing it? If everyone is saying 48-48 but then one firm comes out with 45-55, are they considered probably right or probably wrong? Usually wrong, because, well, no one else sees that. This is called herding, and looking over the history of polling, it seems obvious that it's an extremely common practice, which tells me that almost no one doing public polling is independent or even doing it that well.
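
Herding is actually detectable, because independent samples can't agree more tightly than their own sampling noise allows. A sketch with invented final-poll margins:

    import math
    import statistics

    # Hypothetical final-poll margins (Dem minus Rep, in points) from ten
    # firms each claiming n = 1,000. These numbers are invented.
    margins = [1.0, 2.0, 1.5, 2.0, 1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
    n = 1000

    observed = statistics.stdev(margins)
    # Independent samples of the same roughly 50-50 race would scatter their
    # margins by about 2 * sqrt(p(1-p)/n) from sampling noise alone.
    expected = 2 * math.sqrt(0.5 * 0.5 / n) * 100  # in points

    print(f"observed spread: {observed:.1f} pts; sampling floor: {expected:.1f} pts")
    # A spread far below the floor is the statistical fingerprint of herding.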

"But the aggregates!"

Ask a statistician what averaging polls actually buys you. Averaging genuinely independent polls shrinks the random sampling noise, but these polls aren't independent: they share methods, they share turnout assumptions, and they herd, and a shared bias doesn't average out no matter how many polls you stack. If aggregates are right, it's because of herding, not because averaging statistics is a good idea. And, again, assuming a major party's candidate gets at least 45% of the vote is not a hard assumption. Assuming they'll get no more than 55% is the same thing. That gives you a 10-point window to say that both candidates are in there somewhere, leaning more toward one or the other. And Quinnipiac failed even that in 2020.
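
A quick simulation of why the average inherits the industry's lean (every number here is invented): averaging washes out each poll's random noise, but a bias the polls share passes straight through to the aggregate.

    import random
    import statistics

    random.seed(1)

    true_margin = -2.0   # hypothetical: Republican actually ahead by 2
    shared_bias = 4.0    # hypothetical industry-wide lean toward the Democrat

    # Each poll = truth + shared bias + its own sampling noise.
    polls = [true_margin + shared_bias + random.gauss(0, 3.0) for _ in range(20)]

    avg = statistics.mean(polls)
    print(f"average of 20 polls: {avg:+.1f}; truth: {true_margin:+.1f}")
    # The average looks tight and confident -- and misses by the full shared bias.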

So, polling is imprecise, done badly, and full of assumptions; the firms have their own incentives for whatever they're doing (trying to be accurate or selling a narrative), and accuracy isn't even rewarded by the major players in the game. It's also not that hard to get within a few points, because it's not like we live in a world where a political candidate at the national level is going to get 25% of the vote. Is polling to be dismissed entirely?

I think so. I'm a pretty firm First Amendment guy, but if a law came along outlawing all polling, I'd be hard-pressed to come up with an argument against it that didn't rely on free speech.

Is there something else we can do?

Well, ultimately, even reactions to polling are about gut feelings: "This poll looks right" and "This poll looks wrong." We're automatically dismissive of things that don't mesh with our preconceived notions and accepting of things that do. That creates a set of assumptions about the results right there. But what other actual data is out there?

Well, I started all of this because I found the below thread:


It's a thread from a guy looking at Pennsylvania early voting data, making a series of assumptions and extrapolating from them. I think the assumptions are reasonable considering past behavior, but they're still just assumptions. They could be wrong, but it feels right to me (my own assumption). It also feels more grounded in the reality of how things are actually going in PA than pollsters who make assumptions about turnout based on...whatever they want.

Is Trump going to win? It depends on a whole lot more than voting. Are the polls going to be wrong again? Hell, the most accurate individual pollsters of 2020 were still off by a few points. Pretty much every pollster is "wrong" every single election. It's not a question of who's the most right but of who's the least wrong. Are there indicators out there, other than polling, that could give us a sense of how things will go? For sure.

It's just that polling generally sucks, and I hate it.

Polls are for strippers.

posted by TheJamesMadison at 12:07 PM
