POLLS - Why are they wrong?

The Rocketry Forum


Why are polls wrong?

  • Too small a sample: 16 votes (45.7%)
  • Biased poll: 23 votes (65.7%)
  • Vague Questions: 14 votes (40.0%)
  • Lack of participation: 15 votes (42.9%)
  • False answers: 11 votes (31.4%)
  • Results are too close: 5 votes (14.3%)
  • Other: 8 votes (22.9%)

  Total voters: 35

les

Polls. Even after the main election, I continue to see articles about different polls.
Yet the polls often appear to give the wrong answers. WHY??

I decided to be ironic and set up a poll to help decide why polls are wrong...

I do have some thoughts, which I will share in additional posts...
 
First and foremost, I believe the sample size for most polls is too small, typically in the 500 to 1,000 people range. Just what are the demographics that may influence a response? Male/Female? Race? Age? Single/Married? Children? Education level? Religion? Devoutness to that religion? Liberal/Conservative/Middle? Employed/Not/Retired? Poor/Middle/Rich? Other factors? Now let’s assign some numbers based on the number of possibilities (for example, there are 2 possibilities for Male/Female), and I am purposely going for low numbers.

Male/Female = 2, Race = 5, Age = 3 (young/middle/old), Single/Married = 2, Children (Yes/No) = 2, Education = 3, Religion = 5, Devoutness = 3, Liberal/Conservative = 3, Employment = 3, Wealth = 3. Then multiply:

2 x 5 x 3 x 2 x 2 x 3 x 5 x 3 x 3 x 3 x 3 = 145,800 permutations. And if you add other factors, maybe city/rural, or increase the options (like increasing race or religion to 6, 7, or 8), the number gets even higher.

But polls typically have fewer than 1,000 participants. That in itself covers less than 1% of the possible permutations!
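As a quick check on that arithmetic, here is a minimal Python sketch using the same rough category counts assumed above (the counts are just the guesses from this post, not real census categories):

```
from math import prod

# Rough category counts assumed above: sex, race, age, marital status,
# children, education, religion, devoutness, ideology, employment, wealth.
categories = [2, 5, 3, 2, 2, 3, 5, 3, 3, 3, 3]

combinations = prod(categories)   # 145,800 demographic combinations
poll_size = 1000                  # typical poll sample size

print(combinations)                        # 145800
print(f"{poll_size / combinations:.1%}")   # ~0.7% of the combinations
```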

And what about the distribution of people? If a community is 25% polka-dot people and 75% striped (trying not to pick a specific “real” group), shouldn’t the polls also make sure they get the right mix? If 90% of the pollster’s respondents are polka-dot, won’t the results be skewed, since the larger striped group is under-represented? Hence, for whichever factor produced the 25/75 split, shouldn’t the poll interview respondents in sets of four, one polka-dot and three striped? And then also get the right mix for all the other factors?
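For what it’s worth, the standard way pollsters handle exactly this kind of imbalance is weighting the responses after the fact rather than enforcing strict interview quotas. A minimal sketch of the idea, using the hypothetical 25/75 polka-dot/striped split and made-up support numbers:

```
# Hypothetical post-stratification weighting sketch (all numbers invented).
# The population is 25% polka-dot and 75% striped, but the respondents
# came back 90% polka-dot and 10% striped.
population_share = {"polka-dot": 0.25, "striped": 0.75}
sample_share = {"polka-dot": 0.90, "striped": 0.10}

# Weight each group so the weighted sample matches the population mix.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # polka-dot ~0.28, striped 7.5

# Made-up support for some question, by group.
support = {"polka-dot": 0.60, "striped": 0.40}
raw = sum(sample_share[g] * support[g] for g in support)           # unweighted
weighted = sum(population_share[g] * support[g] for g in support)  # weighted
print(f"raw: {raw:.0%}, weighted: {weighted:.0%}")  # raw 58%, weighted 45%
```

The unweighted number badly overstates polka-dot opinion; weighting fixes the mix, but it only works if the true 25/75 split is actually known, which is itself an assumption.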
 
Next is the type of questions, or how the questions are asked, which can bias the results.
Some questions are too vague. Like, do you think “so and so” is doing a good job? Well, maybe I don’t like the outcome, but the person/group is doing a great job trying. So how do you answer such a broad question?
As an example, the company I work for has always done a good job, with on-time deliveries in the top 10% and a Gold Supplier rating. Then we ran into a problem. Deliveries were going to be significantly late, impacting the customer’s ability to supply their own customers. What did our customer do? Awarded us Supplier of the Year! Wait, why? Well, the “problem” was 40” of water from a once-in-500-years flood that filled our plant. We went into panic mode after the water receded, pulling manufacturing and test equipment from the building, tearing everything apart, washing out the mud, drying the assemblies, replacing parts as needed, and getting over 75% of the equipment operational. We got a sister site to start taking over some of the manufacturing while we found a new facility locally to restart in. So, while the customer wasn’t happy that our deliveries would be impacted, they were very impressed with our recovery efforts.

Another example of vagueness: I was once asked, “Would you be willing to spend more for more eco-friendly packaging?” I asked back, “How much more?” The pollster said the survey didn’t say. Well, if an item will go from $9.95 to $9.99, sure, I’ll pay. If it is going to go from $9.95 to $99.95, HECK NO! But how can someone actually answer that question without knowing how much more? It reminds me of an issue our DMV got into. The state was going to issue new plates for all vehicles. They did a poll to see how many people would like to keep the same plate number they presently have. The results showed overwhelmingly that people wanted to keep their numbers. Great! Until they actually came out with the plates. Oh, you want the same number? That’s considered a vanity plate and will cost you over $100 extra. Wait, what? Never mind, just give me a new plate. Unfortunately, the DMV ran into a budget problem that year, because they took the poll result and set their budget assuming they were going to rake in all this extra money from vanity plates...
 
Then there is bias in the questions. Compare the following 3 wordings for a question:
1) Should our politicians get a raise this year?
2) Should our hard-working politicians, who have sacrificed their annual raises for 7 years, finally get a raise?
3) Should our multi-millionaire politicians get a raise while there are still children starving in the streets?

Think the answers might be different depending on which version of the question was asked? By the way, I actually was asked the third one, about the starving children. I told them that the poll was obviously biased and ended the call.

Last, how many people avoid answering polls, or even go so far as to purposely lie on them? I got frustrated enough years ago that, except for some TRF polls, I will not participate in polls anymore; I was fed up with the types of questions, plus I didn’t believe the results. But how does that impact my first point on the demographics?
 
And looking at some of the results in the mid-term elections, many races were within the "stated" polling errors. Basically, the polls say they are accurate to within 3%, yet the election results were decided by less than 1%. In that case the polling error is larger than the actual margin, so the poll can’t reliably call the winner.
 
I think you answered most of your own questions. My best guess is that small samples due to low participation are a big problem. Most people I know don't answer phone calls from unknown numbers, and among the people who do, most would not talk to solicitors or pollsters. For other polls, self-selection bias among participants likely captures views from the fringes but probably doesn't reflect the opinions of the average person.

You do raise a good point with vague questions. It's possible to get the outcome you desire just by the way the question is phrased. The problem exists at the hospital I work at, too. Tying medical reimbursement to patient satisfaction surveys has been a terrible trend. Unfortunately, many of the things that make someone better are quite unpleasant, and they result in poor "hospital as hotel" satisfaction scores and lower reimbursement rates.
 
The political environment today in the U.S. is pretty intense. From what I recall of the 2020 and 2022 elections, the polls weren't that far off, and the final results often ended up within the margin of error.

I mean, if Politician A beats Politician B by less than 1%, is a poll wrong when the real-world result is within its margin of error?
 
The sample size was more than enough for accuracy. The error was in the turnout models that the pollsters used to forecast the final result. The turnout model does not come (directly) from polling data. Also, I think there is more distrust now of strangers asking people for their opinions.
 
I think you answered most of your own questions.
True; in fact, the poll selections were based on my own thoughts.
I do think the main issue is the size of the polls (versus the varied demographics). There are OVER 330,000,000 people in the US. A poll of 1,000 people is only 0.000303% of the population. I already discussed how such a small group doesn't even cover the permutations of potential demographics, yet all these polls typically claim an error of only about 3%. The numbers do not make sense to me....
 
The numbers do not make sense to me....
If you take a stats/probabilities course, you'll learn that there are quantitative techniques to create fairly accurate numerical predictions with deceptively small sample sizes. I can't recall the details, but something about confidence intervals and standard deviation comes to mind.

For example, a 3% margin of error shouldn't be taken literally. Rather, it's most likely something like a 95% (two standard deviations) or 99.7% (three standard deviations) chance that the actual results are within 3 percentage points.
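For what it's worth, the usual back-of-the-envelope formula for a simple random sample is MOE ≈ z * sqrt(p(1-p)/n). A minimal Python sketch, assuming the worst case p = 0.5 and a 95% confidence level (z ≈ 1.96):

```
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

for n in (500, 1000, 2000):
    print(n, f"{margin_of_error(n):.1%}")
# 500  -> ~4.4%
# 1000 -> ~3.1%
# 2000 -> ~2.2%
```

Note that the population size does not appear in the formula at all, which is why a well-drawn sample of about 1,000 gives roughly ±3% whether the population is 330 thousand or 330 million; the hard part is the "well-drawn," not the arithmetic.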
 
If you take a stats/probabilities course, you'll learn that there are quantitative techniques to create fairly accurate numerical predictions with deceptively small sample sizes. I can't recall the details, but something about confidence intervals and standard deviation comes to mind.

For example, a 3% margin of error shouldn't be taken literally. Rather, it's most likely something like a 95% (two standard deviations) or 99.7% (three standard deviations) chance that the actual results are within 3 percentage points.
I did take a statistics course, but too many decades ago. But if those "quantitative techniques" are able to "create fairly accurate predictions with deceptively small sample sizes," then why were so many polls wrong in the 2020 and 2022 elections? Either those techniques are flawed, or the sample size is too small. Or the other factors I discussed have a significant impact. And are those techniques as applicable to people as to, say, a manufacturing assembly process? There will always be a tolerance level during manufacturing that the techniques can compensate for, but where/how people feel about a topic is more "touchy-feely" and may require a larger sample. Trying to avoid politics, but I know many conservatives absolutely hated Clinton in 2016, yet many people were concerned about giving Trump a second term in 2020. In both cases, many people crossed lines and may have voted against Clinton/Trump for President but still backed candidates from the other party for Senate/Congress.
Again, going back to my points on permutations: even selecting respondents at random, how a young, poor, conservative, single, Jewish, unemployed male may consider a topic can be totally different from how a middle-aged, middle-income, liberal, married-with-kids, agnostic, employed female does.
Per my discussion, I came up with nearly 150,000 permutations of voters. But again, most polls typically sample only about 1,000 people, which is still less than 1% of the possible permutations.
 
If you take a stats/probabilities course, you'll learn that there are quantitative techniques to create fairly accurate numerical predictions with deceptively small sample sizes. I can't recall the details, but something about confidence intervals and standard deviation comes to mind.

For example, a 3% margin of error shouldn't be taken literally. Rather, it's most likely something like a 95% (two standard deviations) or 99.7% (three standard deviations) chance that the actual results are within 3 percentage points.
The MOE does not account for the turnout model; it's only the standard error of the mean of the sample, based on the Poisson or binomial distribution. If the sample were 30% R, 50% D, 20% I (for example), the pollsters transform that result into an assumed electorate distribution. The MOE does not include that assumption error.
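To illustrate that, here is a minimal sketch (all numbers invented, not from any real poll) of how the same raw sample produces different toplines under two different turnout assumptions:

```
# Invented numbers for illustration only.
sample_mix = {"R": 0.30, "D": 0.50, "I": 0.20}       # party mix of respondents
support_for_x = {"R": 0.10, "D": 0.90, "I": 0.50}    # share backing candidate X

def topline(assumed_electorate):
    """Reweight support for X to an assumed electorate party mix."""
    return sum(assumed_electorate[p] * support_for_x[p] for p in assumed_electorate)

raw = sum(sample_mix[p] * support_for_x[p] for p in sample_mix)
print(f"raw sample: {raw:.1%}")                        # 58.0% for X, unweighted

turnout_model_a = {"R": 0.33, "D": 0.37, "I": 0.30}    # one turnout assumption
turnout_model_b = {"R": 0.40, "D": 0.33, "I": 0.27}    # a more R-heavy assumption
print(f"model A: {topline(turnout_model_a):.1%}")      # ~51.6% for X
print(f"model B: {topline(turnout_model_b):.1%}")      # ~47.2% for X
# A ~4-point swing from the turnout assumption alone, which a +/-3% sampling
# MOE says nothing about.
```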
 
I completed a Graduate Diploma in Evaluation at the University of Melbourne. Never got to work in that field, but I've been able to apply it to everything I've done since.

Sample sizes are super important, as are form design, data collection, and analysis.

A lot of 'issues' in research are, IMHO, traceable straight back to methodology: attempting to use quantitative methods when a combined qualitative/quantitative methodology is more appropriate. However, qualitative analysis is more time-consuming (read: expensive) and requires a lot of interpretation, resulting in a larger research team and increased budgetary needs.

I do however love talking to people responsible for educational programs and watching their eyes glaze over when I bring up reliability coefficients, internal consistency and interrater reliability in their assessment methods.

I don't get invited to many parties...
 
Many good, wise, perceptive comments here. Thanks, folks!

Why do we not have exit polls any more? They used to be a statistically valid double check on the total counts. But they were made illegal around the time the various hackable electronic voting machines came in (2004+).

These days it might be difficult to make any valid post voting survey, because of vote by mail and other security improvements.

(Edit: for the record I am not a denier, don't get the wrong idea)
 
Why do we not have exit polls any more? They used to be a statistically valid double check on the total counts. But they were made illegal around the time the various hackable electronic voting machines came in (2004+).

We have lots of exit polls in Australia; they can give a good early indication of seat swings. There's an election in Victoria today and even though I now live in NSW I'll still watch the election coverage and vote tally tonight.

Like I said, I don't get invited to many parties. 🤣
 
The polls in 2020 and 2022 were not “wrong”. The results were well within the margins of error. Polling in 2018 was very accurate. And even though a lot of people were surprised by the results in 2016, even that election was not outside the margin of error.

Often it’s not the polling that is wrong, it’s the narrative that comes out of interpreting the polling that’s off, and when the results don’t match the narrative, people wrongly blame the polls. Definitely, the 2022 election did not match the prevailing narratives leading up to the election — same for 2016. But in both cases, the final results were pretty close to what high-quality polling showed.

Sometimes what throws the narrative off is low-quality polling done on behalf of campaigns or other biased parties who have an interest in shaping the narrative to their benefit. That’s the goal. A lot of that type of polling is released in the final weeks of a campaign to suggest a change in “momentum” and to drive a narrative of inevitability. Good analysis will attempt to filter for that. But that kind of low-quality polling still often succeeds in shaping false narratives, at least among audiences who are inclined to believe a narrative that matches what they hope will happen. So if you buy into that narrative or look to that kind of polling to confirm your biases, and you end up disappointed, you might conclude all polling is bad when it’s actually just some polling.

Another issue that makes people think polls are inaccurate is the fact that a lot of people don’t really understand that something that is predicted to have a 60% or 80% likelihood is not 100% guaranteed. Analysts like Nate Silver and many others use polling to run Monte Carlo simulations of elections. They take the polling data and its margins of error, and they run hundreds or thousands of simulations, randomly tweaking each one within the margin of error and also randomly tweaking the turnout models and other variables, and then they count up the results and use them to estimate the probability of each outcome. If they do 1,000 simulations, and the polka dot candidate wins 700 of them and the striped candidate wins 300, they say polka dot has a 70% chance of winning. A lot of people take that as almost guaranteed, forgetting that stripes won in 300 of the simulations. That definitely happened in 2016.
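A toy version of that kind of simulation (just a sketch of the general idea with made-up numbers, not anyone’s actual forecasting model):

```
import random

def simulate_elections(polled_margin, margin_of_error, n_sims=1000, seed=1):
    """Toy Monte Carlo: jitter the polled margin within its error and count wins.

    polled_margin: polka dot's lead over striped, in percentage points.
    margin_of_error: treated here as roughly two standard deviations of error.
    """
    random.seed(seed)
    wins = 0
    for _ in range(n_sims):
        simulated_margin = random.gauss(polled_margin, margin_of_error / 2)
        if simulated_margin > 0:
            wins += 1
    return wins / n_sims

# Polka dot up 0.8 points with a +/-3 point margin of error.
print(f"{simulate_elections(0.8, 3.0):.0%}")
# Prints roughly 70% -- polka dot "should" win, yet striped still comes out
# ahead in about 300 of the 1,000 simulated elections.
```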

And then sometimes voters don’t behave like they usually do, and that’s hard to predict. In the 2022 midterms, the big surprise was the amount of ticket splitting by Republicans. Ticket splitting is not very common anymore. But in 2020 there was significant ticket splitting, with Republicans voting for their Republican Reps, Senators, and down ballot Republicans, but not voting for Trump. And in 2022, Republican ticket splitting was even more pronounced, with Republican voters voting for moderate Republicans but not voting for more extreme MAGA candidates. That’s a kind of voter behavior that might not be predicted by most polling and analysis, although I did see some ticket splitting predictions before the election.
 
Another thing often missed by polling is how much certain issues drive behaviors other than voting that can affect the outcome of elections. For example, polling might be able to determine how many voters are inclined to vote for a candidate, but might not be able to predict how many voters might donate to a campaign or volunteer for a campaign. That’s an important factor. A voter is one vote, but a volunteer might turn out several other voters. One motivated person might provide a LOT more than their own one vote to their candidate.

I think that was a factor in this last election. During the summer, it was pretty clear that reproductive rights (the legal right to choose an abortion and use birth control) was a very motivating issue to certain people. But as the election got closer, polls that asked voters what their most important issue was showed that more than half of voters felt like inflation was their most important issue, and abortion was a top issue for only something like 10 or 15% of voters.

So then a false narrative started to form that the issue of inflation would determine the outcome of the election and the issue of abortion was fading. I never believed that, because the number of voters who rank an issue as their top issue says nothing about how intensely they feel about it or what they are willing to do about it. People upset about their bodily autonomy being taken away feel a lot stronger about it and are willing to do a lot more about it than people upset about inflation. No one decides to donate hundreds of dollars or knock on hundreds of doors because the price of eggs went up 8%. They will do that over reproductive freedom.

And it’s pretty clear from the results, even though more people said inflation was their top issue, abortion made the difference in a lot of these races.
 
I did take a statistics course, but too many decades ago. But if those "quantitative techniques" are able to "create fairly accurate predictions with deceptively small sample sizes," then why were so many polls wrong in the 2020 and 2022 elections? Either those techniques are flawed, or the sample size is too small. Or the other factors I discussed have a significant impact. And are those techniques as applicable to people as to, say, a manufacturing assembly process? There will always be a tolerance level during manufacturing that the techniques can compensate for, but where/how people feel about a topic is more "touchy-feely" and may require a larger sample. Trying to avoid politics, but I know many conservatives absolutely hated Clinton in 2016, yet many people were concerned about giving Trump a second term in 2020. In both cases, many people crossed lines and may have voted against Clinton/Trump for President but still backed candidates from the other party for Senate/Congress.
Again, going back to my points on permutations: even selecting respondents at random, how a young, poor, conservative, single, Jewish, unemployed male may consider a topic can be totally different from how a middle-aged, middle-income, liberal, married-with-kids, agnostic, employed female does.
Per my discussion, I came up with nearly 150,000 permutations of voters. But again, most polls typically sample only about 1,000 people, which is still less than 1% of the possible permutations.
I think @ThirstyBarbarian provided a pretty good explanation/summary in response to your questions/concerns.

You also wrote: "...most polls typically sample only about 1,000 people, which is still less than 1% of the possible permutations."

No argument from me there as to your overall point. But it's amazing how a poll with such a small sample size can usually predict, to within a few percentage points, what the overall group will do. I think you're ignoring how well most polls do given the little information they have. Yes, of course the sample sizes should be bigger, and further analysis should be done to adjust for biases and other variables. But human behavior and world events as a whole are unpredictable, and resources are limited.
 
Just for the record, the latest Marist and NYTimes polls released the weekend before the election were pretty accurate. Also, the consensus generic congressional ballot polls ranged from +2 to +4 GOP, and the actual result was GOP +3.1%. Not bad.
 
A complicating factor is that if people don't trust polls, they may not answer the polling questions honestly.

In addition, creating a good list of polling questions can be rather difficult. You need to ask enough questions to get good demographic information (age, gender, political affiliation, etc.), but not so many questions that people get annoyed and hang up on the pollster. You also need intelligent questions that don't annoy the respondent.

And sample size is probably not well understood by non-statisticians. A poll of 1,000 people doesn't mean that the pollsters just questioned any 1,000 people. Your poll needs a representative sample of the population that you are looking at. And the demographics need to be applicable to the polling questions.

For example, let's say you want to know if people in the US love or hate the Dallas Cowboys. What factors do you need to look at? Is gender an important factor? Maybe. Is religion a factor? I seriously doubt that. Is geographic location a factor? Absolutely. So, one set of demographic questions does not fit for every poll. You generally need some insight into both the expected results and your population to properly design a poll and its questions.
 
Your poll needs a representative sample of the population that you are looking at. And the demographics need to be applicable to the polling questions.
To build off of what you've said, pollsters need to be careful, because in an attempt to get a "representative" sample, you can also inject biases and assumptions that skew the results.

Sometimes, creating a sample in a purely random way may be the best option.
 
Marist generic ballot poll projection result: [screenshot attached]
Actual result (as of today): R +3.1%
 
Have you seen 2000 Mules?


Actually, I did watch it (well, most of it). I felt there were various issues with their methodology, and they obviously had an agenda.
In terms of split tickets, I consider myself "middle of the road." I will vote for a moderate politician over an extremist from the other party, regardless of which party is which.
But I agree: my goal was to understand why there were so many surprises in 2020 and 2022, and I do not want to go down the political path of 2000 Mules or other voter-fraud discussions...

I think Thirsty's comments about the many low-quality polls used to create a narrative for the various talking heads are germane.

Related to that, if there is a close race, and polls show the spotted candidate is slightly ahead, does that cause the striped base to work harder to catch up? Or if the polls are strong for the spotted candidate, might more of those supporters actually skip voting because of a "sure thing" (think Dewey vs Truman)?

Do poll results (or at least the narratives) create a feedback loop that then influences voters and impacts the poll results?
 
Have you seen 2000 Mules?



One clear result of the 2022 midterms was that candidates who promoted 2020 election denial, conspiracy theories, support for January 6th insurrectionists, or the idea that if they won they would somehow overturn or “decertify” the 2020 results or refuse to certify future results — those candidates did not poll well leading up to the 2022 midterms, and they lost.

I don’t think a single candidate promoting these ideas while running for statewide offices like Governor, Lieutenant Governor, Secretary of State, Attorney General, or Senator won. They all lost, and many of them lost very winnable seats that their party would probably have won with less wacko candidates. Polling showed that voters are sick of the BS and lies, and the election results bear it out. That’s another issue where you can see ticket splitting in the 2022 election where an election denier lost while another less cuckoo candidate from the same party won in the same state. Deniers underperformed non-deniers. It’s clearly a losing issue.
 
Polling is hard; it's easy to suck at it OR to give the results the customer wants. Several pollsters do know how to do it right.
 