Exit polls: It’s not about right or wrong

There is quite a bit of discussion on whether the exit polls are right or wrong (see here, here, here and here). What I try to do here is to show that it is not about right or wrong. Rather, it is a matter of the magnitude of what is called right or wrong, which I refer to as (a margin of) error. Once we begin to think of the numbers produced by exit polls as estimates (and not the Truth), a different set of questions arises. These are questions fundamental to understanding the numbers presented by exit polls, mainly the process through which those numbers are generated. More specifically: who is sampled? When? Where? How? And by whom?

I use the data published by The Print on May 19[1]. Figures 1 and 2 show both the predicted and actual number of seats for the BJP and the Congress respectively. There are two notable observations.

The first, unsurprisingly, is that every poll, across all the years, is off from the actual number of seats won by either of the two parties. What varies, however, is the magnitude by which each poll misses the actual mark.

The second is that in any given year, while the margin of error varies between the different sources, the direction of error, i.e., whether they overestimate or underestimate, is identical. For example, in 1998, Outlook/AC Nielsen's estimate was 14 seats less than the actual number of seats won by the BJP. Along the same lines, DRS, Frontline/CMS and India Today/CSDS predicted 3, 17 and 38 seats less, respectively. Similarly, in 2014, all sources overestimated the number of seats the Congress would win, the overestimates being 39 (ABP News/Nielsen), 90 (Times Now/ORG India), 44 (CNN-IBN CSDS Lokniti), 21 (News 24/Today's Chanakya), 43 (India TV/C Voter), 62 (India Today/Cicero) and 45 (NDTV/Hansa Research) seats. When one source (over/under)estimates, everyone else seems to follow suit! Perhaps this is suggestive of comparable biases across the sources, which then get reflected in how the surveys are executed.
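The direction-of-error observation can be checked mechanically. Below is a minimal sketch in Python, using the per-source differences (predicted minus actual seats) quoted above for the BJP in 1998 and the Congress in 2014; the helper function `same_direction` is mine, not part of any poll dataset.

```python
# Per-source prediction errors (predicted minus actual seats).
# 1998 BJP example from the text: every source underestimated, so all are negative.
errors_1998_bjp = [-14, -3, -17, -38]

# 2014 Congress example from the text: every source overestimated, so all are positive.
errors_2014_congress = [39, 90, 44, 21, 43, 62, 45]

def same_direction(errors):
    """True if every source errs the same way
    (all overestimate or all underestimate)."""
    return all(e > 0 for e in errors) or all(e < 0 for e in errors)

print(same_direction(errors_1998_bjp))       # True: all underestimates
print(same_direction(errors_2014_congress))  # True: all overestimates
```

A mixed year, with some sources above and some below the actual count, would return `False`; the striking pattern in the data is that this does not happen.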

Figure 1: Predicted and actual seat count for the BJP by different polls across 5 Lok Sabhas
Figure 2: Predicted and actual seat count for the Congress by different polls across 5 Lok Sabhas

Now that we have established that we are looking at variation in errors, in the next set of figures I attempt to shed light on whether these errors are getting larger or smaller over time. Figure 3 plots the average difference across sources in a given year. Figure 4 shows the absolute value of the average difference from Figure 3. There is one key observation.

There is no discernible pattern in the errors across time. We cannot say that the errors in predicting the seats won by the BJP or the Congress are increasing or decreasing over time, i.e., there is no linear trend, and we cannot claim that polls are becoming more or less accurate. What we can say, based on the data, is that in some years the average error is smaller while in others it is larger. For instance, the average difference across sources was smallest for the Congress in 1998 (underestimated by about 10 seats) and largest in 2009 (underestimated by 62 seats). Similarly, the average difference across sources was smallest for the BJP in 1998 (underestimated by 18 seats) and largest in 2004 (overestimated by 76 seats).
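To make the distinction between Figures 3 and 4 concrete, here is a minimal sketch of the two quantities, computed on the 1998 BJP differences quoted earlier (predicted minus actual seats). The signed average keeps direction; the absolute value only keeps magnitude.

```python
# Per-source errors for the BJP in 1998, as quoted in the text
# (predicted minus actual seats; negative = underestimate).
errors = [-14, -3, -17, -38]

# Figure 3 quantity: signed average difference (direction + magnitude).
mean_error = sum(errors) / len(errors)

# Figure 4 quantity: absolute value of that average (magnitude only).
abs_mean_error = abs(mean_error)

print(mean_error)      # -18.0, matching the "underestimated by 18" figure
print(abs_mean_error)  # 18.0
```

Note that the signed average can hide error when sources miss in opposite directions (a +20 and a -20 average to zero), which is why the absolute version in Figure 4 is worth plotting separately, although in this data all sources tend to miss in the same direction anyway.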

Given the figures, it becomes important to inquire into the source(s) contributing to the error. We need to ask questions such as: has the method through which the data is collected, i.e., the survey, changed across the years, and if so, how does that play a role? Specifically, how are samples drawn? Are those who are surveyed responding differently across the years, and why might that be the case? For instance, does a large error mean that voters are reluctant to reveal their true preference because of fear, as suggested here?

Figure 3: Average difference across sources between predicted and actual seats for BJP and Congress
Figure 4: Absolute average difference across sources between predicted and actual seats for BJP and Congress

In conclusion, first, we need to be wary of our own biases, which in this case means refraining from seeing patterns where they do not exist. And second, we need to treat the numbers produced by polls as estimates and not as the truth. Statements such as "the polls are right" or "the polls are wrong" are neither helpful nor useful; all they do is contribute to polarized opinions. What is helpful, and I think useful too, is thinking through how the numbers are generated and what they could mean given the context.

[1] The data for this comes from here: https://theprint.in/opinion/4-health-warnings-you-need-to-know-before-watching-exit-poll-results-2019/237238/