The polling industry called the EU referendum wrong and was wide of the mark in the last UK election. All eyes are now on the upcoming US election to see whether anyone will call it right.
Many, political experts included, are quick to tell us that this election is different and much harder to call. This is true, but part of the reason is that the world itself is different from what it was even four years ago. People gather, think, communicate, consume information and are influenced very differently. Online communities are nothing like the captive TV audiences of the past, and ideas and political statements no longer flow one way, from broadcaster to voter. Debates take place online, and opinions are challenged, altered and swayed in almost real time.
Poll-based forecasting was developed in a world where individuals, groups and demographics were more simply defined. Certain age groups, geographies and education levels voted consistently. Polling, voting data and commentary from previous elections provided the context and data to build a forecast. Phone a thousand people, apply some defined weighting, and out pops your predicted result.
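To make "apply some defined weighting" concrete, here is a minimal sketch of one common approach, weighting each respondent so that the sample matches known population shares. The group names, candidate labels, population shares and the function name `weighted_vote_share` are all invented for illustration; real pollsters weight on many more variables.

```python
from collections import Counter

def weighted_vote_share(responses, population_share):
    """Estimate vote shares from (group, choice) responses.

    Each respondent is weighted by population share / sample share of
    their group, so over-sampled groups count for less.
    """
    sample_counts = Counter(group for group, _ in responses)
    n = len(responses)
    totals = {}
    for group, choice in responses:
        sample_share = sample_counts[group] / n
        weight = population_share[group] / sample_share
        totals[choice] = totals.get(choice, 0.0) + weight
    total_weight = sum(totals.values())
    return {choice: w / total_weight for choice, w in totals.items()}

# Hypothetical sample: older voters answer the phone more often,
# so they are over-represented (6 of 10 respondents).
sample = ([("older", "A")] * 4 + [("older", "B")] * 2
          + [("younger", "A")] * 1 + [("younger", "B")] * 3)

# Assumed shares of the electorate, for illustration only.
population_share = {"older": 0.4, "younger": 0.6}

raw = Counter(choice for _, choice in sample)
raw_share_a = raw["A"] / len(sample)
shares = weighted_vote_share(sample, population_share)

print(f"Unweighted share for A: {raw_share_a:.3f}")
print(f"Weighted share for A:   {shares['A']:.3f}")
```

In this toy sample the raw responses split evenly, while the weighted estimate puts candidate A at roughly 42%, because the group that favours A is over-represented among respondents.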
Traditional polling is not that easy any more. Vast swathes of voters aren’t contactable by phone or interested in taking online surveys. People change their minds more readily. Long-standing party loyalty is waning, as are communities that existed around political ideologies. People read multiple news sources and are as likely to be influenced by a Tweet from the other side of the world as by their parents, peers or political commentary. So, do traditional polls offer less value to forecasters and, if so, what can forecasters do about it?
The information explosion: The source of, and solution to, the polling problem
If you were to design a perfect method of predicting how people would vote, you would want to ask every voter individually. That would, of course, be more difficult than running an election campaign. But vast numbers of people now make their views and opinions public, and almost everyone makes up their mind based on publicly available information, so we have a lot of data to work with.
Collating all this information and reaching any kind of meaningful conclusion would have been impossible until just a few years ago. Modern analytics now allows us to blend data from many different environments and use machine learning and AI techniques to convert this vast amount of messy data into insights we can understand and base informed decisions on.
Social media is an obvious source of such data. Looking at Twitter, Facebook, Instagram and Reddit, even Tinder, can give us a great deal of information on opinion and how people will most likely vote. Newspapers, blogs and media comment sections are also a rich source of information and, unlike most internet forums, can reflect the views of an older generation. However, the challenge of any prediction from data is to make sure that your data is representative, and this may not be possible using public data streams.
Many analyses of these information sources look for keywords, which only tells us how many people are Tweeting about Trump or Hillary, not whether they will actually vote for them. However, with correct training, AI can be used to spot positive and negative sentiment, whether a comment was an outlier, and even sarcasm and satire. In this way AI builds a picture of the voting intention of large communities of social media users. This is not something for the distant future: there are already companies applying these techniques.
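The gap between counting keywords and reading sentiment can be shown with a deliberately crude sketch. The candidate names "Smith" and "Jones", the posts, and the tiny word lists are all made up, and a fixed word list stands in for the trained model a real system would use; it cannot detect sarcasm, satire or outliers.

```python
import re

# Toy sentiment lexicon, invented for this example only.
POSITIVE = {"support", "great", "love", "trust"}
NEGATIVE = {"never", "awful", "liar", "against"}

def tokens(text):
    """Lower-case a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def mentions(posts, name):
    """Keyword approach: count posts that mention the candidate."""
    return sum(name.lower() in tokens(p) for p in posts)

def net_sentiment(posts, name):
    """Sentiment approach: positive minus negative words in posts
    that mention the candidate."""
    score = 0
    for post in posts:
        words = tokens(post)
        if name.lower() not in words:
            continue
        score += sum(w in POSITIVE for w in words)
        score -= sum(w in NEGATIVE for w in words)
    return score

# Hypothetical posts about two hypothetical candidates.
posts = [
    "I love Smith, great plan",
    "Smith is a liar",
    "Never voting for Smith, awful",
    "I trust Jones",
]

for name in ("Smith", "Jones"):
    print(name, "mentions:", mentions(posts, name),
          "net sentiment:", net_sentiment(posts, name))
```

Here Smith is mentioned three times as often as Jones, yet the net sentiment towards Smith is negative and towards Jones positive, which is exactly the distinction a keyword count misses.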
Social media obviously cannot provide a complete picture. Polls still have value because they ask a direct question about intention. Older voters, who use social media less, are still reasonably willing to answer the phone, so for this group phone surveys are particularly useful.
None of these data sources alone has all the answers, but together they build up a much more accurate picture. Where AI really offers something different from previous approaches is the relative ease with which it can bring lots of different messy data sources together and compare them. With the right training, AI can break down data into different demographics, and apply sentiment analysis to gain insight into how different groups will vote.
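One simple way to picture blending sources by demographic is to pool the estimates for each group, trusting the less noisy source more, and then combine the groups by their share of the electorate. The sketch below uses inverse-variance weighting, which is a standard statistical technique chosen here for illustration rather than a description of any particular forecaster's method. All figures, group names and function names are invented.

```python
def pool(estimates):
    """Inverse-variance weighted mean of (share, standard_error) pairs.

    Sources with smaller standard errors get proportionally more weight.
    """
    weights = [1.0 / (se ** 2) for _, se in estimates]
    weighted_sum = sum(w * share for w, (share, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)

def national_estimate(by_group, population_share):
    """Combine pooled per-group estimates using population shares."""
    return sum(population_share[g] * pool(by_group[g]) for g in by_group)

# Hypothetical support for one candidate, as (share, standard error).
# Assumption: phone polls are more reliable for older voters and
# social media sentiment is more reliable for younger voters.
by_group = {
    "older":   [(0.55, 0.03),   # phone poll
                (0.45, 0.06)],  # social media sentiment
    "younger": [(0.40, 0.06),   # phone poll
                (0.48, 0.03)],  # social media sentiment
}
population_share = {"older": 0.5, "younger": 0.5}

for group, estimates in by_group.items():
    print(f"{group}: pooled support {pool(estimates):.3f}")
print(f"national: {national_estimate(by_group, population_share):.3f}")
```

The hard part in practice is not this arithmetic but estimating how reliable each source is for each group, which is where the training described above comes in.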
Historical data is also important. Polling, voting data and commentary from previous elections provide the context and the data used to train machine learning models and AI to recognise the behaviour patterns that indicate how people express opinions and vote.
There are warnings and caveats, however. Firstly, machine learning models can only work within the context they are provided with. They will not be able to foretell a major political upset which could suddenly change a lot of minds, a candidate scandal for example.
Secondly, getting it right is complicated. It requires data scientists who can develop the sophisticated models and train them to a point where they can start digesting publicly available information on their own and produce meaningful predictions.
But, and this would not have been true as recently as four years ago, it is possible. Data analytics, machine learning and AI can now be employed to blend lots of different data sources and develop more accurate predictions than traditional polls alone. Similar approaches to understanding human sentiment across large groups, based on public data, are already used in the finance, fraud prevention, retail and investment industries, so we are not talking about science fiction. WikiLeaks also recently revealed that the Hillary Clinton campaign was at least investigating the use of analytics as part of its campaign.
Humans are complex and any computer model will contain uncertainties. But a well-trained AI model may now be able to comfortably out-predict traditional experts. It will very likely play a part in political forecasting in future; how big a part depends on the appetite of forecasters. AI is being used to combine diverse data sources and derive valuable insight in all sorts of industries. There’s no reason polling shouldn’t be one of them.
Tessella, Altran’s World Class Center for Analytics, is part of the Altran Group, a global leader in Engineering and R&D Services. Tessella uses data science to accelerate evidence-based decision making, allowing businesses to improve profitability, reduce costs, streamline operations, avoid errors and out-innovate the competition.
Made up of innovative problem solvers and passionate about science and technology, Tessella is committed to excellence and to its clients’ success. Tessella works at delivering innovative and pragmatic answers to some of the most important and ambitious challenges of the world’s most forward-thinking organizations.