The American election, and Donald Trump’s antics in particular, have provided a lot of slack-jawed entertainment, and in big data you’d hope to find the raw facts. Taking the hype out and paying attention only to the data should give some sort of objective view of where voters are likely to place their support on 8 November, no?
Sadly the answer appears to be “no”. According to a study from the Project for Computational Propaganda (Politicalbots has published a Prezi on the study), both sides have automated “bots” posting to social media on their behalf, so any picture drawn straight from the raw numbers is going to be skewed.
On the bright side, the bots don’t appear to change the outcome in this case: the majority of Tweets are coming out in favour of Trump, and it looks as though the result would be the same even without them. It does, however, illustrate one of the issues with big data: the data itself can be manufactured artificially.
Big data building in bias
It will be under a month before we know whether the apparent bias in favour of Trump is going to translate into real-world votes. The fact that there are more Tweets in his favour than in Clinton’s, however, is open to a number of interpretations. First, it’s possible he’s just more popular. Second, it’s possible that his voters are more emotional or at least more likely to Tweet. Third, it certainly appears to be the case, according to the data released so far, that his supporters (automated or otherwise) have a better idea of how to handle a hashtag and get retweeted.
The first of these possibilities would translate into votes in the real world. The second and third might influence votes, but that is far from certain. As a means of forecasting, counting Tweets is flawed at best.
Lessons for outsiders
Inevitably this leads to some questioning of the value of big data in the first place. This isn’t to say it has no value, but the caveats and “normalisation” applied to any data need to be sophisticated if the data is to be of any use.
First, the presence of automated bots in the presidential race shows that it’s entirely possible to fabricate data in the first place. Depending on the industry in which a marketing professional operates, anything from a slew of fake TripAdvisor reviews to a range of automated bots claiming to be supporters of a politician, football team or anything else can distort the input data.
Interpretation is also a point at which flaws can crop up. Even after eliminating likely bots from the study (and it’s not possible to identify positively which support is coming from bots and which is not), Trump is probably still ahead on social media. However, as already stated, this may not turn into actual votes. In the same way, an awful lot of people will hit “like” on a page for very different reasons. They may “like” a Bentley without any realistic prospect of being able to afford one. Sticking with the car example, masses of people may “like” a Ford and then go and buy one second-hand, leaving the manufacturer’s coffers largely empty. And of course there are the people who will like anything if there’s a prize draw for an iPad at the end of it.
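For anyone who wants to see the bot-filtering point in numbers, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `likely_bot` flag, the `candidate_share` helper and the sample counts are invented and are not taken from the study. It compares each side’s share of Tweets with and without the accounts flagged as likely bots.

```python
from collections import Counter

def candidate_share(tweets, include_bots=True):
    """Return each candidate's share of Tweets as a fraction of the total.

    `tweets` is a list of dicts with a "candidate" label and a
    "likely_bot" flag; both field names are hypothetical.
    """
    counts = Counter(
        t["candidate"] for t in tweets
        if include_bots or not t["likely_bot"]
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}

# Invented sample data, NOT figures from the study.
sample = (
    [{"candidate": "A", "likely_bot": False}] * 40
    + [{"candidate": "A", "likely_bot": True}] * 30
    + [{"candidate": "B", "likely_bot": False}] * 25
    + [{"candidate": "B", "likely_bot": True}] * 5
)

raw = candidate_share(sample)                           # bots included
filtered = candidate_share(sample, include_bots=False)  # likely bots removed

for name in sorted(raw):
    print(f"{name}: raw {raw[name]:.0%}, without likely bots {filtered[name]:.0%}")
```

In this made-up sample the same side leads either way, but its margin shrinks once the flagged accounts are dropped. And since the flag itself is only a guess, the “clean” figure is still an estimate rather than a fact.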
None of this invalidates big data. It does suggest that a better name might be “big and potentially misleading data”.
Meanwhile the only data that’s going to matter in the presidential election will be available in around three weeks – the vote is on 8 November.