Tuesday, 7 May 2013

How do we make math out of Web 3.0 numbers?

I am deeply suspicious of all the formulas being used on the web.  I suspect that the formulas for page placement in Facebook or Google, or for spam detection, are more self-fulfilling than real.  But I have also seen enough evidence to show me that social media activity is real, and that it reflects real-world activity in many cases. Since much real-world activity can be counted and social media can be counted, there should be some basis for mathematics relating the two.

Elections Case Study

Using the amazing Trendsmap tool I have made a series of maps of which political parties people in the UK were tweeting about on May 2 2013, at noon, during a by-election in mostly rural areas.

From the three maps above we might anticipate that UKIP and Labour would fight to win, and that the Tory party would be far behind.  But that is not really what happened.

Party                              Councils  +/-  Seats  +/-
Liberal Democrat                   0         0    352    -124
United Kingdom Independence Party  0         0    147    +139

Above we see the councils and seats won.  The Conservative party, almost always called the Tories, actually won about twice as many seats as Labour, and Labour won about three times as many as UKIP.  So what gives?

Let's assume that we have a count of tweets for a party, called T, and a count of votes, called V.

Clearly it is not the case that:

V = bT

where b is some constant.  The Tory party and the Liberal Democrats both got many more votes than UKIP, but UKIP got the tweets.  Still, it is not really fair to say that Twitter did not predict anything.
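A quick way to test the proportional model is to compute the constant b that each party would imply on its own; if V = bT held, those values would roughly agree. The sketch below does that in Python. The party labels and counts are made-up placeholders, not the actual tweet or vote counts from this election.

```python
def implied_constants(tweets, votes):
    """Return the b = V / T implied by each party under V = bT."""
    return {party: votes[party] / tweets[party] for party in tweets}


def spread(constants):
    """Ratio of the largest implied b to the smallest (1.0 means a perfect fit)."""
    values = list(constants.values())
    return max(values) / min(values)


# Hypothetical counts for illustration only (not real election data).
tweets = {"A": 1000, "B": 200, "C": 4000}
votes = {"A": 50000, "B": 30000, "C": 20000}

constants = implied_constants(tweets, votes)
for party, b in constants.items():
    print(f"{party}: b = {b:.1f}")
print(f"largest b / smallest b = {spread(constants):.1f}")
```

A spread far from 1 means no single constant b can serve every party, which is the situation described here.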

Rather, let us refine our mathematics.  Votes are a function of V' + dV, that is, the votes of the last election plus a delta of new or lost votes.  Now if we define

dV = bT

That is, if the change in votes is a function of tweets, we get a better picture of the results.  UKIP and Labour dominated Twitter, and though they both lost to the Tories they gained the most new votes.
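If real counts were available, the constant b in dV = bT could be estimated rather than guessed. Here is a minimal sketch using a least-squares fit of a line through the origin, which is my choice of fitting method and not something the data dictates. The tweet counts and vote changes are hypothetical.

```python
def fit_b(tweets, vote_changes):
    """Least-squares estimate of b in dV = b * T (line through the origin)."""
    numerator = sum(t * dv for t, dv in zip(tweets, vote_changes))
    denominator = sum(t * t for t in tweets)
    return numerator / denominator


# Hypothetical data: tweet counts and vote changes for four parties.
tweets = [100, 200, 300, 400]
vote_changes = [250, 300, 800, 900]

b = fit_b(tweets, vote_changes)
# Residuals show how far each party sits from the fitted line.
residuals = [dv - b * t for t, dv in zip(tweets, vote_changes)]
print(f"b = {b:.3f}")
print("residuals:", [round(r, 1) for r in residuals])
```

Large residuals for particular parties would be a sign that tweets alone do not explain the change in votes.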

But even this is not entirely correct.  Labour picked up many more votes than UKIP, and yet the buzz on Twitter was heavily for UKIP.  Also, the Tories lost more votes than the Liberal Democrats, and yet the buzz around the LibDems was really pathetic.

Apparently buzz is a function of current vote performance relative to previous votes:

V = V' + dV
dV = b(T/V')

Therefore I come to my first guess of the formula to predict election outcomes:

V = V' + b(T/V')

That is, the votes in the current election are the votes in the last election plus a constant times the tweets for a party divided by its votes last time.
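To see what this first guess actually does, here is a small Python sketch of V = V' + b(T/V'). The function name, the value of b, and the vote and tweet counts are all hypothetical, chosen only to show the shape of the formula.

```python
def predict_votes(prev_votes, tweets, b):
    """First-guess model from the text: V = V' + b * (T / V')."""
    if prev_votes <= 0:
        raise ValueError("prev_votes must be positive")
    return prev_votes + b * (tweets / prev_votes)


# Hypothetical inputs: two parties with the same tweet count
# but different results last time, and an arbitrary constant b.
b = 1_000_000
small_party = predict_votes(prev_votes=10_000, tweets=500, b=b)
large_party = predict_votes(prev_votes=100_000, tweets=500, b=b)

print(small_party - 10_000)   # predicted gain for the smaller party
print(large_party - 100_000)  # predicted gain for the larger party
```

With the same tweet count, dividing by V' gives the party that was ten times smaller last time a predicted gain ten times larger, so the formula rewards buzz around small parties most.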

Of course this is clearly not correct either, but it is getting closer.  The formula can't be linear.

But before I spend too much time on this I have to point out that even if I make this work for the recent UK by-election, it would not work for the 2012 US election, where Obama had an almost 2-to-1 margin over Romney on Twitter and yet actually won by a smaller margin than he had in 2008.  My formula would have predicted a margin for Romney.

So we are in real infancy here on election prediction, and predictive betting markets remain the best tool on the web to call an election.
