Tag Archives: Data mining

2016 USA Presidential Election Forecast

By definition, forecasting election results is extremely difficult. A reliable forecast cannot rely on opinion polls alone; it should also account for the impact of historical, social and economic variables, combined with factors such as the “possible behavior” or “psychological reactions” of voters.

There is always a real risk of overlooking or underestimating some essential variable that affects voters' decisions on election day itself.

The graphs below are based on data from sites commonly considered reliable and trustworthy. Even so, these charts should in no case be regarded as scientific or reliable: they are merely the result of the data processing described in a post published yesterday via Medium.

As described empirically by the Technical University of Munich through the paper “The mere number of tweets reflects voter preferences and comes close to traditional election polls”, the analysis below assumes a direct relationship between the number of tweets generated during an electoral contest by a candidate and the final election results.

To temper this assumed direct relationship between the number of tweets and the final electoral results, I also considered other variables, as described in the post published yesterday via Medium.
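The tweet-count assumption described above can be sketched in a few lines of Python. This is only a minimal illustration of the raw baseline (each candidate's share of all counted tweets read as a rough proxy for vote share); it does not include the additional variables mentioned above, and the function name, candidate labels and counts are placeholders invented for the example, not data from the analysis.

```python
def tweet_share(tweet_counts):
    """Return each candidate's fraction of all counted tweets."""
    total = sum(tweet_counts.values())
    if total == 0:
        raise ValueError("no tweets counted")
    return {name: count / total for name, count in tweet_counts.items()}


# Placeholder counts for illustration only, NOT real data.
example_counts = {"Candidate A": 600, "Candidate B": 400}

shares = tweet_share(example_counts)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under the direct-relationship assumption, these fractions would be read as a first estimate of each candidate's result, to be corrected afterwards with other variables.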

[Image: United States map]

[Image: totals]

[Image: totals, second chart]

How to dig up more information on Wikipedia using Google

Wikipedia contains a great deal of information about millions of topics, but a single topic page doesn’t contain all the available information or links about its subject. In fact, further details about a topic, or related to it, can sit in other parts of Wikipedia that are not linked from the main page.

In my experience, you can sometimes find really interesting details about a topic simply by using a Google query such as:

site:en.wikipedia.org "Chet Baker"

Here the topic is “Chet Baker”, and Google searches for it across the whole en.wikipedia.org site. If you look closely at the results, you can find information that is not contained in the “Chet Baker” page on Wikipedia.
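If you run this kind of search often, the query can be built programmatically. The snippet below is a small sketch, assuming the usual Google search URL with a `q` parameter; the helper name `wikipedia_site_query` is made up for this example.

```python
from urllib.parse import urlencode


def wikipedia_site_query(topic, site="en.wikipedia.org"):
    """Build a Google search URL restricted to one site, with the topic as an exact phrase."""
    query = f'site:{site} "{topic}"'
    # urlencode escapes the colon, spaces and quotes so the URL is safe to open
    return "https://www.google.com/search?" + urlencode({"q": query})


print(wikipedia_site_query("Chet Baker"))
```

Changing the `site` argument (for example to another language edition of Wikipedia) restricts the same search to a different site.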

This tip is really simple, but I think it could be useful for journalists, data miners or anyone who is not satisfied with a simple Wikipedia search.

Strategic market analysis through Twitter – Know your competitors’ strategy and fight them!

Your competitors are on Twitter and they are very active. Good to know, but is their strategy really effective? That is hard to tell, because you normally don't have enough time to constantly monitor more than a couple of them. A good market analysis approach is to read their posts over a certain period of time, but Twitter doesn’t make this easy: you have to spend hours expanding their tweets before you get a good view of what your competitors have written. Moreover, Twitter allows you to view only the last 3,200 tweets.

For this reason I normally use Twitter XL. This free online service lets you fetch and save the last 3,200 tweets of an account: just enter the Twitter name and click on Get Tweets. After a few seconds (or, more often, a few minutes) you will have a complete list of the last 3,200 tweets (retweets included), which you can export in CSV format. The file can be imported into a spreadsheet (e.g. OpenOffice) and viewed clearly.

If you really know what you are looking for, you get a terrific picture of the timing, source, content and trends of the competitor you are monitoring. Not only can you discover when and about what your competitor is most active, but you will also know whether they are using a particular online service to tweet. This data mining tool is recommended for business, or just for fun if you are a meddler.
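As an alternative to a spreadsheet, the exported CSV can also be summarized with a short script. The sketch below counts tweets per hour of day and per posting source. The column names (`created_at`, `source`, `text`), the timestamp format and the sample rows are all assumptions made for this example; the real export may be laid out differently, so adjust the parameters to match your file.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def summarize_tweets(csv_file, time_column="created_at", source_column="source",
                     time_format="%Y-%m-%d %H:%M:%S"):
    """Count tweets per hour of day and per posting source.

    Column names and time format are assumed, not taken from a real export.
    """
    by_hour = Counter()
    by_source = Counter()
    for row in csv.DictReader(csv_file):
        posted = datetime.strptime(row[time_column], time_format)
        by_hour[posted.hour] += 1
        by_source[row[source_column]] += 1
    return by_hour, by_source


# Placeholder rows standing in for an exported file, NOT real tweets.
sample = io.StringIO(
    "created_at,source,text\n"
    "2016-01-04 09:15:00,Web Client,Placeholder tweet one\n"
    "2016-01-04 09:40:00,Scheduling Tool,Placeholder tweet two\n"
    "2016-01-05 17:05:00,Web Client,Placeholder tweet three\n"
)

by_hour, by_source = summarize_tweets(sample)
print(by_hour.most_common(1))   # busiest hour of the day
print(by_source.most_common())  # which clients or services post the tweets
```

With a real export, replace `sample` with `open("export.csv", newline="", encoding="utf-8")`, where the file name is again just a placeholder.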