Mapping the world with Tweets

A new paper in the peer-reviewed online journal First Monday summarizes the results of a project that used geographic data gathered from tweets to create a picture of the world according to Twitter.

The researchers, led by GDELT co-creator Kalev Leetaru, used the Twitter decahose, a massive feed of 10 percent of all tweets, access to which is normally sold at a high price to marketers. The project covers the period from Oct. 23, 2012, to Nov. 30, 2012. During this time, 1,535,929,521 tweets were streamed from 71,273,997 unique users -- about 2.8 terabytes' worth of data. But only about 3.04 percent of those contained geolocation data -- either exact coordinates from mobile phones or user-selected locations. All the same, that's an awful lot of geographical information, and it allowed the authors to create this map of a month in the life of Twitter (bigger, high-resolution version here):

Of course, the most conspicuous black spot is the world's most populous country, China, where Twitter is banned. Those in China who do use Twitter rely on advanced VPNs and often change their identified location, which keeps them from showing up accurately on the map.

If you zoom in on a region with a high rate of Twitter use, you can get a fairly accurate picture of population density, similar to what you would see in a satellite photo showing electric lights at night. Here's the United States:

This map (full size) shows the tweets color-coded by the language they're written in. Note the colorful dots showing non-English tweets in the United States:

This one (full size) maps only tweets in English to give a picture of the Anglophone Twitterverse. Note the heavy concentration of English-language tweeting in Japan:

They also looked at how representative Twitter is compared to the mainstream media. In the following map (full size), locations mentioned in the Google News RSS stream over the period studied are noted in red, while tweets are blue. The authors write that "Mainstream media appears to have significantly less coverage of Latin America and vastly [greater coverage] of Africa. It also covers China and Iran much more strongly, given their bans on Twitter, as well as having enhanced coverage of India and the Western half of the United States."

Finally, they examined the geography of retweeting. It turns out people are surprisingly unencumbered by physical distance when it comes to their retweets. The study finds that "average distance between all 32.5 million retweet pairings in which both users have known Exact Location positions is 749 statute miles." For reference, that's roughly the distance from New York to Atlanta as the crow flies. Here's a map of those retweets (full size):
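For readers who want to check the crow-flies comparison themselves, here is a minimal sketch of a great-circle distance calculation using the haversine formula. It is not the paper's method, which the article doesn't describe; the city coordinates and the Earth radius below are approximate values assumed for illustration, and the function name is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

# Assumed mean Earth radius in statute miles (approximate).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in statute miles between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Approximate city-center coordinates (assumed for illustration).
NEW_YORK = (40.7128, -74.0060)
ATLANTA = (33.7490, -84.3880)

if __name__ == "__main__":
    distance = haversine_miles(*NEW_YORK, *ATLANTA)
    print(f"New York to Atlanta: {distance:.0f} statute miles")
```

With these assumed coordinates the result should land within a few miles of the study's 749-mile average, which is why the comparison works as a rough yardstick.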

There's plenty more in the paper itself, including a list of the world's most retweeted cities. Not surprisingly, New York City is number one, but I was surprised to see that my current hometown, Washington, D.C., didn't even crack the top 20, which includes such seemingly unlikely places as Riyadh, Porto Alegre, and San Antonio. Guess we're not the center of the world after all.