I've written a few posts now on GDELT -- Global Data on Events, Location, and Tone -- the recently created, automatically updating database of more than 200 million political events culled from news accounts.
Over at the Monkey Cage, computer scientist David Masad has a guest post testing out GDELT observations on violent incidents in Syria against volunteered reports of fatalities. He found that "the two trends initially move together: an increase in violent events accompanies an increase in reported deaths, and vice versa," but "the correlation between the two data sources seems to weaken in 2012."
This would seem to support Jay Ulfelder's argument that GDELT is vulnerable to the problem of "media fatigue": "press coverage of a sustained and intense conflicts is often high when hostilities first break out but then declines steadily thereafter." As the level of coverage drops, there are fewer reported events for GDELT to record, even though real-world events are still happening.
I spoke with Kalev Leetaru, one of GDELT's developers, about this issue back in April. "As quality journalism is under attack from all sectors, whether that's government stepping up efforts to squelch it or the collapsing economics of it, we're starting to look at all the citizen journalism that's out there," he told me. On the other hand, social media and participant-reported news create even more quality-control problems.
GDELT's reliance on the media doing its job well is clearly going to remain a problem, particularly for regions that don't get as much attention. Even with falling coverage, GDELT does a remarkably good job monitoring events in a well-covered conflict like Syria. I'm a little more curious to see how it does on, say, Nagorno-Karabakh.