It’s no exaggeration to say that fake news is the scourge of our time. But figuring out which tales are true and which are out-and-out dishonest is no small feat. Of course, you can always turn to fact checkers like Snopes or Politifact, but they work on a story-by-story basis. Now, thanks to a new method from MIT, you can cut the problem off at the source.
Digital Due Diligence
Online media companies — especially social media companies like Facebook and Twitter — seem to be taking the threat of fake news seriously, although their attempts to address it have often been ineffective or have even made the problem worse. Facebook in particular has tried both human-moderated news filters and AI-based ones, with mixed results in both cases. It’s one thing to identify a particular fake story, or even a particular source that tends to publish many fake stories, but with new sources sprouting like mushrooms, cutting off any single source can only do so much.
Enter MIT. In order to find some rhyme or reason in the flurry of fake news online, researchers from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and the Qatar Computing Research Institute (QCRI) employed machine learning systems, which can make surprisingly accurate predictions from relatively little data. In simplest terms, a machine learning system works by attempting to make a prediction, then adjusting its next prediction based on the accuracy of the previous one. The result is that after running through this process several thousand (or million) times, the system develops a very accurate — although often opaque — model for getting the right answer.
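That predict-then-adjust loop can be sketched in a few lines. To be clear, this is a toy illustration, not the team's actual model: a simple perceptron trained on two made-up numeric features (the feature names, data, and labels are all hypothetical).

```python
# Toy sketch of the predict-then-adjust loop (not MIT's actual model).
# Features are hypothetical: [hyperbolic-word rate, avg sentence length].
training_data = [
    ([0.9, 0.2], 1),  # label 1 = fake
    ([0.8, 0.3], 1),
    ([0.1, 0.8], 0),  # label 0 = reliable
    ([0.2, 0.9], 0),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for _ in range(100):                      # repeat the loop many times
    for features, label in training_data:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        prediction = 1 if score > 0 else 0
        error = label - prediction        # 0 if right, +/-1 if wrong
        # Adjust the model only when the prediction was wrong.
        weights = [w + learning_rate * error * x
                   for w, x in zip(weights, features)]
        bias += learning_rate * error

# After training, the model separates the toy examples correctly.
print([1 if sum(w * x for w, x in zip(weights, f)) + bias > 0 else 0
       for f, _ in training_data])  # -> [1, 1, 0, 0]
```

The real system uses far richer features and a more sophisticated classifier, but the core feedback loop — predict, compare against a trusted label, adjust — is the same.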
To ensure their system’s accuracy, the researchers needed a reliable benchmark for calling out fake news. For that, they turned to Media Bias/Fact Check (MBFC), a resource that uses human fact checkers to track the biases of more than 2,500 media websites, ranging from major players like Fox and MSNBC to low-traffic content farms (you know, the kind with URLs like http://www.patriot-news.eagle). They fed a selection of each website’s articles into their system, then checked whether it made the same call as MBFC. Each wrong answer prompted the system to adjust its model, and each right answer confirmed it was on the right track.
Widening the Net
In their paper, which isn’t yet published in a peer-reviewed journal, postdoc Ramy Baly and the rest of the team describe how their system currently works. Says Baly, “If a website has published fake news before, there’s a good chance they’ll do it again. By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.”
At this point, the system needs only about 150 articles from a particular source to make a judgment call. That’s enough to rate a source’s trustworthiness as high, medium, or low with about 65 percent accuracy, and to detect whether it’s left-leaning, right-leaning, or moderate with about 70 percent accuracy.
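To give a feel for what those ~150 articles buy you: a per-source judgment typically comes from aggregating article-level features into a single source-level profile. The sketch below is purely illustrative; the feature names, values, and the three-way decision rule are invented for the example, not taken from the paper.

```python
# Sketch of aggregating article-level features into a per-source judgment.
# Feature names, values, and thresholds are hypothetical.
def source_profile(article_features):
    """Average each feature across a source's articles (e.g. ~150 of them)."""
    n = len(article_features)
    keys = article_features[0].keys()
    return {k: sum(a[k] for a in article_features) / n for k in keys}

articles = [
    {"hyperbole": 0.8, "subjectivity": 0.7},
    {"hyperbole": 0.6, "subjectivity": 0.9},
    {"hyperbole": 0.7, "subjectivity": 0.8},
]

profile = source_profile(articles)

# Hypothetical three-way call: high / medium / low trustworthiness.
score = 1.0 - (profile["hyperbole"] + profile["subjectivity"]) / 2
label = "high" if score > 0.66 else "medium" if score > 0.33 else "low"
print(label)  # -> low
```

Averaging over many articles is what makes the call robust: any single article can be an outlier, but a source’s habits show up in the aggregate.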
As it turns out, the factors that revealed the most about a source’s accuracy were its linguistic features, such as sentiment, complexity, and structure. Fake news sites relied on hyperbolic, subjective, and emotional language, but the words weren’t the only features that set them apart. The system also found fake news sites tended to have shorter Wikipedia pages, with more references to phrases like “conspiracy theory” or “extreme.” Even the URLs could be a giveaway: special characters and complicated subdirectories were more commonly found with unreliable sources. No more .eagle sites.
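As a rough illustration of how a few of those giveaways might be counted — special characters in the URL, subdirectory depth, hyperbolic wording — here is a small sketch. The word list and the choice of "allowed" URL characters are invented for the example; the study’s actual feature set is far richer.

```python
from urllib.parse import urlparse

# Hypothetical list of hyperbolic/emotional words; the real feature set is richer.
HYPERBOLIC_WORDS = {"shocking", "unbelievable", "destroyed", "outrageous"}

def url_features(url):
    """Count URL signals the study associated with unreliable sources."""
    parsed = urlparse(url)
    special_chars = sum(1 for c in parsed.netloc + parsed.path
                        if not (c.isalnum() or c in "./-"))
    subdir_depth = len([p for p in parsed.path.split("/") if p])
    return {"special_chars": special_chars, "subdir_depth": subdir_depth}

def text_features(text):
    """Count crude linguistic signals in an article's text."""
    words = text.lower().split()
    hyperbole = sum(1 for w in words if w.strip(".,!?") in HYPERBOLIC_WORDS)
    return {"hyperbolic_words": hyperbole, "word_count": len(words)}

print(url_features("http://example.com/politics/2018/09/story_page-1"))
print(text_features("Shocking! The senator was DESTROYED by this unbelievable claim."))
```

Features like these would then be fed into a classifier alongside the Wikipedia-page and sentiment signals described above.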
While the public can’t take advantage of the system at the moment, the researchers hope to release an app that helps people leave their political bubbles by giving them articles across the political spectrum for any given news story.