Researchers have recently discovered that anyone can trick hate speech detectors with simple changes to their language—and typos are just one way that neo-Nazis are foiling the algorithms.
By Morgan Meaker
Erin Schrode didn’t know much about the extreme right before she ran for Congress. “I’m not going to tell you I thought anti-Semitism was dead, but I had never personally been the subject of it,” she says.
That changed when The Daily Stormer, a prominent neo-Nazi website, posted an article about her 2016 campaign.
For years, social media companies have struggled to contain the sort of hate speech Schrode describes. When Facebook founder Mark Zuckerberg testified before the Senate in April 2018, he acknowledged that human moderators alone could not remove toxic content from Facebook; they also needed help from technology, he said.
“Over time, we’re going to shift increasingly to a method where more of this content is flagged up front by [artificial intelligence] tools that we develop,” Zuckerberg said.
Zuckerberg estimated that A.I. could master the nuances of hate speech in five to 10 years. “But today, we’re just not there,” he told senators.
He’s right: Researchers have recently discovered that anyone can trick hate speech detectors with simple changes to their language, such as removing spaces in sentences, changing “S” to “$,” or changing vowels to numbers.