Why AI is struggling to detect hate speech

Image credit: Depositphotos

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

As online trolling and hate speech become more problematic, companies like Facebook and Twitter are under increasing pressure to identify and block hateful speech on their networks. And as with many other problems that involve massive amounts of online content, these companies have turned to artificial intelligence for solutions.

All major social media networks use AI algorithms to moderate online content. But while AI shows promise in detecting some types of content, it is hard-pressed when it comes to spotting hate speech.

A recent study by scientists at the University of Washington, Carnegie Mellon University, and the Allen Institute for Artificial Intelligence has found that the leading AI systems for detecting hate speech are deeply biased against African Americans. This includes Google Perspective, an AI tool for moderating online conversations.

The study and the unending struggles of tech companies to automate hate speech detection highlight the limits of current AI technologies in understanding the context of human language.

Understanding language context is hard

Advances in deep learning have helped automate complicated tasks such as image classification and object detection. Artificial neural networks, the key innovation behind deep learning algorithms, learn to perform tasks by reviewing examples. The general belief is that the more quality data you provide a neural network, the better it performs. This is true, to some extent.

At their core, neural networks are statistical machines, albeit very complicated ones. This might not pose a problem for image classification, which is largely dependent on the visual features of objects. For instance, a neural network that is trained on millions of labeled images creates a mathematical representation of the common pixel patterns between different objects and can detect them with remarkable accuracy.

But when it comes to natural language processing and generation (NLP/NLG), machine learning might not be enough. There are still plenty of things statistical representations can do. There are several cases of AI models translating text with impressive precision or generating coherent text. But while those feats are remarkable, they barely scratch the surface of human language. These AI models perform their tasks by calculating the probability that words appear in a certain sequence based on the examples they’ve viewed during training.
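To make the idea of word-sequence probability concrete, here is a minimal sketch of a bigram model in Python. It is a toy illustration and far simpler than the neural networks used in practice; the function names and the tiny corpus are assumptions made up for this example. It estimates how likely a sentence is purely from how often each word followed the previous one in the training text.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows another in the training sentences."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(words, words[1:]):
            follow_counts[prev][curr] += 1
    return follow_counts

def sequence_probability(model, sentence):
    """Multiply the estimated probability of each word given the previous word."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        total = sum(model[prev].values())
        if total == 0:
            return 0.0  # the previous word was never seen in training
        prob *= model[prev][curr] / total
    return prob

# Hypothetical three-sentence training corpus.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)

seen = sequence_probability(model, "the cat sat")    # word order seen in training
unseen = sequence_probability(model, "sat the cat")  # same words, unseen order
print(seen, unseen)
```

The model assigns a nonzero probability to the familiar word order and zero to the reordered one, even though it has no notion of what either sentence means.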

Hate-speech detection models draw their training from data sets that only include sample sentences and their corresponding toxicity scores. In their study, the authors used publicly available AI models that have been trained on millions of annotated tweets and other social media posts.
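As a rough sketch of what learning from sentence and score pairs looks like, the Python snippet below averages the toxicity scores of the sentences in which each word appears. This is not how the models in the study work (those are much larger neural networks); the sample data, the function names and the default score of 0.5 are all assumptions chosen for illustration. The point is only that nothing but the text and its score goes in.

```python
from collections import defaultdict

def train_word_scores(examples):
    """Average the toxicity score of every sentence in which a word appears."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, toxicity in examples:
        for word in set(text.lower().split()):
            totals[word] += toxicity
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def predict_toxicity(word_scores, text, default=0.5):
    """Score a sentence as the mean of its word scores; unknown words get `default`."""
    words = text.lower().split()
    if not words:
        return default
    return sum(word_scores.get(word, default) for word in words) / len(words)

# Hypothetical training data: (sentence, toxicity score between 0 and 1).
examples = [
    ("you are awful", 0.9),
    ("you are great", 0.1),
    ("awful weather today", 0.3),
]
word_scores = train_word_scores(examples)

# The prediction depends on the words alone. There is no input for who is speaking.
score = predict_toxicity(word_scores, "you are awful")
print(round(score, 3))
```

Because `predict_toxicity` receives only a string, the same sentence always gets the same score, no matter who wrote it or to whom.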

But statistics do not represent context. When our brain wants to interpret a sentence, we don’t only consider the sequence of words and how they compare to other sentences we’ve heard before. We also take into account other factors, such as the different characteristics of the person who is speaking. A sentence might sound offensive coming from one person, while another person saying the same thing might be totally fine.

In their study, the researchers from Carnegie Mellon, AI2 and the University of Washington show examples of sentences that would sound hateful and racist if said by a white person but acceptable if said by a black person.

Depending on who is saying a sentence, it may sound toxic or not (source: University of Washington)

The authors suggest that the people who annotate the data should know about the demographics and characteristics of the people who wrote the posts. This will help them improve the quality of the data sets and train AI models that are much more accurate.

It’s hard to agree on what is hate speech

Annotating the data set with relevant metadata sounds like a good idea, and the results of the experiments show that it reduces bias in hate-speech-detection AI algorithms. But there are two problems that would make this solution incomplete.

First, annotating training data with relevant information is an enormous task. In many cases, that information is not available. For instance, tweets don’t contain information about the race, nationality and religion of the author, unless the user explicitly states that information in their bio. Some of that information can be inferred by looking at the timeline of the user and other content they have posted online. But finding and annotating that kind of information is much more difficult than labeling cats and dogs in photos.
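A small Python sketch can show what such an annotated record might look like and why missing information is a practical obstacle. The `AnnotatedPost` class, its field names and the sample posts are hypothetical, not taken from the study or from any real data set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedPost:
    text: str
    toxicity: float
    # Hypothetical context fields; None means the information is unknown.
    author_dialect: Optional[str] = None
    author_region: Optional[str] = None

def metadata_coverage(posts):
    """Return the fraction of posts that carry at least one piece of author context."""
    if not posts:
        return 0.0
    with_context = sum(
        1 for post in posts
        if post.author_dialect is not None or post.author_region is not None
    )
    return with_context / len(posts)

# Made-up sample: only one of four posts has any author information.
posts = [
    AnnotatedPost("example post one", 0.2, author_region="US"),
    AnnotatedPost("example post two", 0.7),
    AnnotatedPost("example post three", 0.1),
    AnnotatedPost("example post four", 0.5),
]
coverage = metadata_coverage(posts)
print(coverage)
```

Every `None` in a record like this is a value that a human annotator would have to look up or infer before the example could be used for context-aware training.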

But even adding author information would not be enough to automate hate speech detection. Hate speech is deeply tied to culture, and culture varies across different regions. What is considered hateful or acceptable can vary not only across countries, but also across different cities in the same country. The relevance of things such as race, gender and religion can also vary when you go to different geographical areas. And culture is something that changes over time. What is considered the norm today might be considered offensive tomorrow.

Hate speech is also very subjective. People of similar backgrounds, races and religions often argue over whether something is hateful or not.

It’s hard to see how you could develop an AI training data set that could take into account all those factors and make sense of all these different complicated dialects we’ve developed over thousands of years.

When it comes to vision, hearing and physical reflexes, our brain and nervous system are far inferior to those of wild animals. But language is the most complicated function of our brains.

All animals have some way of communicating with each other. Some of the more advanced species even have rudimentary words to represent basic things such as food and danger. But our ability to think in complicated ways and communicate knowledge, opinions and feelings gives us the edge over all other living beings. Neuroscientists still haven’t been able to find out the exact mechanisms of formation and interpretation of language in the human brain.

Many companies think they can outsource their NLP tasks to outside contractors, hoping that human labor will train their AI and eventually create a fully-automated system.

But it’s difficult to imagine anything short of a large-scale human brain being able to make sense of all the different nuances of the diverse languages of the people who inhabit this planet. For the moment, our AI algorithms will be able to find common patterns and help filter down the huge amounts of content we create, but we can’t remove humans from the loop when it comes to detecting hate speech.
