What is explainable artificial intelligence?

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

If your bank declines your loan application, the person who made the decision must be able to provide an explanation of the process and information that led to the rejection. But we are increasingly entrusting our decisions to artificial intelligence algorithms, and in many cases, we don’t know how those algorithms function.

The difficulty in interpreting AI-based decisions has become a growing concern, especially as AI finds a prominent role in critical domains such as medicine, law enforcement, banking and transportation. The lack of visibility into how AI algorithms work raises challenges, especially in fields where wrong decisions can cause irreparable harm or where regulations allow individuals to challenge automated decisions and require decision-makers to provide explanations for their actions.

Known as the “AI black box,” the problem has spurred several scientists, startups and companies to find ways to create explainable artificial intelligence, AI that explains its decisions or is open to investigation.

Why is it hard to interpret AI decisions?

Firstly, artificial intelligence is a loaded term and encompasses a lot of different technologies, and not all of its subsets suffer from the black box problem. What we’re specifically concerned with is deep learning and neural networks, a branch of AI that has become very popular in the past few years.

Neural networks are software constructs named after the structure of the human brain, though we know that they work in a fundamentally different way from the human mind. Unlike traditional software, where human programmers meticulously write all the rules of an application, neural networks and deep learning algorithms are given relevant sets of data and left to create their own rules. The most basic way to train a neural network is to feed it a set of labeled examples, a process called “supervised learning.” The network finds common patterns and correlations between those examples and uses that information to categorize unlabeled examples.

For instance, when you give a neural network a million labeled pictures of different cat breeds, it runs those images through layers upon layers of artificial neurons and “learns” the common traits of cat pictures. It will then be able to spot cats in new pictures it has never seen before. This approach is extremely effective in tasks that are hard to define with classic rules, such as image classification and voice recognition.
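The idea of learning rules from labeled examples can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it trains a single artificial neuron rather than a deep network, the four-pixel "images" are made up, and the names train_neuron and predict are hypothetical, not part of any library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, pixels):
    """Return a confidence between 0 and 1 that the image belongs to class 1."""
    return sigmoid(sum(w * p for w, p in zip(weights, pixels)) + bias)

def train_neuron(examples, labels, epochs=500, lr=0.5, seed=0):
    """Fit one artificial neuron to labeled examples with gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0]]
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in zip(examples, labels):
            # Gradient of the log loss with respect to the neuron's raw output.
            error = predict(weights, bias, pixels) - label
            weights = [w - lr * error * p for w, p in zip(weights, pixels)]
            bias -= lr * error
    return weights, bias

# Made-up 2x2 "images", flattened as [top-left, top-right, bottom-left, bottom-right].
# Label 1: the top row is bright. Label 0: the bottom row is bright.
examples = [
    [0.9, 0.8, 0.1, 0.2],
    [1.0, 0.7, 0.2, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.0, 0.1, 1.0, 0.9],
    [0.2, 0.0, 0.7, 1.0],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_neuron(examples, labels)

# Two new images the neuron has never seen.
top_bright_score = predict(weights, bias, [0.85, 0.95, 0.05, 0.15])
bottom_bright_score = predict(weights, bias, [0.10, 0.05, 0.90, 0.85])
print(top_bright_score, bottom_bright_score)
```

Nobody wrote a rule saying "the top row is bright." What the training produced is a handful of numbers in weights and bias. With four pixels those numbers are still easy to read, but a real network has millions of them spread across many layers.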

However, since neural networks effectively create their own rules, we can’t know for sure the details of their inner workings. For instance, both a human and a neural network might say the animals in the following picture are cats, but not for the same reasons.


Humans will tell you that cats have pointy ears, triangular noses, whiskers, furry tails… A neural network has no understanding of the different body parts of a cat. It can only tell with a high level of confidence that these pictures contain cats because the structure and color of their pixels resemble the millions of cat pictures it has previously seen.

This is the basic reason why neural networks are opaque. They are black boxes that receive inputs and produce outputs without giving explanations. This is all fine while everything works perfectly. But things can turn problematic when mistakes happen—and mistakes do happen.

Why is explainable AI important?

The human mind is not perfect, a fact that everyone agrees on. We make mistakes all the time, but we can explain the logic behind those mistakes. For instance, a person who mistakes one individual for another can say that, from that specific angle, the facial features and the physical stature led to the wrong identification.

Like humans, neural networks make mistakes. But because of their underlying differences, their mistakes are dramatically different from those of humans, and we can’t easily predict how they will fail. For instance, they might mistake a turtle for a rifle, a cat for guacamole or a panda for a gibbon. What makes this significant is that no human would make the same mistake, because, as mentioned earlier, our visual system works in a fundamentally different way. This means that, without being able to interpret the way AI algorithms work, we have no way to anticipate how the AI black box will fail.

A neural network thinks this is a picture of a rifle. The human vision system would never make this mistake (source: LabSix)

Unexpected mistakes can range from annoying (bad results in image search) to embarrassing (labeling dark-skinned people as gorillas) to harmful (an autonomous vehicle crashing into a tractor-trailer because of AI failure).

Moreover, our AI algorithms might develop hidden biases, manifest sexist or racist behavior, and cause emotional, financial or legal damage to their users. On a more critical level, hackers can exploit the blind spots of neural networks to stage “adversarial attacks” that force them to work in malicious ways, such as tricking an autonomous vehicle into ignoring a stop sign, or tricking a facial lock into mistaking a random person for the real owner of a device and giving them access to its data.
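To make the idea of an adversarial attack concrete, here is a minimal Python sketch in the spirit of the fast gradient sign method. It rests on several assumptions: the "classifier" is a hand-written linear scoring function standing in for a trained network, its weights and the 100-pixel input are made up, and the names score and adversarial_example are hypothetical. Real attacks compute the gradient through a deep network, but the principle is the same: nudge every pixel slightly in the direction that hurts the correct answer most.

```python
def score(weights, bias, pixels):
    """Linear stand-in for a classifier: a positive score means 'stop sign'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def sign(value):
    return (value > 0) - (value < 0)

def adversarial_example(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to a pixel is
    simply that pixel's weight, so the sign of the weight gives the direction.
    """
    return [
        min(1.0, max(0.0, p - epsilon * sign(w)))
        for w, p in zip(weights, pixels)
    ]

# Made-up model and input: 100 pixels with brightness between 0 and 1.
weights = [0.05 if i % 2 == 0 else -0.05 for i in range(100)]
bias = 0.0
pixels = [0.6 if i % 2 == 0 else 0.4 for i in range(100)]

original_score = score(weights, bias, pixels)          # positive: 'stop sign'
perturbed = adversarial_example(weights, pixels, epsilon=0.12)
adversarial_score = score(weights, bias, perturbed)    # negative: decision flipped
largest_change = max(abs(a - b) for a, b in zip(perturbed, pixels))
print(original_score, adversarial_score, largest_change)
```

No single pixel moves by more than epsilon, yet the many small shifts add up and flip the decision. An attacker who cannot see the weights has to estimate that direction by probing the model, which is harder but follows the same logic.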

All of these accidental and intentional mistakes will remain under a shroud of mystery so long as we don’t have ways to explain and investigate decisions made by artificial intelligence.

The legal requirements of explainable AI


Aside from the mishaps, the use of non-explainable AI presents challenges in domains that explicitly require all decisions to be retraceable. An example is law, where the defendant should be able to challenge the decision made by the court. If judges use automated software based on neural networks to pass judgement—even if they use software as a guide for suggestions—they should be able to explain the motives and facts behind their decision.

Other areas include medicine, where doctors should be able to investigate and cross-examine any therapy or treatment that a deep learning algorithm suggests. In loan applications and banking (mentioned at the beginning of the article), an applicant whose request has been rejected is entitled to see the details of the process that led to their application being turned down.

There is also an increasing number of regulations that restrict how companies and organizations make use of automated decisions, especially as data collection and mining practices by tech companies have been at the center of a number of widespread scandals.

One very important example is the European Union’s General Data Protection Regulation (GDPR). Article 22 of the GDPR states: “The data subject shall have the right not to be subject to a decision based solely on automated processing.” The key here is the word “solely,” which means that if an organization wants to use automated decision-making (say for processing loan applications or employee salary adjustments), a human must examine and confirm the conclusions that an AI algorithm reaches before implementing them. But that person can only vet the automated decision if they can retrace the steps that led to it.

Another example is the New York City Council’s algorithmic accountability bill, which will require the validity and fairness of automated decisions to be verifiable. We are likely to see more such regulations as deep learning and other AI techniques find their way into different aspects of our daily lives.

The different approaches to creating explainable AI

Explainable AI is still an evolving field, and scientists are trying to find different ways to make neural networks interpretable without compromising their performance. We have yet to see standards emerge. There are, however, several interesting initiatives aimed at creating explainable AI and keeping track of where the defining attributes of an automated decision come from.

Prominent among them is the Defense Advanced Research Projects Agency’s Explainable Artificial Intelligence project (XAI). DARPA, which is the research arm of the U.S. Defense Department, is exploring interpretable AI as part of its drive to increase the use of automated tools in various areas of the military.

There is also a slate of AI startups and large tech companies working on the subject. As we have discussed in these pages, opening the lid of the black box will be a key component of enabling AI to take the next big leap.

While the specific approaches to explainable AI are too technically involved to discuss here, from a high-level perspective, there are basically two ways to interpret AI decisions.

The first, which is the simpler and more achievable of the two, has to do with investigating AI decisions after they happen. For instance, if a neural network wrongly classifies an image of desert dunes as a nude picture, the operators of the application should be able to investigate the error. There are already methods that allow this kind of post-decision investigation. Staying on the image classification example, when you give an input to a neural network, it usually returns a list of classes, each with a confidence score. Scientists can investigate how different parts of the image have influenced the decision of the neural network by making small changes to the input and observing how those changes affect the confidence scores of the output classes.
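The "change the input and watch the score" idea can be sketched as follows. This is a minimal illustration, not a description of any particular tool: toy_classifier is a hypothetical stand-in for a trained network (it only reacts to the top-left corner of the image), and occlusion_map is an assumed helper name. The function blanks out one patch of the image at a time and records how much the confidence score drops.

```python
import math

def toy_classifier(image):
    """Hypothetical stand-in for a trained network.

    Returns a confidence between 0 and 1. By construction it only responds
    to the brightness of the top-left 2x2 corner of the image.
    """
    corner = image[0][0] + image[0][1] + image[1][0] + image[1][1]
    return 1.0 / (1.0 + math.exp(-(2.0 * corner - 4.0)))

def occlusion_map(image, classify, patch=2, fill=0.0):
    """Blank out each patch in turn and record the drop in confidence."""
    height, width = len(image), len(image[0])
    baseline = classify(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, keep the original intact
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - classify(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

# A made-up 4x4 image in which every pixel is fully bright.
image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_classifier)
for row in heat:
    print(["%.2f" % value for value in row])
```

Patches whose removal causes a large drop are the ones the classifier relied on. Here only the top-left patch matters, which matches how toy_classifier was written. With a real network the same procedure treats the model purely as a black box, which is why it works after the fact without opening it up.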

One of the methods that can help automate this process is the generative adversarial network (GAN), which pits two neural networks against each other to maximize one kind of result. For instance, a GAN can start fiddling with an image at very high speed and generate a map of which areas of the image affect which output class.

Though effective, the first approach still doesn’t look inside the AI black box and is still tackling the issue from the outside. Also, there are instances where we want to know how an AI works before it makes a decision, not after.

The second approach to explainable AI, which is much more challenging, is creating mechanisms that can shed light on the inner workings of neural networks. This is crucial in areas where you want to investigate the potential vulnerabilities of a neural network (say the computer vision technology used in self-driving cars) before deploying it in the real world.

The problem is that neural networks are best optimized when they are left to create their own complex connections. As soon as you try to break down that complexity to help a human analyst explore a network’s mechanisms, you risk damaging its speed and efficiency.

This is an interesting area of research, with many efforts going into AI algorithms that can explain their own decision-making processes. We have yet to see notable results in this domain, and we might still be a few years (or decades) away from AI algorithms that can explain their own inner mechanisms.

Until then, we’ll have to figure out how to create techniques that can bring more transparency, interpretability and accountability to an industry that is becoming more relevant in every aspect of our lives.
