Explainable AI: Interpreting the neuron soup of deep learning

Kate Saenko, Associate Professor at the Department of Computer Science at Boston University, and the director of the Computer Vision and Learning Group

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Kate Saenko had a problem. Her AI algorithms tended to identify scientists as men and kitchen workers as women, and she didn’t know why.

An associate professor at Boston University’s Department of Computer Science, Kate had been using deep learning to automate the captioning of images and videos. And to be fair, the results were spectacular. Neural networks, the software structure that underlies deep learning, proved to be very good at generating human-like descriptions of digital imagery. However, they weren’t always correct. They would occasionally fail, perhaps mistaking a man in the image for a woman.

“We had this dataset of short YouTube videos with their associated captions, which we used to train our neural networks,” says Kate, who is also the director of the Computer Vision and Learning Group at Boston University. After examining the videos, the neural networks were able to generate descriptions for new videos that weren’t captioned. “But what we noticed is that it had become biased toward certain context in the video. For example if it saw a video with a kitchen scene, it would often start the sentence as ‘A woman is…’”

In another example, when Kate’s team had trained a neural network on science videos, the model would start describing new videos with “A man is talking…”

But the problem was, she had no way of finding out the reasons the neural networks were making mistakes. For instance, she couldn’t tell whether the AI model was misrecognizing the gender of the person or if it was just ignoring the person and basing its decisions on other objects in the scene such as the pots and pans in a kitchen. “When they were wrong, we didn’t know why, and because we didn’t know exactly why they were wrong, it was difficult to fix them or redesign them in a way that would fix the problem,” she says.

Errors and failures are common in computer software. Developers have plenty of tools to retrace the actions and decisions of computer programs to find and fix errors. Unfortunately, those tools and techniques do little to explain the behavior of deep learning, a branch of artificial intelligence that has become increasingly popular in recent years.

Known as the “AI black box” problem, the challenge of interpreting decisions made by neural networks and deep learning algorithms has become a topic of hot debate as deep learning has found its way into an increasing number of critical domains such as healthcare, criminal justice, education and recruitment.

Kate has been involved in one of several efforts to create tools and methods to investigate and explain the logic behind decisions made by neural networks, collectively known as “explainable artificial intelligence.” In an interview with TechTalks, Kate discussed the challenges of explaining AI models as well as her own efforts in the field, which she recently published in a paper titled “Randomized Input Sampling for Explanation of Black-box Models” (RISE).

Why is it difficult to understand deep learning?

When we programmers develop a traditional application, we have plenty of tools to simplify the problem we want to solve. We use modules, procedures, class hierarchies, data structures, design patterns and much more to divide and conquer the complexity of the problem. We break it down into small logical components that each solve a small part of the problem and together achieve the various goals of our application.

Those components can be independently tested and reused across applications. The logical structure also makes it easy to represent its functionality in diagrams and graphs and to find bugs and problems by retracing the steps that the application takes to accomplish a task.

We also name our variables and objects to convey the tasks they accomplish. We use comments to annotate our code and help other developers make sense of it and pick up where we left off.

In her early work in computer vision, Saenko would manually design software components that looked for specific features in images, such as the wheels of a car or the ears of a cat, and then apply machine learning on top of those features. But deep learning changed all of that.

“The reason deep learning is so successful is because there’s very little design that goes into neural networks,” says Saenko. “We just let the machine discover the most useful pattern from raw data. We’re not going to tell it what to look for. We’re not going to tell it any high level features. We let it search through all of its training data and find those patterns that lead to the highest accuracy in solving the problem.”

What you gain is accuracy. Deep learning is especially good at performing tasks such as classifying raw, unstructured data like photos, videos and audio files. It has also made great strides in processing and generating written text, performing machine translation, and learning to play games at a professional level. Anyone who has had experience programming knows how hard it is to perform any of those tasks with classical, rule-based approaches.

However, the accuracy and performance of deep learning come at a cost. “What we gained from that is accuracy, but what we lost is interpretability,” says Saenko. “It’s a tradeoff between how much human-imposed top-down design you put into something to make it more interpretable versus how much performance you lose as a result of that.”

So, in non-AI programming terms, a simplified picture of a deep learning application would be an entire program (a huge one, with millions of lines of code) packed inside a single module, a single procedure, without any comments, and with millions of variables that have generic names such as “v1, v2, v3…” Try making sense of that!

Different approaches to creating interpretable AI

There are generally two different methods of interpreting deep learning algorithms. The first is to develop neural networks that are inherently explainable.

“A neural network that is inherently interpretable has a particular structure such that you can actually read out intermediate activations of that network, and when you show them to a human user, they would be meaningful,” Saenko says.

In a hypothetical example, if you have an interpretable neural network that detects cats, it should be able to show you different stages of its work, such as how it detects and classifies the eyes, how it detects the tail, the body and the fur, and how it evaluates each of these components to decide whether a certain image contains a cat or not. “At some points of that network, you should have intermediate results that the neural network can show you that you could interpret as being something that explains the process. It’s not just a bunch of numbers and neuron activations,” Saenko says.

The problem with this approach is that neural networks are optimized for machines, not humans. The stages that a neural network goes through to classify data are fundamentally different from the process that the human mind follows. If you try to change their structure to make them understandable to a human observer, it will come with a penalty to speed and accuracy.

The second approach is to create tools that can investigate and find explanations for the output of a neural network after the fact. This method works for any AI technique where interpretability isn’t guaranteed by design.

“We just have a bunch of neurons, or a neuron soup, and we want to understand at some intuitive or high level, why it is making these decisions,” Saenko says. “It’s about justifying or rationalizing the decisions of an arbitrary neural network.”

This is the approach behind RISE, the method that Saenko and her colleagues at Boston University have developed.

What is RISE?

“We were interested in understanding the workings of neural networks we used for image captioning. Those have additional complexities because they also have a language component. So the architecture of the neural networks we were dealing with were more complicated and we had to come up with a method for explaining their decisions that weren’t available before,” Saenko says.

In a nutshell, the RISE method produces a heat map, or saliency map, that highlights which parts of an input contribute most to the output scores of a neural network. For instance, when you feed a photo to an image classifier network, it returns a set of classes and corresponding scores or levels of certainty (e.g. bird=0.85, dog=0.1, airplane=0.03…). RISE will try to tell you how much each of the pixels of that image contributed to each of those classes.

Examples of saliency maps produced by RISE

“There are different methods for explaining neural networks, but what we wanted to do was have a method that could be applied to a black box model,” Saenko says. A black box approach means that you don’t have access to the code or parameters or any of the other information that can hint at the inner workings of the neural network, and you still want to explain how it works. “We just want some simple and quick way to explain their decision.”

To obtain the heat map of a given input image, RISE generates random masks and overlays them on the image. It then feeds those masked versions of the image into the neural network and observes the change that each one makes to the outputs. By repeating the process many times, it can measure which parts of the image have the most influence on the output classes.
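The masking loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function name `rise_saliency`, its parameters, and the toy scoring function are all assumptions made for the example, and the coarse masks are upsampled by simple block repetition rather than the smoother upsampling a real implementation would likely use.

```python
import numpy as np

def rise_saliency(image, score_fn, n_masks=500, grid=4, keep_prob=0.5, seed=0):
    """Estimate a saliency map for a black-box score_fn(image) -> float.

    Hypothetical helper: only the model's output score is used,
    never its weights or internal activations.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    cell_h = -(-h // grid)  # ceiling division so the mask covers the image
    cell_w = -(-w // grid)
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Random coarse binary grid: 1 keeps a cell, 0 blacks it out.
        coarse = (rng.random((grid, grid)) < keep_prob).astype(float)
        # Upsample to image size by repeating each cell (assumed simplification).
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        masked = image * (mask if image.ndim == 2 else mask[..., None])
        # Pixels that were visible share the credit for this score.
        saliency += score_fn(masked) * mask
    # Normalize by the expected number of times a pixel is visible.
    return saliency / (n_masks * keep_prob)

# Toy "model": its score depends only on one 8x8 patch of the image.
toy_image = np.ones((32, 32))
toy_score = lambda img: float(img[8:16, 8:16].mean())
heat_map = rise_saliency(toy_image, toy_score)
print(heat_map[8:16, 8:16].mean(), heat_map[24:, 24:].mean())
```

In this toy setup, the patch the scoring function actually looks at should end up with a higher average saliency than an unrelated corner of the image, which is the behavior the heat map is meant to expose.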

RISE applies random masks to inputs and measures the value of each pixel

RISE is not the only explainable AI method that uses the black box approach. Local Interpretable Model-agnostic Explanations (LIME) is another technique that needs no insight into a neural network to interpret its decisions. However, LIME relies on finding and evaluating “superpixels,” sub-regions of an image that have similar coloring. After defining the superpixels of an image, LIME masks them and feeds the masked image back into the network to observe how the output changes.

The problem with LIME is that in cases where, say, a person’s body or an object has several very distinct colors, its superpixels won’t capture the entire contour of that person or object. According to Saenko’s paper and the experiments that she and her team have conducted, the results that RISE returns are more accurate than those of LIME.
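For contrast, the region-by-region masking described above might look roughly like the sketch below. It follows the simplified description in this article, not the actual LIME library, which perturbs random combinations of superpixels and fits a simple surrogate model to the results. The function `segment_importance` is hypothetical, and the four-quadrant grid is only a stand-in for a real color-based superpixel segmentation.

```python
import numpy as np

def segment_importance(image, segments, score_fn, fill=0.0):
    """Score drop when each labeled region is masked out, one at a time.

    `segments` is an integer array with the same height and width as
    `image`, assigning every pixel to a region (a superpixel stand-in).
    """
    base = score_fn(image)
    importance = {}
    for label in np.unique(segments):
        occluded = image.copy()
        occluded[segments == label] = fill  # hide this region only
        importance[int(label)] = base - score_fn(occluded)
    return importance

# Stand-in segmentation: four 8x8 quadrants labeled 0..3 (an assumption;
# real superpixels follow color boundaries, not a fixed grid).
rows, cols = np.indices((16, 16))
quadrants = (rows // 8) * 2 + (cols // 8)

# Toy "model" that only looks at the top-left quadrant.
toy_image = np.ones((16, 16))
toy_score = lambda img: float(img[:8, :8].mean())
print(segment_importance(toy_image, quadrants, toy_score))
```

The sketch also makes the limitation visible: the explanation can never be finer or better shaped than the regions it is handed, whereas random masks do not depend on a segmentation step.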

Who will be able to use interpretable AI models such as RISE?

There are basically two groups of people who will benefit from explainable AI methods. The first are developers and engineers such as Kate and her team, who will be able to investigate why their AI models are making mistakes or whether the decisions they’re making are based on real, logical elements.

Second are the end users, the people who aren’t designing the AI models but are using them in their applications. This audience is equally important and is growing every day as deep learning and neural networks find their way into more and more applications.

“One way to use explainable AI is to build more trust,” Saenko says. “For example, if you have a self-driving car and it tells you it wants to turn right up here, it could show a saliency map that highlights the signs and other objects that are driving that decision.”

Another very critical use for explainable AI is in domains where deep learning is used to augment the abilities of human experts. An example is health care, which is one of the areas where there’s a lot of interest in using deep learning, and insights into the decisions of AI models can make a big difference. For instance, neural networks are being used in areas such as radiology, where they are analyzing x-ray and MRI images and describing symptoms. However, doctors need to know why a neural network is making a certain diagnosis, especially when it’s different from their own assessment.

“When you give a doctor an AI image model that can look at a medical image or an MRI and detect cancer with very high accuracy, they often still don’t trust it because they don’t know why it’s making that decision,” Saenko says.

Using RISE, doctors would be able to see which parts of an MRI or x-ray are contributing to a neural network’s output and decide for themselves whether they would want to follow it or rely on their own judgement.

Going beyond explaining image classifier networks

While Saenko’s work is focused on image classification algorithms, she explains that the RISE model can be applied to other neural networks where the inputs are not pixels and image data.

“It’s certainly applicable to other kinds of data, like text data or audio data or network traffic data,” Saenko says, although they haven’t experimented with other data yet.

For instance, there’s an increasing desire in human resources departments to use AI, especially in companies and industries where recruiters are overwhelmed with a huge number of job applications, and automation can help filter out non-relevant resumes and single out potential candidates for the staff to review.

But there’s also fear that using automation will lead to algorithmic bias, where deep learning algorithms might silently discriminate against candidates of a particular gender or race. Recently, Amazon had to shut down its AI recruiting tool because it was biased against women. There have been other cases where the algorithms behind automated job ads have shown gender bias.

RISE can help uncover the hidden biases that an automated hiring tool might contain. In this case, instead of weighing pixels, RISE will highlight the words and sentences that contribute to the hiring or elimination of a certain candidate. Auditors will then be able to decide whether the neural network is making a fair assessment or if it’s discriminating against the applicant.

“Basically, as long as the explanation that you want can be provided in the form of a heat map or importance values assigned to each part of the input, you can use the RISE model,” Saenko says. “So for instance, instead of a visual heat map, you’ll be assigning a heat map over the words in a sentence or even the characters.”
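As a rough illustration of that idea, the same random-masking recipe can be applied to the words of a sentence instead of the pixels of an image. Saenko notes that her team has not experimented with text, so everything here is an assumption: the function `token_saliency`, its parameters, and the toy keyword-based scorer are hypothetical stand-ins for a real screening model.

```python
import random

def token_saliency(tokens, score_fn, n_masks=400, keep_prob=0.5, seed=0):
    """Assign an importance value to each token of a black-box text scorer.

    score_fn takes a list of tokens and returns a float; only that
    output is used, as in the black box setting.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(tokens)
    for _ in range(n_masks):
        # Randomly decide which tokens stay visible in this sample.
        keep = [rng.random() < keep_prob for _ in tokens]
        kept_tokens = [t for t, k in zip(tokens, keep) if k]
        score = score_fn(kept_tokens)
        # Visible tokens share the credit for the resulting score.
        for i, k in enumerate(keep):
            if k:
                totals[i] += score
    # Normalize by the expected number of times a token is visible.
    return [t / (n_masks * keep_prob) for t in totals]

# Toy scorer (hypothetical): a screener that only reacts to one keyword.
resume = ["led", "a", "python", "team"]
toy_scorer = lambda toks: 1.0 if "python" in toks else 0.0
for word, value in zip(resume, token_saliency(resume, toy_scorer)):
    print(f"{word}: {value:.2f}")
```

An auditor reading the resulting values would see the keyword stand out from the rest of the sentence, which is the text equivalent of a bright spot on an image heat map.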

Cooperation with DARPA

Saenko’s work is funded by the Defense Advanced Research Projects Agency (DARPA) as part of its Explainable AI (XAI) initiative. The military is increasingly interested in the use of AI in different fields, from logistics to surveillance and maybe autonomous weapons. There’s a lot of debate and controversy over the ethics of using AI in military spaces, and over the past months, many big tech companies such as Google, Amazon and Microsoft have faced internal pressure from engineers and employees who have urged their employers not to engage in AI projects that would end up hurting innocents.

DARPA’s job is to make sure that if the military does end up employing AI in critical fields, its users have the right tools to audit and vet the decisions made by deep learning algorithms. For Saenko and other academics doing research in the domain of explainable AI, DARPA’s XAI initiative provides the perfect environment to help serve the general interests of the industry.

Saenko and her team have made their work and source code public for everyone to use in their domains of choice, whether it’s healthcare or network security or recruiting. As for Saenko, she can now see for herself why her captioning tool was mislabeling the people in the kitchen videos.
