
Explainable AI: Viewing the world through the eyes of neural networks

February 4, 2019

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

One of the most intriguing artificial intelligence techniques was conceived when a few computer scientists were discussing deep learning and photorealistic images at a Montreal pub in 2014. Called generative adversarial networks (GANs), the concept has enabled the AI industry to take huge leaps toward creativity, generating images and sounds that are very close to their natural counterparts.

However, like other AI techniques that use deep learning and neural networks, GANs are opaque, which means there’s very little visibility into or control over how they work. As a result, engineers find it hard to troubleshoot them, and users find it hard to trust them.

To overcome these limitations, researchers at IBM and MIT have developed a technique called “GAN Dissection” that helps explore the inner workings of GANs and better understand the reasoning that results in their output. The work of IBM and MIT is one of several efforts collectively known as explainable AI, projects that aim to create tools that can interpret AI decisions or create artificial intelligence models that are more transparent and open to investigation.

GAN Dissection and its associated visual tool, GANpaint, give insights into how generative adversarial networks view the world. And interestingly, in some areas, their “understanding” of the world is often not very different from ours.

The wonders and complexities of GANs

“The reason we were studying GANs is because they’re this fascinating, mysterious but oddly successful system that have been on fire in the last couple of years,” says David Bau, scientist at MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL). “The core thing about GANs is that they learn to do a very difficult task, which is to create photorealistic images of the world. But they do it with almost no supervision at all.”

The idea behind generative adversarial networks is to employ two neural networks to create new data. One network, the discriminator, is trained on real examples to judge the authenticity of data. The second network, the generator, creates new data and runs it through the discriminator again and again, adjusting its output between runs, until the discriminator accepts the data as authentic.

For instance, a GAN that generates human faces trains the discriminator on thousands of human faces until it can classify any image as “human face” or “not human face.” It then pits the generator against the discriminator and has it generate and tweak images until the discriminator labels them as human faces.
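The tug-of-war between the two networks can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the setup of any real GAN: the “real” data are single numbers drawn around 4.0, the generator is a linear function of random noise, the discriminator is a one-input logistic classifier, and both are updated in alternation with hand-derived gradient steps. The function name `train_toy_gan` and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Numerically safe logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def train_toy_gan(steps=1000, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator:     G(z) = a * z + b
    w, c = 0.1, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)     # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: ascend log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: ascend log D(fake), i.e. try to fool the
        # freshly updated discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w  # d log D(fake) / d fake
        a += lr * grad_fake * z
        b += lr * grad_fake

    return (a, b), (w, c)


generator, discriminator = train_toy_gan()
print("generator (a, b):", generator)
print("discriminator (w, c):", discriminator)
```

Toy GANs like this one tend to oscillate rather than settle, so the sketch is meant to show the shape of the loop, not to produce a well-trained generator.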

But what eventually happens under the hood as the two networks go through millions of iterations remains a mystery.

“The puzzle for us is, what have these systems learned, in learning to create the world? It seems like it’s a hard problem. It seems like, to solve this problem, the neural networks must have solved a bunch of other hard problems. But nobody really knows, because they’re completely opaque systems,” Bau says.

So the researchers at MIT and IBM decided to crack open GANs and see if they can make sense of how they work.

What do you see when you look inside GANs?

“There are an astronomical number of ways that things can be encoded in neural networks, and that’s one of the things that intimidated people before,” Bau says.

Deep learning algorithms and neural networks learn to do things very efficiently, but the way they do it is not necessarily interpretable to humans. Neural networks are composed of thousands or even millions of variables, also called “neurons” or “units,” which combine to solve problems. The general assumption is that if you look inside neural networks, you won’t find much that makes sense.

But instead of taking a complicated approach to decipher the inner workings of GANs, the researchers at MIT and IBM did something that many would consider nonsensical. They directly looked for concepts that would make sense to humans.

“It would be the equivalent of opening up the back of a computer and asking if there’s one wire that corresponds to a person’s nose or hair?” Bau says, adding that most people would rightly consider this a ridiculous question.

To probe the GANs, the researchers used a process called “network dissection,” which tests the neurons in the GAN against a database of images and concepts.

“We probe every single neuron in the entire network, and we have a large dictionary of several hundred or about a thousand concepts. We test it on hundreds and thousands of images and we correlate thousands of concepts on thousands of images and tens of thousands of variables inside the neural network,” Bau says.

The result is a matrix that maps neurons to the input concepts. To verify the validity of the results, the researchers manually checked the output mappings to make sure the established correlations were not spurious or illogical.
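A minimal sketch of how such a neuron-to-concept matrix could be built is shown below. It assumes, for illustration only, that each unit’s activation map is thresholded into a binary mask and compared with a labeled concept mask using intersection-over-union (IoU) as the agreement score. The unit names, the tiny 2×3 grids, the threshold, and the helper functions `iou`, `dissect` and `best_concept` are all hypothetical. A real run would aggregate over many images rather than one.

```python
def iou(unit_mask, concept_mask):
    """Intersection-over-union of two same-sized boolean grids."""
    inter = union = 0
    for u_row, c_row in zip(unit_mask, concept_mask):
        for u, c in zip(u_row, c_row):
            inter += int(u and c)
            union += int(u or c)
    return inter / union if union else 0.0


def dissect(activations, concept_masks, threshold=0.5):
    """Score every (unit, concept) pair.

    activations:   {unit name: 2-D grid of floats}
    concept_masks: {concept name: 2-D grid of bools}
    returns:       {unit name: {concept name: IoU score}}
    """
    scores = {}
    for unit, grid in activations.items():
        # A unit "fires" at a pixel when its activation exceeds the threshold.
        fired = [[value > threshold for value in row] for row in grid]
        scores[unit] = {
            concept: iou(fired, mask) for concept, mask in concept_masks.items()
        }
    return scores


def best_concept(scores, unit):
    """Return the (concept, score) pair that matches a unit best."""
    return max(scores[unit].items(), key=lambda pair: pair[1])


# Made-up activation maps for two units on one 2x3 image.
activations = {
    "unit_7": [[0.9, 0.8, 0.1], [0.7, 0.2, 0.0]],
    "unit_12": [[0.0, 0.1, 0.9], [0.1, 0.6, 0.8]],
}
# Made-up segmentation masks saying where each concept appears.
concept_masks = {
    "tree": [[True, True, False], [True, False, False]],
    "door": [[False, False, True], [False, False, True]],
}

scores = dissect(activations, concept_masks)
print(best_concept(scores, "unit_7"))   # best match for unit_7
print(best_concept(scores, "unit_12"))  # best match for unit_12
```

In this made-up data, `unit_7` fires exactly where the tree mask is set, so it scores 1.0 for “tree” and 0.0 for “door,” while `unit_12` overlaps the door mask on two of three fired pixels.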

To their surprise, the researchers found that for a lot of concepts, the networks were using simple encodings and logic that very much resemble human reasoning. For instance, they discovered there were variables that directly corresponded to “trees” or “doors” or other known objects. Whenever the GAN drew a tree or a door, those specific neurons became activated.

“That was the big surprise of our research. We looked at individual neurons and found that they corresponded to human-interpretable objects. We found a neuron that correlates with trees—with what humans call trees. It’s extremely naïve to look for it, but sometimes you can unlock new research by asking really stupid questions,” Bau says.

Do GANs have commonsense understanding of the world?

Something you hear a lot in the data science and AI community is that “correlation is not causation.” Basically, this means that if a neural network provides a correct output, it doesn’t necessarily mean that it arrived at that output for sound reasons.

For instance, a neural network might learn to classify trees not by their shapes and colors but by other elements that usually surround them, such as large expanses of grass. In one case, researchers found that a neural network was classifying the gender of people shown in videos based on objects that were in the background and not based on the physical traits of the person.

To untangle the difference between correlation and causation, the researchers at MIT and IBM created GANpaint, a tool that helps test interventions in neural networks. The idea behind GANpaint is to manually turn neurons on and off and observe the changes in the behavior of the AI model. The process would help better understand how the neural network processes and reasons with its information.

“To our delight and surprise, we found that if we took a collection of tree neurons and turn them off, the trees would disappear from the image, but other things like the buildings would stay. We found similar effects for a whole bunch of objects,” Bau says.

The technique also worked the other way around. So if they turned on a particular neuron associated with a type of object, it would result in that type of object being added to the scene. “We could force the neural network to think about trees, when it wasn’t previously,” Bau says.
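The kind of intervention described above can be sketched with a toy model. Everything here is an assumption made for illustration: the “generator” is reduced to a dictionary of per-unit activation grids, and `render` is a crude stand-in for the remaining layers that simply paints each pixel with the object of its most active unit. The names `intervene`, `render`, `unit_tree` and `unit_building` are hypothetical and are not taken from GANpaint.

```python
def intervene(features, units, region, value):
    """Return a copy of `features` with `units` forced to `value` in `region`.

    features: {unit name: 2-D grid of floats}
    units:    unit names to override
    region:   set of (row, col) pixels to edit
    """
    edited = {name: [row[:] for row in grid] for name, grid in features.items()}
    for name in units:
        for r, c in region:
            edited[name][r][c] = value
    return edited


def render(features, unit_to_object, background="sky", threshold=0.5):
    """Toy stand-in for the rest of the generator.

    Each pixel shows the object of its most active unit, or the
    background when no unit exceeds the threshold.
    """
    first_grid = next(iter(features.values()))
    rows, cols = len(first_grid), len(first_grid[0])
    image = []
    for r in range(rows):
        line = []
        for c in range(cols):
            name, strength = max(
                ((n, grid[r][c]) for n, grid in features.items()),
                key=lambda pair: pair[1],
            )
            line.append(unit_to_object[name] if strength > threshold else background)
        image.append(line)
    return image


features = {
    "unit_tree": [[0.9, 0.1], [0.8, 0.0]],
    "unit_building": [[0.0, 0.7], [0.1, 0.9]],
}
unit_to_object = {"unit_tree": "tree", "unit_building": "building"}
everywhere = {(0, 0), (0, 1), (1, 0), (1, 1)}

original = render(features, unit_to_object)
# Ablation: switch the tree unit off across the whole image.
without_trees = render(intervene(features, ["unit_tree"], everywhere, 0.0), unit_to_object)
# Insertion: force the tree unit on at a single pixel.
extra_tree = render(intervene(features, ["unit_tree"], {(0, 1)}, 1.0), unit_to_object)

print(original)
print(without_trees)
print(extra_tree)
```

In this toy, switching the tree unit off removes the trees while the buildings stay put, and forcing it on paints a tree where a building used to be. In a real GAN the downstream layers re-render the scene, which is why the actual edits blend in with their surroundings.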

Screenshot: GANpaint is a visual tool that lets you manipulate the decisions made by AI models

Using GANpaint is very easy. You choose one of several provided images. You can then choose the type of object you want to manipulate (trees, grass, doors, sky…) and whether you want to add or remove that kind of object. You then use your mouse pointer like a brush to apply changes to the scene. For instance, by choosing to remove trees from one of the sample pictures, I was able to erase the large tree from the scene.

The GAN automatically replaced it with what it thinks the background should look like. Granted, the tree took up a large portion of the scene, so it’s no surprise that the replacement looks a bit unnatural. The neural network performs much better on smaller trees. Also, what’s interesting is that while I didn’t select the whole tree perfectly and left out some of its edges, the AI was smart enough to fill those gaps for me.

GANpaint is still a very crude and limited interface. There’s very little you can do beyond fiddling with the examples and object types the researchers have provided. But this is just a testbed for the GAN Dissection technique. In the future, the researchers might develop it into a general-purpose tool for manipulating neural networks in different AI models.

What the general structure of GANs tells us about the AI industry

The joint research of MIT and IBM suggests that, to some degree, GANs are organizing knowledge and information in ways that are logical to humans, and they’re doing it with little or no supervision. An important conclusion from this discovery is that in the future, we might be able to create AI models that are both very efficient and interpretable.

“A lot of people believe there’s a tradeoff, that if you make AI models smarter, inherently you’re going to make them harder to understand. And what our research is suggesting is that it’s possible that it might not always be the case. It might be that as we make neural networks smarter, that in some ways they might become easier to understand,” Bau says, while also acknowledging that what they’ve achieved so far doesn’t prove anything yet because it’s just a single observation.

“We are finding evidence that maybe progress is possible,” Bau clarifies.

Hendrik Strobelt, research scientist at the MIT-IBM Watson AI lab, stresses that their findings still do not imply that they’ve been able to fully control the structure and behavior of neural networks, something that Bau describes as the “Holy Grail” of contemporary AI.

“We’re still more observing than acting,” Strobelt says. “AI is often like a biological experiment. You put some ingredients in and all of a sudden you get a result. We went one step further and said at least we know roughly what are the molecules at work here to produce in this case a tree. But it’s not easily that we could say let’s synthesize a new thing by ourselves.”

Building trust in AI systems

Deep learning algorithms perform spectacularly at very difficult tasks, sometimes giving the impression that they’re perfect AI models. “When you just take a look at an image that comes out of a GAN, then you might imagine that it’s this amazingly skilled, talented photorealistic painter that must know all sorts of things,” Bau says.

But deep learning also has distinct limits that can cause it to fail in unexpected ways. Sometimes, those failures can become vulnerabilities that cause harmful behavior or can be turned into attacks.

The result is uncertainty about how much trust you can put into artificial intelligence systems.

With techniques such as GAN Dissection and tools such as GANpaint, you’ll be able to evaluate the strengths and weaknesses of AI models. For instance, by experimenting with GANpaint, the researchers discovered that, to some extent, GANs were able to recognize the relations that different objects had with each other.

“If you had an eye on commonsense questions, it’ll restrict some of the things you can do. It won’t let you put a door in a place that doesn’t make sense. You can get the feel that it kind of knows what the door is, where they should go,” Bau says.

For instance, in some cases, if you tried to use GANpaint to draw a tree in the middle of the sky, the GAN would know that it would also have to draw a tree trunk and connect it to the ground. Likewise, it wouldn’t allow you to draw a door where a building didn’t exist.

But GANpaint also helps uncover cases where the AI manifests irrational behavior, implying that there might be a disconnect between its understanding and the realities of the world.

“What we did with GANpaint is that by letting a person get more involved in what the neural network is doing and having more continuous interaction with it, it gives people a much better feel for what the strengths and weaknesses of the AI are,” Bau says.

Better cooperation between AI and humans

As the capabilities and limits of AI have become clearer in the past few years, many scientists and experts have started using the term “augmented intelligence.” Augmented intelligence, which shares its acronym with artificial intelligence, implies that AI is about empowering human intelligence, not replacing it.

“One of the big questions when it comes to AI is what is the role of humans. I think this is a nice example where AI and human intelligence work together to produce something that is creative,” says Strobelt.

AI tools are now making it easier for everyone to express themselves artistically. There are plenty of AI tools that enable users to apply the styles of famous painters to their drawings or to compose songs of various genres.

GANpaint can evolve to become one of those tools, Strobelt explains. For instance, you can train a GAN to learn level maps for computer games and it can generate new random level maps. With a tool like GANpaint you’ll be able to make small changes and improve the output of the AI where it makes mistakes.

“It’s very nice that the human can stop to correct the little things, the mistakes the GAN might output,” Strobelt says.

But while those tools are performing amazing feats, AI is not a replacement for human creativity, Strobelt stresses. “In many respects, humans are still superior in creativity. I still have a strong belief that this will last for a long time,” he says.
