Inside Kaggle’s most important AI competition


Since its founding, Kaggle has held many interesting data science and machine learning competitions. But its most interesting project might have just begun. It’s called the “Abstraction and Reasoning Challenge” and it addresses one of the most fundamental problems of current artificial intelligence techniques.

Current AI systems have proven to be very efficient at solving specific problems, but poor at general problem-solving. This is why measuring the real “intelligence” of AI systems is very tricky. Current evaluation methods mostly consist of testing the accuracy of an AI model against datasets composed of neatly picked data and very narrow benchmarks.

To address the problem of evaluating AI systems, François Chollet, AI researcher and the creator of the Keras deep learning library, published a very important paper in November titled “On the Measure of Intelligence.” Chollet’s work discusses the problems with current approaches to testing natural and artificial intelligence systems.

Chollet also presented the Abstraction and Reasoning Corpus (ARC), a set of problems that can test the general problem-solving capabilities of humans and AI systems.

Last week, Chollet launched the ARC challenge on Kaggle with a three-month timeline and a $20,000 prize. Although it’s very unlikely that anyone will be able to solve the challenge in time, it will be an interesting test of how far we’ve come toward solving artificial general intelligence.

Measuring intelligence


In the early years of artificial intelligence research, scientists thought that creating thinking machines was just around the corner. But as the challenges of replicating even the simplest functions of the human mind became obvious, the field split into narrow AI, algorithms that can tackle very specific problems, and artificial general intelligence, the original vision of creating general problem-solving AI.

Narrow AI is well defined and there are plenty of ways to measure it. For instance, ImageNet is a good benchmark for testing computer vision systems that can classify images into a predefined set of things. Machine learning engineers try to optimize their algorithms (and sometimes they cheat) to score higher on ImageNet and other similar datasets.

But as Chollet argues in his paper, such benchmarks don’t test the intelligence of the AI algorithm. Instead, they are a measure of the intelligence of the AI’s developers in developing a system that can solve that specific problem. Case in point: A deep learning algorithm that scores better than the smartest human on ImageNet or defeats the StarCraft 2 world champion can’t perform most tasks that the least intelligent human can perform without a second thought.

Current deep learning systems also suffer from distinct problems, including reliance on a lot of data and brittleness. Many of these AI systems break when they face the messiness and uncertainty of the real world. There are efforts to develop new datasets that better represent the messiness of the real world.

While these new efforts can help to create more robust algorithms, they will not help develop AI systems that are truly intelligent.

The ultimate solution would be to develop machines that, instead of solving a preprogrammed problem, explore a problem and create a program that can solve it. For instance, an AI that can play Quake 3 should develop the high-level skills that will allow it to start playing any other first-person shooter decently instead of starting from scratch. This is the kind of generalization power that has eluded AI scientists for decades.

Levels of intelligence
François Chollet breaks down intelligence into a hierarchy of three layers. Current AI systems are still struggling at the bottom rung of this ladder.

But such general intelligence will also need its own specialized benchmark and test dataset, which brings us to ARC.

The ARC challenge

The Abstraction and Reasoning Corpus is a set of tasks that require general problem-solving skills. There are a few things about ARC that make it especially interesting. First, there aren’t a ton of training examples. A system that sets out to solve a task has to learn the rules from just a few examples, like the following.

A training example in the ARC dataset: The red object must be made adjacent to the blue box.
ARC problem: The test taker must denoise the image, removing the blue dots (or whatever other color they might have) while keeping the main objects intact.

Any human looking at the previous examples will know that the first problem set involves cohesion and the second is a denoising task. But current AI techniques can’t perform such reasoning and abstraction with so few examples. ARC is filled with such examples and is designed in a way that prevents developers from optimizing for the evaluation set.
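To make the setup concrete, here is a minimal Python sketch of how such a task could be represented and checked. It assumes the layout used by the public ARC data files, where a task holds "train" and "test" lists of input/output grid pairs and each grid is a list of rows of integer color codes. The tiny grids and the `remove_isolated_cells` rule are invented for illustration; they are not taken from the dataset, and a hand-written rule like this is exactly what a solver would have to discover on its own.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code; 0 is background


def remove_isolated_cells(grid: Grid) -> Grid:
    """Hypothetical denoising rule: blank every colored cell that has no
    same-colored neighbor directly above, below, left, or right."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            color = grid[r][c]
            if color == 0:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            has_same = any(
                0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == color
                for nr, nc in neighbors
            )
            if not has_same:
                out[r][c] = 0
    return out


def solves_all(rule: Callable[[Grid], Grid], pairs: List[Dict[str, Grid]]) -> bool:
    """True if the rule reproduces the expected output of every pair."""
    return all(rule(p["input"]) == p["output"] for p in pairs)


# An invented toy task: only two demonstrations, then one test input.
task = {
    "train": [
        {"input": [[2, 2, 0, 1],
                   [2, 2, 0, 0],
                   [0, 0, 1, 0]],
         "output": [[2, 2, 0, 0],
                    [2, 2, 0, 0],
                    [0, 0, 0, 0]]},
        {"input": [[0, 1, 0],
                   [3, 3, 0],
                   [3, 3, 1]],
         "output": [[0, 0, 0],
                    [3, 3, 0],
                    [3, 3, 0]]},
    ],
    "test": [
        {"input": [[1, 0, 4, 4],
                   [0, 0, 4, 4],
                   [0, 1, 0, 0]]},
    ],
}

fits = solves_all(remove_isolated_cells, task["train"])
prediction = remove_isolated_cells(task["test"][0]["input"])
```

The point of the sketch is the shape of the problem: there are only two demonstrations to learn from, and the rule has to carry over to a grid with a different size and different colors.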

Another important aspect of ARC is that it levels the playing field between AI and humans. The current artificial intelligence landscape is composed of many challenging fields such as computer vision and natural language processing.

Comparing AI performance to humans in those fields is very difficult because we humans have a lot of prior knowledge about the world and can easily take on new challenges. There still isn’t an AI system that incorporates that kind of knowledge. Therefore, any challenge built around image classification or natural language would put AI algorithms at a disadvantage.

But ARC is based on simple visual elements that are easy to parse and require no prior knowledge. This strips away the advantage that humans have and makes it fairer for AI systems to compete. Humans can easily solve most of the problems proposed in ARC not because of their vast knowledge of the world, but thanks to their abstraction and reasoning capabilities. (But even the brightest people admit that some ARC problems are cognitively challenging.)

As for current AI systems, they perform very poorly on ARC problems. “To the best of our knowledge, ARC does not appear to be approachable by any existing machine learning technique (including Deep Learning), due to its focus on broad generalization and few-shot learning,” Chollet notes in his paper.

Solving ARC will require “program synthesis,” the subfield of AI that involves generating programs that satisfy high-level specifications.
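As a rough illustration of the idea, the sketch below does program synthesis in its most naive form: it enumerates short compositions of primitive grid operations and keeps the first one that reproduces every example pair. The four primitives, the `synthesize` and `run` helpers, and the toy examples are all assumptions made for this illustration. This is not Chollet's method or a competitive ARC solver, and real tasks would need a far richer set of primitives and a much smarter search.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

# A tiny, hypothetical set of primitives for building programs.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program: Tuple[str, ...], grid: Grid) -> Grid:
    """Apply the named primitives to the grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def synthesize(
    pairs: List[Tuple[Grid, Grid]], max_depth: int = 2
) -> Optional[Tuple[str, ...]]:
    """Brute-force search: return the shortest sequence of primitives that
    maps every example input to its output, or None if none is found."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in pairs):
                return program
    return None


# The specification is given only by examples. Both pairs happen to show a
# 90-degree clockwise rotation, which is not itself one of the primitives.
examples = [
    ([[1, 2],
      [3, 4]],
     [[3, 1],
      [4, 2]]),
    ([[1, 2, 3]],
     [[1],
      [2],
      [3]]),
]

program = synthesize(examples)
rotated = run(program, [[5, 6], [7, 8]]) if program else None
```

Here no single primitive fits both examples, so the search has to compose two of them. The number of candidate programs grows exponentially with depth, which is one reason exhaustive search alone does not scale to the variety of tasks in ARC.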

How long before it’s solved?

Kaggle’s Abstraction and Reasoning Challenge

Kaggle has set May 27 as the final submission deadline for the ARC challenge. The timing somehow reminds me of the “2-month, 10-man study” that was supposed to solve the AI problem in 1955. Given the limits of today’s AI technology, I doubt that anyone will be able to solve the challenge by the end of May.

Chollet, too, has no illusions about ARC-ready AI being right around the corner.

Ben Hamner, Kaggle’s CTO, called the ARC challenge “the toughest Kaggle competition in a long time.”

Chollet provides some high-level guidance on where to start, including finding general ways to solve the problems presented in the ARC dataset. But he also notes that there is currently no AI textbook or tutorial that will guide you on creating the general intelligence required to solve the ARC challenge.

Despite the enormity (and impossibility) of the task, at the time of this writing, more than 190 teams have entered the ARC challenge and will be testing their skills. It will be interesting to see how the competition develops. And perhaps more importantly, it will be exciting to see what new discoveries we make in the interim.
