This article is part of our series that explores the business of artificial intelligence.
OpenAI will make Codex, its AI programmer technology, available through an application programming interface, the company announced on its blog on Tuesday. In tandem with the announcement, OpenAI CTO Greg Brockman, Chief Scientist Ilya Sutskever, and co-founder Wojciech Zaremba gave an online presentation of the capabilities of the deep learning model.
The Codex demo puts the advantages of large language models on full display, showing an impressive capacity to resolve references and write code for a variety of APIs and micro-tasks that can be frustratingly time-consuming.
OpenAI is still testing the waters with Codex. How far you can push it in programming tasks and how it will affect the software job market remain open questions. But this unexpected turn in OpenAI’s exploration of large language models seems to be the first promising application of neural networks that were meant for conversations with humans.
Language models for coding
Codex is a descendant of GPT-3, a very large language model OpenAI released in 2020 and made available through a commercial private beta API. OpenAI’s researchers wanted to see how developers would use GPT-3 for natural language processing applications.
But the outcome surprised them. “The thing that was funny for us was to see that the applications that most captured people’s imaginations, the ones that most inspired people, were the programming applications,” Brockman said in the video demo of Codex. “Because we didn’t make the model to be good at coding at all. And we knew that if we put in some effort, we could make something happen.”
Codex is a version of GPT-3 that has been fine-tuned for programming tasks. The machine learning model is already used in Copilot, another beta-test code generation product hosted by GitHub. According to OpenAI, the current version of Codex has 37 percent accuracy on coding tasks, as opposed to GPT-3’s zero percent.
Codex takes a natural language prompt as input (e.g., “Say hello world”) and generates code for the task it is given. It is supposed to make it much easier for programmers to take care of the mundane parts of writing software.
“You just ask the computer to do something, and it just does it,” Brockman said.
The demo had some impressive highlights, even if it seemed to be rehearsed. For example, Codex seems to be pretty good at coreference resolution. It also links nouns in the prompt to their proper variables and functions in the code (though in the demo, it seemed that Brockman also knew how to phrase his commands to avoid confusing the deep learning model).
The tasks shown in the demo are not complicated, but they are tedious and error-prone, and they usually require looking up reference manuals, browsing programming forums, and poring over code samples. So having an AI assistant write this kind of code for you can save some valuable time.
“This kind of stuff is not the fun part of programming,” Brockman said.
Per OpenAI’s blog: “Once a programmer knows what to build, the act of writing code can be thought of as (1) breaking a problem down into simpler problems, and (2) mapping those simple problems to existing code (libraries, APIs, or functions) that already exist. The latter activity is probably the least fun part of programming (and the highest barrier to entry), and it’s where OpenAI Codex excels most.”
The limits of Codex
While the Codex demos are impressive, they do not present a full picture of the deep learning system’s capabilities and limits.
Codex is currently available through a closed beta program, which I don’t have access to yet (hopefully that will change). OpenAI also ran a Codex coding challenge on Thursday, which was available to everyone. Unfortunately, their servers were overloaded when I tuned in, so I wasn’t able to play around with it.
But the demo video shows some of the flaws to look out for when using Codex. For example, if you tell human programmers to print “Hello world” five times, they will usually use a loop and print each message on its own line. But when Brockman told the deep learning model to do the same thing, it used an unusual method that pasted all the messages next to each other. As a result, Brockman was forced to reword his instruction more specifically.
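To make the difference concrete, here is a minimal Python sketch of the two readings of that instruction. This is not the code from the demo, and the function names are made up for illustration. It only shows how an ambiguous prompt can produce valid code that does something other than what the programmer had in mind.

```python
def hello_concatenated(times=5):
    # String repetition: valid Python, but the copies run together
    # on one line with nothing separating them.
    return "Hello world" * times


def hello_per_line(times=5):
    # One message per line, which is what most programmers would
    # expect from "print Hello world five times".
    return "\n".join("Hello world" for _ in range(times))


print(hello_concatenated())
print(hello_per_line())
```

Both functions satisfy a loose reading of the prompt, which is why the wording of the instruction matters so much.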
Codex’s output is not necessarily the optimal way to solve problems. For example, to enlarge an image on the webpage, the model used an awkward CSS instruction instead of just using larger numbers for width and height.
The video demo also didn’t show any of the limits detailed in full in the Codex paper, including the model’s limits in dealing with multi-step tasks. This omission raised some concern in the AI community.
But despite the limits, Codex can be very useful. Already, those lucky few who have been given access to the API have used it to automate some of the tedious and boring parts of their jobs. And many others who have been working with GitHub’s Copilot have also expressed satisfaction with the productivity benefits of AI-powered code generation.
Who should use Codex?
In an interview with The Verge, Zaremba compared programming with Codex to the transition from punch cards to programming languages. At the time, the advent of programming languages such as C and Fortran lowered the barrier to entry for software development and made the market accessible to a much larger audience. The same thing happened as higher-level languages appeared and took care of the complex technical challenges of writing code. Today, many programmers write code without worrying about allocating and freeing memory chunks, managing threads, or releasing system resources and handles.
But I don’t think Codex is a transition from learning programming languages to giving computers conversational instructions and letting them write the code for themselves. Codex can be a very useful tool for experienced programmers who want an AI assistant to churn out code that they can review. But in the hands of a novice programmer, Codex can be a dangerous tool with unpredictable results.
I’m especially concerned about the potential security flaws that such statistical models can have. Since the model creates its output based on the statistical regularities of its training corpus, it can be vulnerable to data poisoning attacks. For example, if an adversary uploads enough malicious code to GitHub, targeted at a specific type of prompt, Codex might pick up those patterns during training and then output them in response to user instructions. In fact, the page for GitHub Copilot, which uses the same technology, warns that the code generation model might suggest “old or deprecated uses of libraries and languages.”
This means that blindly accepting Codex’s output can be a recipe for disaster, even if it works fine. You should only use it to generate code that you fully understand.
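As a hypothetical illustration of code that “works fine” and is still dangerous, consider the Python sketch below. It is not actual Codex output, and the table, data, and function names are invented for the example. Both lookup functions return the right answer for ordinary input, but the first builds its SQL query with string formatting, a pattern that is common in public code and exposes the program to SQL injection.

```python
import sqlite3


def make_db():
    # In-memory database with two made-up users (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Works for ordinary names, but the input is pasted into the SQL
    # text, so crafted input can change the query itself.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    # Parameterized query: the input is passed as data, not as SQL.
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
crafted = "x' OR '1'='1"
print(find_user_unsafe(conn, "alice"))  # ordinary input looks fine
print(find_user_unsafe(conn, crafted))  # leaks every row
print(find_user_safe(conn, crafted))    # matches nothing
```

A novice who tests only the ordinary case would see no difference between the two functions, which is the kind of flaw a reviewer needs enough experience to catch.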
The business model of Codex
I believe the Codex API will find plenty of internal uses for software companies. According to the details in the Codex paper, it is much more resource-efficient than GPT-3, and therefore, it should be more affordable. If software development companies manage to adapt the tool to their internal processes (as with the Blender example above) and save a few hours’ time for their developers every month, it will be worth the price.
But the real developments around Codex will come from Microsoft, the unofficial owner of OpenAI and the exclusive license-holder of its technology.
After OpenAI commercialized GPT-3, I argued that building a product and business model on top of the language model would be very difficult if not impossible. Whatever you do with the language model, Microsoft will be able to do it better, faster, and at a lower cost. And with the huge user base of Office, Teams, and other productivity tools, Microsoft is in a suitable position to dominate most markets for GPT-3-powered products.
Microsoft also has a dominant position with Codex, especially since it owns GitHub and Azure, two powerhouses for software development, DevOps, and application hosting. So if you’re planning to create a commercial product with the Codex API, you’ll probably lose the competition to Microsoft unless you’re targeting a very narrow market that the software giant will not be interested in. As with GPT-3, OpenAI and Microsoft released the Codex API to explore new product development opportunities as developers experiment with it, and they will use the feedback to roll out profitable products.
“[We] know we’ve only scratched the surface of what can be done,” the OpenAI blog reads.