What to expect from OpenAI’s Codex API

August 13, 2021

This article is part of our series that explores the business of artificial intelligence.

OpenAI will make Codex, its AI programmer technology, available through an application programming interface, the company announced on its blog on Tuesday. In tandem with the announcement, OpenAI CTO Greg Brockman, Chief Scientist Ilya Sutskever, and co-founder Wojciech Zaremba gave an online presentation of the capabilities of the deep learning model.

The Codex demo puts the advantages of large language models on full display, showing an impressive capacity to resolve references and write code for a variety of APIs and micro-tasks that can be frustratingly time-consuming.

OpenAI is still testing the waters with Codex. How far you can push it in programming tasks and how it will affect the software job market remain open questions. But this unexpected turn in OpenAI’s exploration of large language models seems to be the first promising application of neural networks that were meant for conversations with humans.

Language models for coding

Codex is a descendant of GPT-3, a very large language model OpenAI released in 2020 and made available through a commercial private beta API. OpenAI’s researchers wanted to see how developers would use GPT-3 for natural language processing applications.

But the outcome surprised them. “The thing that was funny for us was to see that the applications that most captured people’s imaginations, the ones that most inspired people, were the programming applications,” Brockman said in the video demo of Codex. “Because we didn’t make the model to be good at coding at all. And we knew that if we put in some effort, we could make something happen.”

Codex is a version of GPT-3 that has been fine-tuned for programming tasks. The machine learning model is already used in Copilot, another beta-test code generation product hosted by GitHub. According to OpenAI, the current version of Codex has 37 percent accuracy on coding tasks, as opposed to GPT-3’s zero percent.
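This article does not go into how an accuracy figure like that is computed. One common way to score code generation, assumed here for illustration, is to run each generated solution against unit tests and report the fraction of problems that pass. The sketch below shows the idea; the `add` and `buggy_max` functions are hypothetical stand-ins for model output, not actual Codex results.

```python
def passes_tests(candidate, test_cases):
    """Return True if the candidate returns the expected value for every case."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failure, the same as a wrong answer.
            return False
    return True


def pass_rate(problems):
    """Fraction of problems whose candidate solution passes all of its tests."""
    if not problems:
        return 0.0
    solved = sum(1 for candidate, tests in problems if passes_tests(candidate, tests))
    return solved / len(problems)


# Hypothetical "generated" solutions, one per problem.
def add(a, b):
    return a + b


def buggy_max(a, b):
    return a  # wrong whenever b is larger


problems = [
    (add, [((1, 2), 3), ((0, 0), 0)]),
    (buggy_max, [((1, 2), 2)]),
]

if __name__ == "__main__":
    print(pass_rate(problems))  # one of two problems solved
```

Scoring by test results rather than by similarity to a reference solution rewards code that works, regardless of how it is written.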

Codex takes a natural language prompt as input (e.g., “Say hello world”) and generates code for the task it is given. It is supposed to make it much easier for programmers to take care of the mundane parts of writing software.

“You just ask the computer to do something, and it just does it,” Brockman said.

In the demo, Brockman and Sutskever take Codex through a series of tasks that range from displaying a simple “Hello World” message in Python to gradually writing a web game in JavaScript.

The demo had some impressive highlights, even if it seemed to be rehearsed. For example, Codex seems to be pretty good at coreference resolution. It also links nouns in the prompt to their proper variables and functions in the code (though in the demo, it seemed that Brockman also knew how to phrase his commands to avoid confusing the deep learning model).

Codex can perform some tedious tasks, such as rendering web pages, launching web servers, and sending emails. The model also shows some of the zero-shot learning capabilities of GPT-3. For instance, in the demo, Brockman showed how you can add Mailchimp interfacing capabilities to Codex with three lines of instructions. Further down the video, the presenters use Codex to create a user interface in JavaScript, place objects on the screen, and make the objects controllable with the keyboard arrow keys. Another video shows OpenAI generating data science code and generating charts in Python’s matplotlib library.

These are not complicated tasks, but they’re tedious and error-prone processes, and they usually require looking up reference manuals, browsing programming forums, and poring over code samples. So, having an AI assistant writing this kind of code for you can save some valuable time.

“This kind of stuff is not the fun part of programming,” Brockman said.

Maybe I can finally use matplotlib now without spending half a day googling the exact syntax and options! https://t.co/Vak1nzu0Jk
— Soumith Chintala (@soumithchintala) August 11, 2021

Per OpenAI’s blog: “Once a programmer knows what to build, the act of writing code can be thought of as (1) breaking a problem down into simpler problems, and (2) mapping those simple problems to existing code (libraries, APIs, or functions) that already exist. The latter activity is probably the least fun part of programming (and the highest barrier to entry), and it’s where OpenAI Codex excels most.”

The limits of Codex

While the Codex demos are impressive, they do not present a full picture of the deep learning system’s capabilities and limits.

Codex is currently available through a closed beta program, which I don’t have access to yet (hopefully that will change). OpenAI also ran a Codex coding challenge on Thursday, which was available to everyone. Unfortunately, their servers were overloaded when I tuned in, so I wasn’t able to play around with it.

The Codex Challenge servers are currently overloaded due to demand (Codex itself is fine though!). Team is fixing… please stand by.
— OpenAI (@OpenAI) August 12, 2021

But the demo video shows some of the flaws to look out for when using Codex. For example, if you tell human programmers to print “Hello world” five times, they will usually use a loop and print each message on its own line. But when Brockman told the deep learning model to do the same thing, it used an unusual method that pasted all the messages next to each other. As a result, Brockman was forced to reword his instruction more specifically.
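The difference is easy to see in a few lines of Python. This is an illustration, not the code from the demo: `repeat_with_loop` and `repeat_by_concatenation` are hypothetical names, and the second function assumes the model repeated the string itself, which is one way to end up with the messages pasted together.

```python
def repeat_with_loop(message, times):
    """What most programmers would write: one message per line."""
    lines = []
    for _ in range(times):
        lines.append(message)
    return "\n".join(lines)


def repeat_by_concatenation(message, times):
    """Assumed reconstruction of the odd result: plain string repetition."""
    return message * times


if __name__ == "__main__":
    print(repeat_with_loop("Hello world", 5))
    print(repeat_by_concatenation("Hello world", 5))
```

Both versions satisfy a literal reading of “print Hello world five times,” which is why the prompt had to be made more specific.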

Codex’s output is not necessarily the optimal way to solve problems. For example, to enlarge an image on the webpage, the model used an awkward CSS instruction instead of just using larger numbers for width and height.

And sometimes, the model generates code that is very far off from what the developer intends. In the final ten minutes of the demo, Brockman and Sutskever used Codex to create a JavaScript game. When they instructed Codex to define a condition for game loss, the deep learning model generated an event listener for the spacebar keypress. Brockman fixed it by explicitly telling Codex to write a function for game loss.

The video demo also didn’t show any of the limits detailed in full in the Codex paper, including the model’s limits in dealing with multi-step tasks. This omission raised some concern in the AI community.

. @OpenAI's #Codex is to programming as Tesla's FSD 2021 is to driving.

Read the paper (esp Appendix B) carefully and you will realize there is a gap between the slick videos & reality: it is often correct on simple tasks, but frequently lost on more complex challenges.

1/3 pic.twitter.com/9VNRIj1wYw
— Gary Marcus (@GaryMarcus) August 11, 2021

But despite the limits, Codex can be very useful. Already, those lucky few who have been given access to the API have used it to automate some of the tedious and boring parts of their jobs. And many others who have been working with GitHub’s Copilot have also expressed satisfaction with the productivity benefits of AI-powered code generation.

The new @OpenAI Codex model is a pretty exciting piece of technology.

Here I made a @Blender add-on and taught it how to use the built in Python API.

Taking creative coding to the next level!! pic.twitter.com/0UksTsq1Ep
— Andrew Carr (@andrew_n_carr) August 11, 2021

Who should use Codex?

In an interview with The Verge, Zaremba compared programming with Codex to the transition from punch cards to programming languages. At the time, the advent of programming languages such as C and Fortran lowered the barrier to entry for software development and made the market accessible to a much larger audience. The same thing happened as higher-level languages appeared and took care of the complex technical challenges of writing code. Today, many programmers write code without worrying about allocating and freeing memory chunks, managing threads, or releasing system resources and handles.

But I don’t think Codex is a transition from learning programming languages to giving computers conversational instructions and letting them write the code for themselves. Codex can be a very useful tool for experienced programmers who want an AI assistant to churn out code that they can review. But in the hands of a novice programmer, Codex can be a dangerous tool with unpredictable results.

I’m especially concerned about the potential security flaws that such statistical models can have. Since the model creates its output based on the statistical regularities of its training corpus, it can be vulnerable to data poisoning attacks. For example, if an adversary uploads enough malicious code to GitHub and tailors it to a specific type of prompt, Codex might pick up those patterns during training and then output them in response to user instructions. In fact, the page for GitHub Copilot, which uses the same technology, warns that the code generation model might suggest “old or deprecated uses of libraries and languages.”

This means that blindly accepting Codex’s output can be a recipe for disaster, even if the code appears to work fine. You should only use it to generate code that you fully understand.
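A crude automated check can support that review, though it cannot replace it. The sketch below uses Python’s standard `ast` module to list calls in a generated snippet that appear on a denylist. The `FLAGGED_CALLS` set and the sample snippet are hypothetical and chosen only for illustration; the check is easily defeated by aliased imports or indirect calls.

```python
import ast

# Hypothetical denylist for illustration; a real team would maintain its own.
FLAGGED_CALLS = {"eval", "exec", "os.system"}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from an AST node, or return None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        parent = dotted_name(node.value)
        return f"{parent}.{node.attr}" if parent else None
    return None


def flagged_calls(source):
    """Return sorted (line number, call name) pairs for denylisted calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FLAGGED_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    # Stand-in for model output; the code is parsed, never executed.
    generated = "import os\nos.system('ls')\nprint(eval('1 + 1'))\n"
    for lineno, name in flagged_calls(generated):
        print(f"line {lineno}: call to {name} needs review")
```

Because the snippet is only parsed, it is safe to run this on untrusted output. A clean result means nothing on the list was found, not that the code is safe.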

The business model of Codex

I believe the Codex API will find plenty of internal uses for software companies. According to the details in the Codex paper, it is much more resource-efficient than GPT-3, and therefore, it should be more affordable. If software development companies manage to adapt the tool to their internal processes (as with the Blender example above) and save a few hours’ time for their developers every month, it will be worth the price.

But the real developments around Codex will come from Microsoft, the unofficial owner of OpenAI and the exclusive license-holder of its technology.

After OpenAI commercialized GPT-3, I argued that building a product and a business model on top of the language model would be very difficult, if not impossible. Whatever you do with the language model, Microsoft will be able to do it better, faster, and at a lower cost. And with the huge user base of Office, Teams, and other productivity tools, Microsoft is well positioned to dominate most markets for GPT-3-powered products.

Microsoft also has a dominant position with Codex, especially since it owns GitHub and Azure, two powerhouses for software development, DevOps, and application hosting. So if you’re planning to create a commercial product with the Codex API, you’ll probably lose the competition to Microsoft unless you’re targeting a very narrow market that the software giant will not be interested in. As with GPT-3, OpenAI and Microsoft released the Codex API to explore new product development opportunities as developers experiment with it, and they will use the feedback to roll out profitable products.

“[We] know we’ve only scratched the surface of what can be done,” the OpenAI blog reads.
