This article is part of our series that explores the business of artificial intelligence
Since GPT-2, there has been much excitement around the applications of large language models. And in the past few years, we’ve seen LLMs used for many exciting tasks, such as writing articles, designing websites, creating images, and even writing code.
But as I have argued before, there’s a wide gap between showing a new technology do something cool and using the same technology to create a successful product with a workable business model.
Microsoft, I think, just launched the first real LLM product with the public release of GitHub Copilot last week. This is an application that has a strong product/market fit, has immense added value, is hard to beat, is cost-efficient, has very strong distribution channels, and can become a source of great profit.
GitHub Copilot’s release is a reminder of two things: First, LLMs are fascinating, but they are most useful when applied to specific tasks rather than as a path toward artificial general intelligence. And second, the nature of LLMs puts large tech companies like Microsoft and Google at an unfair advantage to commercialize them: LLMs are not democratic.
Copilot is an AI programming tool that is installed as an extension on popular IDEs like Visual Studio and VS Code. It provides suggestions as you write code, something like autocomplete but for programming. Its capabilities range from completing a line of code to creating entire blocks of code such as functions and classes.
Copilot is powered by Codex, a descendant of OpenAI’s famous GPT-3, the large language model that made headlines for its capacity to perform a wide range of tasks. Unlike GPT-3, however, Codex has been fine-tuned specifically for programming tasks. And it produces impressive results.
The success of GitHub Copilot and Codex underlines one important fact: when it comes to putting LLMs to real use, specialization beats generalization. When Copilot was first introduced in 2021, CNBC reported: “…back when OpenAI was first training [GPT-3], the start-up had no intention of teaching it how to help code, [OpenAI CTO Greg] Brockman said. It was meant more as a general purpose language model [emphasis mine] that could, for instance, generate articles, fix incorrect grammar and translate from one language into another.”
But while GPT-3 has found mild success in various applications, Copilot and Codex have proven to be great hits in one specific area. Codex can’t write poetry or articles like GPT-3, but it has proven very useful to developers of all levels of expertise. Codex is also much smaller than GPT-3, which makes it more memory- and compute-efficient. And because it has been trained for a specific task, as opposed to the open-ended and ambiguous world of human language, it is less prone to the pitfalls that models like GPT-3 often fall into.
It is worth noting, however, that just as GPT-3 knows nothing about human language, Copilot knows nothing about computer code. It is a transformer model that has been trained on millions of code repositories. Given a prompt (e.g., a piece of code or a textual description), it will try to predict the next sequence of instructions that make the most sense.
With its huge training corpus and massive neural network, Copilot mostly makes good predictions. But sometimes, it might make dumb mistakes that the most novice programmer would avoid. It doesn’t think about programs in the way a programmer does. It can’t design software or think in steps and think about user requirements and experience and all the other things that go into building successful apps. It’s not a replacement for human programmers.
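To build some intuition for what “predicting the next sequence” means, here is a deliberately toy sketch. This is not how Codex works internally; a hand-made bigram table stands in for a transformer’s learned probabilities, and the greedy loop stands in for its decoding step.

```python
# Toy illustration of next-token prediction over code tokens.
# NOT Codex's actual mechanism: a hand-made bigram table stands in
# for the probabilities a transformer would learn from training data.

BIGRAM_PROBS = {
    "def":    {"add": 0.6, "main": 0.4},
    "add":    {"(": 1.0},
    "(":      {"a": 0.7, ")": 0.3},
    "a":      {",": 0.6, ")": 0.4},
    ",":      {"b": 1.0},
    "b":      {")": 1.0},
    ")":      {":": 1.0},
    ":":      {"return": 1.0},
    "return": {"a": 1.0},
}

def complete(prompt_tokens, max_new_tokens=8):
    """Greedily extend the prompt, one most-probable token at a time."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = BIGRAM_PROBS.get(tokens[-1])
        if not candidates:  # no continuation known for this token
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens

# Starting from "def", the model greedily assembles a function header:
# def add ( a , b ) : return
print(" ".join(complete(["def"])))
```

A real code model does the same thing at vastly greater scale: conditioned on everything typed so far, it emits whichever continuation its training corpus makes most likely, with no understanding of what the program means.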
Copilot’s product/market fit
One of the milestones for any product is achieving product/market fit, or proving that it can solve some problem better than alternative solutions in the market. In this regard, Copilot has been a stunning success.
GitHub released Copilot as a preview product last June, and it has since been used by more than one million developers.
According to GitHub, in files where Copilot is activated, it accounts for an impressive 40 percent of the written code. Developers and engineers I spoke to last week say that while there are limits to Copilot’s capabilities, there’s no denying that it improves their productivity significantly.
For some use cases, Copilot is competing with StackOverflow and other code forums, where users must search for the solution to a specific problem they face. In this case, the added value of Copilot is very evident and palpable: less frustration and distraction, more focus. Instead of leaving their IDE and searching for a solution on the web, developers just type the description or docstring of the functionality they want, and Copilot does most of the work for them.
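As an example of that workflow (the prompt is hypothetical and the body is a plausible, Copilot-style suggestion, not an actual recorded completion): the developer types only the signature and docstring, and the tool proposes the rest.

```python
# The developer writes only the signature and docstring below.
def count_word_frequencies(text):
    """Return a dict mapping each lowercase word in `text` to its count."""
    # --- a plausible Copilot-style suggestion begins here ---
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_word_frequencies("To be or not to be"))
```

The developer's job shifts from recalling the exact incantation (or searching for it on a forum) to reviewing a draft for correctness.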
In other cases, Copilot is competing against manually writing tedious code, such as configuring matplotlib charts in Python (a notoriously frustrating task). While Copilot’s output might require some tweaking, it relieves developers of most of the burden.
In many other use cases, Copilot has cemented itself as a superior solution to problems many developers face every day. Developers told me about tasks such as running test cases, setting up web servers, and documenting code, all of which previously required arduous manual effort. Copilot has helped them save a ton of time in their day-to-day work.
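Test writing is a good illustration of the kind of boilerplate developers described. Given a small function, an assistant can draft scaffolding like the following (a hypothetical suggestion, shown with Python’s built-in unittest, not actual Copilot output):

```python
import unittest

def slugify(title):
    """Convert a title to a URL slug, e.g. 'Hello World' -> 'hello-world'."""
    return "-".join(title.lower().split())

# The kind of test scaffolding an assistant might draft from the
# function above (hypothetical suggestion, not recorded Copilot output).
class TestSlugify(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Many   spaces  "), "many-spaces")
```

None of this is hard to write by hand, which is exactly the point: it is repetitive, predictable work, and predictable text is what language models are best at producing.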
Distribution and cost-efficiency
Product/market fit is just one of the several components of creating a successful product. If you have a good product but can’t find the right distribution channels to deliver its value in a way that is cost-efficient and profitable, then you’re doomed. At the same time, you’ll need a plan to maintain your edge over competitors, prevent other companies from replicating your success, and make sure that you can continue to deliver value down the stretch.
To turn Copilot into a successful product, Microsoft needed to bring together several very important pieces, including technology, infrastructure, and market.
First, it needed the right technology, which it acquired thanks to its exclusive license to OpenAI’s technology. Since 2019, OpenAI has stopped open-sourcing its technology and instead licenses it to its financial backers, chief among them Microsoft. Codex and Copilot were built on GPT-3 with the help of OpenAI’s scientists.
Other large tech companies have been able to create large language models that are comparable to GPT-3. But there’s no denying that LLMs are very costly to train and run.
“For a model that is 10 times smaller than Codex—the model behind Copilot (which has 12B parameters on the paper)—it takes hundreds of dollars to do the evaluation on this benchmark which they used in their paper,” Loubna Ben Allal, machine learning engineer at Hugging Face, told TechTalks. Ben Allal was referring to another benchmark used to evaluate Codex, which cost thousands of dollars to run on her own, smaller model.
“There are also security issues because you have to execute untrusted programs to evaluate the model which might be malicious, sandboxes are typically used for security,” Ben Allal said.
Leandro von Werra, another ML engineer at Hugging Face, estimated training costs to be between tens and hundreds of thousands of dollars, depending on the size of the model and the number of experiments needed to get it right.
“Inference is one of the biggest challenges,” von Werra added in comments to TechTalks. “While almost anybody with resources can train a 10B model these days, getting the inference latency low enough to feel responsive to the user is an engineering challenge.”
This is where Microsoft’s second advantage kicks in. The company has been able to create a large cloud infrastructure that is specialized for machine learning models such as Codex. It runs inference and provides suggestions in milliseconds. And more importantly, Microsoft is able to run and provide Copilot at a very affordable price. Currently, Copilot is offered at $10/month or $100/year, and it will be provided for free to students and maintainers of popular open-source repositories.
Most developers I spoke to were very satisfied with the pricing model because the time Copilot saves them is worth much more than the subscription price.
Abhishek Thakur, another ML engineer at Hugging Face I spoke to earlier this week, said, “As a machine learning engineer, I know that a lot goes into building products like these, especially Copilot, which provides suggestions with sub-milliseconds latency. To build an infrastructure that serves these kinds of models for free is not feasible in the real world for a longer period of time.”
However, running code generator LLMs at affordable rates is not impossible.
“In terms of the compute to build these models and necessary data: that’s quite feasible and there have been a few replications of Codex such as Incoder from Meta and CodeGen (now accessible for free on the Hugging Face Hub) from Salesforce matching Codex’s performance,” von Werra said. “There is definitely some engineering involved in building the models into a fast and nice product, but it seems many companies could do this if they want to.”
However, this is where the third piece of the puzzle kicks in. Microsoft’s acquisition of GitHub gave it access to the biggest developer market, making it easy for the company to put Copilot into the hands of millions of users. Microsoft also owns Visual Studio and VS Code, two of the most popular IDEs with hundreds of millions of users. This reduces the friction for developers to adopt Copilot as opposed to another similar product.
With its pricing, efficiency, and market reach, Microsoft seems to have solidified its position as the leader in the emerging market for AI-assisted software development. The market could still take other turns. What’s certain (and as I have pointed out before) is that large language models will open up plenty of opportunities to create new applications and markets. But they won’t change the fundamentals of sound product management.