
OpenAI is scrambling to recover from Google’s huge AI comeback after the latter released Gemini 3.0 Pro and Nano Banana Pro. OpenAI CEO Sam Altman has declared “Code Red,” according to The Information, warning employees: “We are at a critical time for ChatGPT.” The company is reportedly shelving plans for ads and other products to focus on releasing its next model, one that it hopes will outperform Gemini 3.
OpenAI is not a profitable company (even with around $20 billion in annual recurring revenue). It needs to raise capital from investors to fund its next generation of models and products, and it has managed to raise tens of billions of dollars on the premise and promise that it is, and will remain, the undisputed leader in AI. Now that the prevailing sentiment is that it has lost that lead, the next funding round becomes less likely unless it comes up with a convincing plan to take the lead back.
But this raises the question: how do you measure the lead in AI? Right now, everything is about benchmarks, the standardized sets of tasks that models are tested on to measure how well they perform. Gemini 3, released in November, topped the benchmark leaderboards.
A week after Gemini 3, Anthropic released Claude Opus 4.5, which also showed bleeding-edge results on key benchmarks. (At the time of this writing, Gemini 3 Pro still has the overall lead on the prestigious Artificial Analysis leaderboard.)
But in reality, it is becoming harder and harder to compare frontier models. Sure, if you scroll through X, you’ll find plenty of examples of the latest and greatest models performing tasks that were impossible with previous generations. But for most tasks, you can get pretty good results from most models. (In fact, I am still using Gemini 2.5 Pro for many of my tasks, even though Google has made Gemini 3 available for free through AI Studio. It gets the work done faster and I don’t see a noticeable difference in the output. And I find Grok 4 Fast to be very good at tasks that require gathering information from the web and X.)
Unfortunately for OpenAI, investors currently look mostly at benchmarks when deciding whether to join the next round. So it will have to scramble to release its next model, which risks being premature and underwhelming, as happened with GPT-5 when it first launched. Staying in the lead has meant keeping the pedal to the metal, taking shortcuts (such as benchmaxxing, or training models on benchmark data) and cutting corners on important questions (such as figuring out how you are going to turn a profit on this thing).
Google, on the other hand, has not been in OpenAI’s pressure-cooker position since the botched release of Bard. It was discounted as a second- or third-place AI company for more than a year (a long time in AI years). It has taken its time releasing models, making sure they are polished, integrated across its entire ecosystem, and don’t fail when users rush to try them. At the same time, it is using its vast compute and financial resources to subsidize access to its models. And because it is a profitable company, it does not rely on investor money to run its AI operations. In fact, after the release of Gemini 3, Google’s stock jumped and its market cap increased by more than the entire amount of funding OpenAI has raised in its lifetime.
OpenAI’s problem is not that it no longer has the best model but that the general feeling is that it has fallen behind. Being at the forefront of AI is both a blessing and a curse: you get a lot of attention (and funding), but you also have to win every day. When you’re second or third, you only have to win once; then it’s your turn to defend the lead. And if you take the lead near the finish line (or at least your rival’s finish line), you don’t need much runway.