This article is part of our series that explores the business of artificial intelligence.
2022 has been a huge year for generative artificial intelligence. Large language models have continued to make progress in generating text and software code. At the same time, we’ve seen tremendous advances in text-to-image generators with the introduction of models such as DALL-E 2, Imagen, and Stable Diffusion.
The year was also marked by an acceleration in productizing generative AI models. The science and technology of generative models are maturing to the point where they can solve real problems. Now, companies like Microsoft and Google are looking for ways to take the lead in a newly forming market that might change the future of creativity.
At this week’s AI@ ’22 conference, Google showed a glimpse of its roadmap for bringing generative models into its products. Its strategy may be a prelude to where the field is headed and where the competition might shift in the future.
Google’s generative models
At AI@ ’22, Douglas Eck, Principal Scientist at Google Research, listed Google’s current research on generative models across five areas: text, source code, audio, image, and video.
Google is currently running test projects in all these fields with an eye toward creating products in the future. The Wordcraft Writers Workshop is an effort to explore how large language models can assist writers. Google created Wordcraft, a tool that uses its LaMDA language model to generate text from prompts provided by the user. The tool is designed around an iterative process in which the human writer and the LLM interact to create the story together.
“Using LaMDA to write full stories is a dead end. It’s a much more effective tool when it’s used to add spice to a particular character or to enhance an aspect of the story,” Eck said. “The user interface also has to be right. The Wordcraft tool was designed from the ground up to enable writers to interact with generative models.”
Learning for Code is a project that uses LLMs to generate code suggestions for developers. Google is currently testing the tool internally; it offers both single-line and multi-line code completion suggestions.
AudioLM uses language models to generate audio. The model takes an audio sample as input and continues it. It can be used to generate music and speech.
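AudioLM itself quantizes audio into discrete tokens with a neural codec and models them with a Transformer, neither of which is sketched here. As a loose, toy analogy of the “take a sample and continue it” idea, the following shows autoregressive next-token continuation over made-up integer tokens (the bigram counting “model” and the token values are illustrative stand-ins, not AudioLM’s actual architecture):

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    # Count which token tends to follow each token in the training sequence.
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def continue_tokens(prompt, counts, n=6):
    # Autoregressive continuation: repeatedly append the most likely next token.
    out = list(prompt)
    for _ in range(n):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation known for this token
        out.append(followers.most_common(1)[0][0])
    return out

# Pretend these integers are codec tokens extracted from a short audio clip.
training = [1, 2, 3, 1, 2, 3, 1, 2, 3]
model = train_bigrams(training)
print(continue_tokens([1, 2], model))  # → [1, 2, 3, 1, 2, 3, 1, 2]
```

The real system replaces the bigram counts with a large Transformer, which is what lets it capture long-range structure in music and speech rather than just the immediately preceding token.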
Perhaps the most advanced models that Eck presented at AI@ ’22 were the text-to-image models Imagen and Parti. Imagen works like OpenAI’s DALL-E 2 and uses diffusion models to turn language embeddings into images. Parti uses the transformer architecture to generate images from text tokens. DreamBooth is a technique that fine-tunes a text-to-image generator like Imagen to render a specific subject in new contexts. And DreamFusion combines the power of diffusion models with neural radiance fields (NeRF), a deep learning architecture that can create 3D models from 2D images.
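Imagen is not publicly available, but the diffusion idea it builds on can be sketched in a few lines: a forward process gradually replaces signal with Gaussian noise according to a schedule, and the model learns to reverse that process. Below is a toy forward-noising schedule in the style of DDPM-type diffusion models (the schedule values and the tiny “pixel” list are illustrative assumptions, not Imagen’s actual configuration):

```python
import math
import random

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear noise schedule: beta_t is the noise added at step t.
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    # alpha_bar_t = product of (1 - beta) up to t: the fraction of signal left.
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= (1.0 - b)
        alpha_bars.append(prod)
    return alpha_bars

def noise_sample(x0, t, alpha_bars, rng=random.Random(0)):
    # q(x_t | x_0): scale the clean signal down and mix in Gaussian noise.
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * rng.gauss(0, 1) for x in x0]

alpha_bars = make_schedule()
x0 = [1.0] * 4  # stand-in for a few pixel values
print(round(alpha_bars[0], 4))    # ≈ 0.9999 — early steps keep almost all signal
print(round(alpha_bars[-1], 4))   # ≈ 0.0 — the final step is almost pure noise
```

A text-to-image model like Imagen trains a network to undo this corruption step by step, conditioned on a text embedding, so that starting from pure noise it can walk back to an image that matches the prompt.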
Eck also showed a preview of Google’s research in video generation with Imagen Video and Phenaki. Imagen Video uses diffusion models to create a series of high-resolution images that can be stitched together to create videos. Phenaki is based on the transformer architecture and turns a sequence of text prompts into a sequence of images. Eck also showed how the combination of Imagen Video and Phenaki could be used to create high-resolution videos from prompt sequences.
Google’s policy for generative models
One thing that Eck made clear throughout the presentation is that generative models are not meant to automate or replace human creativity.
“It’s no longer about a generative model that creates a realistic picture. It’s about making something that you created yourself,” Eck said. “Technology should serve our need to have agency and creative control over what we do.”
He further emphasized the point when he discussed Google’s “responsible AI” strategy and finished his presentation by saying, “Creativity is an important part of what makes us all human. I think it’s important to keep this in mind when we build these AI systems.”
Aside from the PR aspect of this rhetoric, which is meant to assuage (largely exaggerated) fears of generative AI replacing human creativity, the emphasis on control has the positive effect of steering the field toward human-centered AI. AI systems should be designed to provide transparency and control in order to augment humans. Without human control and oversight, AI systems like generative models will underperform because they don’t have the same grasp of fundamental concepts that we humans do.
Can Google compete in the generative AI space?
The gap between AI research and productization can be very difficult to bridge. Certainly, the quality of Google’s LLMs and text-to-image models does not fall short of OpenAI’s GPT-3 and DALL-E 2. But the question is, can Google deliver a successful product based on these models?
There are several aspects to consider when thinking about productizing a technology. Will the technology become the basis for a new product? If not, will it be integrated into existing products? What problem is it solving and what are the alternative solutions that currently exist? Does the product provide enough added value to convince users to switch? Can it help solidify the company’s position in an existing market?
Naturally, companies will try to reach the low-hanging fruit, which is bringing the technology to the markets where they already excel. In the writing space, Microsoft has the lead over Google. Office 365 has a larger market share than G Suite and Microsoft already has a head start in integrating LLMs into its products.
Microsoft also has the lead in coding with GitHub Copilot and Codex, which have already entered production mode as opposed to Google’s internal code generation tool. Google’s most popular development tools are Colab and Android Studio, which will provide it with a venue to test and roll out its code AI when it is ready. But the market share of these IDEs is not comparable to Microsoft’s Visual Studio Code and GitHub Codespaces (also owned by Microsoft).
In the image, video, and audio domains, I think that Adobe will be the winner in generative AI. Adobe already has the largest market share and well-established tools that are regularly updated with AI features. And Adobe is already experimenting with generative AI across its suite of products.
However, this doesn’t mean that the incumbents are bound to dominate the field of generative AI. For the moment, we’re looking at generative models from the perspective of the tools we’re using today, such as word processors, IDEs, and image editing applications. Basically, we’re looking at how generative models can automate or improve the tasks we’re already doing (completing our sentences, writing chunks of code, editing or generating photos, etc.). The real potential of AI will be fulfilled when we create new systems of tools and workflows that can take full advantage of the growing power of generative models and other advances in AI to do things in totally different ways (I have a few ideas here that I will elaborate on in the future).
Just as Google reinvented information discovery and Amazon reshaped shopping with the popularization of the web, companies that discover and embrace the new opportunities of AI are bound to revamp existing markets or create new ones.