Figure 02: Observations, questions, and speculations

Yesterday, Figure revealed the second generation of its humanoid robot, and the demonstration is stunning. In two years, the startup has made impressive progress and is catching up with some of the most prestigious robotics companies.

Figure 02 has a more capable body, improved hardware, and many new features. However, Figure will have to do much more than demos to prove its new robot is promising, and the video leaves a lot of room for speculation. Here is what we know, and what we can guess, about the new robot.

Robot upgrade

The most impressive part of the announcement is the robot itself, with a sleeker design, better battery life, and hands with more degrees of freedom. The robot will reportedly work for up to 20 hours on a single charge. The packaging has also been upgraded to hide the wiring and reduce safety risks.

Figure 02 has six cameras placed in the head, front torso, and back torso, giving it a 360-degree view of its surroundings. The robot’s gait is still a bit clunky compared to Tesla’s Optimus and Boston Dynamics’ new Atlas, but it is still remarkably good.

With advances in vision-language models (VLMs) and vision-language-action models (VLAs), much of the discussion around robotics has shifted to the machine learning layer. But pure robotics engineering and control remains one of the hardest challenges in the field. And in this regard, Figure has done a great job.

Where is the model?

According to the demonstration video, Figure 02 is equipped with three times more compute power than its predecessor. It also runs its own onboard VLM and speech recognition models. However, the details are a bit confusing.

The original Figure robot reportedly used GPT-4 on the OpenAI cloud. In an X thread, Figure founder Brett Adcock posted a diagram of how the new robot works, which includes “Onboard mics + speakers connected to custom AI models trained in partnership with OpenAI.” Another post in the thread states that the onboard VLM “enables semantic grounding and fast common-sense visual reasoning from robot cameras.”

However, OpenAI is not in the business of deploying on-device models, and the diagram makes it clear that the reasoning and planning are done by OpenAI models before sending commands to the robot’s control system. 
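
The diagram suggests a split along these lines: a cloud model handles reasoning and planning, and the robot’s onboard stack turns the resulting commands into motion. The sketch below is purely illustrative; every name in it (Observation, Command, plan_in_cloud, OnboardController) is hypothetical, and the “planner” is a trivial rule standing in for a remote model call.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    transcript: str  # onboard speech recognition output
    camera_frames: list = field(default_factory=list)  # frames from the six cameras

@dataclass
class Command:
    skill: str    # high-level skill, e.g. "pick" or "place"
    target: str   # object or location the skill applies to

def plan_in_cloud(obs: Observation) -> list[Command]:
    """Stand-in for cloud-side reasoning/planning. A real system would
    send the observation to a remote model and parse its response."""
    if "pick up" in obs.transcript:
        return [Command("pick", "sheet_metal"), Command("place", "fixture")]
    return []

class OnboardController:
    """Stand-in for the local control stack that converts high-level
    commands into joint-level motions."""
    def execute(self, cmd: Command) -> None:
        print(f"executing: {cmd.skill} -> {cmd.target}")

obs = Observation(transcript="pick up the part")
controller = OnboardController()
for cmd in plan_in_cloud(obs):   # reasoning happens off-robot...
    controller.execute(cmd)      # ...execution happens on-robot
```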

It will be interesting to see if there is a division of labor between cloud-based and on-device models. The arrangement of the input sensors also raises questions. There is an open discussion on where to fuse different modalities: does Figure use early fusion, where all modalities are tokenized and embedded together, or late fusion, where vision and language are processed separately and then blended?
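
To make the distinction concrete, here is a toy PyTorch sketch of both strategies. The token counts, dimensions, and modules are arbitrary and are not a guess at Figure’s actual architecture.

```python
import torch
import torch.nn as nn

d = 64                                  # shared embedding width
img_tokens = torch.randn(1, 16, d)      # 16 vision tokens (e.g. image patches)
txt_tokens = torch.randn(1, 8, d)       # 8 language tokens

# Early fusion: concatenate all modalities into one token sequence and let
# a single transformer attend across them from the first layer.
early_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2)
fused_early = early_encoder(torch.cat([img_tokens, txt_tokens], dim=1))

# Late fusion: encode each modality separately, then blend the pooled
# features at the end (here with a simple linear layer).
vision_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), 1)
text_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), 1)
blend = nn.Linear(2 * d, d)

v = vision_encoder(img_tokens).mean(dim=1)   # pooled vision feature
t = text_encoder(txt_tokens).mean(dim=1)     # pooled language feature
fused_late = blend(torch.cat([v, t], dim=-1))

print(fused_early.shape, fused_late.shape)   # (1, 24, 64) and (1, 64)
```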

Learning on the job

How does Figure 02 learn to perform new tasks? The video shows the robot working in a BMW factory with the labels “100% autonomous neural network learned placement” and “self-correcting learned behavior.”

Another diagram in the X thread suggests that the company has a DataOps and MLOps pipeline that continuously gathers new data, trains new models, and deploys them to the fleet of robots.
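
At a schematic level, such a loop could look like the sketch below: collect episodes from the fleet, train a new policy offline, and push it back to the robots. Everything here is a placeholder, not a description of Figure’s actual pipeline.

```python
import random

class Robot:
    """Placeholder for one robot in the fleet."""
    def __init__(self, name: str):
        self.name = name
        self.policy_version = 0

    def run_and_log(self) -> dict:
        # Pretend the robot performed tasks and logged the outcome.
        return {"robot": self.name, "success": random.random() > 0.2}

def train_policy(episodes: list, version: int) -> int:
    # Stand-in for offline training on newly collected episodes.
    print(f"training policy v{version + 1} on {len(episodes)} episodes")
    return version + 1

fleet = [Robot(f"figure-{i}") for i in range(3)]
version = 0
for cycle in range(2):                              # two turns of the loop
    episodes = [r.run_and_log() for r in fleet]     # 1. gather fleet data
    version = train_policy(episodes, version)       # 2. train a new model
    for r in fleet:
        r.policy_version = version                  # 3. deploy to the fleet
```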

However, this raises more questions about the partnership between OpenAI and Figure and what kind of model-training support OpenAI is providing. More details on this front could hint at OpenAI’s future plans for robotics.

I’m also interested to know if the robot uses any kind of in-context learning to dynamically adjust its behavior. One of the important features of LLMs and VLMs is their ability to use examples and observations in their prompts to correct their responses. The models could use automated prompting techniques to analyze the outcomes of their actions and adjust the robot’s commands.
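
As a rough illustration, an outcome-driven prompting loop might look like the sketch below, where query_vlm is a placeholder for whatever model endpoint the robot actually calls.

```python
def query_vlm(prompt: str) -> str:
    """Placeholder: a real system would send the prompt (plus camera
    frames) to a vision-language model and return its answer."""
    return "shift grip 5 mm left and retry"

history: list[tuple[str, str]] = []   # (action, observed outcome) pairs

def next_command(task: str, last_action: str, outcome: str) -> str:
    """Keep past attempts in the prompt so the model can self-correct."""
    history.append((last_action, outcome))
    examples = "\n".join(f"action: {a} -> outcome: {o}" for a, o in history)
    prompt = (f"Task: {task}\n"
              f"Previous attempts:\n{examples}\n"
              "Propose a corrected next action.")
    return query_vlm(prompt)

print(next_command("place part on fixture",
                   "lower part at target pose",
                   "part slipped off the edge"))
```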

Exciting work, lots of questions. I look forward to seeing how it all pans out.
