The Problem of Data in Physical AI
In the post about physical AI, we talked about the idea of a revolution that will be the next big thing. Here I want to talk about the most important challenge humanity will face in the next decade.
The Physical Internet
We have recently seen the rise of remarkably capable AI with the mass adoption of LLMs such as ChatGPT, showing what intelligent systems combined with large collections of data can do. The two main things that made this revolution possible are:
- The enormous amount of data, in the form of text and images, that the internet holds.
- The invention of the transformer, the architecture that powers virtually every chatbot today.
The first comes from a system that has already been built and used for 30 years. The second was the trigger that started this revolution, providing a vessel to store all that data while generalizing well enough to be useful.
For physical AI, the story is different. There is no internet of robot data. LLMs train on billions of image-text pairs, while today’s robotics foundation models train on tens of thousands of examples. There is no preexisting repository of physical interactions to draw from. Unlike text or images, robotic manipulation data can’t be scraped from the web; it must be collected, one interaction at a time, in the real world.
Today’s relevant open-source datasets, including DROID and Open X-Embodiment, offer only about 5,000 hours of interaction data combined, far too little for physical AI to handle real-world complexity. Most of these datasets come from research institutions that rely on platforms such as ROS or Franka World to collect and process multimodal data.
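To get a sense of what this data actually looks like, here is a minimal sketch of reading one Open X-Embodiment subset through tensorflow_datasets, assuming the public RLDS release on Google Cloud Storage; the exact bucket path, dataset version, and field names are assumptions that vary between subsets.

```python
# A minimal sketch of reading one Open X-Embodiment subset in RLDS format.
# The GCS path and field names below are assumptions based on the public
# release and vary per dataset; verify them before relying on this.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    "gs://gresearch/robotics/fractal20220817_data/0.1.0"  # assumed subset path
)
episodes = builder.as_dataset(split="train[:10]")  # read a handful of episodes

for episode in episodes:
    # Each RLDS episode is a sequence of steps with multimodal fields.
    for step in episode["steps"]:
        image = step["observation"]["image"]  # camera frame, (H, W, 3) uint8
        action = step["action"]               # robot command at this step
        # ...feed (image, action) pairs into a policy-learning pipeline
```

Each episode is a sequence of timesteps pairing sensor observations with the command the robot executed, which is exactly the kind of record that has to be created from scratch for physical AI.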
Virtual Worlds
The solution Nvidia brings is simulation. By creating a world representation, we can place virtual robots in environments and run millions of experiments, collecting synthetic data that resembles the real world. This gives us control over the experiments and lets us iterate much faster to obtain the data a model needs to learn. For example, we can put a robot in a 3D model of a kitchen, simulate its cameras and joint angles, and teach it to open a fridge.
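As an illustration, here is a toy sketch of what such a data-collection loop looks like. KitchenSim and the scripted policy are hypothetical placeholders standing in for a real simulator such as Isaac Sim; only the overall pattern is the point.

```python
import numpy as np

class KitchenSim:
    """Toy stand-in for a physics simulator (hypothetical, not a real API)."""
    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)

    def reset(self) -> dict:
        # In a real simulator you would also randomize lighting, object poses,
        # and physics parameters here so the synthetic data covers more cases.
        self.fridge_angle = 0.0
        return self._observe()

    def step(self, joint_targets: np.ndarray) -> dict:
        # Stand-in dynamics; a real simulator advances the physics one tick.
        self.fridge_angle += float(joint_targets.sum()) * 0.01
        return self._observe()

    def _observe(self) -> dict:
        return {
            "camera_rgb": self.rng.integers(0, 255, (64, 64, 3), dtype=np.uint8),
            "joint_angles": self.rng.uniform(-1.0, 1.0, size=7),
        }

def scripted_policy(obs: dict) -> np.ndarray:
    # Hypothetical scripted demonstrator nudging the arm toward the handle.
    return 0.1 * np.ones(7)

def collect_episode(sim: KitchenSim, horizon: int = 200) -> list[dict]:
    """Roll out one episode and log (observation, action) pairs."""
    obs, episode = sim.reset(), []
    for _ in range(horizon):
        action = scripted_policy(obs)
        episode.append({"observation": obs, "action": action})
        obs = sim.step(action)
    return episode

# Because this runs in software, thousands of copies can run in parallel,
# far faster than real time.
dataset = [collect_episode(KitchenSim(seed=i)) for i in range(100)]
```

The important part is the reset-step loop: every iteration yields another labeled interaction, at a cost of compute rather than robot time.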
Even though this seems like a good approach, real-world data is still necessary to accurately capture the physics involved. That is why, as a first step, robotics companies such as Sanctuary AI started with teleoperation (having a human operator control one or more robots) while collecting data. The goal is to use this data to later train the robots to operate autonomously.
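A teleoperation pipeline boils down to logging what the operator commanded next to what the robot sensed at that moment. Here is a hypothetical sketch of that loop; the robot and teleop_device interfaces are placeholders, not any vendor's actual API.

```python
# Hypothetical teleoperation logger: record (observation, operator action)
# pairs while the human drives the robot. `robot` and `teleop_device` are
# placeholder interfaces, not a real vendor API.
import json
import time

def record_teleop_session(robot, teleop_device, out_path: str, hz: float = 10.0):
    """Stream timestamped demonstration data to disk at a fixed rate."""
    with open(out_path, "w") as f:
        while teleop_device.is_active():
            obs = robot.get_observation()          # e.g. camera frames, joint states
            action = teleop_device.read_command()  # what the human commanded
            robot.apply(action)                    # robot mirrors the operator
            f.write(json.dumps({"t": time.time(),
                                "observation": obs,
                                "action": action}) + "\n")
            time.sleep(1.0 / hz)                   # crude fixed-rate loop
```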
Developers can then use technologies provided by Nvidia to produce numerous plausible futures or variations from real or synthetic data, acting as an automatic data multiplier grounded in real physics.
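One simple form of such multiplication, distinct from Nvidia's actual world-model tooling, is replaying a recorded demonstration under randomized physics parameters. The sketch below assumes a hypothetical replay_in_sim helper.

```python
# A toy data multiplier: replay one demonstration under randomized physics
# parameters to produce many synthetic variants. `replay_in_sim` is a
# hypothetical helper standing in for an actual simulator replay.
import numpy as np

def multiply_demonstration(demo: list, n_variants: int, seed: int = 0) -> list:
    """Turn one recorded demonstration into n_variants synthetic ones."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        physics = {
            "friction": rng.uniform(0.4, 1.2),       # contact friction range
            "object_mass": rng.uniform(0.2, 2.0),    # object mass in kg
            "hinge_damping": rng.uniform(0.01, 0.2),
        }
        variants.append(replay_in_sim(demo, physics))  # hypothetical replay call
    return variants
```

Because each variant is perturbed only within physically plausible ranges, the multiplied data stays grounded in real physics without spending any extra robot time.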
Day 1
This is Day 1 for physical AI: there is no clear path to follow, and nobody knows exactly how it will develop. It’s clear that the challenges people faced when building the internet back in the nineties will repeat in this new paradigm. However, connecting computers to the real world brings new problems to the table, such as:
- the need for new hardware that can operate in industrial environments,
- the sheer amount of multimodal data that needs to be stored and processed,
- making industrial software more accessible and easier to use for everyone.
I believe that in the next 2-3 years, physical AI will be on everyone’s lips, just as agents are now. This will open doors for new investment in companies such as Autentio, which have the power to bring disruptive technology to factories and make it available to everyone. With this in mind, we need to prepare for what comes next.