It’s no secret that foundation models have transformed AI in the digital world. Large language models (LLMs) such as ChatGPT, LLaMA, and Bard have revolutionized language AI. Although OpenAI’s GPT models are not the only large language models available, they have achieved the widest recognition by taking text and image inputs and delivering human-like responses, even on tasks that require complex problem-solving and advanced reasoning.
The viral and widespread adoption of ChatGPT has greatly shaped how society understands this new AI moment.
The next advance that will define AI for generations is robotics. Building AI-powered robots that can learn how to interact with the physical world will enhance all forms of repetitive work in sectors ranging from logistics, transportation, and manufacturing to retail, agriculture, and even healthcare. It will unlock as many efficiencies in the physical world as we have seen in the digital world over the past few decades.
While robotics poses a unique set of problems compared to language, the underlying core concepts are similar. Some of the brightest minds in AI have made significant progress in building a “GPT for robotics.”
What enables GPT to succeed?
To understand how a “GPT for robotics” is being built, first look at the core pillars that enabled the success of LLMs such as GPT.
Foundation model approach
GPT is an AI model trained on a vast, diverse dataset. Engineers previously collected data and trained a specific AI for a specific problem; to solve another problem, they would need to collect new data. Another problem? New data again. With the foundation model approach, the exact opposite happens.
Instead of building specialized AI systems for each use case, you can build one that is used universally, and that general model often outperforms any specialized model, even on a single specific task. A foundation model can draw on lessons learned across tasks and generalize to new ones better, because it has picked up additional skills from having to perform well on a wide variety of tasks.
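As a minimal sketch of this idea, here is a toy task-conditioned model trained on a mixture of two hypothetical tasks. The tasks, features, and numbers are invented for illustration; the sketch shows only the mechanics of training one model on mixed data, not the full transfer benefits described above.

```python
# A minimal sketch of multi-task ("foundation model" style) training:
# one shared model trained on a mixture of tasks instead of one model
# per task.  Tasks, features, and numbers are illustrative assumptions.
import random

random.seed(0)

# Two toy "tasks" that share structure but weight features differently.
TASK_WEIGHTS = {"pick": [2.0, 0.5], "sort": [0.5, 2.0]}

def make_example(task):
    """Sample (features, target) for one task.

    The task identity is folded into the input, much as a prompt
    conditions an LLM, by crossing raw features with a one-hot task code.
    """
    x = [random.random(), random.random()]
    y = sum(w * xi for w, xi in zip(TASK_WEIGHTS[task], x))
    t = [1.0, 0.0] if task == "pick" else [0.0, 1.0]
    feats = [xi * ti for ti in t for xi in x]  # task-conditioned features
    return feats, y

def train_shared_model(n_steps=5000, lr=0.05):
    """SGD on a single linear model that covers both tasks at once."""
    w = [0.0] * 4
    for _ in range(n_steps):
        task = random.choice(["pick", "sort"])
        feats, y = make_example(task)
        err = sum(wi * fi for wi, fi in zip(w, feats)) - y
        for i in range(4):
            w[i] -= lr * err * feats[i]
    return w

w = train_shared_model()  # one weight vector now serves both tasks
```

Because the task code is part of the input, a single weight vector recovers both tasks’ targets; in a real system the linear model would be a large neural network and the “task code” a natural-language instruction or sensor context.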
Training on a large, proprietary, high-quality dataset
To build generalized AI, you first need access to a huge amount of diverse data. OpenAI obtained the real-world data needed to train GPT models reasonably efficiently: GPT is trained on a large, diverse dataset collected from across the Internet, including books, news articles, social media posts, code, and more.
It’s not just the size of the dataset that matters; curating high-quality, valuable data plays a big role as well. GPT models achieve unprecedented performance because their datasets are curated around the tasks users care about and the answers that are most useful.
The role of reinforcement learning (RL)
OpenAI uses reinforcement learning from human feedback (RLHF) to align the model’s responses with human preferences (i.e., what users consider useful). Supervised learning (SL) alone is not enough, because it can only teach a model from problems with a clear pattern or set of labeled examples, while LLMs must pursue goals that have no single correct answer. Enter RLHF.
RLHF lets the algorithm work toward a goal through trial and error while a human approves correct answers (high reward) or rejects incorrect ones (low reward). The AI learns the reward function that best explains human preferences and then uses RL to figure out how to maximize it. By learning from human feedback, ChatGPT can produce responses that mirror or exceed human-level capabilities.
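The preference-learning half of this process can be sketched in a few lines: fit a reward model so that human-preferred responses score higher than rejected ones, using the Bradley-Terry-style loss commonly used for preference learning. The linear model, feature names, and data below are toy assumptions, not OpenAI’s implementation.

```python
# Toy sketch of reward-model fitting from human preference pairs,
# minimizing -log(sigmoid(r_preferred - r_rejected)).
# Model, features, and data are illustrative assumptions.
import math

def reward(w, features):
    """Scalar reward: dot product of weights and response features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_reward_model(preference_pairs, dim, lr=0.1, epochs=200):
    """Gradient descent on pairs (features_preferred, features_rejected)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for pref, rej in preference_pairs:
            margin = reward(w, pref) - reward(w, rej)
            # derivative of -log(sigmoid(margin)) w.r.t. margin
            grad = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
            for i in range(dim):
                w[i] -= lr * grad * (pref[i] - rej[i])
    return w

# Hypothetical features: [helpfulness, verbosity].  The human labeler
# preferred the more helpful response regardless of length.
pairs = [
    ([0.9, 0.2], [0.1, 0.8]),
    ([0.8, 0.5], [0.3, 0.5]),
    ([0.7, 0.1], [0.2, 0.9]),
]
w = train_reward_model(pairs, dim=2)
```

In a full RLHF pipeline this learned reward model would then drive an RL step that fine-tunes the language model itself; only the reward-fitting step is sketched here.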
The next frontier for foundation models is robotics
The same core technology that allows GPT to see, think, and even speak also enables robots to see, think, and act. Robots powered by a foundation model can understand their physical surroundings, make informed decisions, and adapt their actions to changing circumstances.
A “GPT for robotics” is being built the same way GPT was, laying the groundwork for a revolution that will, once again, redefine AI as we know it.
Foundation model approach
By taking a foundation model approach, you can also build a single AI that works across multiple tasks in the physical world. A few years ago, experts advised building a specialized AI for robots that pick and pack grocery items, separate from the model that sorts various electrical parts, which in turn differed from the model that unloads pallets from a truck.
This paradigm shift to a foundation model lets the AI respond better to the edge-case scenarios that abound in unstructured real-world environments and that can stump models with narrower training. Building a single, generalized AI for all these scenarios is far more successful: training on everything yields the human-level autonomy that previous generations of robots lacked.
Training on a large, proprietary, high-quality dataset
Teaching a robot which actions lead to success and which lead to failure is very difficult. It requires large-scale, high-quality data grounded in real-world physical interactions. Isolated lab settings or video examples are not reliable or robust enough sources (e.g., YouTube videos fail to capture the details of physical interaction, and academic datasets tend to be limited in scope).
Unlike AI for language or image processing, no pre-existing dataset represents how robots interact with the physical world. Assembling a large, high-quality dataset is therefore an even harder challenge in robotics, and deploying a fleet of robots in production is the only way to build a diverse one.
The role of reinforcement learning
Similar to answering text questions with human-level ability, robotic control and manipulation require an agent to make progress toward a goal that has no single correct answer (e.g., “What is a successful way to pick up this red onion?”). Once again, more than supervised learning is required.
You need a robot running deep reinforcement learning (deep RL) to succeed in robotics. This autonomous, self-learning approach combines reinforcement learning with deep neural networks to unlock higher levels of performance: the AI automatically adapts its learning strategies and continues to fine-tune its skills as it encounters new scenarios.
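The trial-and-error loop behind RL can be illustrated with a deliberately tiny example: tabular Q-learning on a made-up one-dimensional “move the gripper to the object” task. In deep RL the Q-table below would be replaced by a deep neural network; the environment, reward values, and hyperparameters here are all toy assumptions.

```python
# Tiny illustration of the RL trial-and-error loop: tabular Q-learning
# on a toy 1-D reach-and-grasp task.  Environment and rewards are
# invented for illustration; deep RL swaps the table for a neural net.
import random

random.seed(1)

N_STATES = 5        # gripper positions along a line
GOAL = 4            # position where the grasp succeeds
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    """Environment transition: reward only for a successful grasp."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: usually exploit, occasionally explore
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = 0 if q[state][0] >= q[state][1] else 1
            nxt, r, done = step(state, ACTIONS[a])
            # temporal-difference update toward reward + discounted future
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = q_learning()  # after training, moving right dominates in every state
```

The learned values make “move right” the greedy action everywhere, discovered purely from reward signals. A real manipulation policy learns the same way, only over camera images and continuous motor commands instead of a five-state table.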
Tremendous and challenging growth is coming
In the past few years, some of the world’s brightest AI and robotics experts have laid the technical and business foundation for the robotic foundation model revolution that will redefine the future of AI.
While these AI models are built similarly to GPT, achieving human-level autonomy in the physical world presents a different scientific challenge for two reasons:
- Building an AI-powered product that serves a variety of real-world settings involves a daunting set of complex physical requirements. The AI must adapt to different hardware form factors, since it is doubtful that a single design will work across industries (logistics, transportation, manufacturing, retail, agriculture, healthcare, etc.) and across the varied activities within each sector.
- Warehouses and distribution centers are an ideal learning environment for AI models in the physical world. It is common to have hundreds of thousands or even millions of different stock keeping units (SKUs) flowing through any facility at any given moment – providing the large, proprietary, high-quality data set needed to train “GPT for robots.”
The “GPT moment” for robotics is near
The growth trajectory of robotic foundation models is accelerating rapidly. Robotic applications, especially for tasks requiring precise object manipulation, are already running in real-world production environments, and 2024 will see a huge number of commercially viable robotic applications deployed at scale.
Chen has published more than 30 academic papers in top international journals on artificial intelligence and machine learning.