2025-07-30 17:02 UTC | In-site rewrite | 18 min read | Updated: 2026-06-27 00:25 UTC

Robotics Levels of Autonomy

Robots have powered manufacturing for decades, yet they stayed single-purpose and thrived only in perfect settings. Previous attempts at intelligent machines overpromised and underdelivered. But they were too early. Today, modern AI paradigms convert most robot roadblocks into data problems and push machines toward capabilities once thought impossible. As these models absorb real-world experience, robots will sharpen current skills, gain new ones, and deploy faster, absorbing ever-increasing shares of labor.

Source: SemiAnalysis | Author: Reyk Knuhtsen

Mar 11, 2025

America Is Missing The New Labor Economy – Robotics Part 1

Dylan Patel, Reyk Knuhtsen, Niko Ciminelli, Jeremie Eliahou Ontiveros, Joe Ryu, Robert Ghilduta

General-purpose robots that can accurately perform any task in any domain are now an inevitability, and mass labor replacement is on the horizon. However, these robots will arrive in levels, gradually adding capabilities until all tasks are feasible. To provide a barometer for this progress, we introduce our industry-first “Robotics Levels of Autonomy,” which classifies robotics into 5 distinct Levels.

Source: SemiAnalysis

Each Level of Autonomy is defined by the capability unlocked, and each builds sequentially on those before it to enable new applications. To ground these Levels, we provide data-driven analysis of current deployments, use cases and economics, current challenges, and active areas of progress. The Levels provide a type of task segmentation in which progress is additive — robots may target one Level of tasks and still benefit from capabilities developed in other Levels.

Our Levels of Autonomy are demarcated around commercial viability, not merely what is possible. Robot autonomy is inherently linked to applications: a robot creates value only through physical actions, which are often irreversible. Therefore, each Level is derived from both capability and reliability. Once reliability is proven, the robot must also deliver sufficient throughput to justify its cost.

Thank You

We’ve talked extensively to top scientists, surveyed numerous companies, traveled to top industry conferences, and dug into research surrounding contemporary robotics to develop this taxonomy.

We deeply appreciate the invaluable contribution of our coauthors: industry practitioners Niko Ciminelli, Joe Ryu, and Robert Ghilduta. We take inspiration from coauthor Joe Ryu’s framework to flesh out this classification. This project couldn’t be done without the help of outside experts.

Source: SemiAnalysis

We welcome feedback: Please reach out to discuss anything regarding our new Levels of Autonomy classification. You can meet us in person at most of the top industry events, such as Humanoids Summit SF, CoRL, Humanoids 2025 Seoul, and more.

Describing Autonomy

The path to full autonomy begins with accurate, single-purpose systems. But general-purpose robots must start anew, learning to see, plan, interact, and achieve exceptional accuracy. Along the way, their capabilities, applications, and challenges may vary widely. Each Level can be adequately explained across the two axes below, Agency and Dexterity:

Source: SemiAnalysis

By mapping gains along Agency and Dexterity, the framework shows what has been achieved, where the field now stands, and what to anticipate in the years ahead.

Currently, general-purpose robots are already working in early production phases at Level 2, but they largely remain outside the public eye. At Level 3, general-purpose robots are in early pilot stages of automating low-skill jobs and are beginning to prove that they work. While it is early, this evolution will accelerate faster than most realize.

Executive Summary

Level 0: Scripted Motion – Robots are pre-programmed entirely, requiring static environments and tasks to function.

Unlock: High Accuracy, High Repeatability

Capabilities: 24/7 Automation, High Throughput

Deployment and Use Cases (2025): Industry standard in automotive and electronics factories

Source: Siemens

Level 1: Intelligent Pick and Place – Robots can identify items in various positions and pick them for sorting.

Unlock: Generalizable Perception, Generalizable Grasping

Capabilities: Stationary Pick and Place

Deployment and Use Cases (2025): Adopted in parcel logistics centers for pick and place sorting, increasing penetration in additional warehousing markets as capabilities and integrations improve

Source: Covariant

Level 2: Autonomous Mobility – Robots can understand the open world, navigate, and traverse various terrains.

Unlock: High-level Planning, Spatial Reasoning, Robust Locomotion

Capabilities: Open world Navigation and Traversal

Deployment and Use Cases (2025): Early production phases for inspection and data collection roles, e.g. construction sites, oil & gas refineries, and critical infrastructure

Source: TechEBlog

Level 3: Low-skill Manipulation – Robots can perform basic, noncritical, low-skill tasks.

Unlock: Generalizable Manipulation

Capabilities: Advanced Pick and Place, Mobile Manipulation

Deployment and Use Cases (2025): Early pilot stages in kitchens, laundromats, manufacturing, and logistics

Source: Interesting Engineering

Level 4: Force-dependent Tasks – Robots can perform delicate tasks that require force and weight understanding, e.g. finding a phone in a pocket or driving a screw on the correct threads.

Unlock: In Research

Capabilities: Delicate, Force-dependent Tasks, Fine-grain Manipulation

Deployment and Use Cases (2025): In Research

Source: Feel The Force

Level 0 – Scripted Motion

Source: SemiAnalysis

To understand the shift in robotics, we must first look at where it is departing from. When most think of robots, they picture Level 0: the automation that has dominated factories for decades, helping manufacture cars, electronics, planes, etc. The robots performing these tasks have incredible power, speed, and precision, but they operate with no intelligence, only via strict programming and perfect tasks and environments. Entirely lacking autonomy, they are primarily monuments to industrial engineering and capital expenditure. They represent the rigid, single-purpose robotics world, and understanding their nature is paramount to seeing the monumental shift toward general-purpose robotics.

Current View

Deployments and Considerations: Locked Away

Source: SemiAnalysis

In Level 0, robots lack the ability to autonomously perceive and react to their environment, and the environment must be perfectly engineered for them. Everything is done on the robot’s terms, and everything and everyone else must comply.

This leads to the core of Level 0 deployments: the “cell.” The robot lives in a cage, fenced off for a number of reasons and with special designs:

Safety for the humans around the robot. These robots may be purpose-built for heavy lifting, making them extremely powerful. However, lacking computer vision and autonomy, they will not adapt to a human in their environment and will continue their action. Instead, the safeguards in place are typically Emergency-stop (Estop) buttons, light curtains, and control barrier functions, but in a complex world these may not be reliable enough for human safety.

The cell isolates the robots to limit external interference or perturbations that can alter their environment, their positioning, or any aspect of the task at hand.

Each cell is tailored to the robot and the location, making installation and programming of the task at hand simpler.

Source: arm

This rigidity turns Level 0 automation into an industrial engineering project. A new large-scale automotive assembly line can cost $10M-$60M and take years to build. An industry representative joked that these projects have “birthdays,” taking multiple years to complete. Retrofitting an existing factory is even harder, and for a unique system, integration is extremely expensive.

Integration: 4x to 6x The Cost of The Robots Themselves

Because retrofit costs vary widely, let’s ground this in a concrete scenario: a medium-scale automotive facility retrofitted with a brand new, unique body-in-white assembly line, which assembles the welded frame.

Source: KUKA, Body-in-white assembly line

Typically, the same system integrator and the same robot brand and software will have to be used, to avoid any chance of new systems breaking the factory’s flow. In the end, the total integration can cost roughly 4x-6x the robots themselves. Construction and deployment of cells, configuration of related systems (PLCs, conveyor/line tracks, MES, etc.), and installation and testing rack up a big price tag. A Proof of Principle (PoP), such as a physical mockup of the line, can be built first to test the system. Most projects would benefit from one, but most opt out unless the line is highly unique, as in pharmaceuticals. We note that for standardized automotive solutions, integration may instead run around 70% of the robot CapEx.

Source: SemiAnalysis Estimates
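
To make the multiplier concrete, here is a minimal sketch of the arithmetic. The 4x-6x range and the ~70% figure for standardized automotive solutions come from the text above; the $5M robot budget and the function names are hypothetical, chosen only for illustration.

```python
def integration_cost_range(robot_capex, low_mult=4.0, high_mult=6.0):
    """Return (low, high) integration cost for a unique line.

    The 4x-6x multipliers are the article's rough range; robot_capex is
    whatever the robots themselves cost.
    """
    return robot_capex * low_mult, robot_capex * high_mult


def standardized_integration_cost(robot_capex, share=0.70):
    """Integration cost for a standardized automotive solution (~70% of robot CapEx)."""
    return robot_capex * share


# Hypothetical input: $5M of robots (illustrative only, not from the source).
robot_capex = 5_000_000
low, high = integration_cost_range(robot_capex)
standard = standardized_integration_cost(robot_capex)
print(f"Unique line integration:  ${low:,.0f} to ${high:,.0f}")
print(f"Standardized integration: ${standard:,.0f}")
```

Under these assumed inputs, integration of a unique line costs several times the hardware itself, while a standardized line stays below the hardware cost.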

This immense cost and complexity are the reason automation has historically been confined to high-volume, low-mix industries like automotive and electronics. It is a tool for the capital-rich, and it often boxes out medium-size and smaller facilities from implementing any amount of automation. General-purpose robotics, by contrast, aims to remove these barriers to entry in the later Levels.

Implications: Efficiency and Dark Factories

At Level 0, robots have become widespread in a few industries. Automotive factories often use between 400 and 1,000 industrial robots per factory, with some reporting up to 1,650. In electronics manufacturing robot usage is lower, around 50-200 robots per facility. These could be AMRs performing transport, SCARAs statically mounting parts onto circuit boards, CNC machines milling pieces of hardware, or cobots for machine tending.

Source: SemiAnalysis

The automotive line can pay for itself in under two years, and afterward operating costs are roughly 75% lower. Industry representatives have said that after the payback period, these factories are “printing money.” Some facilities can even reach 2,000 cars per day, and warehouse arms can often do the work of ~10 people with no fatigue. The efficiency of robots executing Level 0 tasks is what justified Amazon’s deployment of hundreds of thousands of robots. For example, 50 robots might perform the large assembly and manipulation work of 200 laborers at ~73% lower cost per job.

Source: SemiAnalysis Estimates

The pinnacle of this paradigm is the “Dark Factory,” a facility run entirely by robots without the need for lights. A representative from FANUC says there’s a factory in Japan where their robots are building one robot every 80 seconds. While this is the apex of industrial automation, it is still categorized as Level 0. The robots are entirely pre-programmed, and the environment and task are perfectly sterile and controlled, bearing no resemblance to the dynamic, non-engineered environments of human labor. Instead, the task and environment are perfectly crafted for these robots to perform, maintain themselves, swap their own tools, and schedule downtime ahead of time for a human to come in and repair an issue.

Current Challenges: The Issue of Rigidity

The fundamental difficulty of Level 0 is the robot’s total lack of Autonomy. The robot cannot diagnose or solve a problem on its own, and this creates a host of issues down the line:

Constant Oversight: Human technicians must always be on-site (except in dark factories). Robot-to-human ratios may range from 20:1 down to 12-15:1 in demanding industrial settings. Most of the time, when humans take a lunch break or swap shifts, the robots have to be stopped as well. If these robots fail, downtime can be incredibly costly, on the order of $2M per hour in automotive or $50M per day in semiconductor fabs.

Source: SemiAnalysis Wafer Fab Equipment Model

Capital Incineration: A small error in programming, poor integration, or failure of two systems to sync can render an entire multi-million dollar factory non-functional. The factory then turns into an industrial engineering project again, and this risk is too high for smaller companies, cutting them out of the automation market.

Inflexibility: Amazon, a powerhouse of industrial engineering, has to build its fulfillment centers around these non-autonomous robots. In fact, instead of making the robots more collaborative or intelligent, it found it easier to change what the workers wear, designing a special safety vest that slows the robots when a worker is nearby.

Looking Forward

Promising Sources of Progress

For Level 0, we see decreasing costs as a significant path forward. As real wages climb and industrial robot prices drop, robots become even more attractive, and this should continue over time. Because robots are manufactured goods, they should become more cost-efficient as manufacturing processes improve, production increases, and economies of scale take over.

Source: Ark Invest, FRED

This would lower the barriers to entry for a much wider market, enabling a broader adoption of robots performing Level 0 tasks without requiring industrial engineering prowess or massive CapEx.

Additionally, most manufacturers provide an equipment monitoring system, like FANUC’s Zero Downtime solution. These systems help predict robot failures ahead of time, reducing the need for constant oversight and bolstering dark factory potential. While substantial, they are fairly new and still improving.

Finally, integration of these robots might be streamlined through a more “unified” industrial software. Instead of deploying only one brand of robot, system integrators could then plug-and-play multiple brands/setups, creating less finicky automation systems faster and cheaper.

While these would improve scripted motion systems, the limitation lies in the perfectly engineered applications they require. These robots only function in static, engineered worlds, but genuine labor replacement requires adaptivity and autonomy. As Level 1 will highlight, perceiving a changing task and adapting to it was not as simple as it sounds.

Level 1 – Intelligent Pick and Place

Source: SemiAnalysis

In Level 1, robots can now see. Around 2015, we saw the first injection of intelligence into robotics, creating a new Level of Autonomy that would later attempt commercialization around 2018. In this Level we will focus on the era from 2015-2022, before foundation models arrived.

Robots first broke away from Level 0’s static tasks when they shifted into “pick and place,” picking an item from area A and placing it in area B. Pick and place lives in a non-perfect domain where objects, configurations, and lighting may all change. The robot must generalize its perception to determine the object and its pose, and tweak its grasp accordingly, a task impossible for Level 0 robots. Large-scale datasets, and smaller but vital grasping datasets, powered this attempt into Level 1 autonomy by unlocking a piece of Dexterity: Generalization, especially in perception. With enough data, the robot could recognize objects, sometimes novel, in various poses and angle its grasp for picking.

Source: Google Research

The commercialization attempts went toward warehouse and logistics “pick and place” roles, slotting robots near sorting lines to organize non-delicate items by picking the object from bin A and placing it in bin B. However, this first attempt at intelligent robots was bottlenecked by insufficient data, nascent AI models, demanding throughputs, and high costs, all leading to unproven ROIs. From 2015 to 2022, a few companies built “arm farms,” performing months of grasps to accrue enough data for training. Pick success eventually rose to 99%, but the “last millimeter” to 99.99% was almost as difficult, and even this sometimes wasn’t enough to prove ROI. Level 1 saw a valiant first step, and consistently displayed linear improvement over time, but this ultimately highlighted how many challenges remained in robot autonomy.

Nowadays, some companies have continued to reap the benefits of linear improvement, and advancements in AI models and deployment solutions have created a new viability for robots targeting Level 1’s pick and place. These robots are currently ironing out their remaining Challenges to become more capable than the original attempts.

Source: Covariant

A Look at The Past – 2015-2022

Adapting to Novelty

While pick and place is simple for humans, the non-static nature was a massive hurdle for a robot. Items, sometimes novel, can arrive jumbled, occluded, or presented in new ways. Each of these variables, along with challenges like shadows, reflections, or transparent objects, could cause the robot’s early perception systems to falter. They might misidentify an item, misjudge its position and shape, and ultimately fail the grasp altogether. This was a level of chaos beyond Level 0. What was missing?

Problem 1: Seeing and Understanding – Before Level 1, cameras on robots were mainly used to verify that an action and task had been completed. However, autonomously picking an object from a cluttered bin requires generalizable perception: perception capable of adapting across novel scenarios. In Level 1’s pick and place, this means identifying an item, discerning it from the clutter, and estimating its shape and pose. Broad visual reasoning like this could come from today’s Vision-Language Models (VLMs), but this 2015-2022 era mainly used neural networks that needed large, annotated datasets of application-specific images that didn’t quite exist for robotics applications.

Source: Sick

Problem 2: Learning to Grasp – After identifying the item, the robot needs to grasp it without picking its neighbors too. This demands grasping that can generalize to new situations each time, but learning this requires masses of trial and error data. In 2015-2022, open-source communities and crowdsourced data were not as large as today, so data collection came from real-world, expensive robots repeatedly attempting slow grasps. Simulators, where robots can act and gather data in a virtual setting, were not sufficiently robust at the time to replace physical data. They suffered from what’s called the “sim2real” gap, in which physics, environments, and actions in simulation didn’t match reality. The sim2real gap was significantly more challenging in this era, and still isn’t solved today.

Beginning Sparks

The first signs of a solution to these two problems came from the computer vision world, enabling generalizable perception. The creation of the large-scale ImageNet dataset (2009) and the success of neural networks like AlexNet (2012) showcased the potential of computer vision. This sparked many new projects. YOLOv1 (2015) allowed real-time object detection with bounding boxes for locating objects. Mask R-CNN (2017) then enabled shape estimation, using “masks” to segment objects from the rest of the scene. Finally, PoseCNN (2018) tied it together with 6D pose estimation of objects with just a stereo camera. With these efforts, models had early generalizable perception, capable of understanding multiple objects in multiple contexts for the first time.

Source: YOLO

While perception was finally generalizable, it was still brittle. Systems were still easily confused by novel objects, reflective or transparent objects, shadows, or too much clutter. However, in this era of 2015-2022, many saw these advances as a chance to support perception in robotics; maybe the robot could now generalize perception to identify the object and its pose for picking.

Source: PoseCNN

This breakthrough in perceptual abilities spurred researchers to amass robotic grasping datasets. Pinto & Gupta (2015), for example, showed that 700 hours of robotic grasping attempts enabled their robot to reach 80% grasp accuracy.

Source: Arxiv

While this “adaptive grasping” was monumental for robotics, 80% does not meet the threshold for most commercial applications. A failed pick typically couldn’t be resolved by the robot due to its lack of autonomy, and 40% of the time it required human intervention. Since these were often unsafe industrial arms, the human had to pause the whole warehouse line, solve the issue, and resume the process, leading to a Mean Time to Recovery of ~6 minutes.

Source: SemiAnalysis Estimates
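
A rough sketch of why 80% falls short commercially, using only the figures quoted above. It assumes (our reading, not stated in the source) that the 40% applies to failed picks, that every intervention stops the line for the full ~6 minutes, and that interventions never overlap. The function name is illustrative.

```python
def downtime_minutes_per_1000_picks(grasp_accuracy,
                                    intervention_share=0.40,
                                    recovery_minutes=6.0):
    """Line stoppage (minutes) expected per 1,000 pick attempts.

    Assumptions: intervention_share is the fraction of failed picks that
    need a human, and each intervention halts the line for recovery_minutes.
    """
    failed_picks = 1000 * (1.0 - grasp_accuracy)
    interventions = failed_picks * intervention_share
    return interventions * recovery_minutes


for accuracy in (0.80, 0.99, 0.9999):
    minutes = downtime_minutes_per_1000_picks(accuracy)
    print(f"{accuracy:.2%} accuracy: {minutes:.2f} min of stoppage per 1,000 picks")
```

Under these assumptions, 80% accuracy implies about 480 minutes of stoppage per 1,000 picks, against about 24 minutes at 99%, which is why the later push toward 99% and beyond mattered so much.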

More projects came out after scaling data showed promise in robot learning, like Levine et al. (2016) who released approximately 3000 hours of grasping data and achieved 94.6% grasp prediction accuracy after fine-tuning. However, the large-scale datasets were mainly coming from the computer vision side, and robotics was left with much less grasping data to work with.

Even with the modern boom in robot data, the field’s datasets are still tiny, and there was substantially less data during this era.

Source: Colossus, One of the largest current robot action datasets vs LLM common dataset

In the end, some companies decided the best approach to learning grasping was generating massive data themselves via arm farms. They did gather huge datasets over several months, but 99% success rates often weren’t enough. Worse, the jump to 99.99% is an 81x improvement, larger than the initial jump from 1% to 80%. Some were able to reach even this, but it became a Sisyphean task as each novel item and botched grasp set the percentage back. However, challenging integrations and low autonomy were ultimately what bottlenecked many companies the most, and they still pose an issue today.

Source: Google Research

Deployments and Considerations: The Wild West

Source: SemiAnalysis

During the 2015-2022 era, integrating these AI robots into the optimized, unforgiving warehouse environment — where 98%-99% of the deliveries are on time — became a “Wild West” of hazy estimates and improvised solutions. Unlike a Level 0 project, the challenges for these pick and place robots are not only physical but also informational.

Integration of these arms and cells into a warehouse line might cost $90K-$180K. But the robot also had a new custom API that had to complete a “handshake” with the facility’s Warehouse Management System (WMS), which coordinates all inventory and logistics. Oftentimes, the robot’s API was not built with a WMS in mind. As a result, the WMS had to be updated to close this handshake gap, and a failed WMS update can cost tens of millions of dollars. As a workaround, a third-party integrator might be brought in, charging up to hundreds of thousands of dollars for deployment. Most of the time, stop-gap fixes were used to sync the systems instead, like GUI automation agents, programs that merely emulate a human clicking the right buttons.

Source: WAP

Because the robots needed a full cell installation, integrators chose ideal, cheaper spots, such as a pick and place station between two horizontal conveyors. Difficult locations, like the vertical putwalls, were skipped, because most robots learned in horizontal applications, and reconfiguring the line to horizontal was costly. By screening for suitable locations, installations could be kept to 4 weeks or less.

Source: YouTube

Even then, the warehouse’s decision cycles to implement these robots might take months, and some clients would end up installing the robots only in their own isolated area, where they could not break the warehouse’s process flow.

Nonetheless, the autonomy of these robots was still too low. Employees often weren’t “replaced,” but reorganized around the robots. They typically performed the pre- and post-processing on the robot’s line, or became robot technicians.

Implications: A Narrow Market of Profitability

The promise of Level 1 was enormous: the automation of low-skill, high-turnover pick-and-place jobs, a new market for robots. The task’s basic nature made it seem like a good fit for a robot that could do just that: pick and place. Businesses had large incentives for automation, as wages were packed with “loaded costs.” For example, we’ve heard Amazon sees a turnover rate of 2%-4% per week. At the low end, this means that for every 100 workers on the floor, 104 workers may have quit by year’s end. Having to constantly hire, onboard, train, and ramp up productivity makes the effective wage 56% higher than it would be with no attrition in the workforce. In fact, Amazon currently faces a crisis of having already cycled through every low-wage worker in some regions.

Source: SemiAnalysis Estimates
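
The turnover arithmetic above can be sketched as follows, assuming headcount is held constant by rehiring and that the weekly rate applies uniformly across 52 weeks. The function name is illustrative, and the 56% wage uplift is not derived here, since the source does not give its inputs.

```python
def annual_separations(headcount, weekly_turnover_rate, weeks=52):
    """Expected number of workers who quit in a year.

    Assumes departures are replaced immediately, so the weekly rate keeps
    applying to the same headcount.
    """
    return headcount * weekly_turnover_rate * weeks


for weekly_rate in (0.02, 0.04):
    quits = annual_separations(headcount=100, weekly_turnover_rate=weekly_rate)
    print(f"{weekly_rate:.0%} weekly turnover: about {quits:.0f} quits per 100 workers per year")
```

At 2% per week this reproduces the 104 quits per 100 workers cited above; the 4% end of the range doubles it.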

Not only is the cost enormous, but the logistics of constantly hiring new employees are burdensome, and many hiring waves may result in most quitting within the first week. These challenges and costs made the role ripe for an AI-driven robot as a viable, consistent labor replacement. However, many found that the business case was highly dependent on the specifics of the task.

Consider a high-mix, low-throughput task like e-commerce fulfillment, where a robot must pick a wide variety of items at a modest pace.

Source: SemiAnalysis Estimates

Dividing cumulative cost by cumulative picks, we show below how cost per pick evolves over time. In the e-commerce case, the robot’s cost per pick doesn’t drop below a human’s for 3.5 years, and its effective pick-rate remains below a human’s, with 11 robots doing the work of 9 humans.

Source: SemiAnalysis Estimates
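
A minimal sketch of the cost-per-pick calculation described above: cumulative cost divided by cumulative picks, compared against a flat human cost per pick. All numeric inputs are hypothetical and chosen only to show the shape of the curve; they are not the figures behind the source's estimates, and the function names are ours.

```python
def robot_cost_per_pick(month, upfront_cost, monthly_opex, picks_per_month):
    """Cumulative robot cost divided by cumulative picks after `month` months."""
    cumulative_cost = upfront_cost + monthly_opex * month
    cumulative_picks = picks_per_month * month
    return cumulative_cost / cumulative_picks


def breakeven_month(upfront_cost, monthly_opex, picks_per_month,
                    human_cost_per_pick, horizon_months=120):
    """First month the robot's cost per pick falls below the human's, else None."""
    for month in range(1, horizon_months + 1):
        cost = robot_cost_per_pick(month, upfront_cost, monthly_opex, picks_per_month)
        if cost < human_cost_per_pick:
            return month
    return None


# Hypothetical inputs (illustrative only).
month = breakeven_month(upfront_cost=240_000, monthly_opex=2_000,
                        picks_per_month=50_000, human_cost_per_pick=0.15)
print(f"Breakeven month: {month}")
```

Because the upfront cost is amortized over a growing number of picks, the robot's cost per pick falls toward its operating cost per pick. If that floor sits above the human cost, the function returns `None` and the robot never breaks even, which is the risk in low-throughput deployments.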

E-commerce-like warehouse lines posed an interesting challenge for intelligent pick and place robots: matching a human who accurately picks multiple items. While a human may pick 5 items at once, these robots would likely pick one at a time, falling short of human throughput. Even if the robot could pick fast enough, the surrounding conveyor and pre/post-processing systems would be locked at certain speeds, or clogged by the humans on either end, limiting throughput again. The warehouse could potentially make up for this by installing more robots, but the cost grows prohibitively. Worse, the “high-mix” of items is likely to present an object or scene too novel to grasp successfully, so many had to reach 99%+ accuracy to mitigate the six-minute downtimes. All in all, this combination of specifics made it difficult to justify the cost of intelligent pick and place robots.

Source: University of Bonn, an example of a high-mix bin in a lab

However, Level 1 introduced a new upgrade: retries. If the task fails, the robot can detect the error and retry (a few times), whereas errors in Level 0 would freeze the process flow immediately. Let’s take for example “parcel” pick and place, where items arrive in parcels – uniform boxes and packages with labels.

Source: DVZ

Parcel pick and place benefits intelligent pick and place robots in two ways: the packages can be heavy, which fatigues humans and lowers the throughput benchmark to beat, and they’re fairly uniform, so failed picks can be retried and resolved more easily, since a failure is unlikely to be a generalization error as it was in e-commerce.

The robot can target 550 picks/hour, and even at 95% accuracy in this domain it delivers an effective pick-rate of roughly 520.

Source: SemiAnalysis Estimates
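
The effective pick-rate figure and the value of retries can be sketched as below. The 550 picks/hour target and 95% accuracy come from the text; treating retry attempts as independent with the same success probability is our simplifying assumption (more plausible for uniform parcels than for novel items), and the function names are illustrative.

```python
def effective_pick_rate(target_picks_per_hour, accuracy):
    """Successful picks per hour when each attempt succeeds with `accuracy`."""
    return target_picks_per_hour * accuracy


def success_within_attempts(accuracy, attempts):
    """Chance a pick is resolved within `attempts` tries.

    Assumes attempts are independent, which is a simplification.
    """
    return 1.0 - (1.0 - accuracy) ** attempts


rate = effective_pick_rate(550, 0.95)
resolved = success_within_attempts(0.95, attempts=3)
print(f"Effective pick rate: {rate:.1f} picks/hour")
print(f"Resolved within 3 attempts: {resolved:.4%}")
```

The first result is 522.5 picks/hour, consistent with the roughly 520 quoted above. The second shows why retries matter: under the independence assumption, a handful of attempts leaves very few picks that need a human.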

In this case, we see 10 robots doing the work of 23 human workers. In these conditions, the robot cost per pick drops below human rates just after one year.

Source: SemiAnalysis Estimates

The robots targeting Level 1’s pick and place found a niche, but only in very specific domains. While it’s easy to look back and understand what worked and what didn’t, this was new at the time. We’ve asked some companies why they didn’t target parcel domains to begin with, and one paraphrased answer exemplifies the era: we didn’t know, we realized too late. This was the first foray into intelligent robotics, and while parcel is a smaller market than e-commerce, autonomy was simply too early.

Challenges: The Limits of Brittle Intelligence

For this pre-2023 time period, perception struggled. Some tried to work around the fragile perception by replicating their deployment sites in their lab’s arm farms, following the same catalog of items the robot would pick, and even deploying with specific, static lighting fixtures. Most of these guardrails only partially resolved the shortcomings. As a result, 25% of Amazon’s item catalog is on the “exclusion list,” a list of objects not to be picked by the robot for risk of failure.

The Dexterity in this era was still nascent. If the company introduced a new item, they would test-run it with the robot, and if the pick failed 5-10 times, the item was placed on the same exclusion list. Additionally, without robust generalization the robot could pick multiple items at once, disrupting the warehouse flow again. The generalization available at the time simply was not robust enough.

While other tasks seem relatively simple and a mere variation of pick and place, they can introduce different challenges. Take for example folding a shirt, which might seem like an easy task for a robot. In reality, objects like shirts are deformable and “high-dimensional.” The robot’s neural network would begin by cataloguing every wrinkle, crease, and fold on the shirt to understand it. This is called the “state explosion” problem, and it’s especially difficult for reinforcement learning, in which the model tries to judge each step in the process as a good or bad decision, because the number of possible combinations becomes enormous. In Level 1, folding clothes became the “holy grail” of the era.

Source: Foldimate. Instead, single-purpose machines were built for folding clothes

The Present Moment and Looking Forward

Promising Sources of Progress

The pre-2023 era was plagued with shortcomings, leaving many industry professionals battle-scarred. However, the companies of today targeting Level 1’s pick and place have created viable solutions, patching many challenges and refining their systems. Some are implementing end-to-end solutions to resolve unpickable items, lowering failure rates and recovery times and mitigating the exclusion list downside. Modular systems can now skirt physical integrations and offer much higher throughputs, albeit at a higher cost. In-house operating systems now allow for streamlined, cheaper WMS integrations. Even simulators have stepped up enough to bootstrap basic parcel pick and place data.

But importantly, many robots are utilizing Foundation Models for deep, generalizable perception and spatial reasoning. We explore what foundation models provide for robots in Level 2.

Level 2 – Autonomous Mobility

Source: SemiAnalysis

In Level 2, robots gain general-purpose autonomy. They are now capable of planning their own tasks and traversing the open world autonomously. This capability was not feasible before: older models would be left confused by the open world’s ever-changing scenes, terrains, and objects, and rigid movement approaches would fall short in this chaos.

Instead, for Level 2, robots get Agency, gaining higher order planning and spatial reasoning from recent advancements in foundation models and Vision-Language Models (VLMs). Additionally, robots now have the Dexterity to traverse difficult terrains thanks to large scale reinforcement learning in simulation. This Dexterity in locomotion enables the robot to exhibit agility in its movements. Both approaches leverage massive digital datasets for learning, rather than collecting data on each scenario, mitigating the data scarcity challenges. In Level 2, robots autonomously perceive and understand their surroundings, plan a path, and use their robust locomotion to maneuver around the open-world on long time horizons.

The general-purpose robots of Level 2 are currently being deployed in early production phases for data collection and inspection roles in massive domains like construction sites, oil & gas refineries, and infrastructure sites. These sites are often too large to be effectively covered by humans, too large to be sensorized cheaply, too dangerous for humans, or too remote for cheap inspections by humans. Instead, these robots equipped with additional sensors can use their autonomy to plan and execute these roles.

These autonomous robots are the first proof of the general-purpose revolution. This leap into Agency reverberates throughout subsequent Levels, serving as the genesis for general-purpose robotics.

Current View

Entering the Open World and Agency

The central challenge of autonomous mobility is the open world, an environment with no rigid structure or predictability. Unlike the engineered environments for Levels 0-1, the open world is a chaotic collection of ever-changing scenes, obstacles, terrains, and weather. To operate here, a robot must surpass simple perception and classical, rigid planning toward scene understanding and higher order planning. However, early algorithms were not sufficiently robust to rise to the task.

Where am I? – Positioning Within An Environment

The open world does not always provide a static path, so the robot must determine its own position relative to the environment or risk getting lost. This requires constant map updates, and small position errors in those updates compound over time, turning inches of error into feet and leaving the robot confused. This error could be the difference between a fully charged robot and a dead machine on the floor. Advanced players might solve this without extra measures, but most might still use AprilTags, QR-code-like stickers for robotic reorientation and calibration, placed at the charging station. These can set fixed, pre-programmed paths or behaviors to guide the robot.

Source: New Atlas
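To make the compounding concrete, here is a minimal sketch of dead reckoning with a tiny, uncorrected heading bias. The step length and the bias value are illustrative assumptions, not figures from any specific robot; the point is only that the position error grows much faster than the distance traveled.

```python
import math

def dead_reckon_error(steps, step_len_m, heading_bias_rad):
    """Position error (meters) after `steps` moves along a nominally straight path.

    The robot believes it walks straight along the x-axis, but each step adds
    a small uncorrected heading bias (an assumed, constant odometry error).
    """
    x = y = 0.0
    heading = 0.0
    for _ in range(steps):
        heading += heading_bias_rad          # bias accumulates every step
        x += step_len_m * math.cos(heading)  # actual motion, slightly rotated
        y += step_len_m * math.sin(heading)
    believed_x = steps * step_len_m          # where the robot thinks it is
    return math.hypot(x - believed_x, y)

def drift_table(step_counts, step_len_m=0.5, heading_bias_deg=0.01):
    """Map each step count to the resulting position error in meters."""
    bias = math.radians(heading_bias_deg)
    return {n: dead_reckon_error(n, step_len_m, bias) for n in step_counts}

if __name__ == "__main__":
    for n, err in drift_table([10, 100, 1000]).items():
        print(f"{n:5d} steps -> {err:.3f} m off")
```

Because the heading error itself accumulates, ten times more steps produce far more than ten times the position error, which is why a periodic fixed reference such as a tag at the charging station is useful.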

First Solution: SLAM – The main engineering solution to this is Simultaneous Localization and Mapping (SLAM). Using sensors and data such as LiDAR, velocity, and time, the SLAM algorithm allows the robot to build a “map” of its surroundings while simultaneously keeping track of its own location within the map. However, SLAM is still limited to geometric representations. Open-world environments are constantly shifting and demand a more “cognitive” understanding to lessen this drift or error potential. SLAM may not be sufficient on its own, but rather serves as a complement.

Source: Geo Week News
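As a rough illustration of how filter-based approaches keep drift in check, the sketch below runs a one-dimensional Kalman filter: uncertainty grows with every odometry step and shrinks when the robot observes a fixed reference. This covers only the localization half; a full SLAM system also estimates the map, and all numbers and function names here are assumptions chosen for illustration.

```python
def predict(x, var, odom_delta, odom_var):
    """Motion update: move by the odometry reading and grow the uncertainty."""
    return x + odom_delta, var + odom_var

def correct(x, var, measured_x, meas_var):
    """Measurement update: blend in a position fix from a known reference."""
    gain = var / (var + meas_var)  # how much to trust the measurement
    return x + gain * (measured_x - x), (1.0 - gain) * var

def drive_then_fix(steps, step_m, odom_var, marker_x, marker_var):
    """Drive `steps` odometry steps, then apply one fix from a fixed marker.

    Returns (estimate before fix, variance before fix,
             estimate after fix,  variance after fix).
    """
    x, var = 0.0, 0.0
    for _ in range(steps):
        x, var = predict(x, var, step_m, odom_var)
    x_fixed, var_fixed = correct(x, var, marker_x, marker_var)
    return x, var, x_fixed, var_fixed

if __name__ == "__main__":
    print(drive_then_fix(steps=20, step_m=1.0, odom_var=0.01,
                         marker_x=20.3, marker_var=0.05))
```

The variance climbs steadily while the robot relies on odometry alone, then drops sharply after a single observation, which is the same mechanism that lets a tag at the charging station reset accumulated error.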

Planning, Reasoning, and Scene Understanding

A robot may know its position, but it may still not know what to do or what’s around it unless explicitly programmed. To navigate a chaotic environment, the robot would need a more foundational understanding of its world. For example, a robot might need to both distinguish a black puddle from asphalt and plan its next moves. A failure in perception might lead it to miss that the puddle is a hazard. Or, a failure in planning might lead to the robot dodging the puddle at the wrong time.

The Breakthrough: Foundation Models

Recent foundation models give robots the missing pieces for reasoning and long-horizon planning. By training on an internet-scale text dataset, a robot no longer needs every situation laid out in code or explicitly learned via expensive, scarce real-world data; it can instead generalize a massive knowledge base to new contexts. These models can translate situations into step-by-step, natural language descriptions that the robot can reason through, unlocking far broader capabilities.

Vision-Language Models (VLMs), a type of foundation model, can bridge the language and visual modalities, enabling visual reasoning and problem solving. These foundation models are trained on massive, internet-scale datasets of images, captions, and descriptions, and fine-tuned on robot-specific data to allow for better spatial reasoning. Now, robots can broadly generalize perception, mitigating the lack of robot perception data from before.

All of this constitutes the robot’s newfound Agency: generalizable planning, reasoning, and perception. The robot can now follow instructions in many novel environments. For example, in the command “go to the stairs past the ladder,” the VLM would identify objects and their relationships, then translate the scene to the foundation model for the plan “move left of the ladder, then right toward the stairs.” This loop of “thinking” grants the robot the autonomy to perceive and navigate the open world on long time horizons.

Source: Giphy
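The perceive-then-plan loop described above can be sketched as follows. Both `describe_scene` and `plan_route` are hypothetical stand-ins with hard-coded behavior; in a real system each would be a call to a VLM or a planning model, and nothing here reflects any particular vendor's interface.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    bearing_deg: float  # positive = right of the robot's heading (assumed convention)
    distance_m: float

def describe_scene(camera_frame):
    """Hypothetical stand-in for a VLM: returns labeled objects with rough geometry."""
    # Hard-coded to mirror the ladder/stairs example; a real system would query a model.
    return [SceneObject("ladder", 5.0, 3.0), SceneObject("stairs", 20.0, 8.0)]

def plan_route(instruction, scene):
    """Hypothetical stand-in for the planner: orders the landmarks named in the instruction."""
    mentioned = [obj for obj in scene if obj.name in instruction]
    mentioned.sort(key=lambda obj: obj.distance_m)  # nearer landmarks come first
    steps = []
    for obj in mentioned[:-1]:
        # Pass on the side away from the obstacle.
        side = "left" if obj.bearing_deg >= 0 else "right"
        steps.append(f"pass {side} of the {obj.name}")
    if mentioned:
        steps.append(f"go to the {mentioned[-1].name}")
    return steps

def agent_step(instruction, camera_frame):
    """One iteration of the loop: perceive the scene, then plan in natural language."""
    return plan_route(instruction, describe_scene(camera_frame))

if __name__ == "__main__":
    print(agent_step("go to the stairs past the ladder", camera_frame=None))
```

In practice this step would run repeatedly as new camera frames arrive, so the plan is revised whenever the scene changes.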

Agile Movement – Dexterity

This Agency is complemented by a gain in locomotive Dexterity. Instead of collecting data on many possible configurations via real-life deployments, or hard-coding heaps of control logic, developers now lean heavily on simulators to tackle the locomotion Dexterity issue. Simulation environments now provide robust, extensive training platforms to rapidly iterate locomotion control policies on heaps of environmental configurations, often far more difficult than the deployment environment.

These simulators have improved enough that many of these learned locomotion skills transfer to real-world deployment with enough fine-tuning, significantly mitigating the “sim2real” gap. Now, the robot can use its new locomotion Dexterity to robustly and agilely traverse uneven ground, inclines, and unstable ground (rocks, sand, construction pallets), and even locomote with a broken motor. With these unlocks, we see quadruped locomotion hitting its improvement inflection point in Level 2.

Source: YouTube
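One common way to train on "heaps of environmental configurations" is to randomize the simulated world on every episode, often called domain randomization. The sketch below only samples such configurations; the parameter names, ranges, and the 12-motor count are illustrative assumptions, and the policy training itself is out of scope.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class TerrainConfig:
    friction: float                 # ground friction coefficient
    slope_deg: float                # incline of the terrain
    payload_kg: float               # extra mass strapped to the robot
    disabled_motor: Optional[int]   # index of a failed motor, or None

def sample_terrain(rng, difficulty):
    """Sample one randomized environment; `difficulty` in [0, 1] widens every range."""
    motor_fails = rng.random() < 0.1 * difficulty
    return TerrainConfig(
        friction=rng.uniform(1.0 - 0.8 * difficulty, 1.0),
        slope_deg=rng.uniform(0.0, 30.0 * difficulty),
        payload_kg=rng.uniform(0.0, 10.0 * difficulty),
        # Assumes 12 actuated joints (3 per leg on a quadruped).
        disabled_motor=rng.randrange(12) if motor_fails else None,
    )

def curriculum(n_episodes, seed=0):
    """Yield environments that ramp from easy to harder than expected deployment."""
    rng = random.Random(seed)
    for i in range(n_episodes):
        yield sample_terrain(rng, difficulty=i / max(n_episodes - 1, 1))

if __name__ == "__main__":
    for cfg in curriculum(5):
        print(cfg)
```

Ramping the difficulty lets a policy learn basic gaits first and only later face slippery slopes or a dead motor, conditions that would be costly or unsafe to collect on real hardware.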

Hardware Boosts

Lastly, developments in hardware have enabled these robots’ autonomy by equipping them with adequate means. Onboard compute advancements, like the Nvidia Jetson, have enabled robots to ingest and process significantly more data. Multiple sensors, cameras, and LiDARs can now all be used to generate high-quality, real-time perception data, enabling rapid adaptation to the inherent randomness of the environment. In addition, high-efficiency actuators and enhanced batteries enable these robots to operate in the open world for long-horizon tasks.

Deployment and Considerations: Agents In The Open World

Source: SemiAnalysis

The robots of Level 2 may come in the following form factors:

Source: SemiAnalysis

But notably, the quadruped is unlocked. Advancements in large-scale simulation platforms enable robust control of its four legs so it can traverse with Dexterity, while its Agency lets it understand the scene and plan. Both were challenging prior to Level 2’s unlocks.

Source: Anybotics

Importantly, these robots in Level 2 no longer require millions of dollars in facility engineering. Their general-purpose autonomy means they can be deployed in a new environment in as little as 1-3 weeks to learn their domain and perform tasks reliably. However, battery duration determines how many robots or chargers will be needed for a site. Since quadrupeds might average 90 minutes of battery life, an operator might need to buy more quadrupeds or charging stations, driving up costs.
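A back-of-the-envelope sketch shows how battery life drives fleet size. The 90-minute runtime comes from the figure above; the 120-minute recharge time is purely an assumption for illustration, and the model ignores travel time to and from the charger.

```python
import math

def robots_for_continuous_coverage(runtime_min, recharge_min, routes=1):
    """Robots needed so each route always has one robot actively working.

    Each robot is active for runtime / (runtime + recharge) of its cycle,
    so one route needs ceil((runtime + recharge) / runtime) robots.
    """
    cycle_min = runtime_min + recharge_min
    per_route = math.ceil(cycle_min / runtime_min)
    return per_route * routes

if __name__ == "__main__":
    # 90 min runtime is from the article; 120 min recharge is an assumed value.
    print(robots_for_continuous_coverage(90, 120))
    print(robots_for_continuous_coverage(90, 120, routes=4))
```

Under these assumptions, a single always-on inspection route needs three robots rather than one, so a faster charger or a swappable battery can matter as much as the robot's price.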

This freedom introduces a new question in the workplace: safety. In the case of the robots with autonomous mobility, there is no way to unplug them if they fall over or catch on fire. Their challenges shift more toward ensuring no property damage or human harm is done. For example, the open-world terrains may pose a danger to those around the robot, like a s