arXiv Robotics AI News Source

Public articles: 318 | Collected articles: 350 | Trust: 75 | Refresh: 360 min

Health: Healthy | Source type: Research | Full-text rights: Full text allowed | Last ingested: 2026-06-26 | ID: arxiv-cs-ro | Status: Enabled

Use abstract and metadata; check individual paper license before full text.

Latest public articles

Racing a Wheeled Quadruped: Active Load Transfer Mitigation via Model Predictive Control

2026-06-26 04:00 UTC

This paper presents a hierarchical control framework using model predictive control (MPC) and reinforcement learning (RL) for active roll control to manage lateral load transfer during autonomous racing of a wheeled quadruped. The framework integrates offline time-optimal raceline generation, an online MPC planner that actively minimizes the lateral Load Transfer Ratio (LTR), and a low-level, whole-body RL policy deployed directly onto the robot's 16 actuators. Physical experiments show that active roll control reduces mean LTR by up to 44%, improves fastest lap time by 8.7%, and boosts peak lateral acceleration by 21.3% to 1.98 m/s², maintaining robust high-speed stability.

Hierarchical control framework combining MPC and RL to actively manage lateral load transfer
Robot leg actuators act as active suspension, with knee joints generating anti-roll torque
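
The summary above does not state how the paper computes LTR. As a rough illustration, the sketch below uses the common vehicle-dynamics definition (right minus left vertical load, divided by total load). The function name and the per-side load inputs are assumptions made for this example, not taken from the paper.

```python
def load_transfer_ratio(f_left: float, f_right: float) -> float:
    """Lateral load transfer ratio from summed left/right vertical wheel loads (N).

    Assumed textbook definition: (F_right - F_left) / (F_right + F_left).
    """
    total = f_left + f_right
    if total <= 0.0:
        raise ValueError("total vertical load must be positive")
    return (f_right - f_left) / total


# Even loading gives 0; all load on one side gives +/-1 (the other side lifts off).
print(load_transfer_ratio(500.0, 500.0))  # 0.0
print(load_transfer_ratio(0.0, 1000.0))   # 1.0
```

Under this definition, a controller that keeps |LTR| well below 1 keeps both sides of the robot loaded, which is one way to read "reduces mean LTR".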

NavIsaacLab: Generating Realistic Crowd via Parallel Robot Learning for Benchmarking Human-aware Navigation

2026-06-26 04:00 UTC

NavIsaacLab, a framework built on Isaac Lab, enables physics-based and photo-realistic simulations of pedestrians and scenes for benchmarking human-aware robot navigation. It leverages GPU parallel simulation and data-driven pedestrian models (trajectory diffusion + adversarial motion learning) to overcome the scarcity of diverse, high-quality scenario data, providing a robust benchmark for navigation algorithms.

NavIsaacLab uses photo-realistic rendering and GPU parallel simulation to provide real-time 3D visual feedback.
It employs a trajectory diffusion model and adversarial motion learning controller for realistic, controllable pedestrian motion.

TaskNPoint: How to Teach Your Humanoid to Hit a Backhand in Minutes

2026-06-26 04:00 UTC

This paper introduces TaskNPoint, a training protocol that enables humanoid robots to learn dynamic skills from a single human demonstration and under an hour of GPU training. By focusing on a critical interaction window, the protocol successfully taught a Unitree G1 humanoid to perform tennis strokes, soccer kicks, and pick-and-place tasks without per-task reward tuning.

TaskNPoint leverages a coach-learner division of labor with minimal human input.
Dynamic skill learning reduces to mastering a short crucial trajectory segment.

RoboTales: ROBOTic Anthropomorphic LEarning Systems

2026-06-26 04:00 UTC

RoboTales is a low-cost robotic storytelling system that animates narratives using expressive sock puppetry. Implemented autonomously on a Baxter robot as a test case, RoboTales synchronizes narration, gestures, and mouth movements to perform character-driven stories. In a pilot study, puppet-based storytelling outperformed a gesture-only mode, producing higher HRIES ratings and improved story recall, suggesting that embodied puppetry enhances engagement and narrative comprehension. Designed to be modular and platform-agnostic, RoboTales can be adapted to other manipulators and offers a screen-free alternative to passive media, supporting future deployment in child-centered learning environments.

RoboTales is a low-cost robotic storytelling system using expressive sock puppetry.
Autonomous implementation on Baxter robot synchronizes narration, gestures, and mouth movements.

OmniContact: Chaining Meta-Skills via Contact Flow for Generalizable Humanoid Loco-Manipulation

2026-06-26 04:00 UTC

OmniContact is a hierarchical framework using contact flow (CF) representation to chain meta-skills for long-horizon humanoid loco-manipulation. Low-level CF-Track learns a unified skill library, while high-level CF-Gen synthesizes future contact flow sequences. Experiments achieve 98.7% on Carry Box and 76.5% on Push-Stack Boxes, outperforming baselines by 40.9% (meta-skill) and 66.5% (chaining). The framework integrates with VLMs for semantic task decomposition.

Proposes OmniContact with contact flow as a compact shared interface between planning and execution
CF-Track learns reusable skills; CF-Gen generates future contact sequences for chaining and recovery

Morphology-Specific Closed-Loop Control of Logarithmic-Spiral Continuum Arms via Online Jacobian Error Compensation

2026-06-26 04:00 UTC

This paper presents the first morphology-specific closed-loop task-space control framework for logarithmic-spiral continuum arms. Using a segmented tendon-driven model and online Jacobian error compensation (Broyden update and Kalman filter), it achieves accurate, robust control, outperforms piecewise-constant-curvature methods in simulation, and enables manipulation tasks such as grasping and obstacle-assisted motion.

First closed-loop control framework tailored to logarithmic-spiral morphology
Combines analytical Jacobian with online error compensation
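
For readers unfamiliar with the Broyden update named in the summary, here is a minimal sketch of the standard rank-one form. It is the generic textbook rule, not necessarily how the paper applies it (the paper may update an error term on top of its analytical Jacobian). The names `broyden_update`, `dq`, and `dx` are hypothetical.

```python
def broyden_update(J, dq, dx):
    """Rank-one Broyden update of a Jacobian estimate.

    J:  current m x n Jacobian estimate (list of rows)
    dq: actuator displacement over the last step (length n)
    dx: observed task-space displacement over the last step (length m)
    """
    denom = sum(v * v for v in dq)
    if denom < 1e-12:
        # No actuator motion, so there is nothing to learn from this step.
        return [row[:] for row in J]
    # Error between the observed motion and what the current estimate predicts.
    err = [
        dx_i - sum(j_ij * dq_j for j_ij, dq_j in zip(row, dq))
        for row, dx_i in zip(J, dx)
    ]
    # J_new = J + (err * dq^T) / (dq^T dq)
    return [
        [j_ij + e * dq_j / denom for j_ij, dq_j in zip(row, dq)]
        for row, e in zip(J, err)
    ]
```

After the update, the new estimate reproduces the last observed step exactly (J_new · dq = dx), which is the secant condition this family of methods relies on.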

LiMoDE: Rethinking Lifelong Robot Manipulation from a Mixture-of-Dynamic-Experts Perspective

2026-06-26 04:00 UTC

This paper introduces LiMoDE, a two-stage learning scheme using Mixture of Dynamic Experts for lifelong robot manipulation. It first learns prior knowledge via multi-task pre-training with dynamic MoE, then adapts to new tasks with a lifelong MoE mechanism. Experiments show superior performance on simulation and real-world tasks.

LiMoDE uses two stages: multi-task pre-training (dynamic MoE) and task adaptation (lifelong MoE).
Dynamic MoE activates heterogeneous experts based on motion information for short-term manipulations.

RMTL: Reinforced Micro-task Learning for Long-Horizon Manipulation with VLM Rewards

2026-06-26 04:00 UTC

This paper proposes RMTL (Reinforced Micro-task Learning), which decomposes long-horizon manipulation tasks into language-described micro-tasks and trains an agent to switch between them. Using multi-view VLM rewards, reverse curriculum, and a hierarchical policy, RMTL provides more informative reward signals than single-prompt VLM rewards, enabling faster learning. Experiments on the Fetch manipulation environment validate its effectiveness.

Single-prompt VLM rewards are flat for much of the trajectory, hindering early progress detection in long-horizon tasks.
RMTL decomposes tasks into micro-tasks, each with its own language prompt, and trains the agent to switch between them.

Reinforcement Learning Enables Autonomous Microrobot Navigation and Intervention in Simulated Blood Capillaries

2026-06-26 04:00 UTC

Researchers developed a physically grounded simulation of a blood capillary network, training deep RL agents to navigate via chemotaxis. They systematically mapped the physical limits of navigation, discovered a forbidden regime, and observed agents independently discovering multiple universal strategies. Without retraining, agents perform targeted blocking and unblocking of capillary flow, restoring throughput to healthy baseline levels.

Developed a physically grounded simulation of blood capillary network with realistic hydrodynamics and RBC dynamics
Deep RL agents successfully navigate via chemotaxis

Unsupervised Memory-Enhanced Video Transformers: Obstacle Detection for Autonomous Agricultural Rover

2026-06-26 04:00 UTC

A new fully unsupervised method, VMTAD, uses a transformer architecture and a memory module to detect obstacles in dynamic agricultural scenes in real time. It achieves state-of-the-art performance on a rapeseed dataset with 0.973 detection and 0.997 segmentation AUC, and a lightweight variant runs in 14 ms.

VMTAD is a fully unsupervised, real-time obstacle detection method for dynamic agricultural scenes.
It uses a memory module to leverage temporal context from video frames, handling motion-induced dynamics.

Toward Low-Latency Vision-Language Models with Doubly-Correct Predictions in Egocentric Visual Understanding

2026-06-25 04:00 UTC

This paper investigates weight pruning for Vision-Language Models (VLMs) in egocentric visual understanding to achieve low-latency inference while preserving doubly-correct predictions (both accurate and evidence-grounded). Existing pruning methods often maintain evidence localization but degrade accuracy. The authors propose a rationale-informed pruning strategy that aligns evidence with decisions, achieving state-of-the-art accuracy and doubly-correct predictions on egocentric video benchmarks.

Weight pruning reduces VLM latency for on-board processing in human-robot collaboration
Existing pruning preserves evidence localization but harms prediction accuracy

SwarmFly: A simulation platform for UAV swarm experiment design and validation

2026-06-25 04:00 UTC

SwarmFly is an open-source MATLAB-based simulation platform for UAV swarms, addressing issues of poor maintenance, steep learning curves, and single-scenario designs. It supports four coordination modes, a plugin architecture, and real-time maps, validated through eight experiments measuring accuracy, wind tolerance, fault recovery, endurance, and airspace compliance.

SwarmFly is a MATLAB platform supporting four swarm coordination modes (leader-follower, decentralized, heterogeneous relay, heterogeneous speed)
Plugin architecture allows researchers to add features without modifying core code

Memory Retrieval in Visuomotor Policies for Long-Horizon Robot Control

2026-06-25 04:00 UTC

This paper introduces HALO, a visuomotor policy with attention-based memory retrieval for long-horizon robot control, addressing spurious correlations and error accumulation in imitation learning.

HALO distills vision-language model priors to suppress spurious correlations.
HALO uses sparse attention to reduce memory error accumulation in closed-loop control.

Causality-Based Parametric Control Barrier Function for Safe Multi-Vehicle Interaction

2026-06-25 04:00 UTC

The paper extends Parametric Control Barrier Function (Parametric-CBF) by embedding causality inference to explicitly reason over inter-vehicle influence, enabling an adaptive safety-critical controller that avoids overly conservative behavior and improves task efficiency in multi-vehicle interactions.

Embeds causality inference into Parametric-CBF to handle inter-vehicle influence
Avoids conservative worst-case analysis, improving task efficiency

RGB: RL Guided Whole-Body MPPI for Humanoid Control

2026-06-25 04:00 UTC

This paper proposes RGB, an RL-guided whole-body MPPI framework that uses a pretrained RL policy as a sampling prior and MPPI for online correction, achieving robust and precise humanoid control without retraining. Simulations on a Unitree G1 humanoid demonstrate stable 280 Hz control and improved precision over pure RL.

RGB uses a pretrained RL policy as a sampling prior for MPPI, enabling new objectives without retraining.
MPPI corrects the RL prior online to reduce drift and track whole-body reference signals.
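
The core MPPI step referred to above can be sketched as a cost-weighted average of sampled perturbations around a nominal control sequence. In this illustration the nominal sequence stands in for the RL policy's proposal. The function name, argument layout, and scalar controls are assumptions for the sketch and do not reflect the paper's implementation.

```python
import math


def mppi_update(nominal, perturbations, costs, temperature=1.0):
    """Return the importance-weighted control sequence of one MPPI step.

    nominal:       list of T scalar controls (here: the prior's proposal)
    perturbations: K lists of T noise values added to the nominal sequence
    costs:         K rollout costs, one per perturbed sequence
    """
    best = min(costs)
    # Subtracting the minimum cost keeps the exponentials numerically stable.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    norm = sum(weights)
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, perturbations)) / norm
        for t, u in enumerate(nominal)
    ]
```

With equal costs the perturbations average out and the prior is returned unchanged; when one rollout is much cheaper than the rest, the result moves toward that rollout. A lower temperature makes the correction more aggressive.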

AeroCast: Probabilistic 3D Trajectory Prediction for Non-Cooperative Aerial Obstacles via Transformer-MDN Architecture

2026-06-25 04:00 UTC

AeroCast is a probabilistic trajectory prediction framework that combines a Transformer encoder with a Mixture Density Network to predict Gaussian mixture distributions over future 3D displacements. It reduces error by 50% on a quadrotor corpus and runs at 0.1 ms per sample.

Combines Transformer encoder with Mixture Density Network for probabilistic 3D trajectory prediction.
Achieves 50% reduction in Average and Final Displacement Error over baselines.
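
A Mixture Density Network is typically trained by minimizing the negative log-likelihood of the observed target under the predicted mixture. The sketch below evaluates that loss for a single 3D displacement, assuming diagonal covariances and strictly positive mixture weights. The summary does not specify the covariance structure, so that choice and all names here are assumptions.

```python
import math


def gmm_nll(target, weights, means, sigmas):
    """Negative log-likelihood of a 3D target under a diagonal Gaussian mixture.

    target:  (x, y, z) displacement
    weights: K mixture weights, positive and summing to 1
    means:   K mean vectors of length 3
    sigmas:  K per-axis standard deviations of length 3
    """
    log_terms = []
    for w, mu, sd in zip(weights, means, sigmas):
        log_p = math.log(w)
        for x, m, s in zip(target, mu, sd):
            log_p += -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))
```

In a real model the weights, means, and standard deviations would be the network's outputs for each prediction step, and the loss would be averaged over a batch.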

SurveilNav: Collaborative Object Goal Navigation with Robot and Surveillance System

2026-06-25 04:00 UTC

A novel dataset and framework, SurveilNav, enables robots to collaborate with multi-view surveillance systems for object goal navigation. By integrating active camera scheduling, joint 2D/3D mapping, VLM-based value estimation, and collaborative target verification, it overcomes the limitations of single-robot perception and fixed-camera blind spots. Experiments on HM3D demonstrate state-of-the-art performance in exploration efficiency and navigation success rate.

New dataset with 206 cameras across 74 floors for systematic evaluation of multi-view collaboration
SurveilNav framework integrates active camera scheduling, 2D/3D mapping, VLM-based value estimation, and collaborative verification

ADM-Fusion: Adaptive Deep Multi-Sensor Fusion for Robust Ego-Motion Estimation in Diverse Conditions

2026-06-25 04:00 UTC

This paper proposes ADM-Fusion, an end-to-end deep learning multi-sensor fusion method that uses an adaptive sensor mixture-of-experts framework with content-aware routing to dynamically weight sensor inputs. It features separate translation and rotation branches coupled via cross-task attention. Trained on the simulated CARLA-LOC dataset and fine-tuned on real-world KITTI data, it demonstrates robust performance under sensor degradation while matching state-of-the-art methods.

Adaptive sensor mixture-of-experts with content-aware routing for real-time dynamic weighting.
Separate translation and rotation branches with cross-task attention for task-specific and shared information.

Invariant Kalman filtering for extended pose estimation in multi-IMU articulated rigid-body systems

2026-06-25 04:00 UTC

This paper introduces a novel invariant Kalman filtering approach for extended pose estimation in multi-IMU articulated rigid-body systems. By proposing a relative L-extended pose Lie group representation and incorporating joint kinematic constraints as noise-free pseudo-measurements within an iterated IEKF, the method achieves faster convergence and over 50% reduction in RMSE compared to existing filters on both a UR5e robot and a human leg.

Proposes relative L-extended pose Lie group representation for kinematic-tree systems with one IMU per body
Incorporates joint constraints as noise-free pseudo-measurements in an iterated IEKF, preserving convergence and consistency guarantees

BFMTrack: Latent Sequence Optimization for Physics-Based Motion Tracking with Behavioral Foundation Models

2026-06-25 04:00 UTC

A new method called Latent Sequence Optimization (LSO) enables precise physics-based motion tracking by optimizing over sequences of latents in Behavioral Foundation Models, validated on a real humanoid robot.

Behavioral Foundation Models (BFMs) organize physically plausible behaviors into a latent space but lack time-varying objective support.
BFMTrack introduces Latent Sequence Optimization (LSO) combining simulation rollouts with policy gradient updates.

NavWM: A Unified Navigation World Model for Foresight-Driven Planning

2026-06-24 04:00 UTC

NavWM is a unified navigation world model that integrates latent world reasoning, multimodal action prediction, and controllable visual generation. By introducing an anchor-based multimodal trajectory forecasting framework, it generates a diverse action space and uses visual foresight for robust closed-loop planning. Experiments show significant improvements in high-fidelity future state generation and zero-shot navigation success.

NavWM unifies perception, generation, and control within a shared spatiotemporal framework.
Latent world tokens distill geometric and semantic priors for robust structural understanding.

DynaWM: Dynamics-Aware Distillation with World Model and Momentum Targets for Smooth Locomotion over Continuous Stairs

2026-06-24 04:00 UTC

DynaWM improves bipedal-wheeled robot locomotion on continuous stairs by using a world model regularizer for terrain encoding and a momentum target encoder for stable distillation, enabling smoother and more adaptable movement in simulation and real hardware.

DynaWM introduces a world model regularizer to enforce forward-dynamics awareness and preserve terrain geometry.
A momentum target encoder provides consistent distillation targets, preventing dimensional collapse.

MinInter: Minimizing Trajectory Interpolation During Data Augmentation for Imitation Learning

2026-06-24 04:00 UTC

MinInter selects source demonstrations requiring the least interpolation to generate higher-quality synthetic data for imitation learning. Experiments on 12 manipulation tasks show consistent improvements in data generation and policy success rates, with largest gains on contact-rich, long-horizon tasks.

MinInter minimizes interpolation by selecting the best source demonstration for each initial configuration.
Consistently improves success rates on 12 manipulation tasks from the MimicGen benchmark.

SPACE: Enabling Learning from Cross-Robot Data Toward Generalist Policies

2026-06-24 04:00 UTC

The SPACE framework uses the Cartesian state delta as a universal action representation, with State Prediction and Adaptive Command Execution to address issues in behavior cloning across robots with different dynamics. Experiments show it outperforms direct command prediction and remains robust under dynamics shifts.

Proposes Cartesian state delta as universal action representation
SPACE handles variation across embodiments, hardware units, and within a robot

TurboMPC: Fast, Scalable, and Differentiable Model Predictive Control on the GPU

2026-06-24 04:00 UTC

TurboMPC is a differentiable MPC solver that runs entirely on the GPU, supporting state and control inequality constraints, implicit integrators, cross-time-coupled costs, and slack variables. It achieves up to 15× and 58× speedups over state-of-the-art CPU and GPU differentiable solvers, respectively, and scales to planning horizons over 8000 knot points. Deployed on a full-scale car for minimum-time racing, GPU-accelerated Bayesian optimization tuning yields significantly faster driving.

Runs entirely on GPU, combining SQP, ADMM, implicit differentiation, and JAX-CUDA
Up to 15× speedup over CPU solvers and 58× over GPU solvers

Sim-to-Real Betting on the E-Process: Bringing "simulators" to anytime-valid confidence sequences

2026-06-24 04:00 UTC

This note describes an integration of the sim-to-real performance estimate with betting (from Chen et al.) and the safe anytime-valid inference (from Ramdas et al.), using scaled simulators to produce efficient, reliable certificates for mean estimates, especially valuable in robot performance testing.

Integrates sim-to-real performance estimate with betting and anytime-valid inference
Uses scaled simulators for efficient, reliable mean estimate certificates
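
As background for the betting idea mentioned above, the following sketch shows the basic wealth process used in anytime-valid inference for a bounded mean. A bettor starts with wealth 1 and wagers against the hypothesis that the mean equals m. If wealth ever reaches 1/alpha, that value of m can be rejected at level alpha at any stopping time. This is the generic fixed-bet construction only: it leaves out the simulator-based component the note describes, and every name in it is hypothetical.

```python
def betting_wealth(observations, m, bet):
    """Wealth path of a fixed-fraction bettor against H0: mean == m.

    observations: values in [0, 1]
    m:            hypothesized mean, strictly between 0 and 1
    bet:          fixed bet size; must lie in [-1/(1-m), 1/m] so wealth stays >= 0
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if not (-1.0 / (1.0 - m) <= bet <= 1.0 / m):
        raise ValueError("bet size would allow negative wealth")
    wealth = 1.0
    path = []
    for x in observations:
        wealth *= 1.0 + bet * (x - m)
        path.append(wealth)
    return path


def rejected(observations, m, bet, alpha=0.05):
    """True if the wealth ever reaches 1/alpha, i.e. H0: mean == m is rejected."""
    return any(w >= 1.0 / alpha for w in betting_wealth(observations, m, bet))
```

Running `rejected` over a grid of candidate means and keeping the values that are not rejected gives a confidence set that remains valid however long the experiment runs.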

Topological Online Learning for Displacement-based Formation Control

2026-06-24 04:00 UTC

This paper introduces TOLD, a real-time edge-level adaptation framework that updates interaction topology weights online to minimize formation distortion, outperforming conventional node-level robust controllers. Theoretical analysis, simulations, and hardware experiments on Crazyflie 2.0 quadrotors demonstrate significant distortion reduction (over 62% for OGF).

TOLD is the first approach to adjust interaction topology weights online for formation control, rather than modulating individual robot inputs.
Two strategies are proposed: OGF (unconstrained weights) and OExpGF (non-negative convex weights), with OExpGF guaranteeing asymptotic consensus.
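
To make the role of the interaction weights concrete, here is a minimal sketch of the standard displacement-based formation law for single-integrator agents in 2D: each agent is driven by the weighted mismatch between actual and desired relative positions. The weights are fixed inputs here, whereas the paper's contribution is adapting them online. The function name and data layout are assumptions for illustration.

```python
def formation_control(positions, offsets, weights):
    """Displacement-based formation law for 2D single-integrator agents.

    u_i = sum_j w_ij * ((p_j - p_i) - (d_j - d_i))

    positions: list of (x, y) current positions p_i
    offsets:   list of (x, y) desired formation offsets d_i
    weights:   n x n matrix of edge weights w_ij (0 means no edge)
    """
    n = len(positions)
    controls = []
    for i in range(n):
        ux = uy = 0.0
        for j in range(n):
            w = weights[i][j]
            if i == j or w == 0.0:
                continue
            ux += w * ((positions[j][0] - positions[i][0]) - (offsets[j][0] - offsets[i][0]))
            uy += w * ((positions[j][1] - positions[i][1]) - (offsets[j][1] - offsets[i][1]))
        controls.append((ux, uy))
    return controls
```

The control vanishes whenever the agents form a translated copy of the desired shape, so the weights only shape how quickly, and along which edges, distortion is corrected.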

Enforcing Human-like Kinematics in Dexterous Piano Playing via Adversarial Posture Regularization

2026-06-24 04:00 UTC

Reinforcement learning can train bimanual dexterous hands to play piano in physics simulation with high note accuracy, but for high-DoF hands, relying solely on task rewards or IK inversion often leads to unnatural postures and joint overextension. The proposed Adversarial Posture Regularization (APR) uses a small amount of casual human playing data to match the posture distribution of the policy with a human prior via an adversarial objective, encouraging more human-like hand shapes. The authors collect and release unstructured hand motion data using a consumer-grade Meta Quest 3 and retarget it to the Shadow Hand. APR achieves significantly better performance than prior methods on human-likeness metrics (cPSI, BSE, FAC) and visual quality.

Proposes Adversarial Posture Regularization (APR) to avoid expensive, song-aligned expert demonstrations
Collects casual human piano playing data using Meta Quest 3 and retargets to Shadow Hand

Engineering Reliable Autonomous Systems: Challenges and Solutions

2026-06-24 04:00 UTC

This workshop report captures discussions from the Lorentz Center Workshop "Engineering Reliable Autonomous Systems" (ERAS), held June 10–14, 2024. It focuses on verification and validation, real-world engineering, and safe software architectures for autonomous systems, resulting in a catalogue of challenges and a roadmap to solutions. Some challenges can be addressed by existing academic techniques not yet widely adopted in practice; others require further research.

Co-organized by FMAS and AREA communities, bringing together academia, industry, and domain experts.
Three main topics: verification/validation, engineering real-world systems, and software architectures for safety.

Verifiable Foundation Models for Robot Safety

2026-06-24 04:00 UTC

This paper introduces FEARL, a framework that decomposes robot policies into a large controller and a small safety module, enabling formal verification of safety-critical properties while preserving the expressive power of foundation models. Experiments in simulation and on a physical robot demonstrate its effectiveness.

FEARL splits robot policy into a large perception/reasoning controller and a verifiable safety module.
The safety module uses low-dimensional sensor data, making formal verification tractable.
