Simulately

2025-07-03

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

Abstract

Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still struggle to coordinate the mobile base and the manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error accumulation under high degrees of freedom. On the other hand, they treat the entire mobile manipulation process with the same visual observation modality (e.g., either all 2D or all 3D), overlooking the distinct multimodal perception requirements at different stages of mobile manipulation. To address these limitations, we propose the Adaptive Coordination Diffusion Transformer (AC-DiT), which enhances mobile base and manipulator coordination for end-to-end mobile manipulation. First, since the motion of the mobile base directly influences the manipulator's actions, we introduce a mobility-to-body conditioning mechanism that guides the model to first extract base motion representations, which are then used as a context prior for predicting whole-body actions. This enables whole-body control that accounts for the potential impact of the mobile base's motion. Second, to meet the perception requirements at different stages of mobile manipulation, we design a perception-aware multimodal conditioning strategy that dynamically adjusts the fusion weights between various 2D images and 3D point clouds, yielding visual features tailored to the current perceptual needs. This allows the model, for example, to rely more on 2D inputs when semantic information is crucial for action prediction, and to place greater emphasis on 3D geometric information when precise spatial understanding is required. We validate AC-DiT through extensive experiments on both simulated and real-world mobile manipulation tasks.
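The abstract names two mechanisms without giving their equations. The sketch below only illustrates the general idea and assumes several things. Each modality (here two 2D camera views and one point cloud) has already been encoded into a feature vector of the same length. The fusion weights come from a softmax over dot-product scores against a context "query" vector. The base-motion prior is injected by plain concatenation. The names `fuse_modalities`, `condition_on_base`, the query vector and the modality keys are all hypothetical. This is not AC-DiT's architecture, which the abstract does not specify.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features: Dict[str, Vector], query: Vector) -> Tuple[Dict[str, float], Vector]:
    """Adaptive fusion: score each modality against a context query, turn the
    scores into softmax weights, and return the weighted sum of the features.

    HYPOTHETICAL: `query` stands in for whatever context signal drives the
    weighting (e.g. an instruction or task-stage embedding); the abstract
    does not describe how AC-DiT actually computes its fusion weights.
    """
    names = list(features)
    scores = [sum(q * f for q, f in zip(query, features[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


def condition_on_base(base_motion: Vector, visual: Vector) -> Vector:
    """HYPOTHETICAL mobility-to-body conditioning: the base-motion representation
    is computed first and prepended to the visual context that a whole-body
    action head would consume (plain concatenation as the simplest choice)."""
    return base_motion + visual


# Toy example with hypothetical modality names: two 2D views and one point cloud.
features = {
    "rgb_head": [1.0, 0.0],
    "rgb_wrist": [0.0, 1.0],
    "point_cloud": [0.5, 0.5],
}
weights, fused = fuse_modalities(features, query=[2.0, 0.0])
decoder_input = condition_on_base([0.1, -0.2, 0.05], fused)
print(weights, decoder_input)
```

With this query, most of the weight goes to `rgb_head`. A query aligned with geometric features would shift weight toward `point_cloud` instead. Softmax is used here only because it keeps the weights positive and summing to one. The paper may use a different gating scheme.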

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover

SAM4D: Segment Anything in Camera and LiDAR Streams

2025-06-26

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies

Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration

Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation

Vision in Action: Learning Active Perception from Human Demonstrations

2025-06-18

ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes

Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

GMT: General Motion Tracking for Humanoid Whole-Body Control

RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control

From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins

Touch begins where vision ends: Generalizable policies for contact-rich manipulation

Construction of a Multiple-DOF Under-actuated Gripper with Force-Sensing via Deep Learning

2025-06-13

EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks

Real-Time Execution of Action Chunking Flow Policies

Versatile Loco-Manipulation through Flexible Interlimb Coordination

Improving Long-Range Navigation with Spatially-Enhanced Recurrent Memory via End-to-End Reinforcement Learning

3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model

BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration

Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Object-centric 3D Motion Field for Robot Learning from Human Videos

Rodrigues Network for Learning Robot Actions
