ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning

🎉 Accepted by ICLR 2026

ArtVIP Team
Zhao Jin¹,*, Zhengping Che¹,*, Tao Li¹, Zhen Zhao¹, Kun Wu¹, Yuheng Zhang¹, Yinuo Zhao¹, Zehui Liu¹, Zhang Qiang¹, Xiaozhu Ju¹,
Jing Tian², Yousong Xue², Jian Tang¹

* Co-first Authors, Corresponding Authors

¹ Beijing Innovation Center of Humanoid Robotics,

² Beijing Institute of Architectural Design

Experiments

Physical Fidelity and Interaction Evaluation

We use an optical tracking system to record the joint motion trajectories of real-world objects. These recordings are compared with the joint motions of the corresponding digital-twin articulated objects in simulation to evaluate the discrepancy between simulated and real-world joint behavior. The following videos present a direct comparison between real and simulated motion. Further analysis is provided in the paper.
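One way to quantify such a discrepancy is to resample the simulated joint trajectory onto the timestamps of the tracked real trajectory and take the root-mean-square error. The sketch below is illustrative only: the RMSE metric, the linear resampling, and the function names `resample` and `joint_rmse` are assumptions, not the evaluation procedure used in the paper.

```python
from bisect import bisect_right
from math import sqrt


def resample(times, values, query_times):
    """Linearly interpolate (times, values) at query_times.

    Queries outside the recorded range are clamped to the end values.
    `times` must be sorted in increasing order.
    """
    out = []
    for t in query_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


def joint_rmse(real_t, real_q, sim_t, sim_q):
    """RMSE between real joint positions and simulated positions
    resampled onto the real timestamps (hypothetical metric)."""
    sim_on_real = resample(sim_t, sim_q, real_t)
    squared = [(r - s) ** 2 for r, s in zip(real_q, sim_on_real)]
    return sqrt(sum(squared) / len(squared))
```

Resampling onto the real timestamps keeps the measured data untouched and interpolates only the simulated signal, which is usually logged at a higher and more regular rate.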

Motion Driven by External Force

Motion Triggered by Latch Release

Motion Triggered by Joint Position Threshold

Imitation Learning in Real World

We design four challenging articulated-object manipulation tasks: (1) PullDrawer, (2) OpenCabinet, (3) SlideShelf, and (4) CloseOven. These tasks demand precise and flexible motions, including rotation, angled pushing, and horizontal translation. Data were collected via teleoperation in both real and simulated environments. For each experiment, we trained ACT (Action Chunking with Transformers) and DP (Diffusion Policy) for 50k gradient descent iterations with three different random seeds, and evaluated the final checkpoint from each run with 60 rollouts to compute per-task success rates. The results show that training on Real-Sim-Mixed data can significantly improve success rates.
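The evaluation protocol above (several seeds, a fixed number of rollouts per final checkpoint) can be summarized with a small helper. This is a minimal sketch under stated assumptions: the function names, the input layout (a mapping from seed to a list of boolean rollout outcomes), and the choice to report the mean and population standard deviation across seeds are all hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def success_rate(outcomes):
    """Fraction of successful rollouts in a list of booleans."""
    return sum(outcomes) / len(outcomes)


def aggregate_task(per_seed_outcomes):
    """Summarize one task across seeds.

    per_seed_outcomes maps a seed to the list of rollout outcomes
    (True = success) for that seed's final checkpoint.
    """
    rates = [success_rate(o) for o in per_seed_outcomes.values()]
    return {
        "per_seed": rates,
        "mean": mean(rates),
        "std": pstdev(rates),  # spread across seeds (assumed statistic)
    }
```

Reporting the spread across seeds alongside the mean helps separate a genuine improvement from seed-to-seed variance when comparing training-data mixtures.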

PullDrawer

OpenCabinet

SlideShelf

CloseOven

Reinforcement Learning in Real World

We design a CloseTrashcan task with a Franka arm, train a two-stage agent in Isaac Sim, and then deploy the same policy in the real world.

CloseTrashcan

BibTeX

@inproceedings{jin2026artvip, 
   title={ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning}, 
   author={Zhao Jin and Zhengping Che and Tao Li and Zhen Zhao and Kun Wu and Yuheng Zhang and others}, 
   booktitle={International Conference on Learning Representations (ICLR)}, 
   year={2026} 
 }