ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning

ArtVIP Team


Zhao Jin 1,∗, Zhengping Che 1,∗, Tao Li 1, Zhen Zhao 1, Kun Wu 1, Yuheng Zhang 1, Yinuo Zhao 1, Zehui Liu 1, Zhang Qiang 1, Xiaozhu Ju 1, Jing Tian 2, Yousong Xue 2, Jian Tang 1
* Co-first Authors, Corresponding Authors
1 Beijing Innovation Center of Humanoid Robotics,

2 Beijing Institute of Architectural Design

We introduce ArtVIP, a comprehensive open-source dataset comprising 200+ high-quality digital-twin articulated objects and 6 basic scenes. ArtVIP achieves visual realism through precise geometric meshes and high-resolution textures, and physical fidelity through fine-tuned parameters.


Interaction with ArtVIP assets

Experiments

Physical Fidelity and Interaction Evaluation

We use an optical tracking system to record the motion trajectories of joints on real-world objects. We then compare these recordings with the joint motions of the corresponding digital-twin articulated objects in simulation to measure the discrepancy between simulated and real-world joint behavior. The following videos show the real and simulated motions side by side. More analysis is provided in the research paper.
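The comparison between recorded and simulated joint motion can be illustrated with a small sketch. It is not the evaluation code behind ArtVIP: the function names, the linear resampling step, and the choice of RMSE and maximum absolute error as discrepancy measures are all assumptions made for illustration. It assumes each trajectory is a list of strictly increasing timestamps paired with joint positions.

```python
from bisect import bisect_right
from math import sqrt


def interpolate(times, values, t):
    """Linearly interpolate a sampled trajectory at time t (clamped at both ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # index of the first sample strictly after t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def trajectory_discrepancy(real_t, real_q, sim_t, sim_q):
    """Hypothetical metric: (RMSE, max abs error) of simulated vs. real joint
    positions, with the simulated trajectory resampled at the real timestamps."""
    errors = [interpolate(sim_t, sim_q, t) - q for t, q in zip(real_t, real_q)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    max_abs = max(abs(e) for e in errors)
    return rmse, max_abs


# Illustrative values only (not measured data): a joint opening over one second.
real_t, real_q = [0.0, 0.5, 1.0], [0.0, 0.5, 1.0]
sim_t, sim_q = [0.0, 1.0], [0.0, 1.2]
rmse, max_abs = trajectory_discrepancy(real_t, real_q, sim_t, sim_q)
```

Resampling onto the real timestamps is one way to handle a tracking system and a simulator that run at different rates; other alignment schemes would work equally well.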

Motion Driven by External Force
Motion Triggered by Latch Release
Motion Triggered by Joint Position Threshold

Imitation Learning in Real World

We design four challenging articulated-object manipulation tasks: (1) PullDrawer, (2) OpenCabinet, (3) SlideShelf, and (4) CloseOven. These tasks demand precise and flexible motions, including rotation, angled pushing, and horizontal translation. Our experiments show that training on Real-Sim-Mixed data significantly improves success rates.
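One way to assemble a Real-Sim-Mixed training set is sketched below. This page does not state the mixing procedure or ratio, so `mix_demonstrations`, its `sim_per_real` parameter, and the source tags are hypothetical; the sketch only shows the general idea of combining a small set of real demonstrations with simulated ones.

```python
import random


def mix_demonstrations(real_demos, sim_demos, sim_per_real=1.0, seed=0):
    """Hypothetical helper: keep every real demo and add roughly
    `sim_per_real` simulated demos per real one (capped by availability)."""
    rng = random.Random(seed)  # seeded for reproducible mixing
    n_sim = min(len(sim_demos), round(len(real_demos) * sim_per_real))
    chosen = rng.sample(sim_demos, n_sim)
    # Tag each demo with its source so the two can be told apart later.
    mixed = [("real", d) for d in real_demos] + [("sim", d) for d in chosen]
    rng.shuffle(mixed)
    return mixed


# Placeholder demo identifiers, for illustration only.
real = ["real_000", "real_001"]
sim = ["sim_000", "sim_001", "sim_002", "sim_003"]
train_set = mix_demonstrations(real, sim, sim_per_real=1.5)
```

Keeping the source tag on each demonstration makes it straightforward to report results separately for real-only, sim-only, and mixed training.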

PullDrawer
OpenCabinet
SlideShelf
CloseOven