Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation

Abstract

Robot manipulation learning from human demonstrations offers a rapid means to acquire skills but often lacks generalization across diverse scenes and object placements. This limitation hinders real-world applications, particularly in complex tasks requiring dexterous manipulation. The Vision-Language-Action (VLA) paradigm leverages large-scale data to enhance generalization; however, due to data scarcity, its performance remains limited. In this work, we introduce Object-Focus Actor (OFA), a novel, data-efficient approach for generalized dexterous manipulation. OFA exploits the consistent end trajectories observed in dexterous manipulation tasks, allowing for efficient policy training. Our method employs a hierarchical pipeline: object perception and pose estimation, pre-manipulation pose arrival, and OFA policy execution. This process keeps the manipulation focused and efficient, even across varied backgrounds and positional layouts. Comprehensive real-world experiments across seven tasks demonstrate that OFA significantly outperforms baseline methods in both positional and background generalization tests. Notably, OFA achieves robust performance with only 10 demonstrations, highlighting its data efficiency.

Method


Figure 1: The overall structure of the proposed OFA consists of three main modules: 1) manipulated-object perception and pose estimation, 2) pre-manipulation pose arrival, and 3) object-focus policy learning.
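To make the three-stage pipeline concrete, the following is a minimal Python sketch of how the modules could compose at inference time. All names here (Pose, estimate_object_pose, compute_pre_manipulation_pose, run_object_focus_policy) are hypothetical placeholders rather than the authors' released interfaces, and the perception, motion-planning, and policy internals are stubbed out.

# Minimal sketch of the three-stage OFA pipeline described in Figure 1.
# All class and function names are illustrative placeholders, not the
# authors' actual implementation.

from dataclasses import dataclass
import numpy as np


@dataclass
class Pose:
    """6-DoF pose: position (xyz) and orientation (quaternion, wxyz)."""
    position: np.ndarray
    orientation: np.ndarray


def estimate_object_pose(rgb: np.ndarray, depth: np.ndarray) -> Pose:
    """Stage 1 (hypothetical): segment the manipulated object and estimate its pose.
    A real system would use a detector/segmenter plus a pose estimator;
    here a fixed placeholder pose is returned."""
    return Pose(position=np.array([0.4, 0.0, 0.1]),
                orientation=np.array([1.0, 0.0, 0.0, 0.0]))


def compute_pre_manipulation_pose(object_pose: Pose, offset: np.ndarray) -> Pose:
    """Stage 2 (hypothetical): derive a pre-manipulation end-effector pose from
    the object pose via a fixed relative offset; a motion planner (not shown)
    would then drive the arm to this pose."""
    return Pose(position=object_pose.position + offset,
                orientation=object_pose.orientation)


def run_object_focus_policy(obs: dict, horizon: int = 50) -> list:
    """Stage 3 (hypothetical): roll out the learned object-focus policy from the
    pre-manipulation pose. A real policy would consume object-centric (e.g.
    cropped or masked) observations; here zero actions are emitted as a stub."""
    return [np.zeros(22) for _ in range(horizon)]  # e.g. arm + dexterous-hand DoFs


if __name__ == "__main__":
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)     # placeholder camera frames
    depth = np.zeros((480, 640), dtype=np.float32)

    obj_pose = estimate_object_pose(rgb, depth)                                   # 1) perception + pose estimation
    pre_pose = compute_pre_manipulation_pose(obj_pose,
                                             offset=np.array([0.0, 0.0, 0.15]))  # 2) pre-manipulation pose arrival
    actions = run_object_focus_policy({"rgb": rgb, "object_pose": obj_pose})      # 3) object-focus policy execution
    print(f"Pre-manipulation position: {pre_pose.position}, rollout length: {len(actions)}")

In this sketch only stage 3 is learned; stages 1 and 2 localize the object and bring the hand to a consistent pre-manipulation pose, which is what allows the policy to focus on the object-relative end trajectory regardless of background or placement.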

Experiments

Task              ACT   OFA w/o rel-of   OFA w/o rel   OFA w/o of   OFA (object-mask)   OFA (hand-focus)
Grasp Cup          20        40               30            90              50                  90
Take Mug           10        30               20            40              10                  60
Hold Scanner       30        50               30            90              80                  90
Catch Loopy        40        40               70            90              90                  80
Pinch Toy          20        40               10            30              10                  40
Grasp Sanitizer    30        70               50            80             100                 100
Lift Tray          10        90               60            60              50                 100

Table 1: Success rate (%) of the comparison methods, each trained with 30 human demonstrations. Each result is computed over 10 evaluation trials.

Grasp Cup

Task description: The robot needs to use a dexterous hand to grasp the cup on the table.

Take Mug

Task description: The robot needs to use a dexterous hand to take the mug on the table.

Hold Scanner

Task description: The robot needs to use a dexterous hand to pick up a barcode scanner from a flat surface and hold it, preparing it for use.

Catch Loopy

Task description: The robot needs to use a dexterous hand to catch Loopy from the environment.

Pinch Toy

Task description: The robot needs to use a dexterous hand to pinch a small toy, maintaining a gentle but secure grip.

Grasp Sanitizer

Task description: The robot needs to use a dexterous hand to grasp a sanitizer and hold it in the air.

Lift Tray

Task description: The robot needs to use both dexterous hands to lift a tray, ensuring a stable hold while maintaining balance.

Position Generalization Experiments

Catch Loopy (Ours)

Task description: Testing OFA's positional generalization at 3 out-of-distribution (OOD) positions in the Catch Loopy task.

Catch Loopy (ACT)

Task description: Testing ACT's positional generalization at 3 OOD positions in the Catch Loopy task.

Hold Scanner (Ours)

Task description: Testing OFA's positional generalization at 3 OOD positions in the Hold Scanner task.

Hold Scanner (ACT)

Task description: Testing ACT's positional generalization at 3 OOD positions in the Hold Scanner task.

Background Generalization Experiments

Catch Loopy (Ours)

Task description: Testing OFA's background generalization with 3 different background levels in the Catch Loopy task.

Catch Loopy (ACT)

Task description: Testing ACT's background generalization with 3 different background levels in the Catch Loopy task.

Hold Scanner (Ours)

Task description: Testing OFA's background generalization with 3 different background levels in the Hold Scanner task.

Hold Scanner (ACT)

Task description: Testing ACT's background generalization with 3 different background levels in the Hold Scanner task.

BibTeX

@misc{li2025objectfocus,
    title={Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation},
    author={Yihang Li and Tianle Zhang and Xuelong Wei and Jiayi Li and Lin Zhao and Dongchi Huang and Zhirui Fang and Minhua Zheng and Wenjun Dai and Xiaodong He},
    year={2025},
    eprint={2505.15098},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}