Figure 1: Overall structure of the proposed OFA, which consists of three main modules: 1) manipulating-object perception and pose estimation, 2) pre-manipulation pose arrival, and 3) object-focus policy learning.
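A minimal sketch of how the three modules in Figure 1 could be chained in a single control step. Everything here is an assumption for illustration and not the authors' implementation: the function names, the 4x4 homogeneous-transform pose representation, the fixed object-frame hand offset used to compute the pre-manipulation pose, the bounding-box crop standing in for the object-focused input, and the 16-dimensional zero action returned by the stub policy.

```python
import numpy as np


def yaw_pose(yaw_rad: float, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a yaw angle (about z) and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def estimate_object_pose(observation: dict) -> np.ndarray:
    """Module 1 (stub): perceive the manipulated object and estimate its pose.
    Hypothetical: a real system would run detection/segmentation and pose
    estimation; this stub simply reads the pose from the observation dict."""
    return np.asarray(observation["object_pose"], dtype=float)


def pre_manipulation_pose(object_pose: np.ndarray, hand_offset: np.ndarray) -> np.ndarray:
    """Module 2 (assumed form): target hand pose = object pose composed with a
    fixed offset expressed in the object frame, so the approach moves with the object."""
    return object_pose @ hand_offset


def crop_around_object(image: np.ndarray, bbox) -> np.ndarray:
    """Object-centred view: crop the image to a hypothetical (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = bbox
    return image[y0:y1, x0:x1]


def object_focus_policy(object_view: np.ndarray, action_dim: int = 16) -> np.ndarray:
    """Module 3 (stub): stands in for a learned policy; returns a zero action."""
    return np.zeros(action_dim)


def run_step(observation: dict, hand_offset: np.ndarray):
    """Chain the three (stubbed) modules for one observation."""
    obj_pose = estimate_object_pose(observation)
    approach = pre_manipulation_pose(obj_pose, hand_offset)
    view = crop_around_object(observation["image"], observation["bbox"])
    action = object_focus_policy(view)
    return approach, view, action


if __name__ == "__main__":
    # Hypothetical offset: hand 10 cm behind and 10 cm above the object, in its own frame.
    offset = yaw_pose(0.0, [-0.1, 0.0, 0.1])
    obs = {
        "object_pose": yaw_pose(np.pi / 2, [0.5, 0.2, 0.0]),  # object rotated 90 degrees
        "image": np.zeros((480, 640, 3)),
        "bbox": (100, 50, 300, 250),
    }
    approach, view, action = run_step(obs, offset)
    print(np.round(approach[:3, 3], 3))  # should be close to [0.5, 0.1, 0.1]
    print(view.shape)                    # (200, 200, 3)
```

Because the offset is expressed in the object frame, rotating or moving the object moves the approach pose with it. This is one plausible way a pre-manipulation stage could decouple where the object is from what the policy has to learn, offered as a hedged illustration rather than a description of OFA's internals.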
| Task | ACT | OFA w/o rel-of | OFA w/o rel | OFA w/o of | OFA (object-mask) | OFA (hand-focus) |
|---|---|---|---|---|---|---|
| Grasp Cup | 20 | 40 | 30 | 90 | 50 | 90 |
| Take Mug | 10 | 30 | 20 | 40 | 10 | 60 |
| Hold Scanner | 30 | 50 | 30 | 90 | 80 | 90 |
| Catch Loopy | 40 | 40 | 70 | 90 | 90 | 80 |
| Pinch Toy | 20 | 40 | 10 | 30 | 10 | 40 |
| Grasp Sanitizer | 30 | 70 | 50 | 80 | 100 | 100 |
| Lift Tray | 10 | 90 | 60 | 60 | 50 | 100 |
Table 1: Success rates (%) of the compared methods, each trained with 30 human demonstrations. Each result is computed over 10 evaluation trials.
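Since each cell in Table 1 comes from 10 trials, every value moves in steps of 10 percentage points, so a gap of 10 to 20 points between two columns corresponds to only one or two trials. The sketch below is a small, hedged aid for reading the table: it copies the reported numbers, computes an unweighted mean per method across the seven tasks (our own summary, not a figure reported by the authors), and converts a rate back to a count of successful trials.

```python
METHODS = ["ACT", "OFA w/o rel-of", "OFA w/o rel", "OFA w/o of",
           "OFA (object-mask)", "OFA (hand-focus)"]

# Success rates (%) copied from Table 1, one list per task in METHODS order.
TABLE_1 = {
    "Grasp Cup":       [20, 40, 30, 90, 50, 90],
    "Take Mug":        [10, 30, 20, 40, 10, 60],
    "Hold Scanner":    [30, 50, 30, 90, 80, 90],
    "Catch Loopy":     [40, 40, 70, 90, 90, 80],
    "Pinch Toy":       [20, 40, 10, 30, 10, 40],
    "Grasp Sanitizer": [30, 70, 50, 80, 100, 100],
    "Lift Tray":       [10, 90, 60, 60, 50, 100],
}
TRIALS_PER_CELL = 10  # each reported rate is computed over 10 evaluation trials


def mean_success_rate(table: dict) -> dict:
    """Unweighted mean success rate (%) per method across all tasks."""
    n_tasks = len(table)
    return {
        method: sum(rates[i] for rates in table.values()) / n_tasks
        for i, method in enumerate(METHODS)
    }


def successful_trials(rate_percent: int, trials: int = TRIALS_PER_CELL) -> int:
    """Convert a success rate (%) back to a count of successful trials."""
    return round(rate_percent * trials / 100)


if __name__ == "__main__":
    for method, mean in mean_success_rate(TABLE_1).items():
        print(f"{method:18s} {mean:5.1f}")
    # Pinch Toy, OFA (hand-focus): 40% of 10 trials
    print(successful_trials(TABLE_1["Pinch Toy"][5]))  # 4
```

Running it gives unweighted means of about 22.9 (ACT), 51.4 (w/o rel-of), 38.6 (w/o rel), 68.6 (w/o of), 55.7 (object-mask) and 80.0 (hand-focus). A per-task mean weights every task equally regardless of difficulty, which is a simplification.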
Task description: The robot needs to use a dexterous hand to grasp the cup on the table.
Task description: The robot needs to use a dexterous hand to take the mug on the table.
Task description: The robot needs to use a dexterous hand to pick up a barcode scanner from a flat surface and hold it ready for use.
Task description: The robot needs to use a dexterous hand to catch Loopy from the environment.
Task description: The robot needs to use a dexterous hand to pinch a small toy, maintaining a gentle but secure grip.
Task description: The robot needs to use a dexterous hand to grasp a sanitizer and hold it in the air.
Task description: The robot needs to use both dexterous hands to lift a tray, ensuring a stable hold while maintaining balance.
Task description: Testing OFA's positional generalization at 3 OOD positions in the Catch Loopy task.
Task description: Testing ACT's positional generalization at 3 OOD positions in the Catch Loopy task.
Task description: Testing OFA's positional generalization at 3 OOD positions in the Hold Scanner task.
Task description: Testing ACT's positional generalization at 3 OOD positions in the Hold Scanner task.
Task description: Testing OFA's background generalization at 3 different background levels in the Catch Loopy task.
Task description: Testing ACT's background generalization at 3 different background levels in the Catch Loopy task.
Task description: Testing OFA's background generalization at 3 different background levels in the Hold Scanner task.
Task description: Testing ACT's background generalization at 3 different background levels in the Hold Scanner task.
@misc{li2025objectfocus,
title={Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation},
author={Yihang Li and Tianle Zhang and Xuelong Wei and Jiayi Li and Lin Zhao and Dongchi Huang and Zhirui Fang and Minhua Zheng and Wenjun Dai and Xiaodong He},
year={2025},
eprint={2505.15098},
archivePrefix={arXiv},
primaryClass={cs.RO}
}