YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale


Takehiko Ohkawa1*   Jumpei Arima2*   Yuki Noguchi2   Masatoshi Tateno1,4   Makoto Sugiura1   Takuya Okubo1   Kengo Ikeuchi1,4   Yuma Shin1,5   Hiroki Nishizawa1,6   Naoaki Kanazawa1   Yuki Wakayama2   Daiki Fukunaga2   Koshi Makihara3   Tomohiro Motoda3   Floris Erich3   Yukiyasu Domae3   Tatsuya Matsushima1,4   Yohishiro Okumatsu2   Kei Ota1

1AI Robot Association (AIRoA)   2Toyota Motor Corporation   3National Institute of Advanced Industrial Science and Technology (AIST)
4The University of Tokyo   5Institute of Science Tokyo   6Waseda University

ICRA 2026 Workshop on Beyond Teleoperation

*Project Co-Lead




Yielding Universal Bidigital Interface (YUBI). Our lightweight, finger-aligned gripper offers intuitive control by mirroring human digital kinematics for dexterous manipulation. Leveraging high-precision VR-based tracking, YUBI facilitates the curation of a large-scale, high-quality bimanual dataset to advance robotic foundation models.

Abstract

We introduce Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data collection for bimanual dexterous manipulation. While handheld data collection systems such as Universal Manipulation Interface (UMI) enable affordable data collection, their bulky pistol-grip designs can pose ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this, YUBI presents a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion. Using the YUBI devices, we set up a data collection system with integrated VR-based 6 DoF tracking of the gripper, ensuring high-fidelity trajectory data acquisition. We curate a UMI-based dataset of unprecedented scale: 8434 hours across 1.20M episodes and 119 tasks. Experiments show that YUBI offers advantages over the UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. A single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform, confirming that the collected data are directly executable as policy supervision. We release the gripper hardware, data-collection software, and dataset as one integrated stack, offering the open community a reproducible path to large-scale data acquisition for advancing robotic foundation models.

YUBI Gripper Design

YUBI replaces the pistol-grip interface with a yielding, finger-aligned actuation design: the thumb drives one jaw and the index and middle fingers drive the other. Each jaw yields to its finger, so the aperture follows the operator's natural pinch without motor resistance, which improves haptic transparency and dexterity. An integrated support grip acts as a fulcrum for loads up to 2 kg, while a miniaturized build cuts handheld mass to ~319 g (200 g gripper + 119 g VR controller), down from ~780–900 g, reducing wrist fatigue.
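As a quick check on the mass figures quoted above, the following sketch recomputes the handheld total and the relative reduction against both ends of the ~780–900 g range. The variable names are illustrative, and the source values are approximate, so the percentages are approximate as well.

```python
# Masses in grams, taken from the text above (all approximate).
GRIPPER_G = 200
CONTROLLER_G = 119
PRIOR_RANGE_G = (780, 900)  # earlier handheld mass range quoted in the text

total_g = GRIPPER_G + CONTROLLER_G

# Fractional mass reduction relative to each end of the earlier range.
reductions = [1 - total_g / prior for prior in PRIOR_RANGE_G]

print(f"handheld mass: {total_g} g")
for prior, reduction in zip(PRIOR_RANGE_G, reductions):
    print(f"vs {prior} g: {reduction:.1%} lighter")
```

Under these numbers the handheld unit is roughly 59% to 65% lighter than the earlier range.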

Overview and exploded view of the YUBI gripper. The bidigital mechanism uses internal gears to actuate the jaws, supported by an ergonomic grip and flap. A fisheye camera is attached for task observation, and the Quest controller provides high-frequency 6-DoF trajectory tracking.

Operation Setup

YUBI supports both stationary and portable data collection. The portable mode keeps the gripper fully self-contained for in-the-wild scenarios such as household tasks with whole-body motion, while the stationary tabletop setup, shown below, is our primary configuration and accounts for the majority of the collected data. Each hand holds a YUBI device with a wrist camera, a Quest controller, and a magnetic aperture encoder. VR-based tracking with the Quest 3S provides 6-DoF gripper poses, which are more reliable than poses from drift-prone SLAM. The headset is rig-mounted to avoid neck fatigue. A fixed RealSense D435 stereo camera adds a top-down workspace view for extra supervision, and a foot pedal enables hands-free sub-action annotation.
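One way to picture the foot-pedal annotation is as a list of press timestamps that split an episode into sub-actions. The sketch below is a hypothetical illustration only: the page does not describe how pedal events are stored, so the function `label_frames`, the timestamp lists, and the convention that each press starts a new sub-action are all assumptions.

```python
from bisect import bisect_right

def label_frames(frame_times, pedal_times):
    """Assign each frame a sub-action index.

    The index is the number of pedal presses at or before the frame's
    timestamp, so the segment before the first press is index 0.
    (Assumed convention, not taken from the YUBI software.)
    """
    presses = sorted(pedal_times)
    return [bisect_right(presses, t) for t in frame_times]

FPS = 30                                     # matches the 30 Hz rate quoted on this page
frame_times = [i / FPS for i in range(90)]   # a 3-second toy episode
pedal_times = [1.0, 2.0]                     # hypothetical presses at 1 s and 2 s

labels = label_frames(frame_times, pedal_times)
print(labels[29], labels[30], labels[60])    # indices around the two presses
```

With this convention, the toy episode splits into three sub-actions of 30 frames each.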

System overview. Bimanual YUBI-based demonstrations are collected at 22 desks in parallel by 179 operators. The setup features a stereo top-view camera for stable workspace observation, a rig-mounted VR system for 6-DoF gripper tracking, and foot-pedal-based action segmentation.

Large-Scale Bimanual Dataset

Collected across 22 desks running 24/7 for over two months by 179 operators (125 male, 54 female), the dataset comprises 8434 hours across 1.20M episodes and 119 tasks, far larger than Fast-UMI (~60 h, 22 tasks) and the original UMI (12 h, 4 tasks). Tasks span seven domains (industrial, kitchen, toy, desk work, clothing, appliance, personal care) and six skill types (placement, insertion, sorting, assembly, deformation, tool use), and most combine several skills in practice. All streams are converted to the LeRobot format at 30 Hz, with a detector cascade filtering defective episodes.
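The headline numbers above imply a few derived quantities that help size the dataset. The sketch below computes them under two assumptions: that 1.20M is treated as exactly 1,200,000 episodes (the source figure is rounded), and that every hour of data is stored at the stated 30 Hz. The results are therefore estimates, not reported statistics.

```python
# Figures quoted in the text above.
HOURS = 8434
EPISODES = 1_200_000      # "1.20M", treated as exact for this estimate
TASKS = 119
OPERATORS = 179
DESKS = 22
FPS = 30                  # LeRobot export rate stated in the text

total_seconds = HOURS * 3600
mean_episode_s = total_seconds / EPISODES
frames_per_stream = total_seconds * FPS          # frames in one 30 Hz stream
mean_hours_per_task = HOURS / TASKS
mean_hours_per_operator = HOURS / OPERATORS
mean_hours_per_desk = HOURS / DESKS

# Scale relative to the prior datasets cited in the text.
ratio_vs_fast_umi = HOURS / 60    # Fast-UMI: ~60 h
ratio_vs_umi = HOURS / 12         # original UMI: 12 h

print(f"mean episode length: {mean_episode_s:.1f} s")
print(f"frames per stream:   {frames_per_stream:,}")
print(f"hours per task:      {mean_hours_per_task:.1f}")
print(f"hours per operator:  {mean_hours_per_operator:.1f}")
print(f"hours per desk:      {mean_hours_per_desk:.1f}")
print(f"x Fast-UMI: {ratio_vs_fast_umi:.0f}, x UMI: {ratio_vs_umi:.0f}")
```

On these assumptions an average episode lasts about 25 seconds, and the dataset is roughly 140 times the size of Fast-UMI and 700 times the size of the original UMI by hours.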

Domain distribution (left) across seven categories and skill distribution (right) across six primary skill types of the 119 tasks.

Usability Study

We recruited a gender-balanced group of 10 operators with no prior experience using either UMI or YUBI. In a dexterity test, operators performed single-attempt pick-and-place of six hex nuts (M10–M3, largest to smallest). Both devices approach the ceiling on large nuts (≥94% at M8–M10) but diverge as the diameter shrinks: YUBI leads UMI by +20 and +10 pp at M6 and M5 and remains ahead at the smallest M3 nut, indicating substantially better precision. In an operational efficiency test, operators performed five tasks under three conditions (direct hand, UMI, and YUBI) with counterbalanced ordering. YUBI is consistently faster than UMI, with per-task speed-ups from 1.37× (domino arrangement) to 4.19× (phone charging), substantially narrowing the gap to direct hand operation even for precision tasks.

Dexterity test (left): pick-and-place success rate on hex nuts M10–M3 for UMI and YUBI (error bars: 95% binomial confidence intervals, n=50). Operational efficiency (right): mean completion time on five tasks for Hand, UMI, and YUBI; YUBI is significantly faster than UMI and narrows the gap to direct hand operation.
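The caption reports 95% binomial confidence intervals at n=50 without naming the method. As a rough guide to how wide such intervals are, the sketch below uses the Wilson score interval, which is one common choice and an assumption here, not necessarily the method used for the figure. The example count of 47 successes corresponds to the 94% rate mentioned for the larger nuts.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval. Returns (lower, upper).
    """
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width

N_TRIALS = 50  # trials per nut size and device, per the caption

low_94, high_94 = wilson_interval(47, N_TRIALS)      # 94% success
low_100, high_100 = wilson_interval(50, N_TRIALS)    # 100% success

print(f"47/50: [{low_94:.3f}, {high_94:.3f}]")
print(f"50/50: [{low_100:.3f}, {high_100:.3f}]")
```

At n=50 the interval spans more than ten percentage points even near the ceiling, so small differences between devices on the largest nuts should be read with that width in mind.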

Robot Policy Deployment

To test whether the YUBI dataset translates into real-world capability, we train a multi-task vision-language-action policy (π0.5-based) on YUBI's wrist data and deploy it across three bimanual robot platforms (UR, Franka, and Toyota's semi-humanoid ELEY), each fitted with the YUBI gripper as a common end-effector. Because the policy is trained on the gripper end-effector trajectory rather than robot-specific joint space, a single dataset transfers across kinematically distinct arms without retargeting. Over 20 rollouts per task, Bimanual UR reaches 20/20 (ball in basket), 13/20 (stack cup pyramid), and 9/20 (unfold glasses); Bimanual Franka reaches 18/20 (pick-and-place socks) and 18/20 (tape in box); and ELEY reaches 11/20 (cup placement). This confirms that end-effector-space supervision from YUBI transfers across robots and generalizes to complex bimanual dexterity tasks.

UR — Ball in basket (20/20)
UR — Stack cup pyramid (13/20)
UR — Unfold glasses (9/20)
Franka — Pick-and-place socks (18/20)
Franka — Tape in box (18/20)
ELEY — Cup placement (11/20)
Robot deployment rollouts. A single multi-task policy trained on YUBI data, deployed across three robots and six tasks with the YUBI gripper as a shared end-effector.
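The rollout counts reported above can be tabulated as success rates per task, per robot, and overall. The sketch below does only that arithmetic; the dictionary layout is illustrative. Because each robot was evaluated on different tasks, the per-robot rates are not a controlled comparison between platforms.

```python
ROLLOUTS_PER_TASK = 20

# Successes out of 20 rollouts, as reported in the text above.
successes = {
    ("UR", "ball in basket"): 20,
    ("UR", "stack cup pyramid"): 13,
    ("UR", "unfold glasses"): 9,
    ("Franka", "pick-and-place socks"): 18,
    ("Franka", "tape in box"): 18,
    ("ELEY", "cup placement"): 11,
}

per_task_rate = {key: wins / ROLLOUTS_PER_TASK for key, wins in successes.items()}

# Accumulate (successes, rollouts) per robot.
per_robot = {}
for (robot, _task), wins in successes.items():
    won, total = per_robot.get(robot, (0, 0))
    per_robot[robot] = (won + wins, total + ROLLOUTS_PER_TASK)

per_robot_rate = {robot: won / total for robot, (won, total) in per_robot.items()}
overall_rate = sum(successes.values()) / (ROLLOUTS_PER_TASK * len(successes))

for (robot, task), rate in per_task_rate.items():
    print(f"{robot:7s} {task:22s} {rate:.0%}")
for robot, rate in per_robot_rate.items():
    print(f"{robot:7s} pooled: {rate:.0%}")
print(f"overall: {overall_rate:.1%}")
```

Pooled over all six tasks this gives 89 successes in 120 rollouts, or about 74%.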

BibTeX

@techreport{ohkawa2026yubi,
  author      = {Takehiko Ohkawa and Jumpei Arima and Yuki Noguchi and
                 Masatoshi Tateno and Makoto Sugiura and Takuya Okubo and
                 Kengo Ikeuchi and Yuma Shin and Hiroki Nishizawa and
                 Naoaki Kanazawa and Yuki Wakayama and Daiki Fukunaga and
                 Koshi Makihara and Tomohiro Motoda and Floris Erich and
                 Yukiyasu Domae and Tatsuya Matsushima and Yohishiro Okumatsu and
                 Kei Ota},
  title       = {{YUBI}: Yielding Universal Bidigital Interface for
                 Bimanual Dexterous Manipulation at Scale},
  year        = {2026},
}