We introduce the Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data collection for bimanual dexterous manipulation. While handheld data collection systems such as the Universal Manipulation Interface (UMI) make data collection affordable, their bulky pistol-grip designs can pose ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this, YUBI adopts a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion. Using the YUBI devices, we build a data collection system with integrated VR-based 6-DoF tracking of the gripper, ensuring high-fidelity trajectory data acquisition. We curate a UMI-based dataset of unprecedented scale: 8434 hours across 1.20M episodes and 119 tasks. Experiments show that YUBI offers advantages over the UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. A single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform, confirming that the collected data are directly usable as policy supervision. We release the gripper hardware, data-collection software, and dataset as one integrated stack, offering the open community a reproducible path to large-scale data acquisition for advancing robotic foundation models.
YUBI replaces the pistol-grip interface with a yielding, finger-aligned actuation design: the thumb drives one jaw and the index and middle fingers drive the other. Each jaw yields to its finger, so the aperture follows the operator's natural pinch without motor resistance, which improves haptic transparency and dexterity. An integrated support grip acts as a fulcrum for loads up to 2 kg, while a miniaturized build cuts handheld mass to ~319 g (200 g gripper + 119 g VR controller, down from ~780–900 g), reducing wrist fatigue.
YUBI supports both stationary and portable data collection. The portable mode keeps the gripper fully self-contained for in-the-wild scenarios such as household tasks with whole-body motion, while the stationary tabletop setup, shown below, is our primary configuration and accounts for the majority of the collected data. Each hand holds a YUBI device equipped with a wrist camera, a Quest controller, and a magnetic aperture encoder. Gripper tracking is VR-based: the Quest 3S provides 6-DoF poses, which are more reliable than those from drift-prone SLAM. The headset is rig-mounted to avoid neck fatigue. A fixed RealSense D435 stereo camera adds a top-down workspace view for extra supervision, and a foot pedal enables hands-free sub-action annotation.
Collected across 22 desks running 24/7 for over two months by 179 operators (125 male, 54 female), the dataset comprises 8434 hours across 1.20M episodes and 119 tasks, far larger than Fast-UMI (~60 h, 22 tasks) and the original UMI (12 h, 4 tasks). Tasks span seven domains (industrial, kitchen, toy, desk work, clothing, appliance, personal care) and six skill types (placement, insertion, sorting, assembly, deformation, tool use), and most combine several skills in practice. All streams are converted to the LeRobot format at 30 Hz, with a detector cascade filtering defective episodes.
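The conversion to a common 30 Hz timeline can be pictured with a minimal sketch. It assumes each stream (controller pose, aperture encoder, camera frames) arrives as a sorted list of timestamps with matching values and is aligned by nearest timestamp. The function name `resample_nearest` and the nearest-neighbor strategy are illustrative assumptions, not the released pipeline, which may interpolate or synchronize differently.

```python
from bisect import bisect_left

TARGET_HZ = 30  # frame rate stated in the text


def resample_nearest(timestamps, values, start, end, hz=TARGET_HZ):
    """Sample a stream on a uniform grid by picking the nearest timestamp.

    Hypothetical helper: `timestamps` must be sorted (seconds) and have the
    same length as `values`. Returns one value per grid tick in [start, end].
    """
    if not timestamps or len(timestamps) != len(values):
        raise ValueError("timestamps and values must be non-empty and equal in length")
    # Small epsilon guards against float rounding just below an integer.
    n_frames = int((end - start) * hz + 1e-9) + 1
    out = []
    for k in range(n_frames):
        t = start + k / hz
        i = bisect_left(timestamps, t)  # first index with timestamp >= t
        if i == 0:
            j = 0
        elif i == len(timestamps):
            j = len(timestamps) - 1
        else:
            # Choose the closer of the two neighbors; ties go to the earlier one.
            j = i if timestamps[i] - t < t - timestamps[i - 1] else i - 1
        out.append(values[j])
    return out


if __name__ == "__main__":
    # Assumed example: a 90 Hz pose stream reduced to 30 Hz over one second.
    pose_t = [k / 90 for k in range(91)]
    pose_idx = list(range(91))
    frames = resample_nearest(pose_t, pose_idx, start=0.0, end=1.0)
    print(len(frames), frames[:4])
```

Nearest-timestamp alignment keeps every stream on the same index without inventing samples, at the cost of up to half a source period of timing error. Interpolating poses would reduce that error but needs care with rotations.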
We recruited a gender-balanced group of 10 operators with no prior experience using either UMI or YUBI. In a dexterity test, operators performed single-attempt pick-and-place of six hex nuts (M10–M3, largest to smallest). Both devices approach the ceiling on large nuts (≥94% at M8–M10) but diverge as the diameter shrinks: YUBI leads UMI by 20 and 10 percentage points at M6 and M5, respectively, and by roughly 3× at the smallest M3 nut, indicating substantially better precision. In an operational efficiency test, operators performed five tasks under three conditions (direct hand, UMI, and YUBI) with counterbalanced ordering. YUBI is consistently faster than UMI, with per-task speed-ups ranging from 1.37× (domino arrangement) to 4.19× (phone charging), substantially narrowing the gap to direct hand operation even for precision tasks.
To test whether the YUBI dataset translates into real-world capability, we train a multi-task vision-language-action policy (π0.5-based) on YUBI's wrist data and deploy it across three bimanual robot platforms (UR, Franka, and Toyota's semi-humanoid ELEY), each fitted with the YUBI gripper as a common end-effector. Because the policy is trained on the gripper end-effector trajectory rather than robot-specific joint space, a single dataset transfers across kinematically distinct arms without retargeting. We report success over 20 rollouts per task. Bimanual UR reaches 20/20 (ball in basket), 13/20 (stack cup pyramid), and 9/20 (unfold glasses); bimanual Franka reaches 18/20 (pick-and-place socks) and 18/20 (tape in box); and ELEY reaches 11/20 (cup placement). These results confirm that end-effector-space supervision from YUBI transfers across robots and generalizes to complex bimanual dexterity tasks.
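With 20 rollouts per task, each success rate carries noticeable sampling uncertainty. The sketch below converts the counts reported above into rates with 95% Wilson score intervals. The interval is an analysis added here for illustration and is not reported in the source; the names `RESULTS` and `wilson_interval` are hypothetical.

```python
import math

ROLLOUTS = 20  # rollouts per task, as stated in the text

# Success counts copied from the paragraph above: (platform, task) -> successes.
RESULTS = {
    ("UR", "ball in basket"): 20,
    ("UR", "stack cup pyramid"): 13,
    ("UR", "unfold glasses"): 9,
    ("Franka", "pick-and-place socks"): 18,
    ("Franka", "tape in box"): 18,
    ("ELEY", "cup placement"): 11,
}


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    for (platform, task), k in RESULTS.items():
        lo, hi = wilson_interval(k, ROLLOUTS)
        print(f"{platform:6s} {task:22s} {k}/{ROLLOUTS} = {k / ROLLOUTS:.0%} "
              f"(95% CI {lo:.0%} to {hi:.0%})")
```

The Wilson interval is used here rather than the normal approximation because it stays well behaved for small samples and for rates at or near 100%, such as the 20/20 result.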
@techreport{ohkawa2026yubi,
author = {Takehiko Ohkawa and Jumpei Arima and Yuki Noguchi and
Masatoshi Tateno and Makoto Sugiura and Takuya Okubo and
Kengo Ikeuchi and Yuma Shin and Hiroki Nishizawa and
Naoaki Kanazawa and Yuki Wakayama and Daiki Fukunaga and
Koshi Makihara and Tomohiro Motoda and Floris Erich and
Yukiyasu Domae and Tatsuya Matsushima and Yoshihiro Okumatsu and
Kei Ota},
title = {{YUBI}: Yielding Universal Bidigital Interface for
Bimanual Dexterous Manipulation at Scale},
year = {2026},
}