YUBI: Yielding Universal Bidigital Interface for
Bimanual Dexterous Manipulation at Scale


Takehiko Ohkawa1*   Jumpei Arima2*   Yuki Noguchi2   Masatoshi Tateno1,3   Makoto Sugiura1   Takuya Okubo1   Yuki Wakayama2   Naoaki Kanazawa1   Tatsuya Matsushima1,3   Yohishiro Okumatsu2†   Kei Ota1†

1AI Robot Association (AIRoA)   2Toyota Motor Corporation   3The University of Tokyo

ICRA 2026 Workshop on Beyond Teleoperation

*Equal contribution   †Equal supervision




Yielding Universal Bidigital Interface (YUBI). Our lightweight, finger-aligned gripper offers intuitive control by mirroring human digital kinematics for dexterous manipulation. Leveraging high-precision VR-based tracking, YUBI facilitates the curation of a large-scale, high-quality bimanual dataset to advance robotic foundation models.

Abstract

We introduce Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data curation for bimanual dexterous manipulation. While handheld data collection systems such as Universal Manipulation Interface (UMI) have lowered the barrier for in-the-wild data collection, their bulky pistol-grip designs can present ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this limitation, YUBI presents a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion, allowing the jaws to naturally follow the operator's grip. This intuitive interface bridges the gap between human intent and robotic execution, facilitating more precise fingertip motor control. Furthermore, compared to SLAM-based tracking used in the original UMI, our system enhances the fidelity of 6 DoF gripper tracking via a rig-based operation setup integrated with VR systems. We validate the system's efficacy by providing an unprecedented UMI-based dataset, comprising 2730 hours of interaction data across 300K episodes and 40 distinct tasks. Our experiments demonstrate that YUBI offers advantages over the original UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. Collectively, the YUBI framework establishes a foundation for massive, high-fidelity data acquisition toward robotic foundation models.

YUBI Gripper Design

YUBI shifts away from the conventional pistol-grip interface of prior UMI grippers toward a yielding, finger-aligned actuation design. One jaw is actuated by the thumb while the opposing jaw is driven by the coordinated motion of the index and middle fingers, so each jaw yields directly to its driving finger. The gripper aperture therefore follows the operator's natural pinch motion without motor-driven resistance, mitigating the control mismatch and poor haptic transparency of pistol-grip designs and enabling operators to directly leverage their inherent dexterity. To preserve precision while supporting loads of up to 2 kg, an integrated support grip serves as a mechanical fulcrum and the remaining fingers stabilize the grip to distribute load. A miniaturized gripper architecture with a lightweight camera module reduces the handheld mass to ~319 g (200 g gripper + 119 g controller), down from ~780–900 g in prior UMI/VR designs, alleviating wrist fatigue over long-duration sessions.

Exploded view of the YUBI gripper. The bidigital mechanism uses internal gears to actuate the jaws, supported by an ergonomic grip and flap. A fisheye camera is attached for task observation, and the Quest controller provides high-frequency 6 DoF trajectory tracking.
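Because the tracked object is the Quest controller rather than the jaws themselves, a gripper pose has to be derived from the controller pose through the fixed mount geometry. The sketch below illustrates that composition with 4x4 homogeneous transforms. It is only an illustration: the function names and the 10 cm mount offset are hypothetical, and the actual calibration used for YUBI is not described here.

```python
def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation matrix and a 3-vector."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def compose(a, b):
    """Return the 4x4 matrix product a @ b (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical mount offset (assumption): the gripper reference point sits
# 10 cm along the controller's x axis, with no relative rotation.
T_CONTROLLER_GRIPPER = make_pose(IDENTITY, [0.10, 0.0, 0.0])


def gripper_pose(t_world_controller):
    """Map a tracked controller pose to the gripper pose in the world frame."""
    return compose(t_world_controller, T_CONTROLLER_GRIPPER)
```

Applying `gripper_pose` to every tracked controller sample would give a gripper trajectory in the tracking frame; a real system would measure the offset from the device's CAD model or a calibration routine.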

Operation Setup

We design a fixed desktop operation setup tailored for sustained, bimanual manipulation. Each hand grasps one YUBI device equipped with an onboard wrist camera, a Quest controller, and a magnetic encoder that measures the gripper aperture. For VR-based gripper tracking, the Quest 3S tracks the 6 DoF trajectory of the controller mounted on YUBI, yielding higher fidelity than drift-prone SLAM. Unlike head-worn VR setups, where the operator carries the headset, the heavy headset is mounted on the fixed rig, reducing neck fatigue while maintaining tracking coverage. A rigidly mounted RealSense D435 stereo camera provides a stable top-down view of the workspace for additional supervision signals (object tracking, fine-grained action annotation). A laptop-based task UI aggregates all sensor streams, and a foot pedal lets operators annotate task transitions and sub-action boundaries hands-free.
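To make the foot-pedal annotation concrete, the following minimal sketch shows one way pedal presses could be turned into sub-action segments. It assumes each pedal press is logged as a timestamp in seconds on the same clock as the recorded frames; the `Segment` type and both function names are hypothetical and are not taken from the YUBI software.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    index: int    # 0-based sub-action index


def split_episode(start, end, pedal_times):
    """Cut the interval [start, end) at every pedal press that falls inside it."""
    presses = [t for t in sorted(pedal_times) if start < t < end]
    bounds = [start, *presses, end]
    return [Segment(a, b, i) for i, (a, b) in enumerate(zip(bounds, bounds[1:]))]


def label_frames(frame_times, pedal_times):
    """Give each frame the index of the sub-action it belongs to."""
    presses = sorted(pedal_times)
    # The number of presses at or before a frame equals its segment index.
    return [bisect_right(presses, t) for t in frame_times]
```

For example, `split_episode(0.0, 3.0, [1.0, 2.0])` yields three one-second segments, and `label_frames` attaches the matching index to every frame so that sub-action labels can be stored alongside the sensor streams.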

System overview. Bimanual YUBI-based demonstrations are collected in parallel at 20 desks by 104 operators. The setup features a stereo top-view camera for stable workspace observation, a rig-mounted VR system for 6 DoF gripper tracking, and foot-pedal-based action segmentation.

Large-Scale Bimanual Dataset

We collected YUBI-based manipulation data at scale across 20 desks, operated 24/7 over one month by 104 operators (73 male, 31 female). The resulting dataset comprises 2730 hours of interaction across 300K episodes and 40 distinct tasks, substantially larger than prior UMI-based datasets such as Fast-UMI (~60 hours, 22 tasks) and the original UMI (12 hours, 4 tasks). The tasks span seven domains (industrial, kitchen, toy, desk work, clothing, appliance, personal care) and six primary skill types (placement, assembly, insertion, deformation, sorting, writing), reflecting YUBI's target scope of precise, heavy, and everyday object handling. Most tasks combine multiple skills in practice. For example, "writing on a whiteboard" requires pick-and-place of the marker, tactile-sensitive writing and erasing, and inserting the cap. All sensor streams are converted to the LeRobot format at 30 Hz, and a cascade of detectors filters out defective episodes (too-short recordings, stuck pose or aperture signals, and kinematically implausible jumps).
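The detector cascade can be pictured as a list of checks applied in order, where the first check that fires rejects the episode. The sketch below is a simplified illustration under stated assumptions: an episode is reduced to a list of (x, y, z) gripper positions in meters and a list of aperture readings sampled at 30 Hz, and every threshold is a made-up placeholder rather than a value used for the YUBI dataset.

```python
import math
from statistics import pstdev

FPS = 30  # frame rate of the converted dataset

# Illustrative thresholds (assumptions, not the authors' values).
MIN_DURATION_S = 2.0   # shorter recordings are rejected
STUCK_STD_EPS = 1e-6   # a spread below this counts as a frozen signal
MAX_SPEED_M_S = 5.0    # faster frame-to-frame motion counts as a glitch


def too_short(positions, aperture):
    return len(positions) < MIN_DURATION_S * FPS


def stuck_signal(positions, aperture):
    # A pose or aperture that never changes suggests a dropped sensor.
    pose_stuck = all(pstdev(axis) < STUCK_STD_EPS for axis in zip(*positions))
    return pose_stuck or pstdev(aperture) < STUCK_STD_EPS


def implausible_jump(positions, aperture):
    # Speed between consecutive frames, in meters per second.
    speeds = (math.dist(a, b) * FPS for a, b in zip(positions, positions[1:]))
    return any(s > MAX_SPEED_M_S for s in speeds)


# Cheap checks run first so that later ones only see non-trivial episodes.
DETECTORS = [
    ("too_short", too_short),
    ("stuck_signal", stuck_signal),
    ("implausible_jump", implausible_jump),
]


def first_defect(positions, aperture):
    """Return the name of the first detector that fires, or None if all pass."""
    for name, detector in DETECTORS:
        if detector(positions, aperture):
            return name
    return None
```

Returning the detector name instead of a bare boolean makes it easy to count how many episodes each check removes, which helps when tuning thresholds on a sample of recordings.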

Domain distribution (left) across seven categories and skill distribution (right) across six primary skill types of the 40 tasks.
Sample episodes from the YUBI dataset across representative tasks.

Deployment Results

Placeholder section. Policies trained on the YUBI dataset will be deployed on real robots, with rollout videos shown below. (Content to be added.)

Deployment rollouts.

BibTeX

@inproceedings{ohkawa2026yubi,
  author      = {Takehiko Ohkawa and Jumpei Arima and Yuki Noguchi and
                 Masatoshi Tateno and Makoto Sugiura and Takuya Okubo and
                 Yuki Wakayama and Naoaki Kanazawa and Tatsuya Matsushima and
                 Yohishiro Okumatsu and Kei Ota},
  title       = {{YUBI}: Yielding Universal Bidigital Interface for
                 Bimanual Dexterous Manipulation at Scale},
  booktitle   = {IEEE International Conference on Robotics and Automation
                 (ICRA) Workshop on Beyond Teleoperation},
  year        = {2026},
}