Phone2Act

A Low-Cost, Hardware-Agnostic Teleoperation System for Scalable VLA Data Collection

Om Mandhane*, Bipin Yadav*, Sangeetha Prasanna Ram, Gopalakrishnan Narayanan
Vivekanand Education Society's Institute of Technology (VESIT), Mumbai
* Equal contribution
Phone2Act Teaser

Phone2Act transforms a commodity smartphone into a 6-DoF controller via ARCore. The system features built-in safety filters, an unlimited virtual workspace via a clutching mechanism, and seamless gripper control.

Overview

Collecting diverse, high-quality manipulation data for Vision-Language-Action (VLA) model training remains prohibitively expensive for many research groups, as existing teleoperation frameworks rely on specialized hardware or are tightly coupled to specific robot platforms.

We present Phone2Act, a low-cost, hardware-agnostic teleoperation framework that transforms a commodity Android smartphone into a 6-DoF robot controller via Google ARCore. Built on a modular ROS 2 architecture, it decouples control logic from hardware specifics through interchangeable bridge nodes, supporting platforms from industrial cobots to low-cost bimanual arms without modifying the core source code.

A Universal Recorder synchronizes multi-camera RGB streams with robot state feedback and exports demonstrations natively in the LeRobot dataset format, eliminating post-processing and enabling immediate VLA fine-tuning. We validate the framework by fine-tuning GR00T-N1.5 on 130 collected episodes, achieving a 90% success rate on a real-world multi-stage pick-and-place task on a physical Dobot CR5.

Intuitive Teleoperation & Native Safety

Phone2Act enables safe and seamless spatial control by strictly bounding movements and decoupling phone repositioning from robot motion. Key features demonstrated in the video above include:

  • Floating Zero (Clutching): Pressing the Volume Down button engages the clutch, holding the robot in place while the operator freely repositions the phone. This provides an unlimited virtual workspace.
  • Zero-Jump Safety Filter: Tracking loss, accidental phone drops, and rapid motions are safely ignored: any spatial jump exceeding a configurable threshold (e.g., 60 mm) is silently dropped, protecting the hardware.
  • Seamless Gripper Actuation: The Volume Up key maps directly to a binary gripper command, allowing uninterrupted spatial control while interacting with objects.
  • Modular Core Planner: All safety parameters and positional scaling are configurable via YAML files, allowing the same teleoperation logic to safely support diverse robot kinematic chains.
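
The clutch and jump filter described above can be illustrated with a minimal Python sketch. It is not the Phone2Act implementation: the class name `TeleopFilter`, the fields `scale` and `max_jump_m`, and the choice to re-anchor on the new phone position after a dropped jump are all assumptions made for illustration, and only position (not orientation) is handled.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TeleopFilter:
    """Hypothetical sketch: turns phone positions into robot target positions."""

    scale: float = 1.0         # positional scaling (assumed parameter name)
    max_jump_m: float = 0.060  # jump threshold in meters (60 mm, as in the example)
    target: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    _last_phone: Optional[List[float]] = None

    def update(self, phone_pos, clutch_pressed):
        """Process one phone position sample and return the robot target."""
        if clutch_pressed or self._last_phone is None:
            # Clutch engaged (or first sample): move the reference point with
            # the phone and hold the robot where it is.
            self._last_phone = list(phone_pos)
            return tuple(self.target)

        delta = [p - q for p, q in zip(phone_pos, self._last_phone)]
        self._last_phone = list(phone_pos)

        if math.sqrt(sum(d * d for d in delta)) > self.max_jump_m:
            # Jump larger than the threshold: drop this update entirely.
            return tuple(self.target)

        # Normal motion: apply the scaled phone displacement to the target.
        self.target = [t + self.scale * d for t, d in zip(self.target, delta)]
        return tuple(self.target)


teleop = TeleopFilter()
teleop.update((0.00, 0.0, 0.0), clutch_pressed=False)  # first sample sets the reference
teleop.update((0.01, 0.0, 0.0), clutch_pressed=False)  # robot follows 10 mm
teleop.update((0.50, 0.0, 0.0), clutch_pressed=True)   # clutch held: robot stays put
teleop.update((0.52, 0.0, 0.0), clutch_pressed=False)  # robot follows another 20 mm
teleop.update((1.00, 0.0, 0.0), clutch_pressed=False)  # 480 mm jump is dropped
```

Because the robot target only ever accumulates phone displacements, repositioning the phone while the clutch is held never moves the robot, which is what gives the operator an unbounded virtual workspace.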

Phone2Act Android Application UI. (Example showing connectivity and AR pose/volume topics.)

Hardware Agnostic & Bimanual Support

By routing commands through standardized ROS 2 topics, Phone2Act moves to a new robot by swapping only the hardware-specific bridge node. Going from an industrial Dobot CR5 to a low-cost, 3D-printed LeRobot SO-101 dual-arm setup requires no modifications to the core source code.
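
A rough sketch of the bridge idea, in plain Python with no ROS 2 dependency: the core loop talks to an abstract interface, and each robot supplies its own bridge. Everything here is hypothetical, including the names `RobotBridge`, `MillimeterArmBridge`, `MeterArmBridge`, and `stream_targets`; the two bridges are stand-ins that differ only in units and do not represent the actual Dobot CR5 or SO-101 drivers.

```python
from abc import ABC, abstractmethod


class RobotBridge(ABC):
    """Hypothetical interface between the shared teleop core and one robot."""

    @abstractmethod
    def send_target(self, position_m, gripper_closed):
        """Forward one Cartesian target (meters) and a binary gripper state."""


class MillimeterArmBridge(RobotBridge):
    """Stand-in for a driver that expects positions in millimeters."""

    def __init__(self):
        self.sent = []

    def send_target(self, position_m, gripper_closed):
        position_mm = [round(1000.0 * v, 3) for v in position_m]
        self.sent.append((position_mm, gripper_closed))


class MeterArmBridge(RobotBridge):
    """Stand-in for a driver that expects positions in meters."""

    def __init__(self):
        self.sent = []

    def send_target(self, position_m, gripper_closed):
        self.sent.append((list(position_m), gripper_closed))


def stream_targets(bridge, targets):
    """Core loop: identical for every robot, only the bridge differs."""
    for position_m, gripper_closed in targets:
        bridge.send_target(position_m, gripper_closed)


targets = [((0.1, 0.0, 0.2), False), ((0.1, 0.0, 0.1), True)]
for bridge in (MillimeterArmBridge(), MeterArmBridge()):
    stream_targets(bridge, targets)  # same core code, different hardware
```

In the real system the hand-off happens over ROS 2 topics rather than direct method calls, but the separation of concerns is the same: robot-specific conversions live in the bridge, not in the core.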

LeRobot SO-101 (Bimanual)

Industrial Cobot (Dobot CR5)

Real-World Policy Deployment (GR00T-N1.5)

Phone2Act features a Universal Recorder that exports synchronized RGB frames, robot states, and actions directly into the LeRobot dataset format (Parquet + MP4) at 20 Hz.
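
One common way to align streams that arrive at different rates is nearest-timestamp matching against a fixed-rate clock. The sketch below assumes that approach for illustration; it is not taken from the Universal Recorder, and the function names `nearest_index` and `align_streams` are hypothetical. Timestamps are integer milliseconds, so a 20 Hz output rate corresponds to a 50 ms period. Writing Parquet and MP4 files is out of scope here.

```python
from bisect import bisect_left


def nearest_index(stamps, t):
    """Return the index of the entry in sorted `stamps` that is closest to t."""
    i = bisect_left(stamps, t)
    if i == 0:
        return 0
    if i == len(stamps):
        return len(stamps) - 1
    # Pick whichever neighbor is closer; ties go to the earlier sample.
    return i if stamps[i] - t < t - stamps[i - 1] else i - 1


def align_streams(frame_stamps_ms, state_stamps_ms, period_ms=50):
    """Pair each tick of a fixed-rate clock with the nearest frame and state.

    Returns a list of (tick_ms, frame_index, state_index) rows covering the
    time span where both streams have data.
    """
    start = max(frame_stamps_ms[0], state_stamps_ms[0])
    end = min(frame_stamps_ms[-1], state_stamps_ms[-1])
    rows = []
    for tick in range(start, end + 1, period_ms):
        rows.append((
            tick,
            nearest_index(frame_stamps_ms, tick),
            nearest_index(state_stamps_ms, tick),
        ))
    return rows


# Example: a ~30 fps camera and a 100 Hz robot state stream over 100 ms.
frames = [0, 33, 66, 100]
states = list(range(0, 101, 10))
rows = align_streams(frames, states)
```

Each row then identifies one frame and one state sample to store together as a single dataset step at the output rate.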

We validated the collected data by fine-tuning the GR00T-N1.5-3B model on 130 teleoperation episodes. Deployed on a physical Dobot CR5, the policy achieved a 90% success rate on a multi-stage real-world pick-and-place task, demonstrating the system's viability for scalable VLA training.

System Architecture

Phone2Act Architecture

BibTeX

not published yet