Data Collection

Task Creation

Task configuration can be found in the ALOHA package’s aloha module under constants.py in the TASK_CONFIGS dictionary. A template task (aloha_template) is provided containing all possible fields and some placeholder values. Here, we will focus only on the task name, the dataset directory, the episode length, and the observation cameras.

Each configuration field is described below:

- Task Name: The task name should accurately describe the task the ALOHA is performing.
- Dataset Directory: The dataset_dir field sets the directory that episodes will be saved to.
- Episode Length: The episode_len field sets the length of the episode in number of timesteps.
- Camera Names: The camera_names field takes a list of strings corresponding to camera names. These cameras will be used as observation sources during dataset collection.
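As a sketch, a TASK_CONFIGS entry using these fields might look like the following (the path, episode length, and camera names here are illustrative placeholders, not values from the actual constants.py):

```python
# Hypothetical TASK_CONFIGS entry; all values below are placeholders,
# not the actual contents of the ALOHA package's constants.py.
TASK_CONFIGS = {
    'aloha_mobile_dummy': {
        'dataset_dir': '~/aloha_data/aloha_mobile_dummy',  # where episodes are saved
        'episode_len': 500,                                # timesteps per episode
        'camera_names': [                                  # observation sources
            'cam_high',
            'cam_left_wrist',
            'cam_right_wrist',
        ],
    },
}
```

The dictionary key doubles as the task name passed on the command line when recording.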

Recording Episodes

To record an episode, follow the steps below:

  1. Bring up the ALOHA control stack according to your platform
  2. Configure the environment and run the episode recording script as follows:
$ source /opt/ros/humble/setup.bash # configure ROS system install environment
$ source ~/interbotix_ws/install/setup.bash # configure ROS workspace environment
$ cd ~/interbotix_ws/src/aloha/scripts/
$ python3 record_episodes.py \
      --task_name <task_name> \
      [--episode_idx <episode_idx>] \
      [-b, --enable_base_torque] \
      [-g, --gravity_compensation]

The task_name argument should match one of the task names in the TASK_CONFIGS, as configured in the Task Creation section.

The episode_idx argument is optional and specifies the index of the episode to record reflected in the output filename <dataset_dir>/episode_<episode_idx>.hdf5. If not provided, the script will automatically calculate the next episode index based on the number of episodes already saved in the dataset directory.
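The automatic index selection could be implemented along these lines (get_next_episode_idx is a hypothetical illustration of the behavior, not the actual code from record_episodes.py):

```python
import os

def get_next_episode_idx(dataset_dir: str) -> int:
    """Hypothetical sketch: choose the next episode index by finding the
    first episode_<idx>.hdf5 filename not already present in dataset_dir."""
    idx = 0
    while os.path.isfile(os.path.join(dataset_dir, f'episode_{idx}.hdf5')):
        idx += 1
    return idx
```

For a contiguously numbered dataset this is equivalent to counting the episodes already saved.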

If the -b, --enable_base_torque argument is set, the mobile base will be torqued on during episode recording, allowing it to be driven with a joystick controller or some other manual method.

If the -g, --gravity_compensation argument is set, gravity compensation will be enabled for the leader robots when teleop starts.

Tip

Each platform has a “dummy” task that can be used to test basic data collection and playback. For the Stationary variant, use the aloha_stationary_dummy task. For the Mobile variant, use the aloha_mobile_dummy task.

An example for the Mobile variant would look like:

$ python3 record_episodes.py --task_name aloha_mobile_dummy --episode_idx 0

Episode Playback

To play back a previously-recorded episode, follow the steps below:

  1. Bring up the ALOHA control stack according to your platform
  2. Configure the environment and run the episode playback script as follows:
$ source /opt/ros/humble/setup.bash # configure ROS system install environment
$ source ~/interbotix_ws/install/setup.bash # configure ROS workspace environment
$ cd ~/interbotix_ws/src/aloha/scripts/
$ python3 replay_episodes.py --dataset_dir </path/to/dataset> --episode_idx <episode_idx>

Tip

An example for replaying the dummy Mobile episode recorded above would look like:

$ python3 replay_episodes.py --dataset_dir ~/aloha_data/aloha_mobile_dummy/ --episode_idx 0

Episode Auto-Recording

A helpful bash script, auto_record.sh, is provided to allow users to collect many episodes consecutively without having to interact with the control computer.

Configuration

This script, whose source can be found on the ALOHA GitHub repository, has a few configuration options that should be verified or set before running.

ROS_DISTRO

Set the codename of the ROS distribution used on the control computer. This value is used to construct the ROS_SETUP_PATH variable used later in the script. ROS_DISTRO defaults to humble.

ROS_DISTRO=humble

ROS_SETUP_PATH

Set the path to the ROS distribution’s setup script. This value is used when setting up the system-installed ROS environment. Setting the ROS_DISTRO variable from before should be sufficient to configure this variable. ROS_SETUP_PATH defaults to "/opt/ros/$ROS_DISTRO/setup.bash".

ROS_SETUP_PATH="/opt/ros/$ROS_DISTRO/setup.bash"

WORKSPACE_SETUP_PATH

Set the path to the Interbotix workspace’s setup script. This value is used when setting up the Interbotix workspace’s ROS environment. WORKSPACE_SETUP_PATH defaults to "$HOME/interbotix_ws/install/setup.bash".

WORKSPACE_SETUP_PATH="$HOME/interbotix_ws/install/setup.bash"

RECORD_EPISODES

Set the path to the record_episodes.py script. This value is used when calling the record_episodes script. RECORD_EPISODES defaults to "$HOME/interbotix_ws/src/aloha/scripts/record_episodes.py".

RECORD_EPISODES="$HOME/interbotix_ws/src/aloha/scripts/record_episodes.py"

Usage

Once configured, the auto_record script is ready to use. To auto-record a specific number of episodes, follow the steps below:

  1. Bring up the ALOHA control stack according to your platform

  2. In a new terminal, navigate to the directory storing the auto_record script and run the command below:

    $ auto_record.sh <task_name> <num_episodes> [-b, --enable_base_torque] [-g, --gravity_compensation]
    

    Tip

    An example for auto-recording 50 episodes of the dummy Mobile ALOHA task would look like:

    $ auto_record.sh aloha_mobile_dummy 50
    

    The auto_record script will then call the record_episodes Python script the specified number of times.

    Note

    If episodes of the specified task already exist, episode indices will be automatically calculated based on the number of episodes already in the episode save directory.
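The loop performed by auto_record.sh can be sketched in Python as follows (auto_record here is a hypothetical stand-in for the bash script; the real script also sources the ROS and workspace environments configured above before recording):

```python
import subprocess
from typing import Callable

def auto_record(task_name: str, num_episodes: int,
                record_script: str = 'record_episodes.py',
                run: Callable = subprocess.run) -> int:
    """Hypothetical sketch of the auto_record.sh loop: invoke the
    record_episodes script once per episode. Episode indices are left
    to record_episodes.py, which picks the next free index itself."""
    for _ in range(num_episodes):
        run(['python3', record_script, '--task_name', task_name], check=True)
    return num_episodes
```

The run parameter is injectable only to make the sketch testable; the bash script simply invokes python3 directly.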

Dataset Format

ALOHA saves each episode as an HDF5 file with the following structure:

- images
    - cam_high          (480, 640, 3) 'uint8'
    - cam_low           (480, 640, 3) 'uint8'   (on Stationary)
    - cam_left_wrist    (480, 640, 3) 'uint8'
    - cam_right_wrist   (480, 640, 3) 'uint8'
- qpos                  (14,)         'float64'
- qvel                  (14,)         'float64'
- action                (14,)         'float64'
- base_action           (2,)          'float64' (on Mobile)
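A recorded episode can be inspected with the h5py library to verify this layout; the helper below is an illustrative sketch that lists every dataset in the file, independent of the exact group nesting:

```python
import h5py

def episode_datasets(path):
    """List (name, shape, dtype) for every dataset in an episode HDF5 file,
    regardless of how the groups are nested."""
    entries = []

    def visit(name, obj):
        # visititems walks every group/dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, 'r') as f:
        f.visititems(visit)
    return entries
```

Note that the shapes listed above are per timestep; on disk each dataset carries an additional leading time dimension of length episode_len.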

What’s Next?

With the data collected, we are ready to train and evaluate the machine learning models.