Training and Evaluating a Policy

Training a Policy

To train a policy to control your robot, use the train.py script, which requires a few arguments. Here is an example command:

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test \
  --policy.type=act \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --job_name=act_trossen_ai_bimanual_test \
  --device=cuda \
  --wandb.enable=true

Explanation of the Command

  1. We provided the dataset using --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test.

  2. We specified the policy with --policy.type=act, which loads configurations from configuration_act.py. This policy will automatically adapt to the number of motor states, motor actions, and cameras recorded in your dataset.

  3. We set --device=cuda to train on an NVIDIA GPU. If you are using Apple Silicon, you can replace it with --device=mps (see the quick device check after this list).

  4. We enabled Weights & Biases logging using --wandb.enable=true for visualizing training plots. This is optional, but if used, ensure you’re logged in by running:

    wandb login
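
If you are unsure which device flag applies on your machine, a quick check against PyTorch (which lerobot already depends on) settles it:

python -c "import torch; print('cuda:', torch.cuda.is_available()); print('mps:', torch.backends.mps.is_available())"

If cuda prints True, keep --device=cuda; if only mps is True (Apple Silicon), use --device=mps; otherwise fall back to --device=cpu.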
    

Note

Training will take several hours. Checkpoints will be saved in: outputs/train/act_trossen_ai_bimanual_test/checkpoints.
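
Each saved checkpoint typically contains a pretrained_model/ folder, and last points to the most recent checkpoint, so checkpoints/last/pretrained_model is the path the evaluation command below expects. To see what has been saved so far:

ls outputs/train/act_trossen_ai_bimanual_test/checkpoints/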

Training Pipeline Configuration

The training pipeline can be configured using the following parameters:

  • dataset: Configuration for the dataset.

  • env: Configuration for the environment. Can be None.

  • policy: Configuration for the pre-trained policy. Can be None.

  • output_dir: Directory to save all run outputs. If another training session is run with the same value, its contents will be overwritten unless resume is set to true.

  • job_name: Name of the job. Can be None.

  • resume: Set to true to resume a previous run. Ensure output_dir is the directory of an existing run with at least one checkpoint (see the example after this list).

  • device: Device to use for training (e.g., cuda, cpu, mps).

  • use_amp: Determines whether to use Automatic Mixed Precision (AMP) for training and evaluation.

  • seed: Seed for training and evaluation environments.

  • num_workers: Number of workers for the dataloader.

  • batch_size: Batch size for training.

  • eval_freq: Frequency of evaluation during training.

  • log_freq: Frequency of logging during training.

  • save_checkpoint: Whether to save checkpoints during training.

  • save_freq: Frequency of saving checkpoints.

  • offline: Configuration for offline training.

  • online: Configuration for online training.

  • use_policy_training_preset: Whether to use the policy's training preset.

  • optimizer: Configuration for the optimizer. Can be None.

  • scheduler: Configuration for the learning rate scheduler. Can be None.

  • eval: Configuration for evaluation.

  • wandb: Configuration for Weights & Biases logging.
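
These parameters are passed as command-line flags in the same style as the example above. As a sketch (assuming each parameter name maps one-to-one to a flag, the way --device and --wandb.enable do), you could shrink the batch size and log more frequently like this:

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test \
  --policy.type=act \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --job_name=act_trossen_ai_bimanual_test \
  --device=cuda \
  --batch_size=8 \
  --log_freq=100

And, per the resume parameter above, an interrupted run can be continued by pointing output_dir at the existing run directory, which must already contain at least one checkpoint:

python lerobot/scripts/train.py \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --resume=true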

Evaluating Your Policy

You can reuse the record function from lerobot/scripts/control_robot.py, this time passing a policy checkpoint as input. Run the following command to record 10 evaluation episodes:

python lerobot/scripts/control_robot.py \
  --robot.type=trossen_ai_bimanual \
  --control.type=record \
  --control.fps=30 \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.repo_id=${HF_USER}/eval_act_trossen_ai_bimanual_test \
  --control.tags='["tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=10 \
  --control.push_to_hub=true \
  --control.policy.path=outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model \
  --control.num_image_writer_processes=1

Key Differences from Training Data Recording

  1. Policy Checkpoint:

    • The command includes --control.policy.path, which specifies the path to the trained policy checkpoint (e.g., outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model).

    • If you uploaded the model checkpoint to the Hugging Face Hub, you can also specify it as: --control.policy.path=${HF_USER}/act_trossen_ai_bimanual_test (see the upload sketch after this list).

  2. Dataset Naming Convention:

    • The dataset name now begins with eval_ (e.g., ${HF_USER}/eval_act_trossen_ai_bimanual_test) to indicate that this is an evaluation dataset.

  3. Image Writing Process:

    • We set --control.num_image_writer_processes=1 instead of the default 0.

    • On some systems, dedicating a separate process to writing images from the multiple cameras makes it possible to sustain a consistent 30 FPS during inference.

    • You can experiment with different values of --control.num_image_writer_processes to optimize performance.
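
If you pushed your trained policy to the Hugging Face Hub (as referenced in point 1 above), a recent huggingface-cli makes the upload a one-liner; the repo id below is the same one you would then pass to --control.policy.path:

huggingface-cli upload ${HF_USER}/act_trossen_ai_bimanual_test \
  outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model

After recording, you can sanity-check the evaluation episodes with lerobot's dataset visualizer (assuming the standard visualize_dataset.py script from the same repository):

python lerobot/scripts/visualize_dataset.py \
  --repo-id ${HF_USER}/eval_act_trossen_ai_bimanual_test \
  --episode-index 0

This replays the recorded camera streams and motor states for one episode, making it easy to see where the policy succeeds or struggles.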