Training and Evaluating a Policy

Training a Policy

To train a policy to control your robot, use the train.py script, which requires a few arguments. Here is an example command:

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test \
  --policy.type=act \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --job_name=act_trossen_ai_bimanual_test \
  --device=cuda \
  --wandb.enable=true

Explanation of the Command

  1. We provided the dataset using --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test.

  2. We specified the policy with --policy.type=act, which loads configurations from configuration_act.py. This policy will automatically adapt to the number of motor states, motor actions, and cameras recorded in your dataset.

  3. We set --device=cuda to train on an NVIDIA GPU. If you are using Apple Silicon, you can replace it with --device=mps (see the quick device check after this list).

  4. We enabled Weights & Biases logging using --wandb.enable=true for visualizing training plots. This is optional, but if used, ensure you’re logged in by running:

    wandb login
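
If you are unsure which device flag applies on your machine, a quick check against PyTorch (which lerobot already depends on) settles it:

python -c "import torch; print('cuda:', torch.cuda.is_available()); print('mps:', torch.backends.mps.is_available())"

If cuda prints True, keep --device=cuda; if only mps is True (Apple Silicon), use --device=mps; otherwise fall back to --device=cpu.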
    

Note

Training will take several hours. Checkpoints will be saved in: outputs/train/act_trossen_ai_bimanual_test/checkpoints.
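
Each saved checkpoint typically contains a pretrained_model/ folder, and last points to the most recent checkpoint, so checkpoints/last/pretrained_model is the path the evaluation command below expects. To see what has been saved so far:

ls outputs/train/act_trossen_ai_bimanual_test/checkpoints/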

Training Pipeline Configuration

The training pipeline can be configured using the following parameters:

  • dataset: Configuration for the dataset.

  • env: Configuration for the environment. Can be None.

  • policy: Configuration for the pre-trained policy. Can be None.

  • output_dir: Directory to save all run outputs. If another training session is run with the same value, its contents will be overwritten unless resume is set to true.

  • job_name: Name of the job. Can be None.

  • resume: Set to true to resume a previous run. Ensure output_dir is the directory of an existing run with at least one checkpoint (see the example after this list).

  • device: Device to use for training (e.g., cuda, cpu, mps).

  • use_amp: Determines whether to use Automatic Mixed Precision (AMP) for training and evaluation.

  • seed: Seed for training and evaluation environments.

  • num_workers: Number of workers for the dataloader.

  • batch_size: Batch size for training.

  • eval_freq: Frequency of evaluation during training.

  • log_freq: Frequency of logging during training.

  • save_checkpoint: Whether to save checkpoints during training.

  • save_freq: Frequency of saving checkpoints.

  • offline: Configuration for offline training.

  • online: Configuration for online training.

  • use_policy_training_preset: Whether to use the policy's training preset.

  • optimizer: Configuration for the optimizer. Can be None.

  • scheduler: Configuration for the learning rate scheduler. Can be None.

  • eval: Configuration for evaluation.

  • wandb: Configuration for Weights & Biases logging.
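
These parameters are passed as command-line flags in the same style as the example above. As a sketch (assuming each parameter name maps one-to-one to a flag, the way --device and --wandb.enable do), you could shrink the batch size and log more frequently like this:

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/trossen_ai_bimanual_test \
  --policy.type=act \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --job_name=act_trossen_ai_bimanual_test \
  --device=cuda \
  --batch_size=8 \
  --log_freq=100

And, per the resume parameter above, an interrupted run can be continued by pointing output_dir at the existing run directory, which must already contain at least one checkpoint:

python lerobot/scripts/train.py \
  --output_dir=outputs/train/act_trossen_ai_bimanual_test \
  --resume=true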

Evaluating Your Policy

You can reuse the record function from lerobot/scripts/control_robot.py, this time passing a policy checkpoint as input. Run the following command to record 10 evaluation episodes:

python lerobot/scripts/control_robot.py \
  --robot.type=trossen_ai_bimanual \
  --control.type=record \
  --control.fps=30 \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.repo_id=${HF_USER}/eval_act_trossen_ai_bimanual_test \
  --control.tags='["tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=10 \
  --control.push_to_hub=true \
  --control.policy.path=outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model \
  --control.num_image_writer_processes=1

Key Differences from Training Data Recording

  1. Policy Checkpoint:

    • The command includes --control.policy.path, which specifies the path to the trained policy checkpoint (e.g., outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model).

    • If you uploaded the model checkpoint to the Hugging Face Hub, you can also specify it as: --control.policy.path=${HF_USER}/act_trossen_ai_bimanual_test (see the upload sketch after this list).

  2. Dataset Naming Convention:

    • The dataset name now begins with eval_ (e.g., ${HF_USER}/eval_act_trossen_ai_bimanual_test) to indicate that this is an evaluation dataset.

  3. Image Writing Process:

    • We set --control.num_image_writer_processes=1 instead of the default 0.

    • On some systems, dedicating a separate process to writing images from the multiple cameras makes it possible to sustain a consistent 30 FPS during inference.

    • You can experiment with different values of --control.num_image_writer_processes to optimize performance.
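
If you pushed your trained policy to the Hugging Face Hub (as referenced in point 1 above), a recent huggingface-cli makes the upload a one-liner; the repo id below is the same one you would then pass to --control.policy.path:

huggingface-cli upload ${HF_USER}/act_trossen_ai_bimanual_test \
  outputs/train/act_trossen_ai_bimanual_test/checkpoints/last/pretrained_model

After recording, you can sanity-check the evaluation episodes with lerobot's dataset visualizer (assuming the standard visualize_dataset.py script from the same repository):

python lerobot/scripts/visualize_dataset.py \
  --repo-id ${HF_USER}/eval_act_trossen_ai_bimanual_test \
  --episode-index 0

This replays the recorded camera streams and motor states for one episode, making it easy to see where the policy succeeds or struggles.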