Hugging Face Guide

Uploading and Downloading Datasets on Hugging Face

Creating an Account

If you don’t already have an account, sign up for one on the Hugging Face Sign Up page.

Creating a New Dataset Repository

Web Interface

  1. Navigate to the Hugging Face website.
  2. Log in to your account.
  3. Click on your profile picture in the top-right corner and select “New dataset.”
  4. Follow the on-screen instructions to create a new dataset repository.

Python API

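  1. Ensure you have the huggingface_hub library installed (for example, with pip install huggingface_hub) and that you are logged in with a token that has write access (for example, by running huggingface-cli login).

  2. Run the following Python script to create a new repository: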

    from huggingface_hub import HfApi
    
    api = HfApi()
    
    # Replace "username/repository_name" with your own namespace and dataset name
    api.create_repo(repo_id="username/repository_name", repo_type="dataset")
    

For more information on creating repositories, refer to the Hugging Face Hub documentation on repositories.
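Every call in this guide identifies the repository by a repo_id of the form "username/repository_name". As a minimal sketch, a hypothetical helper like the one below (not part of huggingface_hub) can catch a malformed repo_id locally before any request is sent. It only checks the two-part namespace/name shape used throughout this guide; it does not implement the Hub's full naming rules.

```python
def split_repo_id(repo_id: str) -> tuple[str, str]:
    """Split a repo_id into (namespace, name).

    Hypothetical helper: it checks only the "namespace/name" shape,
    not the Hub's complete naming rules.
    """
    parts = repo_id.split("/")
    # Require exactly one slash with non-empty text on both sides
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'namespace/name', got {repo_id!r}")
    namespace, name = parts
    return namespace, name


print(split_repo_id("TrossenRoboticsCommunity/aloha_static_datasets"))
# ('TrossenRoboticsCommunity', 'aloha_static_datasets')
```

Raising early keeps typos such as a missing namespace from surfacing later as a less obvious error from the Hub.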

Uploading Your Dataset

You have two primary methods to upload datasets: through the web interface or using the Python API.

Web Interface

  1. Navigate to your dataset repository on the Hugging Face website.
  2. Click on the “Files and versions” tab.
  3. Click “Add file”, choose “Upload files”, and drag and drop your dataset files into the upload area.
  4. Click “Commit changes” to save the files in the repository.

Python API

You can use the following Python script to upload your dataset:

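from huggingface_hub import HfApi

api = HfApi()

# Upload every file under folder_path to the root of the dataset repository
api.upload_folder(
    folder_path="path/to/dataset",        # local folder containing your dataset files
    repo_id="username/repository_name",   # target repository on the Hub
    repo_type="dataset",                  # required: the default repo type is "model"
)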

Example:

import os

from huggingface_hub import HfApi

api = HfApi()

api.upload_folder(
    # Expand "~" to your home directory explicitly
    folder_path=os.path.expanduser("~/aloha_data/aloha_stationary_block_pickup"),
    repo_id="TrossenRoboticsCommunity/aloha_static_datasets",
    repo_type="dataset",
)
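Before uploading, it can help to confirm that the local folder actually contains episode files. The sketch below is a hypothetical helper (not part of huggingface_hub) that expands a leading "~", as in the example path above, and lists the .hdf5 files directly inside the folder. It assumes the episodes sit at the top level of the folder and does not look into subfolders.

```python
from pathlib import Path


def list_episode_files(folder_path: str) -> list[str]:
    """Return the sorted names of .hdf5 files directly inside folder_path.

    Hypothetical pre-upload check; subfolders are not searched.
    """
    folder = Path(folder_path).expanduser()  # turn "~/..." into a path under your home directory
    if not folder.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {folder}")
    # glob("*.hdf5") matches only entries at the top level of the folder
    return sorted(p.name for p in folder.glob("*.hdf5") if p.is_file())
```

For example, calling list_episode_files("~/aloha_data/aloha_stationary_block_pickup") returns the episode file names if that folder exists, or raises FileNotFoundError if the path is wrong, which is cheaper to discover before starting an upload.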

For more information on uploading datasets, refer to the Hugging Face Hub documentation on uploading files.

Downloading Datasets

You can download datasets either by cloning the repository with Git or by using the huggingface_hub Python library.

Cloning the Repository

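Large files on the Hub are stored with Git LFS, so install Git LFS and run git lfs install before cloning; without it, the clone contains small pointer files instead of the actual data. To clone the repository, use the following command: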

$ git clone https://huggingface.co/datasets/username/repository_name
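The clone URL follows the pattern in the command above: https://huggingface.co/datasets/ followed by the repo_id. As a sketch based only on that pattern, a hypothetical helper can build the URL from the same repo_id you pass to the Python API, so the two never drift apart.

```python
def dataset_clone_url(repo_id: str) -> str:
    """Build the git clone URL for a dataset repository (hypothetical helper).

    Follows the https://huggingface.co/datasets/<namespace>/<name> pattern.
    """
    namespace, sep, name = repo_id.partition("/")
    # Reject IDs without a slash, with an empty part, or with extra slashes
    if not (namespace and sep and name) or "/" in name:
        raise ValueError(f"expected 'namespace/name', got {repo_id!r}")
    return f"https://huggingface.co/datasets/{namespace}/{name}"


print(dataset_clone_url("TrossenRoboticsCommunity/aloha_static_datasets"))
# https://huggingface.co/datasets/TrossenRoboticsCommunity/aloha_static_datasets
```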

Using the huggingface_hub Python Library

You can also download datasets with the snapshot_download function from huggingface_hub:

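from huggingface_hub import snapshot_download

# Download only the .hdf5 episode files from the dataset repository
snapshot_download(
    repo_id="username/repository_name",
    repo_type="dataset",                    # required for dataset repositories
    local_dir="path/to/local/directory",    # where the files are written
    allow_patterns="*.hdf5",                # glob pattern of the files to download
)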

Note

  • The dataset episodes are stored in .hdf5 format, so setting allow_patterns="*.hdf5" restricts the download to the episode files and skips the other files in the repository.
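To preview which repository files a pattern such as "*.hdf5" selects, you can try it locally with Python's fnmatch module. This sketch assumes allow_patterns follows the Unix shell-style wildcard semantics of fnmatch; the select_files helper and the file listing are hypothetical and only illustrate the matching.

```python
from fnmatch import fnmatchcase


def select_files(paths: list[str], allow_patterns: list[str]) -> list[str]:
    """Keep the paths that match at least one shell-style glob pattern."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in allow_patterns)]


# Hypothetical file listing of a dataset repository
repo_files = [
    ".gitattributes",
    "README.md",
    "episode_0.hdf5",
    "episode_1.hdf5",
    "videos/episode_0.mp4",
]

print(select_files(repo_files, ["*.hdf5"]))
# ['episode_0.hdf5', 'episode_1.hdf5']
```

With fnmatch, "*" also matches "/", so a file such as "subdir/episode_0.hdf5" would be selected by "*.hdf5" as well.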

For more information on downloading datasets, refer to the Hugging Face Datasets documentation.

Additional Information

  • Repository Management: See the Hugging Face Hub documentation for detailed instructions on managing repositories, handling versions, and setting permissions.
  • Dataset Formats: Hugging Face supports various dataset formats. This guide uses Aloha’s native .hdf5 format.
  • Community Support: If you encounter any issues, refer to the Hugging Face community forums for additional support.

This guide covered creating a dataset repository and uploading and downloading datasets on the Hugging Face Hub. For more detailed guides and examples, refer to the Hugging Face documentation.