Warning
You are viewing the documentation for a version of the Interbotix ALOHA stack that is no longer supported. Please upgrade to keep up to date with the latest features. See the v2.0 documentation for more information.
Hugging Face Guide
Uploading and Downloading Datasets on Hugging Face
Creating an Account
If you don’t already have an account, sign up for one on the Hugging Face Sign Up page.
Creating a New Dataset Repository
Web Interface (Repository)
- Navigate to the Hugging Face website.
- Log in to your account.
- Click on your profile picture in the top-right corner and select “New dataset.”
- Follow the on-screen instructions to create a new dataset repository.
Python API (Repository)
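Ensure you have the huggingface_hub library installed (for example, with pip install huggingface_hub) and that you are authenticated, for example by calling huggingface_hub.login() or by setting the HF_TOKEN environment variable. Creating a repository requires an access token with write permission.
Then run the following Python script to create a new dataset repository: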
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(repo_id="username/repository_name", repo_type="dataset")
For more information on creating repositories, refer to the Hugging Face Repositories documentation.
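The script above raises an error if the repository already exists. A minimal sketch that checks the shape of the repo_id first and then creates the repository idempotently is shown below. The helpers `parse_repo_id` and `ensure_dataset_repo` are hypothetical, and the name check is a simplified assumption rather than the Hub's complete naming rules; `private` and `exist_ok` are existing `create_repo` parameters.

```python
import re

# Simplified check (an assumption, not the Hub's complete naming rules):
# letters, digits, '-', '_' and '.' on each side of a single '/'.
_PART = re.compile(r"^[A-Za-z0-9._-]+$")


def parse_repo_id(repo_id: str) -> tuple[str, str]:
    """Split 'username/repository_name' into its two parts, or raise ValueError."""
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(_PART.match(p) for p in parts):
        raise ValueError(f"expected 'username/repository_name', got {repo_id!r}")
    return parts[0], parts[1]


def ensure_dataset_repo(repo_id: str, private: bool = False):
    """Create the dataset repository, or do nothing if it already exists."""
    parse_repo_id(repo_id)  # fail fast on malformed ids before any network call
    from huggingface_hub import HfApi  # imported lazily; requires huggingface_hub

    api = HfApi()
    # exist_ok=True makes repeated calls safe instead of raising a conflict error
    return api.create_repo(
        repo_id=repo_id, repo_type="dataset", private=private, exist_ok=True
    )
```

For example, ensure_dataset_repo("username/repository_name", private=True) creates a private dataset repository on the first run and can be re-run safely afterwards.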
Uploading Your Dataset
You have two primary methods to upload datasets: through the web interface or using the Python API.
Web Interface (Upload Dataset)
- Navigate to your dataset repository on the Hugging Face website.
- Click on the “Files and versions” tab.
- Drag and drop your dataset files into the files section.
- Click “Commit changes” to save the files in the repository.
Python API
You can use the following Python script to upload your dataset:
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    folder_path="path/to/dataset",
    repo_id="username/repository_name",
    repo_type="dataset",
)
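Because ALOHA episodes are .hdf5 files, it can help to check what will be uploaded and to restrict the upload to those files. The sketch below assumes the episodes sit directly inside folder_path; `list_episodes` and `upload_episodes` are hypothetical helpers, while `allow_patterns` and `commit_message` are existing `upload_folder` parameters.

```python
from pathlib import Path


def list_episodes(folder_path: str) -> list[Path]:
    """Return the .hdf5 files directly inside folder_path, sorted lexicographically."""
    folder = Path(folder_path).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {folder}")
    return sorted(folder.glob("*.hdf5"))


def upload_episodes(folder_path: str, repo_id: str) -> None:
    """Upload only the .hdf5 episodes in folder_path to a dataset repository."""
    episodes = list_episodes(folder_path)
    if not episodes:
        # Refuse to create an empty commit when the folder holds no episodes
        raise ValueError(f"no .hdf5 episodes in {folder_path}")
    from huggingface_hub import HfApi  # imported lazily; requires huggingface_hub

    HfApi().upload_folder(
        folder_path=str(Path(folder_path).expanduser()),
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns="*.hdf5",  # skip logs, videos, and other stray files
        commit_message=f"Upload {len(episodes)} episodes",
    )
```

Checking locally before calling the Hub means a mistyped path fails immediately instead of after authentication and network round-trips.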
Example:
import os
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    # Expand "~" explicitly so the path resolves to your home directory
    folder_path=os.path.expanduser("~/aloha_data/aloha_stationary_block_pickup"),
    repo_id="TrossenRoboticsCommunity/aloha_static_datasets",
    repo_type="dataset",
)
For more information on uploading datasets, refer to the Hugging Face Uploading.
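If a single dataset repository holds several tasks, uploading each task folder into its own subdirectory keeps episodes from different tasks apart. The sketch below assumes the task name is the name of the local folder; `task_path_in_repo` and `upload_task` are hypothetical helpers, while `path_in_repo` is an existing `upload_folder` parameter.

```python
from pathlib import Path


def task_path_in_repo(folder_path: str) -> str:
    """Use the local task folder's name as the subdirectory inside the repository."""
    name = Path(folder_path).expanduser().name  # trailing slashes are ignored
    if not name:
        raise ValueError(f"cannot derive a task name from {folder_path!r}")
    return name


def upload_task(folder_path: str, repo_id: str) -> None:
    """Upload one task folder into its own subdirectory of a dataset repository."""
    from huggingface_hub import HfApi  # imported lazily; requires huggingface_hub

    HfApi().upload_folder(
        folder_path=str(Path(folder_path).expanduser()),
        path_in_repo=task_path_in_repo(folder_path),
        repo_id=repo_id,
        repo_type="dataset",
    )
```

With this layout, upload_task("~/aloha_data/aloha_stationary_block_pickup", "username/repository_name") would place the episodes under aloha_stationary_block_pickup/ in the repository.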
Downloading Datasets
You can download datasets either by cloning the repository with Git or by using the huggingface_hub Python library.
Cloning the Repository
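Large files such as the .hdf5 episodes are stored with Git LFS, so make sure Git LFS is installed and enabled (git lfs install) before cloning. Then use the following command: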
$ git clone https://huggingface.co/datasets/username/repository_name
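If Git LFS was not active when you cloned, the .hdf5 files in your working copy are small text pointer files rather than real episodes. The sketch below flags such files by checking for the standard Git LFS pointer header; `find_lfs_pointers` is a hypothetical helper. If it reports any files, run git lfs install and then git lfs pull inside the cloned repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (from the Git LFS pointer spec)
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file starts with the Git LFS pointer header instead of real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(clone_dir: str) -> list[Path]:
    """Return every .hdf5 file under clone_dir that is still an unfetched LFS pointer."""
    files = Path(clone_dir).expanduser().rglob("*.hdf5")
    return sorted(p for p in files if is_lfs_pointer(p))
```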
Using the Python API
You can also download datasets with the snapshot_download function from the huggingface_hub library, as in the following Python script:
from huggingface_hub import snapshot_download

# Download the dataset
snapshot_download(
    repo_id="username/repository_name",
    repo_type="dataset",
    local_dir="path/to/local/directory",
    allow_patterns="*.hdf5",
)
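allow_patterns uses shell-style (fnmatch) wildcards matched against file paths in the repository, and "*" also matches "/", so "*.hdf5" selects episodes in subdirectories too. To fetch a single task, you can prefix the pattern with its folder. The sketch below assumes a repository with one subfolder per task; `episode_pattern`, `preview`, and `download_task` are hypothetical helpers, and `preview` only simulates the selection locally.

```python
from fnmatch import fnmatch
from typing import Optional


def episode_pattern(task: Optional[str] = None) -> str:
    """Pattern for all .hdf5 episodes, or only those inside one task folder."""
    return "*.hdf5" if task is None else f"{task.strip('/')}/*.hdf5"


def preview(paths: list[str], pattern: str) -> list[str]:
    """Show which repository paths a pattern would select (local check only)."""
    return [p for p in paths if fnmatch(p, pattern)]


def download_task(repo_id: str, local_dir: str, task: Optional[str] = None) -> str:
    """Download all episodes, or one task's episodes; returns the local folder path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=episode_pattern(task),
    )
```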
Note
- The dataset episodes are stored in .hdf5 format, so setting allow_patterns="*.hdf5" downloads only the episodes and skips other repository files such as .gitattributes.
For more information on downloading datasets, refer to the Hugging Face Datasets documentation.
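After downloading, a quick sanity check can confirm that each episode is a real HDF5 file rather than a truncated or pointer file, without installing extra packages. The sketch below checks the 8-byte HDF5 format signature at offset 0, where files written without a user block (the h5py default) keep it; that offset is an assumption, and `split_episodes` is a hypothetical helper. Opening each file with h5py remains the more thorough check.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # format signature defined by the HDF5 spec


def has_hdf5_signature(path: Path) -> bool:
    """Check the signature at offset 0 (assumes the file has no HDF5 user block)."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def split_episodes(local_dir: str) -> tuple[list[Path], list[Path]]:
    """Return (valid, suspicious) .hdf5 files found anywhere under local_dir."""
    valid, suspicious = [], []
    for path in sorted(Path(local_dir).expanduser().rglob("*.hdf5")):
        (valid if has_hdf5_signature(path) else suspicious).append(path)
    return valid, suspicious
```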
Additional Information
- Repository Management: Utilize the Hugging Face Hub documentation for detailed instructions on managing repositories, handling versions, and setting permissions.
- Dataset Formats: Hugging Face supports various dataset formats. For this guide, we specifically use ALOHA’s native .hdf5 format.
- Community Support: If you encounter any issues, refer to the Hugging Face community forums for additional support.
By following this guide, you should be able to upload and download ALOHA datasets using the Hugging Face Hub. For more detailed guides and examples, refer to the Hugging Face Documentation.