Endigest AI Core Summary
Hugging Face introduces Storage Buckets, a mutable S3-like object storage system on the Hub designed for intermediate ML artifacts such as checkpoints, optimizer states, and processed datasets.
•Buckets are built on Xet, a chunk-based backend that deduplicates content across files, reducing bandwidth and storage costs for related ML artifacts like successive checkpoints
•Pre-warming allows users to bring hot data closer to specific cloud provider regions (AWS and GCP supported initially) for faster access during distributed training
•Accessible via the hf CLI, Python (huggingface_hub v1.5.0+), JavaScript (@huggingface/hub v2.10.5+), and an fsspec-compatible filesystem integration
•Libraries like pandas, Polars, and Dask can read/write Bucket data using hf:// paths with no extra setup
•Enterprise billing is based on deduplicated storage, so shared chunks across files directly reduce the billed footprint
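
To make the deduplication and billing points concrete, here is a minimal Python sketch of a chunk-addressed store. It is an illustration only, not Xet's implementation: the `DedupStore` class, its method names, and the tiny fixed 4-byte chunk size are all assumptions chosen for readability, and the summary does not say how Xet actually chunks or hashes data.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and deliberately tiny so the example is easy to follow


class DedupStore:
    """Toy chunk-addressed store: each unique chunk is kept only once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # chunk hash -> chunk bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk hashes

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many new bytes had to be written."""
        ids: list[str] = []
        new_bytes = 0
        for start in range(0, len(data), CHUNK_SIZE):
            piece = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost storage
                self.chunks[digest] = piece
                new_bytes += len(piece)
            ids.append(digest)
        self.files[name] = ids
        return new_bytes

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Deduplicated footprint: total size of unique chunks."""
        return sum(len(c) for c in self.chunks.values())


# Two successive "checkpoints" that differ only in their last chunk.
ckpt_1 = b"AAAABBBBCCCCDDDD"
ckpt_2 = b"AAAABBBBCCCCEEEE"

store = DedupStore()
written_1 = store.put("step-1.ckpt", ckpt_1)  # every chunk is new
written_2 = store.put("step-2.ckpt", ckpt_2)  # only the changed chunk is new

logical_bytes = len(ckpt_1) + len(ckpt_2)
print(written_1, written_2, store.stored_bytes(), logical_bytes)
```

In this toy run the second checkpoint adds only its one changed chunk, so the deduplicated footprint is smaller than the combined size of the two files. That gap is the effect the summary attributes to both the bandwidth savings and the billed storage.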
This summary was automatically generated by AI based on the original article and may not be fully accurate.