4.5.0
Dataset Features
- Add lance format support by @eddyxu in https://github.com/huggingface/datasets/pull/7913
- Support for both Lance dataset (including metadata / manifests) and standalone .lance files
- e.g. with lance-format/fineweb-edu
    from datasets import load_dataset

    ds = load_dataset("lance-format/fineweb-edu", streaming=True)
    for example in ds["train"]:
        ...
What's Changed
- Raise early for invalid revision in load_dataset by @Scott-Simmons in https://github.com/huggingface/datasets/pull/7929
- fix low but large example indexerror by @CloseChoice in https://github.com/huggingface/datasets/pull/7912
- Fix method to retrieve attributes from file object by @lhoestq in https://github.com/huggingface/datasets/pull/7938
- add _OverridableIOWrapper by @lhoestq in https://github.com/huggingface/datasets/pull/7942
- Add _generate_shards by @lhoestq in https://github.com/huggingface/datasets/pull/7943
New Contributors
- @eddyxu made their first contribution in https://github.com/huggingface/datasets/pull/7913
- @Scott-Simmons made their first contribution in https://github.com/huggingface/datasets/pull/7929
Full Changelog: https://github.com/huggingface/datasets/compare/4.4.2...4.5.0