February 16, 2023 — Posted by the TensorFlow Datasets team Datasets landscape has changed a lot since TensorFlow Datasets (TFDS) was introduced about 4 years ago: TFDS made sharing or re-using a dataset significantly easier, and transformed the datasets landscape by inspiring other ML tools, libraries and services. Loading a dataset went from complicated scripts to: import tensorflow_datasets as tfds ds = tfds.load…
Posted by the TensorFlow Datasets team
Datasets landscape has changed a lot since TensorFlow Datasets (TFDS) was introduced about 4 years ago: TFDS made sharing or re-using a dataset significantly easier, and transformed the datasets landscape by inspiring other ML tools, libraries and services.
Loading a dataset went from complicated scripts to:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for example in ds: # example is `{'image': tf.Tensor, 'label': tf.Tensor}`
print(list(example.keys()))
image = example["image"]
label = example["label"]
print(image.shape, label)
|
Read the documentation for a more extensive introduction.
Over the years, TFDS has grown to become a recognized way to load datasets. To celebrate our last 4.8.2 release, we would like to take some time to reflect on the progress and improvements made over those past years and thank the community for their support.
TFDS is still a library to facilitate download, preparation and loading of datasets for ML pipelines, but it now supports hundreds of datasets and offers the following main features:
Thank you to all of our contributors and users!
TFDS is under active development to bring you the best datasets to use as input in your ML pipelines.
Notably, we work on making transformations seamless. Sometimes, a dataset is derived from another dataset by a few transformations (e.g., data augmentation or column renaming). We want those transformations to be as easy to implement as possible. This feature is already available experimentally, don’t hesitate to give feedback on GitHub!
We are also working on making the TensorFlow dependency optional. TFDS is a framework agnostic library that provides datasets and tools to support machine learning research. TFDS does not rely on any specific machine learning framework, and we are working to make the TensorFlow dependency optional.
We have other plans too, smaller ones such as the support of partitioned datasets, and longer-term ones that could durably influence the field. Follow us on GitHub to receive future updates about those upcoming developments!
February 16, 2023 — Posted by the TensorFlow Datasets team Datasets landscape has changed a lot since TensorFlow Datasets (TFDS) was introduced about 4 years ago: TFDS made sharing or re-using a dataset significantly easier, and transformed the datasets landscape by inspiring other ML tools, libraries and services. Loading a dataset went from complicated scripts to: import tensorflow_datasets as tfds ds = tfds.load…