Introducing Wake Vision: A High-Quality, Large-Scale Dataset for TinyML Computer Vision Applications
December 05, 2024
Posted by Colby Banbury, Emil Njor, Andrea Mattia Garavagno, Vijay Janapa Reddi – Harvard University

TinyML is an exciting frontier in machine learning, enabling models to run on extremely low-power devices such as microcontrollers and edge devices. However, the field's growth has been stifled by a lack of large, high-quality datasets tailored to its constraints. That's where Wake Vision comes in—a new dataset designed to accelerate research and development in TinyML.


Why TinyML Needs Better Data

TinyML development requires compact, efficient models, often only a few hundred kilobytes in size. Standard machine learning datasets, like ImageNet, target applications that are poorly suited to these highly constrained models.

Existing datasets for TinyML, like Visual Wake Words (VWW), have laid the groundwork for progress in the field. However, their smaller size and inherent limitations pose challenges for training production-grade models. Wake Vision builds upon this foundation by providing a large, diverse, and high-quality dataset specifically tailored for person detection—the cornerstone vision task for TinyML.

What Makes Wake Vision Different?

Table: Training, validation, and test image counts (total images and person images per split) for Wake Vision, Visual Wake Words, CIFAR-100, and PASCAL VOC 2012.

Wake Vision is a new, large-scale dataset with roughly 6 million images, almost 100 times larger than VWW, the previous state-of-the-art dataset for person detection in TinyML. The dataset provides two distinct training sets:

  • Wake Vision (Large): Prioritizes dataset size.
  • Wake Vision (Quality): Prioritizes label quality.

Wake Vision's comprehensive filtering and labeling process significantly enhances the dataset's quality.
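
As a rough sketch of how one might build input pipelines for the two training sets, the snippet below uses TensorFlow Datasets. The dataset name ("wake_vision") and split names ("train_large", "train_quality") are assumptions for illustration only; check the Wake Vision documentation for the exact identifiers.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# NOTE: the dataset name and split names are illustrative assumptions;
# consult the Wake Vision documentation for the real identifiers.
def make_pipeline(split, image_size=(96, 96), batch_size=128):
    ds = tfds.load("wake_vision", split=split, as_supervised=True)

    def preprocess(image, label):
        image = tf.image.resize(image, image_size)   # typical TinyML input resolution
        image = tf.cast(image, tf.float32) / 255.0   # scale pixels to [0, 1]
        return image, label

    return ds.map(preprocess).shuffle(10_000).batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_large_ds = make_pipeline("train_large")      # prioritizes dataset size
train_quality_ds = make_pipeline("train_quality")  # prioritizes label quality
```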

Why Data Quality Matters for TinyML Models

For traditional overparameterized models, data quantity is widely believed to matter more than data quality, because an overparameterized model can absorb errors in the training data. But as the figure below shows, TinyML tells a different story:

Figure: Wake Vision test score versus the percentage of training data used, for models with 78K, 309K, 1.2M, 4.9M, and 11M parameters at label error rates of 7%, 15%, and 30%.

The figure above shows that high-quality labels (lower label error) benefit under-parameterized models more than simply having more data. Larger, noisier datasets remain valuable, however, particularly for pre-training before fine-tuning on higher-quality labels.

By providing two versions of the training set, Wake Vision enables researchers to explore the balance between dataset size and quality effectively.
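
One way to probe this trade-off yourself is to inject synthetic label noise into a clean training set and observe how models of different sizes respond. The helper below flips a fraction of binary person/no-person labels at random; it is a toy sketch for experimentation, not the methodology behind the figure above.

```python
import numpy as np

def corrupt_labels(labels: np.ndarray, error_rate: float, seed: int = 0) -> np.ndarray:
    """Randomly flip a fraction of binary labels to simulate annotation errors."""
    rng = np.random.default_rng(seed)
    flip = rng.random(labels.shape[0]) < error_rate
    return np.where(flip, 1 - labels, labels)

# Example: build a noisier copy of a (synthetic) label vector and verify the error rate.
labels = np.random.default_rng(1).integers(0, 2, size=100_000)
noisy_labels = corrupt_labels(labels, error_rate=0.30)
print(f"Injected label error: {np.mean(noisy_labels != labels):.1%}")  # roughly 30%
```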

Real-World Testing: Wake Vision's Fine-Grained Benchmarks

Figure: Example images from Wake Vision's fine-grained benchmarks, labeled Perceived Older Person, Near Person, Bright Image, Perceived Female Person, and Depicted Person.

Unlike many open-source datasets, Wake Vision provides fine-grained benchmarks and detailed tests like those shown in the figure above. These enable evaluation of model performance in specific real-world scenarios, such as:

  • Distance: How well the model detects people at various distances from the camera.
  • Lighting Conditions: Performance in well-lit vs. poorly-lit environments.
  • Depictions: Handling of varied representations of people (e.g., drawings, sculptures).
  • Perceived Gender and Age: Detecting biases across genders and age groups.

These benchmarks give researchers a nuanced understanding of model performance in specific, real-world contexts and help identify potential biases and limitations early in the design phase.
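
In practice, fine-grained evaluation amounts to slicing the test set by metadata and reporting accuracy per slice. The sketch below assumes each test example carries boolean benchmark flags (the flag names here are hypothetical placeholders) and computes accuracy for each subgroup.

```python
from collections import defaultdict

# Hypothetical benchmark flags attached to each test example.
FLAGS = ["near_person", "far_person", "bright_image", "dark_image", "depicted_person"]

def per_slice_accuracy(examples, predictions):
    """examples: list of dicts with a 'label' key plus boolean benchmark flags.
    predictions: list of 0/1 model outputs, aligned with examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, pred in zip(examples, predictions):
        for flag in FLAGS:
            if example.get(flag):
                total[flag] += 1
                correct[flag] += int(pred == example["label"])
    return {flag: correct[flag] / total[flag] for flag in FLAGS if total[flag]}
```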

Key Performance Gains With Wake Vision

The performance gains achieved using Wake Vision are impressive:

  • Up to a 6.6% accuracy improvement over models trained on the established VWW dataset.
  • A reduction in evaluation-set label error from 7.8% to 2.2%, thanks to manual label validation.
  • Robustness across various real-world conditions, from lighting to perceived age and gender.

Furthermore, combining the two Wake Vision training sets, using the larger set for pre-training and the quality set for fine-tuning, yields the best results, highlighting the value of both datasets when used in sophisticated training pipelines.
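
A minimal sketch of that two-stage recipe in Keras is shown below, reusing the hypothetical train_large_ds and train_quality_ds pipelines from the loading example earlier: pre-train a small MobileNet-style model on the Large set, then fine-tune it at a lower learning rate on the Quality set. The architecture and hyperparameters are illustrative, not the authors' published settings.

```python
import tensorflow as tf

def build_tiny_person_detector(input_shape=(96, 96, 3), alpha=0.25):
    """A small MobileNetV2 backbone sized for microcontroller-class deployment."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, alpha=alpha, include_top=False, weights=None)
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # person / no person
    ])

model = build_tiny_person_detector()

# Stage 1: pre-train on the large (noisier) training set.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_large_ds, epochs=20)

# Stage 2: fine-tune on the quality training set with a smaller learning rate.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_quality_ds, epochs=10)
```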

Wake Vision Leaderboard: Track and Submit New Top-Performing Models

The Wake Vision website features a Leaderboard, providing a dedicated platform to assess and compare the performance of models trained on the Wake Vision dataset.

The leaderboard enables a clear and detailed view of how models perform under various conditions, with performance metrics like accuracy, error rates, and robustness across diverse real-world scenarios. It’s an excellent resource for both seasoned researchers and newcomers looking to improve and validate their approaches.

Explore the leaderboard to see the current rankings, learn from high-performing models, and submit your own to contribute to advancing the state of the art in TinyML person detection.

Making Wake Vision Easy to Access

Wake Vision is available through popular dataset services, including TensorFlow Datasets and Hugging Face Datasets.

With its permissive license (CC-BY 4.0), researchers and practitioners can freely use and adapt Wake Vision for their TinyML projects.
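
As an alternative to the TensorFlow Datasets route sketched earlier, the Hugging Face datasets library can pull the data directly; the repository identifier below is a placeholder, so substitute the official one listed on the Wake Vision website.

```python
from datasets import load_dataset

# Placeholder repository id -- replace with the official Wake Vision identifier.
ds = load_dataset("example-org/wake-vision", split="validation", streaming=True)

example = next(iter(ds))
print(example.keys())  # e.g., image, person label, and fine-grained metadata fields
```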

Get Started with Wake Vision Today!

The Wake Vision team has made the dataset, code, and benchmarks publicly available to accelerate TinyML research and enable the development of better, more reliable person detection models for ultra-low-power devices.

To learn more and access the dataset, visit Wake Vision’s website, where you can also check out the leaderboard of top-performing models on the Wake Vision dataset and see whether you can build a better-performing model of your own!
