Higher accuracy on vision models with EfficientNet-Lite
March 16, 2020
Posted by Renjie Liu, Software Engineer

In May 2019, Google released a family of image classification models called EfficientNet, which achieved state-of-the-art accuracy with an order of magnitude fewer computations and parameters. If EfficientNet can run on the edge, it opens the door to novel applications on mobile and IoT devices, where computational resources are constrained.

Today, we are excited to announce EfficientNet-Lite (GitHub, TFHub), which runs on TensorFlow Lite and is designed for performance on mobile CPU, GPU, and EdgeTPU. EfficientNet-Lite brings the power of EfficientNet to edge devices and comes in five variants, allowing users to choose from the low latency/model size option (EfficientNet-Lite0) to the high accuracy option (EfficientNet-Lite4). The largest variant, integer-only quantized EfficientNet-Lite4, achieves 80.4% ImageNet top-1 accuracy while still running in real time (for example, 30 ms per image) on a Pixel 4 CPU. Below is how the quantized EfficientNet-Lite models perform compared to similarly quantized versions of some popular image classification models.
Figures: Integer-only quantized models running on Pixel 4 CPU with 4 threads.

Challenges: Quantization and heterogeneous hardware

The unique nature of edge devices raises several challenges.

Quantization: Since many edge devices have limited floating-point support, quantization is widely used. However, it usually forces a trade-off: either a complicated quantization-aware training procedure, or poor model accuracy after post-training quantization.

Thankfully, we were able to use the TensorFlow Lite post-training quantization workflow to quantize the model with minimal accuracy loss.

Heterogeneous hardware: It is challenging to run the same model on a wide range of accelerators, such as mobile GPUs and the EdgeTPU. Because of their hardware specialization, these accelerators often perform well only on a limited set of operations. We found that some of the operations in EfficientNet are not well supported by certain accelerators.

To address the heterogeneity issue, we tailored the original EfficientNets with the following simple modifications:
  • Removed squeeze-and-excitation networks, since they are not well supported by some accelerators
  • Replaced all swish activations with ReLU6, which significantly improved the quality of post-training quantization (explained later)
  • Fixed the stem and head while scaling models up in order to reduce the size and computations of scaled models
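To illustrate why swapping swish for ReLU6 matters for quantization, here is a minimal sketch in plain Python that compares the output ranges of the two activations on a handful of inputs. The input values are hypothetical and chosen only for illustration; the formulas are the standard definitions swish(x) = x · sigmoid(x) and ReLU6(x) = min(max(x, 0), 6).

```python
import math


def swish(x: float) -> float:
    """Swish activation, x * sigmoid(x): bounded below but not above."""
    return x / (1.0 + math.exp(-x))


def relu6(x: float) -> float:
    """ReLU6 activation: clamps its input to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


# Hypothetical pre-activation values, including a few large outliers.
inputs = [-8.0, -1.0, 0.0, 1.0, 6.0, 50.0, 200.0]

swish_out = [swish(x) for x in inputs]
relu6_out = [relu6(x) for x in inputs]

# Swish passes large outliers through almost unchanged; ReLU6 caps them at 6.
print(f"swish range: [{min(swish_out):.3f}, {max(swish_out):.3f}]")  # [-0.269, 200.000]
print(f"relu6 range: [{min(relu6_out):.3f}, {max(relu6_out):.3f}]")  # [0.000, 6.000]
```

Swish has a global minimum of about -0.28 but no upper bound, so the range a quantizer has to cover grows with the largest activation it sees. ReLU6 fixes that range to [0, 6] regardless of the input.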

Post-training quantization with the TensorFlow Model Optimization Toolkit

Thanks to the TensorFlow Model Optimization Toolkit, we easily quantized the model without losing much accuracy via integer-only post-training quantization (see the TensorFlow Lite post-training quantization documentation for details). This made the model 4x smaller and inference 2x faster.
Here is how the EfficientNet-Lite0 float model compares with its quantized version in terms of accuracy and latency:
* Benchmarked on Pixel 4 CPU with 4 threads
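The 4x size reduction follows directly from storage width: a float32 weight occupies 4 bytes, while an int8 weight occupies 1. The sketch below quantizes a few hypothetical weights with a simple symmetric per-tensor int8 scheme and packs both versions with Python's struct module to compare their sizes. This is a simplification under stated assumptions; the TensorFlow Lite converter's actual quantization scheme differs in its details, and the 2x speedup depends on the hardware and kernels, so it cannot be derived this way.

```python
import struct

# Hypothetical float32 weights for one layer (values made up for illustration).
weights = [0.12, -0.5, 0.33, 0.9, -0.75, 0.05, -0.2, 0.6]

# Simplified symmetric int8 quantization: map [-max|w|, max|w|] onto [-127, 127].
scale = max(abs(w) for w in weights) / 127.0
q_weights = [max(-127, min(127, round(w / scale))) for w in weights]

# Pack both versions as raw little-endian bytes to compare storage size.
float_bytes = struct.pack(f"<{len(weights)}f", *weights)
int8_bytes = struct.pack(f"<{len(q_weights)}b", *q_weights)

print(len(float_bytes), len(int8_bytes))   # 32 8
print(len(float_bytes) / len(int8_bytes))  # 4.0
```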
We also want to share some of our experience with post-training quantization. When we first tried post-training quantization, we saw a significant accuracy drop: Top-1 accuracy on the ImageNet dataset fell from 75% to 46%.
We found that the issue was caused by the quantized output range being too wide. Quantization essentially applies an affine transformation to the floating-point values so that they fit into the int8 buckets:
Figure: Illustration of quantization
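The affine mapping can be written as real_value ≈ scale * (q - zero_point), where q is the stored int8 value. Below is a minimal sketch of that mapping, assuming a per-tensor int8 range of [-128, 127] and requiring 0.0 to be exactly representable. The helper names quant_params, quantize and dequantize are hypothetical illustrations, not TensorFlow Lite APIs.

```python
def quant_params(rmin: float, rmax: float, qmin: int = -128, qmax: int = 127):
    """Return (scale, zero_point) mapping [rmin, rmax] affinely onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:  # degenerate all-zero range; any positive scale works
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, qmin: int = -128, qmax: int = 127) -> int:
    """Map a float to its nearest int8 bucket, clamping values outside the range."""
    # Python's round() rounds halves to even; real kernels may round differently.
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate float value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Example: quantize a value from the [0, 6] output range of ReLU6.
scale, zp = quant_params(0.0, 6.0)
q = quantize(1.5, scale, zp)
print(scale, zp, q, dequantize(q, scale, zp))
```

Values that fall outside [rmin, rmax] are clamped to the end buckets, which is why the calibrated range has to cover the tensor's real values; the wider that range, the coarser each bucket becomes.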
In our case, the output tensor ranged from -168 to 204, as in the examples below. That was a sign that we had lost too much accuracy, because a floating-point tensor with such a wide range is hard to fit into int8 buckets: with 256 int8 levels, adjacent levels end up about 1.46 apart (372 / 255).
To address the issue, we replaced the swish activation with a restricted-range activation (ReLU6), because ReLU6 limits its output to [0, 6]. After this change, the quantized model's Top-1 accuracy on ImageNet climbed back up to 74.4%, compared with a floating-point baseline of 75.1%.
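To see why the wide output range hurt accuracy, the sketch below compares the quantization step for the reported range of -168 to 204 with the [0, 6] range enforced by ReLU6, and round-trips a small activation value through each. It redefines minimal hypothetical affine int8 helpers so that it runs on its own; the test value 0.3 is made up for illustration.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Hypothetical helper: affine int8 (scale, zero_point) covering [rmin, rmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = max(qmin, min(qmax, int(round(qmin - rmin / scale))))
    return scale, zero_point


def round_trip(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize x to int8, then map it back to a float."""
    q = max(qmin, min(qmax, int(round(x / scale)) + zero_point))
    return scale * (q - zero_point)


ranges = {
    "wide (-168, 204)": (-168.0, 204.0),  # output range reported in the post
    "relu6 [0, 6]": (0.0, 6.0),           # range enforced by ReLU6
}

x = 0.3  # a small, hypothetical activation value
for name, (lo, hi) in ranges.items():
    scale, zp = quant_params(lo, hi)
    print(f"{name}: step={scale:.4f}, {x} -> {round_trip(x, scale, zp):.4f}")
# wide (-168, 204): step=1.4588, 0.3 -> 0.0000
# relu6 [0, 6]: step=0.0235, 0.3 -> 0.3059
```

With the wide range, each int8 step spans about 1.46, so 0.3 collapses to 0.0; with the ReLU6 range the step is about 0.024, and the value survives with an error under 0.01.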

Try EfficientNet-Lite today with your dataset

Let’s bring the power of EfficientNet-Lite to your data. We suggest using TensorFlow Lite Model Maker, a tool that lets you apply transfer learning to existing TensorFlow models with your own input data and export the resulting model in the TensorFlow Lite format.
TensorFlow Lite Model Maker supports multiple model architectures, including MobileNetV2 and all variants of EfficientNet-Lite. Here is an example of how you can build an EfficientNet-Lite0 image classification model with just 5 lines of code:
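import tensorflow as tf
# Early Model Maker import paths (later releases moved these into the tflite_model_maker package)
from tensorflow_examples.lite.model_maker.core.data_util.image_dataloader import ImageClassifierDataLoader
from tensorflow_examples.lite.model_maker.core.task import image_classifier
from tensorflow_examples.lite.model_maker.core.task.model_spec import efficientnet_lite0_spec

# Download the flower photos dataset (one sub-folder of images per class)
flower_path = tf.keras.utils.get_file('flower_photos', 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz', untar=True)

# Load your custom dataset and split it 90/10 into training and test data
data = ImageClassifierDataLoader.from_folder(flower_path)
train_data, test_data = data.split(0.9)

# Customize the pre-trained TensorFlow model
model = image_classifier.create(train_data, model_spec=efficientnet_lite0_spec)

# Evaluate the model
loss, accuracy = model.evaluate(test_data)

# Export as TensorFlow Lite model plus its label file.
model.export('image_classifier.tflite', 'image_labels.txt')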
Try out the library with the flower classification notebook. You can easily switch to a different model by changing the model_spec parameter. For a small dataset like tf_flowers, you can achieve ~92% accuracy in just a few minutes with 5 epochs. Accuracy can be improved if you train for more epochs, use more data, or fine-tune the whole model.
Next, let’s build a mobile app with this model. You can start with our Image Classification example, a ready-to-run mobile application built with EfficientNet-Lite. The app uses Gradle tasks to automatically download the EfficientNet-Lite models, pre-trained on the ImageNet dataset, into the assets folder. If you want to try out your customized model created with Model Maker, replace the model in the assets folder with it.
As shown in the screenshot, the EfficientNet-Lite model runs inference in real-time (>= 30 fps).

Want to know more?

Build our reference apps and play around with them by following the instructions. Try out EfficientNet-Lite on TensorFlow Hub and customize it for your task using TensorFlow Lite Model Maker. Learn more about TensorFlow Lite at tensorflow.org/lite, try TensorFlow model optimization, and explore more TensorFlow Lite models at tfhub.dev.

Acknowledgements

Renjie Liu, Xunkai Zhang, Tian Lin, Yuqi Li, Mingxing Tan, Khanh LeViet, Chao Mei, Amy Jang, Luiz Gustavo Martins, Yunlu Li, Suharsh Sivakumar, Raziel Alvarez, Lawrence Chan, Jess Kim, Mike Liang, Shuangfeng Li, Sarah Sirajuddin
