https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html?hl=da
Posted by Renjie Liu, Software Engineer
In May 2019, Google released a family of image classification models called EfficientNet [1], which achieved state-of-the-art accuracy with an order of magnitude fewer computations and parameters. If EfficientNet can run on the edge, it opens the door to novel applications on mobile and IoT devices, where computational resources are constrained.
Today, we are excited to announce EfficientNet-Lite (GitHub, TFHub), which runs on TensorFlow Lite and is designed for performance on mobile CPU, GPU, and EdgeTPU. EfficientNet-Lite brings the power of EfficientNet to edge devices and comes in five variants, allowing users to choose anywhere from the low-latency, small-model option (EfficientNet-Lite0) to the high-accuracy option (EfficientNet-Lite4). The largest variant, integer-only quantized EfficientNet-Lite4, achieves 80.4% ImageNet top-1 accuracy while still running in real time (e.g. 30ms/image) on a Pixel 4 CPU. Below is how the quantized EfficientNet-Lite models perform compared to similarly quantized versions of some popular image classification models.
Figures: Integer-only quantized models running on Pixel 4 CPU with 4 threads.
Challenges: Quantization and heterogeneous hardware
The unique nature of edge devices raises several challenges.
Quantization: Since many edge devices have limited floating-point support, quantization is widely used. However, it often either requires a complicated quantization-aware training procedure or, when done post-training, results in poor model accuracy.
Thankfully, within our toolkit, we leveraged the TensorFlow Lite post-training quantization workflow to quantize the model with minimal accuracy loss.
Heterogeneous hardware: It is challenging to run the same model on a wide range of accelerators, such as mobile GPU/EdgeTPU. Due to the hardware specialization, these accelerators often perform well only for a limited set of operations. We found that some of the operations in EfficientNet are not well supported by certain accelerators.
To address the heterogeneity issue, we tailored the original EfficientNets with the following simple modifications:
- Removed squeeze-and-excitation networks, since they are not well supported by some accelerators
- Replaced all swish activations with ReLU6, which significantly improved the quality of post-training quantization (explained later)
- Fixed the stem and head while scaling models up, in order to reduce the size and computations of the scaled models
Post-training quantization with the TensorFlow Model Optimization Toolkit
Thanks to the TensorFlow Model Optimization Toolkit, we easily quantized the model without losing much accuracy via integer-only post-training quantization (for more information, see this link). This reduced the model size by 4x and improved inference speed by 2x.
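As a rough illustration of what an integer-only post-training quantization step can look like with the TensorFlow Lite converter, here is a minimal sketch. It is not the exact script used for EfficientNet-Lite: the helper names (`representative_batches`, `convert_to_int8`), the `saved_model_dir` argument, and the `calibration_images` list are hypothetical, and the input/output type settings assume a TensorFlow release that supports them. The 4x size reduction follows from storing weights as 1-byte int8 values instead of 4-byte float32 values.

```python
def representative_batches(images, limit=100):
    """Yield up to `limit` calibration samples, each wrapped in a list.

    The converter expects every yielded item to be a list with one entry
    per model input. `images` is assumed to be a re-iterable collection
    (e.g. a list) of float32 arrays shaped (1, height, width, 3).
    """
    for index, image in enumerate(images):
        if index >= limit:
            break
        yield [image]


def convert_to_int8(saved_model_dir, calibration_images):
    """Convert a SavedModel to an integer-only quantized TFLite flatbuffer."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Enable the default optimizations, which include quantization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Calibration data lets the converter estimate activation ranges.
    converter.representative_dataset = (
        lambda: representative_batches(calibration_images))
    # Restrict the model to int8 builtin ops (integer-only execution).
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Assumption: integer model inputs/outputs are wanted as well.
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()  # bytes of the .tflite model
```

The returned bytes can be written to a `.tflite` file with an ordinary binary file write. The calibration set only needs to be representative of real inputs, so a few hundred samples are usually enough.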
Here is how the EfficientNet-Lite0 float model compares to its quantized version in terms of accuracy and latency:
* Benchmarked on Pixel 4 CPU with 4 threads
We also want to share some of our experience with post-training quantization. When we first tried it, we found a significant accuracy drop: Top-1 accuracy dropped from 75% to 46% on the ImageNet dataset.
We found that the issue was caused by the quantized output range being too wide. Quantization essentially applies an affine transformation to the floating-point values so that they fit into the int8 buckets:
Figure: Illustration of quantization
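To make the affine mapping concrete, here is a minimal pure-Python sketch of quantizing a float range into int8 buckets and mapping back. The function names and the example range [-2, 6] are hypothetical choices for illustration, and real converters differ in details such as rounding and per-channel scales.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Return (scale, zero_point) mapping [rmin, rmax] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)          # float width of one bucket
    zero_point = int(round(qmin - rmin / scale))   # integer that represents 0.0
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to its int8 bucket, clamping values outside the range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 bucket back to the float value it stands for."""
    return scale * (q - zero_point)


# Hypothetical tensor range, used only to show the round trip.
scale, zero_point = quant_params(-2.0, 6.0)
for value in (-2.0, -0.7, 0.0, 1.234, 6.0):
    q = quantize(value, scale, zero_point)
    print(value, "->", q, "->", dequantize(q, scale, zero_point))
```

Every float inside one bucket collapses to the same integer, so the round-trip error grows with `scale`, which in turn grows with the width of the range being covered.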
In our case, the output tensor range was from -168 to 204, as in the examples below: 

 
That is a sign that we may have lost too much accuracy, since it is hard to fit such a wide-ranged floating-point tensor into int8 buckets.
To address the issue, we replaced the swish activation with a "restricted-range" activation (relu6), because relu6 restricts the output to [0, 6]. After this change, the quantized model's Top-1 accuracy on ImageNet climbed back up to 74.4%, compared with a floating-point baseline of 75.1%.
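The effect of the range on quantization resolution can be checked with a few lines of arithmetic. The sketch below uses the standard definitions of swish (x times sigmoid(x)) and relu6, and compares the width of one int8 bucket for the two ranges quoted above. The helper name `int8_step` is hypothetical, and the calculation assumes all 256 int8 levels are spread evenly over the range.

```python
import math


def swish(x):
    """x * sigmoid(x); unbounded above, so large inputs give large outputs."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # this form avoids overflow for very negative x
    return x * e / (1.0 + e)


def relu6(x):
    """ReLU clipped at 6, so the output always lies in [0, 6]."""
    return min(max(x, 0.0), 6.0)


def int8_step(rmin, rmax):
    """Float width of one bucket when 256 levels cover [rmin, rmax]."""
    return (rmax - rmin) / 255.0


wide_step = int8_step(-168.0, 204.0)   # range observed before the change
narrow_step = int8_step(0.0, 6.0)      # range guaranteed by relu6

print("bucket width for [-168, 204]:", wide_step)
print("bucket width for [0, 6]:", narrow_step)
print("ratio:", wide_step / narrow_step)
print("swish(204) =", swish(204.0), " relu6(204) =", relu6(204.0))
```

With the wide range each int8 bucket spans roughly 1.46 in float units, against roughly 0.024 for [0, 6], which is about 62 times coarser.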
Try EfficientNet-Lite today with your dataset
Let’s bring the power of EfficientNet-Lite to your data. We suggest that you use the TensorFlow Lite Model Maker, a tool that lets you apply transfer learning to existing TensorFlow models with your own input data and export the resulting model to the TensorFlow Lite format.
TensorFlow Lite Model Maker supports multiple model architectures, including MobileNetV2 and all variants of EfficientNet-Lite. Here is an example of how you can build an EfficientNet-Lite0 image classification model with just 5 lines of code:
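# Imports: these module paths are an assumption based on the tensorflow_examples
# Model Maker package from around the time of this post; newer releases may differ.
from tensorflow_examples.lite.model_maker.core.data_util.image_dataloader import ImageClassifierDataLoader
from tensorflow_examples.lite.model_maker.core.task import image_classifier
from tensorflow_examples.lite.model_maker.core.task.model_spec import efficientnet_lite0_spec

# Load your custom dataset (flower_path: a folder with one subfolder of images per class)
data = ImageClassifierDataLoader.from_folder(flower_path)
train_data, test_data = data.split(0.9)
# Customize the pre-trained TensorFlow model
model = image_classifier.create(train_data, model_spec=efficientnet_lite0_spec)
# Evaluate the model
loss, accuracy = model.evaluate(test_data)
# Export as TensorFlow Lite model.
model.export('image_classifier.tflite', 'image_labels.txt')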
Try out the library with the flower classification notebook. You can easily switch to a different model by changing the model_spec parameter. For a small dataset like tf_flowers, you can reach ~92% accuracy within a few minutes with 5 epochs. Accuracy can be improved further if you train for more epochs, use more data, or fine-tune the whole model.
Next, let’s build a mobile app with this model. You can start with our Image Classification example, a ready-to-run mobile application built with EfficientNet-Lite. The app uses Gradle tasks to automatically download the EfficientNet-Lite models pre-trained on the ImageNet dataset into the assets folder. If you want to try out a customized model created with Model Maker, you can use it to replace the model in the assets folder.
As shown in the screenshot, the EfficientNet-Lite model runs inference in real-time (>= 30 fps). 
 
Want to know more?
Build our reference apps and play around with them (instructions). Try out EfficientNet-Lite on TensorFlow Hub and customize it for your task using TensorFlow Lite Model Maker. Learn more about TensorFlow Lite at tensorflow.org/lite, try TensorFlow model optimization, and explore more TensorFlow Lite models at tfhub.dev.
Acknowledgements
Renjie Liu, Xunkai Zhang, Tian Lin, Yuqi Li, Mingxing Tan, Khanh LeViet, Chao Mei, Amy Jang, Luiz Gustavo Martins, Yunlu Li, Suharsh Sivakumar, Raziel Alvarez, Lawrence Chan, Jess Kim, Mike Liang, Shuangfeng Li, Sarah Sirajuddin
References
[1] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://arxiv.org/abs/1905.11946