October 16, 2020 — A guest article by Ido Gus of CEVA CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT. In this article, we'll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deplo…
A guest article by Ido Gus of CEVA
CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT.
In this article, we'll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deploy a speech recognition engine and frontend, called WhisPro, on a bare-metal development board based on our CEVA-BX DSP core. WhisPro detects always-on wake words and speech commands efficiently, on-device.
Figure 1 CEVA Multi-microphone DSP Development Board
WhisPro is a speech recognition engine and frontend targeted to run on low power, resource constrained edge devices. It is designed to handle the entire data flow from processing audio samples to detection.
WhisPro supports two use cases for edge devices:
WhisPro enables voice interface on any SoC that has a CEVA BX DSP core integrated into it, lowering entry barriers to OEMs and ODM interested in joining the voice interface revolution.
Originally, WhisPro was implemented using an in-house neural network library called CEVA NN Lib. Although that implementation achieved excellent performance, the development process was quite involved. We realized that, if we ported the TFLM runtime library and optimized it for our target hardware, the entire model porting process would become transparent and more reliable (far fewer lines of code would need to be written, modified, and maintained).
The first thing we had to do is to figure out how to port TFLM to our own platform. We found that following this porting to a new platform guide to be quite useful.
Following the guide, we:
Our starting point is a Keras implementation of our model. Let’s look at the steps we took to deploy our model on our bare-metal target hardware:
Converted theTensorFlow model to TensorFlow Lite using the TF built-in converter:
$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb
```
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.experimental_new_converter = True
tflite_model = converter.convert()
open("converted_to_tflite_model.tflite", "wb").write(tflite_model)
```
Used quantization:
$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb
```
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen
```
Converted the TensorFlow Lite model to TFLM using xxd:
$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb
```
$> xxd –I model.tflite > model.cc
```
Here we found that some of the model layers (for example, GRU) were not properly supported (at the time) by TFLM. It is very reasonable to assume that, as TFLM continues to mature and Google and the TFLM community invest more in it, issues like this will become rarer.
In our case, though, we opted to re-implement the GRU layers in terms of Fully Connected layers, which was surprisingly easy.
The next step was to integrate the TFLM runtime library and the converted model into our existing embedded C frontend, which handles audio preprocessing and feature extraction.
Even though our frontend was not written with TFLM in mind, it was modular enough to allow easy integration by implementation of a single simple wrapper function, as follows:
The process we described is performed by two components:
This work has proven the importance of the TFLM platform to us, and the significant value supporting TFLM can add to our customers and partners by enabling easy neural network model deployment on edge devices. We are committed to further support TFLM on the CEVA-BX DSP family by:
While the porting process had some bumps along the way, at the end it was a great success, and took about 4-5 days' worth of work. Implementing a model in C from scratch, and handcrafting model conversion scripts from Python to C, could take 2-3 weeks (and lots of debugging).
To learn more, you are welcome to watch CEVA’s virtual seminar - Wireless Audio session, covering TFLM, amongst other topics.
October 16, 2020 — A guest article by Ido Gus of CEVA CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT. In this article, we'll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deplo…