How CEVA uses TensorFlow Lite for Always-On Speech Recognition on the Edge

Oktober 16, 2020

A guest article by Ido Gus of CEVA

CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT.

In this article, we'll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deploy a speech recognition engine and frontend, called WhisPro, on a bare-metal development board based on our CEVA-BX DSP core. WhisPro detects always-on wake words and speech commands efficiently, on-device.

Figure 1 CEVA Multi-microphone DSP Development Board

About WhisPro

WhisPro is a speech recognition engine and frontend targeted to run on low power, resource constrained edge devices. It is designed to handle the entire data flow from processing audio samples to detection.

WhisPro supports two use cases for edge devices:

Always-on wake word detection engine. In this use case, WhisPro’s role is to wake a device in sleep mode when a predefined phrase is detected.
Speech commands. In this use case, WhisPro’s role is to enable a voice-based interface. Users can control the device using their voice. Typical commands can be: volume up, volume down, play, stop, etc.

WhisPro enables voice interface on any SoC that has a CEVA BX DSP core integrated into it, lowering entry barriers to OEMs and ODM interested in joining the voice interface revolution.

Our Motivation

Originally, WhisPro was implemented using an in-house neural network library called CEVA NN Lib. Although that implementation achieved excellent performance, the development process was quite involved. We realized that, if we ported the TFLM runtime library and optimized it for our target hardware, the entire model porting process would become transparent and more reliable (far fewer lines of code would need to be written, modified, and maintained).

Building TFLM for CEVA-BX DSP Family

The first thing we had to do is to figure out how to port TFLM to our own platform. We found that following this porting to a new platform guide to be quite useful.
Following the guide, we:

Verified DebugLog() implementation is supported by our platform.
Created a TFLM runtime library project in CEVA’s Eclipse-based IDE:
- Created a new CEVA-BX project in CEVA’s IDE
- Added all the required source files to the project
Built the TFLM runtime library for the CEVA-BX core.
This required the usual fiddling with compiler flags, including paths (not all required files are under the “micro” directory), linker script, and so on.

Model Porting Process

Our starting point is a Keras implementation of our model. Let’s look at the steps we took to deploy our model on our bare-metal target hardware:

Converted theTensorFlow model to TensorFlow Lite using the TF built-in converter:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb

```
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.experimental_new_converter = True
tflite_model = converter.convert()
open("converted_to_tflite_model.tflite", "wb").write(tflite_model)
```

Used quantization:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb


```
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen
```

Converted the TensorFlow Lite model to TFLM using xxd:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb
  

```
$> xxd –I model.tflite > model.cc
```

Here we found that some of the model layers (for example, GRU) were not properly supported (at the time) by TFLM. It is very reasonable to assume that, as TFLM continues to mature and Google and the TFLM community invest more in it, issues like this will become rarer. 
In our case, though, we opted to re-implement the GRU layers in terms of Fully Connected layers, which was surprisingly easy.

Integration

The next step was to integrate the TFLM runtime library and the converted model into our existing embedded C frontend, which handles audio preprocessing and feature extraction.

Even though our frontend was not written with TFLM in mind, it was modular enough to allow easy integration by implementation of a single simple wrapper function, as follows:

Linked the TFLM runtime library into our embedded C application (WhisPro frontend)
Implemented a wrapper-over-setup function for mapping the model into a usable data structure, allocating the interpreter and tensors
Implemented a wrapper-over-execute function for mapping data passed from the WhisPro frontend into tflite tensors used by the actual execute function
Replaced the call to the original model execute function with a call to the TFLM implementation

Process Visualization

The process we described is performed by two components:

The microcontroller supplier, in this case, CEVA – is responsible for optimizing TFLM for its hardware architecture.
The microcontroller user, in this case, CEVA WhisPro developer – is responsible for deploying a neural network based model, using an optimized TFLM runtime library, on the target microcontroller.

What’s Next

This work has proven the importance of the TFLM platform to us, and the significant value supporting TFLM can add to our customers and partners by enabling easy neural network model deployment on edge devices. We are committed to further support TFLM on the CEVA-BX DSP family by:

Active contribution to the TFLM project, with the goal of improving layer coverage and overall platform maturity.
Investing in TFLM operator optimization for execution on CEVA-BX cores, aiming for full coverage.

Final Thoughts

While the porting process had some bumps along the way, at the end it was a great success, and took about 4-5 days' worth of work. Implementing a model in C from scratch, and handcrafting model conversion scripts from Python to C, could take 2-3 weeks (and lots of debugging).

CEVA Technology Virtual Seminar

To learn more, you are welcome to watch CEVA’s virtual seminar - Wireless Audio session, covering TFLM, amongst other topics.

How CEVA uses TensorFlow Lite for Always-On Speech Recognition on the Edge

Community · TensorFlow Lite ·

How CEVA uses TensorFlow Lite for Always-On Speech Recognition on the Edge

Oktober 16, 2020 — A guest article by Ido Gus of CEVA CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT. In this article, we'll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deplo…