Posted by Ashwin Murthy, Software Engineer, TensorFlow team @ Google
Overview
Efficiency and performance are critical for edge deployments. TensorFlow Lite achieves this by fusing and optimizing the series of more granular TensorFlow operations that make up a composite operation (such as an LSTM) into a single executable TensorFlow Lite unit.
Many users have asked us for more granular control of the way operations can be fused to achieve greater performance improvements. Today, we are delivering just that by providing users with the ability to specify how operations can be fused.
Furthermore, this new capability allows for seamless conversion of TensorFlow Keras LSTM operations—one of our most requested features. And to top it off, you can now plug in a user-defined RNN conversion to TensorFlow Lite!
Fused operations are more efficient
As mentioned earlier, TensorFlow operations are typically composed of a number of primitive, more granular operations, such as tf.add. This composition is important for reusability: it lets users build new operations out of existing units. An example of a composite operation is tf.einsum. Executing a composite operation is equivalent to executing each of its constituent operations.
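As a small illustration of that equivalence (our own sketch, not from the post), the matrix-product case of tf.einsum can be reproduced with more granular primitives:

import tensorflow as tf

a = tf.random.normal([2, 3])
b = tf.random.normal([3, 4])

# The composite operation...
composite = tf.einsum('ij,jk->ik', a, b)

# ...is equivalent to executing its constituent primitives:
# an elementwise multiply followed by a reduction.
constituent = tf.reduce_sum(a[:, :, tf.newaxis] * b[tf.newaxis, :, :], axis=1)

assert tf.reduce_max(tf.abs(composite - constituent)) < 1e-5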
However, with efficiency in mind, it is common to “fuse” the computation of a set of more granular operations into a single operation.
Another use for fused operations is providing a higher level interface to define complex transformations like quantization, which would otherwise be infeasible or very hard to do at a more granular level.
Concrete examples of fused operations in TensorFlow Lite include various RNN operations like Unidirectional and Bidirectional sequence LSTM, convolution (conv2d, bias add, relu), fully connected (matmul, bias add, relu) and more.
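The convolution case can be seen directly in ordinary TensorFlow code; here is a minimal sketch (ours, with illustrative shapes):

import tensorflow as tf

@tf.function
def conv_block(x, filters, bias):
    # Three granular TensorFlow operations: convolution, bias add,
    # and activation.
    y = tf.nn.conv2d(x, filters, strides=1, padding='SAME')
    y = tf.nn.bias_add(y, bias)
    return tf.nn.relu(y)

x = tf.random.normal([1, 8, 8, 3])         # NHWC input
filters = tf.random.normal([3, 3, 3, 16])  # HWIO filter
bias = tf.random.normal([16])
y = conv_block(x, filters, bias)

After conversion, TensorFlow Lite can execute all three steps as a single convolution operation with a built-in ReLU activation, avoiding two intermediate tensors.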
Fusing TensorFlow operations into TensorFlow Lite operations has historically been challenging, until now!
Out-of-the-box RNN conversion and other composite operation support
Out-of-the-box RNN conversion
We now support conversion of Keras LSTM and Keras Bidirectional LSTM, both of which are composite TensorFlow operations. This is the simplest way for RNN-based models to take advantage of the efficient fused LSTM operations in TensorFlow Lite. See this notebook for end-to-end Keras LSTM to TensorFlow Lite conversion and execution via the TensorFlow Lite interpreter.
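A minimal sketch of that flow, with illustrative layer sizes (the linked notebook is the authoritative end-to-end version):

import tensorflow as tf

# A Keras model containing a composite LSTM operation. The shapes
# and layer sizes here are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28), batch_size=1),
    tf.keras.layers.LSTM(20),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Convert; per this post, the Keras LSTM maps to TensorFlow Lite's
# fused LSTM operation.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Run the converted model with the TensorFlow Lite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()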
Furthermore, we enabled conversion of any other TensorFlow RNN implementation by providing a convenient interface to the conversion infrastructure. You can see a couple of examples of this capability in lingvo’s LSTMCellSimple and LayerNormalizedLSTMCellSimple RNN implementations.
For more information, please see our RNN conversion documentation.
Note: We are working on adding quantization support for TensorFlow Lite’s LSTM operations. This will be announced in the future.
Extending conversion to other composite operations
We extended the TensorFlow Lite converter to enable conversion of other composite TensorFlow operations into existing or custom TensorFlow Lite operations.
The following steps are needed to implement a TensorFlow operation fusion to TensorFlow Lite:
- Wrap the composite operation in a tf.function. In the TensorFlow model source code, identify and abstract out the composite operation into a tf.function with the experimental_implements annotation, as shown in the sketch after this list.
- Write conversion code. Conceptually, the conversion code replaces the composite implementation of this interface with the fused one. Plug your conversion code into the prepare-composite-functions pass.
- Invoke the TensorFlow Lite converter. Use the TFLiteConverter.from_saved_model API to convert to TensorFlow Lite.
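A minimal sketch of steps 1 and 3, assuming a hypothetical composite operation named "my_composite_op" (step 2 happens inside the converter's MLIR passes, so it is not shown here):

import tensorflow as tf

# Step 1: mark the composite operation so the converter can
# recognize it. The name 'my_composite_op' and the body below are
# hypothetical; your conversion pass must look for the same name.
@tf.function(experimental_implements="my_composite_op")
def my_composite_op(x, y):
    # Granular constituent operations of the composite.
    return tf.nn.relu(tf.add(x, y))

class Module(tf.Module):
    @tf.function(input_signature=[
        tf.TensorSpec([1, 8], tf.float32),
        tf.TensorSpec([1, 8], tf.float32),
    ])
    def __call__(self, x, y):
        return my_composite_op(x, y)

tf.saved_model.save(Module(), '/tmp/composite_model')

# Step 3: convert via the SavedModel path.
converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/composite_model')
tflite_model = converter.convert()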
For the overall architecture of this infrastructure, see here. For detailed steps with code examples, see here. To learn how operation fusion works under the hood, see the detailed documentation.
Feedback
Please email tflite@tensorflow.org or create a GitHub issue with the component label “TFLiteConverter”.
Acknowledgements
This work would not have been possible without the efforts of Renjie Liu, a key collaborator on this project since its inception. We would like to thank Raziel Alvarez for his leadership and guidance. We would like to thank Jaesung Chung, Scott Zhu, Sean Silva, Mark Sandler, Andrew Selle, Qiao Liang and River Riddle for important contributions. We would like to acknowledge Sarah Sirajuddin, Jared Duke, Lawrence Chan, Tim Davis and the TensorFlow Lite team as well as Tatiana Shpeisman, Jacques Pienaar and the Google MLIR team for their active support of this work.