Supercharging the TensorFlow.js WebAssembly backend with SIMD and multi-threading
9月 02, 2020
Posted by Ann Yuan and Marat Dukhan, Software Engineers at Google

In March we introduced a new WebAssembly (Wasm) accelerated backend for TensorFlow.js (scroll further down to learn more about Wasm and why this is important). Today we are excited to announce a major performance update: as of TensorFlow.js version 2.3.0, our Wasm backend has become up to 10X faster by leveraging SIMD (vector) instructions and multithreading via XNNPACK, a highly optimized library of neural network operators.

Benchmarks

SIMD and multithreading bring major performance improvements to our Wasm backend. Below are benchmarks in Google Chrome that demonstrate the improvements on BlazeFace - a light model with 0.1 million parameters and about 20 million multiply-add operations:

(times listed are milliseconds per inference)

Larger models, such as MobileNet V2, a medium-sized model with 3.5 million parameters and roughly 300 million multiply-add operations, attain even greater speedups:

*Note: Benchmarks for the TF.js multi-threaded Wasm backend are not available for Pixel 4 because multi-threading support in mobile browsers is still a work-in-progress. SIMD support in iOS is also still under development.

**Note: Node support for the TF.js multi-threaded Wasm backend is coming soon.

The performance gains from SIMD and multithreading are independent of each other. These benchmarks show that SIMD brings a 1.7-4.5X performance improvement to plain Wasm, and multithreading brings another 1.8-2.9X speedup on top of that.


Usage

SIMD is supported as of TensorFlow.js 2.1.0, and multithreading is supported as of TensorFlow.js 2.3.0.

At runtime we test for SIMD and multithreading support and serve the appropriate Wasm binary. Today we serve a different binary for each of the following cases:
  • Default: The runtime does not support SIMD or multithreading
  • SIMD: The runtime supports SIMD but not multithreading
  • SIMD + multithreading: The runtime supports SIMD and multithreading
Since most runtimes that support multi-threading also support SIMD, we decided to omit the multi-threading-only runtime to keep our bundle size down. This means that if your runtime supports multithreading but not SIMD, you will be served the default binary. There are two ways to use the Wasm backend:
  1. With NPM
    // Import @tensorflow/tfjs or @tensorflow/tfjs-core
    const tf = require('@tensorflow/tfjs');
    // Add the WAsm backend to the global backend registry.
    require('@tensorflow/tfjs-backend-wasm');
     
    // Set the backend to WAsm and wait for the module to be ready.
    tf.setBackend('wasm').then(() => main());
    The library expects the Wasm binaries to be located relative to the main JS file. If you’re using a bundler such as parcel or webpack, you may need to manually indicate the location of the Wasm binaries with our setWasmPaths helper:
    import {setWasmPaths} from '@tensorflow/tfjs-backend-wasm';
    setWasmPaths(yourCustomFolder);
    tf.setBackend('wasm').then(() => {...});
    See the “Using bundlers” section in our README for more information.
  2. With script tags
    <!-- Import @tensorflow/tfjs or @tensorflow/tfjs-core -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
     
    <!-- Adds the WAsm backend to the global backend registry -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm/dist/tf-backend-wasm.js"></script>
     
    <script>
      tf.setBackend('wasm').then(() => main());
    </script>
    NOTE: TensorFlow.js defines a priority for each backend and will automatically choose the best supported backend for a given environment. Today, WebGL has the highest priority, followed by Wasm, then the vanilla JS backend. To always use the Wasm backend, we need to explicitly call tf.setBackend(‘wasm’).

Demo

To see the performance improvements for yourself, check out this demo of our BlazeFace model, which has been updated to use the new Wasm backend: https://tfjs-wasm-simd-demo.netlify.app/ To compare against the unoptimized binary, try this version of the demo, which manually turns off SIMD and multithreading support.

What is Wasm?

WebAssembly (Wasm) is a cross-browser binary format that brings near-native code execution speed to the web. Wasm serves as a compilation target for programs written in statically typed high level languages, such as C, C++, Go, and Rust. In TensorFlow.js, we implement our Wasm backend in C++ and compile with Emscripten. The XNNPACK library provides a heavily optimized implementation of neural network operators underneath.

Wasm has been supported by Chrome, Safari, Firefox, and Edge since 2017, and is supported by 90% of devices worldwide.

The WebAssembly specification is evolving quickly and browsers are working hard to support a growing number of experimental features. You can visit this site to see which features are supported by your runtime, including:
  1. SIMD
    SIMD stands for Single Instruction, Multiple Data, which means that SIMD instructions operate on small fixed-size vectors of elements rather than individual scalars. The Wasm SIMD proposal makes the SIMD instructions supported by modern processors usable inside Web browsers, unlocking significant performance gains.

    Wasm SIMD is a phase 3 proposal, and is available via an origin trial in Chrome 84-86. This means developers can opt in their websites to Wasm SIMD and all their visitors will enjoy its benefits without needing to explicitly enable the feature in their browser settings. Besides Google Chrome, Firefox Nightly supports Wasm SIMD by default.

  2. Multi-threading
    Nearly all modern processors have multiple cores, each of which is able to carry out instructions independently and concurrently. WebAssembly programs can spread their work across cores via the threads proposal for performance. This proposal allows multiple Wasm instances in separate web workers to share a single WebAssembly.Memory object for fast communication between workers.

    Wasm threads is a phase 2 proposal, and has been available in Chrome desktop by default since version 74. There is an ongoing cross-browser effort to enable this functionality for mobile devices as well.
To see which browsers support SIMD, threads, and other experimental features, check out the WebAssembly roadmap.

Other improvements

Since the original launch of our Wasm backend in March, we have extended operator coverage and now support over 70 operators. Many of the new operators are accelerated through the XNNPACK library, and unlock support for additional models, like the HandPose model.

Looking ahead

We expect the performance of our Wasm backend to keep improving. We’re closely following the progress of several evolving specifications in WebAssembly, including flexible vectors for wider SIMD, quasi fused multiply-add, and pseudo-minimum and maximum instructions. We’re also looking forward to ES6 module support for WebAssembly modules. As with SIMD and multithreading, we intend to take advantage of these features as they become available with no implications for TF.js user code.

More information

Acknowledgements

We would like to thank Daniel Smilkov and Nikhil Thorat for laying the groundwork of the WebAssembly backend and the integration with XNNPACK, Matsvei Zhdanovich for collecting Pixel 4 benchmark numbers, and Frank Barchard for implementing low-level Wasm SIMD optimizations in XNNPACK.
Next post
 Supercharging the TensorFlow.js WebAssembly backend with SIMD and multi-threading

Posted by Ann Yuan and Marat Dukhan, Software Engineers at Google

In March we introduced a new WebAssembly (Wasm) accelerated backend for TensorFlow.js (scroll further down to learn more about Wasm and why this is important). Today we are excited to announce a major performance update: as of TensorFlow.js version 2.3.0, our Wasm backend has become up to 10X faster by leveraging SIMD (vector) inst…