Introducing the WebAssembly backend for TensorFlow.js
March 11, 2020
Posted by Daniel Smilkov, Nikhil Thorat, and Ann Yuan, Software Engineers at Google

We’re happy to announce that TensorFlow.js now provides a WebAssembly (WASM) backend for both the browser and Node.js! This backend is an alternative to the WebGL backend, bringing fast CPU execution with minimal code changes. It helps improve performance on a broader set of devices, especially lower-end mobile devices that lack WebGL support or have a slow GPU. It uses the XNNPACK library to accelerate the operations.

Installation

There are two ways to use the new WASM backend:
  1. With NPM
    // Import @tensorflow/tfjs or @tensorflow/tfjs-core
    const tf = require('@tensorflow/tfjs');
    // Add the WASM backend to the global backend registry.
    require('@tensorflow/tfjs-backend-wasm');
     
    // Set the backend to WASM and wait for the module to be ready.
    tf.setBackend('wasm').then(() => main());
    The library expects the WASM binary to be relative to the main JS file. If you’re using a bundler such as parcel or webpack, you may need to manually indicate the location of the WASM binary with our setWasmPath helper:
    import {setWasmPath} from '@tensorflow/tfjs-backend-wasm';
    setWasmPath(yourCustomPath);
    tf.setBackend('wasm').then(() => {...});
    See the “Using bundlers” section in our README for more information.
  2. With script tags
    <!-- Import @tensorflow/tfjs or @tensorflow/tfjs-core -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
     
    <!-- Adds the WASM backend to the global backend registry -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm/dist/tf-backend-wasm.js"></script>
     
    <script>
      tf.setBackend('wasm').then(() => main());
    </script>
    NOTE: TensorFlow.js defines a priority for each backend and will automatically choose the best supported backend for a given environment. Today, WebGL has the highest priority, followed by WASM, then the vanilla JS backend. To always use the WASM backend, you need to explicitly call `tf.setBackend('wasm')`.
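
If you would rather pick the backend order yourself than rely on the built-in priority, a small helper along these lines may be useful. This is a minimal sketch, not part of the library: the name `setFirstAvailableBackend` and the default order are assumptions for illustration, and `'cpu'` is the name of the vanilla JS backend. It relies only on `tf.setBackend`, which resolves to a boolean, and `tf.getBackend`.

```javascript
// Try backends in order and return the name of the first one that initializes.
// `tf` is passed in so the helper works with @tensorflow/tfjs or
// @tensorflow/tfjs-core. The default candidate order is an assumption.
async function setFirstAvailableBackend(tf, candidates = ['wasm', 'cpu']) {
  for (const name of candidates) {
    try {
      // setBackend resolves to true when the backend initialized successfully.
      if (await tf.setBackend(name)) {
        return tf.getBackend();
      }
    } catch (err) {
      // A backend name that was never registered throws; try the next one.
    }
  }
  throw new Error(`No backend could be initialized: ${candidates.join(', ')}`);
}
```

With the NPM setup above, it could be called as `setFirstAvailableBackend(tf).then(() => main());`.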

Demo

Check out the face detection demo (using the MediaPipe BlazeFace model) that runs on the WASM backend. For more details about the model, see this blog post.

Why WASM?

WASM is a cross-browser, portable assembly and binary format for the web that brings near-native code execution speed to the browser. It was introduced in 2015 as a new web-based binary format, giving programs written in C, C++, or Rust a compilation target for running on the web. WASM has been supported by Chrome, Safari, Firefox, and Edge since 2017, and is supported by 90% of devices worldwide.

Performance

Versus JavaScript: WASM is generally much faster than JavaScript for numeric workloads common in machine learning tasks. WASM can also be natively decoded up to 20x faster than JavaScript can be parsed. JavaScript is dynamically typed and garbage collected, which can cause significant non-deterministic slowdowns at runtime. In addition, modern JavaScript libraries (such as TensorFlow.js) use compilation tools like TypeScript and ES6 transpilers that generate ES5 code (for wide browser support), which is slower to execute than vanilla ES6 JavaScript.

Versus WebGL: For most models, the WebGL backend will still outperform the WASM backend. However, WASM can be faster for ultra-lite models (less than 3MB and 60M multiply-adds), where the benefits of GPU parallelization are outweighed by the fixed overhead costs of executing WebGL shaders. Below we provide guidelines for finding this line. In addition, there is a WASM extension proposal to add SIMD instructions, allowing multiple floating point operations to be vectorized and executed in parallel. Preliminary tests show that enabling these extensions brings a 2-3x speedup over WASM today. Keep an eye out for this to land in browsers! It will automatically be turned on for TensorFlow.js.
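
The "ultra-lite" thresholds above can be turned into a first guess at which backend to try. The sketch below is only a rule of thumb built from the two numbers in this post. The function name and the shape of its argument are hypothetical, and actual timings on your target devices should have the final say.

```javascript
// Rule-of-thumb thresholds taken from the text above.
const ULTRA_LITE_MAX_SIZE_MB = 3;
const ULTRA_LITE_MAX_MULTIPLY_ADDS = 60e6;

// Hypothetical helper: suggest a backend from model size and compute cost.
// Returns 'wasm' for ultra-lite models and 'webgl' otherwise.
function suggestBackend({ sizeMB, multiplyAdds }) {
  const isUltraLite =
    sizeMB < ULTRA_LITE_MAX_SIZE_MB &&
    multiplyAdds < ULTRA_LITE_MAX_MULTIPLY_ADDS;
  return isUltraLite ? 'wasm' : 'webgl';
}

// Example with made-up numbers for a small and a larger model.
console.log(suggestBackend({ sizeMB: 0.5, multiplyAdds: 20e6 }));
console.log(suggestBackend({ sizeMB: 14, multiplyAdds: 300e6 }));
```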

Portability and Stability

When it comes to machine learning, numerical precision matters. WASM natively supports floating point arithmetic, whereas the WebGL backend requires the OES_texture_float extension. Not all devices support this extension, which means a GPU-accelerated TensorFlow.js isn’t supported on some devices (e.g. older mobile devices where WASM is supported).

Moreover, GPU drivers can be hardware-specific, and different devices can have precision problems. On iOS, 32-bit floats aren’t supported on the GPU, so we fall back to 16-bit floats, causing precision problems. In WASM, computation will always happen in 32-bit floats and thus have precision parity across all devices.
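
To get a feel for the gap between 16-bit and 32-bit floats, the sketch below rounds a number to the nearest value each format can represent. `Math.fround` is the built-in 32-bit rounding. `toFloat16Precision` is a hypothetical helper written for this illustration that simulates the IEEE 754 half-precision format in plain JavaScript. It says nothing about how a particular GPU actually computes.

```javascript
// Round to the nearest integer, breaking ties toward the even neighbor
// (the default IEEE 754 rounding mode).
function roundHalfEven(v) {
  const floor = Math.floor(v);
  const diff = v - floor;
  if (diff > 0.5) return floor + 1;
  if (diff < 0.5) return floor;
  return floor % 2 === 0 ? floor : floor + 1;
}

// Hypothetical helper: round a number to the nearest value a 16-bit float
// (1 sign bit, 5 exponent bits, 10 fraction bits) can represent.
function toFloat16Precision(x) {
  if (x === 0 || !Number.isFinite(x)) return x;
  const a = Math.abs(x);
  // Find the exponent e with 2^e <= a < 2^(e+1).
  let e = 0;
  while (2 ** (e + 1) <= a) e++;
  while (2 ** e > a) e--;
  // Below 2^-14, float16 values are subnormal and share a fixed spacing.
  e = Math.max(e, -14);
  const step = 2 ** (e - 10); // gap between neighboring float16 values
  const rounded = roundHalfEven(a / step) * step;
  // 65504 is the largest finite float16 value.
  return Math.sign(x) * (rounded > 65504 ? Infinity : rounded);
}

console.log(Math.fround(0.1));         // 0.1 stored as a 32-bit float
console.log(toFloat16Precision(0.1));  // 0.1 at 16-bit precision
console.log(toFloat16Precision(2049)); // not every integer above 2048 is exact
```

The 16-bit format keeps only about three significant decimal digits, which is why small errors can accumulate across the layers of a model.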

When should I use WASM?

In general, WASM is a good choice when models are smaller, when you care about wide device support, or when your project is sensitive to numerical stability. WASM, however, doesn’t have feature parity with our WebGL backend. If you are using the WASM backend and need an op to be implemented, feel free to file an issue on GitHub. To address the needs of production use cases, we prioritized inference over training support. For training models in the browser, we recommend using the WebGL backend.

In Node.js, the WASM backend is a great solution for devices that don’t support the TensorFlow binary, or for cases where you don’t want to build it from source.

The table below shows inference times (in milliseconds) in Chrome on a 2018 MacBook Pro (Intel i7 2.2GHz, Radeon 555X) for several of our officially supported models across the WebGL, WASM, and plain JS (CPU) backends.
We observe the WASM backend to be 10-30x faster than the plain JS (CPU) backend across our models. Comparing WASM to WebGL, there are two main takeaways:
  1. WASM is on par with, or faster than, WebGL for ultra-lite models like MediaPipe’s BlazeFace and FaceMesh.
  2. WASM is 2-4X slower than WebGL for medium-sized edge models like MobileNet, BodyPix and PoseNet.
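
Since the crossover point depends on the model and the device, it is worth timing your own model on each backend. The helper below is a rough sketch under stated assumptions: `timeBackends` is a hypothetical name, and `runInference` is a function you supply that runs one inference and waits for the result (for example by awaiting `.data()` on the output tensor). The first run on each backend is treated as warm-up and excluded.

```javascript
// Hypothetical helper: average inference time in milliseconds per backend.
// `tf` is the TensorFlow.js module. `runInference` is a user-supplied async
// function that performs one inference and awaits its output.
async function timeBackends(tf, runInference,
                            backends = ['webgl', 'wasm', 'cpu'], runs = 10) {
  const now = () =>
    (typeof performance !== 'undefined' ? performance.now() : Date.now());
  const results = {};
  for (const name of backends) {
    let ok = false;
    try {
      ok = await tf.setBackend(name);
    } catch (err) {
      ok = false; // backend not registered in this environment
    }
    if (!ok) {
      results[name] = null; // backend unavailable, nothing to measure
      continue;
    }
    await runInference(); // warm-up run, not timed
    const start = now();
    for (let i = 0; i < runs; i++) {
      await runInference();
    }
    results[name] = (now() - start) / runs;
  }
  return results;
}
```

A call might look like `timeBackends(tf, () => model.predict(input).data()).then(console.log);`, where `model` and `input` are your own loaded model and input tensor.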

Looking ahead

We believe WASM will be an increasingly preferred backend. In the last year we have seen a wave of production-quality ultra-lite models designed for edge devices (e.g. MediaPipe’s BlazeFace and FaceMesh), for which the WASM backend is ideally suited.

In addition, new extensions such as SIMD and threads are actively being developed, which will enable further acceleration in the future.

SIMD / QFMA

There is a WASM extension proposal to add SIMD instructions. Today, Chrome has partial support for SIMD under an experimental flag, support in Firefox and Edge is in development, and Safari hasn’t given any public signal. SIMD is hugely promising. Benchmarks with SIMD-WASM on popular ML models show a 2-3X speedup over non-SIMD WASM.

In addition to the original SIMD proposal, the LLVM WASM backend recently got support for experimental QFMA SIMD instructions that should further improve performance of kernels. Benchmarks on popular ML models show QFMA SIMD giving an additional 26-50% speedup over regular SIMD.

The TF.js WASM backend will take advantage of SIMD through the XNNPACK library, which includes optimized micro-kernels for WASM SIMD. When SIMD lands, this will be invisible to the TensorFlow.js user.

Multithreading

The WASM spec recently got a threads and atomics proposal with the goal of speeding up multi-threaded applications. The proposal is at an early stage, meant to seed a future W3C Working Group. Notably, Chrome 74+ has support for WASM threads enabled by default.

When the threading proposal lands, we will be ready to take advantage of threads through the XNNPACK library with no changes to TensorFlow.js user code.

More information

  • If you are interested in learning more, you can read our WebAssembly guide.
  • Learn more about WebAssembly by checking out this collection of resources by the Mozilla Developer Network.
  • We’d appreciate your feedback and contributions via issues and PRs on GitHub!