Sharing our Experience Upgrading OpenNMT to TensorFlow 2.0
November 15, 2019

A guest post by Guillaume Klein, research engineer at SYSTRAN.

OpenNMT-tf is a neural machine translation toolkit for TensorFlow released in 2017. At that time, the project used many features and capabilities offered by TensorFlow: training and evaluation with tf.estimator, variable scopes, graph collections, tf.contrib, etc. We enjoyed using these features together for more than 2 years.

We spent the last few months fully transitioning more than 10,000 lines of code from TensorFlow 1.0 to TensorFlow 2.0 and its new recommended practices. We thought we'd write this article to share our experience with the upgrade, and a few reasons you may wish to consider doing so as well.

Transitioning iteratively

Upgrading any large piece of software is long and costly, especially when users rely on it, as they do on OpenNMT-tf. We therefore planned this transition early, shortly after the first TensorFlow 2.0 announcement, when the outline of the release was starting to take shape.

We chose a bottom-up approach to iteratively update all usages of the TensorFlow API. In particular, we regularly released code that was compatible with both TensorFlow 1.0 and 2.0 and set up automated tests accordingly. The compatibility modules tf.compat.v1 and tf.compat.v2 greatly helped during this iterative process.

Following TensorFlow 2.0 best practices, we decided to move away from tf.estimator, even though this required a large code redesign. Fortunately, we found it relatively easy to write a custom training loop that still met our performance requirements (with tf.function) and supported advanced features such as multi-GPU training (with tf.distribute) and mixed precision training (with the automatic mixed precision graph rewrite).

Gradient accumulation is another feature frequently used to train state-of-the-art sequence-to-sequence models such as the Transformer. It is now simple to implement in TensorFlow 2.0, thanks to eager execution. In our custom training loop, we split the training step into 2 functions: "forward" (accumulate the gradients) and "step" (apply the gradients).

Here’s how they look in OpenNMT-tf.

Forward:
import tensorflow as tf

# model, optimizer, dataset and accum_steps are defined elsewhere.
# One non-trainable tf.Variable per gradient is added on the first call.
accum_gradients = []

@tf.function
def forward(source, target):
    """Forwards a training example into the model, computes the
    loss, and accumulates the gradients.
    """
    logits = model(source, target, training=True)
    loss = model.compute_loss(logits, target)
    gradients = optimizer.get_gradients(loss, model.trainable_variables)
    if not accum_gradients:
        # Initialize the variables to accumulate the gradients.
        accum_gradients.extend([
            tf.Variable(tf.zeros_like(gradient), trainable=False)
            for gradient in gradients])
    for accum_gradient, step_gradient in zip(accum_gradients, gradients):
        accum_gradient.assign_add(step_gradient)
    return loss

Step:
@tf.function
def step():
    """Applies the accumulated gradients and advances the training step."""
    grads_and_vars = [
        (gradient / accum_steps, variable)
        for gradient, variable in zip(accum_gradients, model.trainable_variables)]
    optimizer.apply_gradients(grads_and_vars)
    # Reset the accumulators for the next accumulation window.
    for accum_gradient in accum_gradients:
        accum_gradient.assign(tf.zeros_like(accum_gradient))

for i, (source, target) in enumerate(dataset):
    forward(source, target)
    # Apply gradients every accum_steps examples.
    if (i + 1) % accum_steps == 0:
        step()

In practice, there are some additional details when running with distribution strategies, but the overall logic is the same.
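Why does step() divide by accum_steps? Within one accumulation window the weights do not change, so averaging the gradients of several micro-batches gives the same update as a single large batch. This only holds if each micro-batch loss is a mean over the same number of examples. The pure-Python sketch below checks this exactly, using fractions to avoid rounding. It uses a hypothetical one-parameter model with a mean squared error loss. It mirrors the forward()/step() split but is not OpenNMT-tf code, and the helper names are illustrative.

```python
from fractions import Fraction


def grad_mse(w, xs, ys):
    """Exact d/dw of mean((w * x - y) ** 2) over one batch."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / Fraction(len(xs))


def accumulated_gradient(w, micro_batches):
    """Sum per-micro-batch gradients (like forward()), then divide by the
    number of micro-batches (like step())."""
    total = Fraction(0)
    for xs, ys in micro_batches:
        total += grad_mse(w, xs, ys)
    return total / len(micro_batches)


def full_batch_gradient(w, micro_batches):
    """Gradient of the mean loss over all examples taken as one large batch."""
    xs = [x for batch_xs, _ in micro_batches for x in batch_xs]
    ys = [y for _, batch_ys in micro_batches for y in batch_ys]
    return grad_mse(w, xs, ys)


w = Fraction(1, 2)
# The same four (x, y) examples, split 2 + 2 and then 1 + 3.
equal_sized = [([1, 2], [1, 3]), ([3, 4], [2, 5])]
unequal_sized = [([1], [1]), ([2, 3, 4], [3, 2, 5])]

print(accumulated_gradient(w, equal_sized) == full_batch_gradient(w, equal_sized))      # True
print(accumulated_gradient(w, unequal_sized) == full_batch_gradient(w, unequal_sized))  # False
```

Only the 1 + 3 split breaks the equality. Averaging per-batch means gives the lone example of the first micro-batch three times the weight of each example in the second. With variable-size batches, weighting each micro-batch gradient by its number of examples (or tokens, for a per-token loss) before dividing by the total restores the equivalence.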

Managing changes

After refactoring your model in terms of Keras layers (or tf.Module) and switching to tf.train.Checkpoint to load and save checkpoints, you will likely break compatibility with existing checkpoints. To mitigate this change in OpenNMT-tf, we silently convert old checkpoints on load with the following process:
  1. Load the V1 variables with tf.train.load_checkpoint
  2. Create a new V2 model
  3. Map V1 variable names to V2 variables in the model
  4. Assign the values of the V1 variables and optimizer slots to their V2 equivalents
  5. Save the V2 model with tf.train.Checkpoint
With this approach, the workflow of existing users is unchanged and they can continue their training as before.
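Steps 3 and 4 can be sketched in plain Python. In this minimal sketch, dictionaries stand in for the real objects. `v1_values` plays the role of the tensors read with tf.train.load_checkpoint. `v2_variables` stands for the freshly created V2 model variables and optimizer slots, which a real implementation would fill with `Variable.assign`. The function names and the renaming rules in `example_name_map` are hypothetical, and are not OpenNMT-tf's actual mapping. This includes the assumption that V1 Adam slots end in `/Adam` and `/Adam_1`.

```python
from typing import Callable, Dict, List

Values = Dict[str, List[float]]


def convert_v1_checkpoint(v1_values: Values, v2_variables: Values,
                          name_map: Callable[[str], str]) -> Values:
    """Return a copy of v2_variables where each V1 value has been assigned
    to the V2 variable (or optimizer slot) that name_map points to."""
    converted = {name: list(value) for name, value in v2_variables.items()}
    for v1_name, value in v1_values.items():
        v2_name = name_map(v1_name)
        if v2_name not in converted:
            raise KeyError(f"no V2 variable for V1 variable {v1_name!r}")
        if len(converted[v2_name]) != len(value):
            raise ValueError(f"size mismatch for V1 variable {v1_name!r}")
        converted[v2_name] = list(value)
    return converted


def example_name_map(v1_name: str) -> str:
    """Hypothetical renaming rules, for illustration only."""
    if v1_name.endswith("/Adam_1"):   # assumed V1 second-moment slot
        return "optimizer/" + v1_name[: -len("/Adam_1")] + "/v"
    if v1_name.endswith("/Adam"):     # assumed V1 first-moment slot
        return "optimizer/" + v1_name[: -len("/Adam")] + "/m"
    return "model/" + v1_name


v1 = {
    "encoder/dense/kernel": [0.1, 0.2],
    "encoder/dense/kernel/Adam": [0.01, 0.02],
    "encoder/dense/kernel/Adam_1": [0.001, 0.002],
}
v2 = {
    "model/encoder/dense/kernel": [0.0, 0.0],
    "optimizer/encoder/dense/kernel/m": [0.0, 0.0],
    "optimizer/encoder/dense/kernel/v": [0.0, 0.0],
}
converted = convert_v1_checkpoint(v1, v2, example_name_map)
print(converted["optimizer/encoder/dense/kernel/m"])  # [0.01, 0.02]
```

This sketch deliberately raises on an unmapped name or a size mismatch. A conversion that runs silently for users should fail loudly rather than load mismatched weights.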

Getting involved in the TensorFlow development

During this development, we found a few bugs and incomplete features in the preview versions of TensorFlow 2.0. The TensorFlow team was always helpful, and issues were generally resolved within a couple of days. Thanks to early adopters, there are now many resources about TensorFlow 2.0 that should help others upgrade.

By getting involved, we wanted to ensure that sequence-to-sequence use cases could run efficiently in TensorFlow 2.0. This includes participating in the maintenance of the tensorflow_addons.seq2seq module, the TensorFlow 2.0 equivalent of tf.contrib.seq2seq.

Why upgrade to TensorFlow 2.0?

Upgrading to TensorFlow 2.0 was both fun and challenging. Depending on your project, it can be a long process, but the outcome is largely positive: the code is noticeably simpler and more extensible. Developers will have an easier time maintaining their project and adding new features. Users will better understand the execution path thanks to eager mode, and will be more likely to contribute back.

As library developers, with TensorFlow 2.0 we expect to iterate faster on new developments and provide a more consistent experience to our users. We are also excited to see the ecosystem grow with more reusable and composable modules developed by the community, for example in TensorFlow Addons.

You can learn more about OpenNMT-tf at https://github.com/OpenNMT/OpenNMT-tf

About OpenNMT

OpenNMT is an open source ecosystem for neural machine translation and neural sequence generation. It features 2 main implementations, OpenNMT-tf and OpenNMT-py, powered by TensorFlow and PyTorch respectively. The project also includes components that cover other aspects of the NMT workflow, such as the Tokenizer, a text tokenization library, and CTranslate2, a custom C++ inference engine.