How Rasa Open Source Gained Layers of Flexibility with TensorFlow 2.x
December 16, 2020

A guest post by Vincent D. Warmerdam and Vladimir Vlasov, Rasa


Rasa logo with bird

At Rasa, we are building infrastructure for conversational AI, used by developers to build chat- and voice-based assistants. Rasa Open Source, our cornerstone product offering, provides a framework for NLU (Natural Language Understanding) and dialogue management. On the NLU side we offer models that handle intent classification and entity detection using models built with Tensorflow 2.x.

In this article, we would like to discuss the benefits of migrating to the latest version of TensorFlow and also give insight into how some of the Rasa internals work.

A Typical Rasa Project Setup

When you’re building a virtual assistant with Rasa Open Source, you’ll usually begin by defining stories, which represent conversations users might have with your agent. These stories will serve as training data and you can configure them as yaml files. If we pretend that we’re making an assistant that allows you to buy pizzas online then we might have stories in our configuration that look like this:

yaml
version: "2.0"

stories:

- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy

- story: purchase path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: purchase
    entities: 
product: “pizza”
  - action: confirm_purchase
  - intent: affirm
  - action: confirm_availability

These stories consist of intents and actions. Actions can be simple text replies, or they can trigger custom Python code (that checks a database, for instance). To define training data for each intent, you supply the assistant with example user messages, which might look something like:

yaml
version: "2.0"

nlu:
- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - hello there
    - good morning

- intent: purchase
  examples: |
    - i’d like to buy a [veggie pizza](product) for [tomorrow](date_ref)
    - i want to order a [pizza pepperoni](product)
    - i’d want to buy a [pizza](product) and a [cola](product)
- ...

When you train an assistant using Rasa you’ll supply configuration files like those shown above. You can be very expressive in the types of conversations your agent can handle. Intents and actions are like lego bricks and can be combined expressively to cover many conversational paths. Once these files are defined they are combined to create a training dataset that the agent will learn from.

Rasa allows users to build custom machine learning pipelines to fit their datasets. That means you can incorporate your own (pre-trained) models for natural language understanding if you’d like. But Rasa also provides models, written in TensorFlow, that are specialized for these tasks.

Specific Model Requirements

You may have noticed that our examples include not just intents but also entities. When a user is interested in making a purchase, they (usually) also say what they’re interested in buying. This information needs to be detected when the user provides it. It’d be a bad experience if we needed to supply the user with a form to retrieve this information.

intent greet vs intent purchase

If you take a step back and think about what kind of model could work well here, you’ll soon recognize that it’s not a standard task. It’s not just that we have numerous labels at each utterance; we have multiple *types* of labels too. That means that we need models that have two outputs.

Types of labels texts to tokens

Rasa Open Source offers a model that can detect both intents and entities, called DIET. It uses a transformer architecture that allows the system to learn from the interaction between intents and entities. Because it needs to handle these two tasks at once, the typical machine learning pattern won’t work:

model.fit(X, y).predict(X)

You need a different abstraction.

Abstraction

This is where TensorFlow 2.x has made an improvement to the Rasa codebase. It is now much easier to customize TensorFlow classes. In particular, we’ve made a custom abstraction on top of Keras to suit our needs. One example of this is Rasa’s own internal `RasaModel.` We’ve added the base class’s signature below. The full implementation can be found here.

class RasaModel(tf.keras.models.Model):

	def __init__(
    	self,
    	random_seed: Optional[int] = None,
    	tensorboard_log_dir: Optional[Text] = None,
    	tensorboard_log_level:Optional[Text] = "epoch",
    	**kwargs,
	) -> None:
    	...

	def fit(
    	self,
    	model_data: RasaModelData,
    	epochs: int,
    	batch_size: Union[List[int], int],
    	evaluate_on_num_examples: int,
    	evaluate_every_num_epochs: int,
    	batch_strategy: Text,
    	silent: bool = False,
    	eager: bool = False,
	) -> None:
		...

This object is customized to allow us to pass in our own `RasaModelData` object. The benefit is that we can keep all the existing features that the Keras model object offers while we can override a few specific methods to suit our needs. We can run the model with our preferred data format while maintaining manual control over “eager mode,” which helps us debug.

These Keras objects are now a central API in TensorFlow 2.x, which made it very easy for us to integrate and customize.

Training Loop

To give another impression of how the code became simpler, let’s look at the training loop inside the Rasa model.

Python Pseudo-Code for TensorFlow 1.8

We’ve got a part of the code used for our old training loop listed below (see here for the full implementation). Note that it is using `session.run` to calculate the loss as well as the accuracy.

def train_tf_dataset(
	train_init_op: "tf.Operation",
	eval_init_op: "tf.Operation",
	batch_size_in: "tf.Tensor",
	loss: "tf.Tensor",
	acc: "tf.Tensor",
	train_op: "tf.Tensor",
	session: "tf.Session",
	epochs: int,
	batch_size: Union[List[int], int],
	evaluate_on_num_examples: int,
	evaluate_every_num_epochs: int,
)
	session.run(tf.global_variables_initializer())
	pbar = tqdm(range(epochs),desc="Epochs", disable=is_logging_disabled())

for ep in pbar:
  ep_batch_size=linearly_increasing_batch_size(ep, batch_size, epochs)
   session.run(train_init_op, feed_dict={batch_size_in: ep_batch_size})

    ep_train_loss = 0
    ep_train_acc = 0
    batches_per_epoch = 0
    while True:
    	  try:
        	_, batch_train_loss, batch_train_acc = session.run(
            	[train_op, loss, acc])
        	batches_per_epoch += 1
        	ep_train_loss += batch_train_loss
        	ep_train_acc += batch_train_acc

    	  except tf.errors.OutOfRangeError:
        	break

The train_tf_dataset function requires a lot of tensors as input. In TensorFlow 1.8, you need to keep track of these tensors because they contain all the operations you intend to run. In practice, this can lead to cumbersome code because it is hard to separate concerns.

Python Pseudo-Code for TensorFlow 2.x

In TensorFlow 2, all of this has been made much easier because of the Keras abstraction. We can inherit from a Keras class that allows us to compartmentalize the code much better. Here is the `train` method from Rasa’s DIET classifier (see here for the full implementation).

def train(
    	self,
    	training_data: TrainingData,
    	config: Optional[RasaNLUModelConfig] = None,
    	**kwargs: Any,
	) -> None:
    	"""Train the embedding intent classifier on a data set."""

    	model_data = self.preprocess_train_data(training_data)

    	self.model = self.model_class()(
        	config=self.component_config,
    	)

    	self.model.fit(
        	model_data,
        	self.component_config[EPOCHS],
        	self.component_config[BATCH_SIZES],
        	self.component_config[EVAL_NUM_EXAMPLES],
        	self.component_config[EVAL_NUM_EPOCHS],
        	self.component_config[BATCH_STRATEGY],
    	)

The object-oriented style of programming from Keras allows us to customize more. We’re able to implement our own `self.model.fit` in such a way that we don’t need to worry about the `session` anymore. We don’t even need to keep track of the tensors because the Keras API abstracts everything away for you.

If you’re interested in the full code, you can find the old loop here and the new loop here.

An Extra Layer of Features

It’s not just the Keras models where we apply this abstraction; we’ve also developed some neural network layers using a similar technique.

We’ve implemented a few custom layers ourselves. For example, we’ve got a layer called `DenseWithSparseWeights.` It behaves just like a dense layer, but we drop many weights beforehand to make it more sparse. Again we only need to inherit from the right class (tf.keras.layers.Dense) to create it.

normal dense vs sparse dense model

We’ve grown so fond of customizing that we’ve even implemented a loss function as a layer. This made a lot of sense for us, considering that losses can get complex in NLP. Many NLP tasks will require you to sample such that you also have labels of negative examples during training. You may also need to mask tokens during the process. We’re also interested in recording the similarity loss as well as the label accuracy. By just making our own layer, we are building components for re-use, and it is easy to maintain as well.

custom layer

Lessons Learned

Discovering this opportunity for customization made a massive difference for Rasa. We like to design our algorithms to be flexible and applicable in many circumstances, and we were happy to learn that the underlying technology stack allowed us to do so. We do have some advice for folks who are working on their TensorFlow migration:

  1. Start by thinking about what “lego bricks” you need in your application. This mental design step will make it much easier to recognize how you can leverage existing Keras/TensorFlow objects for your use-case.
  2. It can be tempting to try to immerse yourself by going for a deep dive immediately. Instead, it may help to start from a working example and drill down from there. TensorFlow is not an average Python package, and the internals can get complex. The Python code that you interact with needs to interact with C++ to keep the tensor operations performant. Once the code works, you’re at a much better place to start tuning/optimizing all the new TensorFlow version’s performance features.
Next post
How Rasa Open Source Gained Layers of Flexibility with TensorFlow 2.x

A guest post by Vincent D. Warmerdam and Vladimir Vlasov, Rasa
At Rasa, we are building infrastructure for conversational AI, used by developers to build chat- and voice-based assistants. Rasa Open Source, our cornerstone product offering, provides a framework for NLU (Natural Language Understanding) and dialogue management. On the NLU side we offer models that handle intent classification and en…