December 16, 2020
A guest post by Vincent D. Warmerdam and Vladimir Vlasov, Rasa
 
At Rasa, we are building infrastructure for conversational AI, used by developers to build chat- and voice-based assistants. Rasa Open Source, our cornerstone product offering, provides a framework for NLU (Natural Language Understanding) and dialogue management. On the NLU side, we offer models that handle intent classification and entity detection, built with TensorFlow 2.x.
In this article, we would like to discuss the benefits of migrating to the latest version of TensorFlow and also give insight into how some of the Rasa internals work.
When you’re building a virtual assistant with Rasa Open Source, you’ll usually begin by defining stories, which represent conversations users might have with your agent. These stories serve as training data, and you configure them as YAML files. If we pretend that we’re making an assistant that allows you to buy pizzas online, then we might have stories in our configuration that look like this:
yaml
version: "2.0"
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
- story: purchase path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: purchase
    entities:
    - product: "pizza"
  - action: confirm_purchase
  - intent: affirm
  - action: confirm_availability
These stories consist of intents and actions. Actions can be simple text replies, or they can trigger custom Python code (that checks a database, for instance). To define training data for each intent, you supply the assistant with example user messages, which might look something like:
yaml
version: "2.0"
nlu:
- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - hello there
    - good morning
- intent: purchase
  examples: |
    - i’d like to buy a [veggie pizza](product) for [tomorrow](date_ref)
    - i want to order a [pizza pepperoni](product)
    - i’d want to buy a [pizza](product) and a [cola](product)
- ...
When you train an assistant using Rasa, you’ll supply configuration files like those shown above. Intents and actions are like Lego bricks: you can combine them freely to cover many conversational paths. Once these files are defined, they are combined to create a training dataset that the agent will learn from.
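To make the annotation syntax in the NLU examples concrete, here is a minimal sketch of how a bracketed example such as `[pizza](product)` could be turned into plain text plus entity spans. The function name `parse_annotated_example` and the layout of the returned dictionaries are assumptions made for illustration; this is not Rasa's own parser, which supports more syntax than this.

```python
import re
from typing import Dict, List, Tuple

# Matches the [entity text](entity_type) annotation used in the NLU examples.
ENTITY_PATTERN = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_annotated_example(example: str) -> Tuple[str, List[Dict[str, object]]]:
    """Return the plain text and the entity spans of one annotated example."""
    plain_parts: List[str] = []
    entities: List[Dict[str, object]] = []
    cursor = 0     # position in the annotated string
    plain_len = 0  # length of the plain text built so far

    for match in ENTITY_PATTERN.finditer(example):
        # Copy the text between the previous annotation and this one.
        before = example[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)

        value = match.group("value")
        entities.append(
            {
                "entity": match.group("entity"),
                "value": value,
                "start": plain_len,
                "end": plain_len + len(value),
            }
        )
        plain_parts.append(value)
        plain_len += len(value)
        cursor = match.end()

    # Copy whatever follows the last annotation.
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated_example(
    "i want to buy a [pizza](product) and a [cola](product)"
)
print(text)
for entity in entities:
    print(entity)
```

The character offsets refer to the plain text, so `text[start:end]` recovers each entity value. These per-token labels are what an entity model has to predict alongside the intent.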
Rasa allows users to build custom machine learning pipelines to fit their datasets. That means you can incorporate your own (pre-trained) models for natural language understanding if you’d like. But Rasa also provides models, written in TensorFlow, that are specialized for these tasks.
You may have noticed that our examples include not just intents but also entities. When a user is interested in making a purchase, they (usually) also say what they’re interested in buying. This information needs to be detected when the user provides it. It’d be a bad experience if we needed to supply the user with a form to retrieve this information.
If you take a step back and think about what kind of model could work well here, you’ll soon recognize that it’s not a standard task. It’s not just that we have numerous labels at each utterance; we have multiple *types* of labels too. That means that we need models that have two outputs.
Rasa Open Source offers a model that can detect both intents and entities, called DIET. It uses a transformer architecture that allows the system to learn from the interaction between intents and entities. Because it needs to handle these two tasks at once, the typical machine learning pattern won’t work:
model.fit(X, y).predict(X)
You need a different abstraction.
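To illustrate what "two outputs" means in practice, here is a toy Keras model with one head that predicts an intent per utterance and one head that predicts an entity tag per token. This is not DIET: it has no transformer, and all sizes and layer choices are assumptions picked only to keep the sketch small.

```python
import numpy as np
import tensorflow as tf

# Toy sizes, chosen only for illustration.
VOCAB_SIZE, SEQ_LEN, NUM_INTENTS, NUM_ENTITY_TAGS = 100, 8, 3, 4

token_ids = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32", name="token_ids")
# Shared token representation used by both heads.
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, 16)(token_ids)

# Intent head: one prediction per utterance.
pooled = tf.keras.layers.GlobalAveragePooling1D()(embedded)
intent_logits = tf.keras.layers.Dense(NUM_INTENTS, name="intent")(pooled)

# Entity head: one prediction per token.
entity_logits = tf.keras.layers.Dense(NUM_ENTITY_TAGS, name="entities")(embedded)

model = tf.keras.Model(inputs=token_ids, outputs=[intent_logits, entity_logits])

batch = np.random.randint(0, VOCAB_SIZE, size=(2, SEQ_LEN)).astype("int32")
intent_out, entity_out = model(batch)
print(intent_out.shape, entity_out.shape)
```

The two outputs have different shapes and need different labels, which is why a single `y` passed to `fit` is not a natural match for this kind of model.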
Abstraction
This is where TensorFlow 2.x has improved the Rasa codebase. It is now much easier to customize TensorFlow classes. In particular, we’ve made a custom abstraction on top of Keras to suit our needs. One example of this is Rasa’s own internal `RasaModel`. We’ve added the base class’s signature below. The full implementation can be found here.
class RasaModel(tf.keras.models.Model):
    def __init__(
        self,
        random_seed: Optional[int] = None,
        tensorboard_log_dir: Optional[Text] = None,
        tensorboard_log_level: Optional[Text] = "epoch",
        **kwargs,
    ) -> None:
        ...

    def fit(
        self,
        model_data: RasaModelData,
        epochs: int,
        batch_size: Union[List[int], int],
        evaluate_on_num_examples: int,
        evaluate_every_num_epochs: int,
        batch_strategy: Text,
        silent: bool = False,
        eager: bool = False,
    ) -> None:
        ...
This object is customized to allow us to pass in our own `RasaModelData` object. The benefit is that we can keep all the existing features that the Keras model object offers while we can override a few specific methods to suit our needs. We can run the model with our preferred data format while maintaining manual control over “eager mode,” which helps us debug.
These Keras objects are now a central API in TensorFlow 2.x, which made it very easy for us to integrate and customize.
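The pattern can be sketched with a much smaller model. The example below subclasses `tf.keras.Model`, overrides `fit` so that it accepts a plain dictionary of arrays, and uses an `eager` flag to decide whether the training step is compiled with `tf.function`. The class `ToyModel`, the dictionary keys, and the returned list of epoch losses are all assumptions for illustration; `RasaModel` and `RasaModelData` are considerably richer.

```python
import numpy as np
import tensorflow as tf
from typing import Dict, List


class ToyModel(tf.keras.Model):
    """Keras model with a custom fit() that consumes a dict of arrays."""

    def __init__(self, num_labels: int, **kwargs) -> None:
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(num_labels)
        self.loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        self.sgd = tf.keras.optimizers.SGD(learning_rate=0.1)

    def call(self, inputs):
        return self.dense(inputs)

    def _batch_step(self, features, labels):
        # One gradient step on a single batch; returns the batch loss.
        with tf.GradientTape() as tape:
            loss = self.loss_fn(labels, self(features))
        grads = tape.gradient(loss, self.trainable_variables)
        self.sgd.apply_gradients(zip(grads, self.trainable_variables))
        return loss

    def fit(
        self,
        model_data: Dict[str, np.ndarray],
        epochs: int,
        batch_size: int,
        eager: bool = False,
    ) -> List[float]:
        # Build the weights once, outside of any compiled function.
        self(model_data["features"][:1])
        # In eager mode the step runs as plain Python, which is easier to debug.
        step = self._batch_step if eager else tf.function(self._batch_step)
        dataset = tf.data.Dataset.from_tensor_slices(
            (model_data["features"], model_data["labels"])
        ).batch(batch_size)

        epoch_losses = []
        for _ in range(epochs):
            losses = [float(step(x, y)) for x, y in dataset]
            epoch_losses.append(sum(losses) / len(losses))
        return epoch_losses


rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4)).astype("float32")
data = {"features": features, "labels": (features[:, 0] > 0).astype("int32")}

history = ToyModel(num_labels=2).fit(data, epochs=3, batch_size=8)
eager_history = ToyModel(num_labels=2).fit(data, epochs=1, batch_size=8, eager=True)
print(history, eager_history)
```

Everything else that `tf.keras.Model` provides, such as weight tracking and saving, stays available because only `fit` is replaced.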
Training Loop
To give another impression of how the code became simpler, let’s look at the training loop inside the Rasa model.
Python Pseudo-Code for TensorFlow 1.8
We’ve got a part of the code used for our old training loop listed below (see here for the full implementation). Note that it is using `session.run` to calculate the loss as well as the accuracy.
def train_tf_dataset(
    train_init_op: "tf.Operation",
    eval_init_op: "tf.Operation",
    batch_size_in: "tf.Tensor",
    loss: "tf.Tensor",
    acc: "tf.Tensor",
    train_op: "tf.Tensor",
    session: "tf.Session",
    epochs: int,
    batch_size: Union[List[int], int],
    evaluate_on_num_examples: int,
    evaluate_every_num_epochs: int,
) -> None:
    # Initialize all graph variables before training starts.
    session.run(tf.global_variables_initializer())
    pbar = tqdm(range(epochs), desc="Epochs", disable=is_logging_disabled())
    for ep in pbar:
        # Re-initialize the dataset iterator with this epoch's batch size.
        ep_batch_size = linearly_increasing_batch_size(ep, batch_size, epochs)
        session.run(train_init_op, feed_dict={batch_size_in: ep_batch_size})
        ep_train_loss = 0
        ep_train_acc = 0
        batches_per_epoch = 0
        while True:
            try:
                # One optimization step; also fetch the batch loss and accuracy.
                _, batch_train_loss, batch_train_acc = session.run(
                    [train_op, loss, acc]
                )
                batches_per_epoch += 1
                ep_train_loss += batch_train_loss
                ep_train_acc += batch_train_acc
            except tf.errors.OutOfRangeError:
                # The iterator is exhausted, so the epoch is over.
                break
The `train_tf_dataset` function requires a lot of tensors as input. In TensorFlow 1.8, you need to keep track of these tensors because they contain all the operations you intend to run. In practice, this leads to cumbersome code because it is hard to separate concerns.
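The loop calls `linearly_increasing_batch_size` without showing it. A plausible version, assuming that a two-element list means `[initial, final]` batch size and that a plain integer means a constant batch size, interpolates between the two values over the epochs. This is a reconstruction for illustration, and the actual Rasa helper may differ in detail.

```python
from typing import List, Union


def linearly_increasing_batch_size(
    epoch: int, batch_size: Union[List[int], int], epochs: int
) -> int:
    """Interpolate the batch size linearly from batch_size[0] to batch_size[1]."""
    if not isinstance(batch_size, list):
        # A single integer means a constant batch size.
        return int(batch_size)
    if epochs > 1:
        start, end = batch_size
        return int(start + epoch * (end - start) / (epochs - 1))
    # With a single epoch there is nothing to interpolate.
    return int(batch_size[0])


schedule = [linearly_increasing_batch_size(ep, [64, 256], 5) for ep in range(5)]
print(schedule)  # [64, 112, 160, 208, 256]
```

Because the batch size changes per epoch, the TensorFlow 1.x loop has to feed it through a placeholder (`batch_size_in`) every time it re-initializes the iterator.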
Python Pseudo-Code for TensorFlow 2.x
In TensorFlow 2, all of this has been made much easier because of the Keras abstraction. We can inherit from a Keras class that allows us to compartmentalize the code much better. Here is the `train` method from Rasa’s DIET classifier (see here for the full implementation).
def train(
    self,
    training_data: TrainingData,
    config: Optional[RasaNLUModelConfig] = None,
    **kwargs: Any,
) -> None:
    """Train the embedding intent classifier on a data set."""
    model_data = self.preprocess_train_data(training_data)
    self.model = self.model_class()(
        config=self.component_config,
    )
    self.model.fit(
        model_data,
        self.component_config[EPOCHS],
        self.component_config[BATCH_SIZES],
        self.component_config[EVAL_NUM_EXAMPLES],
        self.component_config[EVAL_NUM_EPOCHS],
        self.component_config[BATCH_STRATEGY],
    )
The object-oriented style of Keras gives us more room to customize. We’re able to implement our own `self.model.fit` in such a way that we no longer need to worry about the `session`. We don’t even need to keep track of the tensors, because the Keras API abstracts them away.
If you’re interested in the full code, you can find the old loop here and the new loop here.
It’s not just the Keras models where we apply this abstraction; we’ve also developed some neural network layers using a similar technique.
We’ve implemented a few custom layers ourselves. For example, we’ve got a layer called `DenseWithSparseWeights`. It behaves just like a dense layer, but we drop many weights beforehand to make it more sparse. Again, we only need to inherit from the right class (`tf.keras.layers.Dense`) to create it.
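A minimal sketch of such a layer is shown below. It inherits from `tf.keras.layers.Dense`, draws a fixed random mask when the layer is built, and re-applies that mask on every call so that the dropped weights stay at zero. The class name `MaskedDense`, the `sparsity` argument, and the masking strategy are assumptions for illustration and are not a copy of Rasa's layer.

```python
import numpy as np
import tensorflow as tf


class MaskedDense(tf.keras.layers.Dense):
    """Dense layer whose kernel keeps a fixed random subset of its weights."""

    def __init__(self, sparsity: float = 0.8, **kwargs) -> None:
        super().__init__(**kwargs)
        self.sparsity = sparsity  # fraction of kernel weights to drop

    def build(self, input_shape):
        super().build(input_shape)
        sparsity = self.sparsity

        def random_mask(shape, dtype=None):
            # 1.0 keeps a weight, 0.0 drops it.
            keep = tf.random.uniform(shape) >= sparsity
            return tf.cast(keep, dtype if dtype is not None else "float32")

        # Non-trainable weight, so the mask is created once and never updated.
        self.kernel_mask = self.add_weight(
            name="kernel_mask",
            shape=self.kernel.shape,
            initializer=random_mask,
            trainable=False,
        )

    def call(self, inputs):
        # Re-apply the mask so that dropped weights stay at zero during training.
        self.kernel.assign(self.kernel * self.kernel_mask)
        return super().call(inputs)


layer = MaskedDense(units=32, sparsity=0.8)
outputs = layer(np.ones((4, 64), dtype="float32"))
kernel = layer.kernel.numpy()
mask = layer.kernel_mask.numpy()
print(outputs.shape, float((kernel == 0).mean()))
```

Everything else, including the bias, the activation, and the handling of input shapes, is inherited unchanged from `Dense`.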
We’ve grown so fond of customizing that we’ve even implemented a loss function as a layer. This made a lot of sense for us, considering that losses can get complex in NLP. Many NLP tasks will require you to sample such that you also have labels of negative examples during training. You may also need to mask tokens during the process. We’re also interested in recording the similarity loss as well as the label accuracy. By just making our own layer, we are building components for re-use, and it is easy to maintain as well.
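As a rough sketch of the idea, the layer below takes example embeddings, candidate label embeddings, and the true label ids, and returns both a similarity-based loss and the label accuracy. It scores every example against all labels instead of sampling negatives, and it does no masking, so it is a deliberate simplification. The class name `SimilarityLossSketch` and its input layout are assumptions for illustration.

```python
import tensorflow as tf


class SimilarityLossSketch(tf.keras.layers.Layer):
    """Computes a dot-product similarity loss and the label accuracy."""

    def call(self, inputs):
        embeddings, label_embeddings, label_ids = inputs
        # Similarity of each example to every candidate label: (batch, num_labels).
        similarities = tf.matmul(embeddings, label_embeddings, transpose_b=True)
        # Softmax cross-entropy pushes the true label's similarity above the rest.
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=label_ids, logits=similarities
            )
        )
        predictions = tf.argmax(similarities, axis=-1, output_type=tf.int32)
        accuracy = tf.reduce_mean(
            tf.cast(tf.equal(predictions, label_ids), tf.float32)
        )
        return loss, accuracy


# Three labels with orthogonal embeddings; each example sits on its true label.
label_embeddings = 5.0 * tf.eye(3)
label_ids = tf.constant([0, 2, 1, 2], dtype=tf.int32)
embeddings = tf.gather(label_embeddings, label_ids)

loss, accuracy = SimilarityLossSketch()((embeddings, label_embeddings, label_ids))
print(float(loss), float(accuracy))
```

Packaging the loss this way means the same component can be reused by any model that produces embeddings and label embeddings, and the extra metric comes along for free.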
Lessons Learned
Discovering this opportunity for customization made a massive difference for Rasa. We like to design our algorithms to be flexible and applicable in many circumstances, and we were happy to learn that the underlying technology stack allowed us to do so. We do have some advice for folks who are working on their TensorFlow migration:
 