January 28, 2019
Posted by Sara Robinson
Have you ever started building an ML model, only to realize you’re not sure which model architecture will yield the best results? Enter the TensorFlow-based AdaNet framework. With AdaNet, you can feed multiple models into AdaNet’s algorithm and it’ll find the optimal combination of all of them as part of the training process. I’ve been playing with it recently and have bee…
AutoEnsembleEstimator. You can build any type of network with AdaNet (images, text, structured data, etc.). For this example, I’ll build a text classification model to predict the author given a few sentences of text they’ve written. In addition to AdaNet, the imports below show the tools we’ll be using to build this model:
import adanet
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import urllib.request
from sklearn.preprocessing import LabelEncoder
We’ll download the data with urllib, convert it to a Pandas DataFrame, shuffle the data, and preview it:
urllib.request.urlretrieve('https://storage.googleapis.com/authors-training-data/data.csv', 'data.csv')
data = pd.read_csv('data.csv')
data = data.sample(frac=1) # Shuffles the data
data.head()
Then we’ll split it into train and test sets, using 80% of the data for training:
train_size = int(len(data) * .8)
train_text = data['text'][:train_size]
train_authors = data['author'][:train_size]
test_text = data['text'][train_size:]
test_authors = data['author'][train_size:]
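The shuffle-then-slice split above can be sketched without Pandas. This is a minimal illustration only: the `shuffle_and_split` helper and the toy rows are hypothetical and not part of the original code.

```python
import random

def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows, then cut them into (train, test) at train_fraction."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train_size = int(len(shuffled) * train_fraction)
    return shuffled[:train_size], shuffled[train_size:]

# Hypothetical (text, author) rows; not taken from the real dataset.
rows = [("sentence %d" % i, "author_%d" % (i % 3)) for i in range(10)]
train_rows, test_rows = shuffle_and_split(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before slicing matters: if the CSV happens to be ordered by author, an unshuffled 80/20 cut could leave some authors out of the training set entirely.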
The labels are strings for each author, and I’ve encoded them as integer class IDs using the Scikit Learn LabelEncoder utility. We can do this in just a few lines of code:
encoder = LabelEncoder()
encoder.fit(np.array(train_authors))  # Learn the author-to-integer mapping from the training labels
train_encoded = encoder.transform(train_authors)
test_encoded = encoder.transform(test_authors)
num_classes = len(encoder.classes_)
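LabelEncoder assigns each distinct label an integer ID, following the sorted order of the labels. A minimal pure-Python sketch of that mapping is below; `fit_label_ids` and the author names are hypothetical stand-ins, not the library’s code or the real dataset.

```python
def fit_label_ids(labels):
    """Map each distinct label to an integer ID, in sorted order."""
    classes = sorted(set(labels))
    return classes, {label: i for i, label in enumerate(classes)}

# Hypothetical author names, used only for illustration.
train_labels = ["author_c", "author_a", "author_b", "author_a"]
classes, to_id = fit_label_ids(train_labels)
encoded = [to_id[label] for label in train_labels]
print(classes)
print(encoded)
```

The position of a label in `classes` is its ID, which is why logging `encoder.classes_` later lets us turn a predicted class ID back into an author name.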
# One TF Hub text embedding column per model; both take raw text as input
ndim_embeddings = hub.text_embedding_column(
    "ndim",
    module_spec="https://tfhub.dev/google/nnlm-en-dim128/1",
    trainable=False
)
encoder_embeddings = hub.text_embedding_column(
    "encoder",
    module_spec="https://tfhub.dev/google/universal-sentence-encoder/2",
    trainable=False
)
Now we can define both Estimators that we’ll feed into our AdaNet model. Since this is a classification problem, we’ll use a DNNEstimator for both:
# Note: multi_class_head (created with tf.contrib.estimator.multi_class_head further down) must be defined before these estimators
estimator_ndim = tf.contrib.estimator.DNNEstimator(
    head=multi_class_head,
    hidden_units=[64,10],
    feature_columns=[ndim_embeddings]
)
estimator_encoder = tf.contrib.estimator.DNNEstimator(
    head=multi_class_head,
    hidden_units=[64,10],
    feature_columns=[encoder_embeddings]
)
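To make the hidden_units=[64,10] setting in the estimators above concrete, here is a rough sketch of the dense layer shapes each network ends up with. It assumes 128-dimensional NNLM embeddings, 512-dimensional Universal Sentence Encoder embeddings, and 5 author classes (inferred from the five probabilities in the prediction output later in this post). The helper names are hypothetical.

```python
def dense_layer_shapes(input_dim, hidden_units, num_classes):
    """Return (inputs, outputs) for each dense layer, ending with the logits layer."""
    sizes = [input_dim] + list(hidden_units) + [num_classes]
    return list(zip(sizes[:-1], sizes[1:]))

def parameter_count(shapes):
    """Weights plus one bias per output unit, summed over every layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)

NUM_CLASSES = 5  # assumption: five authors in the dataset

ndim_shapes = dense_layer_shapes(128, [64, 10], NUM_CLASSES)
encoder_shapes = dense_layer_shapes(512, [64, 10], NUM_CLASSES)
print(ndim_shapes, parameter_count(ndim_shapes))
print(encoder_shapes, parameter_count(encoder_shapes))
```

Both networks share the same hidden layers and differ only in the width of the embedding they take as input.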
What’s happening in these two estimators? hidden_units tells TensorFlow the number of neurons our network will have in each layer. Each of these networks has 64 in the first layer and 10 in the second. feature_columns is a list of the features for our model. In this example we have only one (the sentence of the book).
AdaNet provides the AutoEnsembleEstimator, which makes combining them pretty simple. It will take both estimators I’ve created, and incrementally create an ensemble by averaging the predictions of each model. For more customization, check out the adanet.subnetwork Builder and Generator classes. With AutoEnsembleEstimator, we can feed both of the models we’ve defined above into the ensemble in the candidate_pool param:
import os
model_dir = os.path.join('/path/to/model/dir')
multi_class_head = tf.contrib.estimator.multi_class_head(
    len(encoder.classes_),
    loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE
)
estimator = adanet.AutoEnsembleEstimator(
    head=multi_class_head,
    candidate_pool=[
        estimator_ndim,
        estimator_encoder
    ],
    config=tf.estimator.RunConfig(
        save_summary_steps=1000,
        save_checkpoints_steps=1000,
        model_dir=model_dir
    ),
    max_iteration_steps=5000
)
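As a rough illustration of what “averaging the predictions of each model” means, the sketch below takes a uniform mean of two models’ class probabilities. It is a simplified stand-in and not AdaNet’s implementation. The `average_predictions` helper and the probability values are hypothetical.

```python
def average_predictions(per_model_probs):
    """Element-wise mean of several models' class-probability vectors."""
    num_models = len(per_model_probs)
    return [sum(column) / num_models for column in zip(*per_model_probs)]

# Hypothetical outputs from two models over three classes.
probs_a = [0.7, 0.2, 0.1]
probs_b = [0.3, 0.6, 0.1]

ensemble = average_predictions([probs_a, probs_b])
best_class = max(range(len(ensemble)), key=lambda i: ensemble[i])
print(ensemble, best_class)
```

Here the two models disagree on the most likely class, and the averaged prediction sides with the one that is more confident.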
There’s a lot going on in that AutoEnsembleEstimator call, so let’s break it down:
head is an instance of tf.contrib.estimator.Head, and it tells our model how to compute loss and evaluation metrics for each possible ensemble. AdaNet calls these potential ensemble networks “candidates”. There are many different types of heads (for regression, multi-class classification, etc.). Here we’re using the multi_class_head since there are more than 2 possible label classes in our model. For a model assigning multiple labels to one particular input, we’d use multi_label_head.
config sets up some parameters for running our training job: how often we want to save model summaries and checkpoints, and the directory where TF should save them. Keep in mind that if you’re training a model in Colab, saving checkpoints too frequently could eat up your available disk space.
max_iteration_steps tells AdaNet how many training steps to perform in a single iteration. An iteration refers to training for a group of candidates, so this number (along with total training steps which we’ll define later) tells AdaNet how often to generate new ensemble candidates.
Now we can train the model. We’ll use the train_and_evaluate function for this, which will run training and evaluation at the same time. In order to set this up, we need to write our training and evaluation input functions. Input functions handle feeding the data into our model. We’ll use the tf.data API in our input functions. Even though we have two separate models with different feature columns, we can put both features in the same dict so we only need to write one input function:
train_features = {
    "ndim": train_text,
    "encoder": train_text
}
def input_fn_train():
    # Labels are the integer-encoded authors, which multi_class_head expects
    dataset = tf.data.Dataset.from_tensor_slices((train_features, train_encoded))
    dataset = dataset.repeat().shuffle(100).batch(64)
    iterator = dataset.make_one_shot_iterator()
    data, labels = iterator.get_next()
    return data, labels
Our evaluation features and input function look very similar:
eval_features = {
    "ndim": test_text,
    "encoder": test_text
}
def input_fn_eval():
    # Evaluate against the integer-encoded test labels
    dataset = tf.data.Dataset.from_tensor_slices((eval_features, test_encoded))
    dataset = dataset.batch(64)
    iterator = dataset.make_one_shot_iterator()
    data, labels = iterator.get_next()
    return data, labels
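Conceptually, each input function hands the model batches of (feature dict, labels), where both feature keys carry the same text. The pure-Python sketch below mimics that shape only. It is not the tf.data API, and the `batches` helper and toy data are hypothetical.

```python
def batches(features, labels, batch_size):
    """Yield (feature_dict, label_list) batches, slicing every feature column alike."""
    for start in range(0, len(labels), batch_size):
        end = start + batch_size
        batch_features = {name: column[start:end] for name, column in features.items()}
        yield batch_features, labels[start:end]

# Hypothetical texts and integer labels, for illustration only.
texts = ["text %d" % i for i in range(5)]
labels = [0, 1, 0, 2, 1]
features = {"ndim": texts, "encoder": texts}  # same text under both keys

result = list(batches(features, labels, batch_size=2))
for batch_features, batch_labels in result:
    print(batch_features["ndim"], batch_labels)
```

Each estimator in the candidate pool reads only the key named by its own feature column, which is why one shared dict is enough.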
We’re getting close now! The last thing we need to do before training is create our train and eval specs. You can think of this as wiring everything together: since we’re running training and evaluation in one go, these specs will tell our estimator which input function to run for each job:
train_spec = tf.estimator.TrainSpec(
    input_fn=input_fn_train,
    max_steps=40000
)
eval_spec = tf.estimator.EvalSpec(
    input_fn=input_fn_eval,
    steps=None,
    start_delay_secs=10,
    throttle_secs=10
)
Remember when we defined max_iteration_steps above? The max_steps parameter in our TrainSpec refers to the total number of steps to train for. With 40,000 total steps and 5,000 steps per iteration, we’ll have 8 iterations total (8 groups of ensemble candidates). Now it’s time to run training and evaluation:
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
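The relationship between max_steps and max_iteration_steps is simple arithmetic, sketched below. Two assumptions are baked in: a partial final iteration counts as one iteration, and iterations start at exact multiples of max_iteration_steps. Both helper names are hypothetical.

```python
def iteration_count(max_steps, max_iteration_steps):
    """Number of AdaNet iterations in a run; a partial final iteration counts as one."""
    return -(-max_steps // max_iteration_steps)  # ceiling division

def iteration_index(step, max_iteration_steps):
    """Zero-based iteration that a given global training step falls in."""
    return step // max_iteration_steps

total_iterations = iteration_count(max_steps=40000, max_iteration_steps=5000)
print(total_iterations)
print(iteration_index(32000, 5000))
```

The zero-based index is handy when reading TensorBoard, where a name such as t6_DNNEstimator1 refers to the seventh iteration.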
To train on Cloud ML Engine, we’ll package the code using the following directory structure:
setup.py
config.yaml
trainer/
    model.py
    __init__.py
You can call the trainer directory anything you like: this is the name of the Python package we’ll be uploading to ML Engine with our model. __init__.py is an empty file, and model.py contains all of the code above. setup.py contains the name and version of our package, along with any Python package dependencies we’re using to create the model. config.yaml is where you specify any Cloud-specific parameters for training. These are things like whether you’ll make use of GPUs or TPUs for training, and how many workers you’ll need for your training job. All of the configuration options are listed in the Cloud ML Engine documentation.
You’ll need to add some code to the model.py file mentioned above to export your model to Cloud Storage when it’s done training. If you don’t care about this right now, you can skip to the next section. We’ll export our model using the LatestExporter class. To create an exporter, we’ll need to define a serving input function. This confused me at first, but it’s not too different from the other input functions we defined. It should return two things: the format of inputs our model should expect when it’s served, and the format of inputs the server should expect. In our model these are the same, but in some cases you may want to do some preprocessing on inputs before they’re fed into the model. Because ours are the same, the serving input function is pretty straightforward:
def serving_input_fn():
    feature_placeholders = {
        'ndim': tf.placeholder(tf.string, [None]),
        'encoder': tf.placeholder(tf.string, [None])
    }
    return tf.estimator.export.ServingInputReceiver(feature_placeholders, feature_placeholders)
If the TF Hub modules we were using didn’t let us pass raw text directly and instead required that text be converted to integers, our input function would return two different objects here. With that function ready to go, we can define our exporter:
exporter = tf.estimator.LatestExporter('exporter', serving_input_fn, exports_to_keep=None)
To call export(), we’ll also need our model’s last checkpoint and the eval results from that checkpoint. We can get those with the following:
latest_ckpt = tf.train.latest_checkpoint(model_dir)
last_eval = estimator.evaluate(
    input_fn_eval,
    checkpoint_path=latest_ckpt
)
exporter.export(estimator, model_dir, latest_ckpt, last_eval, is_the_final_export=True)
Woohoo! When this runs in ML Engine, it’ll save our final model.
To kick off the training job, first define a few environment variables:
export JOB_ID=unique_job_name
export JOB_DIR=gs://your/gcs/bucket/path
export PACKAGE_PATH=trainer/
export MODULE=trainer.model
export REGION=your_cloud_project_region
Replace the strings above with the values specific to your project. Then you’re ready to train with the following gcloud command:
gcloud ml-engine jobs submit training $JOB_ID --package-path $PACKAGE_PATH --module-name $MODULE --job-dir $JOB_DIR --region $REGION --runtime-version 1.12 --python-version 3.5 --config config.yaml
If this executes correctly, you should see a message in the console that your job is queued. You can stream your logs from the command line, or navigate to the Jobs tab in ML Engine on your Cloud console. To monitor training, run the following command to point TensorBoard to your log directory on Cloud Storage:
tensorboard --logdir gs://your/gcs/checkpoint/path
Then point your browser to localhost:6006 to view training progress, and navigate to the scalars tab. Confession: I had avoided using TensorBoard until now (so many graphs can be intimidating!). But TensorBoard makes it much easier to understand how your model is performing, and it’s especially useful for AdaNet. We’ll focus only on the accuracy and adanet_loss graphs here. Let’s start with accuracy, looking at the adanet_weighted_ensemble graph. Remember that our model has 5000 steps per iteration, meaning every 5000 steps AdaNet will generate new candidate ensembles (with the exception of the first iteration, which includes only the individual networks). If you hover over the graph you can see which iteration and ensemble each line refers to. At this point in training, the second ensemble from iteration 7 (t6_DNNEstimator1/eval) has the best accuracy. TensorBoard really shows us the power of combining models with AdaNet: as training continues, ensemble accuracy improves and is much higher than the accuracy of the individual networks on their own (the pink and light blue lines on the left of the accuracy graph). The loss (or error) graph reveals similar trends: error steadily decreases as AdaNet generates and trains new ensembles.
Because it would be sad to leave you hanging without doing any predictions on our trained model, let’s use ML Engine’s local predict to make a prediction from the command line. All we need to do is create a newline-delimited JSON file with an input we want a prediction for, following the same format as our serving input function.
Here’s an example:
{"encoder": "A strange land indeed! Could it be one with his native New England? Did Congress assemble from the Antipodes?", "ndim": "A strange land indeed! Could it be one with his native New England? Did Congress assemble from the Antipodes?"}
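If you have several inputs, a file in this format can be generated rather than typed by hand. The sketch below is one way to do it, assuming the two feature names from our serving input function. The `write_instances` helper and the temp-directory location are hypothetical choices.

```python
import json
import os
import tempfile

def write_instances(path, texts, feature_names=("ndim", "encoder")):
    """Write one JSON object per line, repeating each text under every feature name."""
    with open(path, "w") as f:
        for text in texts:
            f.write(json.dumps({name: text for name in feature_names}) + "\n")

# Writing to the temp directory is an arbitrary choice for this sketch.
instances_path = os.path.join(tempfile.gettempdir(), "test.json")
write_instances(instances_path, [
    "A strange land indeed! Could it be one with his native New England? "
    "Did Congress assemble from the Antipodes?"
])
print(instances_path)
```

Each line of the resulting file is one prediction request, so adding more strings to the list produces more rows in the response.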
And then we can run the following command:
gcloud ml-engine local predict --model-dir=gs://path/to/saved_model.pb --json-instances=path/to/test.json
This is the response:
CLASS_IDS  CLASSES  PROBABILITIES
[1]        [u'1']   [0.0043347785249352455, 0.8382837176322937, 0.12185576558113098, 0.025106186047196388, 0.010419543832540512]
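To read a response like this one programmatically, pick the index with the highest probability. The sketch below uses the probabilities from the response above. The `top_class` helper is hypothetical.

```python
def top_class(probabilities):
    """Return (class_id, probability) for the most likely class."""
    class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_id, probabilities[class_id]

# Probabilities copied from the local predict response.
probabilities = [
    0.0043347785249352455,
    0.8382837176322937,
    0.12185576558113098,
    0.025106186047196388,
    0.010419543832540512,
]

class_id, confidence = top_class(probabilities)
# encoder.classes_[class_id] would turn the ID back into an author name.
print(class_id, round(confidence * 100, 1))
```

The class ID matches the CLASS_IDS column of the response, and the probabilities sum to roughly 1 across the five classes.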
This means our model has predicted there’s about an 84% chance this was written by the author at index 1 of our label array (we can get this by logging encoder.classes_ above), which is Churchill. That’s correct!
That covers building a model with AutoEnsembleEstimator and training it on Cloud ML Engine. Want to learn more about what I covered here? Check out these resources:
tf.estimator.train_and_evaluate