Adaptive Framework for On-device Recommendation
April 29, 2021

Posted by Ellie Zhou, Tian Lin, Shuangfeng Li and Sushant Prakash

Introduction & Motivation

We are excited to announce an adaptive framework to build on-device recommendation ML solutions with your own data and advanced user modeling architecture.

Following our previously open-sourced on-device recommendation solution, we received a lot of interest from the community in on-device recommender AI. Motivated by that feedback, we considered various use cases and created a framework that can generate TensorFlow Lite recommendation models accommodating different kinds of data, features, and architectures, improving on the previous models.

Benefits of this framework:

  • Flexible: The adaptive framework allows users to create a model in a configurable way.
  • Better model representation: Improving on the previous model, our new recommendation models can utilize multiple kinds of features rather than a single feature.

Personalized recommendations play an increasingly important role in digital life nowadays. With more and more user actions being moved to edge devices, supporting on-device recommenders has become an important direction. Compared with conventional pure server-based recommenders, the on-device solution has unique advantages, such as protecting users’ privacy, reacting quickly to on-device user actions, leveraging lightweight TensorFlow Lite inference, and bypassing network dependency. We welcome you to try out this framework and create recommendation experiences in your applications.

In this article, we will

  • Introduce the improved model architecture and framework adaptivity.
  • Walk you through how to utilize the framework step-by-step.
  • Provide insights based on research done with a public dataset.

Please find more details on the TensorFlow website.

Model

A recommendation model typically predicts users’ future activities based on their previous activities. Our framework supports models that use context information to make these predictions, which can be described by the following architecture:

Figure 1: An illustration of the configurable recommendation model. Each module is created according to the user-defined configuration.

At the context side, representations of all user activities are aggregated by the encoder to generate the context embedding. We support three different types of encoders: 1) bag-of-words (a.k.a. BOW), 2) 1-D convolution (a.k.a. CNN), and 3) LSTM. At the label side, the label item is treated as the positive and all other items in the vocabulary as negatives, and both are encoded to vectors as well. Context and label embeddings are combined with a dot product and fed into a softmax cross-entropy loss.

Inside the framework, we encapsulate tf.keras layers for ContextEncoder, LabelEncoder and DotProductSimilarity as key components in RecommendationModel.
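
To make the dot-product-plus-softmax step concrete, here is a minimal, self-contained sketch of that computation in Python. It is illustrative only: the class below is a simplified stand-in, not the framework's actual DotProductSimilarity layer signature.

import tensorflow as tf

# Illustrative stand-in for the framework's DotProductSimilarity layer;
# the real layer lives in dotproduct_similarity.py with its own API.
class DotProductSimilarity(tf.keras.layers.Layer):
    def call(self, context_embedding, label_embedding):
        # [batch, dim] x [num_items, dim]^T -> [batch, num_items] logits.
        return tf.matmul(context_embedding, label_embedding, transpose_b=True)

context_embedding = tf.random.normal([4, 8])   # batch of 4 contexts, dim 8
label_embedding = tf.random.normal([100, 8])   # 100 candidate items, dim 8
logits = DotProductSimilarity()(context_embedding, label_embedding)

# The positive label competes against all other vocabulary items.
labels = tf.constant([3, 7, 42, 0])            # positive item per example
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)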

To model each user activity, we could use the ID of the activity item (called ID-based), multiple features of the item (called feature-based), or a combination of both. The feature-based model utilizes multiple features to collectively encode users’ behavior. With our framework, you can create either ID-based or feature-based models in a configurable way.

Similar to the last version, a TensorFlow Lite model is exported after training, which can directly provide top-K predictions among the recommendation candidates.

Step-by-step

To demonstrate the new adaptive framework, we trained an on-device movie recommendation model on the MovieLens dataset using multiple features, and integrated it into the demo app. (Both the model and the app are for demonstration purposes only.) The MovieLens 1M dataset contains ratings from 6039 users across 3951 movies, with each user rating only a small subset of movies.

Let’s look at how to use the framework step-by-step in this notebook.

(a) Environment preparation

git clone https://github.com/tensorflow/examples
cd examples/lite/examples/recommendation/ml/
pip install -r requirements.txt

(b) Prepare training data

Please prepare your training data with reference to the MovieLens example generation file. Note that TensorFlow Lite input features are expected to be FixedLenFeature: please pad or truncate your features, and set up the feature lengths in the input configuration. Feel free to use the following command to process the example dataset.

python -m data.example_generation_movielens \
 --data_dir=data/raw \
 --output_dir=data/examples \
 --min_timeline_length=3 \
 --max_context_length=10 \
 --max_context_movie_genre_length=32 \
 --min_rating=2 \
 --train_data_fraction=0.9 \
 --build_vocabs=True
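
The script's core step, described in more detail below, groups each user's ratings into a chronological timeline. As a rough sketch of that step (under an assumed (user_id, movie_id, rating, timestamp) tuple format, not the script's actual code):

import collections

# Hedged sketch: group each user's ratings into a chronological timeline.
# The real logic lives in data/example_generation_movielens.py and works
# directly on the raw .dat files; threshold semantics are assumed here.
def build_timelines(ratings, min_rating=2):
    timelines = collections.defaultdict(list)
    for user_id, movie_id, rating, timestamp in ratings:
        if rating >= min_rating:  # drop low ratings
            timelines[user_id].append((timestamp, movie_id, rating))
    for timeline in timelines.values():
        timeline.sort()  # chronological order by timestamp
    return timelines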

MovieLens data contains ratings.dat (columns: UserID, MovieID, Rating, Timestamp) and movies.dat (columns: MovieID, Title, Genres). The example generation script takes both files, keeps only ratings higher than 2, forms user movie interaction timelines, and samples activities as labels with previous user activities as the context for prediction. Here is a generated tf.Example:

0 : {
  features: {
    feature: {
      key  : "context_movie_id"
      value: { int64_list: { value: [ 1124, 2240, 3251, ..., 1268 ] } }
    }
    feature: {
      key  : "context_movie_rating"
      value: { float_list: {value: [ 3.0, 3.0, 4.0, ..., 3.0 ] } }
    }
    feature: {
      key  : "context_movie_year"
      value: { int64_list: { value: [ 1981, 1980, 1985, ..., 1990 ] } }
    }
    feature: {
      key  : "context_movie_genre"
      value: { bytes_list: { value: [ "Drama", "Drama", "Mystery", ..., "UNK" ] } }
    }
    feature: {
      key  : "label_movie_id"
      value: { int64_list: { value: [ 3252 ] }  }
    }
  }
}
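
Because TensorFlow Lite expects FixedLenFeature inputs, every context feature in the example above is padded or truncated to its configured length. A minimal sketch of that padding logic (the helper name is hypothetical):

def pad_or_truncate(values, feature_length, pad_value=0):
    """Pad with pad_value, or truncate, so the result has feature_length items."""
    values = list(values)[:feature_length]
    return values + [pad_value] * (feature_length - len(values))

# A user with only 4 context movies, padded to feature_length=10:
print(pad_or_truncate([1124, 2240, 3251, 1268], 10))
# -> [1124, 2240, 3251, 1268, 0, 0, 0, 0, 0, 0]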

(c) Create input config

Once the data is prepared, please set up the input configuration. For example, here is one configuration for the MovieLens movie recommendation model.

activity_feature_groups {
  features {
    feature_name: "context_movie_id"
    feature_type: INT
    vocab_size: 3953
    embedding_dim: 8
    feature_length: 10
  }
  features {
    feature_name: "context_movie_rating"
    feature_type: FLOAT
    feature_length: 10
  }
  encoder_type: CNN
}
activity_feature_groups {
  features {
    feature_name: "context_movie_genre"
    feature_type: STRING
    vocab_name: "movie_genre_vocab.txt"
    vocab_size: 19
    embedding_dim: 4
    feature_length: 32
  }
  encoder_type: CNN
}
label_feature {
  feature_name: "label_movie_id"
  feature_type: INT
  vocab_size: 3953
  embedding_dim: 8
  feature_length: 1
}
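
The feature_length values above must match the lengths the examples were generated with. When the training pipeline parses the TFRecords, each feature maps to a fixed-length parsing spec; a hedged sketch of what that mapping might look like (the actual parsing lives in input_pipeline.py):

import tensorflow as tf

# Assumed mapping from the sample config above to a parsing spec.
feature_spec = {
    "context_movie_id": tf.io.FixedLenFeature([10], tf.int64),
    "context_movie_rating": tf.io.FixedLenFeature([10], tf.float32),
    "context_movie_genre": tf.io.FixedLenFeature([32], tf.string),
    "label_movie_id": tf.io.FixedLenFeature([1], tf.int64),
}

def parse(serialized_example):
    return tf.io.parse_single_example(serialized_example, feature_spec)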

(d) Train model

The model trainer will construct the recommendation model based on the input config, with a simple interface.

python -m model.recommendation_model_launcher -- \
 --training_data_filepattern "data/examples/train_movielens_1m.tfrecord" \
 --testing_data_filepattern "data/examples/test_movielens_1m.tfrecord" \
 --model_dir "model/model_dir" \
 --vocab_dir "data/examples" \
 --input_config_file "configs/sample_input_config.pbtxt" \
 --batch_size 32 \
 --learning_rate 0.01 \
 --steps_per_epoch 2 \
 --num_epochs 2 \
 --num_eval_steps 2 \
 --run_mode "train_and_eval" \
 --gradient_clip_norm 1.0 \
 --num_predictions 10 \
 --hidden_layer_dims "32,32" \
 --eval_top_k "1,5" \
 --conv_num_filter_ratios "2,4" \
 --conv_kernel_size 4 \
 --lstm_num_units 16

Inside the recommendation model, core components are packaged as Keras layers (context_encoder.py, label_encoder.py and dotproduct_similarity.py), each of which can be used by itself. The following diagram illustrates the model architecture:

Figure 2: An example of model architecture using context information to predict the next movie. The inputs are the history of (a) movie IDs, (b) ratings and (c) genres, which is specified by the config mentioned above.

With the framework, you can export a TensorFlow Lite model from a trained checkpoint by executing the launcher in export mode:

python -m model.recommendation_model_launcher \
 --input_config_file "configs/sample_input_config.pbtxt" \
 --vocab_dir "data/examples" \
 --run_mode "export" \
 --checkpoint_path "model/model_dir/ckpt-1000" \
 --num_predictions 10 \
 --hidden_layer_dims "32,32" \
 --conv_num_filter_ratios "2,4" \
 --conv_kernel_size 4 \
 --lstm_num_units 16

The inference code for the exported TensorFlow Lite model can be found in the notebook; we encourage readers to check out the details there.
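
As a rough sketch of running the exported model in Python (the model path here is an assumption; inspect get_input_details() rather than assuming your model's input layout):

import numpy as np
import tensorflow as tf

# Hypothetical path to the exported model.
interpreter = tf.lite.Interpreter(model_path="model/model_dir/export/model.tflite")
interpreter.allocate_tensors()

# Feed a tensor of the right shape/dtype for each input (dummies here).
for detail in interpreter.get_input_details():
    dummy = np.zeros(detail["shape"], dtype=detail["dtype"])
    interpreter.set_tensor(detail["index"], dummy)

interpreter.invoke()

# The outputs contain the top-K recommended items and their scores.
for detail in interpreter.get_output_details():
    print(detail["name"], interpreter.get_tensor(detail["index"]))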

Framework Adaptivity

Our framework provides a protobuf interface, through which feature groups, types and other information can be configured to build models accordingly. With the interface, you can configure:

  • Features

    The framework generically categorizes features into 3 types: integer, string, and float. Embedding spaces are created for both integer and string features, so the embedding dimension, vocabulary name and vocabulary size need to be specified. Float feature values are used directly. In addition, for on-device models, we suggest using fixed-length features, which can be configured directly.

message Feature {
  optional string feature_name = 1;

  // Supported feature types: STRING, INT, FLOAT.
  optional FeatureType feature_type = 2;

  optional string vocab_name = 3;

  optional int64 vocab_size = 4;

  optional int64 embedding_dim = 5;

  optional int64 feature_length = 6;
}
  • Feature groups

    One feature for one user activity may have multiple values. For instance, a movie can belong to multiple categories, so each movie has multiple genre feature values. To handle the different feature shapes, we introduced the “feature group” to combine features as a group. Features with the same length can be put in the same feature group to be encoded together. Inside the input config, you can set up global feature groups and activity feature groups.

message FeatureGroup {
  repeated Feature features = 1;

  // Supported encoder types: BOW, CNN, LSTM.
  optional EncoderType encoder_type = 2;
}
  • Input config

    You can use the input config interface to set up all the features and feature groups together.

message InputConfig {
  repeated FeatureGroup global_feature_groups = 1;

  repeated FeatureGroup activity_feature_groups = 2;

  optional Feature label_feature = 3;
}
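
For programmatic use, an InputConfig written as pbtxt (like the sample in step (c)) can be parsed with the standard protobuf text format. A sketch, assuming the generated module is named input_config_pb2 (the actual module name and import path may differ):

from google.protobuf import text_format
from configs import input_config_pb2  # assumed generated module name

config_text = """
label_feature {
  feature_name: "label_movie_id"
  feature_type: INT
  vocab_size: 3953
  embedding_dim: 8
  feature_length: 1
}
"""
config = text_format.Parse(config_text, input_config_pb2.InputConfig())
print(config.label_feature.feature_name)  # -> "label_movie_id"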

The input config is utilized by both input_pipeline.py and recommendation_model.py to process training data into a tf.data.Dataset and to construct the model accordingly. Inside ContextEncoder, a FeatureGroupEncoder is created for each feature group and used to compute feature group embeddings from input features. Concatenated feature group embeddings are fed through top hidden layers to get the final context embedding. It is worth noting that the final context embedding and label embedding dimensions should be equal.
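
The concatenate-then-project flow can be sketched as follows (illustrative shapes only, not the framework's actual code, which lives in context_encoder.py):

import tensorflow as tf

# Two feature-group embeddings for a batch of 4 contexts (dims 8 and 4,
# matching the sample config), concatenated then fed through hidden layers.
group_embeddings = [tf.random.normal([4, 8]), tf.random.normal([4, 4])]
x = tf.concat(group_embeddings, axis=-1)   # shape [4, 12]
for dim in (32, 32):                       # e.g. --hidden_layer_dims "32,32"
    x = tf.keras.layers.Dense(dim, activation="relu")(x)

# One way to satisfy the constraint that the final context embedding and
# label embedding dimensions match: project to the label dimension.
context_embedding = tf.keras.layers.Dense(8)(x)  # label embedding_dim = 8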

Please check out the different model graphs produced with the different input configurations in the Appendix section.

Experiments and Analysis

We take this opportunity to analyze the performance of ID-based and feature-based models with various configurations, and provide some empirical results.

For the ID-based model, only movie_id is used as the input feature. And for the feature-based model, both movie_id and movie_genre features are used. Both types of models are experimented with 3 encoder types (BOW/CNN/LSTM) and 3 context history lengths (10/50/100).

Comparison between ID-based and feature-based models on BOW/CNN/LSTM encoders and context history lengths 10/50/100.

Since the MovieLens dataset is an experimental dataset with ~4000 candidate movies and 19 movie genres, we scaled down the embedding dimensions in the experiments to simulate a production scenario. In the experiment results above, the ID embedding dimension is set to 8, and the movie genre embedding dimension is set to 4. Taking context10_cnn as an example, the feature-based model outperforms the ID-based model by 58.6%; on average, feature-based models outperform ID-based models by 48.35%. In this case, the feature-based model outperforms the ID-based model because the movie_genre feature introduces additional information to the model.

Moreover, underlying features of candidate items usually have a smaller vocabulary size, and hence smaller embedding spaces. For instance, the movie genre vocabulary is much smaller than the movie ID vocabulary. Utilizing such underlying features can therefore reduce the memory size of the model, which is more on-device friendly.
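
As a back-of-the-envelope illustration using the sample config above (float32, 4 bytes per value):

# Embedding-table sizes: vocab_size * embedding_dim * 4 bytes (float32).
movie_id_table_bytes = 3953 * 8 * 4   # 126,496 bytes, ~126 KB
movie_genre_table_bytes = 19 * 4 * 4  # 304 bytes, ~0.3 KB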

Acknowledgement

Special thanks to Cong Li, Josh Gordon, Khanh LeViet, Arun Venkatesan and Lawrence Chan for providing valuable suggestions on this work.
