Using Model Card Toolkit for TF Model Transparency
November 18, 2020
Posted by Karan Shukla, Software Engineer, Google Research

Machine learning (ML) model transparency is important across a wide variety of domains that impact peoples’ lives, from healthcare to personal finance to employment. At Google, this desire for transparency led us to develop Model Cards, a framework for transparent reporting on ML model performance, provenance, ethical considerations and more. It can be time consuming, however, to compile the information necessary to create a useful Model Card. To address this, we recently announced the open-source launch of Model Card Toolkit (MCT), a collection of tools that supports ML developers in compiling the information that goes into a Model Card.

The toolkit consists of:

  • A JSON schema, which specifies the fields to include in the Model Card
  • A ModelCard data API to represent an instance of the JSON schema and visualize it as a Model Card
  • A component that uses the model provenance information stored with ML Metadata (MLMD) to automatically populate the JSON with relevant information

We wanted the toolkit to be modular so that Model Card creators can still leverage the JSON schema and ModelCard data API even if their modeling environment is not integrated with MLMD. In this post, we’ll show you how you can use these components to create a Model Card for a Keras MobileNetV2 model trained on ImageNet and fine-tuned on the cats_vs_dogs dataset available in TensorFlow Datasets (TFDS). While this model and use case may be trivial from a transparency standpoint, it allows us to easily demonstrate the components of MCT.

Model card for Fine-tuned MobileNetV2 Model for Cats vs Dogs
An example Model Card. Click here for a larger version.

Model Card Toolkit Walkthrough

You can follow along and run the code yourself in the Colab notebook. In this walkthrough, we’ll include some additional information about the considerations you’ll want to keep in mind while using the toolkit.

We begin by installing the Model Card Toolkit.

!pip install 'model-card-toolkit>=0.1.1,<0.2'

Now, we load both the MobileNetV2 model and the weights generated by fine-tuning the model on the cats_vs_dogs dataset. For more information on how we fine-tuned our model, you can see the TensorFlow tutorial on the topic.

URL = ''
BASE_PATH = tempfile.mkdtemp()
ZIP_PATH = os.path.join(BASE_PATH, '')
MODEL_PATH = os.path.join(BASE_PATH,'cats_vs_dogs_model')  
r = requests.get(URL, allow_redirects=True)
open(ZIP_PATH, 'wb').write(r.content)
with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
model = tf.keras.models.load_model(MODEL_PATH)

We also calculate the number of examples, storing it to the “examples” object, and the accuracy scores, disaggregated across class. We’ll use both accuracy and examples later to build graphs to display in our Model Card.

examples = cats_vs_dogs.get_data()
accuracy = compute_accuracy(examples['combined'])
cat_accuracy = compute_accuracy(examples['cat'])
dog_accuracy = compute_accuracy(examples['dog'])

Next, we’ll use the Model Card Toolkit to create our Model Card. The first step is to initialize a ModelCardToolkit object, which maintains assets including a Model Card JSON file and Model Card document. Call ModelCardToolkit.scaffold_assets() to generate these assets and return a ModelCard object.

model_card_dir = tempfile.mkdtemp()
mct = ModelCardToolkit(model_card_dir)
model_card = mct.scaffold_assets()

We then populate the Model Card’s fields. First, we’ll fill in the model_card.model_details section, which contains basic metadata fields.

We begin by specifying the model’s name, writing a brief description of the model in the overview section. = 'Fine-tuned MobileNetV2 Model for Cat vs. Dogs'
model_card.model_details.overview = (
   'This model distinguishes cat and dog images. It uses the MobileNetV2 '
   'architecture ( and is trained on the '
   'Cats vs Dogs dataset '
   '( This model '
   'performed with high accuracy on both Cat and Dog images.'

We provide the model’s owners, version, and references.

model_card.model_details.owners = [
 {'name': 'Model Cards Team', 'contact': ''}
model_card.model_details.version = {'name': 'v1.0', 'data': '08/28/2020'}
model_card.model_details.references = [

Finally, we share the model’s license information, and a url that future users can cite if they choose to reuse the model in the citation section.

model_card.model_details.license = 'Apache-2.0'
model_card.model_details.citation = ''

The model_card.quantitative_analysis field contains information about a model's performance metrics. Here, we’ve created some synthetic performance metric values for a hypothetical model built on our dataset.

model_card.quantitative_analysis.performance_metrics = [
 {'type': 'accuracy', 'value': accuracy},
 {'type': 'accuracy', 'value': cat_accuracy, 'slice': 'cat'},
 {'type': 'accuracy', 'value': dog_accuracy, 'slice': 'Dog'},

model_card.considerations contains qualitative information about your model. In particular, we recommend including some, or all of the following information:

Use cases: What are the intended use cases for this model? This is pretty straightforward for our model:

model_card.considerations.use_cases = [
   'This model classifies images of cats and dogs.'

Limitations: What technical limitations should users keep in mind? What kinds of data cause your model to fail, or underperform? In our case, examples that are not dogs or cats will cause our model to fail, so we’ve acknowledged this:

model_card.considerations.limitations = [
   'This model is not able to classify images of other animals.'

Ethical considerations: What ethical considerations should users be aware of when deciding whether or not to use the model? In what contexts could the model raise ethical concerns? What steps did you take to mitigate ethical concerns?

model_card.considerations.ethical_considerations = [{
       'While distinguishing between cats and dogs is generally agreed to be '
       'a benign application of machine learning, harmful results can occur '
       'when the model attempts to classify images that don’t contain cats or '
       'Avoid application on non-dog and non-cat images.'

Lastly, you can include graphs in your Model Card. We recommend including graphs that reflect the distributions in both your training and evaluation datasets, as well as graphs of your model’s performance on evaluation data. model_card has sections for each of these:

  • for training dataset statistics
  • for evaluation dataset statistics
  • for quantitative analysis of model performance

For this Model Card, we’ve included Matplotlib graphs of our validation set size and the model’s accuracy, both separated by class. Please visit the associated Colab if you’d like to see the Matplotlib code. If you are using ML Metadata, these graphs will be generated automatically (as demonstrated in this Colab). You can also use other visualization libraries, like Seaborn.

We add our graphs to our Model Card. = [
 {'name': 'Validation Set Size', 'image': validation_set_size_barchart},
] = [
 {'name': 'Accuracy', 'image': accuracy_barchart},

We’re finally ready to generate our Model Card! Let’s do that now. First we need to update the ModelCardToolkit object with the latest ModelCard.


Lastly, we generate the Model Card document in the chosen output format.

# Generate a model card document in HTML (default)
html_doc = mct.export_format()
# Display the model card document in HTML
# Generate a model card document in Markdown
md_path = os.path.join(model_card_dir, 'template/md/')
md_doc = mct.export_format(md_path, '')
# Display the model card document in Markdown
Model Card for Fine-tuned MobileNetV2 Model for Cats vs Dogs

And we’ve generated our Model Card! It’s a good idea to review the end product with your direct team, as well as members who are further away from the project. In particular, we recommend reviewing the qualitative fields such as “ethical considerations” to ensure you’ve adequately captured all potential use cases and their potential consequences. Does your Model Card answer the questions that people from different backgrounds might have? Is the language accessible to a developer? What about a policy maker, or a downstream user who might interact with the model? In the future, we hope to offer Model Card creators more guidance that they can use to help answer these questions and provide more thorough instructions on how to fill out the considerations fields.

Have questions? Have Model Cards to share? Let us know at!


Huanming Fang, Hui Miao, Karan Shukla, Dan Nanas, Catherina Xu, Christina Greer, Neoklis Polyzotis, Tulsee Doshi, Tiffany Deng, Margaret Mitchell, Timnit Gebru, Andrew Zaldivar, Mahima Pushkarna, Meena Natarajan, Roy Kim, Parker Barnes, Tom Murray, Susanna Ricco, Lucy Vasserman, and Simone Wu

Next post
Using Model Card Toolkit for TF Model Transparency

Posted by Karan Shukla, Software Engineer, Google Research

Machine learning (ML) model transparency is important across a wide variety of domains that impact peoples’ lives, from healthcare to personal finance to employment. At Google, this desire for transparency led us to develop Model Cards, a framework for transparent reporting on ML model performance, provenance, ethical considerations and m…