February 07, 2023 — Posted by Hannes Hapke and Robert Crowe To produce production-level machine learning models, TensorFlow provides a portfolio of libraries under the umbrella of TensorFlow Extended (TFX). With just a pip install, TFX already includes a number of versatile pipeline components - referred to as the “standard components” - which provide most of the basic functionality for training and batch inference.…
Posted by Hannes Hapke and Robert Crowe
To produce production-level machine learning models, TensorFlow provides a portfolio of libraries under the umbrella of TensorFlow Extended (TFX). With just a pip install, TFX already includes a number of versatile pipeline components, referred to as the “standard components”, which provide most of the basic functionality for training and batch inference. The standard components will get most developers started, but developers often need additional functionality, which they can add by developing custom components. Any TFX pipeline, regardless of which components it includes, can run on a number of pipeline orchestrators such as Google Cloud Vertex AI Pipelines, Apache Beam, Apache Airflow, or Kubeflow Pipelines.
While the standard TFX components are great, a community of machine learning engineers from a number of companies including Twitter, Spotify, Digits, and Apple formed a TFX special interest group and started contributing new components, libraries, and examples to an extension of TFX called TFX-Addons.
TFX is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models from research to production, you can use TFX to create and manage an automated production pipeline for training, batch inference, or both. A TFX pipeline is a sequence of components that implements an ML pipeline specifically designed for scalable, high-performance machine learning tasks. Components are often built with the TFX libraries (TensorFlow Data Validation, TensorFlow Transform, and TensorFlow Model Analysis), which can also be used individually. Components can also run completely custom code, and can even distribute processing across a compute cluster through Apache Beam.
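To make the idea of "a sequence of components" concrete, here is a toy sketch in plain Python. It is not the TFX API: the `Component` and `Pipeline` classes are hypothetical stand-ins, and the three steps only borrow the names of real TFX standard components. Each component reads the artifacts produced upstream and adds its own outputs, which is the basic data flow a TFX pipeline orchestrates at scale.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for TFX concepts; this is NOT the TFX API.
Artifacts = Dict[str, object]


@dataclass
class Component:
    """A named step that reads upstream artifacts and returns new ones."""
    name: str
    fn: Callable[[Artifacts], Artifacts]


@dataclass
class Pipeline:
    components: List[Component] = field(default_factory=list)

    def run(self) -> Artifacts:
        artifacts: Artifacts = {}
        for component in self.components:
            # Each component sees everything produced by earlier components.
            artifacts.update(component.fn(artifacts))
        return artifacts


def example_gen(_: Artifacts) -> Artifacts:
    # Ingest some (fake) training examples.
    return {"examples": [1.0, 2.0, 3.0, 4.0]}


def statistics_gen(a: Artifacts) -> Artifacts:
    xs = a["examples"]
    return {"statistics": {"count": len(xs), "mean": sum(xs) / len(xs)}}


def trainer(a: Artifacts) -> Artifacts:
    # A trivial "model" that always predicts the training mean.
    return {"model": {"constant": a["statistics"]["mean"]}}


pipeline = Pipeline([
    Component("ExampleGen", example_gen),
    Component("StatisticsGen", statistics_gen),
    Component("Trainer", trainer),
])
result = pipeline.run()
print(result["model"])  # {'constant': 2.5}
```

In real TFX the orchestrator, not a Python loop, decides when each component runs, and artifacts are tracked in ML Metadata rather than passed around in a dictionary.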
TFX provides the following:
TFX is a planet-scale production machine learning toolkit based on TensorFlow. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor your machine learning system.
TFX-Addons is a special interest group (SIG) for TFX users who are extending the standard set of components provided by Google’s TensorFlow team. The addons are implemented by engineers at other machine learning companies and by individual developers who rely heavily on TFX for their production machine learning operations.
TFX components solve common MLOps patterns, such as ingesting data into machine learning pipelines. For example, members of TFX-Addons developed and open-sourced a TFX component that ingests data from a Feast feature store; the component is maintained by machine learning engineers at Twitter and Apple.
The TFX-Addons components and examples are accessible via a simple pip installation. To install the latest version, run the following:
pip install tfx-addons
To ensure you have a compatible version of dependencies for any given project, you can specify the project name as an extra requirement during install:
pip install tfx-addons[feast_examplegen]
To use TFX-Addons:
from tfx import v1 as tfx
import tfx_addons as tfxa
# Each project is then available as tfxa.{project_name}, for example:
tfxa.feast_examplegen.FeastExampleGen(...)
The TFX-Addons components can be used in any TFX pipeline. Most components support all TFX orchestrators, including Google Cloud Vertex AI Pipelines, Apache Beam, Apache Airflow, and Kubeflow Pipelines.
The list of components, libraries, and examples is constantly growing, with several new projects currently in development. As of this writing, the following components are available.
The Feast ExampleGen component lets you ingest data samples from a Feast feature store.
The Message Exit Handler component provides an exit handler for TFX pipelines that notifies the user about the final state of the pipeline (failed or succeeded), for example via a Slack message. If the pipeline fails, the notification includes the error message. The component supports a number of message providers (e.g. Slack, stdout, and logging providers) and can easily be extended to support others, such as Twilio. It also serves as an example of how to write exit handlers for TFX pipelines.
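The pluggable-provider design described above can be sketched in plain Python. This is a hypothetical illustration of the pattern, not the component's actual classes or API: `MessageProvider`, `StdoutProvider`, `LoggingProvider`, `RecordingProvider`, and `exit_handler` are names invented here. Supporting a new provider (a Twilio sender, say) would mean writing one more subclass; `RecordingProvider` stands in for such an extension without making network calls.

```python
import logging
from abc import ABC, abstractmethod
from typing import List, Optional


class MessageProvider(ABC):
    """Hypothetical provider interface; the real component may differ."""

    @abstractmethod
    def send(self, text: str) -> None: ...


class StdoutProvider(MessageProvider):
    def send(self, text: str) -> None:
        print(text)


class LoggingProvider(MessageProvider):
    def __init__(self, logger: Optional[logging.Logger] = None):
        self.logger = logger or logging.getLogger("pipeline")

    def send(self, text: str) -> None:
        self.logger.info(text)


class RecordingProvider(MessageProvider):
    """Stand-in for a new provider (e.g. Twilio); it only stores messages."""

    def __init__(self):
        self.sent: List[str] = []

    def send(self, text: str) -> None:
        self.sent.append(text)


def format_final_status(name: str, succeeded: bool, error: Optional[str] = None) -> str:
    status = "succeeded" if succeeded else "failed"
    text = f"Pipeline {name} {status}."
    # Only failed runs carry the error message.
    if not succeeded and error:
        text += f" Error: {error}"
    return text


def exit_handler(provider: MessageProvider, name: str, succeeded: bool,
                 error: Optional[str] = None) -> str:
    """Runs once at the end of a pipeline and reports its final state."""
    message = format_final_status(name, succeeded, error)
    provider.send(message)
    return message


recorder = RecordingProvider()
exit_handler(recorder, "taxi-pipeline", succeeded=False, error="Trainer ran out of memory")
print(recorder.sent)  # ['Pipeline taxi-pipeline failed. Error: Trainer ran out of memory']
```

Keeping message formatting separate from delivery means every provider reports the same text, and adding a provider never touches the exit-handler logic.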
The Schema Curation component lets users update the schema produced by the SchemaGen component and curate it based on domain knowledge. The curated schema can then be used to stop a pipeline if feature drift is detected.
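A rough sketch of the curation idea follows. It assumes a heavily simplified schema: a dictionary that maps each feature to an allowed `(min, max)` range. The real SchemaGen output is a TensorFlow Metadata `Schema` proto, and real drift detection compares feature distributions. Here a plain range check stands in for drift detection, and `curate` and `find_anomalies` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# Simplified stand-in for a schema: feature name -> allowed (min, max) range.
# The real SchemaGen artifact is a TensorFlow Metadata Schema proto.
Schema = Dict[str, Tuple[float, float]]


def curate(inferred: Schema, overrides: Schema) -> Schema:
    """Apply domain-knowledge overrides on top of the inferred schema."""
    curated = dict(inferred)
    curated.update(overrides)
    return curated


def find_anomalies(schema: Schema, batch: List[Dict[str, float]]) -> List[str]:
    """Report every value that is missing or outside its allowed range."""
    anomalies = []
    for i, row in enumerate(batch):
        for feature, (lo, hi) in schema.items():
            value = row.get(feature)
            if value is None or not lo <= value <= hi:
                anomalies.append(f"row {i}: {feature}={value} outside [{lo}, {hi}]")
    return anomalies


# The inferred schema accepts any plausible age; domain experts know
# this product only serves adults, so they tighten the range.
inferred = {"age": (0.0, 130.0), "fare": (0.0, 1000.0)}
curated = curate(inferred, {"age": (18.0, 100.0)})

new_data = [{"age": 35.0, "fare": 12.5}, {"age": 7.0, "fare": 8.0}]
problems = find_anomalies(curated, new_data)
if problems:
    # A pipeline would halt here instead of training on unexpected data.
    print("Stopping pipeline:", problems)
```

The inferred schema alone would have accepted the second row; only the curated range catches it.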
The Feature Selection component lets users select features from datasets, which is useful if you want to choose features based on statistical feature selection metrics.
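As an illustration of selecting features with a statistical metric, the sketch below ranks features by the absolute Pearson correlation between each feature and the label and keeps the top `k`. The choice of metric is ours for illustration; the component itself may use different metrics, and `pearson` and `select_k_best` are hypothetical helpers, not its API.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_k_best(features: Dict[str, List[float]], label: List[float], k: int) -> List[str]:
    """Return the k feature names most strongly correlated with the label."""
    # Absolute value: a strong negative correlation is just as informative.
    scores = {name: abs(pearson(col, label)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


features = {
    "f_signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f_inverse": [5.0, 4.0, 3.0, 2.0, 1.0],
    "f_noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
label = [0.0, 0.0, 1.0, 1.0, 1.0]
print(select_k_best(features, label, k=2))
```

Here `f_signal` and `f_inverse` both score about 0.87 while `f_noise` scores about 0.17, so the noisy feature is dropped.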
The XGBoost Evaluator component extends the standard TFX Evaluator component to support trained XGBoost models, enabling deep analysis of model performance.
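The Sampling component lets users balance their training datasets by random undersampling or oversampling: undersampling reduces every class to the size of the lowest-frequency class, while oversampling increases every class to the size of the highest-frequency class.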
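The two balancing strategies described above can be sketched in a few lines of plain Python. This is an illustration of random under- and oversampling, not the component's implementation: `rebalance` is a hypothetical helper, and each example is assumed to be a `(features, label)` tuple. Undersampling draws without replacement down to the smallest class size. Oversampling keeps every example and draws extra duplicates with replacement up to the largest class size.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[dict, str]  # (features, class label)


def rebalance(examples: List[Example], mode: str = "undersample", seed: int = 0) -> List[Example]:
    """Randomly under- or oversample so every class has the same count."""
    by_class: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_class[ex[1]].append(ex)

    sizes = [len(v) for v in by_class.values()]
    if mode == "undersample":
        target = min(sizes)  # size of the lowest-frequency class
    elif mode == "oversample":
        target = max(sizes)  # size of the highest-frequency class
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = random.Random(seed)
    balanced: List[Example] = []
    for cls_examples in by_class.values():
        if len(cls_examples) > target:
            # Drop examples at random (sampling without replacement).
            balanced.extend(rng.sample(cls_examples, target))
        else:
            # Keep everything, then add random duplicates (with replacement).
            balanced.extend(cls_examples)
            balanced.extend(rng.choices(cls_examples, k=target - len(cls_examples)))
    rng.shuffle(balanced)
    return balanced


data = [({"x": i}, "pos") for i in range(3)] + [({"x": i}, "neg") for i in range(10)]
under = rebalance(data, "undersample")
over = rebalance(data, "oversample")
print(Counter(label for _, label in under))  # 3 examples per class
print(Counter(label for _, label in over))   # 10 examples per class
```

Undersampling discards information but keeps the dataset small; oversampling keeps every example at the cost of repeated rows, which can encourage overfitting to the minority class.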
The Pandas Transform component can be used instead of the standard TFX Transform component and lets you work with Pandas DataFrames for your feature engineering. Processing is distributed with Apache Beam for scalability.
The Firebase Publisher component helps users publish trained models directly from a TFX pipeline to Firebase ML.
The HuggingFace Model Pusher (HFModelPusher) pushes a blessed model to the HuggingFace Model Hub. It can also optionally push an application to the HuggingFace Space Hub.
The TFX-Addons SIG is all about sharing reusable components and best practices. If you are interested in MLOps, join our bi-weekly conference calls. It doesn’t matter if you are new to TFX or an experienced ML engineer, everyone is welcome and the SIG accepts open source contributions from all participants.
If you want to join our next meeting, sign up for our mailing list at sig-tfx-addons@tensorflow.org.
If you’re already using TFX-Addons we’d love to hear from you! Use this form to send us your story!
Big thanks to the following members for their open-source component contributions:
Badrul Chowdhury, Daniel Kim, Fatimah Adwan, Gerard Casas Saez, Hannes Hapke, Marcus Chang, Kshitijaa Jaglan, Pratishtha Abrol, Robert Crowe, Nirzari Gupta, Thea Lamkin, Wihan Booyse, Michael Hu, Vulko Milev, Sayak Paul, Chansung Park, and all the other contributors! Open-source only happens when people like you contribute!