Extend your TFX pipeline with TFX-Addons

February 07, 2023

Posted by Hannes Hapke and Robert Crowe

To produce production-level machine learning models, TensorFlow provides a portfolio of libraries under the umbrella of TensorFlow Extended (TFX). With just a pip install, TFX already includes a number of versatile pipeline components - referred to as the “standard components” - which provide most of the basic functionality for training and batch inference. The standard components will get most developers started, but developers often find the need for additional functionality, which can be added by developing custom components. Any TFX pipeline, regardless of which components are included, can be used with a number of pipeline orchestrators like Google Cloud Vertex AI Pipelines, Apache Beam, Apache Airflow, or Kubeflow Pipelines.

While the standard TFX components are great, a community of machine learning engineers from a number of companies including Twitter, Spotify, Digits, and Apple formed a TFX special interest group and started contributing new components, libraries, and examples to an extension of TFX called TFX-Addons.

What is TFX?

TFX is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models from research to production, you can use TFX to create and manage an automated production pipeline for training, batch inference, or both. A TFX pipeline is a sequence of components that implement an ML pipeline which is specifically designed for scalable, high-performance machine learning tasks. Components can often be built using the TFX libraries - TensorFlow Data Validation, TensorFlow Transform, and TensorFlow Model Analysis - which can also be used individually. Components can also be built to run completely customized code, and even to distribute processing across a compute cluster through Apache Beam.

TFX provides the following:

A toolkit for building ML pipelines. TFX pipelines let you orchestrate your ML workflow on several platforms, such as Apache Airflow, Apache Beam, and Kubeflow Pipelines. Learn more about TFX pipelines.
A set of standard components that you can use as a part of a pipeline, or as a part of your ML training script. TFX standard components provide proven functionality to help you get started building an ML process easily. Learn more about TFX standard components.
Libraries which provide the base functionality for many of the standard components. You can optionally use the TFX libraries to add this functionality to your own custom components, or use them separately. Learn more about the TFX libraries.

TFX is a planet-scale production machine learning toolkit based on TensorFlow. It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system.

What is TFX-Addons?

TFX-Addons is a special interest group (SIG) for TFX users who are extending the standard set of components provided by Google’s TensorFlow team. The addons are implementations by other machine learning companies and developers which rely heavily on TFX for their production machine learning operations.

Common MLOps patterns, for example ingesting data into machine learning pipelines, are solved through TFX components. As an example, members of TFX-Addons developed and open-sourced a TFX component to ingest data from a Feast feature store, a component maintained by machine learning engineers at Twitter and Apple.

How can you use the TFX-Addons components or examples?

The TFX-Addons components and examples are accessible via a simple pip installation. To install the latest version, run the following:

pip install tfx-addons

To ensure you have a compatible version of dependencies for any given project, you can specify the project name as an extra requirement during install:

pip install "tfx-addons[feast_examplegen]"

To use TFX-Addons:

from tfx import v1 as tfx
import tfx_addons as tfxa

# Then you can easily load projects tfxa.{project_name}. Ex:

tfxa.feast_examplegen.FeastExampleGen(...)
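
Because each project can be installed as an optional extra, it can help to check what is actually available before wiring a component into a pipeline. The following is a minimal sketch and not part of TFX-Addons itself: the helper names `addons_version` and `load_project` are hypothetical, and the sketch assumes that each project is importable as a submodule of `tfx_addons`, in line with the `tfxa.{project_name}` access shown above.

```python
import importlib
from importlib import metadata
from types import ModuleType
from typing import Optional


def addons_version() -> Optional[str]:
    """Return the installed tfx-addons version, or None if it is not installed."""
    try:
        return metadata.version("tfx-addons")
    except metadata.PackageNotFoundError:
        return None


def load_project(project_name: str) -> Optional[ModuleType]:
    """Try to import tfx_addons.<project_name>; return None if it is unavailable.

    Assumption: projects are importable as submodules of tfx_addons.
    """
    try:
        return importlib.import_module(f"tfx_addons.{project_name}")
    except ImportError:
        # tfx-addons itself, or the dependencies of this project's extra, are missing.
        return None


if __name__ == "__main__":
    print("tfx-addons version:", addons_version())
    project = load_project("feast_examplegen")
    print("feast_examplegen available:", project is not None)
```

If `load_project` returns `None`, installing the matching extra as described above is the likely fix.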

The TFX-Addons components can be used in any TFX pipeline. Most components support all TFX orchestrators, including Google Cloud’s Vertex AI Pipelines, Apache Beam, Apache Airflow, and Kubeflow Pipelines.

Which additional components are currently available?

The list of components, libraries, and examples is constantly growing, with several new projects currently in development. As of this writing, these are the currently available components.

Feast Component

The Example Generator allows you to ingest data samples from a Feast Feature Store.

More information: tfxa.feast_examplegen

Message Exit Handler

This component provides an exit handler for TFX pipelines which notifies the user about the final state of the pipeline (failed or succeeded), for example via a Slack message. If the pipeline fails, the component will provide the error message. The message component supports a number of message providers (e.g. Slack, stdout, logging providers) and can easily be extended to support Twilio. It also serves as an example of how to write exit handlers for TFX pipelines.

More information: tfxa.message_exit_handler
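
To illustrate the idea of pluggable message providers, here is a minimal, self-contained sketch. It does not use the component's actual API: the class names `MessageProvider`, `StdoutProvider`, and `LoggingProvider`, and the functions `build_status_message` and `notify`, are hypothetical and exist only to show how a new channel could be added behind a common interface.

```python
import logging
from abc import ABC, abstractmethod
from typing import List


class MessageProvider(ABC):
    """Hypothetical base class: one implementation per notification channel."""

    @abstractmethod
    def send(self, message: str) -> None:
        """Deliver the message through this provider's channel."""


class StdoutProvider(MessageProvider):
    def send(self, message: str) -> None:
        print(message)


class LoggingProvider(MessageProvider):
    def __init__(self, logger_name: str = "pipeline") -> None:
        self._logger = logging.getLogger(logger_name)

    def send(self, message: str) -> None:
        self._logger.info(message)


def build_status_message(pipeline_name: str, succeeded: bool, error: str = "") -> str:
    """Describe the final pipeline state; include the error only on failure."""
    status = "succeeded" if succeeded else "failed"
    message = f"Pipeline '{pipeline_name}' {status}."
    if not succeeded and error:
        message += f" Error: {error}"
    return message


def notify(providers: List[MessageProvider], message: str) -> None:
    """Send the same message through every configured provider."""
    for provider in providers:
        provider.send(message)


if __name__ == "__main__":
    text = build_status_message("demo", succeeded=False, error="out of memory")
    notify([StdoutProvider(), LoggingProvider()], text)
```

In this sketch, supporting another channel only requires a new subclass with its own `send` method.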

Schema Curation Component

This component allows its users to modify the schema produced by the SchemaGen component and curate it based on domain knowledge. The curated schema can be used to stop pipelines if feature drift is detected.

More information: tfxa.schema_curation

Feature Selection Component

This component allows users to select features from datasets. It is useful if you want to select features based on statistical feature selection metrics.

More information: tfxa.feature_selection
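
As a rough illustration of what a statistical feature selection metric can look like, the sketch below ranks features by their absolute Pearson correlation with the label and keeps the top k. This is plain Python rather than the component's implementation; the metric choice, the function names `pearson` and `select_top_k`, and the toy data are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(
    features: Dict[str, Sequence[float]], label: Sequence[float], k: int
) -> List[str]:
    """Rank features by absolute correlation with the label and keep the top k."""
    scores = {name: abs(pearson(values, label)) for name, values in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Toy data: "size" tracks the label exactly, "noise" only weakly, "constant" not at all.
features = {
    "size": [1.0, 2.0, 3.0, 4.0],
    "noise": [5.0, 1.0, 4.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0],
}
label = [10.0, 20.0, 30.0, 40.0]
selected = select_top_k(features, label, k=2)
print(selected)
```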

XGBoost Evaluator Component

This component extends the standard TFX Evaluator component to support trained XGBoost models, in order to do deep analysis of model performance.

More information: tfxa.xgboost_evaluator

Sampling Component

This component allows users to balance their training datasets by random undersampling or oversampling: undersampling reduces every class to the size of the lowest-frequency class, while oversampling grows every class to the size of the highest-frequency class.

More information: tfxa.sampling
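
The two balancing strategies can be sketched in a few lines of plain Python. This is an illustration of the general technique under stated assumptions, not the component's code: the function name `balance`, the `(features, label)` tuple layout, and the fixed seed are hypothetical choices for the example.

```python
import random
from collections import defaultdict
from typing import Any, Hashable, List, Tuple

Example = Tuple[Any, Hashable]  # (features, label)


def balance(examples: List[Example], mode: str = "undersample", seed: int = 0) -> List[Example]:
    """Balance classes by random undersampling or oversampling."""
    if mode not in ("undersample", "oversample"):
        raise ValueError(f"unknown mode: {mode}")
    if not examples:
        return []

    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    sizes = [len(group) for group in by_label.values()]
    # Undersampling targets the smallest class, oversampling the largest.
    target = min(sizes) if mode == "undersample" else max(sizes)

    balanced: List[Example] = []
    for group in by_label.values():
        if len(group) >= target:
            # Draw without replacement down to the target size.
            balanced.extend(rng.sample(group, target))
        else:
            # Keep all originals, then draw the missing ones with replacement.
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [({"x": i}, "a") for i in range(6)] + [({"x": i}, "b") for i in range(2)]
    print(len(balance(data, "undersample")), len(balance(data, "oversample")))
```

Note that oversampling duplicates minority examples, so it is usually applied to the training split only.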

Pandas Transform Component

This component can be used instead of the standard TFX Transform component, and allows you to work with Pandas dataframes for your feature engineering. Processing is distributed using Beam for scalability.

More information: tfxa.pandas_transform

Firebase Publisher

This project helps users to publish trained models directly from a TFX pipeline to Firebase ML.

More information: tfxa.firebase_publisher

HuggingFace Model Pusher

The HuggingFace Model Pusher (HFModelPusher) pushes a blessed model to the HuggingFace Model Hub. It can also optionally push an application to the HuggingFace Space Hub.

More information: tfxa.huggingface_pusher

How can you participate?

The TFX-Addons SIG is all about sharing reusable components and best practices. If you are interested in MLOps, join our bi-weekly conference calls. Whether you are new to TFX or an experienced ML engineer, everyone is welcome, and the SIG accepts open source contributions from all participants.

If you want to join our next meeting, sign up for our mailing list at sig-tfx-addons@tensorflow.org.

Other resources:

TFX-Addons Slack - join here
TFX-Addons Repository

Already using TFX-Addons?

If you’re already using TFX-Addons we’d love to hear from you! Use this form to send us your story!

Thanks to all Contributors

Big thanks for all the open-source component contributions from the following members:
Badrul Chowdhury, Daniel Kim, Fatimah Adwan, Gerard Casas Saez, Hannes Hapke, Marcus Chang, Kshitijaa Jaglan, Pratishtha Abrol, Robert Crowe, Nirzari Gupta, Thea Lamkin, Wihan Booyse, Michael Hu, Vulko Milev, Sayak Paul, Chansung Park, and all the other contributors! Open-source only happens when people like you contribute!
