Introducing the TFX interactive notebook
November 25, 2019
Posted by Charles Chen, Joe Lee, and Kenny Song on behalf of the TFX team


Run TFX in Google Colab

TensorFlow Extended (TFX) is a platform for creating end-to-end machine learning pipelines. TFX was created by Google to provide the backbone of our own ML applications and services, and we’re steadily open-sourcing TFX to enable other companies and teams to easily build production-grade ML systems (learn more in this blog post).
[Figure: TFX config]
In TFX 0.15, we’re excited to release a faster way to start using TFX. Now you can build, debug, and run your TFX pipeline inside an interactive Google Colab or Jupyter notebook! Within this notebook environment, you can run TFX component-by-component, which makes it easier to iterate and experiment on your ML pipeline.

To get started, this new Colab-based TFX tutorial covers all TFX components, requires no setup, and runs entirely in your browser! It’s free to use, so try out TFX in a Colab and send us your feedback!


When you’re done developing your pipeline in-notebook, you can convert the notebook code to a pipeline file that can be orchestrated with Apache Airflow or Apache Beam (export to Kubeflow Pipelines coming soon). We recommend this export path for productionizing your TFX pipeline: notebooks are for experimentation, while pipelines are for production.

A key difference between experimentation and production is how you run components. In a production setting, an orchestration engine such as Apache Airflow will execute components for you. During experimentation, the human (you!) running the notebook cells is the orchestrator. The magic that enables this is the InteractiveContext, which manages component execution and state in the notebook.
context = InteractiveContext()
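To make the "you are the orchestrator" idea concrete, here is a minimal toy sketch in plain Python. It is not the TFX implementation: ToyComponent and ToyInteractiveContext are hypothetical names invented for illustration, and the only assumption is that a context runs one component per call and remembers its output so that later cells can use it.

```python
# Toy model of an interactive context. NOT the TFX implementation;
# ToyComponent and ToyInteractiveContext are hypothetical names.
from typing import Any, Callable, Dict, List, Optional


class ToyComponent:
    """A named step that turns upstream outputs into its own output."""

    def __init__(self, name: str, fn: Callable[..., Any],
                 inputs: Optional[List["ToyComponent"]] = None) -> None:
        self.name = name
        self.fn = fn
        self.inputs = inputs or []


class ToyInteractiveContext:
    """Runs one component per call and remembers each result."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, component: ToyComponent) -> Any:
        # The person running cells is the orchestrator, so upstream
        # components must already have been run by hand.
        missing = [c.name for c in component.inputs
                   if c.name not in self._results]
        if missing:
            raise RuntimeError(f"run these components first: {missing}")
        args = [self._results[c.name] for c in component.inputs]
        self._results[component.name] = component.fn(*args)
        return self._results[component.name]


context = ToyInteractiveContext()
example_gen = ToyComponent("example_gen", lambda: [3.0, 1.0, 2.0])
statistics_gen = ToyComponent(
    "statistics_gen",
    lambda rows: {"count": len(rows), "mean": sum(rows) / len(rows)},
    inputs=[example_gen],
)

context.run(example_gen)             # "cell" 1
stats = context.run(statistics_gen)  # "cell" 2 reuses the stored output
```

Running the second "cell" before the first raises an error in this toy, which is the kind of ordering mistake an orchestration engine would otherwise prevent for you.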
For example, here’s how we can run a StatisticsGen component in a notebook. First, we instantiate a StatisticsGen component and pass in our training data (usually ingested by another TFX component, such as ExampleGen).
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
Next, to run the component, we simply call context.run() and run that cell.
context.run(statistics_gen)
You’re done! As you might expect from the name, StatisticsGen generates feature-level statistics over your dataset. After the cell finishes running, you can review these statistics with a built-in TFX visualization by calling context.show().
context.show(statistics_gen.outputs['statistics'])
The output of this function is an interactive visualization that you can explore to analyze the shapes and properties of your data.
[Figure: interactive visualization]
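If "feature-level statistics" sounds abstract, the sketch below shows the general kind of summary involved, using only the Python standard library. It is an illustration, not how StatisticsGen computes its output: the feature_statistics helper, the choice of statistics, and the feature names trip_miles and fare are all assumptions made up for this example.

```python
# Illustrative only: a hand-rolled per-feature summary, not the
# StatisticsGen implementation. All names here are hypothetical.
import math
from typing import Any, Dict, List, Optional


def feature_statistics(
    rows: List[Dict[str, Optional[float]]],
) -> Dict[str, Dict[str, Any]]:
    """Summarize each numeric feature across all rows."""
    names = sorted({name for row in rows for name in row})
    summary: Dict[str, Dict[str, Any]] = {}
    for name in names:
        # Treat an absent key or a None value as a missing value.
        values = [row[name] for row in rows if row.get(name) is not None]
        stats: Dict[str, Any] = {
            "count": len(values),
            "missing": len(rows) - len(values),
        }
        if values:
            mean = sum(values) / len(values)
            # Population variance: divide by the number of present values.
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            stats.update(min=min(values), max=max(values),
                         mean=mean, std_dev=math.sqrt(variance))
        summary[name] = stats
    return summary


rows = [
    {"trip_miles": 1.0, "fare": 5.0},
    {"trip_miles": 3.0, "fare": None},
    {"trip_miles": 2.0, "fare": 9.0},
]
stats = feature_statistics(rows)
```

Here stats["fare"] reports one missing value and a mean of 7.0 over the two present values, which is the sort of detail that is easier to spot in a visualization than in raw rows.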
You can run all TFX components in this way, including training a TensorFlow model in a Trainer component, and performing deep analysis of your model’s performance with TensorFlow Model Analysis in an Evaluator component.

This enables fast, easy experimentation. For production, everything you write in the notebook can be converted into an orchestratable pipeline file by calling context.export_to_pipeline():
context.export_to_pipeline(notebook_filepath=_notebook_filepath,
                           export_filepath=_pipeline_export_filepath,
                           runner_type=_runner_type)
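This post does not describe how the export works internally, so the following is only a rough sketch of the general idea of turning a notebook into a script: a notebook file is JSON, and its code cells can be concatenated in order. The notebook_to_script helper and the "# skip-export" marker are hypothetical and are not part of TFX.

```python
# A rough sketch of the idea, not the export_to_pipeline implementation.
import json
from typing import List

SKIP_MARKER = "# skip-export"  # hypothetical marker, not a TFX feature


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook into one script."""
    notebook = json.loads(notebook_json)
    chunks: List[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # drop markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.lstrip().startswith(SKIP_MARKER):
            continue  # cells meant only for interactive exploration
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
        {"cell_type": "code", "source": ["# skip-export\n", "print(y)\n"]},
        {"cell_type": "code", "source": "z = y * 2"},
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
})
script = notebook_to_script(demo)
```

In this toy, the markdown cell and the cell marked for skipping are dropped, and the remaining code cells are kept in notebook order.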
TFX provides many more components that you can use in your production ML pipelines. To learn more and try out all TFX components in a Colab notebook, check out the tutorial.

We'd also love your feedback – let us know what you think on the TFX mailing list.