May 24, 2019
Posted by Jarek Wilkiewicz on behalf of the TFX team
From Research to Production with TFX Pipelines and ML Metadata
If your code runs in production, you are probably already familiar with version control / software configuration management (SCM), continuous integration and continuous deployment (CI/CD), and many other software engineering best practices. These took years to develop, and we now often take them for granted. Much like how writing an efficient algorithm implementation is just the beginning of a software engineer’s journey, machine learning (ML) model code typically represents only 5% of the overall system¹ required to deploy it to production. At Google, we have been working on improving the remaining 95% for many years². A fruit of our labour, TensorFlow Extended (TFX³), aims to bring the benefits of software engineering discipline to the fast-growing space of ML. In an upcoming series of blog posts, we’ll highlight what’s new in TFX and show you how TFX can help you build and deploy your ML models to production environments.
When the TFX pipeline executes, ML Metadata (MLMD, another Google open source project) keeps track of the artifacts that pipeline components depend upon (e.g. training data) and produce (e.g. vocabularies and models). ML Metadata is available as a standalone library and has also been integrated with TFX components for your convenience. MLMD lets you discover the lineage of an artifact (for example, what data a model was trained on) and find all artifacts created from an artifact (for example, all models trained on a specific dataset), and it enables many other use cases.
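To make the two lineage queries concrete, here is a minimal sketch of the idea in plain Python. It is a toy in-memory model, not the ML Metadata API: the `LineageStore` class, its method names, and the artifact and execution names are all hypothetical and chosen only for illustration. It assumes each pipeline step records which artifacts it consumed and which it produced.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Execution:
    """One run of a pipeline component (hypothetical, not an MLMD type)."""
    name: str
    inputs: frozenset
    outputs: frozenset


@dataclass
class LineageStore:
    """Toy in-memory lineage graph; illustrative only, NOT the MLMD API."""
    executions: list = field(default_factory=list)

    def record(self, name, inputs, outputs):
        # Remember which artifacts a component run consumed and produced.
        self.executions.append(
            Execution(name, frozenset(inputs), frozenset(outputs)))

    def _walk(self, start, step):
        # Generic graph traversal: follow `step` until nothing new appears.
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for neighbour in step(current):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return seen

    def lineage(self, artifact):
        """Every artifact that `artifact` was (transitively) derived from."""
        def parents(a):
            return {i for e in self.executions if a in e.outputs
                    for i in e.inputs}
        return self._walk(artifact, parents)

    def descendants(self, artifact):
        """Every artifact (transitively) created from `artifact`."""
        def children(a):
            return {o for e in self.executions if a in e.inputs
                    for o in e.outputs}
        return self._walk(artifact, children)


store = LineageStore()
store.record("transform", inputs={"training_data"},
             outputs={"vocabulary", "transformed_data"})
store.record("train_v1", inputs={"transformed_data", "vocabulary"},
             outputs={"model_v1"})
store.record("train_v2", inputs={"transformed_data", "vocabulary"},
             outputs={"model_v2"})

# What data was model_v1 trained on?
print(sorted(store.lineage("model_v1")))
# Which artifacts, including models, came from this dataset?
print(sorted(store.descendants("training_data")))
```

The first query walks backwards from a model to the data behind it; the second walks forwards from a dataset to everything built on it. A real metadata store would persist these records in a database rather than keep them in memory.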
In our next TFX blog post, we will describe the TFX pipeline components in more detail. Until then, please try the TFX developer tutorial. You’ll follow a typical ML development process, starting by examining the dataset and ending up with a complete working ML pipeline. If you have TFX questions, please reach us on Stack Overflow. Bug reports and pull requests are always welcome on GitHub, and we invite general discussion at tfx@tensorflow.org.
- - - - - - - -
[1] Sculley, D., Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo and Dan Dennison. “Hidden Technical Debt in Machine Learning Systems.” NIPS (2015).
[3] Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Naresh Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, and Martin Zinkevich. 2017. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17). ACM, New York, NY, USA, 1387–1395. DOI: https://doi.org/10.1145/3097983.3098021.