November 10, 2021
Posted by Goldie Gadde and Josh Gordon for the TensorFlow team
TensorFlow 2.7 is here! This release improves usability with clearer error messages and simplified stack traces, and adds new tools and documentation for users migrating to TF2.
The process of debugging your code is a fundamental part of the user experience of a machine learning framework. In this release, we've considerably improved the TensorFlow debugging experience to make it more productive and more enjoyable, via three major changes: simplified stack traces, displaying additional context information in errors that originate from custom Keras layers, and a wide-ranging audit of all error messages in Keras and TensorFlow.
By default, TensorFlow now filters the stack traces displayed on error, hiding any frame that originates from TensorFlow-internal code so that the information stays focused on what matters to you: your own code. This makes stack traces shorter and simpler, which makes it easier to understand and fix the problems in your code.
If you're actually debugging the TensorFlow codebase itself (for instance, because you're preparing a PR for TensorFlow), you can turn off the filtering mechanism by calling tf.debugging.disable_traceback_filtering().
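As a minimal sketch of toggling the filter, the snippet below turns filtering off, checks the current state, and then restores the default. It assumes TensorFlow 2.7 or later. The companion calls tf.debugging.enable_traceback_filtering() and tf.debugging.is_traceback_filtering_enabled() are not mentioned above; they live in the same module.

```python
import tensorflow as tf

# Show full stack traces, including TensorFlow-internal frames.
tf.debugging.disable_traceback_filtering()
filtering_when_disabled = tf.debugging.is_traceback_filtering_enabled()

# Restore the default, filtered behavior.
tf.debugging.enable_traceback_filtering()
filtering_when_enabled = tf.debugging.is_traceback_filtering_enabled()

print(filtering_when_disabled, filtering_when_enabled)
```

Re-enabling the filter once you are done keeps later errors in the same session short.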
One of the most common use cases for writing low-level code is creating custom Keras layers, so we wanted to make debugging your layers as easy and productive as possible. The first thing you do when you're debugging a layer is to print the shapes and dtypes of its inputs, as well as the values of its training and mask arguments. We now add this information automatically to all stack traces that originate from custom Keras layers.
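For context, the manual debugging step described here might look like the sketch below. DoubleLayer is a hypothetical layer invented for illustration; the print call is the kind of hand-written inspection that the automatic call context is meant to make unnecessary.

```python
import tensorflow as tf


class DoubleLayer(tf.keras.layers.Layer):
    """Hypothetical custom layer, used only for illustration."""

    def call(self, inputs, training=None, mask=None):
        # Hand-written inspection of the call context: input shape and dtype,
        # plus the values of the training and mask arguments.
        print("inputs:", inputs.shape, inputs.dtype,
              "| training:", training, "| mask:", mask)
        return inputs * 2.0


outputs = DoubleLayer()(tf.ones((2, 3)), training=True)
```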
See the effect of stack trace filtering and call context information display in practice in the image below:
Simplified stack traces in TensorFlow 2.7
Lastly, we've audited every error message in the Keras and TensorFlow codebases (thousands of error locations!) and improved them to make sure they follow UX best practices. A good error message should tell you what the framework expected, what you did that didn't match the framework's expectations, and should provide tips to fix the problem.
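To make the three-part structure concrete, here is a small sketch in plain Python. The check_rank function and its wording are hypothetical and are not taken from the TensorFlow or Keras codebases; they only illustrate a message that states the expectation, what was received, and a possible fix.

```python
def check_rank(shape, expected_rank=2):
    """Hypothetical validator that follows the three-part message structure."""
    if len(shape) != expected_rank:
        raise ValueError(
            # 1. What the framework expected.
            f"Expected an input of rank {expected_rank}. "
            # 2. What the caller actually passed.
            f"Received an input of rank {len(shape)} with shape {tuple(shape)}. "
            # 3. A tip for fixing the problem.
            "Consider reshaping the input before passing it in."
        )
    return tuple(shape)


try:
    check_rank((32, 28, 28))
except ValueError as e:
    message = str(e)
    print(message)
```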
We have improved two common types of tf.function error messages, runtime error messages and "Graph" tensor error messages, by including tracebacks that point to the error source in the user code. We also updated other vague or inaccurate tf.function error messages to be clearer and more accurate.
For the runtime error message caused by the following user code:
import tensorflow as tf
@tf.function
def f():
    l = tf.range(tf.random.uniform((), minval=1, maxval=10, dtype=tf.int32))
    return l[20]  # always out of bounds: l has at most 9 elements

f()
A summary of the old error message looks like:
# … Python stack trace of the function call …
InvalidArgumentError: slice index 20 of dimension 0 out of bounds.
[[node strided_slice (defined at <ipython-input-8-250c76a76c0e>:5) ]] [Op:__inference_f_75]
Errors may have originated from an input operation.
Input Source operations connected to node strided_slice:
range (defined at <ipython-input-8-250c76a76c0e>:4)
Function call stack:
f
A summary of the new error message looks like:
# … Python stack trace of the function call …
InvalidArgumentError: slice index 20 of dimension 0 out of bounds.
[[node strided_slice
(defined at <ipython-input-3-250c76a76c0e>:5)
]] [Op:__inference_f_15]
Errors may have originated from an input operation.
Input Source operations connected to node strided_slice:
In[0] range (defined at <ipython-input-3-250c76a76c0e>:4)
In[1] strided_slice/stack:
In[2] strided_slice/stack_1:
In[3] strided_slice/stack_2:
Operation defined at: (most recent call last)
# … Stack trace of the error within the function …
>>> File "<ipython-input-3-250c76a76c0e>", line 7, in <module>
>>> f()
>>>
>>> File "<ipython-input-3-250c76a76c0e>", line 5, in f
>>> return l[20]
>>>
The main difference is that runtime errors raised while executing a tf.function now include a stack trace that shows the source of the error in the user’s code:
# … Original error message and information …
# … More stack frames …
>>> File "<ipython-input-3-250c76a76c0e>", line 7, in <module>
>>> f()
>>>
>>> File "<ipython-input-3-250c76a76c0e>", line 5, in f
>>> return l[20]
>>>
For the “Graph” tensor error messages caused by the following user code:
x = None

@tf.function
def leaky_function(a):
    global x
    x = a + 1  # Bad - leaks local tensor
    return a + 2

@tf.function
def captures_leaked_tensor(b):
    b += x
    return b

leaky_function(tf.constant(1))
captures_leaked_tensor(tf.constant(2))
A summary of the old error message looks like:
# … Python stack trace of the function call …
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: add:0
A summary of the new error message looks like:
# … Python stack trace of the function call …
TypeError: Originated from a graph execution error.
The graph execution error is detected at a node built at (most recent call last):
# … Stack trace of the error within the function …
>>> File <ipython-input-5-95ca3a98778f>, line 6, in leaky_function
# … More stack trace of the error within the function …
Error detected in node 'add' defined at: File "<ipython-input-5-95ca3a98778f>", line 6, in leaky_function
TypeError: tf.Graph captured an external symbolic tensor. The symbolic tensor 'add:0' created by node 'add' is captured by the tf.Graph being executed as an input. But a tf.Graph is not allowed to take symbolic tensors from another graph as its inputs. Make sure all captured inputs of the executing tf.Graph are not symbolic tensors. Use return values, explicit Python locals or TensorFlow collections to access it. Please see https://www.tensorflow.org/guide/function#all_outputs_of_a_tffunction_must_be_return_values for more information.
The main difference is that errors raised when attempting to capture a tensor leaked from an unreachable graph now include a stack trace that shows where the tensor was created in the user’s code:
# … Original error message and information …
# … More stack frames …
>>> File <ipython-input-5-95ca3a98778f>, line 6, in leaky_function
Error detected in node 'add' defined at: File "<ipython-input-5-95ca3a98778f>", line 6, in leaky_function
TypeError: tf.Graph captured an external symbolic tensor. The symbolic tensor 'add:0' created by node 'add' is captured by the tf.Graph being executed as an input. But a tf.Graph is not allowed to take symbolic tensors from another graph as its inputs. Make sure all captured inputs of the executing tf.Graph are not symbolic tensors. Use return values, explicit Python locals or TensorFlow collections to access it. Please see https://www.tensorflow.org/guide/function#all_outputs_of_a_tffunction_must_be_return_values for more information.
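The new message suggests using return values instead of leaking the tensor. One possible rewrite of the example is sketched below. The function names non_leaky_function and uses_returned_tensor are hypothetical replacements for the originals, and this is only one of several ways to restructure the code.

```python
import tensorflow as tf


@tf.function
def non_leaky_function(a):
    x = a + 1
    # Return the intermediate tensor instead of assigning it to a global.
    return a + 2, x


@tf.function
def uses_returned_tensor(b, x):
    # x arrives as an ordinary argument, so nothing is captured from another graph.
    return b + x


_, x = non_leaky_function(tf.constant(1))
result = uses_returned_tensor(tf.constant(2), x)
print(result.numpy())
```

Because x leaves the first function as a return value, it is a concrete eager tensor by the time it reaches the second function.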
User-defined types can make your projects more readable, modular, and maintainable. TensorFlow 2.7.0 introduces the ExtensionType API, which can be used to create user-defined object-oriented types that work seamlessly with TensorFlow's APIs. Extension types are a great way to track and organize the tensors used by complex models. Extension types can also be used to define new tensor-like types, which specialize or extend the basic concept of "Tensor." To create an extension type, simply define a Python class with tf.experimental.ExtensionType as its base, and use type annotations to specify the type for each field:
import typing
import tensorflow as tf

class TensorGraph(tf.experimental.ExtensionType):
    """A collection of labeled nodes connected by weighted edges."""
    edge_weights: tf.Tensor  # shape=[num_nodes, num_nodes]
    node_labels: typing.Mapping[str, tf.Tensor]  # shape=[num_nodes]; dtype=any

class MaskedTensor(tf.experimental.ExtensionType):
    """A tensor paired with a boolean mask, indicating which values are valid."""
    values: tf.Tensor
    mask: tf.Tensor  # shape=values.shape; false for missing/invalid values.

class CSRSparseMatrix(tf.experimental.ExtensionType):
    """Compressed sparse row matrix (https://en.wikipedia.org/wiki/Sparse_matrix)."""
    values: tf.Tensor  # shape=[num_nonzero]; dtype=any
    col_index: tf.Tensor  # shape=[num_nonzero]; dtype=int64
    row_index: tf.Tensor  # shape=[num_rows+1]; dtype=int64
The ExtensionType base class adds a constructor and special methods based on the field type annotations (similar to typing.NamedTuple and @dataclasses.dataclass from the standard Python library). You can optionally customize the type by overriding these defaults, or adding new methods, properties, or subclasses.
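As a rough sketch of the generated constructor and of adding a custom method, the snippet below rebuilds the MaskedTensor type from above and gives it a with_default method. The method name and the sample values are chosen here for illustration.

```python
import tensorflow as tf


class MaskedTensor(tf.experimental.ExtensionType):
    """A tensor paired with a boolean mask, indicating which values are valid."""
    values: tf.Tensor
    mask: tf.Tensor

    def with_default(self, default):
        # Custom method: replace masked-out entries with a default value.
        return tf.where(self.mask, self.values, default)


# The constructor is generated from the field annotations.
mt = MaskedTensor(
    values=tf.constant([[1, 2], [3, 4]]),
    mask=tf.constant([[True, False], [False, True]]),
)
filled = mt.with_default(0)
print(filled.numpy())
```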
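Extension types are supported by the following TensorFlow APIs:
- Keras: as inputs and outputs for Models and Layers.
- Dataset: included in Datasets, and returned by dataset Iterators.
- TensorFlow Hub: as inputs and outputs for tf.hub modules.
- SavedModel: as inputs and outputs for SavedModel functions.
- tf.function: as arguments and return values for functions wrapped with the @tf.function decorator.
- Control flow: in control flow operations such as tf.cond and tf.while_loop. This includes control flow operations added by autograph.
- tf.py_function: as arguments and return values for the func argument to tf.py_function.
- Tensor ops: extension types can be extended to support most TensorFlow ops that accept Tensor inputs (such as tf.matmul, tf.gather, and tf.reduce_sum), using dispatch decorators.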
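To illustrate the @tf.function entry in this list, the sketch below passes an extension type into traced functions and returns one from them. It assumes TensorFlow 2.7 or later; the function names masked_sum and double are hypothetical.

```python
import tensorflow as tf


class MaskedTensor(tf.experimental.ExtensionType):
    values: tf.Tensor
    mask: tf.Tensor


@tf.function
def masked_sum(mt):
    # Extension type as an argument: sum only the valid entries.
    return tf.reduce_sum(tf.where(mt.mask, mt.values, 0.0))


@tf.function
def double(mt):
    # Extension type as a return value.
    return MaskedTensor(values=mt.values * 2.0, mask=mt.mask)


mt = MaskedTensor(
    values=tf.constant([1.0, 2.0, 3.0]),
    mask=tf.constant([True, False, True]),
)
total = masked_sum(mt)
doubled = double(mt)
print(total.numpy(), doubled.values.numpy())
```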
For more information about extension types, see the Extension Type guide.
Note: The tf.experimental prefix indicates that this is a new API, and we would like to collect feedback from real-world usage; barring any unforeseen design issues, we plan to migrate ExtensionType out of the experimental package in accordance with the TF experimental policy.
To support users interested in migrating their workloads from TF1 to TF2, we have created a new Migrate to TF2 tab on the TensorFlow website, which includes updated guides and completely new documentation with concrete, runnable examples in Colab.
A new shim tool has been added that dramatically eases migration of variable_scope-based models to TF2. It is expected to enable most TF1 users to run existing model architectures as-is (or with only minor adjustments) in TF2 pipelines without having to rewrite their modeling code. You can learn more about it in the model mapping guide.
Since the last TensorFlow release, the community has come together to make many new models available on TensorFlow Hub. Now you can find models like MLP-Mixer, Vision Transformers, Wav2Vec2, RoBERTa, ConvMixer, DistilBERT, YOLOv5 and many more. All of these models are ready to use via TensorFlow Hub. You can learn more about publishing your models here.
Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum. Thank you!