December 12, 2018 —
Posted by Laurence Moroney
In many scenarios, data doesn’t come evenly divided into uniformly-shaped arrays that can be loaded into tensors. A classic case is training and processing text. For example, if you look at the Text Classification tutorial that uses the IMDB dataset, you’ll see that a major part of the data preparation is shaping your data to a normalized size: every review has to be padded or truncated to the same fixed length so that the whole batch fits in a rectangular tensor.
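As a minimal sketch of that padding step, here's what it looks like with the Keras preprocessing utilities (the review data below is made up for illustration):

import tensorflow as tf

# Hypothetical reviews, already encoded as word-id sequences of varying length.
reviews = [[17, 23, 5], [42, 7, 19, 3, 11, 2], [8, 4]]

# Force every review to length 4: short rows are zero-padded, long rows truncated.
padded = tf.keras.preprocessing.sequence.pad_sequences(
    reviews, maxlen=4, padding='post', truncating='post')
print(padded)
# [[17 23  5  0]
#  [42  7 19  3]
#  [ 8  4  0  0]]

Ragged tensors let you skip this workaround and represent the variable-length rows directly. For example, you can build one with tf.ragged.constant: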
speech = tf.ragged.constant(
    [['All', 'the', 'world', 'is', 'a', 'stage'],
     ['And', 'all', 'the', 'men', 'and', 'women', 'merely', 'players'],
     ['They', 'have', 'their', 'exits', 'and', 'their', 'entrances']])
print(speech)
<tf.RaggedTensor [['All', 'the', 'world', 'is', 'a', 'stage'], ['And', 'all', 'the', 'men', 'and', 'women', 'merely', 'players'], ['They', 'have', 'their', 'exits', 'and', 'their', 'entrances']]>

print(speech[0])
tf.Tensor(['All', 'the', 'world', 'is', 'a', 'stage'], shape=(6,), dtype=string)

The tf.ragged package also defines a number of operations that are specific to ragged tensors. For example, tf.ragged.map_flat_values can be used to efficiently transform the individual values in a ragged tensor while keeping its shape the same:
print(tf.ragged.map_flat_values(
    tf.strings.regex_replace, speech,
    pattern="([aeiouAEIOU])", rewrite=r"{\1}"))
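The result wraps each vowel in braces, with the ragged shape unchanged:

<tf.RaggedTensor [['{A}ll', 'th{e}', 'w{o}rld', '{i}s', '{a}', 'st{a}g{e}'], ['{A}nd', '{a}ll', 'th{e}', 'm{e}n', '{a}nd', 'w{o}m{e}n', 'm{e}r{e}ly', 'pl{a}y{e}rs'], ['Th{e}y', 'h{a}v{e}', 'th{e}{i}r', '{e}x{i}ts', '{a}nd', 'th{e}{i}r', '{e}ntr{a}nc{e}s']]>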
Many standard TensorFlow operations, such as tf.concat, also work on ragged tensors. Concatenating along axis=0 simply stacks the rows of the inputs; but when concatenating along axis=1, each individual row is joined to form a single row with the combined length:
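Here's a minimal sketch of that behavior, reusing the speech tensor from above (the questions tensor is invented for illustration):

questions = tf.ragged.constant(
    [['Where', 'is', 'the', 'stage?'],
     ['Who', 'are', 'the', 'players?'],
     ['When', 'do', 'they', 'exit?']])
# axis=1 joins each pair of corresponding rows end to end.
print(tf.concat([speech, questions], axis=1))

Each row of the result has the combined length of the corresponding input rows; the first row, for example, contains the six words of "All the world is a stage" followed by the four words of "Where is the stage?".

Here is a more complete example that puts these pieces together, using ragged tensors to build and combine unigram and bigram embeddings for a batch of variable-length queries: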
 
import math
import tensorflow as tf
tf.enable_eager_execution()

# Set up the embeddings.
num_buckets = 1024
embedding_size = 16
embedding_table = tf.Variable(
    tf.truncated_normal([num_buckets, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)),
    name="embedding_table")

# Input tensor: a batch of variable-length queries.
queries = tf.ragged.constant([
    ['Who', 'is', 'Dan', 'Smith'],
    ['Pause'],
    ['Will', 'it', 'rain', 'later', 'today']])

# Hash each word to a bucket id.
word_buckets = tf.strings.to_hash_bucket_fast(queries, num_buckets)

# Look up the embedding for each word.  map_flat_values applies an
# operation to each value in a RaggedTensor.
word_embeddings = tf.ragged.map_flat_values(
    tf.nn.embedding_lookup, embedding_table, word_buckets)        # ①

# Add markers to the beginning and end of each sentence.
marker = tf.fill([queries.nrows(), 1], '#')
padded = tf.concat([marker, queries, marker], axis=1)             # ②

# Build word bigrams & look up embeddings.
bigrams = tf.string_join(
    [padded[:, :-1], padded[:, 1:]], separator='+')               # ③
bigram_buckets = tf.strings.to_hash_bucket_fast(bigrams, num_buckets)
bigram_embeddings = tf.ragged.map_flat_values(
    tf.nn.embedding_lookup, embedding_table, bigram_buckets)      # ④

# Find the average embedding for each sentence.
all_embeddings = tf.concat(
    [word_embeddings, bigram_embeddings], axis=1)                 # ⑤
avg_embedding = tf.reduce_mean(all_embeddings, axis=1)            # ⑥

print(word_embeddings)
print(bigram_embeddings)
print(all_embeddings)
print(avg_embedding)
Ragged tensors are more efficient than padding out a uniform tf.Tensor, since no time or space is wasted on the padding values; and they are more flexible and convenient than a tf.SparseTensor, since they support a wide variety of operations with the correct semantics for variable-length lists.
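When you do need to hand data to an API that expects a uniform tensor, converting is straightforward. A minimal sketch, assuming a TensorFlow version that provides RaggedTensor.to_tensor and tf.RaggedTensor.from_tensor (TF 2.x does):

# Pad the ragged rows out to a rectangular tensor; short rows are filled with ''.
dense = speech.to_tensor(default_value='')
# Recover the ragged structure by treating '' as padding.
restored = tf.RaggedTensor.from_tensor(dense, padding='')
print(restored)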
 