https://blog.tensorflow.org/2017/11/interactive-supervision-with-tensorboard.html

November 25, 2017

*Guest post by Francois Luus from IBM Research AI*

Originally published at www.ibm.com

**"Rather than spending a month figuring out an unsupervised machine learning problem, just label some data for a week and train a classifier"**

— Richard Socher (Chief Data Scientist, Salesforce) in 2017.

Interactive supervision with TensorBoard

TensorBoard Projector showing labeled t-SNE

The TensorBoard projector features t-distributed Stochastic Neighbor Embedding (t-SNE) for visualizing high-dimensional datasets, since it is a well-balanced dimensionality reduction algorithm that requires no labels yet reveals latent structure in many types of data. What happens when t-SNE can use partial labeling to recreate pairwise similarities in a lower-dimensional embedding?

IBM Research AI implemented semi-supervision in TensorBoard t-SNE and contributed components required for interactive supervision to demonstrate cognitive-assisted labeling. A metadata editor, distance metric/space selection, neighborhood function selection, and t-SNE perturbation were added to TensorBoard in addition to semi-supervision for t-SNE. These components function in concert to apply a partial labeling that informs semi-supervised t-SNE to clarify the embedding and progressively ease the labeling burden.

Semi-supervised t-SNE (repeatedly turning supervision on/off)
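One way partial labels could inform t-SNE is to reweight the input-space pairwise similarities before optimization: strengthen pairs that share a label, weaken pairs with different labels, leave pairs involving an unlabeled sample untouched, and renormalize. The sketch below illustrates that idea only; it is not taken from the TensorBoard implementation, and the function name `apply_label_prior` and the `boost` and `damp` factors are hypothetical.

```python
from typing import List, Optional


def apply_label_prior(
    p: List[List[float]],
    labels: List[Optional[str]],
    boost: float = 2.0,  # hypothetical factor for same-label pairs
    damp: float = 0.5,   # hypothetical factor for different-label pairs
) -> List[List[float]]:
    """Reweight a symmetric similarity matrix using partial labels.

    Pairs with at least one unlabeled sample (label None) keep their weight.
    The result is renormalized so that all entries sum to 1, as t-SNE
    expects of its joint probabilities.
    """
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # t-SNE defines the self-similarity as zero
            weight = 1.0
            if labels[i] is not None and labels[j] is not None:
                weight = boost if labels[i] == labels[j] else damp
            q[i][j] = p[i][j] * weight
    total = sum(sum(row) for row in q)
    if total <= 0.0:
        raise ValueError("similarity matrix has no positive off-diagonal mass")
    return [[value / total for value in row] for row in q]


# Three samples: 0 and 1 share label "a", sample 2 is unlabeled.
p = [[0.0, 0.1, 0.2],
     [0.1, 0.0, 0.2],
     [0.2, 0.2, 0.0]]
adjusted = apply_label_prior(p, ["a", "a", None])
```

In this toy case the labeled pair (0, 1) gains probability mass relative to the pairs that involve the unlabeled sample, so the attractive force between the two same-label points grows during optimization.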

Imposing additional constraints by supervising t-SNE could make it harder to escape local optima, which is required, for example, to join two separated same-label clusters, especially when the Barnes-Hut approximation localizes attractive forces. Labeling also becomes harder when same-label clusters collapse, so a method is required to kick the embedding out of its local optimum.

t-SNE perturb with random walks
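The source does not spell out how the perturbation is computed. As a rough illustration, the sketch below assumes each embedded point takes an independent Gaussian random walk in 2-D before optimization resumes; `perturb_random_walk`, `steps`, and `step_size` are hypothetical names and parameters.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def perturb_random_walk(
    embedding: List[Point],
    steps: int = 10,          # hypothetical number of walk steps per point
    step_size: float = 0.1,   # hypothetical standard deviation of one step
    seed: Optional[int] = None,
) -> List[Point]:
    """Displace every 2-D point by an independent Gaussian random walk."""
    rng = random.Random(seed)
    perturbed = []
    for x, y in embedding:
        for _ in range(steps):
            x += rng.gauss(0.0, step_size)
            y += rng.gauss(0.0, step_size)
        perturbed.append((x, y))
    return perturbed


embedding = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.05)]
shaken = perturb_random_walk(embedding, steps=20, step_size=0.2, seed=42)
```

A larger `step_size` or more `steps` moves points further from the current optimum, at the cost of disturbing structure that was already well organized.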

Previously, neighborhoods could only be selected with cosine and Euclidean metrics in the high-dimensional input space. These distance metrics can now also be applied in the PCA and t-SNE embedding spaces, which is required for multi-sample labeling in the semi-supervised setting.

New options for distance metric/space and neighborhood selection functions
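Selecting a neighborhood then comes down to two choices: which space the coordinates come from and which metric ranks the candidates. The following is a minimal sketch of that idea, not the TensorBoard code; the `spaces` dictionary, its keys, and `nearest_neighbors` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.dist(a, b)  # requires Python 3.8+


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "cosine": cosine_distance,
}


def nearest_neighbors(
    spaces: Dict[str, List[Vector]], space: str, metric: str, query: int, k: int
) -> List[int]:
    """Return the indices of the k samples closest to `query`.

    `spaces` maps a space name to per-sample coordinates in that space.
    """
    points = spaces[space]
    dist = METRICS[metric]
    others = [i for i in range(len(points)) if i != query]
    others.sort(key=lambda i: dist(points[query], points[i]))
    return others[:k]


spaces = {
    "input": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "tsne": [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [3.0, 3.0]],
}
by_input = nearest_neighbors(spaces, "input", "cosine", query=0, k=1)
by_tsne = nearest_neighbors(spaces, "tsne", "euclidean", query=0, k=1)
```

The same query sample gets a different neighbor in each space, which is the point of exposing the choice: once supervision has pulled same-label samples together in the embedding, selecting neighbors there picks up samples that the input-space metric would miss.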

Labeling datasets is normally a very time-consuming, unenviable task, but one that usually cannot be avoided. Labeling facilitates the use of supervised machine learning, but why not use machine learning to facilitate labeling with minimal supervision? Of course, transfer learning, zero-shot or one-shot learning could be used to circumvent the need for labels altogether, but these rely on assumptions that typically do not hold for real-world data.

Provided labels can also be used explicitly to train a feature extractor and classifier that makes increasingly confident label recommendations. Note, however, that t-SNE already presents the user with an initial view that is amenable to clustering, and that its single global objective function is harnessed to help solve the minimum supervision problem in an elegant and self-contained manner, in keeping with the philosophy of simplicity.

EMNIST Letters is a 26-class dataset with 411,302 samples, on which an OPIUM-based classifier achieves 85.15% accuracy [3]; we use only about 2000 stratified samples for the labeling exercise. This is a good dataset to demonstrate labeling on, as the sample images are small, familiar and easily distinguishable by the human eye. The bottleneck thus becomes the labeling system, and the challenge is to learn as much as possible from every human click or keypress, so that the fewest interactions are needed to obtain a decent labeled sample size for every class.

Cognitive-assisted labeling of EMNIST Letters using interactive supervision in TensorBoard.
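The post does not say how the roughly 2000 stratified samples were drawn. A minimal sketch of proportional stratified sampling, assuming ground-truth class labels are available for the full dataset, could look like this; `stratified_sample` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_sample(
    labels: Sequence[Hashable], n_total: int, seed: int = 0
) -> List[int]:
    """Pick about n_total indices, keeping each class's share of the data."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen: List[int] = []
    for label in sorted(by_class, key=str):
        members = by_class[label]
        # Proportional quota, with at least one sample per class.
        quota = max(1, round(n_total * len(members) / len(labels)))
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return sorted(chosen)


# Toy dataset with three imbalanced classes.
labels = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
subset = stratified_sample(labels, n_total=100)
```

Because quotas are rounded per class, the returned count can differ slightly from `n_total` when class proportions do not divide evenly.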

We represent signals as small square images of spectrograms, i.e. time-vs-frequency plots that reveal the frequency content and possible nature of each signal. Once signals can be visualized this way, TensorBoard interactive labeling can be used to good effect: sample similarity is easy to see, which makes it easy to delineate good clusters.

Quickly labeling some messy real-world data
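To make the spectrogram representation concrete, here is a small dependency-free sketch that turns a 1-D signal into a time-vs-frequency magnitude grid with a short-time discrete Fourier transform. The window and hop sizes are arbitrary choices for illustration, no tapering window is applied, and this is not the preprocessing used in the post.

```python
import cmath
import math
from typing import List, Sequence


def spectrogram(
    signal: Sequence[float], window: int = 8, hop: int = 4
) -> List[List[float]]:
    """Return one row of DFT magnitudes per time frame.

    Each row covers frequency bins 0..window//2 of a rectangular-window
    DFT taken over `window` consecutive samples.
    """
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = signal[start:start + window]
        row = []
        for k in range(window // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / window)
                for n, x in enumerate(chunk)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames


# A pure tone that completes two cycles per 8-sample window.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(tone, window=8, hop=4)
peak_bins = [row.index(max(row)) for row in spec]
```

For this tone every frame peaks in frequency bin 2. Rendering such a grid as a small grayscale image gives the kind of per-sample thumbnail that can be inspected and compared visually.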

Remaining unlabeled samples can be explored as possible anomalies, which may require follow-up measurements. You will notice some strange-looking signals in the latter part of the video.

- An initial cluster-like view is presented that makes it easy to pick homogeneous clusters for labeling.
- With every labeling operation, more samples are compacted into labeled clusters, which organizes the representation so that the remaining unlabeled samples are much easier to see and reach. As the curse of dimensionality is solved here, embedding space comes at a premium and has to be recovered at all costs.
- After sufficient labeling, the remaining unlabeled samples are likely outliers, which can be explored in terms of content and context in relation to common classes.
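The last point can be illustrated with a simple heuristic: rank the remaining unlabeled samples by their distance to the nearest labeled-class centroid in the 2-D embedding, and inspect the farthest ones first. This is an assumption made for illustration, not a feature described in the post, and `rank_outliers` is a hypothetical helper.

```python
import math
from collections import defaultdict
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def rank_outliers(embedding: List[Point], labels: List[Optional[str]]) -> List[int]:
    """Return unlabeled sample indices, farthest from any class centroid first."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(embedding, labels):
        if label is not None:
            acc = sums[label]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    if not sums:
        raise ValueError("at least one labeled sample is required")
    centroids = [(sx / n, sy / n) for sx, sy, n in sums.values()]

    scored = []
    for index, (point, label) in enumerate(zip(embedding, labels)):
        if label is None:
            nearest = min(math.dist(point, c) for c in centroids)
            scored.append((nearest, index))
    scored.sort(reverse=True)
    return [index for _, index in scored]


embedding = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1), (10.0, -3.0)]
labels = ["a", "a", "b", "b", None, None]
candidates = rank_outliers(embedding, labels)
```

Here sample 5 sits far from both labeled clusters and is ranked ahead of sample 4, which lies next to class "a" and is more likely just an unlabeled member of that class.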

[2] Zhirong Yang, Jaakko Peltonen, and Samuel Kaski. “Optimization equivalence of divergences improves neighbor embedding.” International Conference on Machine Learning. 2014.

[3] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. “EMNIST: an extension of MNIST to handwritten letters.”
