https://blog.tensorflow.org/2020/05/galaxy-zoo-classifying-galaxies-with-crowdsourcing-and-active-learning.html

Community

https://2.bp.blogspot.com/-Wl0XidM7kzg/XsVtiiXMp5I/AAAAAAAADGw/-HJ71StSypctOyt1RUzRJ2fRUCZY_EXCACLcBGAsYHQ/s1600/galaxyUI.png

May 21, 2020 —
*A guest article by Mike Walmsley, University of Oxford*

The way we do science is changing; there’s exponentially more data every day but around the same number of scientists. The traditional approach of collecting data samples, looking through them, and drawing some conclusions about each one is often inadequate.

One solution is to deploy algorithms to process the data automatically. Another solut…

Galaxy Zoo: Classifying Galaxies with Crowdsourcing and Active Learning

The way we do science is changing; there’s exponentially more data every day but around the same number of scientists. The traditional approach of collecting data samples, looking through them, and drawing some conclusions about each one is often inadequate.

One solution is to deploy algorithms to process the data automatically. Another solution is to deploy more eyeballs: recruit members of the public to join in and help. I work on the intersection between the two - combining crowdsourcing and machine learning to do better science than with either alone.

In this article, I want to share how I’ve been using crowdsourcing and machine learning to investigate how galaxies evolve by classifying millions of galaxy images. Along the way, I’ll share some techniques we use to train CNNs that make predictions with uncertainty. I’ll also explain how to use those predictions to do active learning: labelling only the data which would best help you improve your models.

Galaxy Zoo is a website that asks members of the public to classify galaxies for us. We show you a galaxy, and we ask simple questions about what you can see, like – is the galaxy smooth, or featured? As you answer, we lead you down a decision tree where the questions depend on how you’ve previously responded.

The Galaxy Zoo UI. Check it out, and join in with the science, here. |

To keep up, we used TensorFlow to build a galaxy classifier. Other researchers have used the responses we’ve collected to train convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. However, traditional CNNs have a drawback; they don’t easily handle uncertainty.

Training a CNN to solve a regression problem by predicting a value for each label and minimising the mean squared error, as is common, implicitly assumes that all labels are equally uncertain – which is definitely not the case for Galaxy Zoo. Further, the CNN only gives a ‘best guess’ answer with no error bars – making it difficult to draw scientific conclusions.

In our paper, we use Bayesian CNNs for morphology classification. Bayesian CNNs provide two key improvements:

- They account for varying uncertainty when learning from volunteer responses.

- They predict full posteriors over the morphology of each galaxy.

Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If you are equally confident in all your collected labels, you can just minimise the difference (e.g. mean squared error) between your predictions and the observed values. However for Galaxy Zoo, many labels are more confident than others.

If I observe that, for some galaxy, 30% of volunteers say “bar”, my confidence in that 30% depends heavily on how many people replied – was it 4 or 40? Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied.

This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.

In our case, we can model our surprise with the Binomial distribution by recognising that k “Bar” responses from N volunteers is much like k successes from N independent trials.

`loss = tf.reduce_mean(binomial_loss(labels, scalar_predictions))`

Where `binomial_loss` calculates the surprise (negative log likelihood) of the observed labels given our model predictions:
In TF, we can calculate this with:
```
def binomial_loss(observations, est_prob_success):
one = tf.constant(1., dtype=tf.float32)
# to avoid calculating log 0
epsilon = tf.keras.backend.epsilon()
# multiplication in tf requires floats
k_successes = tf.cast(observations[:, 0], tf.float32)
n_trials = tf.cast(observations[:, 1], tf.float32)
# binomial negative log likelihood, dropping (fixed) combinatorial terms
return -( k_successes * tf.log(est_prob_success + epsilon) + (n_trials - k_successes) * tf.log(one - est_prob_success + epsilon )
```

Our model now makes probabilistic predictions, but what if we had trained a different model? It would make slightly different probabilistic predictions. To be Bayesian, we need to

At train time, dropout reduces overfitting by “approximately combining exponentially many different neural network architectures efficiently” (Srivastava 2014). This approximates the Bayesian approach of treating the network weights as random variables to be marginalised over. By also applying dropout at test time, we can exploit this idea of approximating many models to also make Bayesian predictions (Gal 2016).

Here’s a TF 2.0 example using the Subclassing API:

```
from tensorflow.keras import layers, Model
class SimpleClassifier(Model):
def __init__(self):
super(SimpleClassifier, self).__init__()
self.conv1 = layers.Conv2D(32, 3, activation='relu')
self.flatten = layers.Flatten()
self.d1 = layers.Dense(128, activation='relu')
self.dropout1 = layers.Dropout(rate=0.5)
self.d2 = layers.Dense(2, activation='softmax')
def call(self, x, training):
x = self.conv1(x)
x = self.flatten(x)
x = self.d1(x)
if training: # dropout typically applied only at train time
x = self.dropout1(x)
return self.d2(x)
```

Switching on test-time dropout actually involves less code:
```
def call(self, x): # no ‘training’ argument required
x = self.conv1(x)
x = self.flatten(x)
x = self.d1(x)
x = self.dropout1(x) # dropout always on
return self.d2(x)
```

Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would answer “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian posterior.
the paper (and check out the code).
Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask, “Hey, these galaxies would be really helpful to learn from; can you label them for me please?” Then the humans would label them and the model would retrain – this is active learning. In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by up to 35-60%.

We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.

Why? We often hold our strongest opinions where we are least informed - and so do our CNN (Hendrycks 2016). Without a basis in evidence, different CNN will often disagree confidently.

Formally, informative galaxies are galaxies where each model is confident (entropy H in the posterior from each model, *p(votes|weights),* is low) but the average prediction over all the models is uncertain (entropy across all averaged posteriors is high). This is only possible because we think about labels probabilistically and approximate training many models. For more, see Houlsby, N. (2014) and Gal 2017, or our code for an implementation.

What galaxies are informative? Exactly the galaxies you would intuitively expect.

- The model strongly prefers diverse featured galaxies over ellipticals (smooth ‘blobs’).

- For identifying bars, the model prefers galaxies which are better resolved (lower redshift).

Our active learning system selects galaxies on the left (featured and diverse) over those on the right (smooth ‘blobs’). |

If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal, @OATML_Oxford).

Cheers,

Mike

Next post

Community

Galaxy Zoo: Classifying Galaxies with Crowdsourcing and Active Learning

May 21, 2020
—
*A guest article by Mike Walmsley, University of Oxford*

The way we do science is changing; there’s exponentially more data every day but around the same number of scientists. The traditional approach of collecting data samples, looking through them, and drawing some conclusions about each one is often inadequate.

One solution is to deploy algorithms to process the data automatically. Another solut…

Build, deploy, and experiment easily with TensorFlow