https://blog.tensorflow.org/2020/08/layerwise-learning-for-quantum-neural-networks.html

Community
**·**
TF Quantum

https://3.bp.blogspot.com/-063GEqVQUdU/Xy25zCLCCLI/AAAAAAAADcI/8XGuxC0HOjAFR9VedOcD3DPw7vZLzU3tQCLcBGAsYHQ/s1600/QNN.png

August 10, 2020 —
*Posted by Andrea Skolik, Volkswagen AG and Leiden University*

In early March, Google released TensorFlow Quantum (TFQ) together with the University of Waterloo and Volkswagen AG. TensorFlow Quantum is a software framework for quantum machine learning (QML) which allows researchers to jointly use functionality from Cirq and TensorFlow. Both Cirq and TFQ are aimed at simulating noisy intermediate-sc…

Layerwise learning for Quantum Neural Networks

In early March, Google released TensorFlow Quantum (TFQ) together with the University of Waterloo and Volkswagen AG. TensorFlow Quantum is a software framework for quantum machine learning (QML) which allows researchers to jointly use functionality from Cirq and TensorFlow. Both Cirq and TFQ are aimed at simulating noisy intermediate-scale quantum (NISQ) devices that are currently available, but are still in an experimental stage and therefore come without error correction and suffer from noisy outputs.

In this article, we introduce a training strategy that addresses vanishing gradients in quantum neural networks (QNNs), and makes better use of the resources provided by a NISQ device. If you’d like to play with the code for this example yourself, check out the notebook on layerwise learning in the TFQ research repository, where we train a QNN on a simulated quantum computer!

Simplified QNN for a classification task with four qubits |

The figure above shows a simplified QNN for learning classification of MNIST digits.

First, we have to encode the data set into quantum states. We do this by using a data encoding layer, marked orange in the figure above. In this case, we transform our input data into a vector, and use the vector values as parameters

The last operation in the quantum circuit is a measurement. During computation, the quantum device performs operations on superpositions of classical bitstrings. When we perform a readout on the circuit, the superposition state collapses to one classical bitstring, which is the output of the computation that we get. The so-called collapse of the quantum state is probabilistic, to get a deterministic outcome we average over multiple measurement outcomes.

In the above picture, marked in green, we perform measurements on the third qubit and use these to predict labels for our MNIST examples. We compare this to the true data label and compute gradients of a loss function just like in a classical NN. These types of QNNs are called “hybrid quantum-classical”, as the parameter optimization is handled by a classical computer, using e.g. the Adam optimizer.

In short, barren plateaus occur when quantum circuits are initialized randomly - in the circuit illustrated above this means picking operations and their parameters at random. This is a fundamental problem for training parametrized quantum circuits, and gets worse as the number of qubits and the number of layers in a circuit grows, as we can see in the figure below.

Variance of gradients decays as a function of the number of qubits and layers in a random circuit |

Remember that for QNNs, outputs are estimated from taking the average over a number of measurements. The smaller the quantity we want to estimate, the more measurements we will need to get an accurate result. If these quantities are much smaller compared to the effects caused by measurement uncertainty or hardware noise, they can't be reliably determined and the circuit optimization will basically turn into a random walk.

To successfully train a QNN, we have to avoid random initialization of the parameters, and also have to stop the QNN from randomizing during training as its gradients get smaller, for example when it approaches a local minimum. For this, we can either limit the architecture of the QNN (e.g. by picking certain gate configurations, which requires tuning the architecture to the task at hand), or control the updates to parameters such that they won’t become random.

We designate a number of

This provides us with a good starting point in the optimization landscape to continue training larger contiguous sets of layers. As another hyperparameter, we define the percentage of layers we train together in the second phase of the algorithm. Here, we choose to split the circuit in half, and alternatingly train both parts, where the parameters of the inactive parts are always frozen. We call one training sequence where all partitions have been trained once a

Let’s compare our training strategy to CDL, which is one of the standard techniques used to train QNNs. To get a fair comparison, we use exactly the same circuit architecture as the one generated by the LL strategy before, but now update all parameters simultaneously in each step. To give CDL a chance to train, we optimize the parameters with zero instead of randomly. As we don’t have access to a real quantum computer yet, we simulate the probabilistic outputs of the QNN, and choose a relatively low value for the number of measurements that we use to estimate each prediction the QNN makes - which is 10 in this case. Assuming a 10kHZ sampling rate on a real quantum computer, we can estimate the experimental wall-clock time of our training runs as shown below:

Notably, the test error of CDL runs increases with the runtime, which might look like overfitting at first. However, each curve in this figure is averaged over many runs, and what is actually happening here is that more and more CDL runs randomize during training, unable to recover. In the legend we show that a much larger fraction of LL runs achieved a classification error on the test set lower than 0.5 compared to CDL, and also did it in less time.

In summary, layerwise learning increases the probability of successfully training a QNN with overall better generalization error in less training time, which is especially valuable on NISQ devices. For more details on the implementation and theory of layerwise learning, check out our recent paper!

If you’d like to learn more about quantum computing and quantum machine learning in general, there are some additional resources below:

- Quirk: a drag-and-drop quantum circuit simulator with nice visualizations

- Quantum Intuition on YouTube, which has many helpful video tutorials

- The TensorFlow Quantum whitepaper, for a deep dive into the theory behind QNNs

- The book Quantum Computing: An Applied Approach which covers theory as well as hands-on examples with code

Next post

Community
**·**
TF Quantum
**·**

Layerwise learning for Quantum Neural Networks

August 10, 2020
—
*Posted by Andrea Skolik, Volkswagen AG and Leiden University*

In early March, Google released TensorFlow Quantum (TFQ) together with the University of Waterloo and Volkswagen AG. TensorFlow Quantum is a software framework for quantum machine learning (QML) which allows researchers to jointly use functionality from Cirq and TensorFlow. Both Cirq and TFQ are aimed at simulating noisy intermediate-sc…

Build, deploy, and experiment easily with TensorFlow