Unfolding the Universe using TensorFlow
dezembro 01, 2022

A guest post by Roberta Duarte, IAG/USP

Astronomy is the science of trying to answer the Universe’s biggest mysteries. How did the Universe begin? How will it end? What is a black hole? What are galaxies and how did they form? Is life a common piece in the Universe’s puzzle? There are so many questions without answers. Machine learning can be a vital tool to answer those questions and help us unfold the Universe.

Astronomy is one of the oldest sciences. The reason is simple: we just have to look at the sky and start questioning what we are seeing. It is what astronomers have been doing for centuries. Galileo discovered a series of celestial objects after he observed the sky through the lenses of his new invention: the telescope. A few years later, Isaac Newton used Galileo's contributions to find the Law of Universal Gravitation. With Newton’s results, we could not only understand better how the Sun affects Earth and other planets but also why we are trapped on Earth’s surface. Centuries later, Edwin Hubble found that galaxies are moving away from us and that further galaxies are moving faster than closer ones. Hubble’s findings showed that the Universe is expanding and is accelerated. These are a few examples of how studying the sky can give us some answers about the universe.

What all of them have in common is that they record data obtained from observations. The data can be a star’s luminosity, planets’ positions, or even galaxies’ distances. With technology improving the observations, more data is available to help us understand the Universe around us. Recently, the most advanced telescope, James Webb Space Telescope (JWST), was launched to study the early Universe in infrared. JWST is expected to transmit 57.2 gigabytes per day of data containing information about early galaxies, exoplanets, and the Universe’s structure.

While this is excellent news for astronomers, it also comes with a high cost. A high computational cost. In 2020, Nature published an article about big data and how Astronomy is now in an era of big data. JWST is one of the examples of how those powerful telescopes are producing huge amounts of data every day. Vera Rubin Observatory is expected to collect 20 terabytes per night. Large Arrays collect petabytes of data every year, and next-generation Large Arrays will collect hundreds of petabytes per year. In 2019, several Astro White Papers were published with the goals and obstacles in the Astronomy field predicted for the 2020s. They outlined how Astronomy needs to change in order to be prepared for the huge volume of data expected during the 2020s. New methods are required since the traditional cannot deal with the expressive number. We see problems showing up when talking about storage, software, and processing.

The storage problem may have a solution in cloud computing, eg. GCP, as noted by Nature. However, processing does not have a simple solution. The methods used to process and analyze the data need to change. It is important to note that Astronomy is a science based on finding patterns. Stars with the same redshift - an estimation of the distance of stars in space relative to us by measuring the shift of the star's light waves towards higher frequencies - and similar composition can be considered candidates for the same population. Galaxies with the same morphology and activity or spectrum originating in the nucleus usually show the presence of black holes with similar behavior. We can even calculate the Universe’s expansion rate by studying the pattern in the spectra of different Type I Supernovae. And, what is the best tool we have to learn patterns in a lot of data? Machine Learning.

Machine learning is a tool that Astronomy can use to deal with the computational problems cited above. A data-driven approach offered by machine learning techniques may help to get analysis and results faster than traditional methods such as numerical simulations or MCMC - a statistical method of sampling from a probability distribution. In the past few years, we are seeing an interesting increase in the interaction between Astronomy and machine learning. To quantify, the keyword machine learning presented in Astronomy’s papers increased four times from 2015 to 2020 while deep learning increased 3 times each year. More specifically, machine learning was widely used to classify celestial objects and to predict spectra from given properties. Today, we see a large range of applications since discovering exoplanets, simulations of the Universe’s cosmic web, and searching for gravitational waves.

Since machine learning offers a data-driven approach, it can accelerate scientific research in the field. An interesting example is the research around black holes. Black holes have been a hot topic for the past few years with amazing results and pictures from the Event Horizon Telescope (EHT). To understand a black hole, we need the help of computational tools. A black hole is a region of spacetime extremely curved that nothing, not even light, can escape. When matter gets trapped around its gravitational field, the matter will create a disk called accretion disk. The accretion disk dynamics are chaotic and turbulent. To understand the accretion disk physics, we need to simulate complex fluid equations. 

A common method to solve this and gain insight into black hole physics is to use numerical simulations. The environment around a black hole can be described using a set of conservative equations - usually, mass conservation, energy conservation, and angular momentum conservation. The set of equations can be solved using numerical and mathematical methods that iteratively solve each parameter for each time. The result is a set of dumps - or frames - with information about density, pressure, velocity field, and magnetic field for each (x, y, t) in the 2D case or (x, y, z, t) in the 3D case. However, numerical simulations are very time-consuming. A simple hydrodynamical treatment around a black hole can go up to 7 days running on 400 CPU cores.

If you start adding complexity, such as electromagnetism equations to understand the magnetic fields around a black hole and general relativity equations to realistically explain the space-time there, the time can increase significantly. We are slowly reaching a barrier in black hole physics due to computational limitations where it is becoming harder and harder to realistically simulate a black hole.

Black hole research

That is where my advisor, Rodrigo Nemmen, and I started to think about a new method to accelerate black hole physics. In other words, a new method that could accelerate the numerical simulations we needed to study these extreme objects. From the beginning machine learning seems like the method with the best perspective for us. We had the data to feed into a machine learning algorithm and there were successful cases in the literature simulating fluids using machine learning. But never around a black hole. It was worth giving it a shot. We began a collaboration with João Navarro from Nvidia Brazil and then we started solving the problem. Carefully, we chose an architecture that we would be based on while building our own scheme. Since we wanted a data-driven approach, we decided to go with supervised learning, more specifically, we decided to use deep learning linked with the great performance of convolutional neural networks.

How we built it

Everything was built using TensorFlow and Keras. We started using TensorFlow 1 since it was the version available at the time. Back then, Keras was not added to TensorFlow yet but funny enough, during that time I attended the TensorFlow Roadshow 2019 in São Paulo, Brazil. It was during that event that I found out about TensorFlow and Keras joining forces in TensorFlow version 2 to create the powerful framework. I even took a picture of the announcement. Also, it was the first time I heard about the strategy scope implemented in TensorFlow 2, I did not know back then that I would be using the same function today.
It needed weeks to deal with the data and to know the best way to prepare before we could feed them to ConvNets. The data described the density of a fluid around a black hole. In our case, we got the data from sub-fed black holes, in other words, black holes with low accretion rates. Back in 2019, the simulations we used were the longest simulations of this kind - 2D profiles using a hydrodynamical treatment. The process that we went through is described in Duarte et al. 2022. We trained our ConvNet with 2D spatial + 1D temporal dimensions. A cluster with two GPUs (NVIDIA G100 and NVIDIA P6000) was our main hardware to train our neural network.
After a few hours of training, our model was ready to simulate black holes. First, we tested the capacity by testing how much the model can learn the rest of the learned simulation. The video shows the target and prediction for a case that we called a direct case: we feed a simulation frame to the model as input and we analyze how well the model can predict the next step.
But we also want to see how much of the Physics the model could learn by only looking at some simulations. We test the model capacity to simulate a never-seen system. During the training process, we hid a simulation from the model. After the training, we input the initial conditions and a single frame so we could test how the model would perform while simulating by itself. The results are great news: the model can simulate a system by only learning Physics from other ones. And the news gets better: you have a 32000x speed-up compared to traditional methods.

Just out of curiosity, we tested a direct prediction from a system where the accretion flow around the black hole has high variability. It’s a really beautiful result to see how the model could follow the turbulent behavior of the accretion flow.

If you are interested in more details and results, they are available at Duarte et al. 2022.

This work demonstrates the power of using deep learning techniques in Astronomy to speed up scientific research. All the work was done using only TensorFlow tools to preprocess, train and predict. How great is that?


As we discussed in this post, AI is already an essential part of Astronomy and we can expect that it will only continue to grow. We have already seen that Astronomy can achieve big wins with the help of AI. It is a field with a lot of data and patterns that are perfect to build and test AI tools using real-world data. There will come a day that AI will be discovering and unfolding the Universe and hopefully, this day is soon!

Next post
Unfolding the Universe using TensorFlow

A guest post by Roberta Duarte, IAG/USP Astronomy is the science of trying to answer the Universe’s biggest mysteries. How did the Universe begin? How will it end? What is a black hole? What are galaxies and how did they form? Is life a common piece in the Universe’s puzzle? There are so many questions without answers. Machine learning can be a vital tool to answer those questions and help us un…