How do you determine the measurable “things” that describe the nature of our universe? To answer that question, researchers used CosmoFlow, a deep learning application, running on a National Energy Research Scientific Computing Center supercomputer to analyze large, complex data sets from 3-D simulations of the distribution of matter. The team showed that CosmoFlow offers a new platform for gaining a deeper understanding of the universe.
Deep learning is a powerful way to uncover relationships between variables in complex data, and it has been widely developed and adopted by industry. Applying it to scientific data, however, presents unique challenges. For example, scientific data sets are often more complex (3-D and 4-D) and voluminous (terabytes to petabytes) than those found in commercial applications. By using CosmoFlow and similar applications to study the distribution of matter at the largest scales, scientists can learn more about the physics of the universe involving gravity, dark matter, and dark energy.
In cosmology, the study of dark matter and dark energy has taken on an increasingly important role as scientists strive to understand the origins of the universe and determine its future. Both are now the subject of several experiments surveying the sky in multiple wavelengths, yielding huge data sets that scientists must sift through for clues to the fundamental physics of the universe. This presents cosmologists with some unique big-data problems: for example, they typically characterize the distribution of dark matter using simplified statistical measures of the structure of matter, which limits their ability to exploit more complex and voluminous data sets.
To address this challenge, researchers and engineers worked together to apply deep learning methods to data-intensive science. Using a new, large-scale deep learning application called CosmoFlow to analyze cosmological simulations on the Cori supercomputer, the team was able to train CosmoFlow’s algorithms to recognize signatures of the differing parameters that were used to create those simulations. The scaling and performance improvements they demonstrated enabled them to achieve an unprecedented level of accuracy in their estimation of cosmological parameters.
To be relevant for scientific challenges and to enable fast exploration of ideas, deep learning frameworks must efficiently process multi-dimensional data at scale. Indeed, the success of deep learning for scientific problems will hinge on the ability to develop and optimize algorithms that can efficiently handle the complexity of scientific data. This research project used a new deep learning technique running on the National Energy Research Scientific Computing Center’s Cori supercomputer to analyze large sets of multi-dimensional data from 3-D simulations of the distribution of matter, addressing a pressing problem in cosmology: how to determine the parameters that describe the nature of the universe. The research team demonstrated that tools such as CosmoFlow could provide scientists with an entirely new platform for gaining a deeper understanding of the universe.
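The core idea of mapping a 3-D matter-distribution volume to a handful of cosmological parameters can be illustrated with a toy example. The sketch below is not CosmoFlow's actual architecture (which is a deep 3-D convolutional network built on TensorFlow); it is a minimal, assumed single-layer version in plain NumPy, showing how a strided 3-D convolution extracts spatial features from a density cube and a linear regression head maps those features to a small parameter vector. All sizes, weights, and the choice of three output parameters are illustrative assumptions.

```python
import numpy as np

def conv3d(volume, kernel, stride=2):
    """Valid strided 3-D convolution, single input/output channel."""
    k = kernel.shape[0]
    out_size = (volume.shape[0] - k) // stride + 1
    out = np.empty((out_size, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            for l in range(out_size):
                patch = volume[i*stride:i*stride + k,
                               j*stride:j*stride + k,
                               l*stride:l*stride + k]
                out[i, j, l] = np.sum(patch * kernel)  # dot product with the filter
    return out

rng = np.random.default_rng(0)
cube = rng.standard_normal((32, 32, 32))        # toy stand-in for a simulated density cube
kernel = rng.standard_normal((3, 3, 3)) * 0.1   # one learnable 3-D filter (random here)

# Convolution + ReLU gives a downsampled feature volume.
features = np.maximum(conv3d(cube, kernel), 0.0)

# Flatten and apply a linear regression head to estimate three
# cosmological parameters (hypothetical; weights are untrained).
flat = features.ravel()
W = rng.standard_normal((3, flat.size)) * 0.01
params = W @ flat
print(features.shape, params.shape)
```

In a real training setup, the filter and head weights would be fit by minimizing the error between predicted and true simulation parameters over many labeled cubes; this sketch only shows the forward data flow.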
National Energy Research Scientific Computing Center and Lawrence Berkeley National Laboratory
Funding provided through the Big Data Center, a collaboration among National Energy Research Scientific Computing Center (a Department of Energy Office of Science user facility), Intel, Cray, University of California at Berkeley, University of California at Davis, Oxford University, the University of Montreal, and the HDF Group.
A. Mathuriya, D. Bard, P. Mendygral, et al., “CosmoFlow: Using deep learning to learn the universe at scale.” In SC ’18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article 65. Dallas, Texas, November 11–16, 2018. [https://arxiv.org/pdf/1808.04728.pdf]