NASA has accumulated about 40 petabytes (PB) of Earth science data, which is about twice as much as all of the information stored by the Library of Congress.
Over the next five years, NASA’s data will grow to as much as 250 PB, more than six times what the agency holds now. That volume gives scientists and the public the extensive Earth science information they need for informed research and decision-making. But it also creates a slew of challenges, including how to store the data, how to get it into consistent and usable formats, and how to search massive data sets.
To help address these issues, NASA has funded 11 new projects as part of the agency’s Earth Science Data Systems’ Advancing Collaborative Connections for Earth Systems Science (ACCESS) program. Proposals submitted in 2019 and funded in 2020 focused on three areas: machine learning, science in the cloud, and open source tools.
Earth scientists often work with data collected by NASA’s space, airborne, and ground observation missions. Before they can use that data for machine learning, however, they have to create large training data sets.