Data Management
The Data Management activity emphasizes data standards, preservation, and data federation to benefit all DOE scientists. To date, this activity supports the Earth System Grid Federation (ESGF) and the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), the latter in support of Environmental System Science activities.
The goal of the Data Management activity is to develop and make available to the community novel scale-aware visualization and analysis methods involving observational and model-generated data. This activity also prioritizes the further development of tools that quantify uncertainty and adapt to different modeling frameworks, enabling integrated analysis and comparison of data from multiple sources and across variable experimental conditions. The massive volumes of data and the demand for data security require a federated data archiving and dissemination construct. Short-term goals are to: develop user-focused publication tools, built on an open-source scripting approach, for organizing, manipulating, and publishing these data; design visualization and analytics tools that leverage existing community capabilities and interfaces and allow multi-scale data analytics and model integration; and demonstrate these capabilities within existing federated constructs.
While EESSD supports many activities with integrated experimental and computational functions, dramatic improvements in technologies and analytic methodologies in recent years have shifted the bottleneck in scientific productivity from data production to data management, interpretation, and visualization, creating new opportunities in these areas. For example, novel computational strategies for integrating and interpreting information generated at different scales are needed to further explain the underlying design principles of different elements of systems and the associations between different phenomena. For the Earth, climate, and environmental system sciences, multi-dimensional visualization techniques are urgently needed for processes acting simultaneously across scales, from the smallest scales associated with cloud physics to macroscale climate dynamics. Statistical data analytics, machine learning, and inference are central to data analytics at virtually all scales in the climate, Earth, and environmental sciences.
New approaches are also needed to permit scalable data management and ease of access. With data volumes already beyond exabyte scales, a federated data system is the practical approach going forward. This would allow data providers to maintain a set of geographically dispersed nodes accessible to the scientific community. Through linkages, the nodes harmonize differing archived data repositories, allowing a scientist to access all data as if it were on their own system. Such an integrative system also links analysis software, multi-scale and multi-dimensional visualization, and high-performance computing as part of the federated data analytics, offering collaborative work environments not currently available. Using this approach, rapid prototyping of new algorithms for new model representations, uncertainty quantification, and intelligent pattern recognition becomes possible. The proposed data environment, a Virtual Laboratory, would become a best-in-class resource for DOE and the Nation.
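As a rough illustration of the federated idea described above, the following minimal Python sketch shows how several nodes with different local metadata conventions might be presented as a single searchable catalog. All names here (DataNode, Federation, field_map, the node names, and the sample records) are hypothetical and invented for this example. They do not represent the ESGF or ESS-DIVE software or any actual DOE interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """One hypothetical repository node with its own metadata field names."""
    name: str
    records: list
    # Maps this node's local field names to the shared (harmonized) names.
    field_map: dict = field(default_factory=dict)

    def harmonized(self):
        """Yield records rewritten to the shared vocabulary, tagged with the node name."""
        for rec in self.records:
            out = {self.field_map.get(key, key): value for key, value in rec.items()}
            out["node"] = self.name
            yield out


class Federation:
    """Presents several nodes as one searchable catalog (illustrative only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def search(self, **criteria):
        """Return harmonized records from every node that match all criteria."""
        return [
            rec
            for node in self.nodes
            for rec in node.harmonized()
            if all(rec.get(key) == value for key, value in criteria.items())
        ]


# Two made-up nodes that describe the same quantity with different field names.
node_a = DataNode(
    name="node-a",
    records=[
        {"var": "precipitation", "source": "model", "id": "a-001"},
        {"var": "temperature", "source": "model", "id": "a-002"},
    ],
    field_map={"var": "variable"},
)
node_b = DataNode(
    name="node-b",
    records=[
        {"parameter": "precipitation", "source": "observation", "id": "b-101"},
    ],
    field_map={"parameter": "variable"},
)

federation = Federation([node_a, node_b])
hits = federation.search(variable="precipitation")
for hit in hits:
    print(hit["node"], hit["id"], hit["source"])
```

In this sketch the scientist issues one query against the federation and receives matching model and observational records from both nodes, without needing to know each node's local field names. A real federated system would additionally handle network access, authentication, replication, and far richer metadata standards.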
Information Resources of Note
The Earth System Grid Federation, a Peer-to-Peer (P2P) collaboration that develops, deploys and maintains software infrastructure for the management, dissemination, and analysis of model output and observational data.
The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem, a data repository that supports EESSD activities, with emphasis on Environmental System Science.
The Biological and Environmental Research Advisory Committee report entitled BER Virtual Laboratory: Innovative Framework for Biological and Environmental Challenges.
A workshop report from the Working Group on Virtual Data Integration, which addresses integrating data and moving beyond the investigation of “parts” to an understanding of integrated environmental system behavior.
Program Manager
Dr. Justin Hnilo, Program Manager
Climate and Environmental Sciences Division, SC-23.1
Department of Energy, GTN Bldg.
1000 Independence Ave, SW
Washington, DC 20585-1290
(301) 903-1399
Fax: (301) 903-8519
Email: justin.hnilo@science.doe.gov