Statement on Digital Data Management

The Office of Science mission is to deliver the scientific discoveries and major scientific tools that transform our understanding of nature and advance the energy, economic, and national security of the United States. The Office of Science Statement on Digital Data Management has been developed with input from a variety of stakeholders in this mission (see References).

Here, data management involves all stages of the digital data life cycle including capture, analysis, sharing, and preservation. The focus of this statement is sharing and preservation of digital research data.


Principles

The Office of Science affirms that the following principles related to the management of digital research data directly support fulfillment of its mission.

  • Effective data management has the potential to increase the pace of scientific discovery and promote more efficient and effective use of government funding and resources. Data management planning should be an integral part of research planning.
  • Sharing and preserving data are central to protecting the integrity of science by facilitating validation of results and to advancing science by broadening the value of research data to disciplines other than the originating one and to society at large. To the greatest extent and with the fewest constraints possible, and consistent with the requirements and other principles of this Statement, data sharing should make digital research data available to and useful for the scientific community, industry, and the public. 
  • Not all data need to be shared or preserved. The costs and benefits of doing so should be considered in data management planning.

Requirements

To integrate data management planning into the overall research plan, the following requirements will apply to all Office of Science research solicitations and invitations for new, renewal, and some supplemental funding issued on or after October 1, 2014. These requirements apply to proposals from all organizations including academic institutions, DOE National Laboratories, and others. These requirements do not apply to applications to use Office of Science user facilities.

All proposals submitted to the Office of Science for research funding must include a Data Management Plan (DMP) that addresses the following requirements:

  1. DMPs should describe whether and how data generated in the course of the proposed research will be shared and preserved. If the plan is not to share and/or preserve certain data, then the plan must explain the basis of the decision (for example, cost/benefit considerations, other parameters of feasibility, scientific appropriateness, or limitations discussed in #4). At a minimum, DMPs must describe how data sharing and preservation will enable validation of results, or how results could be validated if data are not shared or preserved.
  2. DMPs should provide a plan for making all research data displayed in publications resulting from the proposed research open, machine-readable, and digitally accessible to the public at the time of publication. This includes data that are displayed in charts, figures, images, etc. In addition, the underlying digital research data used to generate the displayed data should be made as accessible as possible to the public in accordance with the principles stated above. This requirement could be met by including the data as supplementary information to the published article, or through other means. The published article should indicate how these data can be accessed.
  3. DMPs should consult and reference available information about data management resources to be used in the course of the proposed research. In particular, DMPs that explicitly or implicitly commit data management resources at a facility beyond what is conventionally made available to approved users should be accompanied by written approval from that facility. In determining the resources available for data management at Office of Science User Facilities, researchers should consult the published description of data management resources and practices at that facility and reference it in the DMP. Information about other Office of Science facilities can be found in the additional guidance from the sponsoring program.
     
  4. DMPs must protect confidentiality, personal privacy, Personally Identifiable Information, and U.S. national, homeland, and economic security; recognize proprietary interests, business confidential information, and intellectual property rights; avoid significant negative impact on innovation and U.S. competitiveness; and otherwise be consistent with all applicable laws, regulations, and DOE orders and policies. There is no requirement to share proprietary data.

DMPs will be reviewed as part of the overall Office of Science research proposal merit review process. Additional requirements and review criteria for the DMP may be identified by the sponsoring program or sub-program, or in the solicitation.
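
As an illustration of Requirement 2 above, the values displayed in a figure can be published in an open, machine-readable form such as CSV alongside the article. The following is a minimal sketch only: the function name, column names, and data points are hypothetical, and CSV is an assumed format choice rather than one prescribed by this Statement.

```python
import csv
import io


def write_figure_data(rows, fieldnames, stream):
    """Write the values plotted in a figure as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical data points behind a published figure (illustrative only).
figure_points = [
    {"temperature_K": 10.0, "resistance_ohm": 0.52},
    {"temperature_K": 20.0, "resistance_ohm": 0.61},
]

# An in-memory buffer stands in for a supplementary file here.
buffer = io.StringIO()
write_figure_data(figure_points, ["temperature_K", "resistance_ohm"], buffer)
csv_text = buffer.getvalue()
```

Putting units in the column names is one simple way to keep such a file interpretable without the article text; a community metadata standard, where one exists, would be preferable.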


Additional Guidance on Digital Data Management (Expires 12/31/2021)

(NOTICE: The following Guidance remains in effect for all proposals submitted to Office of Science solicitations issued through December 31, 2021.)

  • The Principal Investigator should determine which data should be the subject of the DMP and, in the DMP, propose which data should be shared and/or preserved in accordance with the Requirements.
  • In determining which data should be shared and preserved, researchers must consider the data needed to validate research findings as described in the Requirements, and are encouraged to consider the potential benefits of their data to their own fields of research, fields other than their own, and society at large.
  • DMPs should reflect relevant standards and community best practices for data and metadata, and make use of community accepted repositories whenever practicable.
  • Costs associated with the scope of work and resources articulated in a DMP may be included in the proposed research budget as permitted by the applicable cost principles.
  • To improve the discoverability of and attribution for datasets created and used in the course of research, the Office of Science encourages the citation of publicly available datasets within the reference section of publications, and the identification of datasets with persistent identifiers such as Digital Object Identifiers (DOIs). In most cases, the Office of Science can provide DOIs free of charge for data resulting from DOE-funded research through its Office of Scientific and Technical Information (OSTI) DataID Service.
  • A list of suggested elements for a DMP is provided in the Suggested Elements for a Data Management Plan section.

Additional Guidance on Digital Data Management (Effective 1/1/2022)

(NOTICE: The following Guidance will take effect for Office of Science solicitations issued after January 1, 2022.)

  • The Principal Investigator should determine which data should be the subject of the DMP and, in the DMP, propose which data should be shared and/or preserved in accordance with the Requirements.
  • In determining which data should be shared and preserved, researchers must consider the data needed to validate research findings as described in the Requirements, and are encouraged to consider the potential benefits of their data to their own fields of research, fields other than their own, and society at large.
  • DMPs should reflect relevant standards and community best practices for data and metadata and make use of community accepted repositories whenever practicable.
  • Costs associated with the scope of work and resources articulated in a DMP may be included in the proposed research budget as permitted by the applicable cost principles.
  • To improve the discoverability of and attribution for datasets created and used in the course of research, the Office of Science encourages the citation of publicly available datasets within the reference section of publications, and the identification of datasets with persistent identifiers such as Digital Object Identifiers (DOIs). In most cases, the Office of Science can provide DOIs free of charge for data resulting from DOE-funded research through its Office of Scientific and Technical Information (OSTI) DOE Data ID Service. OSTI also provides a service for assigning DOIs to DOE-funded software through DOE CODE.

Additional Guidance for Reviewers on Digital Data Management (Effective 1/1/2022)

(NOTICE: The following Guidance will take effect for Office of Science solicitations issued after January 1, 2022.)

As part of the DOE Office of Science Merit Review process, reviewers are asked whether the DMP is suitable for the proposed research and to what extent it supports the validation of research results. Reviewers are expected to determine whether the DMP has met the Requirements and to provide constructive feedback to the Office of Science and the researcher. To help reviewers prepare that feedback, the Office of Science developed a document containing additional guidance, which provides example reviewer feedback based on the Suggested Elements.

Guidance for Reviews of DMPs


Suggested Elements for a Data Management Plan (Effective 1/1/2022)

(NOTICE: The following Guidance will take effect for Office of Science solicitations issued after January 1, 2022.)

Submitted by researchers, DMPs provide a means for the Office of Science to assess, oversee, and ensure that research efforts align with best practices in data management and result in reusable, open research products to the extent feasible and appropriate. These Suggested Elements for a DMP offer guidance to researchers about what to include in a DMP.

For more information about what qualifies as data, refer to the definition of Digital Research Data in the Glossary. The term digital data includes experimental, observational, and simulation data; codes, software, and algorithms; text; numeric information; images; video; audio; and associated metadata.

Data Collected, Generated, or Used

A brief description of data that are expected to be used or generated during the course of the proposed research, which may include:

  • Description in general terms (nature and scope; method used to generate the data, e.g. simulation, observation, experiment…)
  • Amount or size of the data
  • Modality (text, imaging, genomic, structured…)
  • Level of aggregation (individual, summarized…)
  • Degree of data processing (raw, analyzed…)
  • Relationship of the data to other data, as relevant
  • Confirmation that the PI has rights to use or collect the data, including the rights to share or otherwise manage data as described in the DMP
  • The use of persistent, unique identifiers

Standards

A description of any data or metadata standards or formats to be used or considered, which may include:

  • Any standards to be applied to the scientific data, associated metadata, and documentation including models, formats, identifiers, definitions, unique identifiers, controlled vocabularies, taxonomies, thesauri, ontologies, code books, data dictionaries, and other data documentation
  • Whether the data standards are open or proprietary
  • Indication that no appropriate data standards exist, as may be the case for some scientific fields

Related Tools, Software and/or Code

A description of any code or specialized tools needed to make use of the data, which may include:

  • Names of the code, software, or specialized tools needed to access, manipulate, or make use of the data
  • How these can be accessed (e.g., open source and freely available, generally available for a fee in the marketplace, available only from the research team or some other source)

Data Sharing

A description of how data will be accessed and shared, which may include:

  • A description of the shared subset of the total data generated, collected, or used
  • Data distribution plans regarding a specific instance of a dataset, including a repository from which the data can be accessed
  • Any licenses or access rights for the data
  • A description of whether data will be open to the public, limited, or closed and why (e.g. ethical or legal reasons)
  • Any deadlines or embargo periods for releasing the data after they are generated
  • Any other considerations that may result in limitations on the ability to broadly share scientific data
  • Potential users of the shared data

Data Preservation

A description of plans for preserving data, which may include:

  • Where scientific data will be archived to ensure long-term preservation
  • How, and under what conditions, management responsibility for the data might be transferred (e.g. to a repository)
  • Any future decision points regarding continued preservation, archiving, or retiring of data
  • The minimum preservation time afforded by the proposed budget
  • Plans for preserving metadata even if data are not preserved
  • Estimate of the time period between data collection and submission to a preservation archive
  • How data will be curated to meet user needs and resource constraints

Data Protection: Security and Integrity

A description of measures to ensure data security and integrity, which may include:

  • Measures to prevent the accidental or malicious modification of data
  • Backup measures to prevent single-point failures
  • Necessary physical and cyber resources
  • How back-up, disaster recovery, off-site data storage, and other redundant storage strategies will be used to ensure the data's security and integrity
  • How different tiers or levels of access will be managed
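
One common way to detect accidental or malicious modification of data is to record a cryptographic checksum for each file and compare against it later. The sketch below assumes SHA-256 and a simple in-memory manifest. The function names are hypothetical, and the approach is an illustration rather than a measure required by this Statement.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed, or altered since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )


# Small demonstration in a throwaway directory (hypothetical file name).
with tempfile.TemporaryDirectory() as workdir:
    data_file = Path(workdir) / "run1.dat"
    data_file.write_bytes(b"original measurements")
    manifest = build_manifest(workdir)
    data_file.write_bytes(b"tampered measurements")
    detected = changed_files(workdir, manifest)
```

For the check to be useful, the manifest should be stored separately from the data it describes, for example with an off-site backup. A checksum reveals that a file changed but does not prevent the change or restore the file.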

Oversight of Data Management

How alignment with this DMP will be monitored and managed, and by whom, which may include:

  • Roles and responsibilities of individuals in ensuring data management implementation is consistent with the DMP and potentially making updates to the DMP
  • People or groups that have the right to manage the data
  • People or groups that have intellectual property related to the data, data access, or data use

Rationale

A brief justification of the proposed data management plan or any associated costs, which may include:

  • How the DMP influences the potential impact of the data within the immediate field and in other fields, and any broader societal impact
  • How cost, privacy, national security, competitiveness, or other considerations factored into DMP elements
  • Elaborations of budget requests associated with data management
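
Researchers drafting a DMP may find it convenient to treat the Suggested Elements as a checklist. The sketch below is illustrative only: the section names are taken from the headings above, while the function, the dictionary layout, and the sample text are hypothetical. Because the elements are suggestions, a missing section is a prompt for review, not a failure.

```python
# Section headings taken from the Suggested Elements listed above.
SUGGESTED_ELEMENTS = (
    "Data Collected, Generated, or Used",
    "Standards",
    "Related Tools, Software and/or Code",
    "Data Sharing",
    "Data Preservation",
    "Data Protection: Security and Integrity",
    "Oversight of Data Management",
    "Rationale",
)


def missing_elements(draft):
    """Return suggested sections that are absent or empty in a draft DMP.

    `draft` is assumed to map section headings to their text.
    """
    return [
        name for name in SUGGESTED_ELEMENTS
        if not draft.get(name, "").strip()
    ]


# Hypothetical partial draft, for illustration.
draft_dmp = {
    "Data Sharing": "Processed spectra will be deposited in a community repository.",
    "Data Preservation": "Raw data will be archived for at least five years.",
    "Standards": "   ",  # whitespace only, so treated as not yet written
}

todo = missing_elements(draft_dmp)
```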

Additional Requirements and Guidance from Office of Science Program Offices


Information about Data Management Resources at Office of Science User Facilities

View information about the data management resources available at the Office of Science User Facilities.


Glossary

Data Preservation:

Data preservation means providing for the usability of data beyond the lifetime of the research activity that generated them.

Data Sharing:

Data sharing means making data available to people other than those who have generated them. Examples of data sharing range from bilateral communications with colleagues, to providing free, unrestricted access to the public through, for example, a web-based platform.

Digital Research Data:

The term digital data encompasses a wide variety of information stored in digital form including: experimental, observational, and simulation data; codes, software and algorithms; text; numeric information; images; video; audio; and associated metadata. It also encompasses information in a variety of different forms including raw, processed, and analyzed data, published and archived data.

This statement focuses on digital research data, which are research data that can be stored digitally and accessed electronically. Research data are defined in regulation (2 CFR 200.315 (e)) as follows:

“Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This 'recorded' material excludes physical objects (e.g., laboratory samples). Research data also do not include:

(A) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and

(B) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.”

Validate:

In the context of this statement, validate means to support, corroborate, verify, or otherwise determine the legitimacy of the research findings. Validation of research findings could be accomplished by reproducing the original experiment or analyses; comparing and contrasting the results against those of a new experiment or analyses; or by some other means.

FAQs

View Digital Data Management Frequently Asked Questions.


References

Federal Advisory Committee Reports on the Dissemination of Research Results (2011)