
Data Management Plan Guide

Learn how to write a data management plan!

FAQS and Definitions

Below are some frequently asked questions related to data management plans, terminology, data sharing, federal requirements, and more. 

Do you have a question not covered by the guide? Send us an email!

Q: What is a data management plan?

A: A data management plan (DMP) describes how researchers will provide long-term preservation of, and access to, research data. Data management plans include elements such as descriptions of the data to be produced in the proposed study, any standards to be used for collected data and metadata, mechanisms for providing access to and sharing of the data (including provisions for protection of privacy, confidentiality, security, intellectual property, or other rights), provisions for reuse and redistribution, and plans for archiving and long-term preservation of the data, or an explanation of why long-term preservation and access cannot be justified. [1]

[1] Adapted from the AHRQ definition of data management plan.

Q: Do I really have to keep and share all of my data?

A: In general, you are required to retain, share, and make accessible the data that validates your research findings. You should also consider preserving and sharing data that:

  • Captures a one-time event.
  • Would be costly, difficult, or impossible to replicate.
  • Is environmental data.
  • Has long-term value.

Please refer to the Requirements by Agency section of the guide for more specific guidance on what you are required to retain and share.

Q: What is "research data"?

A: This guide uses the terms "data" and "research data" interchangeably. The definition of research data used most often by federal agencies is adapted from OMB Circular A-110:

Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples*). Research data also do not include:

(A) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and

(B) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.

NOTE: NIH and NASA do not consider summary statistics, tables, charts, etc., or laboratory notebooks to be "research data." Although these items are important documents that may contain data, they are not research data according to these agencies.

NOTE: Sample/artifact/specimen preservation and sharing is very important for some types of research. The fact that these items are not "data" as defined by the government doesn't prohibit you from including this information in your grant proposal, and some agencies do want this information included in a data management plan.

Q: What is metadata?

A: Metadata, commonly called "data about data", is information that describes data. Good metadata enables others to understand and reuse data that they themselves did not create. A minimum amount of metadata should be agreed upon and implemented before starting data collection. Data collection and documentation are easier if you know what you need to collect and how to record it. This also helps maintain data consistency and quality.

There are many different ways to record and share metadata. Some of the most common methods are:

  • A data dictionary is an effective and concise way to describe the elements or variables that make up your dataset. (For more information see DataOne's page on Data Dictionaries).
  • A metadata schema is a formal framework for recording and describing data. Schemas are often used by large collaborative data gathering/sharing projects.
  • Readme files come in a variety of styles. They are often a combination of explanations and additional information, with elements of a data dictionary or metadata schema mixed in. They are the least formal way to document data.
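To make the data dictionary idea concrete, here is a minimal sketch in Python. The dataset, variable names, and column layout (variable, type, units, description) are hypothetical and chosen only for illustration; your discipline or repository may expect different fields.

```python
import csv
import io

# Hypothetical variables for an illustrative stream-temperature dataset.
DATA_DICTIONARY = [
    {"variable": "site_id", "type": "string", "units": "",
     "description": "Identifier of the sampling site"},
    {"variable": "sample_date", "type": "date", "units": "YYYY-MM-DD",
     "description": "Date the sample was collected"},
    {"variable": "water_temp", "type": "float", "units": "degrees Celsius",
     "description": "Water temperature at time of sampling"},
]

FIELDS = ["variable", "type", "units", "description"]


def data_dictionary_to_csv(entries):
    """Serialize data dictionary entries to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(dataset_columns, entries):
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in entries}
    return [col for col in dataset_columns if col not in documented]


if __name__ == "__main__":
    print(data_dictionary_to_csv(DATA_DICTIONARY))
    # "ph" is a made-up column that nobody has documented yet.
    print(undocumented_columns(["site_id", "water_temp", "ph"], DATA_DICTIONARY))
```

Saving the dictionary as a CSV file alongside the dataset keeps the documentation itself machine-readable, and a check like `undocumented_columns` is one simple way to notice variables that were added without being described.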

A metadata example

Let's say that you wish to read the book Harry Potter and the Sorcerer's Stone. How would you locate this book at the library? You'd search for it. You'd probably search for the book's title or its author, either of which will lead you to the book's library record. That record contains descriptive information about the book, including its location in the library.

All of the information in the library record is metadata. It describes the book.

Library book metadata is valuable because it describes the book in a variety of ways, which lets you search for, locate, and identify particular books.

Research data metadata describes how the data was created, recorded, generated, analyzed, etc. This added context and detail lets others understand and reuse the data.
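The same idea can be sketched for a research dataset. The example below is a minimal illustration in Python, assuming a hypothetical "minimum metadata" list that a project team might agree on before data collection starts; the field names and the sample record are invented and do not come from any particular metadata schema.

```python
import json

# Hypothetical minimum set of fields a team agrees on before collecting data.
REQUIRED_FIELDS = ("title", "creator", "date_collected", "method", "file_format")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# An invented record describing an illustrative dataset.
record = {
    "title": "Stream temperature survey",
    "creator": "Example Lab",
    "date_collected": "2020-05-01",
    "method": "",  # left blank on purpose to show the check
    "file_format": "CSV",
}

if __name__ == "__main__":
    print("Missing:", missing_fields(record))
    print(json.dumps(record, indent=2))
```

Running a check like this whenever a dataset is saved is one way to keep metadata consistent across a project, in the same way a library record always has a title and an author.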

Q: What is a "data repository" and why should I use one? 

A: Data repositories are devoted to keeping data accessible, safe, and secure. They use special software, metadata, workflows, and networks to meet these goals. Data repositories also help guarantee authenticity by providing control mechanisms and change logs. They are usually the best choice for research data sharing, distribution, and preservation because of this. 

Data repositories often have limits and restrictions governing which data they accept. Most have rules governing data formats and size and require data documentation, while others will only accept research from specific domains (such as "biology" or "social science"). The latter are known as disciplinary data repositories. Another type of specialized repository is the institutional data repository, which focuses on collecting the outputs of a select group such as a university or federal agency.

The Data Sharing portion of the guide provides more information and resources for locating data repositories.

Q: What is "machine-readable data"?

A: Machine-readable data is data that can be read and processed by a computer. By comparison, human-readable data can only be read (and understood) by a human. It is important to understand that charts, graphs, and most tables are not machine-readable, but the data they were generated from probably is.

Examples of human-readable data include books, PDFs, representations of data (charts, graphs, tables, etc.), and datasets which have not been structured to be read by computers.

Examples of machine-readable data include data that has been encoded with a mark-up language (HTML, XML, etc.), datasets that have been structured to be read by computers, and data that is encoded for machine processing and is not human-readable.
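The difference can be shown with a small Python sketch. The sentence and the CSV text below carry the same made-up values (the sites, month, and temperatures are invented for illustration); only the CSV version can be loaded and analyzed without writing custom text-parsing code.

```python
import csv
import io

# Human-readable: clear to a person, but a program cannot reliably
# extract the numbers without custom text parsing.
HUMAN_READABLE = "Site A averaged 12.5 degrees in May, while Site B averaged 14.0."

# Machine-readable: the same invented values in a structured CSV layout.
MACHINE_READABLE = "site,month,mean_temp_c\nA,May,12.5\nB,May,14.0\n"


def read_records(csv_text):
    """Parse CSV text into a list of records with numeric temperatures."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"site": row["site"], "month": row["month"],
         "mean_temp_c": float(row["mean_temp_c"])}
        for row in rows
    ]


def mean_temperature(records):
    """Average the mean_temp_c values across all records."""
    return sum(r["mean_temp_c"] for r in records) / len(records)


if __name__ == "__main__":
    records = read_records(MACHINE_READABLE)
    print(records)
    print(mean_temperature(records))
```

A chart drawn from these values would be human-readable only, which is why sharing the underlying structured file matters.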

Learn more about machine-readable data

Q: What does "digitally accessible" mean?

A: Making data digitally accessible is part of making data machine-readable. There is no clear definition of this term, but it is generally understood that:

  1. The data must be stored in a digital format (i.e., on a computer).
  2. The data should be available online, be machine-readable, and be shared as freely as possible.



