Step 6. Data Sharing
Data sharing is an expected part of every data management plan requirement and a common requirement for publishing journal articles. The vast majority of research data can be publicly shared with very low, or no, risk. However, if sharing your data would compromise someone's or something's safety, confidentiality, or intellectual property, or would violate law or policy, then not sharing the data publicly, or sharing it only under limited circumstances, may be the responsible thing to do.
Are there justifiable reasons for limiting, or not sharing, data?
Yes, but they must be clearly articulated in your DMP and the justification must be something your peers (and/or your funder) would agree with and consider reasonable. The following are examples where limited, or no, data sharing may be justifiable¹:
- Explicit federal, state, local, university, or Tribal law, regulation, or policy prohibits sharing and disclosure.
- Informed consent/IRB protocol does not permit or will limit the scope of sharing and future research use, OR, existing consent (e.g., for previously collected data or specimens) prohibits sharing or limits the scope or extent of sharing and future research use.
- Sharing would compromise the privacy or safety of research subjects, or place them at greater risk of re-identification or harm, and protective measures such as de-identification, data use agreements, or Certificates of Confidentiality would be insufficient.
- Restrictions imposed by existing or anticipated agreements (e.g., with third party funders, with partners, with repositories, through data use agreements, or licensing limitations).
- Sharing the research data may violate or hinder intellectual property filings so sharing will be delayed.
In addition, the following are examples of reasons that are not considered justifiable for limiting data sharing:
- Sharing data would threaten my own research or the research of my team,
- Data are considered too small or are thought to be of limited interest to others,
- Data are not thought to have a suitable repository.
¹ Both lists in this section are based on NIH's answer to the FAQ titled "What are justifiable reasons for limiting sharing of data?"
When will the data be available?
Different agencies have different requirements on when research data should be made available. Some agencies require that the data be made available at the time of publication while others simply require the data to be available within 12 months or within a "reasonable time" after publication. Make sure to check the exact requirements of the sponsor as they may have a specific time period that you need to comply with.
Examples of data available statements:
Data will be available at the time of publication.
Supporting data will be available upon the acceptance of the research paper.
Data will be available no later than 6 months following publication.
All data gathered by this research will be available within a year after the grant funding has ceased.
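Statements such as "no later than 6 months following publication" imply a concrete calendar deadline. The following is a minimal sketch, in Python, of how such a deadline could be worked out from a publication date. The function names and the example date are hypothetical, and the rule of clamping to the last day of a shorter month is an assumption; always defer to the sponsor's own definition of the time period.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. 31 Aug + 6 months),
    the day is clamped to the last day of the target month (an assumption).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def sharing_deadline(publication_date: date, months_allowed: int) -> date:
    """Latest date by which the data should be available (hypothetical helper)."""
    return add_months(publication_date, months_allowed)


if __name__ == "__main__":
    published = date(2024, 8, 31)  # hypothetical publication date
    print(sharing_deadline(published, 6))   # 2025-02-28
    print(sharing_deadline(published, 12))  # 2025-08-31
```

Recording the computed date next to the availability statement makes it easier to check later that the commitment was met.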
How will others access the data?
A statement that "data is available upon request" does not show a strong commitment to data sharing. Because of this, it's a good idea to share research data in a more formal manner, in effect "publishing" the data. Sharing data in a formal way also provides some additional benefits such as:
- Improved discoverability - published datasets are easier to find as they are assigned metadata.
- Citable - published datasets are often accompanied by a recommended citation.
- Stable - published datasets are often assigned a permanent identifier such as a DOI (digital object identifier) which does not change even if the URL to access the data changes.
Not all data publishing venues are created equal. A common way to "publish" data is to publish it as a journal article supplementary information (SI) file. While this is an easy way to share data, not all SI files are publicly available, assigned a DOI, or assigned metadata separate from that of the article, which makes them more difficult to find and track. Make sure to share your data via a method that satisfies all requirements and fits your needs.
Examples of shared data access:
Data will be publicly available on Figshare where it will be shared under a CC0 license and assigned a DOI.
Data will be available as journal article supplementary information (SI) files. These SI files will be open access.
Will there be any restrictions on the data?
If there is an acceptable reason to restrict access to your research data, make sure that the method of data sharing you have chosen is compatible with your restriction needs. Keep in mind that many data repositories require you to apply a CC0 license to your datasets, which is not compatible with usage restrictions.
Example data restrictions:
Data will be deposited with ICPSR but access will be restricted due to the sensitive nature of its contents. Anyone wishing to use the data must first contact and receive permission from the PI.
Will the data have enough documentation to be useful?
Making data accessible is only one-half of data sharing. The other half is making sure your data has enough documentation for it to be meaningful and useful to others. Some repositories will not accept your data unless it is accompanied by a readme file, data dictionary, or other form of metadata or documentation. See section 3: Data Documentation for more details.
Examples of data documentation:
Shared data will be accompanied by a readme file which will provide additional information such as instrument calibrations and data coding details.
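As an illustration of the kind of documentation described above, the following is a minimal sketch, in Python, that drafts a data dictionary skeleton from the header row of a CSV file. The function names, field names, and sample data are all hypothetical; the descriptions, units, and coding details still have to be written by someone who knows the data.

```python
import csv
import io


def data_dictionary_skeleton(csv_text: str) -> list[dict[str, str]]:
    """Return one entry per CSV column, with blank fields to be filled in by hand."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first_row = next(reader, [])  # used only to show an example value
    entries = []
    for i, name in enumerate(header):
        entries.append({
            "variable": name,
            "example_value": first_row[i] if i < len(first_row) else "",
            "description": "",     # to be written by the researcher
            "units_or_codes": "",  # e.g. units, calibration notes, coding scheme
        })
    return entries


def to_readme_section(entries: list[dict[str, str]]) -> str:
    """Format the entries as a plain-text section for a readme file."""
    lines = ["DATA DICTIONARY", ""]
    for entry in entries:
        lines.append(f"Variable: {entry['variable']}")
        lines.append(f"  Example value: {entry['example_value']}")
        lines.append(f"  Description: {entry['description'] or 'TODO'}")
        lines.append(f"  Units or codes: {entry['units_or_codes'] or 'TODO'}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "site_id,temp_c,flag\nA1,21.5,0\n"  # hypothetical dataset
    print(to_readme_section(data_dictionary_skeleton(sample)))
```

The "TODO" markers make it obvious which variables still lack a description before the readme file is deposited with the data.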
- Valuable data should be shared when possible. Examples of valuable data include data of one-time events, data that are expensive to collect, and data that validate research findings.
- Try to deposit your data in a repository that handles both data sharing and preservation.
- Try to share both the raw and analyzed data whenever possible, as analyzed data often contains computed values that cannot be converted back into the individual variables they were derived from.