In this section, describe how your data will be documented, with particular attention to the standards your data will follow. Standards are established methods or systems for accomplishing the same task. For example, light bulbs and light sockets come in standardized shapes and sizes so that they are compatible with each other.
File format standards determine which programs can read and use your data, while metadata standards provide a uniform way of classifying and structuring data, making it easier to remix and reuse. Lastly, you also need to address how the whole data set will be documented, because metadata alone is typically not enough to facilitate reuse.
File formats are a standardized way of formatting digital data so it can be read and used by computers consistently. Whenever possible you should use open, non-proprietary file formats. This means using CSV (.csv) over Excel (.xlsx) and plain text (.txt) over Word (.docx) when possible. There are two reasons to do so:
If you cannot save your data in an open, non-proprietary format then you will need to document the software, tools, and/or code needed to access the data and explain why you’ve chosen this format in your plan.
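As a rough illustration of the open-format advice above, the sketch below writes a small table to CSV text and reads it back using only Python's standard library. The rows, the column names (site_id, sample_date, temp_c), and the helper to_csv_text are hypothetical and exist only for this example; they are not part of any required template.

```python
import csv
import io

# Hypothetical example measurements; column names are illustrative only.
rows = [
    {"site_id": "A1", "sample_date": "2023-06-01", "temp_c": 18.4},
    {"site_id": "B2", "sample_date": "2023-06-02", "temp_c": 19.1},
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)

# Reading the data back needs nothing beyond a generic CSV parser.
# Note that CSV stores every value as text, so types must be documented separately.
restored = list(csv.DictReader(io.StringIO(csv_text)))
```

Because CSV carries no type information, values such as temp_c come back as strings, which is one reason the documentation discussed later in this section still matters.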
A simple definition of metadata is that it is "data about data" (how meta!). A better definition is that metadata is descriptive, structured information about a thing. For a simple example, think of how you would describe a book: title, subject matter, length, author, date of publication, and so on. All of these are metadata that describe the book. In other words, metadata are the details that surround data.
Metadata is often formatted as data and may be in the same file or in a supplemental file. Importantly, metadata often follows standards. Good examples include Ecological Metadata Language (EML), a vocabulary used to describe ecological data; ISO 8601, an international standard for dates and times; and the MRLC Land Cover Classifications, a standard vocabulary for land use in the United States. Using metadata standards such as these ensures data can be easily mixed and reused with similar data with minimal information loss.
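To make the ISO 8601 example concrete, here is a minimal sketch of writing and reading ISO 8601 dates and timestamps with Python's standard datetime module. The variable names and the specific date are made up for illustration.

```python
from datetime import date, datetime, timezone

# Hypothetical collection date and time, used only for illustration.
collected_on = date(2023, 6, 1)
collected_at = datetime(2023, 6, 1, 14, 30, 0, tzinfo=timezone.utc)

# isoformat() produces ISO 8601 text: year-month-day, then time and UTC offset.
date_text = collected_on.isoformat()        # "2023-06-01"
timestamp_text = collected_at.isoformat()   # "2023-06-01T14:30:00+00:00"

# The same text can be parsed back without guessing the day/month order.
parsed = datetime.fromisoformat(timestamp_text)
```

Writing dates this way avoids the ambiguity of forms such as 06/01/2023, which different readers may interpret as June 1 or January 6.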
Documentation backs up and supports metadata by providing context. Documentation tells us the details of how the data was made and how different files relate. It provides details that might not be captured in the data and metadata alone. The most common kind of documentation is a readme – a simple text file with information to read first before using a data set.
Data dictionaries and codebooks are a more focused kind of documentation that defines variables, units, data types, and more, and they often accompany spreadsheets. In other words, they define your data by providing metadata.
For example, a spreadsheet has columns and rows containing data. Column headers are often written in shorthand, so they need documentation to explain their full names and to give details such as measurement units, allowed ranges, and more. Similarly, your entire data set needs documentation to help other people understand why it is important and whether they can reuse it.
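One possible way to picture a data dictionary is as a small lookup table keyed by column header. The sketch below is an assumption-laden illustration, not a prescribed format: the column names, the fields (full_name, type, units, min, max), and the helper check_row are all hypothetical.

```python
# Hypothetical data dictionary: one entry per shorthand column header.
data_dictionary = {
    "site_id": {
        "full_name": "Sampling site identifier",
        "type": str,
        "units": None,
    },
    "temp_c": {
        "full_name": "Water temperature",
        "type": float,
        "units": "degrees Celsius",
        "min": -5.0,   # illustrative allowed range, not a real limit
        "max": 40.0,
    },
}


def check_row(row, dictionary):
    """Return a list of problems found when comparing one row to the dictionary."""
    problems = []
    for column, value in row.items():
        spec = dictionary.get(column)
        if spec is None:
            problems.append(f"{column}: not documented")
            continue
        if not isinstance(value, spec["type"]):
            problems.append(f"{column}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{column}: {value} below allowed minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{column}: {value} above allowed maximum {spec['max']}")
    return problems


good = check_row({"site_id": "A1", "temp_c": 18.4}, data_dictionary)
bad = check_row({"site_id": "A1", "temp_c": 99.0, "depth": 3}, data_dictionary)
```

Here good is an empty list, while bad reports the out-of-range temperature and the undocumented depth column. In practice a data dictionary is usually shared as its own CSV or text file alongside the data rather than embedded in code.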
For scientific data to be readily accessible and usable, it is critical to use appropriate community-recognized standards and machine-readable formats when they exist. If the data will be managed in domain-specific workspaces or submitted to public databases, indicate that their required formats will be followed. Regardless of the format used, the data set must contain enough information to allow others to understand, validate, and use the data independently.
State which common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources. Provide the name(s) of the data standards and describe how they will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.
Researchers are encouraged to use the data standards, formats, and other common practices of their research community or scientific domain to ensure maximal interoperability and impact of data sharing.