Spreadsheet software, such as Excel, support many functions through proprietary file formats (i.e. .xlsx). While this is convenient for working with data it's not a good way to store data as you'll end up storing much more than just the data.
Storing data in .xlsx (and other proprietary formats) may also store:
For all of these reasons we suggest not storing data in proprietary formats. Instead, use Excel (and other spreadsheet programs) to create a working copy of the data and store the raw/original data in an open, plain text format such as CSV.
The formats discussed in this section should only be used for storing data. Plain-text files are very simple files which means they can be opened and read by a large number of programs. This makes them ideal for exporting and sharing data however, they do not support:
The most common plain-text formats for tabular data are comma separated values (.csv) and tab separated values which are discussed below.
By far the most common plain-text format to store and share tabular data. CSV uses commas (,) to designate cell contents and line breaks to separate rows. Below is an example data stored in CSV format:
field_name1,field_name2,field_name3
aaa,bbb,ccc
zzz,yyy,xxx
While CSV has been used for decades to exchange and convert data between programs it still has one major shortcoming: it does not differentiate between a comma used as a value and a comma used as a value separator. Some software, such as Excel, appears to work around this problem by guessing if a comma is a separator or part of a sentence inside of a cell. However, this method isn't fool-proof so CSV is not a recommend format for text-heavy tabular data.
TSV is a less common format than CSV but it follows the same principles. Instead of commas TSV uses tabs to designate cell contents and line breaks to separate rows. Below is an example data stored in TSV format:
field_name1 field_name2 field_name3
aaa bbb ccc
zzz yyy xxx
Unlike the comma, it is rarer for data to contain a tab character. For this reason TSV can be a better option for text-heavy files. The extension .tsv is no longer widely used so you may encounter TSV files saved as text (.txt) files. When importing these files you may need to set TAB as the delimiter value.
There are other plain-text formats that can be used to store tabular data. For example it is possible to store tabular data in a text file (.txt) using spaces as a separator. However, using this format requires being very careful that your cell contents don't contain extra spaces (something that's very easy to do with text or values that have been copied and pasted). We recommend CSV and TSV as they are the most well known and supported formats.
File encoding is a technical setting that affects how your data is stored and read by a computer. If all goes well you may never need to check this setting but if you see a � [?] character in your data then you've likely hit an encoding problem.
Supports basic operator symbols, numbers, and English letters. An old formatting with no frills but it's still the default for a number of programs.
Supports pretty much every character you'll ever need and languages other than English. It's also backwards compatible with ASCII. Excel 365 (2019) files exported as CSV or TSV files are UTF-8 encoded but older versions of Excel may use ASCII by default.
Megan O'Donnell
mno@iastate.edu
Data Services Librarian
ENT, EEOB, NREM, and Environment.
Kris Stacy-Bates
kksb@iastate.edu
Science and Technology Librarian
ABE, CCEE, Math, and Stats
Katie Wampole
kwampole@iastate.edu
Research Data Curator