Skip to Main Content

Escape from Spreadsheet Hell

The website for the workshop! TL;DR we’ll teach you how to create spreadsheets optimized for reuse by you and others. Your future self will thank you.
Exporting data
You can change Excel spreadsheets into plain-text files using Save As or Export in the File menu. However, for reasons that are unclear, 'Export' works better.
  1. Click 'File'
  2. Click 'Export'
  3. Click 'Change File Type'
  4. Under 'Other File Types' choose 'Text (tab delimited)' or 'CSV (Comma delimited)'

File formats

Software specific (aka, proprietary) file formats

Spreadsheet software, such as Excel, support many functions through proprietary file formats (i.e. .xlsx). While this is convenient for working with data it's not a good way to store data as you'll end up storing much more than just the data.

Storing data in .xlsx (and other proprietary formats) may also store:

  • Cell formatting (i.e. "round up")
  • Colors, fonts, and other decorative formatting (which isn't data)
  • Formulas and functions that transform data (often won't work in other programs)
  • Converted data (see 'dates and times' for an example).
  • Other weird things (macros, links to other files, etc.)

For all of these reasons we suggest not storing data in proprietary formats. Instead, use Excel (and other spreadsheet programs) to create a working copy of the data and store the raw/original data in an open, plain text format such as CSV. 

Plain-text formats

The formats discussed in this section should only be used for storing data. Plain-text files are very simple files which means they can be opened and read by a large number of programs. This makes them ideal for exporting and sharing data however, they do not support:

  • Multiple workbooks
  • Formulas and macros
  • Charts and graphs
  • Links to other data (including data in local and online files and data in other workbooks)
  • Formatting (colors, fonts, display, etc.)

The most common plain-text formats for tabular data are comma separated values (.csv) and tab separated values which are discussed below.

CSV: comma separated values (.csv)

By far the most common plain-text format to store and share tabular data. CSV uses commas (,) to designate cell contents and line breaks to separate rows. Below is an example data stored in CSV format:

field_name1,field_name2,field_name3

aaa,bbb,ccc

zzz,yyy,xxx

While CSV has been used for decades to exchange and convert data between programs it still has one major shortcoming: it does not differentiate between a comma used as a value and a comma used as a value separator. Some software, such as Excel, appears to work around this problem by guessing if a comma is a separator or part of a sentence inside of a cell. However, this method isn't fool-proof so CSV is not a recommend format for text-heavy tabular data. 

TSV: tab separated values (.tsv or .txt)

TSV is a less common format than CSV but it follows the same principles. Instead of commas TSV uses tabs to designate cell contents and line breaks to separate rows. Below is an example data stored in TSV format:

field_name1    field_name2    field_name3

aaa    bbb    ccc

zzz    yyy    xxx

Unlike the comma, it is rarer for data to contain a tab character. For this reason TSV can be a better option for text-heavy files. The extension .tsv is no longer widely used so you may encounter TSV files saved as text (.txt) files. When importing these files you may need to set TAB as the delimiter value.

Other formats

There are other plain-text formats that can be used to store tabular data. For example it is possible to store tabular data in a text file (.txt) using spaces as a separator. However, using this format requires being very careful that your cell contents don't contain extra spaces (something that's very easy to do with text or values that have been copied and pasted). We recommend CSV and TSV as they are the most well known and supported formats.

A note about encoding

File encoding is a technical setting that affects how your data is stored and read by a computer. If all goes well you may never need to check this setting but if you see a � [?] character in your data then you've likely hit an encoding problem.

ASCII

Supports basic operator symbols, numbers, and English letters. An old formatting with no frills but it's still the default for a number of  programs.

UTF-8

Supports pretty much every character you'll ever need and languages other than English. It's also backwards compatible with ASCII. Excel 365 (2019) files exported as CSV or TSV files are UTF-8 encoded but older versions of Excel may use ASCII by default.

Escape Coordinators

Megan O'Donnell
mno@iastate.edu
Data Services Librarian
ENT, EEOB, NREM, and Environment.

Kris Stacy-Bates
kksb@iastate.edu
Science and Technology Librarian
ABE, CCEE, Math, and Stats