Zero (0) is a number. It is a measurement that shows that something "is not," did not occur, or occurred below a measurable level. A zero is not the same as a missing (was not observed) or invalid value. The only way zero should be encoded is with the '0' character as this ensures that it will be processed as a number.
Unlike zero, null values are not an absolute or something quantifiable; they're an unknown. As such they should not be stored as numbers (i.e. 0, -999, 999, etc.). The table below provides information on the best and worst ways to encode null values in a dataset.
Null Values | Problems | Compatibility | Recommendation |
---|---|---|---|
0 | Indistinguishable from a true zero | NEVER use | |
Blank | Hard to distinguish values that are missing from those overlooked on entry. Hard to distinguish blanks from spaces, which behave differently. | R, Python, SQL, Excel | Best option |
-999, 999 | Not recognized as null by many programs without user input. Can be inadvertently entered into calculations. | Avoid | |
NA, na | Can also be an abbreviation (e.g., North America), can cause problems with data type (turn a numerical column into a text column). NA is more commonly recognized than na. | R | Good option |
N/A | An alternate form of NA, but often not compatible with software [due to the '/'] | Avoid | |
NULL | Can cause problems with data type [i.e. text in numerical field]. | SQL | Good option |
None | Uncommon. Can cause problems with data type [i.e. text in numerical field]. | Python | Avoid |
No data | Uncommon. Can cause problems with data type, contains a space. | Avoid | |
Missing | Uncommon. Can cause problems with data type. | Avoid | |
-, +, . | Uncommon. Can cause problems with data type. | Avoid |
Megan O'Donnell
mno@iastate.edu
Data Services Librarian
ENT, EEOB, NREM, and Environment.
Kris Stacy-Bates
kksb@iastate.edu
Science and Technology Librarian
ABE, CCEE, Math, and Stats
Katie Wampole
kwampole@iastate.edu
Research Data Curator