- Library
- Library Guides
- Special topic
- Microsoft Tips and Codes
- Sorting an alpha numeric string

Codes, tips and tricks found over years of experience using Office.

This formula uses CONCATENATE(), TEXT(), MID(), and FIND()

A frustration while sorting in Excel is an alpha-numeric string of characters, like a call number or standard number. Any alpha numeric sort places the alpha in order, and then numbers in order by their first numeral, so 1300 will come before 140 which does not work well for lists like call numbers in libraries.

The following formula will grab text and numbers from a string and concatenate them into a sortable string. I used this to sort a file of ASTM standards that had changes from what we had in print. When it was printed for users, the standards were in the order they would be in the file.

To understand how this works, here is a sample from a recent list of records we tried to sort in Excel:

ASTM D6615-2010 |

ASTM D6783-2005 (R2011) |

ASTM D779-2003 |

ASTM D918-1993 |

ASTM D982-2006 (R2009) |

ASTM E1323-2009 |

ASTM E141-2010 |

ASTM E2232-2010 |

Notice how the numbers in the thousands are placed before the numbers in the hundreds in a standard Excel sort. I am going to grab the string in the middle of the standard number and format it so the string is the same length for all standards, and then sort by that string instead of the one shown above.

Here is the formula we used if the first item to sort was in cell A2:

=CONCATENATE(MID(A2,FIND(" ",A2,2)+1,1),TEXT(MID(A2,7,FIND("-",A2)-7),"00000"))

Let's break it apart!

First we want to find the letter of the string we want to use to sort and isolate it from the numbers. For ASTM D6615-2010, we want to isolate the "D." The MID command is used to extract information from the middle of the string.

MID(text,FIND()+1,num_chars)

The **text** part of the formula indicates the cell being read, which is in this case A2.

The **FIND **command is used to extract the **start_num** of the string we want. By finding the space (" ") in cell A2, and adding 1, we get a number representing the first character after the space in our string:

FIND(" ",A2)+1

Since we only want one letter, we can use "1" for **num_chars**, so here is the part that grabs the "D":

MID(A2,FIND(" ",A2)+1,1) = D

The next part gathers the numbers after "D" (minus the date) and formats them into a 5-digit text string (some existing standards have 5-digit numbers in them, so this is designed to handle them as well). First we will find the numbers using MID and FIND:

MID(A2,7,FIND("-",A2)-7

Since we have a set number of characters to find the first numeral*, we can use "7" as the **start_num** in the MID() statement. We then use the FIND() to count the number of characters from the beginning of the string and then subtract the start_num from the total to get our number of characters to count. So in essence the formulais "in ASTM D6615-2010, find the 7th character (6) and then count all characters from the beginning of the string to the dash (11), and subtract the 7 characters at the beginning (4) = 6615"

To make that a 5-digit number, we use TEXT to format it:

TEXT(MID(A2,7,FIND("-",A2)-7),"00000") = TEXT(6615,"0000")

CONCATENATE the two formulas into one, and the sort is D06615. So our list above then looks like this:

docNum | Sort |

ASTM D6615-2010 | D06615 |

ASTM D6783-2005 (R2011) | D06783 |

ASTM D779-2003 | D00779 |

ASTM D918-1993 | D00918 |

ASTM D982-2006 (R2009) | D00982 |

ASTM E1323-2009 | E01323 |

ASTM E141-2010 | E00141 |

ASTM E2232-2010 | E02232 |

And when we sort it by the new sort field:

docNum | Sort |

ASTM D779-2003 | D00779 |

ASTM D918-1993 | D00918 |

ASTM D982-2006 (R2009) | D00982 |

ASTM D6615-2010 | D06615 |

ASTM D6783-2005 (R2011) | D06783 |

ASTM E141-2010 | E00141 |

ASTM E1323-2009 | E01323 |

ASTM E2232-2010 | E02232 |

**NOTE: This is a simple example that only needs to find one alpha character. More complex formulas with multiple alpha characters would require extensive programming beyond this.*