Skip to main content
It looks like you're using Internet Explorer 11 or older. This website works best with modern browsers such as the latest versions of Chrome, Firefox, Safari, and Edge. If you continue with this browser, you may see unexpected results.

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Controlling Terms: Cell Validation and Custom Lists

With the cell validation feature you can set up your cells to only accept certain values, even from the beginning of data collection. Much like a validated field in REDCap, this prevents any terms other than those on the list from being entered as data values. For this example we'll work with the Other Diagnosis column in the NEISS small subset spreadsheet.

Setting up Cell Validation Instructions

  • Create a new column titled "Diagnosis Controlled Vocab" next to Other_Diagnosis. Fill it with a small sample of diagnoses
  • Select the cells you wish to validate, in this case all the cells in Other_Diagnosis (M2:M4531)
  • From the Data Tab choose Data Validation. In the resulting list of choices, choose Data Validation

  • Under Validation Criteria you have many choices of the kinds of values you can allow such as numbers formatted in certain ways, dates, etc. Choose 'List.'
  • For the Source, click in the white box, then in the first cell of the values in the 'Diagnosis Controlled Vocab' column, type : , delete anything that auto-fills, and replace it with $(Letter)$(Number) for the last cell in the 'Diagnosis Controlled Vocab' column. The result will be something like $N$2:$N$8. Click OK.
  • Once setup is done, go back to the Other_Diagnosis column, choose a field, and try to enter a data value that is not on the controlled list

  • Notice that a drop-down list of your entire controlled vocabulary appears. If you try to enter a term that is not on the list, you should see this error:

Custom lists are used to make sort orders appear in Excel in a specified way. For instance, if you had a column where the data values were the spelled-out months of the year and sorted it, the order would be alphabetical (April, August, etc.) rather than the order in which the months occur in the year. Here we'll demonstrate how to apply a custom list while also doing more data wrangling on the Months column from the Splitting Multi-Valued Cells exercise. If you completed that exercise with the NEISS small subset dataset, you should have a column called Month with the months represented by their numbers.

Steps to Apply a Custom Sort to the Month column

  • Insert a new column next to Month called 'Month Spelled Out'
  • In the first cell (row 2) enter the formula =TEXT(E2*29,"mmmm") This converts the number representing the month to the spelled out month name
  • Double click in the lower right corner to copy this formula down the spreadsheet

  • Next, in the Ribbon go to File, then Options, then Advanced, then General, then click the 'Edit Custom Lists' button. This is simply to demonstrate how to find Excel's built-in collection of custom lists. A list already exists for the months of the year spelled out. Notice that there is an option to create a new list, which will be saved in the system for future use. Click OK.

  • Back in the worksheet, go to the Data Tab on the Ribbon, then Sort. In the dialog box that pops up, check that your data has headers, for Sort By choose the Months Spelled Out column, and in the order option choose Custom List. Choose the months option

  • The result will be that your spreadsheet's data will now sort in the order in which the months occur in the year.