Skip to Main Content

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Controlling Terms: Cell Validation

With the cell validation feature you can set up your cells to only accept certain values, even from the beginning of data collection. Much like a validated field in REDCap, this prevents any terms other than those on the list from being entered as data values. For this example we'll work with the Other Diagnosis column in the NEISS small subset spreadsheet.

Setting up Cell Validation Instructions

  • Create a new column titled "Diagnosis Controlled Vocab" next to Other_Diagnosis. Fill it with a small sample of diagnoses
  • Select the cells you wish to validate, in this case all the cells in Other_Diagnosis (M2:M4531)
  • From the Data Tab choose Data Validation. In the resulting list of choices, choose Data Validation

Screen Clipping showing the Data Validation setup dialog box, accessible from the Data Validation option on the Data tab on the Excel ribbon. The source field is circled to show the format for entering the parameters of the source cells.

  • Under Validation Criteria you have many choices of the kinds of values you can allow such as numbers formatted in certain ways, dates, etc. Choose 'List.'
  • For the Source, click in the white box, then in the first cell of the values in the 'Diagnosis Controlled Vocab' column, type : , delete anything that auto-fills, and replace it with $(Letter)$(Number) for the last cell in the 'Diagnosis Controlled Vocab' column. The result will be something like $N$2:$N$8. Click OK.
  • Once setup is done, go back to the Other_Diagnosis column, choose a field, and try to enter a data value that is not on the controlled list

Screen Clipping showing the drop-down menu that can be accessed in the data-validated column after validation has been set up. A red arrow points to this menu for emphasis.

  • Notice that a drop-down list of your entire controlled vocabulary appears. If you try to enter a term that is not on the list, you should see this error:

Screen Clipping showing the error message that the user receives when they try to enter a value in a cell in a data-validated column that is not present in the source list. The incorrect value is circled for emphasis.