Skip to Main Content

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Cleaning Scenario Using COUNTIF, Advanced Filters, and Find & Replace

The example scenario in this section of the guide is drawn from the class 10 Simple Rules for Data Cleaning with Excel, and utilizes a small subset of the National Electronic Injury Surveillance System dataset from 2022. This dataset compiles information from patient visits for emergency treatment for injuries involving commercial products. In the dataset's "Narrative" column, the patient's experience is recorded. In this column, the shorthand "C/O" is used in place of the words "complaining of". Let's imagine for this scenario that we would like to replace each instance of "C/O" with "complaining of" for greater clarity.

As a first step in this multi-part cleaning task, we'll find out how many rows in the spreadsheet (the small sample version used in the class) use the shorthand "C/O".

COUNTIF Instructions

  • Select a column (Narrative_1 in this example), then choose an empty cell beneath it
  • From the Formulas tab, choose "Insert Function" and search for "COUNTIF"
  • Enter a range of cells over which to count (this can also by done by clicking first in the Range box and then in the corresponding cells in the spreadsheet, but be careful of unintended autofills)
  • Enter the criteria that you would like Excel to count over (Excel will count a cell if the criteria is true). Use the format "*text*" to search for a string of characters no matter where it occurs in a cell. In our example, the criteria is "*c/o*"
  • A total for your count appears in the cell where you placed the formula

Screen image of application of the COUNTIF function in Excel

In our COUNTIF exercise on a small subset of the NEISS data, the total numbers of rows that use the "C/O" shorthand in the Narrative field is 634. In the next exercise, we'll use advanced filters to filter for only the rows that use this shorthand, so we can carry out data cleaning just on these rows.


References:

1. US Consumer Product Safety Commission. NEISS Highlights, Data and Query Builder. Available at: https://www.cpsc.gov/cgibin/NEISSQuery/home.aspx

From what we learned in the COUNTIF exercise, we know we must filter for the rows containing "C/O" in the Narrative column of our dataset in order to clean/update this shorthand to read "complaining of". For this task we can use advanced filters.

Screen clipping showing how to reach the custom filter option in Excel. The user first chooses Filter from the Data Tab, then from the filter icon in their chosen column they choose Text Filters, then Custom Filter.

 

Steps to create a custom text filter:

  • From the Data Tab, choose Filter
  • Choose your column on which to filter (Narrative) and click the drop-down arrow. Then choose Text Filters (just above the Search Bar)
  • Choose Custom Filter (the last choice in the menu). In Mac, Custom Filter choices are available in the "Choose one" menu once Filter is clicked.
  • The dialog box that pops up offers specific criteria from which you can choose to build your custom filter, such as whether the cells in the column contain, begin with, or end with a certain string of text. The Advanced Filter option assumes that you're searching for a string of text, so the "*text*" format is not necessary
  • For our example, choose "contains" from the criteria choices in each of the two criteria boxes on the left (Windows). In the blank text boxes enter C/O in one and c/o if the other. Join these two criteria with "and". (These joining options and criteria choices also appear in Mac as text is entered).

Screen Clipping showing the Custom AutoFilter dialog box, which appears when the Custom Filter option is chosen. Drop-down boxes towards the left of the dialog box allow the user to choose specific parameters for their custom filter.

The result of this advanced filtering is that we have filtered for all the rows containing "C/O". These rows are now ready to be cleaned.

Note: The above is a fairly simple example of applying Advanced Filter criteria. For instructions on applying more complicated criteria, see this example from Microsoft Support.

Building on work from the COUNTIF and Advanced Filter exercises, we can now use Find & Replace to replace the shorthand "C/O" with "complaining of" in the Narrative column. It is best to complete the Advanced Filter exercise first and leave the filter on; this has selected for us all the rows which use the "C/O" shorthand

Find & Replace instructions:

  • From the Home tab on the Ribbon, choose Find & Select (this feature is at the end of the Ribbon, with a magnifying glass icon)
  • Find a string (e.g., C/O) and replace it with the desired text (complaining of). Click Replace.
  • Using the Filter arrow still at the top of the Narrative_1 column, go back to Text Filters, then Custom Filter (or in Mac, Filters - Contains - C/O). The filter criteria you entered before should still be present. Click OK, and this filter will run again, taking out the rows that now meet the criteria due to the Find & Replace change
  • Repeat the above steps as many times as necessary to replace "C/O" in each row where it appears with "complaining of"
  • Alternatively, rather than replacing the "C/O" entries one at a time, click Replace All if you are sure that you want to replace all instances of the text string "C/O" anywhere it appears in any fields in the Narrative column. (This can be done if "C/O" was never used to denote anything other than "complaining of")

Screen image illustrating the use of the Find and Replace tool in Excel