The example scenario in this section of the guide is drawn from the class 10 Simple Rules for Data Cleaning with Excel, and utilizes a small subset of the National Electronic Injury Surveillance System dataset from 2022. This dataset compiles information from patient visits for emergency treatment for injuries involving commercial products. In the dataset's "Narrative" column, the patient's experience is recorded. In this column, the shorthand "C/O" is used in place of the words "complaining of". Let's imagine for this scenario that we would like to replace each instance of "C/O" with "complaining of" for greater clarity.
As a first step in this multi-part cleaning task, we'll find out how many rows in the spreadsheet (the small sample version used in the class) use the shorthand "C/O".
In our COUNTIF exercise on a small subset of the NEISS data, the total numbers of rows that use the "C/O" shorthand in the Narrative field is 634. In the next exercise, we'll use advanced filters to filter for only the rows that use this shorthand, so we can carry out data cleaning just on these rows.
References:
1. US Consumer Product Safety Commission. NEISS Highlights, Data and Query Builder. Available at: https://www.cpsc.gov/cgibin/NEISSQuery/home.aspx
From what we learned in the COUNTIF exercise, we know we must filter for the rows containing "C/O" in the Narrative column of our dataset in order to clean/update this shorthand to read "complaining of". For this task we can use advanced filters.
The result of this advanced filtering is that we have filtered for all the rows containing "C/O". These rows are now ready to be cleaned.
Note: The above is a fairly simple example of applying Advanced Filter criteria. For instructions on applying more complicated criteria, see this example from Microsoft Support.
Building on work from the COUNTIF and Advanced Filter exercises, we can now use Find & Replace to replace the shorthand "C/O" with "complaining of" in the Narrative column. It is best to complete the Advanced Filter exercise first and leave the filter on; this has selected for us all the rows which use the "C/O" shorthand.