GalterGuides: Essential Spreadsheet Data Cleaning with OpenRefine: Facet by Blank

Facet by Blank

After blanking down the duplicate CPSC_Case_Number values from the sample dataset, even if we take off the Duplicates facet, we know that within this table, the corresponding 8 records still have blank values for CPSC_Case_Number. Using OpenRefine's Facet by Blank feature, we can quickly identify the records that have blank values in the CPSC_Case_Number field and delete them.

Facet by Blank Steps

From the CPSC_Case_Number column drop-down choose Facet – Customized facets – Facet by blank (null or empty string)

In the resulting facet window on the left, you’ll see that there are 8 records that are “true” for having a blank value in the column CPSC_Case_Number. Choose “true” in this facet to isolate the 8 rows.
From the “All” column drop-down towards the left, choose Edit rows – Remove matching rows.

With the Facet by Blank still in place, you can see that OpenRefine just updated to show that now there are 0 rows that meet the criteria for being blank in the field CPSC_Case_Number. These rows have been removed, and our total number of rows has changed from 1511 to 1503.

Essential Spreadsheet Data Cleaning with OpenRefine

Facet by Blank

Facet by Blank Steps

Northwestern University Feinberg School of Medicine

Northwestern University
Feinberg School of Medicine