GalterGuides: Cleaning Spreadsheet Data with OpenRefine: Faceting Data

Faceting Data in OpenRefine

Faceting is one of the most powerful features of OpenRefine. Faceting counts the unique values in every column and displays them in a box on the left side of the screen. This helps users quickly find inconsistencies in the data.In this example we'll facet the Other_Diagnosis column from the sample dataset. It can help to first go to the All column, then choose Edit Cells - Reorder/remove columns, and drag the Other_Diagnosis column towards the top. This brings the column farther left on the screen.

Faceting Steps

From the Other_Diagnosis column's drown down arrow, choose Facet, then Text facet

Notice that the column name Other_Diagnosis and all its values have populated in the Facet area on the left side of the screen.

The Faceted view of this data offers various options for data cleaning. Firstly, the terms can be arranged either alphabetically or in the order of how frequently they occur in the column.
Clicking on any term will select it and allow you to work with only the rows in which that term appears in the Other_Diagnosis column.
After selecting terms, you can edit them either in their individual cells, or through the small blue edit link in the faceting area. If you fix a mis-spelling, be aware that the mis-spelled instances of terms will then disappear, as they join up with other instances that were spelled correctly.

Cleaning Spreadsheet Data with OpenRefine

Faceting Data in OpenRefine

Faceting Steps

Northwestern University Feinberg School of Medicine

Northwestern University
Feinberg School of Medicine