
Essential Spreadsheet Data Cleaning with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name. It can also be used on its own to learn a few essential data cleaning functions of the open-source application OpenRefine.

Facet on Multiple Columns

For most data cleaning tasks, faceting on one column at a time works well for isolating discrete fields of data. However, OpenRefine also supports faceting on multiple columns at once, and the facets can be of different types, such as Text and Numeric. The image below shows simultaneous facets on the number-based Sex column and the text-based Treatment_Date and Other_Diagnosis columns from the sample dataset.

Screenshot showing Facets on three different columns in OpenRefine

When you “include” (narrow in on) certain entries within one facet, the other facets update to show only the values found in the rows that match your selection. This essentially takes a slice of your data, one variable at a time.
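
The narrowing behavior described above can be approximated outside OpenRefine with a short Python sketch. This is not OpenRefine's own code. The row values are invented for illustration, the helper names `matches` and `facet_counts` are hypothetical, and the sketch assumes that a facet's own selection does not hide that facet's other choices, so only the other facets narrow.

```python
from collections import Counter

# Hypothetical rows. The column names follow the guide's sample dataset,
# but the values are invented for illustration.
rows = [
    {"Sex": 1, "Treatment_Date": "2019-01-04", "Other_Diagnosis": "Asthma"},
    {"Sex": 2, "Treatment_Date": "2019-01-04", "Other_Diagnosis": "Diabetes"},
    {"Sex": 1, "Treatment_Date": "2019-02-11", "Other_Diagnosis": "Asthma"},
    {"Sex": 2, "Treatment_Date": "2019-02-11", "Other_Diagnosis": "Asthma"},
    {"Sex": 1, "Treatment_Date": "2019-02-11", "Other_Diagnosis": "Diabetes"},
]


def matches(row, selections, skip=None):
    """Return True if the row satisfies every selection, ignoring column `skip`."""
    return all(
        row[col] in chosen
        for col, chosen in selections.items()
        if col != skip and chosen
    )


def facet_counts(rows, columns, selections):
    """Count values per facet column, narrowed by the selections in the OTHER facets."""
    return {
        col: Counter(r[col] for r in rows if matches(r, selections, skip=col))
        for col in columns
    }


columns = ["Sex", "Treatment_Date", "Other_Diagnosis"]

# No selections yet: every facet counts all five rows.
before = facet_counts(rows, columns, {})

# "Include" Sex = 1: the other two facets narrow to the matching rows.
selections = {"Sex": {1}}
after = facet_counts(rows, columns, selections)
visible = [r for r in rows if matches(r, selections)]

print(before["Other_Diagnosis"])
print(after["Other_Diagnosis"])
```

With these invented rows, including Sex = 1 leaves three visible rows, and the Other_Diagnosis facet drops from 3 Asthma and 2 Diabetes to 2 Asthma and 1 Diabetes, while the Sex facet itself still lists both of its values.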