Skip to Main Content

Essential Spreadsheet Data Cleaning with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name, or can be used on its own to learn a few essential data cleaning functions of the open source application OpenRefine.

Numeric Facets

Like the scatterplot facet, the numeric facet provides a quick visual representation of numeric values in a column, by counting the values' frequencies of occurrence and displaying them in a small histogram.

To begin, from the Age column drop down arrow in the sample dataset choose Facet – Numeric facet.

Screenshot showing the menu path to create a Numeric facet in OpenRefine

Much like the Timeline facet, this will result in a graph with sliders in the Facet/Filter area on the left of the screen. Use the sliders to narrow in on a subset of the dataset’s ages. However, there’s something strange in this example – the patients’ ages go up to 230! This happened because in the NEISS coding manual, coders were instructed to put a 2 in the first space of the age code whenever a patient’s age was expressed in months. (Ages from 0 – 4 weeks were grouped in the 1-month-old category). In the next exercise we'll transform these values to display properly as a fraction of a year.

Screenshot showing the results screen from a Numeric facet in OpenRefine