
Essential Spreadsheet Data Cleaning with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name, or can be used on its own to learn a few essential data cleaning functions of the open source application OpenRefine.

Scatterplot Facets

The scatterplot facet is a quick way to see how the values in two numeric columns relate to each other. We can try this facet on the Age and Weight columns of the sample dataset. First, from the “All” drop-down arrow at the far left, choose “Edit Columns – Reorder/Remove Columns.” Weight appears at the bottom of this list; drag it close to the top, so it appears just after Age. Then, from the drop-down arrows on both the Age and Weight columns, choose “Edit cells – Common transforms – To number,” so that OpenRefine treats the values in both columns as numbers rather than text.

Screenshot showing the menu path to convert a column to number format
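
For readers who want to see why the “To number” step matters, here is a minimal Python sketch of the idea. It is an illustration only, not OpenRefine's own code: the function name to_number and the sample rows are hypothetical and are not taken from the class dataset. Cells that hold digits as text are converted to numbers, and anything that cannot be parsed is left unchanged, so it cannot be plotted as a point.

```python
def to_number(cell):
    """Return the cell as an int or float if it parses as one; otherwise return it unchanged."""
    if isinstance(cell, (int, float)):
        return cell  # already numeric, nothing to do
    text = str(cell).strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return cell  # non-numeric text stays as it was


# Hypothetical sample rows (assumption: not the real class dataset)
rows = [
    {"Age": "34", "Weight": "150.5"},
    {"Age": " 52 ", "Weight": "180"},
    {"Age": "unknown", "Weight": "165"},
]

converted = [{col: to_number(val) for col, val in row.items()} for row in rows]

# Only rows where both Age and Weight are numeric can become scatterplot points
points = [
    (row["Age"], row["Weight"])
    for row in converted
    if all(isinstance(row[col], (int, float)) for col in ("Age", "Weight"))
]
print(points)
```

In this sketch the row with the Age value "unknown" drops out of the list of points, which mirrors why both columns need to be converted before a scatterplot of them is meaningful.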

From the Age column drop-down arrow, choose “Facet – Scatterplot facet.” OpenRefine will automatically generate the scatterplot using the numeric column to the right of the column we selected, which is Weight.


Screenshot showing the menu path to create a Scatterplot facet in OpenRefine

Screenshot showing the results of a Scatterplot facet in OpenRefine