
Essential Spreadsheet Data Cleaning with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name. It can also be used on its own to learn a few essential data cleaning functions of the open source application OpenRefine.

Removing Duplicate and Blank Rows: A Three-Part Exercise

Duplicate or blank rows of data can cause errors when a spreadsheet is fed into an analysis program. The next three sections show how to use OpenRefine to identify duplicate rows, identify blank rows, and facet by blank in order to remove the blank rows.
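To make the goal concrete before opening OpenRefine, here is a minimal Python sketch of what "duplicate" and "blank" rows look like in a small spreadsheet and what the cleaned result should be. It is an illustration only, not part of the OpenRefine workflow covered in this guide: the sample data, column names, and the helper functions `is_blank` and `clean_rows` are all hypothetical, and a duplicate is assumed here to mean a row whose cells exactly match an earlier row.

```python
import csv
import io

# Hypothetical sample data: one exact duplicate row and one blank row.
SAMPLE_CSV = """id,name,dose_mg
1,aspirin,81
2,ibuprofen,200
1,aspirin,81
,,
3,naproxen,220
"""


def is_blank(row):
    """Return True when every cell is empty or contains only whitespace."""
    return all(cell.strip() == "" for cell in row)


def clean_rows(rows):
    """Drop blank rows and keep only the first copy of each duplicated row."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row)  # assumption: duplicates are exact cell-for-cell matches
        if is_blank(row) or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


reader = csv.reader(io.StringIO(SAMPLE_CSV))
header = next(reader)   # first line holds the column names
data = list(reader)     # remaining lines are the data rows
cleaned = clean_rows(data)

print(len(data), "rows in,", len(cleaned), "rows out")  # prints "5 rows in, 3 rows out"
```

The sections that follow reach the same result interactively in OpenRefine, where you can inspect the flagged rows before removing them.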