Essential Spreadsheet Data Cleaning with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name. It can also be used on its own to learn a few essential data cleaning functions of the open-source application OpenRefine.

Extensions in OpenRefine

Installing extensions is an important first step toward using some of the online services that OpenRefine can interact with. A list of currently supported extensions is available on the OpenRefine extensions page. In the exercise outlined over the next three sections, we'll install the RDF extension. RDF stands for “Resource Description Framework,” a framework for representing information on the web. With this extension in place, we can query APIs and SPARQL endpoints, which lets us bring information from outside websites into our OpenRefine projects.
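To make the idea of querying a SPARQL endpoint more concrete, here is a minimal Python sketch of what such a request can look like outside of OpenRefine. It is not part of the RDF extension or of the exercise. The Wikidata endpoint URL, the function name `build_sparql_url`, and the example label are assumptions chosen purely for illustration. The sketch only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public Wikidata endpoint, used here only as an example.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_sparql_url(label: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL asking the endpoint for items with this English label.

    Hypothetical helper for illustration; it builds the URL without sending it.
    """
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@en . '
        "} LIMIT 5"
    )
    # The query travels as a URL parameter; "format" requests JSON results.
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_sparql_url("Chicago")
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, would return matching items as JSON. Matching a column of values against an outside data source in this way is the kind of lookup an extension can automate.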