Cleaning Spreadsheet Data with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name, or can be used on its own to learn the basic functions of OpenRefine. The class and guide are adapted from Library Carpentry OpenRefine, Copyright 2016-2019

Undo/Redo and Exporting a Project

The Undo/Redo tab, next to the Facet/Filter tab in OpenRefine, is one of the tool's most useful features. Clicking Undo/Redo shows a numbered list of every change and transformation made to the dataset since the project was created. If a step in the data wrangling process should not have been done, click the step immediately above it in the list. The project reverts to the state it was in after the step you clicked, and every step below it is undone and shown greyed out. You can click a greyed-out step to redo it, but once you make a new change to the data, the greyed-out steps are discarded for good.

Just below Undo/Redo there are options to Extract and Apply. Clicking Extract shows JSON code representing the steps you took as you cleaned your spreadsheet. You can copy this code, save it in a plain text editor (like Notepad), and reuse it in another OpenRefine project by clicking Apply and pasting it in. This is very helpful if you have a collection of similar data files that all require the same cleaning steps. Because the steps refer to columns by name, the other files need to use the same column names for the steps to work.
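To give a sense of what the extracted JSON looks like, here is a minimal Python sketch that reads a saved history and lists its steps. The sample below is an illustrative assumption, not output from a real project: the column name "Title" and the single trimming step are hypothetical, and the exact fields OpenRefine writes can vary by version and by type of step.

```python
import json

# Hypothetical example of an extracted history with one step.
# The column name "Title" is an assumption made for illustration.
extracted = """
[
  {
    "op": "core/text-transform",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "Title",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column Title using expression value.trim()"
  }
]
"""


def summarize_history(json_text):
    """Return an (op, description) pair for each step in an extracted history."""
    steps = json.loads(json_text)
    if not isinstance(steps, list):
        raise ValueError("expected a JSON array of steps")
    return [(step.get("op", "?"), step.get("description", "")) for step in steps]


# Print a numbered list of the steps, similar to the Undo/Redo tab.
for number, (op, description) in enumerate(summarize_history(extracted), start=1):
    print(f"{number}. {op}: {description}")
```

Reading through a saved history this way is a quick check that a file contains the steps you expect before you paste it into Apply in another project.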

When you have finished making all data changes in your project, look for the Export button in the upper right-hand corner of OpenRefine’s main screen. There are several export formats to choose from, including comma-separated values, tab-separated values, and Excel. Remember that OpenRefine exports a new, cleaned data file; the original file you uploaded is not changed. The automatically generated file name will be similar to the original, so be sure to rename the export to show that it is the new, cleaned data file.
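If you export many files, the renaming can be scripted. The following is a minimal Python sketch, assuming you want to add "_cleaned" before the file extension; the file name "survey-data.csv" and both function names are hypothetical and not part of OpenRefine.

```python
from pathlib import Path


def cleaned_name(exported, suffix="_cleaned"):
    """Return a path in the same folder with the suffix added before the extension."""
    exported = Path(exported)
    return exported.with_name(f"{exported.stem}{suffix}{exported.suffix}")


def rename_export(exported, suffix="_cleaned"):
    """Rename an exported file, refusing to overwrite a file that already exists."""
    source = Path(exported)
    target = cleaned_name(source, suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    return source.rename(target)


# Show the new name for a hypothetical export without touching any files.
print(cleaned_name("survey-data.csv"))
```

Calling rename_export on a real exported file performs the rename. It stops with an error if a file with the new name already exists, so an earlier cleaned copy is not overwritten by accident.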