Skip to Main Content

Cleaning Spreadsheet Data with OpenRefine

This guide accompanies the Galter Health Sciences Library class of the same name, or can be used on its own to learn the basic functions of OpenRefine. The class and guide are adapted from Library Carpentry OpenRefine, Copyright 2016-2019

Transforming Dates

In the project setup steps recommended in this guide for the sample dataset, it was recommended to direct OpenRefine not to automatically parse out dates and numbers. Therefore any columns that appear to contain dates or numbers will consider them as text. For the Treatment_Date field, we can do a simple transformation to change it from text format to date.

Common Transform: To Date

  • From the drop down arrow from the Treatment Date column, choose Edit cells, Common Transforms, then To date

Screenshot showing the menu path to date transformations in OpenRefine

  • When this transformation is applied, the date formatted as text changes to an ISO-formatted datestamp, complete with timestamp:

Screenshot of an ISO formatted date and timestamp in OpenRefine

Several different GREL formulas for additional transformations that can be done starting with data in the date format can be found here. In the example below we will transform this date back to a text format in a different style, while still maintaining the original date column.

Transform Date to Text in New Style

  • From the Date column drop down arrow, select Edit column, then Add column based on this column

Screenshot showing the menu path to add a column based on another column

  • Give the new column the name Formatted_Date
  • In the Expression box type: value.toString("dd MMMM yyyy")  Make sure the quotation marks are straight up-and-down, not curled.

Screenshot showing a date transformation GREL forumla in OpenRefine

  • Click OK. This will change the date back to text format, in a different style.