Transformations are a powerful aspect of OpenRefine. Some common transformations are pre-loaded into the system such as trimming leading and training whitespace, changing all text in a cell to Title case (capitalizing the first letter of all words), Upper case, or Lower case, or transforming the format of the values in cells to number, date, or text.
To demonstrate how the trimming whitespace transformation works, go to the Other_Diagnosis column in the sample dataset and, from the drop down arrow, choose Facet, then Text Facet. There should be 938 choices. Scroll down the list until LOW BACK PAIN is near the top,
Notice that there are two seemingly identical entries for LOW BACK PAIN. The only difference is that the one with only two occurrences has an extra space at the end. By running the Trim leading and trailing whitespace transformation outlined above, the extra space will automatically be deleted and the incorrect entries will be joined up with the others, to make a total of 120 instances of LOW BACK PAIN.
Although common transformations are built in to OpenRefine, you can write a wide variety of different transformations using GREL, or General Refine Expression Language. Transformations can range from simple to very complex. It is not necessary to memorize them as long as you have access to a sample of GREL 'recipe books.' Links can be found at the bottom of this page.
Below is the GREL recipe version of the Title case transformation (changing the text of a value so that each word starts with a capital letter)
In the Product_1 field of the sample dataset, we see product numbers with the prefix "NEISS" before every entry. To work with those numbers, we might want to strip away the prefix first.
As the preview box shows, this split transformation works by using the hyphen in the value as the separator on which to split, and strips away everything before that separator, plus the separator itself. It is not necessary to memorize such transformations - users can search for specific transformation recipes online, or look through some of the recipe collections linked below.
In the project setup steps recommended in this guide for the sample dataset, it was recommended to direct OpenRefine not to automatically parse out dates and numbers. Therefore any columns that appear to contain dates or numbers will consider them as text. For the Treatment_Date field, we can do a simple transformation to change it from text format to date.
Several different GREL formulas for additional transformations that can be done starting with data in the date format can be found here. In the example below we will transform this date back to a text format in a different style, while still maintaining the original date column.
If you would like to perform mathematical calculations or transformations directly in OpenRefine, these functions are available. A list of functions with their GREL recipes can be found at the official OpenRefine GitHub page's Math Functions list. A few of these functions and their formulas are outlined below.