The previous exercise, Faceting, showed how to isolate individual data values in order to examine and clean them. Although faceting is an extremely useful feature of OpenRefine, cleaning individual data points this way can be time-consuming in large datasets. This is where OpenRefine's Clustering feature can be extremely helpful.
To begin, find the Other_Diagnosis column in the sample dataset. Click the drop-down arrow at the top of the column, choose Edit cells, then Cluster and edit.
On the results screen, OpenRefine has automatically grouped the values that appear most closely related into clusters. Here you can quickly review each grouping and choose the term that should be used according to your data standards or conventions. For each cluster you want to merge, tick the Merge checkbox next to it; when you have made all your selections, click 'Merge Selected & Re-cluster' at the bottom of the screen to apply them and search for any additional matches.
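To give a feel for how values can end up in the same cluster, the sketch below approximates a "fingerprint" key-collision approach in Python: each value is reduced to a normalized key, and values that share a key are grouped. This is a simplified illustration, not OpenRefine's actual code, and it assumes the dialog is using a fingerprint-style method. The function names and the sample diagnosis values are hypothetical and are not taken from the sample dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key (simplified; not OpenRefine's code)."""
    # Trim surrounding whitespace and lowercase the text.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into words, drop duplicates, sort, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only keys matched by more than one distinct value form a cluster.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical values, for illustration only.
sample = [
    "Type 2 Diabetes",
    "type 2 diabetes ",
    "Diabetes, Type 2",
    "Asthma",
    "asthma.",
    "Hypertension",
]

clusters = cluster(sample)
for group in clusters:
    print(group)
```

In this sketch the three diabetes variants share one key and the two asthma variants share another, while "Hypertension" has no match and so does not appear in any cluster. Choosing which spelling to keep is still a human decision, which is what the Merge checkbox in the results screen is for.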