Skip to Main Content

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Histograms for Quick Visualizations

If you'd like to get a sense of the distribution of your data with a simple histogram, you can do so through the Data Analysis Toolpak. To begin, choose a column of numeric data that you'd like to graph, and create bins in regular increments that cover the span of the data. In the example, I worked with a dataset that listed patient weights in kg, with no measurement exceeding 100. I created a column with ten bins, each in an increment of ten and ending up in '100.'

On the top ribbon, go to Data Tab, then Data Analysis, then Histogram:

Screen Clipping showing how to access the Histogram option under the Data Analysis options, which is reached from the Data Analysis icon on the Data Tab of the Excel ribbon.

On the next screen, enter both the column range and the bin range, using a $ before each column letter and row number. This makes the formula work on the entire data range selected, including anything hidden. As an alternative to typing your ranges manually, you can place the cursor in the blank box for each range and then select the corresponding cells for the range on the worksheet. Be careful of unintended auto-fills.

Screen Clipping showing the Histogram parameters available when users choose to analyze an Excel column using a Histogram.

Your resulting histogram should bring to light outliers in the data or surprises, which may be a sign of data-entry errors.

Screen Clipping showing the histogram bar chart output of the weights of patients from the dataset, in ten pound bins from zero to one hundred kilograms.