Skip to Main Content

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Leverage Built-ins and Add-ons

Excel features many keyboard shortcuts that will help you save time while navigating spreadsheets. These include:

Moving around quickly in a worksheet

  • Ctrl (Cmd for Mac) + right arrow: move right
  • Ctrl (Cmd for Mac) + left arrow: move left
  • Ctrl (Cmd for Mac) + down arrow: move to bottom of a column
  • Ctrl (Cmd for Mac) + up arrow: move to top of a column
  • Ctrl (Cmd for Mac) + f (Find): Brings up a dialog box to type the string of characters you'd like to find

Screen clipping illustrating the results of finding the word "asthma" using Control F to find character strings in Excel

Copying/Pasting

  • Ctrl (Cmd for Mac) + c: Copy
  • Ctrl (Cmd for Mac) + v: Paste
  • Ctrl (Cmd for Mac) + Alt + v: Paste special. Choose specific conditions for the paste operation.
  • Ctrl (Cmd for Mac) + z: Undo

Quick Access to Menus

  • Ctrl (Cmd for Mac) + 1: Bring up the formatting menu
  • Alt + Shift + Tab: Brings up the quick menu to the other programs you have open

Much like the tabs of this guide, your options in accessing the features of Excel are arranged in tabs along the top ribbon. Choose the data tab and look at the section all the way on the right to see if you have the Data Analysis Toolpak installed.

Screenshot of the Data Analysis button on the Data Tab of the Excel ribbon

If you don't see the Data Analysis link, go to the File tab, choose Options (the last choice in the left-hand margin), then choose Add-ins (the second-to-last choice in the left hand margin). Among the Add-ins you should see one called Analysis Toolpak or Analysis Toolpak VBA. Choose the latter if you use Visual Basic; otherwise, choose Analysis Toolpak.

Screen Clipping of the Excel Options screen, showing the Analysis Toolpak option under Add-ins. The Excel Options screen is accessible from the File Tab.

If you don't see the Analysis Toolpak as an Add-in option, you may need to contact FSMIT to help you install this feature. Additional guidance can be found on this Microsoft Support page.

Summary statistics as generated by Excel are useful for a quick preliminary analysis. In addition they can be used to check for errant data values.

Generate summary statistics for a number-formatted column

  • In the Data Tab, choose Data Analysis, then Descriptive Statistics
  • Enter the input range
  • Check the summary statistics box

Screen Clipping showing the Descriptive Statistics dialog box that is accessible through Excel via the Data Analysis button on the ribbon's Data Tab, under Descriptive Statistics

Your statistics will open in a new worksheet by default. You have the option of having them populate to a new workbook.

Screen Clipping showing the output of Summary Statistics taken over a single column of data.

A mean or median that is far different than expected could indicate errors in data entry in the selected column.

If you'd like to get a sense of the distribution of your data with a simple histogram, you can do so through the Data Analysis Toolpak. To begin, choose a column of numeric data that you'd like to graph, and create bins in regular increments that cover the span of the data. In the example, I worked with a dataset that listed patient weights in kg, with no measurement exceeding 100. I created a column with ten bins, each in an increment of ten and ending up in '100.'

On the top ribbon, go to Data Tab, then Data Analysis, then Histogram:

Screen Clipping showing how to access the Histogram option under the Data Analysis options, which is reached from the Data Analysis icon on the Data Tab of the Excel ribbon.

On the next screen, enter both the column range and the bin range, using a $ before each column letter and row number. This makes the formula work on the entire data range selected, including anything hidden. As an alternative to typing your ranges manually, you can place the cursor in the blank box for each range and then select the corresponding cells for the range on the worksheet. Be careful of unintended auto-fills.

Screen Clipping showing the Histogram parameters available when users choose to analyze an Excel column using a Histogram.

Your resulting histogram should bring to light outliers in the data or surprises, which may be a sign of data-entry errors.

Screen Clipping showing the histogram bar chart output of the weights of patients from the dataset, in ten pound bins from zero to one hundred kilograms.