Skip to Main Content

Data Cleaning with Excel

Through this guide you will learn tricks and tips to better clean and organize tabular data using Microsoft Excel.

Splitting Multi-Valued Cells

A 'value' in a spreadsheet file is simply the contents of any given cell. A frequent need in data cleaning is to take a group of cells in which multiple values, or types of content, have been added, and break them out into new columns for each type of data. We'll demonstrate how to do this using a date field in the NEISS small subset spreadsheet. A full date consists of three combined values, day, month, and year, which could all be broken out.

Screen Clipping showing the end result of the cell-splitting operation described on this page. After the operation, new cells created to hold a date field's split-up data now hold the day, month, and year data in separate columns.

Step 1: Convert a Date-formatted Cell to Text

  • Treatment Date is an existing, date-formatted column in the NEISS small subset dataset. Next to it create a new column called 'Reformatted Date.' In this example, Treatment Date is column B and Reformatted Date is column C.
  • In the second cell of this new column (directly under the header) type the formula: =TEXT(B2, "dd mm yyyy")
  • A new, text-formatted version of the date has populated in cell C2. Notice that month and day have changed places, so now day is first.
  • Double-click the plus sign at the bottom of cell C2 to copy this formula down the entire spreadsheet
  • If you click into any of the cells in Column C, you'll see in the formula bar that Excel still thinks of the values of column C's cells as formulas. To replace the formulas with the values themselves, click on C2, then do Ctrl (Cmd in Mac) + Shift + down arrow. This selects all the values in column C. Click Ctrl (or Cmd) + c to copy all the values, then right click on C2, scroll down to Paste Special, and choose the Values option.
  • Scroll back over some of column C's cells, and check the formula bar. They should now all appear as values, not formulas.

An important step in prepping the dates for splitting is to copy and paste the values in the Reformatted Date column over themselves, thus overriding the formula. This screen clipping illustrates how to apply "Paste Special" in Excel.

The Text to Columns function will take text in one column and split it up into multiple columns. Using the Reformatted Date column from the previous exercise, follow the steps listed here to put each data element in its own column.

Step 2: Text to Columns instructions:

  • Create three new columns next to Reformatted Date and title them Day, Month and Year (columns D - F)
  • Click on C2, then do Ctrl (Cmd in Mac) + Shift + down arrow to select all the text date values
  • Click on the Data Tab, then Text to Columns (near the Center of the Ribbon)
  • The dialog box that pops up is a wizard that will walk you through some selections for creating your new columns. In the first screen, choose Delimited, then check the box by 'Space' (uncheck the others). Our values are delimited, or separated, by a space
  • For each of the three new columns you can see in the Data Preview window, change the column data format from General to Text
  • In Destination, click into the white box, then into the first field of the destination columns where this newly separated data will populate. This should be $D$2, the first field in the Day column. Then type : , delete anything that auto-fills, then type $F$4531, which is the last cell in the Year column. 
  • Click Finish
  • The three new columns you created in your spreadsheet should now have filled with the Day, Month, and Year respectively from the original dates.
  • As a final step, if you'd like to work with your newly separated dates as numbers, highlight cells D, E, and F from cell 2 to the bottom of the spreadsheet (highlight the first three rows, then do Ctrl (Cmd in Mac) + Shift + down arrow), click on the arrow to the right of the small diamond alert message that comes up, and from the options choose 'Convert to Number.'

Screen clipping showing the third step in the Convert Text to Columns Wizard. The circles highlight the importance of choosing the column format and location of appropriate destination columns for the newly split data.