Data Organization and Documentation

A guide for improving the organization, documentation, and long-term preservation of digital research data.

README files

Many Federal funders now require data management plans and data sharing as a condition of funding. When a study is completed, deposit of datasets in Federal or other publicly accessible databases may be mandatory. A best practice when depositing data is to include a README file in addition to a codebook that defines all variables. A plain-text README file provides the explanation that someone new to the study would need in order to understand and interpret the study's shared data files. Key things to record about shared datasets include methodologies, dates and places of data collection, modifications applied to data, definitions of acronyms, a description of the interrelationship of files, PI identification, and funder identification.

See the links below for guidance and example templates of README files for research data.
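As an illustration of the items listed above, the following is a minimal sketch of a script that generates a plain-text README skeleton. The function name `build_readme`, the layout of the file, and the sample study details are assumptions made for this example, not a required format; any item left unanswered is flagged so it is not silently omitted.

```python
from datetime import date

# Section headings taken from the "key things to record" listed in this guide.
README_FIELDS = [
    "Methodologies",
    "Dates and places of data collection",
    "Modifications applied to data",
    "Definitions of acronyms",
    "Interrelationship of files",
    "PI identification",
    "Funder identification",
]


def build_readme(title, answers):
    """Return plain-text README content; unanswered fields are flagged."""
    lines = [
        title,
        "=" * len(title),
        f"README last updated: {date.today().isoformat()}",
        "",
    ]
    for field in README_FIELDS:
        lines.append(f"{field}:")
        lines.append(f"  {answers.get(field, 'TODO: not yet recorded')}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical study details, for illustration only.
    text = build_readme(
        "Example Study Dataset",
        {
            "PI identification": "Dr. A. Example",
            "Funder identification": "Example Funder",
        },
    )
    print(text)
```

Saving the returned text as README.txt alongside the data files keeps the explanation in a format that needs no special software to open.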

Standard Operating Procedures

If you are implementing data management practices for the first time, or just looking to bring unruly files under control, one way to start managing data is by using Standard Operating Procedures. Simply stated, Standard Operating Procedures (SOPs) are comprehensive outlines of daily operational practices that clearly describe the functions to be completed and who is responsible for completing them. SOPs can be adopted in any endeavor and are especially useful for research data management.

SOP templates and samples are available from various sources and can easily be adapted for a research team's purposes. At a minimum, an SOP should state its Purpose, Scope, Responsibility (the person or people who will perform the actions outlined in the SOP), a list of the Procedures involved, and a list of any necessary Definitions. A last-updated date is also helpful to include, since SOPs should be reviewed and updated periodically to ensure that all members of the research team are aware of and conforming to the most recent procedures.
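For teams that keep SOPs in a structured form, the minimum contents described above can be checked automatically. The sketch below assumes an SOP is stored as a Python dictionary keyed by section name; the function names and the one-year review interval are hypothetical choices for this example, since the guidance above says only that SOPs should be reviewed periodically.

```python
from datetime import date

# The minimum sections an SOP should state, per the paragraph above.
REQUIRED_SOP_SECTIONS = [
    "Purpose",
    "Scope",
    "Responsibility",
    "Procedures",
    "Definitions",
]


def missing_sop_sections(sop):
    """Return the required sections that are absent or empty in the SOP."""
    return [name for name in REQUIRED_SOP_SECTIONS if not sop.get(name)]


def needs_review(last_updated, today, max_age_days=365):
    """Flag an SOP whose last update is older than max_age_days.

    The 365-day default is an assumption for illustration only.
    """
    return (today - last_updated).days > max_age_days


if __name__ == "__main__":
    # Hypothetical draft SOP with two sections still to be written.
    draft = {
        "Purpose": "Describe how raw survey files are named and stored.",
        "Scope": "All members of the research team.",
        "Procedures": ["Rename file", "Copy to shared storage"],
    }
    print("Missing sections:", missing_sop_sections(draft))
    print("Review due:", needs_review(date(2020, 1, 1), date.today()))
```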

Sample SOPs can be found at:

For step-by-step guidance on creating Standard Operating Procedures, see WikiHow's How to Write a Standard Operating Procedure page.

Data Inventories

If your research project is underway and you want your data management practices to align better with the Data Management Plan in your grant application, or if you want to establish a new plan or more comprehensive data documentation, one way to start is with a data inventory. Appoint a data manager as the official keeper of documentation, and record the answers to the what, who, where, how, and when questions surrounding data collection. Below is a list of points to consider as you document:

What is the data?

  • Title
  • Description
  • Number, format, and size of files
  • Rate of file growth
  • Versions of files
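The number, format, and size of files can be gathered with a short script rather than by hand. The following is a minimal sketch that treats the file extension as a stand-in for format, which is an assumption that will not hold for every dataset; the function name `summarize_files` and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize_files(root):
    """Count files and total bytes per extension under root, recursively."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Lowercase the extension so ".CSV" and ".csv" are grouped together.
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in sorted(counts)}


if __name__ == "__main__":
    # "." is a placeholder; point this at the project's data directory.
    for ext, info in summarize_files(".").items():
        print(f"{ext}: {info['files']} file(s), {info['bytes']} bytes")
```

Running the script at regular intervals and keeping the output gives a rough record of the rate of file growth as well.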

Who created, accesses, and owns the data?

  • PI/Study lead/Contact
  • University, Department, Research Core, Research Team, Consortium or Group
  • Funder
  • Authentication levels/Restrictions
  • Outside requesters

Where is the data stored?

  • Institutional servers or services (like Box)
  • Filesharing services
  • Personal account drives
  • Individuals' computers
  • Backups

How is data being created and manipulated?

  • Collection techniques and instruments
  • File naming
  • Workflows
  • Metadata
  • Analysis tools and software

When is data being collected, and what are the plans for its future?

  • Date created/modified
  • Attribution: who did what, when?
  • Data retention
  • File format migration
  • Long-term storage
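
One possible way to keep the answers to these questions together is a simple table with one row per dataset. The sketch below writes such a table as CSV text; the `InventoryEntry` field names are a hypothetical selection from the points listed above, not a standard schema, and the sample entry is invented for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One row of a data inventory; field names are illustrative only."""
    title: str              # what
    description: str        # what
    contact: str            # who
    funder: str             # who
    storage_location: str   # where
    collection_method: str  # how
    date_created: str       # when (ISO 8601 date, e.g. 2024-01-31)
    retention_plan: str     # when


def inventory_to_csv(entries):
    """Return the inventory as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(InventoryEntry)]
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    entry = InventoryEntry(
        title="Example survey responses",
        description="Raw responses exported from the survey tool",
        contact="Study lead",
        funder="Example Funder",
        storage_location="Institutional server",
        collection_method="Online questionnaire",
        date_created="2024-01-31",
        retention_plan="Retain per funder requirements",
    )
    print(inventory_to_csv([entry]))
```

A plain CSV file can be opened in any spreadsheet program, which makes it easy for the appointed data manager to maintain.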