When working with datasets, it is important to cite them as you would a scholarly article. Citation helps increase the research impact for dataset owners and gives proper attribution to dataset creators. As with journal articles, there is no single accepted way to format a dataset citation.
Sources such as DataCite (an international registry of datasets) and the Australian National Data Service agree that the following elements are the minimum for any citation of data:
Organizations like DataCite and CrossRef assign and register DOIs (Digital Object Identifiers) for digital objects deposited in repositories such as Prism. A DOI stays with an object, such as a dataset, throughout its online existence, which helps ensure the object’s provenance and persistence. If the dataset you are citing does not have a DOI, consider contacting the data owners and recommending that they deposit their data in a trusted repository, where it can be assigned a DOI.
Version information is also important to include when citing datasets, because datasets can change over time. Citing the specific version you accessed helps others reproduce your research and avoids confusion if the dataset is later updated.
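To show where version information can fit, the sketch below assembles a citation string from common dataset citation elements. It assumes a semicolon-separated pattern of creators, publication year, title, version, publisher, and DOI; the `DatasetRecord` class, its field names, the "Version" label, and the placement of the version right after the title are illustrative assumptions rather than a prescribed style, and the sample DOI is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Hypothetical container for the citation elements of one dataset.
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str                       # bare DOI, starting with its "10." prefix
    version: Optional[str] = None  # left out of the citation when None


def format_citation(record: DatasetRecord) -> str:
    """Join the elements with semicolons, adding the version after the title when known."""
    parts = [" & ".join(record.creators), str(record.year), record.title]
    if record.version:
        # Assumed placement: the version directly follows the title.
        parts.append(f"Version {record.version}")
    parts.append(record.publisher)
    # Present the DOI as a resolvable link so readers can reach the exact dataset.
    parts.append(f"https://doi.org/{record.doi}")
    return "; ".join(parts)


if __name__ == "__main__":
    rec = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2021,
        title="Example Survey Data",
        publisher="Example Repository",
        doi="10.1234/example.5678",  # hypothetical DOI
        version="2.1",
    )
    print(format_citation(rec))
```

Because the version is optional, the same function also covers datasets that do not publish version numbers; in that case the citation simply omits that element.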
Many of the leading DOI registries, including DataCite and CrossRef, have teamed up to offer an online tool, the DOI Citation Formatter, which extracts metadata from a DOI and automatically creates a dataset citation. Citations can be generated in a wide variety of citation styles and generally follow the pattern “Creator(s); Publication Year; Title; Publisher; DOI”. To use the formatter, enter the DOI starting from its prefix (“10.zzzzz”).
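The same lookup can also be scripted. The sketch below first strips common prefixes such as `doi:` or `https://doi.org/` so that the DOI begins at its `10.` prefix, then builds a request using DOI content negotiation, a service supported by DataCite and CrossRef in which `https://doi.org/<DOI>` is asked for `text/x-bibliography` in a named citation style. The helper names, the default style `apa`, and the example DOI are assumptions for illustration, and `fetch_citation` needs network access.

```python
import re
import urllib.parse
import urllib.request

# Prefixes people commonly paste in front of a DOI.
_DOI_PREFIXES = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)
# A DOI is "10." plus a numeric registrant code, a slash, and a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d+(?:\.\d+)*/\S+$")


def normalize_doi(raw: str) -> str:
    """Return the DOI starting from its '10.' prefix, or raise ValueError."""
    doi = _DOI_PREFIXES.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def citation_request(raw_doi: str, style: str = "apa") -> urllib.request.Request:
    """Build a content-negotiation request asking doi.org for a formatted citation."""
    doi = normalize_doi(raw_doi)
    # Percent-encode any unusual characters in the suffix, keeping the slash.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(
        url, headers={"Accept": f"text/x-bibliography; style={style}"}
    )


def fetch_citation(raw_doi: str, style: str = "apa") -> str:
    """Send the request (network required) and return the citation text."""
    with urllib.request.urlopen(citation_request(raw_doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8").strip()
```

For a DOI that exists, a call such as `fetch_citation("https://doi.org/10.1234/example.5678")` would return one formatted citation string; passing a different `style` name requests another citation style, subject to what the service supports.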
The links provided below offer additional guidance and assistance on citing datasets, including style examples:
Keep in mind that some organizations and repositories that serve as data sources set their own requirements for citing their data. For example, ICPSR, the Roper Center, and Dataverse.org all have specific recommendations for citing data from their repositories.