A key part of the NIH's Data Management and Sharing Policy is the requirement to share data through a designated repository. Repositories are online storage sites that preserve digital data in perpetuity and give other researchers and/or the public access to it, subject to each repository's specific privacy settings.
Document in your DMSP the repository where your data will be preserved and shared. Certain NIH ICOs have released helpful guidance documents and tools for selecting a repository, including the NIDDK's Repository Selection Consideration Tool. The related sample workflow below may be used to determine where to deposit data:
- The NIH ICO (Institute, Center, or Office) releasing the NOFO (Notice of Funding Opportunity) may have requirements on where data from their funded projects should be preserved and shared. If so, use the required repository.
- If there is no required repository, but an NIH-approved discipline- or data-specific repository exists, choose one from the NIH's list of such repositories.
- If a domain-, discipline-, or data-specific repository exists that is not necessarily NIH-approved but is vetted and commonly used in your field of study, it may be used.
- If none of the above applies, and the dataset is small (up to 2 GB in size), it may be included as supplementary material to accompany articles submitted to PubMed Central (see the PubMed Central - Policies - Supplementary Materials guidance: https://www.ncbi.nlm.nih.gov/pmc/about/guidelines/#suppm).
- If none of the above applies, select a generalist or institutional repository to deposit your data. The NIH provides specific guidance on Selecting a Repository, outlining the key characteristics that any repository should have in order to be appropriate for research data.
- This blog post by Elliott Smith on the website FAIRsharing.org outlines how the site can be used to identify both domain-specific repositories and related metadata standards.
- Generalist repositories included in the NIH's Generalist Repository Ecosystem Initiative support uniform standards for data sharing.
- The Network of the National Library of Medicine hosts a finder tool for identifying NIH-Supported Data Sharing Resources.
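The decision steps in the sample workflow above can also be read as an ordered series of checks, where the first match wins. The following is a minimal Python sketch of that ordering, offered for illustration only. The class name `RepositoryContext`, its fields, and the function `choose_repository` are hypothetical and do not come from any NIH tool; the 2 GB threshold is taken from the workflow above.

```python
from dataclasses import dataclass
from typing import Optional

# "Small" dataset threshold taken from the workflow above (up to 2 GB).
SMALL_DATASET_LIMIT_GB = 2.0


@dataclass
class RepositoryContext:
    """Hypothetical summary of what a researcher knows before choosing."""
    dataset_size_gb: float
    # Repository mandated by the ICO or NOFO, if any.
    ico_required_repository: Optional[str] = None
    # NIH-approved discipline- or data-specific repository, if one exists.
    nih_approved_repository: Optional[str] = None
    # Vetted, commonly used repository in the field, if one exists.
    community_repository: Optional[str] = None
    # Fallback generalist or institutional repository.
    fallback_repository: str = "generalist or institutional repository"


def choose_repository(ctx: RepositoryContext) -> str:
    """Apply the workflow checks in order; the first match is returned."""
    if ctx.ico_required_repository:
        return ctx.ico_required_repository
    if ctx.nih_approved_repository:
        return ctx.nih_approved_repository
    if ctx.community_repository:
        return ctx.community_repository
    if ctx.dataset_size_gb <= SMALL_DATASET_LIMIT_GB:
        return "PubMed Central supplementary material"
    return ctx.fallback_repository


if __name__ == "__main__":
    # A 50 GB dataset with no required or domain-specific repository.
    print(choose_repository(RepositoryContext(dataset_size_gb=50.0)))
    # prints "generalist or institutional repository"
```

The sketch only encodes the order of the checks. Whether a given repository counts as required, NIH-approved, or vetted in your field is still a judgment to make with the NIH guidance and the resources listed above.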
The NIH's Desirable Characteristics for Data Repositories, a section in its Selecting a Repository guidance, outlines characteristics that a chosen repository should meet as closely as possible. A brief summary of the desirable characteristics follows:
- Metadata and PIDs: Descriptive metadata enables FAIRness (Findability, Accessibility, Interoperability, and Reusability). A unique persistent identifier (PID) is assigned to at least the data record itself, and to other descriptors where possible (Creator, Organization, Subjects, etc.).
- Easy Access: Records tagged as Open Access are free to access, reuse is enabled through clear licenses, and data is stored in widely used, preferably non-proprietary formats. Guidance on how to use data is clear.
- Long-term sustainability: Repository has a long-term management plan and retention policy.
- Curation/Provenance: Repository either provides, or allows access to people who provide, curation or quality control assistance.
- Security/Integrity/Confidentiality: Repository’s levels of security match sensitivity of the data. User confidentiality assured.
Any repository chosen for storing human data should also meet the Additional Considerations for Repositories Storing Human Data (even if de-identified) as outlined in the Selecting a Repository guidance. These considerations include stricter controls than the above, and should be reviewed carefully.