The ALCOA standard has been used by the Food and Drug Administration since the 1990s to define and enforce data integrity with regard to drug manufacturing practice. This acronym outlines five aspects of well-managed data that can help to ensure its authenticity and integrity.
The FAIR principles represent a consensus among data and information security professionals about best practices to make data freely and safely available. Data that is stored, curated, and shared according to the FAIR principles is:
Findable: described richly with metadata and assigned a unique identifier (often a URI)
Accessible: retrievable by its identifier through free and open Internet protocols that allow authentication and authorization where necessary
Interoperable: described with commonly used metadata standards and controlled vocabularies
Re-usable: assigned a license assuring its re-use and accompanied by a clear provenance
Reference:
Wilkinson, Mark D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (online, comment), March 15, 2016. https://doi.org/10.1038/sdata.2016.18
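The four FAIR criteria above can be turned into a simple self-check before a dataset is deposited. The following is a minimal sketch in Python, not an official FAIR assessment tool: the DatasetRecord class, its field names, and the mapping of fields to each principle are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a dataset (field names are illustrative)."""
    identifier: str = ""         # unique identifier, often a URI
    title: str = ""
    keywords: List[str] = field(default_factory=list)
    access_url: str = ""         # where the data can be retrieved
    metadata_standard: str = ""  # name of the metadata standard used
    license: str = ""
    provenance: str = ""         # short note on where the data came from


# Assumed mapping from each FAIR principle to the fields that support it.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["metadata_standard"],
    "Re-usable": ["license", "provenance"],
}


def fair_gaps(record: DatasetRecord) -> Dict[str, List[str]]:
    """Return, for each FAIR principle, the fields that are still empty."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        missing = [name for name in names if not getattr(record, name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Example: a record that has an identifier and a license but little else.
draft = DatasetRecord(
    identifier="https://example.org/datasets/123",  # placeholder URI
    title="Example dataset",
    license="CC BY 4.0",
)
print(fair_gaps(draft))
```

An empty result only means that every field has been filled in; it does not show that the dataset actually meets the FAIR principles, which still requires human review.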
Metadata is commonly defined as "data about data," and can also be thought of as tags added to existing resources in order to describe them. When managing datasets that you may eventually want to make findable, either in institutional repositories or in repositories where funders require deposit, keep basic descriptors for the datasets in mind. Datasets can be described either minimally or in great detail, using either free-text or controlled terms.
The following metadata elements, at a minimum, should be appended to any dataset stored in a repository:
To meet the Attributable requirement of ALCOA-compliant data, each action taken on data should be clearly attributable to one actor. Clear attribution to a particular person is more easily made if, when describing the dataset, tags for the PI and other investigators are linked to a service that verifies researchers' identities, such as ORCiD. To create a Northwestern-linked ORCiD account, visit: https://orcid.it.northwestern.edu/
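When investigator identifiers are typed into a metadata record by hand, a quick format check can catch transcription errors before deposit. The sketch below assumes identifiers are written in the usual hyphenated 16-character form and applies the ISO 7064 MOD 11-2 check-digit calculation that ORCID documents for its identifiers. The function names are illustrative, and the check only confirms that an identifier is well formed, not that it belongs to a particular researcher.

```python
import re

# Four groups of four characters; the final character may be the check value "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID layout and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample identifier ORCID uses in its documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))
# Changing the last character breaks the checksum.
print(is_well_formed_orcid("0000-0002-1825-0098"))
```

A check like this complements, rather than replaces, linking the record to the researcher's authenticated ORCiD account.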
To increase datasets' findability and interoperability, consider including among the keywords terms from controlled vocabularies such as MeSH or the Library of Congress Subject Headings.
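One way to apply this is to separate the keywords that match a controlled vocabulary from those that remain free text, so that the matched terms can be recorded in their preferred form. The sketch below is an illustration under stated assumptions: SAMPLE_VOCABULARY is a tiny hand-typed stand-in rather than an extract from MeSH or the Library of Congress Subject Headings, and split_keywords is a hypothetical helper that matches terms only by exact, case-insensitive comparison.

```python
from typing import Iterable, List, Tuple

# Stand-in vocabulary for illustration; in practice, load terms from the
# vocabulary's official distribution and verify them there.
SAMPLE_VOCABULARY = ["Neoplasms", "Data Curation", "Drug Industry"]


def split_keywords(
    keywords: Iterable[str], vocabulary: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split keywords into (controlled terms, free-text terms).

    Matching ignores case and surrounding whitespace; controlled terms are
    returned in the spelling used by the vocabulary.
    """
    lookup = {term.casefold(): term for term in vocabulary}
    controlled: List[str] = []
    free_text: List[str] = []
    for keyword in keywords:
        cleaned = keyword.strip()
        preferred = lookup.get(cleaned.casefold())
        if preferred is not None:
            controlled.append(preferred)
        else:
            free_text.append(cleaned)
    return controlled, free_text


controlled, free_text = split_keywords(
    [" neoplasms", "tumor imaging", "DATA CURATION"], SAMPLE_VOCABULARY
)
print(controlled)
print(free_text)
```

Exact matching will miss synonyms and spelling variants, so free-text keywords that fall through are worth reviewing by hand against the vocabulary itself.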
To make the terms of re-use clear for any shared dataset, assign a license to datasets deposited or described in repositories. A license for your dataset is different from a license or embargo period that a publisher may enforce for a journal submission (consult the website SHERPA/RoMEO for journal licensing requirements), and it does not refer to copyrighting raw data, which is not commonly considered a 'creative work.' However, your processed data may have required unique or creative input that you may license upon sharing, in order to place clear restrictions on what re-users can and cannot do with the dataset. Data.world, a for-profit company that works to reduce barriers to dataset access, has published a helpful list of "Common license types for datasets." A flowchart created by Creative Commons Australia can help you select the most appropriate Creative Commons license for your data works.