Documenting Data


“Data without context are inert, but data within contexts become information, knowledge.”
From: Changing the Subject: Art and Attention in the Internet Age, by Sven Birkerts

Proper documentation provides the context your data needs to persist over time and to integrate into new systems, and it lets others credit your contributions through data citations. Where possible, you should contribute the following information along with your dataset:

  1. Complete Metadata - Metadata aids discovery and reuse. Scholar@UC requires metadata such as title of work, creator name, date submitted and description. All text is searchable, so write a detailed description to increase the discoverability of your content. Additional metadata fields are available under the Show Additional Description link, where you can add fields such as subject terms to enhance search. If you are following a suggested schema for your discipline, indicate that in the description or in the next document, the README file.

  2. README.txt File - This is a text document that provides relevant information such as the purpose of the project and the organizational structure or relationship of the files. It explains terms that are unique to the dataset and records keywords, omissions and errors. If you are using a file naming convention, you can explain it in the README file. It is also the place to put details that were not included in the metadata, such as information about external storage of the data, the metadata schema followed and researcher contact information.

    A good example can be found in the Data Ab Initio blog post README.txt by Kristin Briney.

  3. Data Dictionary/Codebook - This document explains all the variables and abbreviations associated with the dataset.

    The University of Texas at Austin's Population Research Center (data dictionary examples) and IRSA (data dictionary example) provide quick examples of data dictionaries.

  4. Methodology/Protocol - This document details the steps taken to collect the data. If you submitted modified data instead of a raw data set, you can explain those steps in this document.
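
As a starting point for the README file and data dictionary described above, the following Python sketch drafts one dictionary entry per column of a CSV file and renders a plain-text README from it. The function names, the README layout and the sample data are all hypothetical and chosen only for illustration; Scholar@UC does not prescribe this format, and every description still has to be written by someone who knows the data.

```python
import csv
import io


def draft_data_dictionary(csv_text):
    """Return one starter entry per column of a CSV file given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Use the first non-empty value as an example to help whoever documents the column.
        example = next((row[name] for row in rows if row[name]), "")
        entries.append({"variable": name, "example": example, "description": "TODO"})
    return entries


def render_readme(title, contact, purpose, entries):
    """Build plain-text README content; the section layout is illustrative only."""
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        "",
        "Purpose:",
        f"  {purpose}",
        "",
        "Variables (see the data dictionary for full definitions):",
    ]
    for entry in entries:
        lines.append(f"  {entry['variable']} (e.g. {entry['example']}): {entry['description']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; replace with the contents of your own file.
    sample = "site_id,temp_c,notes\nA1,21.5,\nB2,,calibrated\n"
    print(render_readme(
        "Example stream survey",
        "researcher@example.edu",
        "Hypothetical dataset used to illustrate the layout.",
        draft_data_dictionary(sample),
    ))
```

The TODO placeholders are deliberate: a script can list the variables, but the meaning, units and known omissions for each one need to come from the researcher.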

Additional Resources on Documentation for Data:

  1. ICPSR's Guide to Social Science Data Preparation and Archiving
  2. DataONE Best Practices
  3. MANTRA Research Data Management Training
  4. DataDryad FAQ
  5. Digital Curation Centre How-to Guides & Checklists

Copyright and Data

Copyright is intended to protect creators’ rights in their creative works, such as written works, images and video. In most cases data is considered fact and is not protected by copyright. For guidance on whether your data falls under copyright, you can consult this blog post and flyer by Kristin Briney.