In the world of computer programming, a file that accompanies a piece of software or code and contains critical information about its origins and how to use it is called a README file. The README’s all-caps title signals that you should read it carefully before using the code or software. We can borrow this documentation convention when sharing our data by bundling a README file with the data, ensuring that new users have all the information they need to use the data responsibly and effectively. README files are usually plain text files stored in a top-level directory or folder with the rest of your data files so they can be easily found. (For more information on different types of data documentation files, see the Additional Resources section at the end of this primer.)
We can think of our DATA_README file as a data reuse plan. A DATA_README is the highest level of metadata for a dataset -- it should be written for humans and contain all the information a person might need to understand how the data was collected, processed, analyzed, licensed, and presented. A useful DATA_README answers five questions: what, who, when, where, and how. A great DATA_README is a living document: you can create the file as you start data collection and add to it as your project progresses. Click here to walk through an exercise on creating a great DATA_README.
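As a rough sketch, a minimal DATA_README organized around those five questions might look like the outline below. The section names and details are illustrative, not a required format; adapt them to your project.

```
DATA_README

WHAT:  A brief description of the dataset and of each file it contains.
WHO:   Who collected the data, who funded the work, and who to contact
       with questions.
WHEN:  The dates of data collection and of each revision to the dataset.
WHERE: Where the data was collected and where it is archived or published.
HOW:   How the data was collected, processed, and analyzed; how the data
       is licensed and how it should be cited.
```

Because this is plain text, it can live alongside your data files and be opened on any system without special software.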
A DATA_README is intended to be read by humans. Another best practice is to create machine-readable metadata for your dataset, which allows your data to be read, understood, and placed in its correct context by computers. Creating a machine-readable companion file helps make your data indexable, searchable, and discoverable. Detailed instructions for doing this are outside the scope of this primer, but if you're interested, you can read more about machine-readable metadata at Project Open Data.
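To give a flavor of what a machine-readable companion file can look like, the sketch below builds a small JSON metadata record in Python. The field names are loosely modeled on the Project Open Data metadata schema, but the dataset name, contact, and values here are purely hypothetical, and this is not an exhaustive or authoritative list of fields.

```python
import json

# A minimal machine-readable metadata record. Field names loosely follow
# the Project Open Data metadata schema; all values are hypothetical.
metadata = {
    "title": "Example Survey Dataset",
    "description": "Responses from a hypothetical 2024 community survey.",
    "keyword": ["survey", "example", "community"],
    "modified": "2024-01-15",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "contactPoint": {
        "fn": "Jane Researcher",                  # hypothetical contact
        "hasEmail": "mailto:jane@example.org",
    },
}

# Serialize the record so catalog software and search engines can index it.
# Saving this as a file (e.g., data.json) next to your dataset is one
# common convention.
print(json.dumps(metadata, indent=2))
```

Unlike the human-oriented DATA_README, a structured record like this can be parsed automatically, which is what makes the dataset discoverable by catalogs and search tools.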