Whether you're depositing data to meet publishing or grant requirements, or simply want to make your research outputs available to colleagues, depositing data in a repository or digital archive helps ensure your research remains discoverable and usable for a long time. This guide reviews some factors to keep in mind when preparing data for deposit in the Digital Repository Service (DRS) or elsewhere.
The DRS can accept most datasets under 1TB and will accept any file type, which makes it a suitable home for many research outputs. However, the DRS may not be the best place for your data. Researchers in your discipline or subject area may have a preferred repository that is better suited to the data your research produces. Check https://fairsharing.org/ to find data repositories in your discipline, or contact your subject librarian for help finding the right repository.
Consider a few factors when selecting a repository for deposit:
You may be tempted to gather up every file used in collecting, recording, and processing the data, but how useful will that be to another researcher accessing the data for their own work? How useful will it be to you five years from now? Think carefully about which data needs to be archived and how it should be packaged.
Here are a few things to consider when selecting data and data packages:
Here are a few things to avoid when selecting the data:
When preparing data for archiving or sharing, it's good practice to use open, sustainable file formats. This may not be possible for every file type, but using a recommended format will help ensure your files remain usable for a long time. The Library of Congress maintains a list of recommended formats for many types of files: https://www.loc.gov/preservation/resources/rfs/
Whether or not you use a recommended format, always check the deposit system's file type requirements to make sure your chosen format is accepted.
Many deposit systems set a size limit for file uploads, usually ranging from 1GB to 5GB. These limits are fixed in some systems, but others may accept files that exceed the limit through a mediated process. File size limits for the deposit system may influence how you package your files, or what system you choose for deposit.
DRS users may deposit files up to 1GB; library staff can assist with depositing files larger than 1GB. There are no size limits for individual file downloads, but you should consider how easy or difficult it will be for users of the data to download the files. For this reason, we recommend grouping data into packages no larger than 15GB each when depositing in the DRS.
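As a rough illustration of the 15GB packaging guideline, the Python sketch below splits a list of files into groups whose total size stays under a cap. It is not a DRS tool: the function names are hypothetical, "GB" is assumed to mean 10^9 bytes, and files are grouped greedily in sorted path order so that files in the same folder tend to stay together.

```python
import os
from typing import Iterable

# Size cap from the 15GB recommendation; "GB" is assumed to mean 10**9 bytes.
MAX_PACKAGE_BYTES = 15 * 10**9


def files_with_sizes(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(result)


def group_into_packages(files: Iterable[tuple[str, int]],
                        max_bytes: int = MAX_PACKAGE_BYTES) -> list[list[str]]:
    """Greedily split (path, size) pairs into consecutive packages whose total size
    stays at or under max_bytes. Files keep the order given, so files that sit next
    to each other (e.g. in the same folder) tend to land in the same package.
    A single file larger than max_bytes ends up in a package of its own."""
    packages: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for path, size in files:
        # Start a new package when adding this file would exceed the cap.
        if current and current_size + size > max_bytes:
            packages.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        packages.append(current)
    return packages


# Toy sizes in bytes with a 15-byte cap, to show the grouping behaviour.
example = [("a.csv", 6), ("b.csv", 6), ("c.csv", 6), ("big.tif", 20)]
print(group_into_packages(example, max_bytes=15))
# [['a.csv', 'b.csv'], ['c.csv'], ['big.tif']]
```

On a real dataset you might call `group_into_packages(files_with_sizes("my_dataset"))` (the folder name is a placeholder) and create one archive per group. Greedy grouping by path is only one option; grouping by the natural structure of the research is often more meaningful to other users.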
Using a clear and consistent method for naming your files will help ensure your files can be accessed easily. There are a few general rules to follow when creating a system for naming files:
Stanford Libraries and Princeton University Library have published useful guides with detailed best practices for naming files:
Like file names, the files themselves should be organized in a clear, consistent manner. Consider:
Consider packaging the data in groupings that occur naturally in the data or research, but also keep in mind how other researchers may expect to access or use the data. If possible, package the data files into compressed archives such as ZIP or compressed TAR (for example, .tar.gz). Compressing your files reduces upload and download sizes, reduces the number of files you have to deposit and others have to download, and preserves the intended organization of the files.
Packaging files in an archive also preserves file names. Some systems, including the DRS, rename every deposited file to avoid collisions between similarly named files, both in the storage system and when packaging files for bulk download. For example, if project A and project B each contain a file named data.csv, storing the two files together would cause a conflict. If the file names are important for accessing the data, packaging the files in a ZIP archive ensures the original names are retained.
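A minimal sketch of this kind of packaging, using Python's standard-library `zipfile` module: it packs a folder into a compressed ZIP archive and stores each file under its path relative to that folder, so two files both named data.csv in different project folders keep their original names and remain distinct inside the archive. The function name and folder names are illustrative assumptions, and the archive should be written outside the folder being packed so it does not include itself.

```python
import os
import tempfile
import zipfile


def zip_directory(src_dir: str, zip_path: str) -> list[str]:
    """Pack every file under src_dir into a compressed ZIP archive at zip_path.

    Each file is stored under its path relative to src_dir, so the folder
    layout and the original file names are preserved inside the archive.
    Write zip_path outside src_dir so the archive does not try to include itself.
    Returns the names stored in the archive.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            dirnames.sort()  # walk subfolders in a predictable order
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
        return zf.namelist()


if __name__ == "__main__":
    # Two hypothetical projects that both contain a file named data.csv.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dataset")
        for project in ("project_A", "project_B"):
            os.makedirs(os.path.join(src, project))
            with open(os.path.join(src, project, "data.csv"), "w") as f:
                f.write("id,value\n1,42\n")
        print(zip_directory(src, os.path.join(tmp, "dataset.zip")))
        # ['project_A/data.csv', 'project_B/data.csv']
```

Python's `tarfile` module offers the equivalent for .tar.gz packages if a compressed TAR archive suits your users better.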
Documentation is crucial to the reusability and reproducibility of data. If you have documentation that can be shared, like a codebook, those guiding documents should be deposited alongside the data. If you don't have shareable documentation, consider creating a README file that describes and provides context for the data. A README might include:
See the section on README files at the end of this page for more information.
Expect to provide information about each file you deposit, including:
Only a title and one keyword are required for deposit into the DRS, but names, dates, and descriptions are highly recommended. Supplying that information will help your data be discovered, and it will help users decide whether or not the data is useful to them.
Examples of data in the DRS
Northeastern University Library offers a variety of services to help users get started with the DRS, including documentation, consultations, trainings, workshops, and general guidance.
Use the DRS contact form or contact Library-Repository-Team[@]northeastern.edu to start a new project, ask questions about files or features, get help with an issue, or set up a training or general consultation.
README files describe your data and help others understand and reuse your work accurately.
Recommended README file content, in brief
For more detail, please see this README file template.