Whether you're depositing data to meet publishing or grant requirements, or just want to make the output of your research available to your colleagues, depositing data in a repository or digital archive helps ensure your research remains discoverable and usable for a long time. This guide reviews some factors to keep in mind when preparing data for deposit in the Digital Repository Service (DRS) or elsewhere.
The DRS can accept most datasets under 1TB and will accept any file type, which makes it a suitable home for many research outputs. However, the DRS may not be the best place to store your data. Other professionals in your discipline or subject area may have a preferred repository that is better suited to the data produced by your research. Check out https://fairsharing.org/ to find data repositories in your discipline, or contact your subject librarian for help finding the right repository for you.
Consider a few factors when selecting a repository for deposit:
You may be tempted to gather up every file used in the collecting, recording, and processing of the data, but how useful will that be for another researcher accessing the data for their own research? Or, how useful will that be for you five years from now? It's important to carefully think about the data that needs to be archived and how it should be packaged.
Here are a few things to consider when selecting data and data packages:
Here are a few things to avoid when selecting the data:
File formats
When preparing the data for archiving or sharing, it's a good practice to use file formats that are open and sustainable. This may not be possible for every file type, but using a recommended format will ensure your file will remain usable for a long time. The Library of Congress keeps a list of recommended file formats for many types of files here: https://www.loc.gov/preservation/resources/rfs/
Regardless of whether or not you use a recommended file format, always check the file type requirements for the deposit system to make sure your chosen file format is allowed.
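Before checking a deposit system's requirements, it can help to take stock of which file types you actually have. The following is a minimal Python sketch, not an official tool. The function names and the example allowed list are hypothetical; substitute the formats your chosen deposit system actually accepts.

```python
from collections import Counter
from pathlib import Path


def count_extensions(names):
    """Count files by lowercase extension; files with no extension count under ''."""
    return Counter(Path(name).suffix.lower() for name in names)


def not_allowed(names, allowed):
    """Return, in sorted order, the names whose extension is not in `allowed`."""
    allowed = {ext.lower() for ext in allowed}
    return sorted(n for n in names if Path(n).suffix.lower() not in allowed)


if __name__ == "__main__":
    # Hypothetical file list and allowed formats, for illustration only.
    files = ["results.CSV", "survey.csv", "notes.txt", "scan.tiff", "README"]
    print(count_extensions(files))
    print(not_allowed(files, {".csv", ".txt"}))
```

To run this against a real folder, you could build the list of names with `[p.name for p in Path("my_data").rglob("*") if p.is_file()]`, where `my_data` is a placeholder for your own folder.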
File size
Many deposit systems set a size limit for file uploads, usually ranging from 1GB to 5GB. These limits are fixed in some systems, but others may accept files that exceed the limit through a mediated process. File size limits for the deposit system may influence how you package your files, or what system you choose for deposit.
DRS users may deposit files up to 1GB. Library staff are available to assist with depositing files larger than 1GB. There are no size limits for individual file downloads, but you should take into consideration how easy or difficult it may be for consumers of the data to download the files. For this reason, we recommend grouping data in packages no larger than 15GB each when depositing in the DRS.
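If you have many files, a short script can suggest how to split them into packages that stay under the recommended size. The sketch below is one possible approach and is not part of the DRS. It assumes 1GB means 1024^3 bytes, and the function names are hypothetical. It groups files purely by size, so treat the result as a starting point and adjust it to match the natural groupings in your data.

```python
from pathlib import Path

GB = 1024 ** 3            # assumption: binary gigabytes
PACKAGE_LIMIT = 15 * GB   # recommended maximum package size for the DRS


def scan(folder):
    """Return {relative path: size in bytes} for every file under `folder`."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_packages(sizes, package_limit=PACKAGE_LIMIT):
    """Group files into packages no larger than `package_limit`.

    Files are taken largest first and placed in the first package with room.
    A single file larger than the limit ends up in a package of its own.
    """
    packages = []  # each entry is [total_bytes, [names]]
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for package in packages:
            if package[0] + size <= package_limit:
                package[0] += size
                package[1].append(name)
                break
        else:
            packages.append([size, [name]])
    return [names for _, names in packages]


if __name__ == "__main__":
    # "my_data" is a placeholder folder name.
    for number, names in enumerate(plan_packages(scan("my_data")), start=1):
        print(f"package {number}: {len(names)} files")
```

Any package that comes out larger than 1GB would still need to be deposited with help from library staff, as described above.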
File names
Using a clear and consistent method for naming your files will help ensure your files can be accessed easily. There are a few general rules to follow when creating a system for naming files:
Stanford Libraries and Princeton University Library have put together very useful guides with detailed information about best practices for naming files:
File naming and structure, Princeton University Library
Data best practices and case studies, Name files, Stanford Libraries
File organization
Like file names, the files themselves should be organized in a clear, consistent manner. Consider:
Consider packaging the data in groupings that occur naturally in the data or research, while keeping in mind how other researchers may expect to access or use the data. If possible, compress the data files into ZIP or TAR packages. Compressing your files reduces upload and download sizes, reduces the number of files you have to deposit and others have to download, and preserves the desired organization of the files. Packaging also protects file names. Some systems, including the DRS, will change the name of every deposited file to avoid collisions between similarly named files in the storage system and when packaging files for bulk downloading (for example, when project A uses data.csv and project B also uses data.csv, storing these files together will cause issues). If the original file names are important for accessing the data, compressing the data files into a ZIP package will ensure they are retained.
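As an illustration of the packaging step described above, here is a minimal Python sketch that compresses a folder into a ZIP package while keeping the original file names and folder structure. It uses only the standard library. The function names and folder names are hypothetical, and most operating systems offer built-in tools that do the same job.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Compress every file under `folder` into `archive_path`.

    Each file is stored under its path relative to `folder`, so the
    original names and folder organization are kept inside the package.
    """
    root = Path(folder)
    archive = Path(archive_path).resolve()
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it sits inside `folder`.
            if path.is_file() and path.resolve() != archive:
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return archive_path


def list_names(archive_path):
    """Return the file names stored in the package, to check before deposit."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # "my_data" and "my_data_package.zip" are placeholder names.
    zip_folder("my_data", "my_data_package.zip")
    print(list_names("my_data_package.zip"))
```

Listing the contents after packaging is a quick way to confirm that two files with the same name, such as data.csv in two project folders, are both present under their own folders.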
Documentation
Documentation is crucial to the reusability and reproducibility of data. If you have documentation that can be shared, like a codebook, those guiding documents should be deposited alongside the data. If you don't have shareable documentation, consider creating a README file that describes and provides context for the data. A README might include:
See the box on the right of this page for more information on README files.
Metadata
Expect to provide information about each file you deposit, including:
Only a title and one keyword are required for deposit into the DRS, but names, dates, and descriptions are highly recommended. Supplying that information will help your data be discovered, and it will help users decide whether or not the data is useful to them.
Data management plans often ask researchers to include information about the system that will be used to store and share data at the end of a project. The Northeastern University Library has provided the following text describing the Digital Repository Service for researchers to use in their proposals:
Suggested text for Northeastern's Digital Repository Service:
Northeastern University (NU) Library’s Digital Repository Service (NU-DRS, https://repository.library.northeastern.edu/) is a long-term digital asset management system developed and maintained by the Northeastern University Library. The NU-DRS provides the following services: 1) deposition of all file types, many of which are natively supported by the system, 2) provisioning and maintenance of a permanent identifier and URL for both the project space and individual data files via the library’s handle server, 3) discovery, access and editorial control using the Shibboleth single sign-on identity management framework, and 4) data storage and backup services, provided jointly by the library and university information systems. The Research Data Management librarian and Digital Production Services staff will work with project staff to determine appropriate metadata models and deposition schedules. Project staff will ensure all relevant files are submitted either to available community resources or to Northeastern's repository system.
The National Institutes of Health (NIH) is implementing a new policy to further their efforts to improve the reproducibility and reliability of NIH-funded research through effective and efficient data management and data sharing practices. This policy, NOT-OD-21-013, is effective January 25, 2023, and it applies to “all research, funded or conducted in whole or in part by NIH, that results in the generation of scientific data.” For more information about how this may impact your plan for preparing and publishing your data, please visit the Data Management for Research subject guide or the post from the library's blog summarizing the new policies.
Resources
Examples of data in the DRS
README files describe your data, and help facilitate accurate understanding and reuse of your work.
Getting started
Recommended README file content, in brief
For more detail, please see this README file template.
Additional resources
Northeastern University Library offers a variety of services to help users get started with the DRS, including documentation, consultations, trainings, workshops, and general guidance.
Use the DRS contact form or contact Library-Repository-Team[@]northeastern.edu to start a new project, ask questions about files or features, get help with an issue, or set up a training or general consultation.