File formats
When preparing data for archiving or sharing, it is good practice to use file formats that are open and sustainable. This may not be possible for every file type, but choosing a recommended format makes it more likely that your files will remain usable over the long term. The Library of Congress maintains a list of recommended file formats for many types of files, which is useful to consult when preparing your files.
Whether or not you use a recommended file format, always check the file type requirements of the deposit system to make sure your chosen format is allowed.
File size
Many deposit systems set a size limit for file uploads, usually ranging from 1GB to 5GB. These limits are fixed in some systems, but others may accept files that exceed the limit through a mediated process. File size limits for the deposit system may influence how you package your files, or what system you choose for deposit.
DRS users may deposit files of up to 1GB. Library staff are available to assist with depositing files larger than 1GB. There is no size limit on individual file downloads, but consider how easy or difficult it will be for consumers of the data to download the files. For this reason, we recommend grouping data in packages no larger than 15GB each when depositing in the DRS.
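Before depositing, it can help to check a folder against these limits. The following is a minimal Python sketch, not a DRS tool: the function name check_sizes is hypothetical, and it assumes a gigabyte means 1024^3 bytes, which may differ from how the deposit system counts.

```python
from pathlib import Path

# Assumption: binary gigabytes. The deposit system may count 10**9 bytes instead.
GB = 1024 ** 3
UPLOAD_LIMIT = 1 * GB     # self-service upload limit described above
PACKAGE_LIMIT = 15 * GB   # recommended maximum size for one package


def check_sizes(folder, upload_limit=UPLOAD_LIMIT, package_limit=PACKAGE_LIMIT):
    """Return (oversized_files, total_bytes, package_ok) for a folder.

    oversized_files lists files larger than upload_limit, total_bytes is the
    combined size of all files, and package_ok is True when the total does
    not exceed package_limit.
    """
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    sizes = {p: p.stat().st_size for p in files}
    oversized = sorted(str(p) for p, size in sizes.items() if size > upload_limit)
    total = sum(sizes.values())
    return oversized, total, total <= package_limit
```

Files reported as oversized would need staff assistance to deposit, and a folder whose total exceeds the package limit is a candidate for splitting into several packages.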
File names
Using a clear and consistent method for naming your files will help ensure your files can be accessed easily. There are a few general rules to follow when creating a system for naming files:
- Create file names that are descriptive, but brief (fewer than 30 characters)
- Use numbers and letters (either upper or lowercase)
- Avoid using special characters, especially those that may be misinterpreted by an operating system
- Use underscores instead of spaces
- Include a date in the file name formatted as YYYYMMDD (e.g. 20200101)
- Apply the chosen naming system consistently
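The rules above can be checked automatically. The following Python sketch is one possible interpretation rather than an official validator: the function check_file_name is hypothetical, it assumes the 30-character limit includes the file extension, and it treats anything other than letters, numbers, underscores, and periods as a special character.

```python
import re
from datetime import datetime

MAX_LENGTH = 30  # "fewer than 30 characters"; assumed to include the extension


def _is_valid_date(text):
    """Return True if an 8-digit string is a real calendar date (YYYYMMDD)."""
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("contains spaces")
    # Assumption: only letters, digits, underscores, and periods are allowed.
    if re.search(r"[^A-Za-z0-9_. ]", name):
        problems.append("contains special characters")
    # Look for a standalone run of exactly eight digits that is a valid date.
    candidates = re.findall(r"(?<!\d)\d{8}(?!\d)", name)
    if not any(_is_valid_date(c) for c in candidates):
        problems.append("no YYYYMMDD date")
    return problems


print(check_file_name("soil_samples_20200101.csv"))
print(check_file_name("my data (final).csv"))
```

The first name passes every check and yields an empty list. The second is flagged for spaces, special characters, and a missing date.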
Stanford Libraries and Princeton University Library have put together very useful guides with detailed information about best practices for naming files:
File Organization, Princeton University Library
Data best practices and case studies, Name files, Stanford Libraries
File organization
Like file names, the files themselves should be organized in a clear, consistent manner. Consider:
- Creating a browsable hierarchy or directory structure that uses the same naming conventions as the files
- Sorting files into related groupings, such as by experiment or by date
Consider packaging the data in the groupings that occur naturally in the data or research, but also keep in mind how other researchers may expect to access or use the data. If possible, bundle the data files into ZIP or TAR packages. Packaging reduces the number of files you have to deposit and the number of files others have to download, and it preserves the intended organization of the files. Compressed packages also reduce upload and download sizes (a plain TAR archive bundles files without compressing them, so combine it with a compression method such as gzip if size matters).
Packaging also protects file names. Some systems, including the DRS, rename every deposited file to avoid collisions between similarly named files in the storage system and when packaging files for bulk download (for example, if project A and project B both use data.csv, storing these files together under their original names would cause conflicts). If the file names are important for accessing the data, placing the data files in a ZIP package ensures the original file names are retained.
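As an illustration of packaging, the following Python sketch builds a compressed ZIP file from a folder using the standard zipfile module, keeping the directory structure and original file names inside the archive. The function name package_folder and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path


def package_folder(folder, archive_path):
    """Write every file under folder into a compressed ZIP archive.

    Paths inside the archive start at the folder's own name, so the
    hierarchy and the original file names are preserved.
    """
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in folder.
            if path.is_file() and path.resolve() != archive_path.resolve():
                arcname = path.relative_to(folder.parent).as_posix()
                archive.write(path, arcname=arcname)
    return archive_path
```

For example, package_folder("dataset", "dataset.zip") would store dataset/project_a/data.csv and dataset/project_b/data.csv as separate entries, so both files keep the name data.csv without colliding.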