
File Storage and Backup Best Practices

Photo by Avinash Kumar: https://www.pexels.com/photo/an-external-storage-drive-on-wooden-table-13595074/

This is the fifth article in a series on the new NIH Data Management and Sharing Policy, which will go into effect for NIH grant applications starting January 2023. For more information, see our previous articles on data management.

File storage is an important part of data management. You’ll save and access your data often while conducting your research, so it’s worth deciding where and how to store it before you begin. Keep in mind, however, that data storage is different from data preservation. Data storage covers where your files live during the active research process, while data preservation deals with the long-term storage of research data after a research project is complete (Washington State University Libraries [WSU Libraries], 2022). And remember: backing up your files is an essential part of file storage.

The Rule of Three

When it comes to file storage, the best practice is to follow the “rule of three”:

  • THREE copies of every file
  • TWO different media types (e.g., types of storage such as local/hard drive and cloud)
  • ONE copy in a different location (Cornell, 2022). 
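To make the three conditions concrete, here is a minimal Python sketch that checks a list of copies against them. The `Copy` record, its fields, and the sample devices are hypothetical; the sketch assumes you keep a simple inventory of where each copy of a file lives, and, following the list above, it treats a laptop and an external drive as the same media type (hard drive).

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One stored copy of a file (hypothetical inventory record)."""
    device: str    # e.g. "laptop", "external drive", "GW Box"
    media: str     # storage type, e.g. "hard drive" or "cloud"
    location: str  # physical site, e.g. "office" or "offsite"


def follows_rule_of_three(copies: list[Copy], primary_location: str) -> bool:
    """Check for THREE copies, TWO media types, and ONE copy elsewhere."""
    three_copies = len({c.device for c in copies}) >= 3  # count distinct devices
    two_media = len({c.media for c in copies}) >= 2
    one_elsewhere = any(c.location != primary_location for c in copies)
    return three_copies and two_media and one_elsewhere


copies = [
    Copy("laptop", "hard drive", "office"),          # here
    Copy("external drive", "hard drive", "office"),  # near
    Copy("GW Box", "cloud", "offsite"),              # far
]
print(follows_rule_of_three(copies, "office"))      # True
print(follows_rule_of_three(copies[:2], "office"))  # False: two copies, one media type, none offsite
```

Counting distinct devices rather than raw entries is a deliberate choice: three copies on the same laptop would not protect you if that laptop failed.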

Another way to think about the rule of three is “here, near, and far.” In this model, you still keep at least three copies of your data. Keep one copy “here”: a local copy on your laptop or desktop computer. Keep a second copy “near”: a copy on a different device, such as an external hard drive or a network drive. And keep a third copy “far”: a copy in a geographically different location, such as the cloud (Cornell, 2022). This strategy ensures that if you lose one copy of a file, you still have it in other locations. However, be sure to update every copy after each change or edit to a file. Multiple copies only help if each one holds the most current version.
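One way to confirm that every copy is current is to compare file checksums. The sketch below is an illustration rather than a prescribed tool: it hashes the primary file with SHA-256 and reports any copy that is missing or differs. The file names `here.csv`, `near.csv`, and `far.csv` are hypothetical stand-ins for your three locations.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(primary: Path, copies: list[Path]) -> list[Path]:
    """Return copies that are missing or whose contents differ from the primary file."""
    reference = sha256_of(primary)
    return [c for c in copies if not c.is_file() or sha256_of(c) != reference]


with tempfile.TemporaryDirectory() as tmp:
    here, near, far = (Path(tmp) / name for name in ("here.csv", "near.csv", "far.csv"))
    here.write_text("id,value\n1,42\n2,43\n")  # latest version
    near.write_text("id,value\n1,42\n2,43\n")  # updated after the last edit
    far.write_text("id,value\n1,42\n")         # never updated
    outdated = [p.name for p in stale_copies(here, [near, far])]

print(outdated)  # ['far.csv']
```

Comparing contents rather than modification dates catches copies that were touched but never actually updated.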

Where Can You Store Files?

There are a variety of options for storing your research data and files. Local hardware, such as your desktop or laptop computer, and external storage devices, such as external hard drives, can be convenient. However, relying on them is risky because these devices can be damaged, lost, stolen, or become obsolete (Himmelfarb Health Sciences Library [Himmelfarb], 2022). To help prevent theft, damage, or loss, store external hard drives away from your computer (WSU Libraries, 2022). Network drives are typically very stable and secure because they are managed by your institution’s IT division (WSU Libraries, 2022). Be aware, however, that many network drives have size restrictions that make them impractical if you generate large amounts of data.

Remote storage, also known as cloud storage, keeps your files on remotely located servers. This option often costs money, and it’s important to read and understand the terms of service before storing your data in the cloud. Many funders and institutions require sensitive data to be stored on cloud services whose servers are located in the United States, so be sure to find out where the servers are before saving your files (Himmelfarb, 2022). GW Box is the university’s enterprise file-sharing service for online cloud storage and collaboration, and it is free for all GW students, faculty, and staff. Cloud storage such as Google Drive and Box can be synchronized with your computer, which makes backing up your files easy (WSU Libraries, 2022). If your research requires high-performance computing for data analysis, GW High-Performance Computing could be a good solution. Another cloud option is the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. More information about the NIH STRIDES Initiative is available through the NIH Office of Data Science Strategy.

Storage Formats

For long-term storage, it’s best to use formats that are unencrypted and uncompressed so the files will remain readable in the future. Formats that are open, well-documented, and widely used will help ensure your files will remain accessible and usable in the long term (Himmelfarb, 2022). Preferred file formats include:

  • Text: DOCX, ODT, PDF
  • Databases: XML, SQLite
  • Tabulated data: CSV
  • Images: PNG, JPEG, TIFF
  • Sound: MP3, WAVE
  • Video: MP4
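If you want to check an existing project folder against this list, a short script can flag files whose extensions fall outside it. This is a minimal sketch; the extension set is an assumption (it adds common variants such as `.jpg`, `.tif`, and `.wav`), and it checks only file names, not the actual contents of each file.

```python
import tempfile
from pathlib import Path

# Extensions for the preferred formats listed above, plus common variants (an assumption).
PREFERRED = {
    ".docx", ".odt", ".pdf",                   # text
    ".xml", ".sqlite",                         # databases
    ".csv",                                    # tabulated data
    ".png", ".jpeg", ".jpg", ".tif", ".tiff",  # images
    ".mp3", ".wav",                            # sound
    ".mp4",                                    # video
}


def non_preferred_files(folder: Path) -> list[Path]:
    """List files under `folder` whose extension is not in PREFERRED (case-insensitive)."""
    return sorted(p for p in folder.rglob("*")
                  if p.is_file() and p.suffix.lower() not in PREFERRED)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "results.csv").write_text("a,b\n1,2\n")
    (root / "scan.TIFF").write_bytes(b"")
    (root / "analysis.xlsx").write_bytes(b"")
    flagged = [p.name for p in non_preferred_files(root)]

print(flagged)  # ['analysis.xlsx']
```

A flagged file is not necessarily a problem; it is a prompt to consider saving an additional copy in an open format, such as exporting a spreadsheet to CSV.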

Data Security

Data security is a key concern when storing your files, especially when they contain potentially sensitive information. It’s important to think carefully about data security and address it in your data management plan. Some important considerations include:

  • Who will be responsible for storing and backing up your data? How frequently will this be done?
  • How will you manage access to your data? Consider physical access to hardware. Where will you store computers and external hard drives? Will these be password protected?
  • How will you secure hardware for locally stored data? Will you use firewalls? How will you update antivirus protection? Who will update the software and how often?
  • How will you maintain the integrity of your data? Will you use encryption, watermarking, or digital signatures?

(Himmelfarb, 2022)
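For the integrity question, a checksum manifest is a simple, widely used technique. Note that it is not the same as a digital signature: it detects accidental changes such as file corruption, but it does not prove who created a file or stop someone from deliberately altering both the file and the manifest. The sketch below, with hypothetical function and file names, records a SHA-256 checksum for each file and later reports files that are missing or changed.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest: Path) -> None:
    """Record a checksum for every file under `folder` (keep the manifest outside it)."""
    sums = {p.relative_to(folder).as_posix(): file_sha256(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}
    manifest.write_text(json.dumps(sums, indent=2))


def verify_manifest(folder: Path, manifest: Path) -> list[str]:
    """Return files listed in the manifest that are now missing or changed."""
    sums = json.loads(manifest.read_text())
    return [name for name, expected in sums.items()
            if not (folder / name).is_file() or file_sha256(folder / name) != expected]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "survey.csv").write_text("id,score\n1,5\n")
    (data / "notes.txt").write_text("pilot run\n")
    manifest = Path(tmp) / "manifest.json"
    write_manifest(data, manifest)
    (data / "survey.csv").write_text("id,score\n1,9\n")  # simulate silent corruption
    changed = verify_manifest(data, manifest)

print(changed)  # ['survey.csv']
```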

What’s Your Backup Plan?

Creating a backup plan helps protect you against data loss from disasters such as fire or flood, theft, unauthorized use, or hardware and software malfunctions (Himmelfarb, 2022). Following the rule of three described above provides strong protection through multiple copies in a variety of locations. It’s also highly recommended that you learn how to recover data from your backups before an emergency forces you to (Cornell, 2022). Keeping two backups of your data, one local on a device other than your main workstation and another remote, is a great way to back up your data.
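A restore drill is one way to practice recovery before an emergency. The minimal sketch below assumes your backup is a folder you can copy; `shutil.copytree` stands in for whatever restore step your backup service actually uses. It restores into a scratch folder, then reports files that are missing, extra, or different compared with the original, using byte-by-byte comparison (`filecmp.cmp` with `shallow=False`).

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def restore_drill(backup: Path, original: Path) -> list[str]:
    """Restore `backup` into a scratch folder and list files that do not match `original`."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup, restored)  # stand-in for your real restore step
        originals = {p.relative_to(original) for p in original.rglob("*") if p.is_file()}
        recovered = {p.relative_to(restored) for p in restored.rglob("*") if p.is_file()}
        # Files present on only one side are missing from, or extra in, the backup.
        problems = sorted(p.as_posix() for p in originals ^ recovered)
        # Files on both sides are compared byte by byte, not just by size and date.
        for rel in sorted(originals & recovered):
            if not filecmp.cmp(original / rel, restored / rel, shallow=False):
                problems.append(rel.as_posix())
        return problems


with tempfile.TemporaryDirectory() as tmp:
    original, backup = Path(tmp) / "project", Path(tmp) / "backup"
    for folder in (original, backup):
        folder.mkdir()
        (folder / "survey.csv").write_text("id,score\n1,5\n")
    (original / "codebook.txt").write_text("score: 1 to 5\n")  # added after the last backup
    problems = restore_drill(backup, original)

print(problems)  # ['codebook.txt']
```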

Having a regular backup routine is also important. A full backup copies every file each time you back up, which allows you to retrieve all of your data if you need to. However, this method takes a lot of time and resources. Another option is incremental backups, in which you back up only the files that have been edited or changed since your last backup (Himmelfarb, 2022).
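The difference between full and incremental backups can be sketched in a few lines of Python. This illustrative function (hypothetical, and not a replacement for dedicated backup software) copies only files whose modification time is later than the previous backup; passing a timestamp of 0 copies everything, which amounts to a full backup. It does not handle deleted or renamed files, which real backup tools track.

```python
import os
import shutil
import tempfile
from pathlib import Path


def incremental_backup(source: Path, dest: Path, last_backup: float) -> list[Path]:
    """Copy files under `source` modified after `last_backup` (a Unix timestamp) into `dest`.

    last_backup=0.0 copies every file, which amounts to a full backup.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_backup:
            target = dest / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves the modification time
            copied.append(path.relative_to(source))
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "project", Path(tmp) / "backup"
    src.mkdir()
    (src / "old.csv").write_text("unchanged\n")
    (src / "new.csv").write_text("edited since last backup\n")
    os.utime(src / "old.csv", (1_000, 1_000))  # pretend timestamps for the demo
    os.utime(src / "new.csv", (5_000, 5_000))
    full = [str(p) for p in incremental_backup(src, dst, last_backup=0.0)]
    incremental = [str(p) for p in incremental_backup(src, dst, last_backup=2_000.0)]

print(full)         # ['new.csv', 'old.csv']
print(incremental)  # ['new.csv']
```

In practice you would record the time each backup run starts and pass it in on the next run.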

To Learn More:

To learn more about storage options, take a look at the Storage Options page of our NIH Data Management and Sharing Plan Research Guide.

References:

Cornell University. (2022, June). Data storage and backup. Research data management service group: Comprehensive data management planning & services. https://data.research.cornell.edu/content/data-storage-and-backup

Himmelfarb Health Sciences Library. (2022, November 14). Storage options. NIH data management & sharing plan (DMSP) research guide. https://guides.himmelfarb.gwu.edu/NIHDMSPolicy/storage-options

Washington State University Libraries. (2022, January 10). Data storage & backup. Research data management. https://libguides.libraries.wsu.edu/rdmlibguide/datastorage
