In January the NIH implemented new policies requiring that research data be managed, archived, and shared using a data management and sharing plan that must be submitted as part of any new grant application. These policies encourage data re-use and reproducibility, increase transparency, and enable researchers to build on previous work.
Himmelfarb Library’s NIH Data Management and Sharing Plan (DMSP) Research Guide brings data management and sharing services and resources together for easy reference and instruction. The guide can step you through the process of determining what data needs to be shared and archived, putting together a data management plan, finding a data storage solution, and/or an open data repository for sharing.
Templates for data management plans are helpful development tools. DMPTool provides a variety of templates, including the NIH_GEN DMSP (2023) template specifically for NIH funding. You can find it and other sources for templates on the DMSP guide Getting Started tab. NIH recently released 13 additional sample templates on its website, including templates for genomic and survey data. The Survey and Interview Data (Sample Plan M) includes language related to data that can't be shared.
Finding an appropriate open data repository for storage and sharing can be a challenge. The NIH-supported Scientific Data Repositories site is useful for finding specialized repositories. For more generalist data, NIH began the Generalist Repository Ecosystem Initiative (GREI) and has partnered with seven organizations that offer open repositories, including figshare, Mendeley Data, OSF, and DRYAD. More information about these repositories, including recorded webinars, is on NIH’s GREI website.
The June R01 deadline has just passed, meaning that the next submission date is in October. If you’re planning to apply for NIH funding, don’t put off work on a data management and sharing plan! Start now and reach out to data specialists at GW with your questions. Sara Hoover, Metadata and Scholarly Publishing Librarian is the contact at Himmelfarb Library. You can reach Sara at shoover@gwu.edu. Additionally, Gelman Library offers data management consultation services. Librarians can answer your questions or refer them to other University research services for assistance, including the OVPR, Office of Sponsored Projects, the Office of Research Integrity, and the Office of Clinical Research.
This is the sixth article on the changes to the NIH Data Management and Sharing policies that will go into effect for NIH grant applications starting January 2023. For more information, see our previous articles on data management.
Broadly speaking, file management pertains to the organization, access, storage and retrieval of documents, folders and information. When creating a research plan, it is important to spend time reflecting on your file management system to avoid future complications such as losing a necessary document or being locked out of a folder. A proper file management system will ensure that all documents are correctly labeled, stored and available to be viewed by project team members.
The Lamar Soutter Library states that file management consists of “Structuring the hierarchical organization of file folders in a logical and clear way; Planning for the syntax and vocabulary of individual file names; [and] Using agreed-upon conventions consistently.” (Lamar Soutter Library, 2022) A file management system clarifies steps for preserving research data in a manner that is accessible to all research team members.
You should create your file management system before engaging in your research. Spend as much time as necessary establishing and documenting a routine, folder hierarchy, file naming conventions and other information. Here are some points to consider when creating your file management system:
Think about the goal and purpose of file management for your research: Having a goal in mind will allow you to determine the best way to manage your files and ensure that you are in compliance with the new NIH Data Management and Sharing policies. A goal will also allow you to eliminate any confusion surrounding file management or the software or hardware you will use to maintain your research files.
Seek input from all research team members: If multiple people are involved in the research project, seek their input on how they manage their personal files. Learn which tools and resources they are accustomed to. Form a consensus on which software and file management structure the team will use.
“Develop a nested folder structure that makes the most sense for your project and team’s retrieval needs” (Lamar Soutter Library, 2022) : There are multiple ways to organize a nested folder structure. If the research project will occur over a long period of time, consider creating a “base” folder with the project name and adding additional folders based on the year, month or quarter for the project. You may also consider nesting folders based on the type of information. For example, if you conduct a survey, separate the responses into different folders based on whether the survey occurred in person, over the phone, via email or through another form of communication. Think about your research plan and what types of information you will or may gather, then create corresponding folders. Be sure to create folders that may be of use in the future so you can immediately organize that information without disrupting your current file management system.
Once a system has been established, make sure all team members have access: No matter how you decide to store your files, take time to ensure all project members can access the folders and documents that they need. If there are passwords attached to folders or documents, store those passwords in a secure location that members can locate.
Create a reference file management document: Once you and your team have established a file management system, create a separate document that notes the systems and software you plan to use, folder passwords, file naming conventions and other relevant information. Store this document in a location that is accessible and use it as a reference over the course of the research project. If changes are made to the file management system, be sure to update this document to reflect the new changes.
Resources are available to guide you through the file management development and maintenance process. Read our previous NIH Data Management and Sharing policies articles to gain valuable information on the new policy and how to comply with it. Our File Naming Conventions article offers insight on how to name your files in a clear and consistent manner and our File Storage and Backup Best Practices article discusses storage tools, the importance of saving your files in multiple locations and data security.
Lastly, if you’re unsure of where to begin with creating a file management system MIT Libraries’ Data Management Guide provides a worksheet that walks you through the process of creating a file management hierarchy. Follow the steps from the very beginning or pick sections from the worksheet to help you develop a file management system for your research project.
File management may feel like a daunting task. By reflecting on the goals of your management system and developing a plan before collecting data, you will avoid losing research data or navigating unorganized folders and files. The staff at Himmelfarb Library are here to help you understand file management or any topic related to the new NIH policy. Be sure to read our previous articles or browse our NIH Data Management & Sharing Plan (DMSP) Research Guide. Continue to follow our Himmelfarb Library News site for future data management articles!
This is the fifth article in a series on the changes to the NIH Data Management and Sharing policies that will go into effect for NIH grant applications starting January 2023. For more information, see our previous articles on data management.
File storage is an important piece of data management. While conducting your research, you’ll be saving and accessing your data often, so thinking about where and how to store this data before you begin your research is important. However, keep in mind that data storage is different from data preservation. Data storage addresses storage options during the active research process, while data preservation deals with the long-term storage of research data following the completion of a research project (Washington State University Libraries [WSU Libraries], 2022). And remember - backing up your files is an important piece of file storage.
The Rule of Three
When it comes to file storage, the best practice is to follow the “rule of three:”
THREE copies of every file
TWO different media types (i.e. types of storage such as local/hard drive and cloud)
ONE copy in a different location (Cornell, 2022).
Another way to think about the rule of three is “here, near, and far.” In this model of the rule of three, you’ll still want to keep at least three copies of your data. Keep one copy “here” - a local copy on your laptop or desktop computer. Keep a second copy “near” - an external copy on a different device (such as an external hard drive or a network drive). And keep a third copy “far” - an external copy in a geographically different location, such as in the cloud (Cornell, 2022). This strategy will ensure that if you lose a copy of a file, you will have it in other locations. However, be sure to save all files in all locations after every change or edit to a document. Having files in multiple locations is only helpful if each copy is updated with the most current version of the document.
Where Can You Store Files?
There are a variety of different options available when it comes to storing your research data and files. Local hardware, such as your desktop or laptop computer, and external storage devices such as external hard drives can be convenient options. However, this storage strategy can be risky due to the threat of damage, loss, theft, or obsolescence of these devices (Himmelfarb Health Sciences Library [Himmelfarb], 2022). In order to help prevent theft, damage, or loss, it’s best to store external hard drives away from your computer (WSU Libraries, 2022). Network drives are typically very stable and secure since they are controlled by your institution's IT division (WSU Libraries, 2022). However, be aware that many network drives have size restrictions that make it an unrealistic option if you create large amounts of data.
Remote storage, also known as cloud storage, is an option that stores your files on remotely located servers. This option can often cost money and it’s important to read and understand the terms of service before storing your data on the cloud. Many funders and institutions require any sensitive data to be stored on cloud services whose servers are located in the United States, so be sure to investigate where the servers are before saving your files (Himmelfarb, 2022). GW Box is the university’s enterprise file-sharing service for online cloud storage and collaboration. GW Box is free for all GW students, faculty, and staff. Cloud storage such as Google Drive and Box can be synchronized with your computer, which makes backing up your files easy (WSU Libraries, 2022). If your research required high-performance computing for data analysis, GW High-Performance Computing could be a good solution. Another cloud option is the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability Initiative, also known as NIH STRIDES. More information about NIH STRIDES initiative is available through the NIH Office of Data Science Strategy.
Storage Formats
For long-term storage, it’s best to use formats that are unencrypted and uncompressed so the files will remain readable in the future. Formats that are open, well-documented, and widely used will help ensure your files will remain accessible and usable in the long term (Himmelfarb, 2022). Preferred file formats include:
Text: DOCX, ODT, PDF
Databases: XML, SQLite
Tabulated data: CSV
Images: PNG, JPEG, TIFF
Sound: MP3, WAVE
Video: MP4
Data Security
Data security is a key concern when it comes to storing your files, especially when data in your files contain potentially sensitive information. It’s important to think carefully about and include data security in your data management plan. Some important considerations include:
Who will be responsible for storing and backing up your data? How frequently will this be done?
How will you manage access to your data? Consider physical access to hardware. Where will you store computers and external hard drives? Will these be password protected?
How will you secure hardware for locally stored data? Will you use firewalls? How will you update antivirus protection? Who will update the software and how often?
How will you keep the integrity of your data? Will you use encryption, watermarking, or digital signatures?
(Himmelfarb, 2022)
What’s Your Backup Plan?
Creating a backup plan for your files will prevent the loss of data in the event of losing files or data from disasters such as fire or flood, theft, unauthorized use, or hardware/software malfunctions (Himmelfarb, 2022). Following the rule of three described above provides a great level of protection through multiple copies in a variety of locations. Knowing how to recover data from your backups before you need to in an emergency is also highly recommended (Cornell, 2022). Having two backups of your data, one locally on a device other than your main workstation, and another remotely is a great way to backup your data.
Having a regular backup routine is also important. A full backup, backing up each file every time you do a backup, allows you to retrieve all of your data if you need to do so. However, this method takes a lot of time and resources. Another option is to do incremental backups. During incremental backups, you only need to back up files that have been edited or changed since your last backup (Himmelfarb, 2022).
With the new 2023 NIH Data Management and Sharing Policy scheduled to take effect in January 2023, file naming conventions are an important piece of the data management puzzle. This new policy encourages project teams to agree on file naming conventions for objects and files and follow file naming convention best practices. This post will explore current best practices for file naming conventions.
Why Use Standardized File Naming Conventions?
Creating standardized file naming conventions is an important part of the research process. Standard file names are a great way to keep your research organized while ensuring that files can be easily located and identified by everyone in the research group. Using standard file naming conventions will also help future users find and understand the data after the project has ended. Using standardized and descriptive file names will help streamline the workflow by helping users easily identify the contents of a file without having to open the file (Univ. of Michigan Library, 2022).
The best time to develop a file naming convention is before you start your research project. Having a file naming convention in place before you start the project will prevent your project from having a backlog of unorganized files, which can lead to misplaced or lost data (Longwood Research Data Management, 2022). Your research group should decide at the outset of your project what naming conventions will be used. Once a file naming convention has been agreed upon by the research group, it must be consistently followed by all members of the group. If the naming convention isn’t followed, data could become difficult to find, making it unusable.
What Should Be Included in File Names?
File names should be descriptive enough to capture relevant information about the file, so try to build two or three salient characteristics of the project and dataset into each file name (University of Michigan Library, 2022). Think about the types of files you’ll be working with and the types of information each file will contain when developing your file naming convention. For example, what groups of files will your naming convention cover? Are different naming conventions needed for different sets of files? Does your group, department, or discipline already have file naming conventions in place which could be used?
It’s also a great idea to think about the metadata you’d like to include in each file name. Consider what information should be included to allow users to easily and quickly locate or search for a needed file. Since computers arrange files by name, character by character, it’s a good idea to put the most important information at the beginning of the file name. If finding information by date is a priority, start each file name with a date (see the Standardized Dates section below for more information on using dates in file names). If the type of data is the most important piece of information, start each file name with the type of information instead.
Consider including the following pieces of information in your naming convention structure:
Unique identifiers (such as a grant number)
Project, study, or experiment name or acronym
Location information (such as spatial coordinates)
Researcher initials
Date or date range (in a standardized format)
Experimental conditions (such as instrument, temperature, etc.)
Version number (more information below in the Use Versioning section below)
Type of data (image, dataset, samples, etc.)
Family type, or file extension
Lab name or location
What Should be Avoided in File Names?
While many file naming best practices revolve around what should be included in a file name, there are also best practices related to what should not be included in file names. Here are the top three things to avoid in your file naming conventions:
Spaces: While separating metadata elements is a common practice, avoid using spaces to separate each element. Consider using dashes or underscores instead of spaces. For example, instead of using File Name.xxx, consider using File-Name.xxx or File_Name.xxx instead. You could also consider not separating metadata at all, and using Camel Case to eliminate spaces: FileName.xxx
Special Characters: Avoid using special characters such as @ # $ % & * in file names. Limit file names to alphanumeric characters.
Long File Names: In general, file names should be kept to 30 characters or less. Shorter file names will make it easier for users to identify the contents of the file. Longer file names may not be readable by software programs.
Standardizing Dates
When including dates in file names, using International Organization of Standardization (ISO) standards is generally considered to be the best practice. Dates should be formatted starting with the four-digit year, followed by the two-digit month, and two-digit day:
YYYYMMDD (ex: 20221021)
YYYY-MM-DD (ex: 2022-10-21)
Use Version Control!
Many research projects involve creating and maintaining multiple versions of the same file. If this is the case for your research project, be sure to use versioning to indicate the most current version of files. Using file versioning not only helps you keep track of which file is the most recent update, but it also provides you with the ability to revert data to an earlier version without starting from scratch or having to regenerate data (Cornell University, 2022).
Some tools such as electronic lab notebooks or Box allow you to assign version numbers, but you can create version control by building versioning into your file naming convention. You can track versions by adding version information to the end of a file name. Here’s an example:
File_Name_v001.xxx
File_Name_v002.xxx
File_Name_v003.xxx
You can also include the date to indicate a version number:
File_Name_20220213.xxx
File_Name_20220321.xxx
File_Name_20220601.xxx
Avoid using ambiguous labels, such as “revision” or “final” in your file names. It’s also a good idea to save your original, untouched raw data and leave it that way. Having this raw data saved will allow you to always have the original data as a safe, untouched copy.
Standardized Numbers - Use a Leading 0!
If sequential numbering is part of your file naming structure, use leading zeros. For example, instead of using 1, 2, 3, use 001, 002, 003. This will ensure that your files will be sorted in an easily findable manner. This applies to version control numbering as well.
Directory Structure Naming Conventions
File naming conventions don’t just apply to your files, use the same best practices to structure your directory folders as well. Directory folders should provide key information about the file contents stored within each folder. Be sure to include the project title, unique identifiers, and the date. It might be helpful to create a brief description of the content stored in major folders and to provide an overview of the directory structure in your documentation. The level of detail included should be enough to help someone understand the contents and organization of the files.
Here’s a nice example: (Cornell University, 2022)
Top Folder: Study_name
Subfolder1: Study_name_Datasets
Study_name_2019-2020.csv
Study_name_2021-2022.csv
Subfolder2: Study_name_Semanitc_analysis
Study_name_semantic_analysis.R
Study_name_semantic_analysis_output.csv
Readme File: Study_name_readme.txt
Document Naming Conventions
Be sure to document each file naming convention in a top-level readme file. This file should include instructions for navigating the structure so that others involved in the research project, and others who might use this data once the project is complete can follow the naming conventions used. This file can be a README.txt file and should be kept with your files.
File naming conventions are an essential part of any research project! Be sure to take the time to create a file naming convention that will help keep your files organized, easily findable, and usable by your research team and any others who may look at your data once your project is finished. Stay tuned for future posts on best practices related to other data management topics!
With the 2023 NIH Data Management and Sharing Policy going into effect on January 25, 2023, there’s no better time to explore data management resources! This post explores resources that can help you with your data management needs.
What is data management?
Data management involves the process of collecting or producing, cleaning and analyzing, preserving, and sharing data from a research project. Data management takes place throughout the entire research life cycle, from deciding on consistent file naming conventions to depositing the data in a repository for long-term archiving.
Why Data Management?
Data management is vital for transparency (showing your work promotes reproducibility of work), compliance (funding organizations and journals often require making data available), and personal and organizational benefit (using data within your own lab is easier with proper management).
I Think It’s FAIR to Say…
Understanding data management best practices is important to make well-informed decisions when selecting data management resources and tools. The FAIR Principles, first published in 2016, provide a set of guidelines for data management. FAIR stands for Findable, Accessible, Interoperable, and Reusable. You can learn more about the FAIR Principles on our Data Management Guide. Another great resource to help guide your data management is Cornell University’s Research Data Management Service Group’s Comprehensive Data Management Planning and Services Best Practices which provides extensive information related to best practices for:
Himmelfarb’s Data Management Guide provides a wealth of information and resources related to data management. In addition to some basic information about data management, you’ll find information about NIH and NSF funder requirements. Data management plans (DMPs) are also covered in detail. The documentation and metadata page explains what metadata is, what should be included in your metadata, metadata schemas, controlled vocabularies, file naming conventions, and electronic lab notebooks. The data storage and security page includes data storage, storage formats, creating a backup plan, and data security. You’ll also learn about data sharing, including GW’s policy on regulated information, and data repositories.
I might need to make a plan for this…
Creating a data management plan (DMP) is often part of the grant writing process required by funding institutions. A comprehensive data management plan should address:
Data Collection: Must be reliable and valid.
Data Storage: Appropriate amount of data so research can be reproduced.
Data Analysis: Interpretation of data from which conclusions can be derived.
Data Protection: Ensuring sensitive data is safe and secure, preventing tampering or loss of data.
Data Ownership: Addresses legal rights associated with data.
Data Retention: Addresses how long data should be kept and proper disposal of sensitive data.
Data Reporting: Publication of data.
Data Sharing: Addresses what data can be shared with others and how.
When it comes to creating a DMP, there are a number of tools available to help! The DMPTool is a free, open-source tool that helps researchers create DMPs that comply with funder requirements. DMPTool also provides links to funder websites, and best practices resources to help guide your data management efforts. Since GW is affiliated with DMPTool, GW users can create a personalized dashboard that allows them to see and organize the DMPs created through the tool. From the DMPTool’s website, simply click “sign in” and use Option 1 to search for George Washington University. Then log in with your GW UserID and password and create your data management plan!
The Framework for Creating a Data Management Plan, created by ICPSR, is a great outline that will help you create a DMP for your grant application. The framework includes a list of elements to be included, explains why each element is important and provides examples for each element. Michener’s article Ten Simple Rules for Creating a Good Data Management Plan is another great starting point to gain an understanding of the principles and practices of creating a DMP and ensuring your data are safe and shareable. For more DMP resources and to see examples and templates, check out the Data Management Plan page of the data management guide.
What’s Next?
Stay tuned for future posts on best practices for writing a data management plan, data storage, file naming conventions, creating “readme” metadata, and other data management topics. In the meantime, check out the lists of GW resources and additional resources below to learn more!