
File Naming Conventions

This is the third article in a series on the changes to the NIH Data Management and Sharing policies that will come into effect for NIH grant applications starting January 2023. For more information, see our first article for a general overview of data management resources and our second article for best practices on writing a data management plan.

With the new NIH Data Management and Sharing Policy scheduled to take effect in January 2023, file naming conventions are an important piece of the data management puzzle. The policy encourages project teams to agree on naming conventions for objects and files and to follow file naming best practices. This post explores current best practices for file naming conventions.

Why Use Standardized File Naming Conventions?

Creating standardized file naming conventions is an important part of the research process. Standard file names keep your research organized and ensure that files can be easily located and identified by everyone in the research group. They also help future users find and understand the data after the project has ended. Standardized, descriptive file names streamline the workflow because users can identify the contents of a file without having to open it (University of Michigan Library, 2022).

The best time to develop a file naming convention is before you start your research project. Having a file naming convention in place before you start the project will prevent your project from having a backlog of unorganized files, which can lead to misplaced or lost data (Longwood Research Data Management, 2022). Your research group should decide at the outset of your project what naming conventions will be used. Once a file naming convention has been agreed upon by the research group, it must be consistently followed by all members of the group. If the naming convention isn’t followed, data could become difficult to find, making it unusable. 

What Should Be Included in File Names?

File names should be descriptive enough to capture relevant information about the file, so try to build two or three salient characteristics of the project and dataset into each file name (University of Michigan Library, 2022). Think about the types of files you’ll be working with and the types of information each file will contain when developing your file naming convention. For example, what groups of files will your naming convention cover? Are different naming conventions needed for different sets of files? Does your group, department, or discipline already have file naming conventions in place which could be used? 

It’s also a great idea to think about the metadata you’d like to include in each file name. Consider what information should be included to allow users to easily and quickly locate or search for a needed file. Since computers arrange files by name, character by character, it’s a good idea to put the most important information at the beginning of the file name. If finding information by date is a priority, start each file name with a date (see the Standardizing Dates section below for more information on using dates in file names). If the type of data is the most important piece of information, start each file name with the type of data instead.

Consider including the following pieces of information in your naming convention structure:

  • Unique identifiers (such as a grant number)
  • Project, study, or experiment name or acronym
  • Location information (such as spatial coordinates)
  • Researcher initials
  • Date or date range (in a standardized format)
  • Experimental conditions (such as instrument, temperature, etc.)
  • Version number (see the Use Version Control! section below)
  • Type of data (image, dataset, samples, etc.)
  • File type, or file extension
  • Lab name or location
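
As a rough illustration of how a few of these elements might be combined, here is a minimal Python sketch. The function name `build_file_name`, the choice of elements, their order, and the sample values are all assumptions made for this example. Your group's convention may use different elements or a different order.

```python
from datetime import date


def build_file_name(project, data_type, collected_on, version, extension, separator="_"):
    """Join metadata elements into one file name (hypothetical helper).

    The element order here (project, data type, date, version) is only an
    example; put whatever matters most to your group first.
    """
    parts = [
        project,
        data_type,
        collected_on.strftime("%Y%m%d"),  # standardized date, YYYYMMDD
        f"v{version:03d}",                # version number with leading zeros
    ]
    return separator.join(parts) + "." + extension.lstrip(".")


# "SLEEP" and "survey" are made-up values for illustration.
example = build_file_name("SLEEP", "survey", date(2022, 10, 21), 1, "csv")
print(example)  # SLEEP_survey_20221021_v001.csv
```

Generating names from a single function like this is one way to keep every member of the group following the same pattern.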

What Should Be Avoided in File Names?

While many file naming best practices revolve around what should be included in a file name, there are also best practices related to what should not be included in file names. Here are the top three things to avoid in your file naming conventions:

  • Spaces: While separating metadata elements is a common practice, avoid using spaces as the separator. Use dashes or underscores instead. For example, instead of File Name.xxx, use File-Name.xxx or File_Name.xxx. You could also skip the separator entirely and use camel case to eliminate spaces: FileName.xxx
  • Special Characters: Avoid using special characters such as @ # $ % & * in file names. Limit file names to alphanumeric characters, plus the dashes or underscores used as separators.
  • Long File Names: In general, file names should be kept to 30 characters or fewer. Shorter file names make it easier for users to identify the contents of the file. Longer file names may not be readable by some software programs.
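
The three rules above can be checked automatically. The following Python sketch is one possible approach, not a standard tool: the function name `check_file_name` and the exact set of allowed characters (letters, digits, dash, underscore, and the dot before the extension) are assumptions for this example.

```python
import re

MAX_LENGTH = 30  # the general guideline given above

# Anything other than letters, digits, dot, underscore, dash, or space
# counts as a special character here (spaces are reported separately).
SPECIAL_CHARACTERS = re.compile(r"[^A-Za-z0-9._\- ]")


def check_file_name(name):
    """Return a list of problems found in a file name (hypothetical helper)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if SPECIAL_CHARACTERS.search(name):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(check_file_name("File Name.xxx"))  # ['contains spaces']
print(check_file_name("File_Name.xxx"))  # []
```

A check like this could be run over a project folder from time to time to catch names that drift from the convention.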

Standardizing Dates

When including dates in file names, following the International Organization for Standardization (ISO) date standard, ISO 8601, is generally considered the best practice. Dates should be formatted starting with the four-digit year, followed by the two-digit month and the two-digit day:

  • YYYYMMDD (ex: 20221021)
  • YYYY-MM-DD (ex: 2022-10-21)
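
Both formats can be produced with Python's standard `datetime` module. This short sketch also shows why the year-first order is useful: sorting the names alphabetically puts them in chronological order. The file names are made up for illustration.

```python
from datetime import date

d = date(2022, 10, 21)

compact = d.strftime("%Y%m%d")  # YYYYMMDD
extended = d.isoformat()        # YYYY-MM-DD

print(compact)   # 20221021
print(extended)  # 2022-10-21

# Year-first dates sort chronologically when sorted as plain text.
names = ["File_20221021.csv", "File_20220213.csv", "File_20220601.csv"]
print(sorted(names))
# ['File_20220213.csv', 'File_20220601.csv', 'File_20221021.csv']
```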

Use Version Control!

Many research projects involve creating and maintaining multiple versions of the same file. If this is the case for your research project, be sure to use versioning to indicate the most current version of files. Using file versioning not only helps you keep track of which file is the most recent update, but it also provides you with the ability to revert data to an earlier version without starting from scratch or having to regenerate data (Cornell University, 2022). 

Some tools, such as electronic lab notebooks or Box, allow you to assign version numbers, but you can also build version control into your file naming convention by adding version information to the end of each file name. Here’s an example:

  • File_Name_v001.xxx
  • File_Name_v002.xxx
  • File_Name_v003.xxx

You can also use the date to indicate the version:

  • File_Name_20220213.xxx
  • File_Name_20220321.xxx
  • File_Name_20220601.xxx

Avoid using ambiguous labels, such as “revision” or “final”, in your file names. It’s also a good idea to save a copy of your original raw data and leave it untouched, so that you always have a safe copy to return to.
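
If your group uses the numbered scheme shown above, the next version name can be worked out from the names that already exist. The sketch below is one way to do that. The function name `next_version_name` and the assumption that versions always look like `_v` followed by three digits are choices made for this example.

```python
import re

# Matches names such as File_Name_v001.xxx (assumed pattern for this sketch).
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d{3})\.(?P<ext>[A-Za-z0-9]+)$")


def next_version_name(existing_names, stem, ext):
    """Return the next versioned file name for a given stem (hypothetical helper)."""
    versions = []
    for name in existing_names:
        match = VERSION_PATTERN.match(name)
        if match and match["stem"] == stem and match["ext"] == ext:
            versions.append(int(match["num"]))
    next_number = max(versions, default=0) + 1  # start at 1 if nothing exists yet
    return f"{stem}_v{next_number:03d}.{ext}"


existing = ["File_Name_v001.xxx", "File_Name_v002.xxx", "File_Name_v003.xxx"]
print(next_version_name(existing, "File_Name", "xxx"))  # File_Name_v004.xxx
```

Saving each change under a new name, rather than overwriting the previous file, is what makes it possible to revert to an earlier version later.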

Standardized Numbers - Use a Leading 0!

If sequential numbering is part of your file naming structure, use leading zeros. For example, instead of using 1, 2, 3, use 001, 002, 003. Because computers sort file names character by character, this ensures that file 002 is listed before file 010, so your files stay in numerical order. This applies to version control numbering as well.
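
The effect is easy to see with a short Python example that sorts two sets of made-up file names as plain text, which is how most file browsers order them.

```python
numbers = (1, 2, 10)

unpadded = [f"File_Name_{n}.xxx" for n in numbers]
padded = [f"File_Name_{n:03d}.xxx" for n in numbers]  # :03d adds leading zeros

print(sorted(unpadded))
# ['File_Name_1.xxx', 'File_Name_10.xxx', 'File_Name_2.xxx']

print(sorted(padded))
# ['File_Name_001.xxx', 'File_Name_002.xxx', 'File_Name_010.xxx']
```

Without leading zeros, file 10 lands between files 1 and 2. With them, the text order matches the numerical order.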

Directory Structure Naming Conventions

File naming conventions don’t just apply to your files. Use the same best practices to structure your directory folders as well. Folder names should provide key information about the files stored within each folder. Be sure to include the project title, unique identifiers, and the date. It might be helpful to create a brief description of the content stored in major folders and to provide an overview of the directory structure in your documentation. The level of detail should be enough to help someone understand the contents and organization of the files.

Here’s a nice example: (Cornell University, 2022)

  • Top Folder: Study_name
    • Subfolder1: Study_name_Datasets
      • Study_name_2019-2020.csv
      • Study_name_2021-2022.csv
    • Subfolder2: Study_name_Semantic_analysis
      • Study_name_semantic_analysis.R
      • Study_name_semantic_analysis_output.csv
    • Readme File: Study_name_readme.txt
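
A structure like the example above can be set up in one step at the start of a project. This Python sketch uses the standard `pathlib` module. The function name `create_study_folders` and the placeholder readme text are assumptions for illustration, and the sketch writes into a temporary directory so that it can be run safely.

```python
import tempfile
from pathlib import Path


def create_study_folders(root, study_name):
    """Create the top folder, two subfolders, and a readme (hypothetical helper)."""
    top = Path(root) / study_name
    subfolders = [f"{study_name}_Datasets", f"{study_name}_Semantic_analysis"]
    for subfolder in subfolders:
        (top / subfolder).mkdir(parents=True, exist_ok=True)
    readme = top / f"{study_name}_readme.txt"
    # Placeholder text only; replace with a description of your own convention.
    readme.write_text("Describe the folder structure and naming convention here.\n")
    return top


# A temporary directory stands in for your real project location.
with tempfile.TemporaryDirectory() as tmp:
    top = create_study_folders(tmp, "Study_name")
    created = sorted(p.name for p in top.iterdir())
    print(created)
```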

Document Naming Conventions

Be sure to document each file naming convention in a top-level readme file. This file should include instructions for navigating the structure so that others involved in the research project, and others who might use the data once the project is complete, can follow the naming conventions used. This file can be a README.txt file and should be kept with your files.

File naming conventions are an essential part of any research project! Be sure to take the time to create a file naming convention that will help keep your files organized, easily findable, and usable by your research team and any others who may look at your data once your project is finished. Stay tuned for future posts on best practices related to other data management topics!

References:

Cornell University. (June 2022). File management. Research Data Management Service Group: Comprehensive Data Management Planning & Services. https://data.research.cornell.edu/content/file-management

Longwood Research Data Management: Harvard Medical School. (2022). File Naming Conventions. Data Management, Harvard Medical School. https://datamanagement.hms.harvard.edu/collect/file-naming-conventions

University of Michigan Library. (September 14, 2022). File-naming conventions. LabArchives: Best practices for research data management. https://guides.lib.umich.edu/c.php?g=739306&p=5286418
