Data Management – The Rotation: A Himmelfarb Library Blog

New Scholarly Publishing Video Tutorials Available!

jltJuly 21, 2023July 20, 2023

photo of coffee in teacup with open notebook, pen and laptop — *Image from pxfuel.com*

Himmelfarb Library’s Scholarly Communications Committee produces short tutorial videos on scholarly publishing and communications topics for SMHS, GWSPH, and GW School of Nursing students, faculty, and staff. Five new videos are now available on our YouTube channel and Scholarly Publishing Research Guide!

2023 NIH Data Management and Sharing Policy Resources by Sara Hoover - Sara is our resident expert on data management policy and resources. She provides an overview of the NIH policy, the essential elements of a data management and sharing plan, and highlights GW and non-GW resources that can aid you in putting together a data management and sharing plan. The video is 10 minutes in length.

Animal Research Alternatives by Paul Levett - Paul demonstrates how to conduct 3Rs alternatives literature searches for animal research protocols. He defines the 3Rs and explains how to report the search in the GW Institutional Animal Care and Use Committee (IACUC) application form. Paul is currently a member of the GW IACUC. The video is 13 minutes long.

Artificial Intelligence Tools and Citations by Brittany Smith - As a Library Science graduate student, Brittany has an interest in how AI is impacting the student experience. She discusses how tools like Chat GPT can assist with your research, the GW policy on AI, and how to create citations for these resources. The video is 6.5 minutes in length.

UN Sustainable Development Goals: Finding Publications by Stacy Brody - Stacy addresses why the goals were developed, what they hope to achieve, and shows ways to find related publications in Scopus. The video is 5 minutes long.

Updating Your Biosketch via SciEncv by Tom Harrod - Tom talks about the differences between NIH’s SciEncv and Biosketch and demonstrates how to use SciEncv to populate a Biosketch profile. Tom advises GW SMHS, School of Nursing, and GWSPH researchers on creating and maintaining research profiles and he and Sara provide research profile audit services. The video is 5 minutes long.

You can find the rest of the videos in the Scholarly Communications series in this YouTube playlist or on the Scholarly Publishing Research Guide.

Applying for NIH Funding? Learn How to Meet Data Management and Sharing Requirements

jltJune 7, 2023June 6, 2023

In January the NIH implemented new policies requiring that research data be managed, archived, and shared using a data management and sharing plan that must be submitted as part of any new grant application. These policies encourage data re-use and reproducibility, increase transparency, and enable researchers to build on previous work.

Himmelfarb Library’s NIH Data Management and Sharing Plan (DMSP) Research Guide brings data management and sharing services and resources together for easy reference and instruction. The guide can step you through the process of determining what data needs to be shared and archived, putting together a data management plan, finding a data storage solution, and/or an open data repository for sharing.

Templates for data management plans are helpful development tools. DMPTool provides a variety of templates, including the NIH_GEN DMSP (2023) template specifically for NIH funding. You can find it and other sources for templates on the DMSP guide Getting Started tab. NIH recently released 13 additional sample templates on its website, including templates for genomic and survey data. The Survey and Interview Data (Sample Plan M) includes language related to data that can't be shared.

Finding an appropriate open data repository for storage and sharing can be a challenge. The NIH-supported Scientific Data Repositories site is useful for finding specialized repositories. For more generalist data, NIH began the Generalist Repository Ecosystem Initiative (GREI) and has partnered with seven organizations that offer open repositories, including figshare, Mendeley Data, OSF, and DRYAD. More information about these repositories, including recorded webinars, is on NIH’s GREI website.

The June R01 deadline has just passed, meaning that the next submission date is in October. If you’re planning to apply for NIH funding, don’t put off work on a data management and sharing plan! Start now and reach out to data specialists at GW with your questions. Sara Hoover, Metadata and Scholarly Publishing Librarian is the contact at Himmelfarb Library. You can reach Sara at shoover@gwu.edu. Additionally, Gelman Library offers data management consultation services. Librarians can answer your questions or refer them to other University research services for assistance, including the OVPR, Office of Sponsored Projects, the Office of Research Integrity, and the Office of Clinical Research.

RefWorks Legacy Sunsetting

veejayApril 26, 2023January 10, 2024

Proquest will discontinue RefWorks Classic in June 2023. All users should move to the new and improved RefWorks interface. The upgraded version of the software has new functions, such as drag-and-drop functionality, enhanced sharing features, project management, and a PDF annotation function.

Instructions on how to upgrade your existing account to the new RefWorks interface can be found at RefWorks: Access. Also, Himmelfarb Library offers valuable information to assist users who are new to RefWorks. Review the library’s libguide RefWorks New to get started.

Emerging Open Science Platforms to Support Open Science Practices

jltMarch 27, 2023January 30, 2024

In 2021, UNESCO developed a Recommendation on Open Science to be adopted by member states. This recommendation evolved from a 2017 Recommendation on Science and Scientific Researchers promoting science as a common good.

UNESCO’s Recommendation on Science and Scientific Researchers, https://youtu.be/94T7NGirUlM

The Recommendation on Open Science includes a definition of open science:

Open science is … an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community. It comprises all scientific disciplines and aspects of scholarly practices, including basic and applied sciences, natural and social sciences and the humanities, and it builds on the following key pillars: open scientific knowledge, open science infrastructures, science communication, open engagement of societal actors and open dialogue with other knowledge systems.
(UNESCO, 2021.)

The recommendation calls for scientific publications, research data, software and source code to be open and available to scientists internationally. The NIH adopted an open access policy for manuscripts resulting from funded research in 2008. Early this year the NIH adopted a policy requiring that all newly funded grants include a data management and sharing plan to make the underlying research data freely available to other researchers. For more information on the data sharing policy and how to comply, explore our Research Guide.

Platforms that support the sharing and dissemination of research findings and their underlying data are becoming available. The Open Science Framework (OSF) is a “free, open platform to support your research and enable collaboration”. It provides tools to design a study, collect and analyze data, and publish and share results. OSF was designed and is maintained by the non-profit Center for Open Science.

A helpful feature of OSF is the ability to generate a unique, persistent URL (uniform resource locator) for a project for sharing and attribution. There is also built-in version control and collaborators can be assigned a hierarchical level of permissions for data and document management. Researchers can decide to make all or parts of a project public and searchable and add licensing. Public projects can be searched on the OSF site. Registering a project creates a timestamped version for preservation. Pre-prints can also be hosted and made available for searching.

OSF has integrations with a number of useful tools including storage add-ons like Amazon S3, Google Drive, DropBox and figshare. Zotero and Mendeley can be integrated for citation management and GitHub can be used for managing software and code.

Institutions can set up a custom landing page for OSF and build user communities to promote sharing and collaboration within the institution and beyond. Harvard, Johns Hopkins, and NYU are among the many research universities that are using OSF in this way.

Last month Nature and Code Ocean announced a partnership to launch and curate Open Science Library. The Open Science Library contains research software used in Nature journal articles. “Compute capsules” which include the code, data, and computing environment will allow researchers to reproduce results, re-use the code, and collaborate. As open science becomes the norm, more multifunction platforms that enhance sharing and reproducibility while preserving work and ensuring attribution will continue to emerge.

References:

UNESCO. (2021). UNESCO Recommendation on Open Science. https://unesdoc.unesco.org/ark:/48223/pf0000379949.locale=en

UNESCO. (2017). Consolidated Report on the Implementation by Member States of the 2017 Recommendation on Science and Scientific Researchers. https://unesdoc.unesco.org/ark:/48223/pf0000379704

Foster, E. D., & Deardorff, A. (2017). Open Science Framework (OSF). Journal of the Medical Library Association: JMLA, 105(2), 203–206. https://doi.org/10.5195/jmla.2017.88

Code Ocean. (2023). Code Ocean Partners with Nature Portfolio to Launch the Open Science Library with Ready-to-Run Software from Authors in Nature Journals (Press Release). https://codeocean.com/press-release/code-ocean-partners-with-nature-portfolio-open-science-library/

Creating and Maintaining a File Management System

bsmith91January 6, 2023January 5, 2023

This is the sixth article on the changes to the NIH Data Management and Sharing policies that will go into effect for NIH grant applications starting January 2023. For more information, see our previous articles on data management.

Broadly speaking, file management pertains to the organization, access, storage and retrieval of documents, folders and information. When creating a research plan, it is important to spend time reflecting on your file management system to avoid future complications such as losing a necessary document or being locked out of a folder. A proper file management system will ensure that all documents are correctly labeled, stored and available to be viewed by project team members.

The Lamar Soutter Library states that file management consists of “Structuring the hierarchical organization of file folders in a logical and clear way; Planning for the syntax and vocabulary of individual file names; [and] Using agreed-upon conventions consistently.” (Lamar Soutter Library, 2022) A file management system clarifies steps for preserving research data in a manner that is accessible to all research team members.

You should create your file management system before engaging in your research. Spend as much time as necessary establishing and documenting a routine, folder hierarchy, file naming conventions and other information. Here are some points to consider when creating your file management system:

Think about the goal and purpose of file management for your research: Having a goal in mind will allow you to determine the best way to manage your files and ensure that you are in compliance with the new NIH Data Management and Sharing policies. A goal will also allow you to eliminate any confusion surrounding file management or the software or hardware you will use to maintain your research files.
Seek input from all research team members: If multiple people are involved in the research project, seek their input on how they manage their personal files. Learn which tools and resources they are accustomed to. Form a consensus on which software and file management structure the team will use.
“Develop a nested folder structure that makes the most sense for your project and team’s retrieval needs” (Lamar Soutter Library, 2022) : There are multiple ways to organize a nested folder structure. If the research project will occur over a long period of time, consider creating a “base” folder with the project name and adding additional folders based on the year, month or quarter for the project. You may also consider nesting folders based on the type of information. For example, if you conduct a survey, separate the responses into different folders based on whether the survey occurred in person, over the phone, via email or through another form of communication. Think about your research plan and what types of information you will or may gather, then create corresponding folders. Be sure to create folders that may be of use in the future so you can immediately organize that information without disrupting your current file management system.
Once a system has been established, make sure all team members have access: No matter how you decide to store your files, take time to ensure all project members can access the folders and documents that they need. If there are passwords attached to folders or documents, store those passwords in a secure location that members can locate.
Create a reference file management document: Once you and your team have established a file management system, create a separate document that notes the systems and software you plan to use, folder passwords, file naming conventions and other relevant information. Store this document in a location that is accessible and use it as a reference over the course of the research project. If changes are made to the file management system, be sure to update this document to reflect the new changes.

Resources are available to guide you through the file management development and maintenance process. Read our previous NIH Data Management and Sharing policies articles to gain valuable information on the new policy and how to comply with it. Our File Naming Conventions article offers insight on how to name your files in a clear and consistent manner and our File Storage and Backup Best Practices article discusses storage tools, the importance of saving your files in multiple locations and data security.

GW IT has a breakdown on the different document management services available to GW faculty and staff. This webpage explains the difference between regulated, restricted and public data and has a guide to help you determine which service best meets your research needs.

Lastly, if you’re unsure of where to begin with creating a file management system MIT Libraries’ Data Management Guide provides a worksheet that walks you through the process of creating a file management hierarchy. Follow the steps from the very beginning or pick sections from the worksheet to help you develop a file management system for your research project.

File management may feel like a daunting task. By reflecting on the goals of your management system and developing a plan before collecting data, you will avoid losing research data or navigating unorganized folders and files. The staff at Himmelfarb Library are here to help you understand file management or any topic related to the new NIH policy. Be sure to read our previous articles or browse our NIH Data Management & Sharing Plan (DMSP) Research Guide. Continue to follow our Himmelfarb Library News site for future data management articles!

References:

MIT Libraries. (n.d.). Organize Your Files. Data Management. https://libraries.mit.edu/data-management/store/organize/

Lamar Soutter Library. (November 22, 2022). File Management. Research Data Management Resources. https://libraryguides.umassmed.edu/research_data_management_resources/file_management
Microsoft 365 Team. (June 15, 2021). 11 ideas for how to organize digital files. Microsoft: Business Insights and Ideas. https://www.microsoft.com/en-us/microsoft-365/business-insights-ideas/resources/11-ideas-for-how-to-organize-digital-files

File Storage and Backup Best Practices

Ruth BueterNovember 23, 2022November 22, 2022

Picture of external hard drive. — Photo by Avinash Kumar: https://www.pexels.com/photo/an-external-storage-drive-on-wooden-table-13595074/

This is the fifth article in a series on the changes to the NIH Data Management and Sharing policies that will go into effect for NIH grant applications starting January 2023. For more information, see our previous articles on data management.

File storage is an important piece of data management. While conducting your research, you’ll be saving and accessing your data often, so thinking about where and how to store this data before you begin your research is important. However, keep in mind that data storage is different from data preservation. Data storage addresses storage options during the active research process, while data preservation deals with the long-term storage of research data following the completion of a research project (Washington State University Libraries [WSU Libraries], 2022). And remember - backing up your files is an important piece of file storage.

The Rule of Three

When it comes to file storage, the best practice is to follow the “rule of three:”

THREE copies of every file
TWO different media types (i.e. types of storage such as local/hard drive and cloud)
ONE copy in a different location (Cornell, 2022).

Another way to think about the rule of three is “here, near, and far.” In this model of the rule of three, you’ll still want to keep at least three copies of your data. Keep one copy “here” - a local copy on your laptop or desktop computer. Keep a second copy “near” - an external copy on a different device (such as an external hard drive or a network drive). And keep a third copy “far” - an external copy in a geographically different location, such as in the cloud (Cornell, 2022). This strategy will ensure that if you lose a copy of a file, you will have it in other locations. However, be sure to save all files in all locations after every change or edit to a document. Having files in multiple locations is only helpful if each copy is updated with the most current version of the document.

Where Can You Store Files?

There are a variety of different options available when it comes to storing your research data and files. Local hardware, such as your desktop or laptop computer, and external storage devices such as external hard drives can be convenient options. However, this storage strategy can be risky due to the threat of damage, loss, theft, or obsolescence of these devices (Himmelfarb Health Sciences Library [Himmelfarb], 2022). In order to help prevent theft, damage, or loss, it’s best to store external hard drives away from your computer (WSU Libraries, 2022). Network drives are typically very stable and secure since they are controlled by your institution's IT division (WSU Libraries, 2022). However, be aware that many network drives have size restrictions that make it an unrealistic option if you create large amounts of data.

Remote storage, also known as cloud storage, is an option that stores your files on remotely located servers. This option can often cost money and it’s important to read and understand the terms of service before storing your data on the cloud. Many funders and institutions require any sensitive data to be stored on cloud services whose servers are located in the United States, so be sure to investigate where the servers are before saving your files (Himmelfarb, 2022). GW Box is the university’s enterprise file-sharing service for online cloud storage and collaboration. GW Box is free for all GW students, faculty, and staff. Cloud storage such as Google Drive and Box can be synchronized with your computer, which makes backing up your files easy (WSU Libraries, 2022). If your research required high-performance computing for data analysis, GW High-Performance Computing could be a good solution. Another cloud option is the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability Initiative, also known as NIH STRIDES. More information about NIH STRIDES initiative is available through the NIH Office of Data Science Strategy.

Storage Formats

For long-term storage, it’s best to use formats that are unencrypted and uncompressed so the files will remain readable in the future. Formats that are open, well-documented, and widely used will help ensure your files will remain accessible and usable in the long term (Himmelfarb, 2022). Preferred file formats include:

Text: DOCX, ODT, PDF
Databases: XML, SQLite
Tabulated data: CSV
Images: PNG, JPEG, TIFF
Sound: MP3, WAVE
Video: MP4

Data Security

Data security is a key concern when it comes to storing your files, especially when data in your files contain potentially sensitive information. It’s important to think carefully about and include data security in your data management plan. Some important considerations include:

Who will be responsible for storing and backing up your data? How frequently will this be done?
How will you manage access to your data? Consider physical access to hardware. Where will you store computers and external hard drives? Will these be password protected?
How will you secure hardware for locally stored data? Will you use firewalls? How will you update antivirus protection? Who will update the software and how often?
How will you keep the integrity of your data? Will you use encryption, watermarking, or digital signatures?

(Himmelfarb, 2022)

What’s Your Backup Plan?

Creating a backup plan for your files will prevent the loss of data in the event of losing files or data from disasters such as fire or flood, theft, unauthorized use, or hardware/software malfunctions (Himmelfarb, 2022). Following the rule of three described above provides a great level of protection through multiple copies in a variety of locations. Knowing how to recover data from your backups before you need to in an emergency is also highly recommended (Cornell, 2022). Having two backups of your data, one locally on a device other than your main workstation, and another remotely is a great way to backup your data.

Having a regular backup routine is also important. A full backup, backing up each file every time you do a backup, allows you to retrieve all of your data if you need to do so. However, this method takes a lot of time and resources. Another option is to do incremental backups. During incremental backups, you only need to back up files that have been edited or changed since your last backup (Himmelfarb, 2022).

To Learn More:

To learn more about storage options, take a look at the Storage Options page of our NIH Data Management and Sharing Plan Research Guide.

References:

Cornell University. (June 2022). Data storage and backup. Research data management service group: Comprehensive data management planning & services. https://data.research.cornell.edu/content/data-storage-and-backup

Himmelfarb Health Sciences Library. (November 14, 2022). Storage options. NIH data management & sharing plan (DMSP) Research Guide. https://guides.himmelfarb.gwu.edu/NIHDMSPolicy/storage-options

Washington State University Libraries. (January 10, 2022). Data storage & backup. Research data management. https://libguides.libraries.wsu.edu/rdmlibguide/datastorage

NIH Data Management and Sharing Plan Research Guide Is Here to Answer Your Questions

jltNovember 16, 2022November 15, 2022

Himmelfarb Library just launched a Research Guide on the NIH’s Data Management and Sharing Plan (DMSP) requirements that come into effect in the new year. If you are applying for a research grant or renewing an existing grant through NIH on or after January 23, 2023 that will generate scientific data, you will need to comply with the new requirements and include a plan in your grant application.

*Screenshot of Himmelfarb Library's NIH Data Management and Sharing Plan (DMSP) Research Guide*

The new research guide will help step you through the process of developing a plan, from general information to get you started, to storing, sharing and budgeting options. If you’re unsure if you need to comply, it defines what scientific data is and what activity codes are subject to the policy. The guide also includes freely available tools and sample DMSPs from a variety of sources.

Understanding and applying FAIR principles are key to a successful DMSP. FAIR stands for Findable, Accessible, Interoperable and Reusable. The Getting Started page breaks down the FAIR principles and how to apply them to your data practices.

The guide includes videos from a variety of sources, including a two webinar series from the NIH that provide an overview and more in depth look at the policy. You’ll find other videos on the Getting Started page on commonly used tools including the DMPTool site, NLM Common Data Elements Repository, and LabArchive.

Storage options available to you at GW are covered as well as options for sharing archived data through general and specialized repositories. Guidance on estimating costs and building them into your grant round out the guide.

Through the fall we’ve published a series of blog posts on data management and sharing, including data management resources, best practices for writing a data management plan and file naming conventions. All of these articles are linked on the guide’s homepage. We are planning additional blog posts in the coming months, so stay tuned to this space and check the Research Guide for updates and new materials.

Questions about DMSPs that you can’t find answers to in our research guide? You can reach out directly to Sara Hoover, Metadata and Scholarly Publishing Librarian, at shoover@gwu.edu for more information and guidance.

File Naming Conventions

Ruth BueterOctober 21, 2022October 18, 2022

Photo from https://www.pxfuel.com/en/free-photo-jylug

This is the third article in a series on the changes to the NIH Data Management and Sharing policies that will come into effect for NIH grant applications starting January 2023. For more information, see our first article for a general overview of data management resources and our second article for writing a data management plan best practices.

With the new 2023 NIH Data Management and Sharing Policy scheduled to take effect in January 2023, file naming conventions are an important piece of the data management puzzle. This new policy encourages project teams to agree on file naming conventions for objects and files and follow file naming convention best practices. This post will explore current best practices for file naming conventions.

Why Use Standardized File Naming Conventions?

Creating standardized file naming conventions is an important part of the research process. Standard file names are a great way to keep your research organized while ensuring that files can be easily located and identified by everyone in the research group. Using standard file naming conventions will also help future users find and understand the data after the project has ended. Using standardized and descriptive file names will help streamline the workflow by helping users easily identify the contents of a file without having to open the file (Univ. of Michigan Library, 2022).

The best time to develop a file naming convention is before you start your research project. Having a file naming convention in place before you start the project will prevent your project from having a backlog of unorganized files, which can lead to misplaced or lost data (Longwood Research Data Management, 2022). Your research group should decide at the outset of your project what naming conventions will be used. Once a file naming convention has been agreed upon by the research group, it must be consistently followed by all members of the group. If the naming convention isn’t followed, data could become difficult to find, making it unusable.

What Should Be Included in File Names?

File names should be descriptive enough to capture relevant information about the file, so try to build two or three salient characteristics of the project and dataset into each file name (University of Michigan Library, 2022). Think about the types of files you’ll be working with and the types of information each file will contain when developing your file naming convention. For example, what groups of files will your naming convention cover? Are different naming conventions needed for different sets of files? Does your group, department, or discipline already have file naming conventions in place which could be used?

It’s also a great idea to think about the metadata you’d like to include in each file name. Consider what information should be included to allow users to easily and quickly locate or search for a needed file. Since computers arrange files by name, character by character, it’s a good idea to put the most important information at the beginning of the file name. If finding information by date is a priority, start each file name with a date (see the Standardized Dates section below for more information on using dates in file names). If the type of data is the most important piece of information, start each file name with the type of information instead.

Consider including the following pieces of information in your naming convention structure:

Unique identifiers (such as a grant number)
Project, study, or experiment name or acronym
Location information (such as spatial coordinates)
Researcher initials
Date or date range (in a standardized format)
Experimental conditions (such as instrument, temperature, etc.)
Version number (more information below in the Use Versioning section below)
Type of data (image, dataset, samples, etc.)
Family type, or file extension
Lab name or location

What Should be Avoided in File Names?

While many file naming best practices revolve around what should be included in a file name, there are also best practices related to what should not be included in file names. Here are the top three things to avoid in your file naming conventions:

Spaces: While separating metadata elements is a common practice, avoid using spaces to separate each element. Consider using dashes or underscores instead of spaces. For example, instead of using File Name.xxx, consider using File-Name.xxx or File_Name.xxx instead. You could also consider not separating metadata at all, and using Camel Case to eliminate spaces: FileName.xxx
Special Characters: Avoid using special characters such as @ # $ % & * in file names. Limit file names to alphanumeric characters.
Long File Names: In general, file names should be kept to 30 characters or less. Shorter file names will make it easier for users to identify the contents of the file. Longer file names may not be readable by software programs.

Standardizing Dates

When including dates in file names, using International Organization of Standardization (ISO) standards is generally considered to be the best practice. Dates should be formatted starting with the four-digit year, followed by the two-digit month, and two-digit day:

YYYYMMDD (ex: 20221021)
YYYY-MM-DD (ex: 2022-10-21)

Use Version Control!

Many research projects involve creating and maintaining multiple versions of the same file. If this is the case for your research project, be sure to use versioning to indicate the most current version of files. Using file versioning not only helps you keep track of which file is the most recent update, but it also provides you with the ability to revert data to an earlier version without starting from scratch or having to regenerate data (Cornell University, 2022).

Some tools such as electronic lab notebooks or Box allow you to assign version numbers, but you can create version control by building versioning into your file naming convention. You can track versions by adding version information to the end of a file name. Here’s an example:

File_Name_v001.xxx
File_Name_v002.xxx
File_Name_v003.xxx

You can also include the date to indicate a version number:

File_Name_20220213.xxx
File_Name_20220321.xxx
File_Name_20220601.xxx

Avoid using ambiguous labels, such as “revision” or “final” in your file names. It’s also a good idea to save your original, untouched raw data and leave it that way. Having this raw data saved will allow you to always have the original data as a safe, untouched copy.

Standardized Numbers - Use a Leading 0!

If sequential numbering is part of your file naming structure, use leading zeros. For example, instead of using 1, 2, 3, use 001, 002, 003. This will ensure that your files will be sorted in an easily findable manner. This applies to version control numbering as well.

Directory Structure Naming Conventions

File naming conventions don’t just apply to your files, use the same best practices to structure your directory folders as well. Directory folders should provide key information about the file contents stored within each folder. Be sure to include the project title, unique identifiers, and the date. It might be helpful to create a brief description of the content stored in major folders and to provide an overview of the directory structure in your documentation. The level of detail included should be enough to help someone understand the contents and organization of the files.

Here’s a nice example: (Cornell University, 2022)

Top Folder: Study_name
- Subfolder1: Study_name_Datasets
- Study_name_2019-2020.csv
- Study_name_2021-2022.csv
Subfolder2: Study_name_Semanitc_analysis
- Study_name_semantic_analysis.R
- Study_name_semantic_analysis_output.csv
Readme File: Study_name_readme.txt

Document Naming Conventions

Be sure to document each file naming convention in a top-level readme file. This file should include instructions for navigating the structure so that others involved in the research project, and others who might use this data once the project is complete can follow the naming conventions used. This file can be a README.txt file and should be kept with your files.

File naming conventions are an essential part of any research project! Be sure to take the time to create a file naming convention that will help keep your files organized, easily findable, and usable by your research team and any others who may look at your data once your project is finished. Stay tuned for future posts on best practices related to other data management topics!

References:

Cornell University. (June 2022). File management. Research Data Management Service Group: Comprehensive Data Management Planning & Services. https://data.research.cornell.edu/content/file-management

Longwood Research Data Management: Harvard Medical School. (2022). File Naming Conventions. Data Management, Harvard Medical School. https://datamanagement.hms.harvard.edu/collect/file-naming-conventions
University of Michigan Library (September 14, 2022). File-naming conventions. LabArchives: Best practices for research data management. https://guides.lib.umich.edu/c.php?g=739306&p=5286418

Best Practices for Writing a Data Management Plan

jltSeptember 28, 2022January 30, 2024

Graphic image of computer screen with connections to gears, light bulb and bulls eye from pxfuel — Photo from https://www.pxfuel.com/en/free-photo-jylug

This is the second in a series of articles on the changes to the NIH Data Management and Sharing policies that will come into effect for NIH grant applications starting January 2023. See our first article for a general overview.

If you’re preparing to apply for an NIH grant, having a plan to manage and share your data just turned up on your to-do list. Currently, only grants of $500,000 or more are required to have a data management plan. Effective January 25, 2023, ALL grant applications or renewals that generate scientific data must include a detailed plan related to managing and storing data through the duration of the funded period, including plans for data dissemination. NIH just released a list of activity codes for grants that will be subject to the new policy last week. Where do you start? What should be included in this plan? We’ll provide some answers and resources to guide you here.

All data management plans should incorporate the FAIR (Findable, Accessible, Interoperable, Reusable) principles to ensure optimal research data stewardship. Beyond following FAIR guidelines, what are the specific elements that must be included in a data management plan? Here’s an outline of things to include and think through:

Who will be responsible for the data?

Usually, data is owned by the institution awarded the grant and the principal investigator is responsible for data collection and management.
If there are others responsible, this should be documented in the plan.

What types of data will be generated and where will they come from? Create a descriptive list of all the data that will be collected during the research process, as well as an estimate of how much data will be generated. Further things to consider include:

Why is it desirable to share this data and how could it be re-used? All data that is required to replicate results should be shared.
Are there any risks to disclosing this data? If any data cannot be shared due to legal, ethical, or technical reasons, exceptions for sharing can be written into the plan. However, all data must be managed.
At what point in the research process should data be shared? Will it be in a usable format at that time?
If you’re using data from other sources, include the source and any conditions for using it, also what relationship it may have to the original data generated during the research.

What formats and standards will be used for your data?

Non-proprietary file formats (.csv or .txt or XML or PDF, for example) are preferred. This ensures they will be readable in the future and is important for preservation.
Consider using a directory structure with a formalized naming convention and version control to better organize your data. Learn more about file management naming conventions from Cornell.

What formats and standards will be used for your metadata? Metadata describes your data and makes it findable.

Metadata elements to include/consider are a descriptive title, subject/keywords, file format, a unique identifier (such as a DOI), rights, and contact information.
Determine what metadata schema will work best for the research. This could be a general schema like Dublin Core, or a discipline-specific schema like Darwin Core for biological data.
Should a controlled vocabulary like MeSH be used to standardize the metadata? This will make it more findable.
Learn more about metadata on our data documentation and metadata page and check out Cornell’s best practices for writing “readme” style metadata.

What will be the methods for archiving and sharing the data?

Where will the data be stored during the research process and how will it be backed up and secured (is encryption required)? Find tips on our data storage and security page.
How will the data be made accessible after the research is complete? Find some options on our data repositories page. Cornell has considerations for selecting a repository site on their Sharing and archiving data page.
Determine the rights for sharing. A CC0 or CC-BY license is recommended when possible, but there may be commercial or intellectual property limitations for your research. Learn more about data licensing and protection in this guide from Cornell and about GW’s policies for sharing data.
Will any tools and software be needed to work with the data and metadata? How will those be provided?
How long should the data be preserved and made available? It may not be necessary or practical to preserve all the data in perpetuity. Making plans for how long it should be available is important to selecting a repository site.

Additional Resources:

Himmelfarb Library has a Research Guide on Data Management that covers FAIR principles, funder requirements, data management plans, and more.
Gelman Library’s Research Data Management Guide allows you to book a data services consultation with Data Services Librarian, Ann James.
NIH has a website on Scientific Data Sharing that has been updated to reflect the new policy and they recently hosted a two-part webinar series on the policy and how to comply. You can view recordings here.
Cornell University provides a comprehensive Data Management Planning Site.
The University of Michigan has a sample data management plan from the Inter-university Consortium for Political and Social Research (ICPSR) and a step-by-step framework for creating a plan with examples.
The California Digital Library makes public data management plans accessible for searching and viewing on its DMPTool site. You can specify Funder in the column to the left to limit to NIH or NSF grants.

If you have questions about creating data management plans or need further resources or information for guidance, contact Sara Hoover, Metadata and Scholarly Publishing Librarian at shoover@gwu.edu.

Preparing for the NIH Data Management & Sharing Policy: Data Management Resources

Ruth BueterAugust 31, 2022January 30, 2024

With the 2023 NIH Data Management and Sharing Policy going into effect on January 25, 2023, there’s no better time to explore data management resources! This post explores resources that can help you with your data management needs.

What is data management?

Data management involves the process of collecting or producing, cleaning and analyzing, preserving, and sharing data from a research project. Data management takes place throughout the entire research life cycle, from deciding on consistent file naming conventions to depositing the data in a repository for long-term archiving.

Why Data Management?

Data management is vital for transparency (showing your work promotes reproducibility of work), compliance (funding organizations and journals often require making data available), and personal and organizational benefit (using data within your own lab is easier with proper management).

I Think It’s FAIR to Say…

Understanding data management best practices is important to make well-informed decisions when selecting data management resources and tools. The FAIR Principles, first published in 2016, provide a set of guidelines for data management. FAIR stands for Findable, Accessible, Interoperable, and Reusable. You can learn more about the FAIR Principles on our Data Management Guide. Another great resource to help guide your data management is Cornell University’s Research Data Management Service Group’s Comprehensive Data Management Planning and Services Best Practices which provides extensive information related to best practices for:

Broad Data Management Resources

Himmelfarb’s Data Management Guide provides a wealth of information and resources related to data management. In addition to some basic information about data management, you’ll find information about NIH and NSF funder requirements. Data management plans (DMPs) are also covered in detail. The documentation and metadata page explains what metadata is, what should be included in your metadata, metadata schemas, controlled vocabularies, file naming conventions, and electronic lab notebooks. The data storage and security page includes data storage, storage formats, creating a backup plan, and data security. You’ll also learn about data sharing, including GW’s policy on regulated information, and data repositories.

I might need to make a plan for this…

Creating a data management plan (DMP) is often part of the grant writing process required by funding institutions. A comprehensive data management plan should address:

Data Collection: Must be reliable and valid.
Data Storage: Appropriate amount of data so research can be reproduced.
Data Analysis: Interpretation of data from which conclusions can be derived.
Data Protection: Ensuring sensitive data is safe and secure, preventing tampering or loss of data.
Data Ownership: Addresses legal rights associated with data.
Data Retention: Addresses how long data should be kept and proper disposal of sensitive data.
Data Reporting: Publication of data.
Data Sharing: Addresses what data can be shared with others and how.

When it comes to creating a DMP, there are a number of tools available to help! The DMPTool is a free, open-source tool that helps researchers create DMPs that comply with funder requirements. DMPTool also provides links to funder websites, and best practices resources to help guide your data management efforts. Since GW is affiliated with DMPTool, GW users can create a personalized dashboard that allows them to see and organize the DMPs created through the tool. From the DMPTool’s website, simply click “sign in” and use Option 1 to search for George Washington University. Then log in with your GW UserID and password and create your data management plan!

The Framework for Creating a Data Management Plan, created by ICPSR, is a great outline that will help you create a DMP for your grant application. The framework includes a list of elements to be included, explains why each element is important and provides examples for each element. Michener’s article Ten Simple Rules for Creating a Good Data Management Plan is another great starting point to gain an understanding of the principles and practices of creating a DMP and ensuring your data are safe and shareable. For more DMP resources and to see examples and templates, check out the Data Management Plan page of the data management guide.

What’s Next?

Stay tuned for future posts on best practices for writing a data management plan, data storage, file naming conventions, creating “readme” metadata, and other data management topics. In the meantime, check out the lists of GW resources and additional resources below to learn more!

Additional GW Resources:

Biostatistics & Epidemiology Consulting Service (BECS)
Data Classification Guide (GW Office of Ethics, Compliance & Privacy)
Data Protection Guide (GW Office of Ethics, Compliance & Privacy)
CCAS Cloud Data Storage (GW Information Technology)
GW Box (GW Information Technology)
Backup, Storage, & Document Management (GW Information Technology)
REDCap: Research Electronic Data Capture (GW SMHS)

Additional Data Management Resources:

Biomedical Data Lifecycle (Harvard): Includes a Research Data Management Checklist.
Borghi, J., Abrams, S., Lowenberg, D., Simms, S., & Chodacki, J. (2018). Support your data: A research data management guide for researchers. Research Ideas and Outcomes, 4, e26439.
Introduction to the NIH Data Management and Sharing Plan (video)
NIH Data Sharing: Coming 2023 (Virginia Commonwealth University guide)
Understanding the New NIH Data Management and Sharing (DMS) Policy (video)

Category: Data Management