This is the second in a series of articles on the changes to the NIH Data Management and Sharing policies that will come into effect for NIH grant applications starting January 2023. See our first article for a general overview.
If you’re preparing to apply for an NIH grant, having a plan to manage and share your data just turned up on your to-do list. Currently, only grants of $500,000 or more are required to have a data management plan. Effective January 25, 2023, ALL grant applications or renewals that generate scientific data must include a detailed plan for managing and storing data through the duration of the funded period, including plans for data dissemination. Last week, NIH released a list of activity codes for grants that will be subject to the new policy. Where do you start? What should be included in this plan? We’ll provide some answers and resources to guide you here.
All data management plans should incorporate the FAIR (Findable, Accessible, Interoperable, Reusable) principles to ensure optimal research data stewardship. Beyond following FAIR guidelines, what are the specific elements that must be included in a data management plan? Here’s an outline of things to include and think through:
Who will be responsible for the data?
- Usually, data is owned by the institution awarded the grant and the principal investigator is responsible for data collection and management.
- If there are others responsible, this should be documented in the plan.
What types of data will be generated and where will they come from? Create a descriptive list of all the data that will be collected during the research process, as well as an estimate of how much data will be generated. Further things to consider include:
- Why is it desirable to share this data and how could it be re-used? All data that is required to replicate results should be shared.
- Are there any risks to disclosing this data? If any data cannot be shared due to legal, ethical, or technical reasons, exceptions for sharing can be written into the plan. However, all data must be managed.
- At what point in the research process should data be shared? Will it be in a usable format at that time?
- If you’re using data from other sources, include the source, any conditions for using it, and its relationship to the original data generated during the research.
What formats and standards will be used for your data?
- Non-proprietary file formats (for example, .csv, .txt, XML, or PDF) are preferred. Using them helps ensure the files remain readable in the future, which is important for preservation.
- Consider using a directory structure with a formalized naming convention and version control to better organize your data. Learn more about file management naming conventions from Cornell.
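As a rough illustration of the naming-convention advice above, here is a minimal Python sketch. The pattern it uses (project, description, date, and two-digit version, separated by underscores) is a hypothetical convention chosen for this example, not one prescribed by NIH or Cornell, and the function names are our own. Adapt the pattern to whatever your team agrees on.

```python
import re
from datetime import date

# Hypothetical convention (an assumption for this sketch, not an NIH or Cornell rule):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9-]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, when: date,
                   version: int, ext: str) -> str:
    """Assemble a filename that follows the hypothetical convention above."""
    extension = ext.lower().lstrip(".")
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{extension}"


def follows_convention(filename: str) -> bool:
    """Return True if the filename matches the convention."""
    return NAME_PATTERN.match(filename) is not None


if __name__ == "__main__":
    name = build_filename("Sleep Study", "Survey Responses", date(2023, 1, 25), 2, "csv")
    print(name, follows_convention(name))
```

A check like `follows_convention` could be run over a project folder before depositing files, so that stray names such as "final data (2).xlsx" are caught early.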
What formats and standards will be used for your metadata? Metadata describes your data and makes it findable.
- Metadata elements to include/consider are a descriptive title, subject/keywords, file format, a unique identifier (such as a DOI), rights, and contact information.
- Determine what metadata schema will work best for the research. This could be a general schema like Dublin Core, or a discipline-specific schema like Darwin Core for biological data.
- Should a controlled vocabulary like MeSH be used to standardize the metadata? Doing so will make the data more findable.
- Learn more about metadata on our data documentation and metadata page and check out Cornell’s best practices for writing “readme” style metadata.
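To make the metadata elements listed above more concrete, here is a minimal Python sketch of a record that uses Dublin Core element names and a small check for missing elements. Mapping "contact information" to the `creator` element is an assumption made for this example, the function name is our own, and every value in the record is a placeholder rather than a real dataset, DOI, or person.

```python
import json

# Elements from the list above, written with Dublin Core element names.
# Mapping "contact information" to "creator" is an assumption for this sketch.
REQUIRED_ELEMENTS = ["title", "subject", "format", "identifier", "rights", "creator"]


def missing_elements(record: dict) -> list:
    """Return the required elements that are absent or empty in the record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]


# Placeholder values for illustration only; none describe a real dataset.
record = {
    "title": "Example survey responses, wave 1",
    "subject": ["Sleep", "Surveys and Questionnaires"],
    "format": "text/csv",
    "identifier": "doi:10.0000/placeholder",
    "rights": "CC0 1.0",
    "creator": "Example PI <pi@example.edu>",
}

if __name__ == "__main__":
    # An empty list means every required element is present.
    print(missing_elements(record))
    # Serialized record that could be stored alongside the data files.
    print(json.dumps(record, indent=2))
```

Keeping the record in a plain-text format such as JSON is in line with the preference for non-proprietary formats; a discipline-specific schema would use a different set of required elements.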
What will be the methods for archiving and sharing the data?
- Where will the data be stored during the research process and how will it be backed up and secured (is encryption required)? Find tips on our data storage and security page.
- How will the data be made accessible after the research is complete? Find some options on our data repositories page. Cornell has considerations for selecting a repository site on their Sharing and archiving data page.
- Determine the rights for sharing. A CC0 or CC-BY license is recommended when possible, but there may be commercial or intellectual property limitations for your research. Learn more about data licensing and protection in this guide from Cornell and about GW’s policies for sharing data.
- Will any tools and software be needed to work with the data and metadata? How will those be provided?
- How long should the data be preserved and made available? It may not be necessary or practical to preserve all the data in perpetuity. Deciding how long the data should remain available is an important factor in selecting a repository.
Additional Resources:
- Himmelfarb Library has a Research Guide on Data Management that covers FAIR principles, funder requirements, data management plans, and more.
- Gelman Library’s Research Data Management Guide allows you to book a data services consultation with Data Services Librarian Ann James.
- NIH has updated its Scientific Data Sharing website to reflect the new policy and recently hosted a two-part webinar series on the policy and how to comply. You can view recordings here.
- Cornell University provides a comprehensive Data Management Planning Site.
- The University of Michigan has a sample data management plan from the Inter-university Consortium for Political and Social Research (ICPSR) and a step-by-step framework for creating a plan with examples.
- The California Digital Library makes public data management plans accessible for searching and viewing on its DMPTool site. You can specify Funder in the column to the left to limit results to NIH or NSF grants.
If you have questions about creating data management plans or need further resources or guidance, contact Sara Hoover, Metadata and Scholarly Publishing Librarian, at shoover@gwu.edu.