
photo of coffee in teacup with open notebook, pen and laptop
Image from pxfuel.com

Himmelfarb Library’s Scholarly Communications Committee produces short tutorial videos on scholarly publishing and communications topics for SMHS, GWSPH, and GW School of Nursing students, faculty, and staff. Five new videos are now available on our YouTube channel and Scholarly Publishing Research Guide!

2023 NIH Data Management and Sharing Policy Resources by Sara Hoover - Sara is our resident expert on data management policy and resources. She provides an overview of the NIH policy, the essential elements of a data management and sharing plan, and highlights GW and non-GW resources that can aid you in putting together a data management and sharing plan. The video is 10 minutes in length. 

Animal Research Alternatives by Paul Levett - Paul demonstrates how to conduct 3Rs alternatives literature searches for animal research protocols. He defines the 3Rs and explains how to report the search in the GW Institutional Animal Care and Use Committee (IACUC) application form. Paul is currently a member of the GW IACUC. The video is 13 minutes long.

Artificial Intelligence Tools and Citations by Brittany Smith - As a Library Science graduate student, Brittany has an interest in how AI is impacting the student experience. She discusses how tools like ChatGPT can assist with your research, the GW policy on AI, and how to create citations for these resources. The video is 6.5 minutes in length.

UN Sustainable Development Goals: Finding Publications by Stacy Brody - Stacy addresses why the goals were developed, what they hope to achieve, and shows ways to find related publications in Scopus. The video is 5 minutes long.

Updating Your Biosketch via SciENcv by Tom Harrod - Tom talks about the differences between NIH’s SciENcv and Biosketch and demonstrates how to use SciENcv to populate a Biosketch profile. Tom advises GW SMHS, School of Nursing, and GWSPH researchers on creating and maintaining research profiles, and he and Sara provide research profile audit services. The video is 5 minutes long.

You can find the rest of the videos in the Scholarly Communications series in this YouTube playlist or on the Scholarly Publishing Research Guide.

Image of orange buttons with Open Access logo using the letters O and A to form an open padlock in a white bowl.
"Open Access promomateriaal" by biblioteekje is licensed under CC BY-NC-SA 2.0.

What is Open Access?

Open access (OA) journals make content available to anyone free of charge. Traditional publishing models require readers or institutions to purchase subscriptions to access published content, leaving users without a subscription locked out by a paywall. OA articles, on the other hand, can be accessed and read by anyone without payment or a subscription.

The two most common OA publishing models are Gold OA and Hybrid OA. Gold OA journals make all published articles available to readers free of charge. Hybrid OA journals publish OA articles that are free to all readers alongside traditional articles that can only be accessed by paying subscribers. Hybrid OA journals let authors choose whether to make their research open access or to restrict access via the traditional paywall model.

Article Processing Charges (APCs)

While publishing your research as OA makes your work more widely accessible, it does come at a cost to the author. OA journals transfer the cost of publication from the reader to the author by charging Article Processing Charges, also known as APCs. APCs vary by journal, but typically range from $2,000 to $5,000 for health sciences journals.

If you’d like to publish your research as OA, it’s important to consider how you will pay for APCs early in your research process. We recommend requesting funding for APCs in grant and funding proposals. Building these costs into your funding proposals will ensure that you have the funds needed to cover APCs when you’re ready to publish. Both NIH and NSF grants allow publication costs to be included in grant applications - so be sure to secure funding from the start of the research process!

To learn more about APCs, take a few minutes to watch Himmelfarb’s tutorials on Locating APCs and Including APCs in Funding Proposals!

Locating Article Processing Charges (APCs) tutorial:

Including APCs in Funding Proposals tutorial:

APCs Waived for GW Authors!

GW currently has active “transformative agreements” with two publishers: Cambridge University Press and The Company of Biologists. These agreements allow GW authors to publish their research as open access at no cost - APCs are waived! The Cambridge University Press agreement covers nearly 50 medicine and health sciences journals, and The Company of Biologists agreement waives APCs for GW authors in three of its hybrid journals.

It’s important to note that these agreements do not guarantee acceptance for publication in these journals. Manuscripts must meet the journal’s acceptance criteria. Authors must also list GW as their primary affiliation upon manuscript submission; authors who list another organization (such as the MFA, GW Hospital, CNHS, or the VA) are not covered under these agreements. For more information about GW’s Read and Publish agreements with Cambridge University Press and The Company of Biologists, contact Ruth Bueter at rbueter@gwu.edu.

Learn More:

If you’d like to learn more about open access publishing, check out the Open Access Publishing page of our Scholarly Publishing Research Guide.

Decorative image of stacks of paper.
Photo by Christa Dodoo on Unsplash

Hindawi, an open access journal publisher once identified by Jeffrey Beall as potentially predatory and later labeled a “borderline case” by Beall, has made great efforts to transform its reputation into that of a reputable scholarly publisher. The publisher was purchased by Wiley in January 2021, and many hoped the purchase would add a layer of trustworthiness and legitimacy to the once-questionable publisher’s practices. Recent developments have shown that the path toward implementing scholarly publishing best practices is a long, uphill struggle for Hindawi and Wiley.

In late March 2023, Clarivate removed 19 Hindawi journals from Web of Science in the monthly update of its Master Journal List. These 19 journals accounted for 50% of the articles Hindawi published in 2022 (Petrou, 2023). Their removal came after Wiley disclosed it was suspending the publication of special issues due to “compromised articles” (Kincaid, 2023).

Web of Science dropped 50 journals from its index in March for failing to meet the 24 quality criteria required for inclusion. Common violations included inadequate peer review, inappropriate citations, and content irrelevant to the scope of the journal. The 19 Hindawi journals accounted for 38% of the 50 journals removed from the index! Health sciences titles published by Hindawi that were removed from Web of Science include:

  • BioMed Research International
  • Disease Markers
  • Evidence-Based Complementary and Alternative Medicine
  • Journal of Environmental and Public Health
  • Journal of Healthcare Engineering
  • Journal of Oncology
  • Oxidative Medicine and Cellular Longevity

The potential impact of this decision could be significant for authors. When a journal is no longer included in Web of Science, Clarivate no longer indexes the papers published in the journal, no longer counts citations from those papers, and no longer calculates an impact factor for the journal (Kincaid, 2023). Authors who publish in these journals will be negatively impacted because many universities factor these metrics into promotion and tenure decisions (Kincaid, 2023).

This is just one of the more recent struggles Hindawi and Wiley have grappled with since Wiley purchased Hindawi in 2021. Last year, Wiley announced the retraction of more than 500 Hindawi papers that had been linked to peer review rings. Hindawi has also been affected by paper mill activity, publishing paper mill articles in at least nine of the delisted journals. Paper mills are “unethical outsourcing agencies proficient in fabricating fraudulent manuscripts submitted to scholarly journals” (Pérez-Neri et al., 2022).

In the early days of paper mills, plagiarism was the biggest concern. However, paper mills have become more sophisticated and are now capable of fabricating data and images and producing fake study results (Pérez-Neri et al., 2022). Pérez-Neri et al. reviewed 325 retracted articles with suspected paper mill involvement from 31 journals and found that these retracted articles had accumulated 3,708 citations (Pérez-Neri et al., 2022). The study also found a marked increase in retracted paper mill articles, from nine in 2016 to 44 in 2017, 88 in 2018, and 109 in 2019 (Pérez-Neri et al., 2022). Nearly half of the analyzed retracted papers (45%) were from the health sciences (Pérez-Neri et al., 2022). In a pre-print analysis of Hindawi’s paper mill activity, Dorothy Bishop found that paper mills target journals “precisely because they are included in WoS [Web of Science], which gives them kudos and means that any citations count towards indicators such as H-index, which is used by many institutions in hiring and promotion” (Bishop, 2023).

Another model that has become popular among questionable journals is the “guest editor” model, in which a journal invites a scholar or group of scholars to serve as guest editors for a special issue of papers on a shared topic or theme. MDPI, another publisher once identified by Jeffrey Beall as potentially predatory, has likewise worked to turn around its reputation and become known as a reputable scholarly publisher, and it has used the guest editor model to help grow its business. In a recent post in The Scholarly Kitchen, Christos Petrou wrote that “the Guest Editor model fueled MDPI’s rise, yet it pushed Hindawi off a cliff” (Petrou, 2023). According to Petrou, the guest editor model accounts for at least 60% of MDPI’s papers. Hindawi has also embraced this model in recent years, increasing the share of its papers published under the guest editor model from 17% in 2019 to 53% in 2022 (Petrou, 2023). Hindawi’s use of the guest editor model contributed to its exploitation by paper mills, which led to more than 500 retractions between November 2022 and March 2023 (Petrou, 2023).

MDPI was not left unscathed by Clarivate’s decision to delist 50 journals from Web of Science. MDPI’s largest journal, the International Journal of Environmental Research and Public Health (IJERPH), which had an Impact Factor above 4.0, was among the titles delisted in March. IJERPH was delisted for publishing content that was not relevant to the journal’s scope. Petrou argued that while this is likely a problem for hundreds of other journals, Web of Science “sent a message by going after the largest journal of MDPI” (Petrou, 2023). 

The guest editor model is no longer used exclusively by questionable publishers and has slowly been embraced by traditional scholarly publishers. Petrou encourages publishers interested in the guest editor model to build transparent safeguards into it to uphold editorial integrity. As the scholarly publishing landscape continues to evolve and formerly questionable publishers work to gain legitimacy and stabilize their reputations, we must remain vigilant in evaluating the journals in which we choose to publish. Likewise, scholarly publishers must protect the integrity of the research they publish by putting safeguards in place that keep paper mill-produced papers from slipping through peer review and screening, only to be retracted once they are exposed as fraudulent.

References:

Bishop, D. V. M. (2023, February 6). Red flags for paper mills need to go beyond the level of individual articles: a case study of Hindawi special issues. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/6mbgv

Kincaid, E. (2023, March 21). Nearly 20 Hindawi journals delisted from leading index amid concerns of papermill activity. Retraction Watch. https://retractionwatch.com/2023/03/21/nearly-20-hindawi-journals-delisted-from-leading-index-amid-concerns-of-papermill-activity/#more-126734

Pérez-Neri, I., Pineda, C., & Sandoval, H. (2022). Threats to scholarly research integrity arising from paper mills: A rapid scoping review. Clinical Rheumatology, 41(7), 2241-2248. https://doi.org/10.1007/s10067-022-06198-9

Petrou, C. (2023, March 30). Guest post: Of special issues and journal purges. The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2023/03/30/guest-post-of-special-issues-and-journal-purges/

In 2021, UNESCO developed a Recommendation on Open Science to be adopted by member states. This recommendation evolved from a 2017 Recommendation on Science and Scientific Researchers promoting science as a common good. 

UNESCO’s Recommendation on Science and Scientific Researchers, https://youtu.be/94T7NGirUlM

The Recommendation on Open Science includes a definition of open science:

Open science is … an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community. It comprises all scientific disciplines and aspects of scholarly practices, including basic and applied sciences, natural and social sciences and the humanities, and it builds on the following key pillars: open scientific knowledge, open science infrastructures, science communication, open engagement of societal actors and open dialogue with other knowledge systems.

(UNESCO, 2021)

The recommendation calls for scientific publications, research data, software, and source code to be open and available to scientists internationally. The NIH adopted an open access policy for manuscripts resulting from funded research in 2008. Earlier this year, the NIH adopted a policy requiring all newly funded grants to include a data management and sharing plan to make the underlying research data freely available to other researchers. For more information on the data sharing policy and how to comply, explore our Research Guide.

Platforms that support the sharing and dissemination of research findings and their underlying data are becoming available. The Open Science Framework (OSF) is a “free, open platform to support your research and enable collaboration”. It provides tools to design a study, collect and analyze data, and publish and share results. OSF was designed and is maintained by the non-profit Center for Open Science. 

A helpful feature of OSF is the ability to generate a unique, persistent URL (uniform resource locator) for a project for sharing and attribution. There is also built-in version control, and collaborators can be assigned hierarchical permission levels for data and document management. Researchers can decide to make all or parts of a project public and searchable and can add licensing. Public projects can be searched on the OSF site. Registering a project creates a timestamped version for preservation. Pre-prints can also be hosted and made available for searching.

OSF integrates with a number of useful tools, including storage add-ons like Amazon S3, Google Drive, Dropbox, and figshare. Zotero and Mendeley can be integrated for citation management, and GitHub can be used for managing software and code.

Institutions can set up a custom landing page for OSF and build user communities to promote sharing and collaboration within the institution and beyond. Harvard, Johns Hopkins, and NYU are among the many research universities that are using OSF in this way.

Last month, Nature and Code Ocean announced a partnership to launch and curate the Open Science Library, which contains research software used in Nature journal articles. “Compute capsules,” which include the code, data, and computing environment, will allow researchers to reproduce results, reuse the code, and collaborate. As open science becomes the norm, more multifunction platforms that enhance sharing and reproducibility while preserving work and ensuring attribution will continue to emerge.

References:

UNESCO. (2021). UNESCO Recommendation on Open Science. https://unesdoc.unesco.org/ark:/48223/pf0000379949.locale=en

UNESCO. (2017). Consolidated Report on the Implementation by Member States of the 2017 Recommendation on Science and Scientific Researchers. https://unesdoc.unesco.org/ark:/48223/pf0000379704

Foster, E. D., & Deardorff, A. (2017). Open Science Framework (OSF). Journal of the Medical Library Association: JMLA, 105(2), 203–206. https://doi.org/10.5195/jmla.2017.88

Code Ocean. (2023). Code Ocean Partners with Nature Portfolio to Launch the Open Science Library with Ready-to-Run Software from Authors in Nature Journals (Press Release). https://codeocean.com/press-release/code-ocean-partners-with-nature-portfolio-open-science-library/

Robotic hand reaches for a mural of white dots and connecting lines displayed on a blue backdrop
Photo credit: Photo by Tara Winstead

OpenAI, an artificial intelligence research and development company, released the latest version of its generative text chatbot, ChatGPT, near the end of 2022. The program provides responses based on prompts from users. Since its release, universities, research institutions, publishers, and other educators have worried that ChatGPT and similar products will radically change the current education system. Some institutions have taken action to limit or ban the use of AI-generated text. Others argue that ChatGPT and similar products may be the perfect opportunity to reimagine education and scholarly publishing. There is a lot to learn about AI and its impact on research and publishing. This article aims to serve as an introduction to this rapidly evolving technology.

In a Nature article, Chris Stokel-Walker described ChatGPT as “a large language model (LLM), which generates convincing sentences by mimicking the statistical patterns of language in a huge database of text collated from the Internet.” (Stokel-Walker, 2023, para. 3) OpenAI’s website says “The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.” (OpenAI, n.d., para. 1) ChatGPT may be used to answer simple and complex questions and may provide long-form responses based on the prompt. In recent months, students and researchers have used the chatbot to perform simple research tasks or develop and draft manuscripts. By automating certain tasks, ChatGPT and other AI technologies may provide people with the opportunity to focus on other aspects of the research or learning process.

There are benefits and limitations to AI technology and many people agree that guidelines must be in place before ChatGPT and similar models are fully integrated into the classroom or laboratory.

Van Dis et al. note that “Conversational AI is likely to revolutionize research practices and publishing, creating both opportunities and concerns. It might accelerate the innovation process, shorten time-to-publication, and by helping people to write fluently, make science more equitable and increase the diversity of scientific perspectives.” (van Dis et al., 2023, para. 4) Researchers with limited or no English language proficiency could benefit from using ChatGPT to develop their manuscripts for publication. The current version of ChatGPT is free to use, making it accessible to anyone with a computer and internet access. This may make scholarly publishing more equitable, though one version of the program is only available with a monthly subscription fee. If future AI technologies require fees, this will create additional access and equity issues.

While ChatGPT can produce long-form, seemingly thoughtful responses, there are concerns about its ability to cite information accurately. OpenAI states that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.” (OpenAI, n.d., para. 7) AI-generated text therefore has the potential to spread misleading information. Scholars who have tested ChatGPT also note that it will create references that do not exist, so researchers must fact-check the sources the AI provides to ensure that their work adheres to current integrity standards. There are also concerns about whether ChatGPT properly credits original sources: “And because this technology typically reproduces text without reliably citing the original sources or authors, researchers using it are at risk of not giving credit to earlier work, unwittingly plagiarizing a multitude of unknown texts and perhaps even giving away their own ideas.” (van Dis et al., 2023, para. 10)

Students and researchers interested in using AI-generated text should be aware of current policies and restrictions. Many academic journals, universities, and colleges have updated their policies to either limit or completely ban the use of AI in research. Other institutions are actively discussing their plans for this new technology and may implement new policies in the future. At the time of writing, GWU has not shared policies addressing AI usage in the classroom. If you’re interested in using AI-generated text in your research papers or projects, be sure to closely read submission guidelines and university policies.

ChatGPT and other AI text generators are having profound impacts, and as the technology continues to improve, it will become increasingly difficult to distinguish work written without the aid of AI from work co-authored with AI. The long-term impacts of AI in the classroom have yet to be fully understood, and many institutions are moving to address this new technology. As we continue to learn about ChatGPT’s benefits and limitations, it is important to remain aware of your institution’s policies on using AI in research. To learn more about ChatGPT, please read any of the sources listed below! Himmelfarb Library will continue to discuss AI technology and its impact on research as more information is made available.

Additional Reading:

Works Cited:

GW Medical Student Research Day is scheduled for Wednesday, April 26, 2023 as a live in-person event at the University Student Center. There will be a plenary speaker and students will have an opportunity to share their research projects with a poster and oral presentation. Videos of past poster presentations are available on our Research Guide and YouTube channel.

Research poster abstract checklist image and March 1st submission due date

Poster abstracts for Medical Student Research Day 2023 will be due on March 1st, a week from today! If you’re just starting to put an abstract together, or putting on the finishing touches, Himmelfarb Library has resources to help. 

You can find much of what you need on our Research Day Resources: Writing Abstracts page. The guide outlines the basic components of a scientific abstract and provides both recommendations and examples for producing a quality abstract. 

If you need more help, you can chat with our reference librarians or make an appointment with the GW Writing Center. The Writing Center now has Thursday hours from 6-8pm at Himmelfarb Library. We recommend scheduling an appointment in advance by calling 202-994-3765.

If your poster abstract is accepted, congratulations! Come back to our GW Research Day Resources Research Guide for valuable information and tips for designing a winning poster and presenting it effectively.

NEW in large font on an orange brick wall.
Photo by Nick Fewings on Unsplash

Himmelfarb Library’s Scholarly Communications Committee is pleased to announce that five new short video tutorials have been added to our video library! The video library now includes 30 short 3-7 minute videos on a variety of scholarly publishing topics, perfect for microlearning. This round of new videos covers human participant research support, addressing health misinformation and disinformation, using Dimensions Analytics, using Cabells Journalytics, and finding an author’s H-Index with Google Scholar and Scopus.

Human Participants Research Support - Fall 2022

Are you interested in learning more about the resources available to support human participant research at George Washington University? This short three-and-a-half-minute video covers the Office of Human Research (OHR), Institutional Review Boards (IRBs), and CITI training available through GW.

Addressing Health Mis- and Disinformation

This five-minute video discusses how to address health mis- and disinformation. Learn the difference between mis- and disinformation, the different types of mis- and disinformation, why this matters in relation to healthcare providers and health literacy, and how to address mis- and disinformation with patients. 

Dimensions Analytics: An Introduction

Dimensions, a database from Digital Science, tracks research output and includes information about grants, publications, datasets, clinical trials, policy documents, and more. This tutorial provides a brief overview of Dimensions Analytics, which lets you track and visualize research output trends and offers more comprehensive functionality. Several example use cases are also included.

Cabells Journalytics

This five-minute tutorial provides an overview of Cabells Journalytics, a tool that can be used to evaluate and compare journals in which to publish a manuscript. Learn how to access Cabells Journalytics, browse example journal records to see the depth of information provided about each journal, and learn how to compare up to five journals.

H-Index: Google Scholar vs. Scopus

In this five-minute tutorial, you’ll learn more about what the H-Index is (a measure of both quantity and quality of research output) and how it is used to track researcher productivity. This tutorial will then walk you through how to find an H-Index using both Google Scholar and Scopus, and why there is sometimes a difference in the H-Index value between these two sources.
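The tutorial walks through the database lookups themselves; purely as an illustration of the definition above (the H-Index is the largest number h such that h of a researcher’s papers have at least h citations each), here is a minimal Python sketch using made-up citation counts - it is not part of the tutorial.

def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author, as reported by Google Scholar or Scopus:
print(h_index([25, 8, 5, 4, 3, 1]))  # 4 -- four papers have at least 4 citations each

Because Google Scholar and Scopus index different sets of publications, the citation counts that feed into this calculation differ between the two sources, which is why the same author can have two different H-Index values.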

This newest installment of videos is part of the Scholarly Communications Committee’s Short Video Series, which covers a wide range of scholarly communications topics spanning all phases of the research life cycle. Have a scholarly publishing topic that you’d like us to discuss? We’d love to hear from you! To suggest a topic for an upcoming video, please contact Sara Hoover at shoover@gwu.edu.

To learn more about scholarly publishing, check out our Scholarly Publishing Guide. This guide includes resources to help scholars find an appropriate journal in which to publish their research, tips on how to spot and avoid predatory publishers, and information on how to promote and increase the visibility of your published research.

This is the third article in a series on the changes to the NIH Data Management and Sharing Policy that will come into effect for NIH grant applications starting January 2023. For more information, see our first article for a general overview of data management resources and our second article for best practices for writing a data management plan.

With the new 2023 NIH Data Management and Sharing Policy scheduled to take effect in January 2023, file naming conventions are an important piece of the data management puzzle. This new policy encourages project teams to agree on file naming conventions for objects and files and follow file naming convention best practices. This post will explore current best practices for file naming conventions.

Why Use Standardized File Naming Conventions?

Creating standardized file naming conventions is an important part of the research process. Standard file names are a great way to keep your research organized while ensuring that files can be easily located and identified by everyone in the research group. Standard file naming conventions will also help future users find and understand the data after the project has ended, and standardized, descriptive file names streamline workflows by letting users identify the contents of a file without having to open it (University of Michigan Library, 2022).

The best time to develop a file naming convention is before you start your research project. Having a file naming convention in place before you start the project will prevent your project from having a backlog of unorganized files, which can lead to misplaced or lost data (Longwood Research Data Management, 2022). Your research group should decide at the outset of your project what naming conventions will be used. Once a file naming convention has been agreed upon by the research group, it must be consistently followed by all members of the group. If the naming convention isn’t followed, data could become difficult to find, making it unusable. 

What Should Be Included in File Names?

File names should be descriptive enough to capture relevant information about the file, so try to build two or three salient characteristics of the project and dataset into each file name (University of Michigan Library, 2022). Think about the types of files you’ll be working with and the types of information each file will contain when developing your file naming convention. For example, what groups of files will your naming convention cover? Are different naming conventions needed for different sets of files? Does your group, department, or discipline already have file naming conventions in place which could be used? 

It’s also a great idea to think about the metadata you’d like to include in each file name. Consider what information should be included to allow users to easily and quickly locate or search for a needed file. Since computers arrange files by name, character by character, it’s a good idea to put the most important information at the beginning of the file name. If finding information by date is a priority, start each file name with a date (see the Standardized Dates section below for more information on using dates in file names). If the type of data is the most important piece of information, start each file name with the type of information instead. 

Consider including the following pieces of information in your naming convention structure:

  • Unique identifiers (such as a grant number)
  • Project, study, or experiment name or acronym
  • Location information (such as spatial coordinates)
  • Researcher initials
  • Date or date range (in a standardized format)
  • Experimental conditions (such as instrument, temperature, etc.)
  • Version number (see the Use Version Control! section below)
  • Type of data (image, dataset, samples, etc.)
  • File type, or file extension
  • Lab name or location

What Should be Avoided in File Names?

While many file naming best practices focus on what should be included in a file name, there are also best practices for what should not be included. Here are the top three things to avoid in your file naming conventions (a short sketch after this list shows the rules in practice):

  • Spaces: While separating metadata elements is a common practice, avoid using spaces to separate each element. Consider using dashes or underscores instead of spaces. For example, instead of using File Name.xxx, consider using File-Name.xxx or File_Name.xxx instead. You could also consider not separating metadata at all, and using Camel Case to eliminate spaces: FileName.xxx
  • Special Characters: Avoid using special characters such as @ # $ % & * in file names. Limit file names to alphanumeric characters.
  • Long File Names: In general, file names should be kept to 30 characters or less. Shorter file names will make it easier for users to identify the contents of the file. Longer file names may not be readable by software programs.
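To make these recommendations concrete, here is a minimal Python sketch that assembles a file name from a few of the metadata elements listed earlier and flags the pitfalls just described. The separator, the allowed characters, and the 30-character limit simply mirror this post’s suggestions; the function and variable names are hypothetical examples, not part of any required tool.

import re

def build_file_name(project, data_type, date, version, extension):
    """Join metadata elements with underscores, e.g. Project_dataset_20221021_v001.csv"""
    return f"{project}_{data_type}_{date}_v{version:03d}.{extension}"

def check_file_name(name, max_length=30):
    """Return warnings for spaces, special characters, and overly long names."""
    warnings = []
    if " " in name:
        warnings.append("contains spaces")
    if re.search(r"[^A-Za-z0-9._-]", name):
        warnings.append("contains special characters")
    if len(name) > max_length:
        warnings.append(f"longer than {max_length} characters")
    return warnings

name = build_file_name("GWStudy", "ecg", "20221021", 1, "csv")
print(name)                   # GWStudy_ecg_20221021_v001.csv
print(check_file_name(name))  # [] -- no spaces, no special characters, 29 characters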

Standardizing Dates

When including dates in file names, following the International Organization for Standardization’s ISO 8601 standard is generally considered best practice. Dates should be formatted starting with the four-digit year, followed by the two-digit month and the two-digit day:

  • YYYYMMDD (ex: 20221021)
  • YYYY-MM-DD (ex: 2022-10-21)
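If you generate file names with code, Python’s standard datetime module produces both of these patterns directly; the snippet below is just a small illustration of the ISO 8601 formats above, not a required workflow.

from datetime import date

d = date(2022, 10, 21)          # the example date used above
print(d.strftime("%Y%m%d"))     # 20221021   (YYYYMMDD)
print(d.isoformat())            # 2022-10-21 (YYYY-MM-DD)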

Use Version Control!

Many research projects involve creating and maintaining multiple versions of the same file. If this is the case for your research project, be sure to use versioning to indicate the most current version of files. Using file versioning not only helps you keep track of which file is the most recent update, but it also provides you with the ability to revert data to an earlier version without starting from scratch or having to regenerate data (Cornell University, 2022). 

Some tools, such as electronic lab notebooks or Box, allow you to assign version numbers, but you can also build version control directly into your file naming convention by adding version information to the end of each file name. Here’s an example:

  • File_Name_v001.xxx
  • File_Name_v002.xxx
  • File_Name_v003.xxx

You can also include the date to indicate a version number:

  • File_Name_20220213.xxx
  • File_Name_20220321.xxx
  • File_Name_20220601.xxx

Avoid using ambiguous labels, such as “revision” or “final” in your file names. It’s also a good idea to save your original, untouched raw data and leave it that way. Having this raw data saved will allow you to always have the original data as a safe, untouched copy. 

Standardized Numbers - Use a Leading 0!

If sequential numbering is part of your file naming structure, use leading zeros. For example, instead of using 1, 2, 3, use 001, 002, 003. This will ensure that your files will be sorted in an easily findable manner. This applies to version control numbering as well.
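The short Python sketch below, which assumes version numbers sit at the end of the file name as in the earlier examples, shows why the leading zeros matter: zero-padded names sort in the intended order as plain strings, while unpadded names do not.

# Zero-padded version numbers keep files in the intended order when sorted by name.
padded = [f"File_Name_v{n:03d}.xxx" for n in (1, 2, 10)]
print(sorted(padded))
# ['File_Name_v001.xxx', 'File_Name_v002.xxx', 'File_Name_v010.xxx']

# Without padding, version 10 sorts before version 2.
unpadded = [f"File_Name_v{n}.xxx" for n in (1, 2, 10)]
print(sorted(unpadded))
# ['File_Name_v1.xxx', 'File_Name_v10.xxx', 'File_Name_v2.xxx']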

Directory Structure Naming Conventions

File naming conventions don’t just apply to your files; use the same best practices to structure your directory folders as well. Directory folders should provide key information about the file contents stored within each folder. Be sure to include the project title, unique identifiers, and the date. It is also helpful to write a brief description of the content stored in major folders and to provide an overview of the directory structure in your documentation. The level of detail should be enough to help someone understand the contents and organization of the files.

Here’s a nice example: (Cornell University, 2022)

  • Top folder: Study_name
    • Subfolder 1: Study_name_Datasets
      • Study_name_2019-2020.csv
      • Study_name_2021-2022.csv
    • Subfolder 2: Study_name_Semantic_analysis
      • Study_name_semantic_analysis.R
      • Study_name_semantic_analysis_output.csv
    • Readme file: Study_name_readme.txt
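If you prefer to set up such a structure programmatically, here is a minimal Python sketch using the standard pathlib module. The folder and file names simply mirror the Cornell example above; treat it as an illustration rather than a required setup step.

from pathlib import Path

study = "Study_name"   # replace with your project or study acronym
root = Path(study)

# Create the top folder and its subfolders.
(root / f"{study}_Datasets").mkdir(parents=True, exist_ok=True)
(root / f"{study}_Semantic_analysis").mkdir(parents=True, exist_ok=True)

# A top-level readme documenting the structure (see the next section).
readme_text = (
    f"{study}_Datasets/          CSV datasets named {study}_YYYY-YYYY.csv\n"
    f"{study}_Semantic_analysis/ analysis scripts ({study}_semantic_analysis.R) and their output\n"
)
(root / f"{study}_readme.txt").write_text(readme_text)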

Document Naming Conventions

Be sure to document each file naming convention in a top-level readme file. This file should include instructions for navigating the structure so that others involved in the research project, as well as anyone who might use the data once the project is complete, can follow the naming conventions used. This file can be a README.txt file and should be kept with your files.

File naming conventions are an essential part of any research project! Be sure to take the time to create a file naming convention that will help keep your files organized, easily findable, and usable by your research team and any others who may look at your data once your project is finished. Stay tuned for future posts on best practices related to other data management topics!

References:

Cornell University. (2022, June). File management. Research Data Management Service Group: Comprehensive Data Management Planning & Services. https://data.research.cornell.edu/content/file-management

Longwood Research Data Management: Harvard Medical School. (2022). File naming conventions. Data Management, Harvard Medical School. https://datamanagement.hms.harvard.edu/collect/file-naming-conventions

University of Michigan Library. (2022, September 14). File-naming conventions. LabArchives: Best practices for research data management. https://guides.lib.umich.edu/c.php?g=739306&p=5286418

On August 25, 2022, the White House Office of Science and Technology Policy (OSTP) published a memorandum that updated guidance on research funded by federal grants and called for the end of the 12-month embargo period. By 2025, published research funded by the federal government must be made available to the general public and key stakeholders and must no longer reside behind a paywall. Government agencies, academic libraries, and other research institutions are in the process of interpreting this memo and updating their policies, and many believe the new guidance will radically change the academic publishing landscape. While the policy will likely advance a cultural shift toward open science, it also introduces new challenges in the publication lifecycle that researchers are likely to encounter. In this post, we explain the OSTP’s guidance, describe how it will impact researchers, and offer library resources to help you prepare for the change.

The memorandum, entitled ‘Ensuring Free, Immediate, and Equitable Access to Federally Funded Research’ (also known as the Nelson memo), lists three recommendations for federal agencies:

  1. “Update their public access policies as soon as possible, and no later than December 31st, 2025, to make publications and their supporting data resulting from federally funded research publicly accessible without an embargo on their free and public release;
  2. Establish transparent procedures that ensure scientific and research integrity is maintained in public access policies; and, 
  3. Coordinate with OSTP to ensure equitable delivery of federally funded research results and data.” (Office of Science and Technology Policy, 2022, p. 1)

Dr. Alondra Nelson, Deputy Assistant to the President, explained later in the memo that “The insights of new and cutting-edge research stemming from the support of federal agencies should be immediately available–not just in moments of crisis, but in every moment. Not only to fight a pandemic, but to advance all areas of study, including urgent issues such as cancer, clean energy, economic disparities, and climate change” (Office of Science and Technology Policy, 2022, p. 2-3). Under the OSTP’s new guidance, researchers, journalists, members of the public, and other interested parties will be able to access new research as soon as it is published at no additional cost. This will provide the public with the opportunity to learn more about new innovations and experiments and it will allow other researchers to replicate or expand on existing research. 

The new guidance will allow for more collaboration as researchers tackle complex challenges such as climate change, future pandemics, and other global concerns. Because new research and data will be made freely available to the public, researchers and institutions will need to address how to handle article processing charges. While this guidance won’t go into effect until 2025, and there are still questions about how specifically it will alter existing public access policies at government agencies like the NIH, the staff at Himmelfarb are here to assist researchers who have questions about how the OSTP memo will impact their work.

The Scholarly Communications Committee tutorial ‘How to include Article Processing Charges (APCs) in Funding Proposals’ is a great place to learn more about budgeting for article processing charges when creating a grant proposal. ‘Open Access and Your Research’ examines the different open access models and their consequences for your research. And if you’re unsure how to find article processing charges, the tutorial ‘Locating Article Processing Charges (APCs)’ offers guidance on locating this information on a publisher’s website. If you have specific questions about this new memo and would like to speak directly with a librarian, please contact the library via phone, email, or chat.

References:

Office of Science and Technology Policy. (2022). Ensuring Free, Immediate, and Equitable Access to Federally Funded Research. Author. 

With the 2023 NIH Data Management and Sharing Policy going into effect on January 25, 2023, there’s no better time to explore data management resources! This post explores resources that can help you with your data management needs.

What is data management? 

Data management involves the process of collecting or producing, cleaning and analyzing, preserving, and sharing data from a research project. Data management takes place throughout the entire research life cycle, from deciding on consistent file naming conventions to depositing the data in a repository for long-term archiving. 

Why Data Management?

Data management is vital for transparency (showing your work promotes reproducibility), compliance (funding organizations and journals often require making data available), and personal and organizational benefit (using data within your own lab is easier with proper management).

I Think It’s FAIR to Say…

Understanding data management best practices is important for making well-informed decisions when selecting data management resources and tools. The FAIR Principles, first published in 2016, provide a set of guidelines for data management; FAIR stands for Findable, Accessible, Interoperable, and Reusable. You can learn more about the FAIR Principles on our Data Management Guide. Another great resource is Cornell University’s Research Data Management Service Group’s Comprehensive Data Management Planning and Services Best Practices, which provides extensive information on data management best practices.

Broad Data Management Resources

Himmelfarb’s Data Management Guide provides a wealth of information and resources related to data management. In addition to some basic information about data management, you’ll find information about NIH and NSF funder requirements. Data management plans (DMPs) are also covered in detail. The documentation and metadata page explains what metadata is, what should be included in your metadata, metadata schemas, controlled vocabularies, file naming conventions, and electronic lab notebooks. The data storage and security page includes data storage, storage formats, creating a backup plan, and data security. You’ll also learn about data sharing, including GW’s policy on regulated information, and data repositories.

I might need to make a plan for this… 

Creating a data management plan (DMP) is often part of the grant writing process required by funding institutions. A comprehensive data management plan should address:

  • Data Collection: Must be reliable and valid.
  • Data Storage: Appropriate amount of data so research can be reproduced.
  • Data Analysis: Interpretation of data from which conclusions can be derived.
  • Data Protection: Ensuring sensitive data is safe and secure, preventing tampering or loss of data.
  • Data Ownership: Addresses legal rights associated with data.
  • Data Retention: Addresses how long data should be kept and proper disposal of sensitive data.
  • Data Reporting: Publication of data.
  • Data Sharing: Addresses what data can be shared with others and how.

When it comes to creating a DMP, there are a number of tools available to help! The DMPTool is a free, open-source tool that helps researchers create DMPs that comply with funder requirements. DMPTool also provides links to funder websites, and best practices resources to help guide your data management efforts. Since GW is affiliated with DMPTool, GW users can create a personalized dashboard that allows them to see and organize the DMPs created through the tool. From the DMPTool’s website, simply click “sign in” and use Option 1 to search for George Washington University. Then log in with your GW UserID and password and create your data management plan! 

The Framework for Creating a Data Management Plan, created by ICPSR, is a great outline that will help you create a DMP for your grant application. The framework lists the elements to include, explains why each element is important, and provides examples for each element. Michener’s article Ten Simple Rules for Creating a Good Data Management Plan is another great starting point for understanding the principles and practices of creating a DMP and ensuring your data are safe and shareable. For more DMP resources, including examples and templates, check out the Data Management Plan page of the Data Management Guide.

What’s Next?

Stay tuned for future posts on best practices for writing a data management plan, data storage, file naming conventions, creating “readme” metadata, and other data management topics. In the meantime, check out the lists of GW resources and additional resources below to learn more!

Additional GW Resources:

Additional Data Management Resources: