
Thoughts from ALL4Health: The First Workshop on Applying LLMs in LMICs for Healthcare Solutions at IEEE ICHI 2024

Me presenting at the ICHI ALL4Health workshop. Picture credit: Robert Pless

Earlier this week, I attended the ALL4Health workshop at ICHI, where I presented some early results from our ongoing pilot study with EHA Clinics. Here, I want to share some thoughts I had while listening to the day's talks.

The day reaffirmed for me how impactful the integration of LLMs in LMICs can be. An obvious critique of the practice goes: if people are really interested in making quality care more accessible, why not fund efforts to train people who live there to be doctors, instead of implementing some convoluted AI system? The workshop made several answers apparent to me. First, there are already many active healthcare institutions in these regions, as the existence of EHA Clinics and similar organizations (many of which I learned about at the workshop) proves; upending these institutions and replacing them with something completely dependent on outside support and intervention is not ideal from a developmental standpoint. Additionally, an approach focused purely on skills development may ultimately undermine its own goals, since a skilled physician may end up leaving their home country for an opportunity to work in a wealthier one. In perhaps one of the most striking remarks of the day, one of the speakers referenced a quip made by an Ethiopian colleague: "There may just be more Ethiopian ophthalmologists in Chicago than in Ethiopia."

It should also be noted that this movement toward LLM integration in clinical systems is happening in rich countries as well. Keynote speaker Mark Dredze spoke very openly about how, just a few years ago (the days of GPT-2), he firmly believed that people would not be able to build a medical chatbot fit for a real clinical setting. He pointed out that the ability to answer medical licensing questions accurately does not make one a good doctor, that clinical EHR data are often ambiguous, and that the truly hard cases for doctors are those that don't match textbook presentation. He then humorously admitted that he was dead wrong: LLMs today are far more capable with medical data than he thought they could be. Major tech and medical companies are now partnering to build in-clinic chatbot assistants. The questions to be addressed are no longer whether these systems can be useful, but how exactly they should be used and how their performance should be evaluated. Dr. Dredze personally advocated for using LLMs to identify and recommend which specialist(s) a patient should see, given the incredible breadth of knowledge that LLMs have. This knowledge is useful not just for rich patients, and given how inexpensive a single query of, for example, GPT-4 is, it can and should be used in as many safe contexts as feasible.

Like the paper I presented, all of the work I got to see was primarily concerned with how to make LLM integration safe and feasible. In most cases, the work concentrated on a single case study: a specific clinic in a defined region of sub-Saharan Africa, sometimes focused on a particular specialty, such as maternal care. A key objective for many of the studies was to identify and describe integration strategies that the human staff at the clinic liked and trusted. In line with this goal, many of the speakers presented rankings of model-prompt pairs produced both by algorithmic means (such as BLEU) and by human feedback surveys. One of the most interesting takeaways from the workshop, for me, is that (according to my personal notes) each of the four talks that ranked models and prompts by both algorithmic evaluation and human feedback reported a different "best" model under each method: the models that performed best on benchmark evaluations did not perform best in the feedback surveys. Although this is a small sample, it suggests that there still does not exist an algorithmic metric that adequately approximates how likely a human is to perceive a language model's output positively. At the same time, it is unclear what exactly makes a "good" response. Is it politeness? Is it how clearly the information is organized and laid out? These questions are particularly relevant to our ongoing project with eHealth Africa, as we continue to refine our prompt and understand what GPT does well and poorly.
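The divergence between algorithmic and human rankings is easy to reproduce on toy data. Below is a minimal sketch, in which the model outputs, reference answer, and survey scores are all invented for illustration: a fluent paraphrase can score near zero on n-gram overlap (a simplified BLEU, without brevity penalty or smoothing) even while human raters prefer it.

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Toy BLEU: geometric mean of n-gram precisions (no brevity penalty)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# Hypothetical outputs from two models for the same patient question.
reference = "take one tablet twice daily with food"
outputs = {
    "model_a": "take one tablet twice daily with food",
    "model_b": "please take a single tablet in the morning and evening with a meal",
}
# Hypothetical mean ratings from a human feedback survey (1-5 scale).
human_scores = {"model_a": 3.1, "model_b": 4.6}

bleu_rank = max(outputs, key=lambda m: simple_bleu(outputs[m], reference))
human_rank = max(human_scores, key=human_scores.get)
print(bleu_rank, human_rank)  # prints: model_a model_b -- the rankings disagree
```

Here model_a wins on n-gram overlap simply by echoing the reference, while the (invented) human scores favor model_b's friendlier phrasing; this mirrors, in miniature, the mismatch the four talks reported.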

Based on a conversation I had with some of the organizers and attendees, there was a strong belief among those at the workshop that there would be another iteration of it next year, and that it would be bigger, as LLMs continue to be more widely adopted throughout the healthcare industry. Based on what I saw at this year's event, plenty of interesting questions in this area remain unanswered, so another workshop next year (and for several more years after that) would certainly be valuable.

2 thoughts on “Thoughts from ALL4Health: The First Workshop on Applying LLMs in LMICs for Healthcare Solutions at IEEE ICHI 2024”

  1. Robert

Nice post! It was interesting to see the breadth of work in this domain. Nice integration of pointers to BLEU etc. in your post. Do you think we should try to submit again for next year? Not clear if the workshop would be integrated again with ICHI or would be with another venue.

    Reply
    1. Grady McPeak

I think it would be a really good idea to submit to this venue or one like it. As computer scientists working in the medical space, we lack a lot of relevant background knowledge/expertise, so opportunities to interact with and get feedback from people who have it would make our work much better and "keep us honest."

      Reply
