GENERATIVE AI text

A Public Health Data Science* Perspective and Personal Opinion Statement

*Data Science is used here in its broadest sense to include data collection, data management, (statistical) analysis, visualization, presentation, dissemination, and more.

Human beings have long sought the power of intelligence from machines. After failed initial attempts such as expert decision systems, we have arrived at the current revolution, driven by the mathematical function approach of machine learning.

Machine learning itself has progressed dramatically. Today, we stand at the dawn of the age of generative artificial intelligence (GenAI) models. While virtually unknown just last Christmas, GenAI products such as ChatGPT are now household names. While previous machine learning techniques have had niche successes (and continue to do so - think self-driving cars), GenAI is different. It integrates with our normal day-to-day activities. We can communicate with it in our own language. We get answers in our own language. It assists with our daily tasks and makes our lives easier, and the bar to entry for its use is very low. Simply consider the way that it is revolutionizing our web searches. No longer do we type terse phrases into a search engine text box only to get a million links that we have to sort through. Now we type a real sentence and get a real answer.

GenAI products such as OpenAI’s ChatGPT and Google’s Bard are built on generative pre-trained models (the GPT in ChatGPT stands for generative pre-trained transformer). They are large language models (LLMs), trained on enormous sets of data, including written text from the internet, computer code, and more. They function on the simple concept of repeatedly predicting the next word in a sentence. LLMs perform so well that their responses are coherent enough to give the illusion of intelligence.
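The idea of predicting the next word can be illustrated with a deliberately tiny sketch. This is not how an LLM is built: real models use neural networks trained on vast amounts of text, whereas the toy below simply counts which word most often follows another in a made-up sentence. The function names and the example text are invented for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(model, start, max_words=5):
    """Build a sentence by repeatedly predicting the next word."""
    words = [start]
    while len(words) < max_words:
        next_word = predict_next(model, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


# Hypothetical training text, invented purely for this illustration.
corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigram_model(corpus)
print(generate(model, "the"))
```

Even this crude counter produces a plausible-looking phrase, which hints at why a vastly larger model trained on vastly more text can sound coherent without understanding anything in the human sense.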

Precisely because of this illusion of intelligence, combined with their general-purpose nature, they have found their way into so much of our daily lives. Interacting with web searches is but one example. GPT models can write essays, answer emails, create recipes, and so much more. Pertaining to our own domain, we have to state that they excel at working with data and at teaching. In other words, they excel at the core of our academic enterprises of teaching and research.

Our first task is to accept our new reality. The proverbial horses have bolted. They will never be put back in the stable. 

Our second task is to stop fearing artificial intelligence in general. While it is prudent to look to the future and safeguard that future, we should not be overwhelmed by fear. After all, we have to admit that we once feared the first motor car. To be sure, the motor vehicle has killed, and will continue to kill, human beings through road traffic accidents, but we cannot deny what it has meant to our society to be mobile. Humans have a long tradition of fearing the new. We have uncountable examples of how fear is weaponized to influence and control us. It happens to this day.

As motorized and electrified transportation, modern medicine, communication devices, and so much more have benefitted us, despite the shortcomings of each and every example that can be added to this list, so it must be with GenAI. Instead of ignoring it or trying to ban it, we should embrace it, manage it, and use it to our advantage.

The pace of evolution and revolution is staggering. It is not long ago that the term Data Science entered our collective awareness. Whereas probability and statistics are mature sciences, they now fall within the purview of the much wider world of data science. We have only recently introduced data science courses and programs into our academic teaching pursuits, and here we are, having to revisit what we are still busy creating in order to incorporate GenAI models.

Modern applied statistics (read biostatistics), such as is used in our School, is taught using software. We use software not only in our teaching but also actively in our research. The aim of our teaching efforts is to prepare our students for a modern working environment. That data science environment, inclusive of the software used, will without doubt make use of GenAI. With GenAI set to be a full component of the data science pipeline, we are compelled to integrate it into our teaching.

As GenAI is a brand new component of data science and indeed of our daily lives, it would be impossible to lay out a complete plan at this time. What is clear, though, is that GenAI models excel at simplifying connected processes and at generating code, both of which are tasks central to data science. As such, they allow us to focus on the tasks at hand instead of the minutiae of how to do the task. As a simple example, we can consider exploratory data analysis. It is, today, a trivial task to upload tabular data to OpenAI's ChatGPT and ask it to conduct a full set of exploratory data analysis and data visualization. Instead of having to learn the intricacies of performing these tasks and writing the code, we can concentrate on the information that the model produces. We remove the mundane tasks and replace them with our natural ability to use spoken language (which, for now, we unfortunately have to type). The process extends naturally to statistical tests and modeling, to model interpretation, and to the presentation of results. GenAI models can do all of this, including generating reports and summaries. They are the consummate research assistant.
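To give a sense of the kind of mundane code that such a request replaces, here is a minimal sketch of a first exploratory pass over a small table. It is not the output of any particular GenAI model. It assumes only the Python standard library (in practice a library such as pandas would typically be used), and the column names and values are hypothetical, invented purely for illustration.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical tabular data, invented purely for illustration.
CSV_TEXT = """age,sbp,smoker
34,118,no
51,135,yes
46,128,no
62,142,yes
29,,no
"""


def summarize(csv_text):
    """Summarize each column: numeric statistics or category counts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        raw = [row[column] for row in rows]
        present = [value for value in raw if value != ""]
        info = {"n": len(present), "missing": len(raw) - len(present)}
        if present:
            try:
                numbers = [float(value) for value in present]
            except ValueError:
                # Not numeric: treat the column as categorical.
                info["counts"] = dict(Counter(present))
            else:
                info["mean"] = statistics.mean(numbers)
                info["median"] = statistics.median(numbers)
                info["min"] = min(numbers)
                info["max"] = max(numbers)
        summary[column] = info
    return summary


summary = summarize(CSV_TEXT)
for column, info in summary.items():
    print(column, info)
```

The point is not the code itself but that a plain-language request can stand in for writing it, leaving the analyst free to think about what the missing value or the spread of a variable actually means.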

If GenAI models can be the consummate research assistant, then they can be the consummate assistant for curriculum design and educational resource design, and a general teaching assistant. It is this last task that perhaps excites the most. Assuming the constructivist theory of learning, which postulates that students construct versions of knowledge by building on pre-existing knowledge and experiences, rather than a behaviorist (change in behavior due to stimulus) or a cognitivist (instructional) approach, we can use GenAI to allow students to explore a knowledge space in a natural as opposed to a prescribed way. The instantaneous response and always-available nature of GenAI allow a learner to engage with the knowledge space when and where they want and in a way that naturally occurs to them. They can ask follow-up questions, view the results, and repeat this process until their curiosity is satiated. This cannot be mimicked by fixed educational materials and overburdened faculty.

There are caveats to take cognizance of. First and foremost is the fact that some pre-existing knowledge is required to understand the responses of GenAI in the first place. Code generation is an example. GenAI models can produce code which can be copied into a coding application. With no knowledge of coding, a user will not understand the code, how it can be changed to be more efficient, or how to fix problems. Fortunately, GenAI models are excellent at explaining code and are an ideal tool for learning computer languages. We also have to admit that GenAI models are much more responsive, in fact infinitely more responsive, than the fixed written word of textbooks and other written or pre-recorded material.

We also have to recognize the problem of hallucinations. GenAI models make mistakes. Perhaps we over-emphasize this problem by subconsciously believing that human teachers make no mistakes at all. Even so, the responses of GenAI models are not peer-reviewed and they are not edited by a production team at a large publishing house. The real human-in-the-loop cannot be ignored here. The act of teaching still requires the active presence and involvement of a teacher. GenAI, though, is, as argued before, the ideal assistant in the task of education.

Another problem that we have to deal with is the use of GenAI models to cheat the system of assessment. Assessment is core to our academic enterprise. At times, we have to look at our own faults first, though. To some extent, much of academia has automated the process of assessment. Most of us are too overwhelmed with the tasks of being an academic to pay full and undivided attention to the level and adequacy of the knowledge gained by our students. Instead, we have designed exams that are divorced from the real world and are mere high-stakes hurdles that a student must navigate to prove the success of our system of education. Now, more so than ever before, we are tasked with improving our understanding of the knowledge level of our students. Many teachers, Schools, and Universities have done just this, incorporating individual interactions with students for continued assessment, and encouraging critical thinking, creativity, and, above all, ethical awareness. Such systems, which we have already implemented, must be recognized and applauded.

Automated policing, while admirable, is not the solution. It can be argued, in fact, to be a zero-sum game. As techniques are developed to identify the output of GenAI models, so systems are developed to overcome this detection. It might very well be a never-ending race. Added to this is the problem of wrongful flagging. The repercussions of false positives must be considered and may be devastating, for students and for teachers, and even for researchers.
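A short piece of arithmetic shows why false positives deserve this attention. Every number below is hypothetical, chosen only to make the calculation concrete, and does not describe any real detection tool: a class of 1,000 submissions of which 100 were written with GenAI, a detector that catches 90% of those, and a detector that wrongly flags 5% of honest work.

```python
def flagged_breakdown(n_ai, n_human, sensitivity, false_positive_rate):
    """Split flagged submissions into correct and wrongful flags."""
    true_flags = n_ai * sensitivity
    false_flags = n_human * false_positive_rate
    # Share of all flagged submissions that really were AI-written.
    share_correct = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, share_correct


# All four inputs are hypothetical values, not measurements.
true_flags, false_flags, share_correct = flagged_breakdown(
    n_ai=100, n_human=900, sensitivity=0.90, false_positive_rate=0.05
)
print(f"Correctly flagged: {true_flags:.0f}")
print(f"Wrongly flagged:   {false_flags:.0f}")
print(f"Share of flags that are correct: {share_correct:.0%}")
```

Under these assumed figures, roughly one in three flagged submissions would belong to a student who did nothing wrong, even though the detector sounds accurate on paper.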

It is only by engaging with GenAI models in our teaching and learning that we will discover their true potential.