I am not sure why, but I suddenly thought about stoichiometry (balancing chemical reactions) this morning. Surely it would make for a quick comparison between o1 and Claude 3.5 Sonnet.
So, that is just what I did. I have paid access to both ChatGPT and Claude. My simple prompt to both models read: "Show me how I can use a matrix and Gauss-Jordan elimination to solve stoichiometry problems."
I share some screenshots and impressions below.
We can start with 3.5 Sonnet. The response started well.
The example that the 3.5 Sonnet model came up with was to balance the reaction between iron oxide and carbon monoxide, which produces iron and carbon dioxide. It is a redox reaction, as both the oxidation of carbon monoxide and the reduction of iron take place.
3.5 Sonnet added variables and generated a set of three equations in four unknowns. It then proceeded to generate the correct augmented matrix. So far so good.
Correct elementary row operations followed and the reduced row-echelon form was produced. Unfortunately, that's where the errors started. The 3.5 Sonnet model could not interpret the results from the final matrix as can be seen in the next image.
I pointed out the error and the model then produced the correct result, although it still made an error with the equations using the least common multiple (bottom-middle of the image below).
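For readers who want to check the result themselves, here is a minimal sketch of the Gauss-Jordan approach in Python. It is my own illustration and not the code or output of either model. I assume that the iron oxide is Fe2O3 (the post does not name the formula), and the function names rref and smallest_integers are made up for this example. Because the system has three equations in four unknowns, the last unknown is treated as the free variable and set to 1 before scaling everything to whole numbers.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def rref(matrix):
    """Gauss-Jordan elimination with exact fractions; returns the reduced row-echelon form."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Look for a nonzero entry in this column, at or below the current pivot row.
        candidate = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if candidate is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[candidate] = rows[candidate], rows[pivot_row]
        # Scale the pivot row so that the pivot equals 1.
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        # Clear the pivot column in every other row.
        for r in range(n_rows):
            if r != pivot_row:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


def smallest_integers(values):
    """Scale a list of fractions to the smallest whole numbers in the same ratio."""
    common = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in values), 1)
    whole = [int(v * common) for v in values]
    divisor = reduce(gcd, whole)
    return [w // divisor for w in whole]


# Assumed reaction: a Fe2O3 + b CO -> c Fe + d CO2
# Columns: a, b, c, d | right-hand side
augmented = [
    [2, 0, -1, 0, 0],   # Fe: 2a = c
    [3, 1, 0, -2, 0],   # O:  3a + b = 2d
    [0, 1, 0, -1, 0],   # C:  b = d
]

reduced = rref(augmented)
# Each row now reads "unknown + k * d = 0", so with the free variable d = 1
# the other unknowns are the negated entries of the d column.
solution = [-row[3] for row in reduced] + [Fraction(1)]
coefficients = smallest_integers(solution)
print(coefficients)  # [1, 3, 2, 3]
```

Under the Fe2O3 assumption this gives Fe2O3 + 3 CO -> 2 Fe + 3 CO2. The step from the fractions 1/3, 1, 2/3, 1 to whole numbers is the least common multiple step that tripped up the model.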
The o1 model from OpenAI fared much better, choosing the reaction of propane and oxygen.
The o1 model set up the correct system of equations and created the augmented matrix. It performed elementary row operations and stopped when the matrix was in row-echelon form. The interpretation was correct as can be seen in the image below.
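The row-echelon route can be sketched in Python as well. Again, this is my own illustration and not o1's output. I assume complete combustion, so that the products are carbon dioxide and water (the post only names the reactants), and the helper names row_echelon, back_substitute, and smallest_integers are made up for this example. The matrix is only reduced to row-echelon form, after which back substitution recovers the unknowns with the last one set to 1.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def row_echelon(matrix):
    """Forward elimination only: returns a row-echelon form using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Look for a nonzero entry in this column, at or below the current pivot row.
        candidate = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if candidate is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[candidate] = rows[candidate], rows[pivot_row]
        # Clear the entries below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


def back_substitute(echelon, free_value=1):
    """Solve from the bottom row up, assuming diagonal pivots and a free last unknown."""
    n = len(echelon[0]) - 1  # number of unknowns; the last column is the right-hand side
    x = [Fraction(0)] * n
    x[n - 1] = Fraction(free_value)
    for i in range(n - 2, -1, -1):
        row = echelon[i]
        known = sum(row[j] * x[j] for j in range(i + 1, n))
        x[i] = (row[n] - known) / row[i]
    return x


def smallest_integers(values):
    """Scale a list of fractions to the smallest whole numbers in the same ratio."""
    common = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in values), 1)
    whole = [int(v * common) for v in values]
    divisor = reduce(gcd, whole)
    return [w // divisor for w in whole]


# Assumed reaction: a C3H8 + b O2 -> c CO2 + d H2O
# Columns: a, b, c, d | right-hand side
augmented = [
    [3, 0, -1, 0, 0],    # C: 3a = c
    [8, 0, 0, -2, 0],    # H: 8a = 2d
    [0, 2, -2, -1, 0],   # O: 2b = 2c + d
]

echelon = row_echelon(augmented)
solution = back_substitute(echelon)
coefficients = smallest_integers(solution)
print(coefficients)  # [1, 5, 3, 4]
```

Under the complete combustion assumption this gives C3H8 + 5 O2 -> 3 CO2 + 4 H2O.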
This was just a single, simple comparison that added to my bias. I much prefer the o1 model from OpenAI, especially for coding and mathematics.
A Public Health Data Science* Perspective and Personal Opinion Statement
*Data Science used in its broadest sense to include data collection, data management, (statistical) analysis, visualization, presentation, dissemination, and more.
Human beings have long sought the power of intelligence from machines. From failed initial attempts such as expert decision systems, we have arrived at the current revolution, driven by the mathematical function approach of machine learning.
Machine learning itself has progressed dramatically. Today, we stand at the dawn of the age of generative artificial intelligence (GenAI) models. While virtually unknown just last Christmas, GenAI products such as ChatGPT are now household names. While previous machine learning techniques have had niche successes (and continue to do so - think self-driving cars), GenAI is different. It integrates with our normal day-to-day activities. We can communicate with it in our own language. We get answers in our own language. It assists with our daily tasks, makes our lives easier, and the bar to entry for its use is very low. Simply consider the way that it is revolutionizing our web searches. No longer do we type terse phrases into a search engine text box only to get a million links that we have to sort through. Now we type a real sentence and get a real answer.
GenAI models such as OpenAI’s ChatGPT and Google’s Bard are built on generative pre-trained transformer (GPT) models. They are large language models (LLMs), having been trained on enormous sets of data, be it written text on the internet, code, and more. They function by the simple concept of predicting every subsequent word in a sentence. LLMs perform so well that their responses are coherent enough to give the illusion of intelligence.
Exactly because of their illusion of intelligence combined with their general purpose, they have found a way to infiltrate so much of our daily lives. Interacting with web searches is but one example. GPT models can write essays, answer emails, create recipes, and so much more. Pertaining to our own domain, we have to state that they excel at working with data and at teaching. In other words, they excel at the core of our academic enterprises of teaching and research.
Our first task is to accept our new reality. The proverbial horses have bolted. They will never be put back in the stable.
Our second task is to stop fearing artificial intelligence in general. While it is prudent to look to the future and safeguard that future, we should not be overwhelmed by fear. After all, we have to admit to the fear of the first motor car. To be sure, the motor vehicle has killed, and will continue to kill, human beings through road traffic accidents, but we cannot deny what it has meant to our society to be mobile. Humans have a long tradition of fearing the new. We have uncountable examples of how fear is weaponized to influence and control us. It happens to this day.
As motorized and electrified transportation, modern medicine, communication devices, and so much more have benefitted us, despite the shortcomings of each and every example that can be added to this list, so it must be with GenAI. Instead of ignoring it or trying to ban it, we should embrace it, manage it, and use it to our advantage.
The pace of evolution and revolution is staggering. It is not long ago that the term Data Science entered our collective awareness. Whereas probability and statistics are mature sciences, they now fall within the purview of the much wider world of data science. We have only recently introduced data science courses and programs into our academic teaching pursuits, and here we are, having to revisit what we are still busy creating, by having to incorporate GenAI models.
Modern applied statistics (read biostatistics), such as is used in our School, is taught using software. We use software both in our teaching and, actively, in our research. The aim of our teaching efforts is to prepare our students for a modern working environment. That data science environment, inclusive of the software used, will without doubt make use of GenAI. With GenAI set to be a full component of the pipeline of data science, we are compelled to integrate it in our teaching.
As a brand new component to data science and indeed to our daily lives, it would be impossible to lay out a complete plan at this time. What is clear, though, is that GenAI models excel at simplifying connected processes and at generating code, both tasks which are central to data science. As such, they allow us to focus on the tasks at hand instead of the minutiae of how to do the task. As a simple example, we can consider exploratory data analysis. Today, it is a trivial task to upload tabular data to OpenAI's ChatGPT and ask it to conduct a full set of exploratory data analysis and data visualization. Instead of having to learn the intricacies of performing these tasks and writing the code, we can concentrate on the information that the model produces. We remove the mundane tasks and replace them with our natural ability to use spoken language (which for now, we unfortunately have to type). The process extends naturally to statistical tests and modeling, to model interpretation, and the presentation of results. GenAI models can do all of this, including generating reports and summaries. They are the consummate research assistant.
If GenAI models can be the consummate research assistant, then they can be the consummate assistant for curriculum design, educational resource design, and be a general teaching assistant. It is this last task that perhaps excites the most. Assuming the constructivist theory of learning that postulates that students construct versions of knowledge, building on pre-existing knowledge and experiences, rather than a behaviorist (change in behavior due to stimulus) or a cognitivist (instructional) approach, we can use GenAI to allow students to explore a knowledge space in a natural as opposed to prescribed way. The instantaneous response and always-available nature of GenAI allow a learner to engage with the knowledge space when and where they want and in a way that naturally occurs to them. They can ask follow-up questions, view the results, and repeat this process until their curiosity is satiated. This cannot be mimicked by fixed educational materials and overburdened faculty.
There are caveats to take cognizance of. First and foremost is the fact that some pre-existing knowledge is required to understand the responses of GenAI in the first place. Code generation stands as an example. GenAI models can produce code which can be copied into a coding application. With no knowledge of coding, a user will not understand the code, how it can be changed to be more efficient, or how to fix problems. Fortunately, GenAI models are excellent at explaining code and are the ideal tool for learning computer languages. In fact, it can be argued that they excel at it. We also have to admit that GenAI models are much more responsive, in fact infinitely more responsive, than the fixed written word of textbooks and other written or pre-recorded material.
We also have to recognize the problem of hallucinations. GenAI models make mistakes. Perhaps we over-emphasize this problem by subconsciously believing that human teachers make no mistakes at all. Even so, the responses of GenAI models are not peer-reviewed and they are not edited by a production team at a large publishing house. The real human-in-the-loop cannot be ignored here. The act of teaching still requires the active presence and involvement of a teacher. GenAI, though, is, as argued before, the ideal assistant in the task of education.
Another problem that we have to deal with is the use of GenAI models to cheat the system of assessment. Assessment is core to our academic enterprise. At times, we have to look at our own faults first, though. To some extent, much of academia has automated the process of assessment. Most of us are too overwhelmed with the tasks of being an academic to pay full and undivided attention to the level and adequacy of the knowledge gained by our students. Instead, we have designed exams that are divorced from the real world and are mere high-stakes hurdles that a student must navigate to prove the success of our system of education. Now, more so than ever before, we are tasked with improving our understanding of the knowledge level of our students. Many teachers, Schools, and Universities have done just this, incorporating individual interactions with students for continued assessment, encouraging critical thinking and creativity, and above all, ethical awareness. Such systems, which we have already implemented, must be recognized and applauded.
Automated policing, while admirable, is not the solution. It can be argued, in fact, to be a zero-sum game. As techniques are developed to identify the output of GenAI models, so systems are developed to overcome this detection. It might very well be a never-ending race. Added to this is the problem of negative flagging. The repercussions of false positives must be considered and may be devastating, for students and for teachers, and even for researchers.
It is only by engaging with GenAI models in our teaching and learning that we will discover their true potential.
Lorena Barba, Professor of Mechanical and Aerospace Engineering at The George Washington University, has written a tutorial on the use of generative artificial intelligence models in JupyterLab. Please follow the link below to view the instructive tutorial.
Many computer programs (integrated development environments) are available for writing code. Chief among these are the Jupyter environments. JupyterLab has arguably become the de facto standard software to use for your Python code. Now, you can link your account with providers of generative artificial intelligence models such as OpenAI (ChatGPT) or Anthropic (Claude) when using JupyterLab. You will get all the power of these models as chat agents and as coding assistants, right at your fingertips.