I am not sure why, but I suddenly thought about stoichiometry (balancing chemical reactions) this morning. Surely it would make for a quick comparison between o1 and Claude 3.5 Sonnet.
So, that is just what I did. I have paid access to both ChatGPT and Claude. My simple prompt to both models read: Show me how I can use a matrix and Gauss-Jordan elimination to solve stoichiometry problems.
I share some screenshots and impressions below.
We can start with 3.5 Sonnet. The response started well.
The chemical equation that the 3.5 Sonnet model chose to balance was the reaction between iron oxide and carbon monoxide, which produces iron and carbon dioxide. This is a redox reaction, as both the oxidation of carbon monoxide and the reduction of iron take place.
3.5 Sonnet added variables and generated a set of three equations in four unknowns. It then proceeded to generate the correct augmented matrix. So far so good.
Correct elementary row operations followed and the reduced row-echelon form was produced. Unfortunately, that's where the errors started. The 3.5 Sonnet model could not interpret the results from the final matrix as can be seen in the next image.
I pointed out the error and the model then produced the correct result, although it still made an error with the equations using the least common multiple (bottom-middle of the image below).
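The procedure described above can be sketched in a few lines of Python. This is a minimal sketch, not the code that either model produced. It assumes that the iron oxide is iron(III) oxide, so that the reaction reads a Fe2O3 + b CO -> c Fe + d CO2, which gives three equations (one each for Fe, O, and C) in four unknowns. The function names rref and balance are my own, and exact fractions are used so that the least common multiple step at the end is visible.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9 or later


def rref(matrix):
    """Return (reduced row-echelon form, pivot columns) using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row, pivot_cols = 0, []
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a non-zero entry in this column.
        candidate = next(
            (r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None
        )
        if candidate is None:
            continue
        rows[pivot_row], rows[candidate] = rows[candidate], rows[pivot_row]
        # Scale the pivot row so that the pivot equals 1.
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        # Clear the pivot column in every other row (the Gauss-Jordan step).
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
    return rows, pivot_cols


def balance(coefficient_matrix):
    """Return the smallest whole-number coefficients for a homogeneous system."""
    n_unknowns = len(coefficient_matrix[0])
    # Augment with a zero column, since every equation has the form ... = 0.
    augmented = [list(row) + [0] for row in coefficient_matrix]
    reduced, pivot_cols = rref(augmented)
    free_cols = [c for c in range(n_unknowns) if c not in pivot_cols]
    if len(free_cols) != 1:
        raise ValueError("this sketch expects exactly one free variable")
    free = free_cols[0]
    # Set the free variable to 1 and read the other unknowns from each row.
    solution = [Fraction(0)] * n_unknowns
    solution[free] = Fraction(1)
    for row, col in zip(reduced, pivot_cols):
        solution[col] = -row[free]
    # Multiply by the least common multiple of the denominators.
    scale = lcm(*(value.denominator for value in solution))
    return [int(value * scale) for value in solution]


# Assumed reaction: a Fe2O3 + b CO -> c Fe + d CO2 (columns a, b, c, d).
iron_oxide = [
    [2, 0, -1, 0],   # Fe: 2a - c = 0
    [3, 1, 0, -2],   # O:  3a + b - 2d = 0
    [0, 1, 0, -1],   # C:  b - d = 0
]
print(balance(iron_oxide))  # [1, 3, 2, 3]
```

The reduced matrix says that a = d/3, b = d, and c = 2d/3, with d free. Choosing d = 3 clears the fractions and gives Fe2O3 + 3 CO -> 2 Fe + 3 CO2.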
The o1 model from OpenAI fared much better, choosing the reaction of propane and oxygen.
The o1 model set up the correct system of equations and created the augmented matrix. It performed elementary row operations and stopped when the matrix was in row-echelon form. The interpretation was correct as can be seen in the image below.
This was just a single, simple comparison that added to my bias. I much prefer the o1 model from OpenAI, especially for coding and mathematics.
Watch the video I made of my first short look HERE
Mistral has updated their LLM chat interface called Le Chat, which I understand is French for The Cat.
Anyway, it is a chat interface to their generative artificial intelligence model, much like OpenAI's ChatGPT, and Anthropic's Claude. You can read more about the new capabilities of Le Chat HERE.
All you need is a free account (for now). I do a quick first test of Le Chat to look at how it produces simple Python code. Le Chat does not allow for the upload of a spreadsheet file, so instead, I tried to use it to solve a simple system of two linear equations in two unknowns.
I do stress test the chat interface a bit by using LaTeX in my prompts and I also get it to use the symbolic Python package, sympy, a package that I absolutely love but that is not that commonly used in the broader context of Python use cases.
I copy and paste the code into Visual Studio Code (after having set up a virtual Python environment in which I installed numpy, matplotlib, and sympy beforehand).
Le Chat did a good job in this small first test. It plotted the two lines in the plane using matplotlib to show the unique solution (at the intersection of the lines). It generated the augmented matrix as I instructed, but then solved the system of linear equations using the solve function in sympy. After I instructed Le Chat to rather calculate the reduced row-echelon form of the matrix using the sympy rref method, it did indeed do that.
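For readers who want to try the two approaches mentioned above, here is a minimal sketch using sympy. The system below is a hypothetical example and not the one from the test, and this is not the code that Le Chat generated. The plotting step with matplotlib is left out to keep the sketch short.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical system of two linear equations in two unknowns:
#   2x + y = 5
#    x - y = 1
equations = [sp.Eq(2 * x + y, 5), sp.Eq(x - y, 1)]

# Approach 1: let sympy solve the system directly.
solution = sp.solve(equations, [x, y])
print(solution)  # {x: 2, y: 1}

# Approach 2: build the augmented matrix and reduce it with the rref method.
augmented = sp.Matrix([[2, 1, 5], [1, -1, 1]])
reduced, pivots = augmented.rref()
print(reduced)  # Matrix([[1, 0, 2], [0, 1, 1]])
print(pivots)   # (0, 1)
```

Both approaches give x = 2 and y = 1. The rref method returns the reduced matrix together with the indices of the pivot columns, and the last column of the reduced matrix holds the solution.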
Check out the new Le Chat for yourself HERE or watch the short video I made of my first test below.
March 4 saw the release of Claude 3, the newest large language model (LLM) from Anthropic. Claude 3 competes with other LLMs such as GPT-4 in ChatGPT from OpenAI.
Claude 3 comes in three versions: Haiku (the smallest and fastest model, which is yet to be released as of this writing and is intended for corporate chatbots and similar tasks), Sonnet (the free version and similar to GPT-3 from OpenAI), and Opus (paid). Opus is the largest model and scores best on most benchmarks. It is not clear if the benchmarks in the release notes compare Claude 3 Opus against GPT-4 or the newer GPT-4 Turbo. The benchmarks (and release notes) can be found HERE. Take a closer look. It makes for interesting reading.
An exciting advancement is the larger context window in Claude 3. Anthropic's models already have some of the largest context windows. The context window, measured in tokens, is the space available for the inputs (which include prompts and other data) and the model's responses. As an example, a larger context window means that we can upload larger documents in our prompts. The model can interact with these documents when returning a response. It must be noted that the very largest context windows are only available to select customers.
A noted problem with large context windows is the needle-in-a-haystack problem. This problem refers to the inability of LLMs to recall information in the middle of large inputs. Companies have devised tests for this problem. Such a needle-in-a-haystack test verifies a model's recall ability by inserting a target sentence (the needle) into a corpus of one or more documents (the haystack) and asking a question that can only be answered by using information in the needle. Company officials note surprise at how good the new Claude 3 model is at this test. Claude can actually recognize that it is being tested in this way and can include this in its response.
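The construction of such a test can be sketched in Python. This is only an illustration of the idea described above, not the evaluation code of Anthropic or any other company. All the function names are hypothetical, no model is called, and the check at the end is a simple substring match that a real evaluation would likely replace with something more careful.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def build_prompt(haystack, question):
    """Combine the document and the question into a single prompt."""
    return f"{haystack}\n\nQuestion: {question}\nAnswer using only the document above."


def recalled(response, expected_answer):
    """Naive check: does the response mention the expected answer?"""
    return expected_answer.lower() in response.lower()


# Hypothetical filler text and needle.
filler = [f"This is filler sentence number {i}." for i in range(10)]
needle = "The secret ingredient in the soup is saffron."
haystack = build_haystack(filler, needle, depth=0.5)
prompt = build_prompt(haystack, "What is the secret ingredient in the soup?")

# The prompt would be sent to a model here. A made-up response stands in.
print(recalled("The secret ingredient is saffron.", "saffron"))  # True
```

Repeating this for several depths and document lengths shows where in the context window a model starts to lose track of the needle.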
A Public Health Data Science* Perspective and Personal Opinion Statement
*Data Science used in its broadest sense to include data collection, data management, (statistical) analysis, visualization, presentation, dissemination, and more.
Human beings have long sought the power of intelligence from machines. From failed initial attempts such as expert decision systems, we have the current revolution driven by the mathematical function approach of machine learning.
Machine learning itself has progressed dramatically. Today, we stand at the dawn of the age of generative artificial intelligence (GenAI) models. While virtually unknown just last Christmas, GenAI products such as ChatGPT are now household names. While previous machine learning techniques have had niche successes (and continue to do so - think self-driving cars), GenAI is different. It integrates with our normal day-to-day activities. We can communicate with it in our own language. We get answers in our own language. It assists with our daily tasks, makes our lives easier, and the bar to entry for its use is very low. Simply consider the way that it is revolutionizing our web searches. No longer do we type terse sentences into a search engine text box only to get a million links that we have to sort through. Now we type a real sentence and get a real answer.
GenAI models such as those behind OpenAI’s ChatGPT and Google’s Bard are generative pre-trained transformer (GPT) models. They are large language models (LLMs), having been trained on enormous sets of data, be it written data on the internet, coding data, and more. They function by the simple concept of predicting every subsequent word in a sentence. LLMs perform so well that their responses are coherent enough to give the illusion of intelligence.
Exactly because of their illusion of intelligence combined with their general purpose, they have found a way to infiltrate so much of our daily lives. Interacting with web searches is but one example. GPT models can write essays, answer emails, create recipes, and so much more. Pertaining to our own domain, we have to state that they excel at working with data and at teaching. In other words, they excel at the core of our academic enterprises of teaching and research.
Our first task is to accept our new reality. The proverbial horses have bolted. They will never be put back in the stable.
Our second task is to stop fearing artificial intelligence in general. While it is prudent to look to the future and safeguard that future, we should not be overwhelmed by fear. After all, we have to admit to the fear of the first motor car. To be sure, the motor vehicle has killed, and will continue to kill, human beings through road traffic accidents, but we cannot deny what it has meant to our society to be mobile. Humans have a long tradition of fearing the new. We have uncountable examples of how fear is weaponized to influence and control us. It happens to this day.
As motorized and electrified transportation, modern medicine, communication devices, and so much more have benefitted us, despite the shortcomings of each and every example that can be added to this list, so it must be with GenAI. Instead of ignoring it or trying to ban it, we should embrace it, manage it, and use it to our advantage.
The pace of evolution and revolution is staggering. It is not long ago that the term Data Science entered our collective awareness. Whereas probability and statistics are mature sciences, they are now the purview of the much wider world of data science. We have only recently introduced data science courses and programs into our academic teaching pursuits, and here we are, having to revisit what we are still busy creating, by having to incorporate GenAI models.
Modern applied statistics (read biostatistics), such as is used in our School, is taught using software. We use software both in our teaching and actively in our research. The aim of our teaching efforts is to prepare our students for a modern working environment. That data science environment, inclusive of the software used, will without doubt make use of GenAI. With GenAI set to be a full component of the pipeline of data science, we are compelled to integrate it in our teaching.
As a brand new component to data science and indeed to our daily lives, it would be impossible to lay out a complete plan at this time. What is clear, though, is that GenAI models excel at simplifying connected processes and at generating code, both tasks which are central to data science. As such, they allow us to focus on the tasks at hand instead of the minutiae of how to do the task. As a simple example, we can consider exploratory data analysis. It is, today, a trivial task to upload tabular data to ChatGPT and ask it to conduct a full set of exploratory data analysis and data visualization. Instead of having to learn the intricacies of performing these tasks and writing the code, we can instead concentrate on the information that the model produces. We remove the mundane tasks and replace them with our natural ability to use spoken language (which for now, we unfortunately have to type). The process extends naturally to statistical tests and modeling, to model interpretation, and the presentation of results. GenAI models can do all of this, including generating reports and summaries. They are the consummate research assistant.
If GenAI models can be the consummate research assistant, then they can be the consummate assistant for curriculum design, educational resource design, and be a general teaching assistant. It is this last task that perhaps excites the most. Assuming the constructivist theory of learning that postulates that students construct versions of knowledge, building on pre-existing knowledge and experiences, rather than a behaviorist (change in behavior due to stimulus) or a cognitivist (instructional) approach, we can use GenAI to allow students to explore a knowledge space in a natural as opposed to prescribed way. The instantaneous response and always-available nature of GenAI allows a learner to engage with the knowledge space when and where they want and in a way that naturally occurs to them. They can ask follow-up questions, view the results, and repeat this process until their curiosity is satiated. This cannot be mimicked by fixed educational materials and overburdened faculty.
There are caveats to take cognizance of. First and foremost is the fact that some pre-existing knowledge is required to understand the responses of GenAI in the first place. Code generation stands as an example. GenAI models can produce code which can be copied into a coding application. With no knowledge of coding, a user will not understand the code, how it can be changed to be more efficient, or how to fix problems. Fortunately, GenAI models are excellent at explaining code and are the ideal tool for learning computer languages. In fact, it can be argued that they excel at it. We also have to admit that GenAI models are much more responsive, in fact infinitely more responsive, than the fixed written word of textbooks and other written or pre-recorded material.
We also have to recognize the problem of hallucinations. They make mistakes. Perhaps we over-emphasize this problem, by subconsciously believing that human teachers make no mistakes at all. Even so, the responses of GenAI models are not peer-reviewed and they are not edited by a production team at a large publishing house. The real human-in-the-loop cannot be ignored here. The act of teaching still requires the active presence and involvement of a teacher. GenAI, though, is as argued before, the ideal assistant in the task of education.
Another problem that we have to deal with is the use of GenAI models to cheat the system of assessment. Assessment is core to our academic enterprise. At times, we have to look at our own faults first, though. To some extent, much of academia has automated the process of assessment. Most of us are too overwhelmed with the tasks of being an academic to pay full and undivided attention to the level and adequacy of the knowledge gained by our students. Instead, we have designed exams that are divorced from the real world and are mere high-stakes hurdles that a student must navigate to prove the success of our system of education. Now, more so than ever before, we are tasked with improving our understanding of the knowledge level of our students. Many teachers, Schools, and Universities have done just this, incorporating individual interactions with students for continued assessment, encouraging critical thinking and creativity, and above all, ethical awareness. Such systems, which we have already implemented, must be recognized and applauded.
Automated policing, while admirable, is not the solution. It can be argued, in fact, to be a zero-sum game. As techniques are developed to identify the output of GenAI models, so systems are developed to overcome this detection. It might very well be a never-ending race. Added to this is the problem of negative flagging. The repercussions of false positives must be considered and may be devastating, for students and for teachers, and even for researchers.
It is only by engaging with GenAI models in our teaching and learning that we will discover their true potential.
Lorena Barba, Professor of Mechanical and Aerospace Engineering at The George Washington University, has written a tutorial on the use of generative artificial intelligence models in JupyterLab. Please follow the link below to view the instructive tutorial.
Many computer programs (integrated development environments) are available for writing code. Chief among these are the Jupyter environments. JupyterLab has arguably become the de facto standard software to use for your Python code. Now, you can link your account with providers of generative artificial intelligence models such as OpenAI (ChatGPT) or Anthropic (Claude) when using JupyterLab. You will get all the power of these models as chat agents and as coding assistants, right at your fingertips.
This post is all about writing better prompts. A prompt is the input text that we write when chatting or otherwise communicating with a generative artificial intelligence model. In many cases, our default position is to be very terse when writing prompts. We have become accustomed to typing very short expressions into a search engine. This is not the case with generative models. They are designed to mimic communication and interaction with a human being. When we want detailed and precise responses from a real person that we are talking to, we are generally more exact in our own words. This is very much the approach we need to take when conversing with a generative model. We would all like generative artificial intelligence models such as ChatGPT to provide us with the perfect response. To do this, we really need to focus on our prompts. Our focus helps the model to return exactly what we need in a response or to set up the chat for a productive conversation.
I am your future ruler, uhm, I mean your friendly generative artificial intelligence model. What can I help you with?
Thinking about and writing proper prompts is now a recognized skill. Prompt engineering is the term that has developed to describe how to communicate effectively with generative models so that they understand our requirements and generate the best possible responses. A quick look at the job market now shows advertisements for prompt engineers. Some of these positions pay quite well.
Courses have been developed to teach prompt engineering and there are many quick tutorials on platforms such as YouTube. This post adds to the long list of help in writing better prompts. The field of generative artificial intelligence is changing at a rapid rate and I will no doubt return to this topic in the future. In this post, I take a quick look at some basic principles to keep in mind when writing a prompt. These principles are mainly used when we first initiate a new chat with a generative model. Subsequent prompts in a chat can indeed be more terse.
In short, a proper prompt should include the components in the list below. Take note that there is some order of importance (from most to least important) to this list and there is, at least to some extent, an overlap between the components.
The task that the generative artificial intelligence model should perform
The context in which the conversation is taking place
One or more examples pertaining to the prompt
The persona that the model should take on when responding
The format in which the model should respond
The tone of speech that the model should write in
We can think of constructing a prompt by writing content for the following placeholders.
It is important to note that not every prompt needs all the information above. It is typical that only the prompt that initiates a chat should include as much information as possible. Below, we take a closer look at each of the components of a proper prompt.
It is usually a good idea to start the task with a verb. Examples include words such as Write ..., Create ..., Generate ..., Complete ..., Analyze ..., and so on. We should be as precise and direct as possible when writing the task. This is, after all, what we want the model to do. The task might refer to a single action, or indeed, many actions. An example might be the following sentence: Write the definition of the measures of central tendency and provide examples of such measures. This task starts with a verb and contains two instructions.
The context is not always easy to construct. In the case of our example we might add: I am a postgraduate student in public health and I am learning about biostatistics. This context can guide the generative model to return a response that can be used as learning material and that should be easier to understand than a more technical response. Additional information such as: I am new to biostatistics or This is the first time I am learning statistics or I am reviewing this content for exam preparation, can be added to the context. The sky is the limit here. This is not to say that we should overload the model with context. Just enough context to guide the model when generating the response usually works wonders.
The quality and accuracy of responses have been shown to increase decidedly when we include examples in our prompts. Continuing with our reference to measures of central tendency, we might add the following: My textbook includes measures of central tendency such as the arithmetic and geometric mean, the median, and the mode.
The persona helps the model to generate text in a specific framework. We might write the following: You are a University Professor teaching postgraduate level biostatistics. Clearly, the inclusion of this information should guide the model when generating its response. The response might be quite different if we add the following persona: You are a middle school teacher or even You are Jedi Master Yoda.
Describing the format allows us to guide how the result should be generated. We might want a simple paragraph of text explaining the measures of central tendency or a bullet-point list of the names of the measures of central tendency and their definitions or a table with columns for measure of central tendency, definition, and example. We have to envision how we want the final result to be formatted. Note that we can also include this information in the examples that we provide in the prompt. The format also ties in with the task. We might want the model to write an essay about the topic or create study notes.
The tone of voice is not always required. We might want to include a specific tone of voice if we plan to use the content generated by the model as our own personal study notes or as formal content for an assignment (given that we have permission to use a model to complete our work or stipulate that we used a model to complete the work). Here we might also mention that we prefer the first or third-person perspective or even if the response should be humorous or very formal.
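For those who build prompts in code, the six components discussed above can be collected in a small Python structure. This is a minimal sketch and only one of many ways to do it. The class name Prompt, its field names, and the order in which the components are joined are my own assumptions. The format and tone sentences are made up for illustration, while the other sentences are the examples used in this post. Only the task is required, in keeping with the note that not every prompt needs every component.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    """Hypothetical container for the components of a prompt."""

    task: str
    context: str = ""
    examples: str = ""
    persona: str = ""
    response_format: str = ""
    tone: str = ""

    def render(self):
        """Join the components that were provided into a single prompt."""
        # The order below is a choice made for this sketch, not a rule.
        parts = [
            self.persona,
            self.context,
            self.task,
            self.examples,
            self.response_format,
            self.tone,
        ]
        return " ".join(part.strip() for part in parts if part.strip())


full_prompt = Prompt(
    task=(
        "Write the definition of the measures of central tendency "
        "and provide examples of such measures."
    ),
    context=(
        "I am a postgraduate student in public health "
        "and I am learning about biostatistics."
    ),
    examples=(
        "My textbook includes measures of central tendency such as the "
        "arithmetic and geometric mean, the median, and the mode."
    ),
    persona="You are a University Professor teaching postgraduate level biostatistics.",
    response_format=(
        "Respond with a table with columns for measure of central tendency, "
        "definition, and example."
    ),
    tone="Use a formal tone.",
)
print(full_prompt.render())
```

A follow-up prompt later in the same chat can be as short as Prompt(task="Create study notes from the table.").render(), since the earlier components are already part of the conversation.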
In human communication we can infer from context, voice intonation, facial expressions, verbal interactions, and much more to attain the information we require. In the case of a generative artificial intelligence model, we have to attempt the same thing, but with our words only. We actually have a lot of practice with this, having moved much of our interactions to email and chat applications. Now we are just chatting with a model.