
Just wasting some time on a Saturday morning

stoichiometry

I am not sure why, but I suddenly thought about stoichiometry (balancing chemical reactions) this morning. Surely it would make for a quick comparison between o1 and Claude 3.5 Sonnet.

So, that is just what I did. I have paid access to both ChatGPT and Claude. My simple prompt to both models read: Show me how I can use a matrix and Gauss-Jordan elimination to solve stoichiometry problems.

I share some screenshots and impressions below.

We can start with 3.5 Sonnet. The response started well.

First response from Claude.

The chemical equation that 3.5 Sonnet chose to balance was the reaction between iron oxide and carbon monoxide, which produces iron and carbon dioxide. It is a redox reaction, as carbon monoxide is oxidized while iron is reduced.

3.5 Sonnet added variables and generated a set of three equations in four unknowns. It then proceeded to generate the correct augmented matrix. So far so good.
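For readers who want to follow along without the screenshots, here is a minimal sketch of that setup. It assumes the iron oxide is Fe2O3 and names the unknowns a, b, c, d (my labels, not necessarily the ones the model used), so the reaction is a Fe2O3 + b CO -> c Fe + d CO2. The `rref` helper is written for this post and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row-echelon form of matrix, using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, n_rows) if rows[r][col] != 0]
        if not candidates:
            continue
        rows[pivot_row], rows[candidates[0]] = rows[candidates[0]], rows[pivot_row]
        # Scale the pivot row so that the pivot becomes 1.
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        # Clear this column in every other row (above and below the pivot).
        for other in range(n_rows):
            if other != pivot_row and rows[other][col] != 0:
                factor = rows[other][col]
                rows[other] = [x - factor * p
                               for x, p in zip(rows[other], rows[pivot_row])]
        pivot_row += 1
    return rows

# Assumed reaction: a Fe2O3 + b CO -> c Fe + d CO2
# Columns: a, b, c, d | right-hand side
augmented = [
    [2, 0, -1,  0, 0],  # Fe: 2a      - c      = 0
    [3, 1,  0, -2, 0],  # O:  3a + b      - 2d = 0
    [0, 1,  0, -1, 0],  # C:       b      -  d = 0
]

reduced = rref(augmented)
for row in reduced:
    print([str(x) for x in row])
```

Three equations in four unknowns leave one free variable (d here), which is why the last coefficient column of the reduced matrix holds fractions rather than a unique solution.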

Correct elementary row operations followed and the reduced row-echelon form was produced. Unfortunately, that's where the errors started. The 3.5 Sonnet model could not interpret the results from the final matrix as can be seen in the next image.

Wrong interpretation of the results from the reduced row-echelon form of the matrix.

I pointed out the error and the model then produced the correct result, although it still made an error with the equations using the least common multiple (bottom-middle of the image below).
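The interpretation step is the one that went wrong, so it is worth spelling out. The sketch below assumes the iron oxide is Fe2O3 with unknowns a, b, c, d for a Fe2O3 + b CO -> c Fe + d CO2, and starts from the reduced row-echelon form that follows from that assumption. The helper name `coefficients_from_rref` is mine. Each row reads "pivot variable plus some multiple of d equals zero", so setting d = 1 gives fractional coefficients, and multiplying by the least common multiple of the denominators gives the smallest whole numbers.

```python
from fractions import Fraction
from math import gcd

# Reduced row-echelon form for the assumed reaction
# a Fe2O3 + b CO -> c Fe + d CO2 (columns: a, b, c, d | rhs).
reduced = [
    [Fraction(1), Fraction(0), Fraction(0), Fraction(-1, 3), Fraction(0)],
    [Fraction(0), Fraction(1), Fraction(0), Fraction(-1),    Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(1), Fraction(-2, 3), Fraction(0)],
]

def coefficients_from_rref(reduced):
    """Read integer coefficients from an RREF with one free variable.

    Assumes the pivots sit on the diagonal and the free variable is the
    last unknown (the column just before the right-hand side).
    """
    free_col = len(reduced[0]) - 2
    # Row i says x_i + k_i * x_free = 0, so with x_free = 1 we get x_i = -k_i.
    solution = [-row[free_col] for row in reduced] + [Fraction(1)]
    # The least common multiple of the denominators clears all fractions.
    scale = 1
    for value in solution:
        scale = scale * value.denominator // gcd(scale, value.denominator)
    return [int(value * scale) for value in solution]

coefficients = coefficients_from_rref(reduced)
print(coefficients)  # [1, 3, 2, 3]
```

Under these assumptions the balanced equation is Fe2O3 + 3 CO -> 2 Fe + 3 CO2.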

The o1 model from OpenAI fared much better, creating a reaction of propane and oxygen.

o1 propane example.

The o1 model set up the correct system of equations and created the augmented matrix. It performed elementary row operations and stopped when the matrix was in row-echelon form. The interpretation was correct as can be seen in the image below.

Correct interpretation.
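The same route (stop at row-echelon form, then back-substitute) can be sketched in a few lines. This is my own reconstruction, not o1's output. It assumes complete combustion, a C3H8 + b O2 -> c CO2 + d H2O, with unknowns a, b, c, d that I named myself. The helpers `row_echelon` and `back_substitute` are hypothetical names written for this post.

```python
from fractions import Fraction
from math import gcd

def row_echelon(matrix):
    """Forward elimination only: zeros below each pivot, pivots not scaled."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        candidates = [r for r in range(pivot_row, n_rows) if rows[r][col] != 0]
        if not candidates:
            continue
        rows[pivot_row], rows[candidates[0]] = rows[candidates[0]], rows[pivot_row]
        for below in range(pivot_row + 1, n_rows):
            factor = rows[below][col] / rows[pivot_row][col]
            rows[below] = [x - factor * p
                           for x, p in zip(rows[below], rows[pivot_row])]
        pivot_row += 1
    return rows

def back_substitute(echelon):
    """Solve from the bottom row up, setting the last unknown (free) to 1.

    Assumes n rows, n + 1 unknowns, and pivots on the diagonal.
    """
    n = len(echelon)
    values = [Fraction(0)] * n + [Fraction(1)]
    for i in reversed(range(n)):
        known = sum(echelon[i][j] * values[j] for j in range(i + 1, n + 1))
        values[i] = (echelon[i][n + 1] - known) / echelon[i][i]
    return values

# Assumed reaction: a C3H8 + b O2 -> c CO2 + d H2O
# Columns: a, b, c, d | right-hand side
augmented = [
    [3, 0, -1,  0, 0],  # C: 3a      -  c      = 0
    [8, 0,  0, -2, 0],  # H: 8a           - 2d = 0
    [0, 2, -2, -1, 0],  # O:      2b - 2c -  d = 0
]

fractions = back_substitute(row_echelon(augmented))
# Scale by the least common multiple of the denominators.
scale = 1
for value in fractions:
    scale = scale * value.denominator // gcd(scale, value.denominator)
coefficients = [int(value * scale) for value in fractions]
print(coefficients)  # [1, 5, 3, 4]
```

That gives C3H8 + 5 O2 -> 3 CO2 + 4 H2O, the familiar balanced equation for burning propane.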

This was just a single, simple comparison that added to my bias. I much prefer the o1 model from OpenAI, especially for coding and mathematics.