March 4 saw the release of Claude 3, the newest large language model (LLM) from Anthropic. Claude 3 competes with other LLMs such as OpenAI's GPT-4, the model behind ChatGPT.
Claude 3 comes in three versions: Haiku (the smallest and fastest model, not yet released as of this writing, intended for tasks such as corporate chatbots), Sonnet (the free version, comparable to GPT-3 from OpenAI), and Opus (paid). Opus is the largest model and consistently scores best on most benchmarks. It is not clear whether the benchmarks in the release notes compare Claude 3 Opus against GPT-4 or the newer GPT-4 Turbo. The benchmarks (and release notes) can be found HERE. Take a closer look; it makes for interesting reading.
An exciting advancement is the larger context window in Claude 3. Anthropic's models already have some of the largest context windows available. The context window, measured in tokens, bounds the combined size of the input (the prompt and any attached data) and the model's response. As an example, a larger context window means that we can upload larger documents with our prompts, and the model can draw on those documents when composing its response. It must be noted that the very largest context windows are only available to select customers.
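To make the token budget concrete, here is a minimal sketch of trimming a document so that prompt and document together fit a context window. It uses a crude whitespace word count as a stand-in for tokens (real APIs count tokens with a model-specific tokenizer), and the 200,000-token window size is an assumption used for illustration.

```python
def approx_tokens(text):
    # Crude approximation: one whitespace-separated word per token.
    # Real tokenizers are model-specific and count differently.
    return len(text.split())

def fit_to_window(document, prompt, max_tokens):
    """Truncate the document so prompt + document fit within max_tokens."""
    budget = max_tokens - approx_tokens(prompt)
    words = document.split()
    return " ".join(words[:budget])

prompt = "Summarize the following report:"
report = "word " * 300_000          # a document far larger than the window
window = 200_000                    # assumed window size, in tokens
trimmed = fit_to_window(report, prompt, window)
print(approx_tokens(trimmed) + approx_tokens(prompt) <= window)  # True
```

A larger window simply raises `max_tokens`, letting more of the document survive the trim (or removing the need to trim at all).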
A noted problem with large context windows is the needle-in-a-haystack problem: LLMs tend to forget information buried in the middle of large inputs. Companies have devised tests for this problem. Such a needle-in-a-haystack test verifies a model's recall by inserting a target sentence (the needle) into a corpus of documents (the haystack) and asking a question that can only be answered using information from the needle. Company officials note surprise at how well the new Claude 3 model performs on this test. Claude can even recognize that it is being tested in this way and say so in its response.
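The test procedure described above can be sketched in a few lines. This is a toy harness, not Anthropic's actual evaluation: `query_model` is a hypothetical stand-in for a real LLM API call, here faked by a keyword search so the example is self-contained and runnable.

```python
def build_haystack(filler_paragraphs, needle, position):
    """Insert the needle sentence at a given paragraph index."""
    docs = list(filler_paragraphs)
    docs.insert(position, needle)
    return "\n\n".join(docs)

def query_model(prompt):
    # Hypothetical model call: a real test would invoke an LLM API here.
    # This fake "model" just returns the first line mentioning "pizza".
    for line in prompt.split("\n"):
        if "pizza" in line:
            return line
    return "not found"

def run_test():
    filler = [f"Filler paragraph number {i}." for i in range(100)]
    needle = "The best pizza topping combination is figs and goat cheese."
    question = "What is the best pizza topping combination?"
    # Bury the needle in the middle of the haystack, where recall is weakest.
    haystack = build_haystack(filler, needle, position=50)
    answer = query_model(haystack + "\n\n" + question)
    # Recall succeeds only if the answer uses information from the needle.
    return needle in answer

print(run_test())
```

A real evaluation repeats this over many needle positions and haystack sizes, scoring the fraction of runs in which the model's answer reflects the needle.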