We are pleased to announce Claude 2, our new model. Claude 2 has improved performance, longer responses, and can be accessed via API as well as a new public-facing beta website, claude.ai. We have heard from our users that Claude is easy to converse with, clearly explains its thinking, is less likely to produce harmful outputs, and has a longer memory. We have made improvements from our previous models on coding, math, and reasoning. For example, our latest model scored 76.5% on the multiple choice section of the Bar exam, up from 73.0% with Claude 1.3. When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning.
Think of Claude as a friendly, enthusiastic colleague or personal assistant who can be instructed in natural language to help you with many tasks. The Claude 2 API for businesses is being offered for the same price as Claude 1.3. Additionally, anyone in the US and UK can start using our beta chat experience today.
As we work to improve both the performance and safety of our models, we have increased the length of Claude’s input and output. Users can input up to 100K tokens in each prompt, which means that Claude can work over hundreds of pages of technical documentation or even a book. Claude can now also write longer documents – from memos to letters to stories up to a few thousand tokens – all in one go.
In addition, our latest model has greatly improved coding skills. Claude 2 scored a 71.2% up from 56.0% on the Codex HumanEval, a Python coding test. On GSM8k, a large set of grade-school math problems, Claude 2 scored 88.0% up from 85.2%. We have an exciting roadmap of capability improvements planned for Claude 2 and will be slowly and iteratively deploying them in the coming months.
We’ve been iterating to improve the underlying safety of Claude 2, so that it is more harmless and harder to prompt to produce offensive or dangerous output. We have an internal red-teaming evaluation that scores our models on a large representative set of harmful prompts, using an automated test while we also regularly check the results manually. In this evaluation, Claude 2 was 2x better at giving harmless responses compared to Claude 1.3. Although no model is immune from jailbreaks, we’ve used a variety of safety techniques (which you can read about here and here), as well as extensive red-teaming, to improve its outputs.
Claude 2 powers our chat experience, and is generally available in the US and UK. We are working to make Claude more globally available in the coming months. You can now create an account and start talking to Claude in natural language, asking it for help with any tasks that you like. Talking to an AI assistant can take some trial and error, so read up on our tips to get the most out of Claude.
We are also currently working with thousands of businesses who are using the Claude API. One of our partners is Jasper, a generative AI platform that enables individuals and teams to scale their content strategies. They found that Claude 2 was able to go head to head with other state of the art models for a wide variety of use cases, but has particular strength for long form low latency uses. “We are really happy to be among the first to offer Claude 2 to our customers, bringing enhanced semantics, up-to-date knowledge training, improved reasoning for complex prompts, and the ability to effortlessly remix existing content with a 3X larger context window,” said Greg Larson, VP of Engineering at Jasper. “We are proud to help our customers stay ahead of the curve through partnerships like this one with Anthropic.”
Sourcegraph is a code AI platform that helps customers write, fix, and maintain code. Their coding assistant Cody uses Claude 2’s improved reasoning ability to give even more accurate answers to user queries while also passing along more codebase context with up to 100K context windows. In addition, Claude 2 was trained on more recent data, meaning it has knowledge of newer frameworks and libraries for Cody to pull from. “When it comes to AI coding, devs need fast and reliable access to context about their unique codebase and a powerful LLM with a large context window and strong general reasoning capabilities,” says Quinn Slack, CEO & Co-founder of Sourcegraph. “The slowest and most frustrating parts of the dev workflow are becoming faster and more enjoyable. Thanks to Claude 2, Cody’s helping more devs build more software that pushes the world forward.”
We welcome your feedback as we work to responsibly deploy our products more broadly. Our chat experience is an open beta launch, and users should be aware that Claude – like all current models – can generate inappropriate responses. AI assistants are most useful in everyday situations, like serving to summarize or organize information, and should not be used where physical or mental health and well-being are involved. Please let us know if you’d like to talk to Claude in a currently unsupported area, or if you are a business who would like to start working with Claude.
© 2023 Anthropic PBC