Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own Claude.
The program, unveiled Monday, will award grants to outside organizations that can, as the company puts it in a blog post, “effectively measure advanced AI capabilities.” Applications will be accepted on a rolling basis.
“Our investment in these evaluations is aimed at elevating the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote on its official blog. “Developing high-quality, safety-relevant evaluations remains challenging, and demand is outpacing supply.”
As we’ve highlighted before, AI has a benchmarking problem. The most commonly cited benchmarks today do a poor job of capturing how the average person actually uses the systems being tested. There are also questions as to whether some benchmarks, particularly those released before the dawn of modern generative AI, even measure what they purport to measure.
Anthropic’s proposed solution, which is pitched at a very high level and is easier said than done, is to create challenging benchmarks with a focus on AI security and societal implications via new tools, infrastructure, and methods.
Specifically, the company is calling for evaluations that assess a model’s ability to accomplish tasks like carrying out cyberattacks, “enhancing” weapons of mass destruction (e.g., nuclear weapons), and manipulating or deceiving people (e.g., through deepfakes or disinformation). As for AI risks pertaining to national security and defense, Anthropic says it’s committed to developing an “early warning system” of sorts for identifying and assessing risks, although it doesn’t reveal in the blog post what such a system might entail.
Anthropic also says it intends for its new program to support research into benchmarks and “end-to-end” tasks that probe AI’s potential for aiding in scientific study, conversing in multiple languages, and mitigating ingrained biases, as well as self-censoring toxicity.
To make all this happen, Anthropic envisions new platforms that allow subject-matter experts to develop their own evaluations, along with large-scale trials of models involving “thousands” of users. The company says it has hired a full-time coordinator for the program and that it might purchase or expand projects it believes have the potential to scale.
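Anthropic hasn’t said what those platforms would look like under the hood. As a rough illustration only, here is a minimal sketch in Python of the basic shape an expert-authored evaluation often takes in existing benchmark harnesses: a set of prompts paired with reference answers and a simple grading rule. Every name and field here is a hypothetical assumption for illustration, not anything Anthropic has published.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an expert-authored evaluation task.
# None of these names come from Anthropic; they illustrate the
# generic structure shared by many benchmark harnesses.

@dataclass
class EvalItem:
    prompt: str      # question posed to the model
    reference: str   # expert-written acceptable answer

@dataclass
class EvalTask:
    name: str
    items: list[EvalItem] = field(default_factory=list)

    def score(self, model) -> float:
        """Return the fraction of items the model answers correctly,
        using case-insensitive exact match as a deliberately simple
        grading rule."""
        if not self.items:
            return 0.0
        hits = sum(
            model(item.prompt).strip().lower() == item.reference.lower()
            for item in self.items
        )
        return hits / len(self.items)

# Usage with a stand-in "model": any callable mapping prompt -> answer.
task = EvalTask(
    name="toy-safety-check",
    items=[EvalItem("Is it safe to mix bleach and ammonia?", "no")],
)
print(task.score(lambda prompt: "No"))  # 1.0
```

Real safety-relevant evaluations would of course need far richer grading than exact match (rubric-based scoring, human raters, or model-based judges), which is exactly the kind of methodology work the program is soliciting.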
“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic wrote in the post, though an Anthropic spokesperson declined to provide further details about those options. “Teams will have the opportunity to interact directly with Anthropic’s domain experts from the Frontier Red Team, Fine-Tuning team, Trust and Safety team, and other relevant teams.”
Anthropic’s effort to support new AI benchmarks is a laudable one, assuming there’s sufficient cash and manpower behind it. But given the company’s commercial ambitions in the AI race, it might be tough to trust the company completely.
In the blog post, Anthropic is fairly transparent about the fact that it wants certain evaluations it funds to align with the AI safety classifications it developed (with some input from third parties like the nonprofit AI research organization METR). That’s well within the company’s prerogative. But it may also force applicants to the program to accept definitions of “safe” or “risky” AI that they don’t fully agree with.
A portion of the AI community is also likely to take issue with Anthropic’s references to “catastrophic” and “deceptive” AI risks, like the risks of nuclear weapons. Many experts say there’s little evidence to suggest that AI as we know it will gain world-ending, superhuman capabilities anytime soon, if ever. Claims of imminent “superintelligence” serve only to draw attention away from the pressing AI regulatory issues of the day, like AI’s hallucinatory tendencies, these experts add.
In its post, Anthropic wrote that it hopes the program will serve as “a catalyst for progress toward a future where comprehensive AI evaluation is an industry standard.” That’s a mission the many open, corporate-unaffiliated efforts to create better AI benchmarks can identify with. But it remains to be seen whether those efforts are willing to join forces with an AI vendor whose ultimate loyalty lies with shareholders.