News

The Arc Prize Foundation has a new test for AGI on which leading AI models from Anthropic, Google, and DeepSeek score poorly.
Kolena, a startup building a platform to test and validate AI models, has raised $15 million in a venture funding round.
Large language models don’t have a theory of mind the way humans do—but they’re getting better at tasks designed to measure it in humans.