News
The Arc Prize Foundation has a new test for AGI on which leading AI models from Anthropic, Google, and DeepSeek score poorly.
Kolena, a startup building a platform to test and validate AI models, has raised $15 million in a venture funding round.
Large language models don’t have a theory of mind the way humans do, but they’re getting better at tasks designed to measure it in humans.