Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (artificial general intelligence), according to a new benchmark.
The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context clues, and reasoning.
According to the ARC-AGI leaderboard, OpenAI's most advanced model, o3-low, scored 4 percent. Google's Gemini 2.0 Flash and DeepSeek R1 both scored 1.3 percent. Anthropic's most advanced model, Claude 3.7 with an 8K token limit (which refers to the number of tokens used to process an answer), scored 0.9 percent.
The question of how and when AGI will be achieved remains as heated as ever, with various factions bickering about the timeline or whether it's even possible. Anthropic CEO Dario Amodei said it could take as little as two to three years, and OpenAI CEO Sam Altman said "it's achievable with current hardware." But experts like Gary Marcus and Yann LeCun say the technology isn't there yet, and it doesn't take an expert to see how fueling AGI hype is advantageous to AI companies seeking major investments.
The ARC-AGI benchmark is designed to challenge AI models beyond specialized intelligence by avoiding the memorization trap: spewing out PhD-level responses without any understanding of what they mean. Instead, it focuses on puzzles that are relatively easy for humans to solve because of our innate ability to take in new information and make inferences, thus revealing gaps that can't be resolved by simply feeding AI models more data.
"Intelligence requires the ability to generalize from limited experience and apply knowledge in new, unexpected situations. AI systems are already superhuman in many specific domains (e.g., playing Go and image recognition)," read the announcement.
"However, these are narrow, specialized capabilities. The 'human-ai gap' reveals what's missing for general intelligence - highly efficiently acquiring new skills."
To get a sense of AI models' current limitations, you can take the ARC-AGI test for yourself. And you might be surprised by its simplicity. There's some critical thinking involved, but the ARC-AGI test wouldn't be out of place next to the New York Times crossword puzzle, Wordle, or any of the other popular brain teasers. It's challenging but not impossible, and the answer is there in the puzzle's logic, which is something the human brain has evolved to interpret.
OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI. By comparison, its 4 percent score on the second edition shows how difficult the new test is, and how much work remains before AI reaches human-level intelligence.