Xarkas Blog
    Featured

    A high schooler built a website that lets you challenge AI models to a Minecraft build-off

    By Kavish | March 28, 2025 | 3 Mins Read


    As conventional AI benchmarking techniques prove inadequate, AI builders are turning to more creative ways to assess the capabilities of generative AI models. For one group of developers, that’s Minecraft, the Microsoft-owned sandbox-building game.

    The website Minecraft Benchmark (or MC-Bench) was developed collaboratively to pit AI models against each other in head-to-head challenges, in which each model responds to the same prompt with a Minecraft creation. Users can vote on which model did a better job, and only after voting can they see which AI made each Minecraft build.
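To make the blind-voting idea concrete, here is a minimal Python sketch. It is not MC-Bench's actual code: the `BlindMatchup` class, its method names, and the model names in the usage note are all hypothetical, and the only behavior taken from the description above is that the model identities stay hidden until a vote has been cast.

```python
import random


class BlindMatchup:
    """One head-to-head comparison; model names stay hidden until a vote is cast.

    Hypothetical illustration only, not MC-Bench's implementation.
    """

    def __init__(self, prompt, model_a, model_b, rng=None):
        rng = rng or random.Random()
        models = [model_a, model_b]
        rng.shuffle(models)  # randomize sides so screen position leaks nothing
        self.prompt = prompt
        self._slots = {"left": models[0], "right": models[1]}
        self.winner = None

    def vote(self, side):
        """Record the voter's preferred build ('left' or 'right')."""
        if side not in self._slots:
            raise ValueError("side must be 'left' or 'right'")
        if self.winner is not None:
            raise RuntimeError("vote already recorded")
        self.winner = self._slots[side]

    def reveal(self):
        """Return which model made each build, but only after a vote."""
        if self.winner is None:
            raise PermissionError("vote first; model names are hidden until then")
        return dict(self._slots)
```

A caller would construct `BlindMatchup("a pineapple", "model-a", "model-b")`, show the two builds, call `vote("left")` or `vote("right")`, and only then call `reveal()`. Shuffling the sides is a design choice assumed here so that voters cannot learn to associate one position with one model.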

    Image Credits: Minecraft Benchmark

    For Adi Singh, the 12th-grader who started MC-Bench, the value of Minecraft isn’t so much the game itself as the familiarity that people have with it; after all, it is the best-selling video game of all time. Even for people who haven’t played the game, it’s still possible to evaluate which blocky representation of a pineapple is better realized.

    “Minecraft allows people to see the progress [of AI development] much more easily,” Singh told TechCrunch. “People are used to Minecraft, used to the look and the vibe.”

    MC-Bench currently lists eight people as volunteer contributors. Anthropic, Google, OpenAI, and Alibaba have subsidized the project’s use of their products to run benchmark prompts, per MC-Bench’s website, but the companies are not otherwise affiliated.

    “Currently we are just doing simple builds to reflect on how far we’ve come from the GPT-3 era, but [we] could see ourselves scaling to these longer-form plans and goal-oriented tasks,” Singh said. “Games might just be a medium to test agentic reasoning that is safer than in real life and more controllable for testing purposes, making it more ideal in my eyes.”

    Other games like Pokémon Red, Street Fighter, and Pictionary have been used as experimental benchmarks for AI, in part because the art of benchmarking AI is notoriously tricky.

    Researchers often test AI models on standardized evaluations, but many of these tests give AI a home-field advantage. Because of the way they’re trained, models are naturally gifted at certain, narrow kinds of problem-solving, particularly problem-solving that requires rote memorization or basic extrapolation.

    Put simply, it’s hard to glean what it means that OpenAI’s GPT-4 can score in the 88th percentile on the LSAT, but cannot discern how many Rs are in the word “strawberry.” Anthropic’s Claude 3.7 Sonnet achieved 62.3% accuracy on a standardized software engineering benchmark, but it is worse at playing Pokémon than most five-year-olds.

    Image Credits: Minecraft Benchmark

    MC-Bench is technically a programming benchmark, since the models are asked to write code to create the prompted build, like “Frosty the Snowman” or “a charming tropical beach hut on a pristine sandy shore.”
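For a sense of what "writing code to create a build" can look like, the Python sketch below places blocks for a crude snowman. The article does not describe the interface MC-Bench gives to models, so everything here is an assumption: the build is represented as a plain dictionary from `(x, y, z)` coordinates to block names, and the helper names `sphere` and `snowman` are made up for illustration.

```python
def sphere(center, radius, block):
    """Return {(x, y, z): block} for every grid cell inside a sphere."""
    cx, cy, cz = center
    placements = {}
    for x in range(cx - radius, cx + radius + 1):
        for y in range(cy - radius, cy + radius + 1):
            for z in range(cz - radius, cz + radius + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
                    placements[(x, y, z)] = block
    return placements


def snowman():
    """Build a two-ball snowman with a pumpkin head, resting on y = 0."""
    blocks = {}
    blocks.update(sphere((0, 3, 0), 3, "snow_block"))  # base, spans y = 0..6
    blocks.update(sphere((0, 8, 0), 2, "snow_block"))  # torso, spans y = 6..10
    blocks[(0, 11, 0)] = "carved_pumpkin"              # head on top of the torso
    return blocks
```

Calling `snowman()` returns the full set of placements, which a renderer (not shown, and not specified by the article) would turn into the image that voters compare.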

    But it’s easier for most MC-Bench users to evaluate whether a snowman looks better than to dig into code, which gives the project wider appeal — and thus the potential to collect more data about which models consistently score better.
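How those votes become a ranking is not spelled out in the article. One simple possibility, shown here purely as an assumption and not as MC-Bench's method, is to rank models by the share of head-to-head matchups they win. The `leaderboard` function and the placeholder model names are hypothetical.

```python
from collections import Counter


def leaderboard(votes):
    """Rank models by win rate.

    votes: iterable of (winner, loser) model-name pairs, one per matchup.
    Returns a list of (model, win_rate) sorted best first, ties broken by name.
    """
    wins, games = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rows = [(model, wins[model] / games[model]) for model in games]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
print(leaderboard(votes))
# → [('model-a', 1.0), ('model-b', 0.5), ('model-c', 0.0)]
```

A raw win rate ignores the strength of each opponent; rating systems such as Elo adjust for that, at the cost of more bookkeeping.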

    Whether those scores amount to much in the way of AI usefulness is up for debate, of course. Singh asserts that they’re a strong signal, though.

    “The current leaderboard reflects quite closely to my own experience of using these models, which is unlike a lot of pure text benchmarks,” Singh said. “Maybe [MC-Bench] could be useful to companies to know if they’re heading in the right direction.”


