LiveBench is an open LLM benchmark using contamination-free test data

A team from Nvidia, Abacus.ai, New York University, the University of Maryland and the University of Southern California has developed a new benchmark that addresses “serious limitations” with industry incumbents. LiveBench is a general-purpose LLM benchmark that offers contamination-free test data. Contamination occurs when a benchmark’s test questions end up in a model’s training data, a risk that grows as more models train on the same datasets. LiveBench uses “frequently-updated questions from recent sources, scoring answers automatically according to objective ground-truth values, and contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis.”
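To make the idea of scoring against objective ground-truth values concrete, here is a minimal Python sketch. It is not LiveBench's actual code: the `Question` class, the `normalize` helper, and the exact-match rule are all assumptions chosen for illustration, and a real benchmark would likely use task-specific scoring for each category.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record type; field names are illustrative only.
    prompt: str
    ground_truth: str  # objective reference answer
    category: str      # e.g. "math", "coding", "reasoning"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def score_answer(question: Question, model_answer: str) -> float:
    """Return 1.0 on an exact match with the ground truth (after normalization), else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(question.ground_truth) else 0.0


def score_by_category(questions: list[Question], answers: list[str]) -> dict[str, float]:
    """Average the per-question scores within each category."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for question, answer in zip(questions, answers):
        totals[question.category] = totals.get(question.category, 0.0) + score_answer(question, answer)
        counts[question.category] = counts.get(question.category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}
```

Because every answer is compared with a fixed reference, no human or LLM judge is needed in this sketch, which is the property the quoted description emphasizes.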

The release of LiveBench is especially notable because one of its contributors is Yann LeCun, a pioneer in the world of AI, Meta’s chief AI scientist, and someone who recently got into a spat with Elon Musk. Joining him are Abacus.ai’s Head of Research Colin White and research scientists Samuel Dooley, Manley Roberts and Arka Pal; Nvidia’s Senior Research Scientist Siddhartha Jain; and academics Ben Feuer, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Tom Goldstein, Willie Neiswanger, and Micah Goldblum.