NEW: Quality evaluations (ARC, GSM8K, IFEval) are now live.
Open Model Leaderboard →