NEW: Quality evaluations (ARC, GSM8K, IFEval) are now live. Open Model Leaderboard →