Benchmark comparison of Gemma 4 and Qwen 3.5, compiled from their respective Hugging Face model cards:

| Model | MMLUP | GPQA | LCB | ELO | TAU2 | MMMLU | HLE-n | HLE-t |
|----------------|-------|-------|-------|------|-------|-------|-------|-------|
| G4 31B | 85.2% | 84.3% | 80.0% | 2150 | 76.9% | 88.4% | 19.5% | 26.5% |
| G4 26B A4B | 82.6% | 82.3% | 77.1% | 1718 | 68.2% | 86.3% | 8.7% | 17.2% |
| G4 E4B | 69.4% | 58.6% | 52.0% | 940 | 42.2% | 76.6% | - | - |
| G4 E2B | 60.0% | 43.4% | 44.0% | 633 | 24.5% | 67.4% | - | - |
| G3 27B no-T | 67.6% | 42.4% | 29.1% | 110 | 16.2% | 70.7% | - | - |
| GPT-5-mini | 83.7% | 82.8% | 80.5% | 2160 | 69.8% | 86.2% | 19.4% | 35.8% |
| GPT-OSS-120B | 80.8% | 80.1% | 82.7% | 2157 | - | 78.2% | 14.9% | 19.0% |
| Q3-235B-A22B | 84.4% | 81.1% | 75.1% | 2146 | 58.5% | 83.4% | 18.2% | - |
| Q3.5-122B-A10B | 86.7% | 86.6% | 78.9% | 2100 | 79.5% | 86.7% | 25.3% | 47.5% |
| Q3.5-27B | 86.1% | 85.5% | 80.7% | 1899 | 79.0% | 85.9% | 24.3% | 48.5% |
| Q3.5-35B-A3B | 85.3% | 84.2% | 74.6% | 2028 | 81.2% | 85.2% | 22.4% | 47.4% |
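
Legend:

- MMLUP: MMLU-Pro
- GPQA: GPQA Diamond
- LCB: LiveCodeBench v6
- ELO: Codeforces ELO
- TAU2: TAU2-Bench
- MMMLU: MMMLU
- HLE-n: Humanity's Last Exam (no tools / CoT)
- HLE-t: Humanity's Last Exam (with search / tools)
- no-T: no thinking
- G4 / G3: Gemma 4 / Gemma 3; Q3.5 / Q3: Qwen 3.5 / Qwen 3
- "-": no score reported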