Accuracy scores on the test subset (3,113 examples) of SolidGeo.
🚨 To submit your results to the leaderboard, please send them to this email in the following format:
# | Model | Stage | Source | Date | Overall | CSS | SMR | SSI | PUC | MSGF | SGM | MVP | 3DCV | Avg.tokens |
1 | DeepSeek-V3 (Text Only) | System-1 | Link | 2025-05-20 | 9.3 | 10.7 | 8.1 | 8.3 | 12.7 | 6.3 | 7.8 | 10.3 | 12.2 | 787.2 |
2 | GPT-4o (Text Only) | System-1 | Link | 2025-05-20 | 9.1 | 10.0 | 10.4 | 10.6 | 6.8 | 12.1 | 8.6 | 7.3 | 9.6 | 692.6 |
3 | LLaVA-v1.5-7B | System-1 | Link | 2025-05-20 | 1.8 | 1.1 | 1.1 | 6.7 | 2.2 | 0.6 | 0.0 | 4.6 | 0.0 | 246.2 |
4 | InternLM-XComposer2.5-VL-7B | System-1 | Link | 2025-05-20 | 4.4 | 2.5 | 1.8 | 6.7 | 8.9 | 0.6 | 0.0 | 9.4 | 1.2 | 151.8 |
5 | DeepSeek-VL2-7B | System-1 | Link | 2025-05-20 | 5.1 | 2.8 | 2.6 | 11.1 | 5.1 | 1.4 | 1.8 | 11.7 | 1.8 | 338.2 |
6 | Math-LLaVA-13B | System-1 | Link | 2025-05-20 | 5.9 | 4.2 | 4.1 | 7.6 | 11.7 | 2.7 | 4.2 | 12.6 | 6.2 | 7.4 |
7 | LLaVA-NeXT-Interleave-7B | System-1 | Link | 2025-05-20 | 7.7 | 2.5 | 2.3 | 21.5 | 13.5 | 2.3 | 7.3 | 16.7 | 0.6 | 486.3 |
8 | LLaVA-OneVision-Chat-7B | System-1 | Link | 2025-05-20 | 8.6 | 4.3 | 2.9 | 19.3 | 15.2 | 3.5 | 6.4 | 17.9 | 0.0 | 353.2 |
9 | Qwen2.5-VL-Instruct-7B | System-1 | Link | 2025-05-20 | 15.5 | 8.4 | 8.8 | 30.1 | 13.3 | 26.2 | 16.2 | 15.2 | 10.2 | 490.2 |
10 | LLaVA-OneVision-Chat-72B | System-1 | Link | 2025-05-20 | 15.9 | 13.2 | 9.5 | 31.9 | 18.1 | 12.9 | 11.8 | 23.7 | 8.4 | 396.3 |
11 | InternVL3-8B | System-1 | Link | 2025-05-20 | 17.7 | 11.8 | 10.0 | 24.4 | 17.4 | 28.0 | 19.1 | 19.9 | 7.2 | 488.8 |
12 | Mistral-small-3.1-24b-instruct | System-1 | Link | 2025-05-20 | 19.6 | 15.2 | 15.8 | 27.4 | 17.1 | 28.9 | 10.9 | 17.0 | 16.8 | 769.7 |
13 | Qwen2.5-VL-Instruct-72B | System-1 | Link | 2025-05-20 | 24.2 | 19.7 | 18.8 | 29.6 | 21.5 | 35.4 | 16.4 | 22.5 | 18.0 | 485.0 |
14 | InternVL3-78B | System-1 | Link | 2025-05-20 | 26.2 | 17.4 | 17.9 | 34.8 | 24.9 | 36.8 | 22.7 | 30.5 | 17.4 | 493.2 |
15 | Llama-4-Maverick-17B-128E | System-1 | Link | 2025-05-20 | 29.6 | 25.1 | 30.9 | 34.6 | 20.5 | 43.4 | 32.6 | 20.7 | 26.3 | 605.6 |
16 | LlamaV-o1-11B | System-2 | Link | 2025-05-20 | 1.5 | 0.6 | 0.7 | 1.5 | 0.5 | 5.0 | 2.7 | 0.1 | 0.0 | 106.1 |
17 | LLaVA-CoT-11B | System-2 | Link | 2025-05-20 | 7.3 | 4.2 | 2.5 | 7.4 | 6.5 | 15.1 | 8.2 | 7.4 | 1.8 | 401.7 |
18 | VLM-R1-3B | System-2 | Link | 2025-05-20 | 9.6 | 6.3 | 4.4 | 11.1 | 8.7 | 19.6 | 4.5 | 8.3 | 2.4 | 453.0 |
19 | R1-Onevision-7B | System-2 | Link | 2025-05-20 | 13.2 | 7.7 | 9.7 | 25.2 | 10.1 | 23.3 | 11.8 | 12.3 | 9.0 | 522.3 |
20 | Vision-R1-7B | System-2 | Link | 2025-05-20 | 18.1 | 11.7 | 11.3 | 28.6 | 17.8 | 26.9 | 13.9 | 19.3 | 12.0 | 1498.7 |
21 | Skywork-R1V2-38B | System-2 | Link | 2025-05-20 | 23.0 | 18.4 | 29.5 | 13.3 | 11.6 | 31.2 | 30.0 | 12.3 | 26.9 | 5682.9 |
22 | QvQ-72B-Preview | System-2 | Link | 2025-05-20 | 26.6 | 17.9 | 28.1 | 37.0 | 22.9 | 34.7 | 20.9 | 20.3 | 22.8 | 3622.2 |
23 | Claude-3.5-Sonnet | System-1 | Link | 2025-05-20 | 22.2 | 16.9 | 9.8 | 42.2 | 24.2 | 36.5 | 25.5 | 23.5 | 9.6 | 992.1 |
24 | GPT-4V | System-1 | Link | 2025-05-20 | 25.3 | 16.6 | 15.8 | 35.6 | 21.5 | 41.5 | 25.5 | 25.9 | 18.0 | 1433.5 |
25 | Gemini-1.5-pro | System-1 | Link | 2025-05-20 | 25.3 | 18.5 | 16.8 | 34.8 | 19.6 | 41.6 | 17.3 | 25.6 | 19.2 | 1003.5 |
26 | GPT-4o | System-1 | Link | 2025-05-20 | 25.5 | 18.9 | 16.8 | 32.6 | 19.6 | 41.0 | 17.3 | 26.5 | 19.2 | 1344.9 |
27 | Claude-3.7-Sonnet | System-1 | Link | 2025-05-20 | 34.1 | 27.7 | 28.2 | 43.0 | 32.9 | 46.8 | 43.6 | 28.5 | 26.3 | 1217.4 |
28 | Gemini-2.5-pro | System-1 | Link | 2025-05-20 | 42.7 | 52.0 | 75.7 | 24.8 | 20.9 | 26.0 | 58.4 | 19.6 | 72.9 | 1263.9 |
29 | OpenAI-o1 | System-2 | Link | 2025-05-20 | 49.5 | 48.7 | 54.2 | 48.9 | 36.1 | 55.3 | 59.1 | 43.0 | 55.1 | 4942.6 |
- | Human* | - | - | 2025-05-20 | 77.5 | 88.2 | 70.9 | 90.2 | 77.2 | 87.4 | 71.2 | 78.5 | 69.2 | - |
Stage types: System-1: fast, intuitive systems; System-2: slower, more deliberate systems.
Problem types: CSS: Composite Solid Structures, SMR: Spatial Metric Relations, SSI: Solid Shape Identification,
PUC: Planar Unfolding and Configuration, MSGF: Measurement of Solid Geometric Forms, SGM: Solid Geometry Modeling,
MVP: Multi-view Projection, 3DCV: 3D Coordinate and Vector Reasoning.
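
For anyone preparing a submission, the row format used in the table can be produced with a small helper. The following is only a sketch: the function name `format_row` and the dictionary layout are assumptions made for illustration, not part of any official SolidGeo tooling, and the example values are copied from the OpenAI-o1 row above.

```python
# Column order taken from the leaderboard header.
COLUMNS = ["#", "Model", "Stage", "Source", "Date", "Overall",
           "CSS", "SMR", "SSI", "PUC", "MSGF", "SGM", "MVP", "3DCV",
           "Avg.tokens"]


def format_row(result: dict) -> str:
    """Render one result as a pipe-delimited leaderboard row.

    `result` is a hypothetical dict keyed by the header names above.
    Floats are printed with one decimal place, as in the table.
    """
    missing = [col for col in COLUMNS if col not in result]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    cells = []
    for col in COLUMNS:
        value = result[col]
        cells.append(f"{value:.1f}" if isinstance(value, float) else str(value))
    return " | ".join(cells) + " |"


# Example values copied from the OpenAI-o1 row of the leaderboard.
example = {
    "#": 29, "Model": "OpenAI-o1", "Stage": "System-2", "Source": "Link",
    "Date": "2025-05-20", "Overall": 49.5, "CSS": 48.7, "SMR": 54.2,
    "SSI": 48.9, "PUC": 36.1, "MSGF": 55.3, "SGM": 59.1, "MVP": 43.0,
    "3DCV": 55.1, "Avg.tokens": 4942.6,
}
print(format_row(example))
```

Checking for missing columns before formatting helps catch an incomplete result early, since a row with fewer cells would no longer line up with the header.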