SolidGeo

Measuring Multimodal Spatial Math Reasoning in Solid Geometry

¹MAIS, Institute of Automation, Chinese Academy of Sciences,
²School of Artificial Intelligence, University of Chinese Academy of Sciences, ³University of Electronic Science and Technology of China, ⁴TAL
* indicates equal contribution

Performance of six MLLMs on the SolidGeo benchmark across 8 solid geometry subjects (left), and the trade-off between accuracy and average generated token length across 25 MLLMs (right).

Introduction

Geometry is a fundamental branch of mathematics and plays a crucial role in evaluating the reasoning capabilities of multimodal large language models (MLLMs). However, existing multimodal mathematics benchmarks mainly focus on plane geometry and largely ignore solid geometry, which requires spatial reasoning and is more challenging than plane geometry.

To address this critical gap, we introduce SolidGeo, the first large-scale benchmark specifically designed to evaluate the performance of MLLMs on mathematical reasoning tasks in solid geometry. SolidGeo consists of 3,113 real-world K–12 and competition-level problems, each paired with visual context and annotated with 3 difficulty levels and 8 fine-grained solid geometry categories.

SolidGeo covers a wide range of 3D reasoning subjects such as projection, unfolding, spatial measurement, and spatial vectors, offering a rigorous testbed for assessing solid geometry reasoning. Through extensive experiments, we observe that MLLMs encounter substantial challenges in solid geometry math tasks, with a considerable performance gap relative to human capabilities on SolidGeo. Moreover, we analyze the performance, inference efficiency, and error patterns of various models, offering insights into the solid geometric mathematical reasoning capabilities of MLLMs. We hope SolidGeo serves as a catalyst for advancing MLLMs toward deeper geometric reasoning and spatial intelligence.
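As a rough illustration of the annotation scheme described above, the following Python sketch represents one graded problem and computes overall and per-category accuracy. The record layout (`GradedProblem`, its field names, and the 1 to 3 integer encoding of difficulty) is hypothetical and is not the released data format; the category abbreviations follow the leaderboard legend. The sketch also assumes that overall accuracy is taken over all problems rather than averaged across categories, which this page does not state.

```python
from collections import defaultdict
from dataclasses import dataclass

# Category abbreviations as listed in the leaderboard legend.
CATEGORIES = {
    "CSS": "Composite Solid Structures",
    "SMR": "Spatial Metric Relations",
    "SSI": "Solid Shape Identification",
    "PUC": "Planar Unfolding and Configuration",
    "MSGF": "Measurement of Solid Geometric Forms",
    "SGM": "Solid Geometry Modeling",
    "MVP": "Multi-view Projection",
    "3DCV": "3D Coordinate and Vector Reasoning",
}


@dataclass
class GradedProblem:
    """Hypothetical record: one problem plus whether the model answered it correctly."""
    problem_id: str
    category: str    # one of the keys in CATEGORIES
    difficulty: int  # assumed encoding: 1, 2 or 3
    correct: bool


def accuracy_report(results):
    """Return (overall accuracy in %, {category: accuracy in %})."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category}")
        totals[r.category] += 1
        hits[r.category] += int(r.correct)
    if not totals:
        raise ValueError("no results to score")
    per_category = {c: 100.0 * hits[c] / totals[c] for c in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return overall, per_category


# Toy data for illustration only; these are not benchmark results.
sample = [
    GradedProblem("p1", "CSS", 1, True),
    GradedProblem("p2", "CSS", 2, True),
    GradedProblem("p3", "CSS", 3, False),
    GradedProblem("p4", "MVP", 2, False),
]
overall, per_category = accuracy_report(sample)
print(f"overall={overall:.1f}", {c: round(a, 1) for c, a in per_category.items()})
```

Counting over problems means that categories with more problems weigh more heavily in the overall score; a per-category mean would be a different design choice.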

Leaderboard on SolidGeo

Accuracy scores on the test subset (3,113 examples) of SolidGeo.

🚨 To submit your results to the leaderboard, please send your results to this email in this format

# Model Stage Source Date Overall CSS SMR SSI PUC MSGF SGM MVP 3DCV Avg.tokens
1 Deepseek-V3(Text Only) System-1 Link 2025-05-20 9.3 10.7 8.1 8.3 12.7 6.3 7.8 10.3 12.2 787.2
2 GPT-4o(Text Only) System-1 Link 2025-05-20 9.1 10.0 10.4 10.6 6.8 12.1 8.6 7.3 9.6 692.6
3 LLaVA-v1.5-7B System-1 Link 2025-05-20 1.8 1.1 1.1 6.7 2.2 0.6 0.0 4.6 0.0 246.2
4 InternLM-XComposer2.5-VL-7B System-1 Link 2025-05-20 4.4 2.5 1.8 6.7 8.9 0.6 0.0 9.4 1.2 151.8
5 DeepSeek-VL2-7B System-1 Link 2025-05-20 5.1 2.8 2.6 11.1 5.1 1.4 1.8 11.7 1.8 338.2
6 Math-LLaVA-13B System-1 Link 2025-05-20 5.9 4.2 4.1 7.6 11.7 2.7 4.2 12.6 6.2 7.4
7 LLaVA-NeXT-Interleave-7B System-1 Link 2025-05-20 7.7 2.5 2.3 21.5 13.5 2.3 7.3 16.7 0.6 486.3
8 LLaVA-OneVision-Chat-7B System-1 Link 2025-05-20 8.6 4.3 2.9 19.3 15.2 3.5 6.4 17.9 0.0 353.2
9 Qwen2.5-VL-Instruct-7B System-1 Link 2025-05-20 15.5 8.4 8.8 30.1 13.3 26.2 16.2 15.2 10.2 490.2
10 LLaVA-OneVision-Chat-72B System-1 Link 2025-05-20 15.9 13.2 9.5 31.9 18.1 12.9 11.8 23.7 8.4 396.3
11 InternVL3-8B System-1 Link 2025-05-20 17.7 11.8 10.0 24.4 17.4 28.0 19.1 19.9 7.2 488.8
12 Mistral-small-3.1-24b-instruct System-1 Link 2025-05-20 19.6 15.2 15.8 27.4 17.1 28.9 10.9 17.0 16.8 769.7
13 Qwen2.5-VL-Instruct-72B System-1 Link 2025-05-20 24.2 19.7 18.8 29.6 21.5 35.4 16.4 22.5 18.0 485.0
14 InternVL3-78B System-1 Link 2025-05-20 26.2 17.4 17.9 34.8 24.9 36.8 22.7 30.5 17.4 493.2
15 Llama-4-Maverick-17B-128E System-1 Link 2025-05-20 29.6 25.1 30.9 34.6 20.5 43.4 32.6 20.7 26.3 605.6
16 LlamaV-o1-11B System-2 Link 2025-05-20 1.5 0.6 0.7 1.5 0.5 5.0 2.7 0.1 0.0 106.1
17 LLaVA-CoT-11B System-2 Link 2025-05-20 7.3 4.2 2.5 7.4 6.5 15.1 8.2 7.4 1.8 401.7
18 VLM-R1-3B System-2 Link 2025-05-20 9.6 6.3 4.4 11.1 8.7 19.6 4.5 8.3 2.4 453.0
19 R1-Onevision-7B System-2 Link 2025-05-20 13.2 7.7 9.7 25.2 10.1 23.3 11.8 12.3 9.0 522.3
20 Vision-R1-7B System-2 Link 2025-05-20 18.1 11.7 11.3 28.6 17.8 26.9 13.9 19.3 12.0 1498.7
21 Skywork-R1V2-38B System-2 Link 2025-05-20 23.0 18.4 29.5 13.3 11.6 31.2 30.0 12.3 26.9 5682.9
22 QvQ-72B-Preview System-2 Link 2025-05-20 26.6 17.9 28.1 37.0 22.9 34.7 20.9 20.3 22.8 3622.2
23 Claude-3.5-Sonnet System-1 Link 2025-05-20 22.2 16.9 9.8 42.2 24.2 36.5 25.5 23.5 9.6 992.1
24 GPT-4V System-1 Link 2025-05-20 25.3 16.6 15.8 35.6 21.5 41.5 25.5 25.9 18.0 1433.5
25 Gemini-1.5-pro System-1 Link 2025-05-20 25.3 18.5 16.8 34.8 19.6 41.6 17.3 25.6 19.2 1003.5
26 GPT-4o System-1 Link 2025-05-20 25.5 18.9 16.8 32.6 19.6 41.0 17.3 26.5 19.2 1344.9
27 Claude-3.7-Sonnet System-1 Link 2025-05-20 34.1 27.7 28.2 43.0 32.9 46.8 43.6 28.5 26.3 1217.4
28 Gemini-2.5-pro System-1 Link 2025-05-20 42.7 52.0 75.7 24.8 20.9 26.0 58.4 19.6 72.9 1263.9
29 OpenAI-o1 System-2 Link 2025-05-20 49.5 48.7 54.2 48.9 36.1 55.3 59.1 43.0 55.1 4942.6
- Human* - - 2025-05-20 77.5 88.2 70.9 90.2 77.2 87.4 71.2 78.5 69.2 -
Human Performance*: average performance of human annotators who hold high school diplomas.
Stage types: System-1 is a fast, intuitive system; System-2 is a slower, more deliberate system.
Problem types: CSS: Composite Solid Structures, SMR: Spatial Metric Relations, SSI: Solid Shape Identification,
PUC: Planar Unfolding and Configuration, MSGF: Measurement of Solid Geometric Forms, SGM: Solid Geometry Modeling,
MVP: Multi-view Projection, 3DCV: 3D Coordinate and Vector Reasoning.
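The leaderboard rows are whitespace-separated, and some model names contain spaces, so a naive split is not enough. The sketch below parses a few rows copied from the table, then computes the gap between the best model in the sample and the human score, plus accuracy points per 1,000 generated tokens. That efficiency ratio is only an illustrative way to read the accuracy versus token-length trade-off; it is not a metric defined by the benchmark. The helper names (`parse_row`, `points_per_1k_tokens`) are hypothetical.

```python
import re

# A few rows copied verbatim from the leaderboard (not the full table).
ROWS = """\
2 GPT-4o(Text Only) System-1 Link 2025-05-20 9.1 10.0 10.4 10.6 6.8 12.1 8.6 7.3 9.6 692.6
22 QvQ-72B-Preview System-2 Link 2025-05-20 26.6 17.9 28.1 37.0 22.9 34.7 20.9 20.3 22.8 3622.2
27 Claude-3.7-Sonnet System-1 Link 2025-05-20 34.1 27.7 28.2 43.0 32.9 46.8 43.6 28.5 26.3 1217.4
29 OpenAI-o1 System-2 Link 2025-05-20 49.5 48.7 54.2 48.9 36.1 55.3 59.1 43.0 55.1 4942.6
"""
HUMAN_OVERALL = 77.5  # "Overall" value from the Human* row

SUBJECTS = ["CSS", "SMR", "SSI", "PUC", "MSGF", "SGM", "MVP", "3DCV"]

# The model name is matched lazily up to the stage token, so names with spaces survive.
ROW_RE = re.compile(
    r"^(?P<rank>\d+)\s+(?P<model>.+?)\s+(?P<stage>System-[12])\s+Link\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<nums>[\d.\s]+)$"
)


def parse_row(line):
    """Parse one leaderboard row into a dict; raise ValueError on a malformed row."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    nums = [float(x) for x in m.group("nums").split()]
    if len(nums) != 10:  # Overall + 8 subjects + Avg.tokens
        raise ValueError(f"expected 10 numeric columns, got {len(nums)}")
    return {
        "model": m.group("model"),
        "stage": m.group("stage"),
        "overall": nums[0],
        "subjects": dict(zip(SUBJECTS, nums[1:9])),
        "avg_tokens": nums[9],
    }


def points_per_1k_tokens(row):
    """Illustrative efficiency ratio: overall accuracy per 1,000 generated tokens."""
    return 1000.0 * row["overall"] / row["avg_tokens"]


parsed = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
best = max(parsed, key=lambda r: r["overall"])
most_efficient = max(parsed, key=points_per_1k_tokens)
gap = HUMAN_OVERALL - best["overall"]

print(f"best in sample: {best['model']}, gap to human: {gap:.1f} points")
print(f"most accuracy per 1k tokens in sample: {most_efficient['model']}")
```

On this small sample, the model with the highest overall accuracy is not the one with the highest accuracy per token, which is the kind of trade-off the right-hand panel of the figure at the top of the page describes.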

SolidGeo Dataset


Key statistics of SolidGeo.


Distribution of SolidGeo.

Visualization

Visualization Examples

BibTeX

coming soon