We’re building a single AI model that understands the structure, content, and function of the 3D world - from simple video inputs, without manual labels.
This enables deep understanding of objects, spaces, and their relationships - all with a single, generalisable model.
What LLMs did for language, we’re doing for 3D - unlocking new applications across architecture, gaming, and robotics, all without the need for retraining.
Overview
[ Problem ]
AI can label, caption, and generate 3D objects or scenes - but it still doesn't understand 3D spaces.
Without true 3D understanding, real-world applications like architecture, gaming, and robotics remain brittle, manual, and unintelligent.
[ Solution ]
Spatial Intelligence teaches AI to perceive and reason about 3D spaces like humans - recognising objects, their properties, functions, and relationships.
[ Use Cases ]
This unlocks new possibilities across industries:
- Architecture and gaming:
Automate up to 40% of workflows with 3D copilots for smarter 3D search, asset selection, and scene generation. [ We've already demonstrated this with our proof of concept - enabling 3D creatives to fill blank scenes with intelligently selected and positioned assets in a single click. ]
- Robotics:
Save 35% of development time by generating diverse, realistic, infinite-scale simulation environments - and enable embodied agents to reason about objects through a 3D knowledge base.
These use cases are just the beginning - we plan to collaborate with vertical partners to turn this core intelligence into transformative products.
[ Why Now? ]
Just as LLMs unlocked the digital world through language, foundational 3D models will unlock the 3D and physical worlds - powering the next generation of digital and embodied AI.
The world is ready - and waiting for the first mover.
[ Why Us? ]
We aren’t building another 3D tool.
We’re building the core intelligence layer that will power them all.
[ Opportunity ]
Whoever builds the intelligence layer for 3D will shape the future of both digital creativity and embodied AI.
We’re building that future.
Interested in collaborating?
We’re always looking for partners, early adopters, and exceptional technical talent.
Let’s build the future of spatial intelligence together:
FAQs
Q: How does it work?
A: Most people hear music as a seamless flow of sound. A composer, though, hears the individual instruments, rhythms, and structures - and understands how they fit together.
At Spatial Intelligence, our model learns to "hear" 3D spaces like a composer: breaking down images and videos into objects, and learning their forms, functions, and relationships.
This transforms 3D spaces from a flat soup of pixels into structured, understandable environments - enabling flexible intelligence that can be applied across real-world tasks.
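To make that concrete, here is a minimal sketch - our illustration, not the production API - of the difference between a flat soup of pixels and the structured scene representation described above. Every class and field name is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str       # e.g. "chair"
    shape: str      # coarse geometry descriptor
    function: str   # what the object is for

@dataclass
class Relation:
    subject: str
    predicate: str  # e.g. "fits_under", "rests_on"
    target: str

@dataclass
class StructuredScene:
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

# The model's job, in spirit: map a raw frame (just pixels) to something like this.
scene = StructuredScene(
    objects=[
        SceneObject("chair", "four-legged seat", "sitting"),
        SceneObject("table", "flat raised surface", "working and eating"),
    ],
    relations=[Relation("chair", "fits_under", "table")],
)
print(scene.relations[0])  # Relation(subject='chair', predicate='fits_under', target='table')
```

Once a scene lives in a form like this, questions about objects, their functions, and their relationships become queryable rather than buried in pixels.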
Q: How is this different from other 3D generation or recognition tools?
A: Unlike tools that focus on generating scenes or classifying objects in isolation, Spatial Intelligence is learning the structure, content, and function of 3D spaces - enabling true perception and reasoning.
For example, generating a realistic chair shows some understanding of shape and style - but not how tall the chair should be to fit under a table, its function, or where it belongs in a room.
Our model learns about shape, function, and relationships in a way that generalises across applications.
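As a toy illustration of that gap, the check below asks the question a purely generative chair model never has to answer: does this chair actually fit under this table? All dimensions are invented for the example.

```python
# Hypothetical dimensions for one chair and one table.
chair = {"seat_height_cm": 46, "armrest_height_cm": 66}
table = {"underside_clearance_cm": 70}

# The chair fits if its tallest part that must slide under the
# table clears the table's underside.
fits = max(chair.values()) < table["underside_clearance_cm"]
print(f"Chair fits under table: {fits}")  # True: 66 cm < 70 cm
```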
Q: What's the difference between this and 3D LLMs or VLMs?
A: Google Translate and ChatGPT use similar algorithms - but Google Translate learned translation, while ChatGPT learned the structure and content of language itself, enabling broad generalisation.
Similarly, most 3D LLMs and VLMs gain partial 3D knowledge as a by-product of solving specific tasks. Our model is different: it is trained to understand the structure, content, and function of 3D environments - objects, spaces, and how they work together - enabling generalisation across tasks and industries.
Q: What makes this technically feasible now?
A: Advances in vision transformers, foundation models, and self-supervised learning have made it possible to move beyond single-task 3D AI.
At Spatial Intelligence, we're combining these advances with new 3D-centric learning objectives focused on object, spatial, and relational understanding.
The technology is ready - we seek to apply it across tasks, industries, and real-world applications.
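For readers who want a feel for the self-supervised ingredient, here is a toy PyTorch sketch of masked reconstruction - the generic recipe behind label-free pre-training. It is illustrative only; our actual 3D-centric objectives target object, spatial, and relational structure rather than raw pixels, and all sizes here are arbitrary.

```python
import torch
import torch.nn as nn

# Patchified frames: batch of 4, 196 patches each, 768-dim embeddings.
batch, n_patches, dim, mask_ratio = 4, 196, 768, 0.75

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=2,
)
decoder = nn.Linear(dim, dim)  # predicts the content of hidden patches

patches = torch.randn(batch, n_patches, dim)
mask = torch.rand(batch, n_patches) < mask_ratio          # hide ~75% of patches
corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)

recon = decoder(encoder(corrupted))
loss = ((recon - patches) ** 2)[mask].mean()              # score only the hidden patches
loss.backward()  # supervision comes from the data itself - no labels anywhere
```

The design point is that the training signal is manufactured from the input: hide part of the data, ask the model to fill it in, and understanding emerges without a single human annotation.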
Q: What will you use to train your model?
A: We pre-train our model without manual labels, using RGB video data from standard cameras - phones, drones, autonomous vehicles.
By leveraging ubiquitous data types and flexible learning methods, we can generalise across applications and scale towards universal integration.
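A minimal sketch of what that data path looks like, assuming OpenCV for decoding: ordinary RGB video in, unlabeled training batches out. The file name and frame settings are placeholders.

```python
import cv2          # pip install opencv-python
import numpy as np

def frames_from_video(path, size=(224, 224), stride=8):
    """Yield resized RGB frames from any standard camera recording."""
    cap = cv2.VideoCapture(path)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % stride == 0:
            frame = cv2.resize(frame, size)
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        i += 1
    cap.release()

# Phone, drone, or dashcam footage all feed the same pipeline.
clip = list(frames_from_video("walkthrough.mp4"))  # placeholder file name
batch = np.stack(clip[:16])
print(batch.shape)  # e.g. (16, 224, 224, 3) - pixels only, no labels
```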
Q: How do you go to market?
A: Initially through partnerships in architecture visualisation, gaming, and robotics simulation - industries where 3D asset workflows and simulation are critical pain points.
Our proof of concept with SpaceForm Technologies already demonstrates early value: automating asset selection and placement within 3D scenes, saving up to 40% of workflow time. In robotics, our model can generate diverse, realistic 3D environments for training, reducing the time and cost of manual setup.
We'll expand through further commercial development partnerships across industries.
Q: What’s your moat?
A: By learning about the structure, content, and function of the 3D world - not just generating scenes or classifying objects - we enable broader generalisation across tasks and industries.
Unlike our competitors' single-purpose systems, our model's flexibility means one core intelligence can automate scene design, generate simulation environments, and enable robotic reasoning - without retraining from scratch. This defensibility compounds as the model scales.
We expect competition - but acting now gives us first-mover advantage, and our novel approach strengthens our moat over time.
Q: What’s the biggest risk?
A: As with any deep-tech innovation, there are both technology and market risks - we are operating at the frontier with many unknowns.
But the opportunity is enormous - and foundational models have historically become the platforms that power entire ecosystems.
We believe 3D will follow the same pattern.