3D AI that can generate a realistic chair, or LLMs that can describe or caption one, show some understanding of shape, position, and function - but they lack an understanding of how tall the chair should be to fit under a table, how it is actually used, or where it could feasibly be placed in a room.
Below are examples of 3D AI with limited spatial awareness. AI does not truly understand 3D spaces - it lacks awareness of:
The objects present in the scene [ appearance, function, etc. ]
Each object's size, shape, and position
The relationships between objects
↑ Depth estimation only sees the 3D world as 2D pixels.
[ This is not true understanding - this intelligence does not generalise well. ]
↓ 3D reconstruction only "understands" the real world [ ↑ ] as points in 3D space.
[ This is not true understanding - this intelligence does not generalise well. ]
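The representational gap above can be made concrete with a minimal sketch (hypothetical NumPy example; the array shapes are illustrative, not from any specific model): a depth map is just a 2D grid of per-pixel distances, and a reconstructed point cloud is just a list of unlabeled 3D coordinates - neither carries object identity, size semantics, or relationships.

```python
import numpy as np

# A depth map: one distance value per pixel - a 2D structure with
# no notion of which pixels belong to a "chair" or a "table".
depth_map = np.random.rand(480, 640).astype(np.float32)  # metres, H x W

# A reconstructed point cloud: N unlabeled points in 3D space -
# geometry without object identity, function, or relationships.
point_cloud = np.random.rand(10_000, 3).astype(np.float32)  # x, y, z

# Neither representation can answer "how tall is the chair?" or
# "does it fit under the table?" - there are no objects, only pixels/points.
print(depth_map.shape)    # (480, 640)
print(point_cloud.shape)  # (10000, 3)
```

Any object-level question would require extra structure (segmentation labels, object poses, a scene graph) that these raw outputs simply do not contain.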
Automating 3D scenes
[ E.g. architecture visualisation, game design, etc. ]
AI struggles to select and position the right assets due to limited object understanding and a lack of spatial reasoning.
Embodied intelligence
[ E.g. robotics ]
AI struggles in uncertain environments due to a lack of robust understanding of individual objects - their sizes, shapes, positions, properties, and functions.