[ Analogy - What LLMs did for language, we’re doing for 3D ]
Google Translate and ChatGPT use similar algorithms, but Google Translate learned the task of translation and gained only partial language knowledge. ChatGPT learned the structure and content of language itself, enabling broad generalisation across language tasks.
Similarly, most 3D LLMs and VLMs gain partial 3D knowledge as a by-product of solving specific tasks. The difference: our model is trained to understand objects, spaces, and their relations (the structure, content, and function of 3D environments), enabling broad generalisation across 3D tasks and industries.