Ai2’s MolmoAct Model ‘Thinks in 3D’ to Compete with Nvidia and Google in Robotics AI


The field of Physical AI, which combines robotics with foundation models, is rapidly expanding as companies like Nvidia, Google, and Meta conduct research and experiments in integrating large language models (LLMs) with robots. 

New research from the Allen Institute for AI (Ai2) seeks to rival companies like Nvidia and Google in physical AI by introducing MolmoAct 7B, a new open-source model that enables robots to “reason in space.” Based on Ai2’s open-source Molmo, MolmoAct “thinks” in three dimensions. The institute is also releasing the training data. The model is released under an Apache 2.0 license, while the datasets are available under CC BY 4.0.

Ai2 classifies MolmoAct as an Action Reasoning Model, a category in which foundation models reason about actions within a physical, 3D space.

This means MolmoAct can use its reasoning capabilities to understand the physical world, plan how it will occupy space, and then act on that plan.



“MolmoAct provides 3D spatial reasoning capabilities in contrast to traditional vision-language-action (VLA) models,” Ai2 explained to VentureBeat in an email. “Most robotics models are VLA-based without spatial reasoning, but MolmoAct includes this feature, enhancing its performance and generalizability from an architectural perspective.”

Physical understanding 

Since robots operate in the physical world, Ai2 asserts that MolmoAct aids robots in comprehending their environment for improved interaction. 

“MolmoAct could be utilized anywhere a machine needs to reason about its physical settings,” the company noted. “We primarily consider its application in home environments as they present the greatest challenge for robotics due to irregularity and constant change, but MolmoAct is versatile.”

MolmoAct comprehends the physical world by generating “spatially grounded perception tokens,” which are tokens pretrained and derived using a vector-quantized variational autoencoder or a model turning data inputs
