On Tuesday, Hugging Face released SmolVLA, an open-source vision language action (VLA) artificial intelligence (AI) model. The model is aimed at robotics workflows and training. The company claims that the AI model is small and efficient enough to run locally on a computer with a single consumer GPU, or on a MacBook. The New York, US-based AI repository has also claimed that SmolVLA can outperform models that are much larger than it. The AI model is currently available to download.
Hugging Face's SmolVLA Can Run Locally on a MacBook
According to Hugging Face, progress in robotics has been slow despite the rapid growth in the AI space. The company says the reason is a lack of high-quality and diverse data, as well as of large language models (LLMs) designed for robotics workflows.
VLAs have emerged as a solution to this problem, but most of the leading models from companies such as Google and Nvidia are proprietary and trained on private datasets. As a result, the larger robotics research community, which relies on open-source data, faces major obstacles in reproducing or building on these AI models, the post highlighted.
These VLA models can take images, videos, or direct camera feeds as input, understand the state of the real world, and then perform an action using robotics hardware.
Hugging Face says that SmolVLA addresses both pain points faced by the robotics research community: it is an open-source robotics foundation model trained on the open datasets of the LeRobot community. SmolVLA is a 450-million-parameter AI model that can run on a desktop computer with a single compatible GPU, or even on one of the newer MacBook devices.
Coming to the architecture, it is built on the company's SmolVLM models. It contains a SmolVLM vision encoder and a language decoder. Visual information is captured and extracted by the vision encoder, while natural language prompts are tokenised and fed into the decoder.
When dealing with movements or physical actions (performing tasks via robotic hardware), the sensorimotor signals are also converted into tokens. The decoder then combines all of this information into a single stream and processes it together. This enables the model to understand real-world data and actions jointly, and not as separate entities.
Once processed, SmolVLA sends everything to a component known as the action expert, which decides what to do next. The action expert is a transformer-based architecture with 100 million parameters. It predicts a sequence of future moves for the robot (walking steps, arm movements, and so on), also known as an action chunk.
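To make the flow described above concrete, here is a minimal, hypothetical PyTorch sketch, not SmolVLA's actual implementation; the module choices, dimensions, and chunk length are illustrative assumptions. It shows image features, a tokenised instruction, and a sensorimotor state being merged into one token stream, processed by a decoder, and handed to a small action-expert transformer that predicts a chunk of future actions.

```python
# Illustrative sketch only: the generic VLA flow described above
# (vision encoder -> shared token stream -> decoder -> action expert).
# All dimensions, module choices, and the chunk length are assumptions.
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    def __init__(self, d_model=256, vocab_size=1000, chunk_len=10, action_dim=7):
        super().__init__()
        # Stand-in for the vision encoder: maps image patch features to tokens.
        self.vision_encoder = nn.Linear(768, d_model)
        # Embeds the tokenised natural-language instruction.
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Projects sensorimotor signals (e.g. joint angles) into the same token space.
        self.state_proj = nn.Linear(action_dim, d_model)
        # Decoder that processes the combined stream as one sequence.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=4,
        )
        # Small "action expert" that predicts a chunk of future actions.
        self.action_expert = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.action_head = nn.Linear(d_model, action_dim)
        self.chunk_queries = nn.Parameter(torch.randn(chunk_len, d_model))

    def forward(self, image_patches, instruction_ids, robot_state):
        # Each modality becomes tokens in the same stream.
        vis = self.vision_encoder(image_patches)            # (B, P, d)
        txt = self.text_embed(instruction_ids)              # (B, T, d)
        st = self.state_proj(robot_state).unsqueeze(1)      # (B, 1, d)
        stream = torch.cat([vis, txt, st], dim=1)           # one joint stream
        context = self.decoder(stream)
        # The action expert attends over learned queries plus the context
        # and emits an "action chunk": a short sequence of future moves.
        queries = self.chunk_queries.unsqueeze(0).expand(context.size(0), -1, -1)
        expert_out = self.action_expert(torch.cat([queries, context], dim=1))
        return self.action_head(expert_out[:, : queries.size(1)])  # (B, chunk_len, action_dim)

model = ToyVLA()
actions = model(
    torch.randn(1, 64, 768),          # dummy image patch features
    torch.randint(0, 1000, (1, 12)),  # dummy instruction token IDs
    torch.randn(1, 7),                # dummy joint-state vector
)
print(actions.shape)  # torch.Size([1, 10, 7])
```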
While this caters to a niche audience, those who work with robotics can download the open weights, datasets, and training recipes. Additionally, robotics enthusiasts who have access to a robotic arm or similar hardware can also download and run the model to try real-time robotics workflows, as sketched below.
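Since the weights are distributed through the Hugging Face Hub, a minimal sketch of fetching them with the huggingface_hub library might look like the following; the repository ID lerobot/smolvla_base is an assumption and should be checked against the official model card.

```python
# Minimal sketch: downloading open weights from the Hugging Face Hub.
# The repository ID below is an assumption; verify it on the SmolVLA model card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="lerobot/smolvla_base")
print("Model files downloaded to:", local_dir)
```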