
For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) such as ChatGPT, Gemini, and Claude have mastered learning, explanation, and code, one important category of dialogue remains largely unsolved: reliably completing tasks for people outside of chat.
Even the best AI models score only in the 30s on the tough Web Bench, a third-party benchmark designed to evaluate AI agents on completing a variety of browser-based tasks with the kind of reliability businesses and consumers demand. And on task-specific benchmarks like τ-bench Airline, which measures the reliability of AI agents at searching and booking flights on a user's behalf, the pass rate for top-performing models (Claude 3.7 Sonnet) tops out at just 56%, meaning the agent fails roughly half the time.
New York City-based Augmented Intelligence Inc. (AUI), co-founded by Ohad Elhelo and Ori Cohen, believes it has finally solved the problem of AI agent reliability, to the point where most businesses can trust agents to follow their guidance dependably.
The company's new foundation model, called Apollo-1, now live in preview with early testers and nearing general release, is built on a principle it calls stateful neuro-symbolic reasoning.
It is a hybrid architecture, of the kind long championed by LLM skeptics such as Gary Marcus, designed to guarantee policy-compliant outcomes on every customer interaction.
"Conversational AI is fundamentally two halves," Elhelo said in a recent interview with VentureBeat. "The first half, open-ended conversation, is handled beautifully by LLMs. They are designed for creative or research use cases. The other half is task-oriented dialogue, where there is always a specific purpose behind the conversation. It remains unsolved because that is not what LLMs were built for."
Elhelo framed determinism as the difference between an agent that "probably" performs a task and one that "always" does.
On τ-bench Airline, for example, Apollo-1 performs at a striking 92.5% pass rate, leaving all existing rivals in the dust, according to benchmark results shared with VentureBeat and posted on AUI's website.
Elhelo offered simple examples: a bank that must enforce ID verification for refunds over $200, or an airline that must always offer a business-class upgrade before an economy one.
"These are not preferences," he said. "They are requirements. And no generative approach can fully guarantee them."
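To make the distinction concrete, here is a minimal sketch of what such a hard requirement looks like when enforced as code rather than left to a model's discretion. The function name and threshold are illustrative only, not AUI's actual implementation:

```python
def process_refund(amount: float, id_verified: bool) -> str:
    """Deterministic policy gate: refunds over $200 always require ID verification.

    A generative model instructed to do this will *usually* comply; an
    explicit check like this one can never be skipped.
    (Hypothetical illustration, not AUI's code.)
    """
    REFUND_ID_THRESHOLD = 200.00
    if amount > REFUND_ID_THRESHOLD and not id_verified:
        return "blocked: ID verification required"
    return "refund approved"
```

The point of the example is not the trivial logic but the guarantee: the rule holds on every execution, which is the property AUI claims for behaviors declared to Apollo-1.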
AUI and its reliability breakthrough were first covered by the subscription news outlet The Information, but have so far received little coverage in publicly accessible media.
From pattern matching to action prediction
The team argues that transformer models, by design, cannot meet this bar. Large language models produce plausible text, not guaranteed behavior. "When you tell an LLM to always offer insurance before payment, it will usually comply," Elhelo said. "Build Apollo-1 with that principle, and it will comply every time."
That distinction, he said, comes from the architecture itself. Transformers predict the next token in a sequence. Apollo-1, by contrast, predicts the next action in a conversation, operating over what AUI calls a typed symbolic state.
Cohen explained the idea in more technical terms. "Neuro-symbolic means we are integrating the two dominant paradigms," he said. "The symbolic layer gives you structure: it knows what an intent, an entity, and a parameter are. The neural layer gives you fluency in language. Neuro-symbolic reasoning sits between them. It is a different kind of brain for dialogue."
Where a transformer treats every output as text generation, Apollo-1 runs a closed reasoning loop: an encoder translates natural language into a symbolic state, a state machine maintains that state, a decision engine determines the next action, a plan executor carries it out, and a decoder renders the result back into natural language. "It iterates," Cohen said, "until the task is done."
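The loop Cohen describes can be sketched schematically. Everything below (the class and function names, the toy keyword-based intent detection, the stopping condition) is a hypothetical illustration of the encode, decide, execute, decode cycle, not AUI's implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SymbolicState:
    """Typed symbolic state: intent, entities, and task progress."""
    intent: Optional[str] = None
    entities: dict = field(default_factory=dict)
    done: bool = False

def encode(utterance: str, state: SymbolicState) -> SymbolicState:
    # Encoder: translate natural language into symbolic state (toy rules).
    if "book" in utterance.lower():
        state.intent = "book_flight"
    if "paris" in utterance.lower():
        state.entities["destination"] = "Paris"
    return state

def decide(state: SymbolicState) -> str:
    # Decision engine: pick the next action from the state, not from token odds.
    if state.intent == "book_flight" and "destination" in state.entities:
        return "search_flights"
    return "ask_destination"

def execute(action: str, state: SymbolicState) -> SymbolicState:
    # Plan executor: carry out the action and update the state machine.
    if action == "search_flights":
        state.done = True
    return state

def decode(action: str) -> str:
    # Decoder: render the chosen action back into natural language.
    return {"search_flights": "Searching flights to your destination...",
            "ask_destination": "Where would you like to fly?"}[action]

def reasoning_loop(utterance: str) -> list:
    """Closed loop: encode, decide, execute, decode, repeating until done."""
    state, replies = SymbolicState(), []
    state = encode(utterance, state)
    while not state.done:
        action = decide(state)
        state = execute(action, state)
        replies.append(decode(action))
        if action == "ask_destination":  # toy stand-in: user answers "Paris"
            state = encode("paris", state)
    return replies
```

The design choice being illustrated: the reply is always derived from an explicit, inspectable state, so the same state deterministically yields the same action.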
A Foundation Model for Task Execution
Unlike traditional chatbots or bespoke automation systems, Apollo-1 is intended to act as a foundation model for task-oriented dialogue: a single, domain-agnostic system that can be configured for banking, travel, retail, or insurance through what AUI calls a system prompt.
"The system prompt is not a configuration file," Elhelo said. "It is a behavioral contract. You describe how your agent should behave in the situations you care about, and Apollo-1 guarantees those behaviors are enforced."
Organizations can use plain-language prompts to encode symbolic intents, entities, and parameters, along with tools and state-dependent policies and rules.
For example, a food delivery app can specify "if allergies are mentioned, always notify the restaurant," while a telecom provider can specify "suspend service after three failed payment attempts." In both cases, the behavior executes deterministically, not statistically.
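One way to picture such a behavioral contract is as a set of declarative, state-conditioned rules that fire on every turn. The rule format and field names below are hypothetical, sketched purely for illustration:

```python
# Hypothetical behavioral contract: each rule is a (condition, action) pair
# evaluated against the conversation state on every turn.
RULES = [
    # Food delivery: if allergies are mentioned, always notify the restaurant.
    (lambda s: s.get("allergy_mentioned", False), "notify_restaurant"),
    # Telecom: suspend service after three failed payment attempts.
    (lambda s: s.get("failed_payments", 0) >= 3, "suspend_service"),
]

def enforce(state: dict) -> list:
    """Return every action whose condition holds.

    Rules fire every time their condition is true, never "usually",
    which is the deterministic guarantee being described.
    """
    return [action for condition, action in RULES if condition(state)]
```

Because the rules are data rather than model weights, they can be audited and extended per deployment without retraining.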
Eight years in the making
AUI's path to Apollo-1 began in 2017, when the team started encoding millions of task conversations with the help of a human agent workforce of 60,000 people.
That work produced a symbolic language capable of separating procedural knowledge (steps, constraints, and flows) from declarative knowledge such as entities and attributes.
The insight, the company says, was that task-oriented dialogue contains universal procedural patterns. "Food delivery, claims processing, and order management all share similar structures. Once you model them explicitly, you can compute over them deterministically."
From there, the company built its neuro-symbolic reasoning engine: a system that uses symbolic state, rather than token prediction, to decide what to do next.
Benchmarks suggest the architecture makes a measurable difference.
In AUI's own evaluations, Apollo-1 achieved 90% task completion on the τ²-bench Airline benchmark, compared to 60% for Claude 4.
It also completed 83% of live booking tasks, versus 22% for Google Flights with Gemini 2.5 Flash, and 91% of retail scenarios on Amazon, versus 17% for Rufus.
"These are not incremental improvements," Cohen said.
A complement, not a competitor
AUI is not pitching Apollo-1 as a replacement for large language models, but as their essential counterpart. In Elhelo's words: "Transformers optimize for creative possibility. Apollo-1 optimizes for behavioral certainty. Together, they form the complete spectrum of conversational AI."
The model is already running in limited pilots with unnamed Fortune 500 companies in sectors including finance, travel, and retail.
AUI has also confirmed a strategic partnership with Google and is projecting general availability in November 2025, when it will open its API, release full documentation, and add voice and image capabilities. Interested users and partners can sign up for more information via a form on AUI's website.
Until then, the company is keeping details close to the vest. Asked what comes next, Elhelo smiled. "Let's just say we have an announcement coming," he said. "Soon."
Toward conversations that work
For all its technical sophistication, Apollo-1's pitch is simple: build AI that businesses can trust to do things, not just talk. "We are on a mission to democratize access to AI that works," Cohen said near the end of the interview.
Whether Apollo-1 becomes the new standard for task-oriented dialogue remains to be seen. But if AUI's architecture performs as promised, the long-standing divide between chatbots that talk like humans and agents that reliably work like them may finally begin to close.