
Researchers at NVIDIA and the University of Hong Kong have released Orchestrator, an 8-billion-parameter model that coordinates various tools and large language models (LLMs) to solve complex problems. In their experiments, Orchestrator achieved high accuracy on tool-use benchmarks at a lower cost than much larger models, while also aligning with user preferences on which tools and models to use for a given query.
The model was trained with ToolOrchestra, a novel reinforcement learning (RL) framework for training small models to act as intelligent coordinators. The approach is based on the idea that a small "orchestrator" model managing a diverse team of specialized models and tools can be far more effective and efficient than a single, monolithic AI system.
The results show that this comprehensive approach can pave the way for more practical and scalable AI reasoning systems in the enterprise.
Limitations of current LLM tool use
Giving LLMs access to external tools is a promising way to extend their capabilities beyond their training data and into agentic tasks. By calling on resources such as search engines and code interpreters, AI agents can improve their accuracy and perform actions in applications.
However, in the accompanying paper, the researchers argue that current approaches to building tool-using agents do not exploit the full potential of this paradigm. Most systems equip a single, powerful model with a set of basic tools such as web search or a calculator.
They argue that humans, when reasoning, routinely extend themselves by drawing on resources beyond their own intelligence, from domain experts to sophisticated processes and software systems. Accordingly, an LLM should be able to interact with a similarly wide range of tools in a variety of capacities.
How tool orchestration works
The paper proposes a shift from a single-model system to a composite system managed by a lightweight "orchestrator" model. The orchestrator's job is to analyze and break down a complex task, invoking the right tools in the right order to arrive at a solution.
The toolset contains not only standard utilities such as web search and code interpreters, but also other LLMs of varying capabilities that act as "intelligent tools." For example, the orchestrator might route a quantitative question to a mathematical model or a programming challenge to a code-generation model. Instead of placing the entire cognitive load on a large, generalist model, the orchestrator delegates narrowly scoped subproblems to specialized intelligent tools.
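To make the delegation idea concrete, here is a minimal sketch of a router that sends each subproblem to a specialized "intelligent tool." The tool names and the keyword heuristic are hypothetical illustrations, not the paper's implementation:

```python
# Hypothetical registry of specialized "intelligent tools"; the names
# and the routing heuristic below are illustrative assumptions.
SPECIALISTS = {
    "math": "math-reasoning-model",
    "code": "code-generation-model",
    "search": "web-search-tool",
}

def route(subproblem: str) -> str:
    """Pick a specialist for a subproblem using a naive keyword heuristic."""
    text = subproblem.lower()
    if any(k in text for k in ("integral", "equation", "probability")):
        return SPECIALISTS["math"]
    if any(k in text for k in ("function", "implement", "bug")):
        return SPECIALISTS["code"]
    return SPECIALISTS["search"]

print(route("Solve the integral of x^2"))     # math-reasoning-model
print(route("Implement a sorting function"))  # code-generation-model
```

In the real system, this routing decision is learned via RL rather than hard-coded, which is what lets the orchestrator generalize to unseen tools.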
Based on this concept, the researchers developed ToolOrchestra, a method that uses RL to train a small language model to act as an orchestrator. The model learns when and how to call other models and tools, and how to combine their results in multi-turn reasoning. Tools are defined in a simple JSON format, specifying their name, description, and parameters.
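The paper's exact schema is not reproduced here, but a tool definition of the kind described, specifying a name, description, and parameters in JSON, might look like this minimal sketch (the field layout is an assumption):

```python
import json

# A minimal sketch of a JSON tool definition with name, description,
# and parameters. The field layout is an assumption, not the paper's
# exact schema.
tool_spec = {
    "name": "web_search",
    "description": "Search the web and return the top results for a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "top_k": {"type": "integer", "description": "Number of results."},
        },
        "required": ["query"],
    },
}

print(json.dumps(tool_spec, indent=2))
```

Because both simple utilities and other LLMs are described through the same interface, the orchestrator can treat "call GPT-5" and "call a calculator" as the same kind of action.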
The RL training process is guided by a reward system that produces a cost-effective and controllable agent. The reward balances three objectives: accuracy of the final response, efficiency in cost and latency, and alignment with user preferences. For example, the system is penalized for using excessive compute, and rewarded for choosing tools that the user has marked as preferred, such as in favor of an open-source model over a proprietary API for privacy reasons. To support this training, the team also developed an automated data pipeline that generated thousands of verifiable training examples in 10 different domains.
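A reward of this shape, trading accuracy off against cost and latency penalties plus a user-preference bonus, can be sketched as follows. The weights and exact functional form are illustrative assumptions, not the paper's formula:

```python
def orchestration_reward(
    correct: bool,
    cost_usd: float,
    latency_s: float,
    used_preferred_tools: bool,
    w_cost: float = 0.5,
    w_latency: float = 0.1,
    w_pref: float = 0.2,
) -> float:
    """Toy reward balancing accuracy, efficiency, and user preference.

    The weights and functional form are illustrative assumptions,
    not the paper's formula.
    """
    reward = 1.0 if correct else 0.0
    reward -= w_cost * cost_usd      # penalize expensive tool calls
    reward -= w_latency * latency_s  # penalize slow trajectories
    if used_preferred_tools:
        reward += w_pref             # bonus for honoring user preferences
    return reward

# A correct, cheap, preference-respecting trajectory scores highest.
print(orchestration_reward(True, cost_usd=0.02, latency_s=1.5,
                           used_preferred_tools=True))
```

Under a reward like this, the policy learns to reserve expensive tool calls for the steps that actually need them, which matches the cost-aware behavior reported in the benchmarks.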
A small model with big results
Using ToolOrchestra, the researchers trained Orchestrator, which is based on the 8-billion-parameter Qwen3-8B model. They evaluated its performance on three challenging benchmarks: Humanity's Last Exam (HLE), FRAMES, and tau2-bench. It was compared against several baselines, including large, off-the-shelf LLMs both with and without tools.
The results show that even powerful models struggle without tools, confirming that tools are needed for complex reasoning. And while tool use improves the performance of larger models, it often comes with a steep increase in cost and latency.
In contrast, the 8B Orchestrator produced impressive results. On HLE, a benchmark of PhD-level questions, Orchestrator significantly outperformed earlier methods at a fraction of the computational cost. On the tau2-bench function-calling test, it used tools judiciously, calling a large model such as GPT-5 in only 40% of its steps and relying on cheaper options for the rest, while still beating an agent that used the large model at every step.
The researchers noted that the RL-trained orchestrator adapted its strategies to new challenges, showing "a high degree of general reasoning ability." Importantly for enterprise applications, Orchestrator also generalizes well to models and pricing structures not seen during training. This flexibility makes the framework suitable for businesses that rely on a mix of public, private, and bespoke AI models and tools. Its low cost, high speed, and customizability make it a practical approach for building sophisticated AI agents that can scale.
As businesses look to deploy more advanced AI agents, this orchestration approach offers a path toward systems that are not only more intelligent, but also more economical and controllable. (The model weights are currently available under a non-commercial license, while NVIDIA has also released the training code under a permissive Apache 2.0 license.)
As the paper concludes, the future may lie in an even more advanced version of this concept: "Looking ahead, we envision more sophisticated iterative orchestrator systems to push the upper bounds of intelligence [and] to further increase efficiency in solving increasingly complex agentic tasks."