
When many businesses weren’t even thinking about agentic behavior or infrastructure, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system. This early experience allowed the company to take a step back and avoid getting swept up in the AI agent hype. Instead, it’s taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision matters.
With this hybrid strategy, combined with selective collaboration with OpenAI, Booking.com has seen accuracy double across key retrieval, ranking, and customer interaction tasks.
As Pranav Pathak, Booking.com’s head of AI product development, puts it in a new podcast for VentureBeat: “Do you make it very specialized and very refined and then have an army of a hundred agents? Or do you keep it generic and have five agents that are good for general tasks, but then have a lot to balance around them?”
Check out the new Beyond the Pilot podcast, and continue reading for highlights.
Moving beyond guesswork to deep personalization without being ‘weird’
Recommendation systems are fundamental to Booking.com’s customer-facing platforms. However, traditional recommendation tools have been less about recommendation and more about guesswork, Pathak acknowledged. So, from the beginning, he and his team resolved to avoid generic tools: as he said, pricing and recommendations should be based on the customer’s context.
Booking.com’s early, pre-generative-AI tooling for intent and topic detection was a small language model, which Pathak described as “the scale and size of BERT.” The model took in the customer’s description of their problem to determine whether it could be resolved through self-service or needed to be bumped up to a human agent.
The team started with an architecture in which the detected intent and the parsed structure of the request determined what happened next, Pathak explained. That design, he said, was “very, very similar” to the first few agent architectures that came out, which reason about a request and then define tool calls.
His team has since evolved this architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or small, specialized language models. “We’ve been able to scale the system fairly well because it was so close in architecture that with a few tweaks, we now have a full agent stack,” Pathak said.
As a result, Booking.com is seeing a 2x improvement in topic detection, which in turn frees up human agents’ bandwidth by 1.5 to 1.7x. More topics, even complex ones previously identified as “other” and requiring escalation, are being automated.
Ultimately, this supports more self-service and frees human agents to focus on customers with uniquely specific problems for which the platform doesn’t have a dedicated tool flow: say, a family unable to access their hotel room at 2 a.m. when the front desk is closed.
Not only does this “really start to compound,” Pathak noted, but it has a direct, long-term impact on customer retention.
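The routing pattern described above (classify the query, then call an API, trigger RAG or hand off to a person) can be sketched roughly as follows. This is a minimal illustration, not Booking.com’s implementation: the intent labels, the keyword-based `classify` stand-in and the handler functions are all hypothetical, and a production system would use a trained intent model and an LLM orchestrator in their place.

```python
from dataclasses import dataclass

# Hypothetical intent labels; the article does not list the real taxonomy.
SELF_SERVICE_INTENTS = {"cancel_booking", "change_dates"}
KNOWLEDGE_INTENTS = {"policy_question"}


@dataclass
class Route:
    handler: str  # which branch handled the query: "api", "rag" or "human"
    reply: str


def classify(query: str) -> str:
    """Stand-in for the small intent/topic model (keyword rules, illustration only)."""
    text = query.lower()
    if "cancel" in text:
        return "cancel_booking"
    if "change" in text and "date" in text:
        return "change_dates"
    if "policy" in text or "refund" in text:
        return "policy_question"
    return "other"


def call_api(intent: str, query: str) -> Route:
    # Placeholder for a call to a dedicated self-service tool flow.
    return Route("api", f"Started self-service flow: {intent}")


def run_rag(intent: str, query: str) -> Route:
    # Placeholder for retrieval-augmented generation over help content.
    return Route("rag", "Answer drafted from retrieved policy documents")


def escalate(intent: str, query: str) -> Route:
    # No dedicated flow exists, so a human agent takes over.
    return Route("human", "Handed over to a human agent")


def orchestrate(query: str) -> Route:
    """Classify the query, then dispatch to an API, RAG or a human agent."""
    intent = classify(query)
    if intent in SELF_SERVICE_INTENTS:
        return call_api(intent, query)
    if intent in KNOWLEDGE_INTENTS:
        return run_rag(intent, query)
    return escalate(intent, query)
```

Under these assumptions, a cancellation request goes to the API branch, a refund-policy question goes to RAG, and a family locked out of their room at 2 a.m. falls through to a human agent because no dedicated flow covers it.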
“One thing we’ve seen is that the better we are at customer service, the more loyal our customers are,” Pathak said.
Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website, Pathak said. So, his team introduced a free text box where users can type what they want and immediately receive tailored filters.
“It becomes such an important signal for personalization in terms of what you’re looking for in your own words rather than a clickstream,” Pathak said.
In turn, it shows Booking.com what customers actually want. For example, hot tubs were one of the most popular requests when filter personalization first rolled out. The amenity had not even been considered before, and there was no filter for it. Now that filter is live.
“I had no idea,” Pathak noted. “I’ve honestly never looked for a hot tub in my room.”
When it comes to personalization, though, there’s a fine line. Pathak emphasized that memory is complex. While it’s important to have long-term memory and evolving threads with customers (retaining information such as their typical budget, preferred hotel star rating or whether they need disability access), it should be on their terms and protect their privacy.
Booking.com is extremely mindful about memory, seeking consent so as not to be “weird” when collecting customer information.
“Managing memory is actually much more difficult than building memory,” Pathak said. “The tech is there, we have the technical chops to build it. We want to make sure we don’t launch a memory object that doesn’t respect users’ consent, that doesn’t feel very natural.”
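To make the free-text filter idea from earlier in this section more concrete, here is a small sketch of how typed requests might be mapped to existing filters while unmatched requests are counted as signals for new ones (the hot tub case). Everything here is an assumption for illustration: the filter catalogue, the amenity vocabulary and the substring matching are hypothetical, and a real system would more likely use a language model than keyword lookup.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical filter catalogue: filter id -> phrases that should trigger it.
KNOWN_FILTERS = {
    "free_wifi": ["wifi", "wi-fi"],
    "pool": ["pool"],
    "pet_friendly": ["pet", "dog"],
}

# Hypothetical amenity vocabulary used to spot requests that have no filter yet.
AMENITY_VOCAB = ["hot tub", "wifi", "wi-fi", "pool", "pet", "dog", "sauna"]

# Running tally of requested amenities that no existing filter covers.
unmet_requests: Counter = Counter()


def text_to_filters(text: str) -> Tuple[Set[str], List[str]]:
    """Return (matched filter ids, requested amenities without a filter)."""
    lowered = text.lower()
    # Naive substring matching, for illustration only.
    matched = {
        filter_id
        for filter_id, phrases in KNOWN_FILTERS.items()
        if any(phrase in lowered for phrase in phrases)
    }
    covered = {phrase for phrases in KNOWN_FILTERS.values() for phrase in phrases}
    unmet = [term for term in AMENITY_VOCAB if term in lowered and term not in covered]
    unmet_requests.update(unmet)  # demand signal for filters that do not exist yet
    return matched, unmet
```

Calling `text_to_filters("Pet friendly place with a hot tub and fast wifi")` would apply the pet and Wi-Fi filters and record “hot tub” as unmet demand. Reviewing `unmet_requests.most_common()` over time is one way a team could notice that a new filter is worth adding.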
Finding the balance of build versus buy
As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents get? Instead of committing to either a swarm of highly specialized agents or a handful of generalists, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, expensive paths. Pathak’s strategy is: generalize where possible, specialize where necessary and keep agent design flexible to help ensure resilience.
Pathak and his team are “very mindful” of use cases, evaluating where to build more general, reusable agents and where to build more task-specific ones. They try to use the smallest possible model that still delivers the highest level of accuracy and output quality for each use case. Whatever can be generalized is.
Latency is another important consideration. When factual accuracy and avoiding hallucination are paramount, his team will use a larger, much slower model. But with search and recommendations, user expectations set the pace.
Booking.com takes a similarly flexible tack when it comes to monitoring and evaluation: if it’s general-purpose monitoring that someone else is better at building and that has horizontal applicability, they’ll buy it. But in cases where brand guidelines must be enforced, they will build their own evaluations.
Ultimately, Booking.com leans toward being “super predictable,” nimble and flexible. “At this point with everything going on with AI, we’re a little bit averse to walking through one-way doors,” Pathak said. “We want as many of our decisions as possible to be reversible. We don’t want to be locked into a decision that we can’t reverse two years from now.”
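The “smallest model that is still good enough” rule, together with a latency budget, can be expressed as a simple selection function. The sketch below is illustrative only: the model names, sizes, accuracy figures and latencies are made up, and the article does not describe how Booking.com actually encodes this decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    size_b: float        # parameter count in billions (illustrative)
    accuracy: float      # offline eval accuracy for this use case, 0..1
    p95_latency_ms: int  # measured 95th-percentile latency


def pick_model(
    options: List[ModelOption], min_accuracy: float, max_latency_ms: int
) -> Optional[ModelOption]:
    """Return the smallest model meeting both thresholds, or None if none does."""
    eligible = [
        m for m in options
        if m.accuracy >= min_accuracy and m.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda m: m.size_b, default=None)


# Hypothetical candidates with invented numbers.
CANDIDATES = [
    ModelOption("small-travel-model", 0.3, 0.88, 40),
    ModelOption("mid-llm", 8.0, 0.93, 350),
    ModelOption("large-llm", 70.0, 0.97, 2500),
]
```

With these invented numbers, a search use case with a tight latency budget (`min_accuracy=0.85`, `max_latency_ms=100`) selects the small model, while a use case where accuracy is paramount and latency is relaxed (`min_accuracy=0.95`, `max_latency_ms=5000`) selects the large one. A `None` result signals that no candidate fits and the requirements need revisiting.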
What other builders can learn from Booking.com’s AI journey
Booking.com’s AI journey could serve as an important blueprint for other enterprises.
Looking back, Pathak admits that the company started with a “pretty complicated” tech stack. They’re now in a good place with it, “but we could have started something simpler and seen how users interacted with it.”
Given that, he offered this advice: If you’re just getting started with LLMs or agents, out-of-the-box APIs will do just fine. “There’s enough customization with APIs that you can get a lot out of it before you decide you want to do more.”
On the other hand, if a use case requires customization that isn’t available through a standard API call, that makes the case for in-house tools.
Still, he stressed: Don’t start with the complicated stuff. Tackle “the simplest, most painful problem you can find and the simplest, most obvious solution.”
Identify product-market fit, then investigate the ecosystem, he advises, but don’t just tear up old infrastructure because a new use case demands something special (like moving an entire cloud strategy from AWS to Azure to use an OpenAI endpoint).
Ultimately: “Don’t lock yourself in too early,” Pathak notes. “Don’t make decisions that are one-way doors unless you’re very confident that’s the solution you want to go with.”