QwenLong-L1 solves long-context reasoning challenges that stump current LLMs

by SkillAiNest



Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human "slow thinking," where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are mainly seen when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the capacity for multi-step analysis. "This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments," the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges in the concept of "long-context reasoning RL." Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on that incorporated information.

Training models for this via RL is difficult and often results in inefficient learning and unstable optimization processes. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: a multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

Warm-up Supervised Fine-Tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This phase establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-Guided Phased RL: At this stage, the model is trained through multiple phases, with the length of the input documents gradually increasing. This systematic, step-by-step approach helps the model steadily adapt its reasoning strategies to longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.
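The curriculum and retrospective-sampling stages above can be pictured as a data-selection loop. The following is a minimal sketch under stated assumptions: the stage length caps, the `n_tokens`/`id` field names, the `hard_fraction` parameter, and the "lowest past reward = hardest" heuristic are all hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical max-token caps for each RL phase of the curriculum
CURRICULUM_STAGES = [20_000, 60_000, 120_000]

def select_stage_data(dataset, max_tokens, past_rewards, hard_fraction=0.3):
    """Pick examples whose input fits the current stage's length cap, then
    replay the hardest previously trained examples (lowest recorded reward)."""
    current = [ex for ex in dataset if ex["n_tokens"] <= max_tokens]
    seen = [ex for ex in dataset if ex["id"] in past_rewards]
    hardest = sorted(seen, key=lambda ex: past_rewards[ex["id"]])
    n_hard = int(len(current) * hard_fraction)
    return current + hardest[:n_hard]

def run_curriculum(dataset, train_one_phase):
    """Train phase by phase on progressively longer inputs."""
    past_rewards = {}
    for max_tokens in CURRICULUM_STAGES:
        batch = select_stage_data(dataset, max_tokens, past_rewards)
        # train_one_phase stands in for an RL step that returns
        # per-example rewards keyed by example id
        past_rewards.update(train_one_phase(batch))
```

The point of the sketch is the shape of the schedule: each phase only sees inputs up to its length cap, while low-reward examples from earlier phases are mixed back in so the model keeps practicing on what it got wrong.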

The QwenLong-L1 process (source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an "LLM-as-a-judge" model. The judge compares the semantic similarity of the generated response with the ground truth, allowing more flexibility in how correct answers can be expressed when dealing with long, nuanced documents.
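A hybrid reward of this kind can be sketched as follows. This is an illustrative assumption, not the paper's code: the `Answer:` extraction pattern is hypothetical, the judge is stubbed out, and combining the two signals with `max` is one plausible way to let either check grant the reward.

```python
import re

def rule_based_reward(response: str, gold: str) -> float:
    """Strict verification: extract a final answer and require an exact match.
    The 'Answer:' extraction pattern is an assumed output format."""
    match = re.search(r"Answer:\s*(.+)", response)
    predicted = match.group(1).strip().lower() if match else ""
    return 1.0 if predicted == gold.strip().lower() else 0.0

def llm_judge_reward(response: str, gold: str) -> float:
    """Stub for an LLM-as-a-judge call that would prompt a judge model to
    score semantic equivalence between the response and the ground truth."""
    return 0.0  # replace with a real judge-model call

def hybrid_reward(response: str, gold: str) -> float:
    # A response earns the reward if EITHER the strict check passes or the
    # judge accepts it, so correct-but-differently-phrased answers still score.
    return max(rule_based_reward(response, gold), llm_judge_reward(response, gold))
```

The design trade-off is visible even in this stub: the rule-based check keeps precision high (no reward without a verifiable match), while the judge path adds recall for answers that are correct but phrased differently from the reference.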

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks demonstrated QwenLong-L1's capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic's Claude-3.7 Sonnet Thinking, and outperformed models such as OpenAI's o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google's Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding for real-world applications is how RL training leads the model to develop specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at "grounding" (linking answers to specific parts of a document), "subgoal setting" (breaking down complex questions), "backtracking" (recognizing and correcting their own mistakes mid-reasoning), and "verification" (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or become stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
