QwenLong-L1 solves long-context reasoning challenges that stump current LLMs

by SkillAiNest



Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human "slow thinking," where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are mainly seen when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the capacity for multi-step analysis. "This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments," the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges in the concept of "long-context reasoning RL." Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on that incorporated information.

Training models for this via RL is difficult and often results in inefficient learning and unstable optimization processes. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: a multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

Warm-up Supervised Fine-Tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This phase establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-Guided Phased RL: At this stage, the model is trained through multiple phases, with the length of the input documents gradually increasing. This systematic, step-by-step approach helps the model steadily adapt its reasoning strategies to longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.
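The curriculum and retrospective-sampling stages above can be pictured as a data-selection loop. The following is a minimal sketch under stated assumptions: the stage length caps, the `n_tokens`/`id` field names, the `hard_fraction` parameter, and the "lowest past reward = hardest" heuristic are all hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical max-token caps for each RL phase of the curriculum
CURRICULUM_STAGES = [20_000, 60_000, 120_000]

def select_stage_data(dataset, max_tokens, past_rewards, hard_fraction=0.3):
    """Pick examples whose input fits the current stage's length cap, then
    replay the hardest previously trained examples (lowest recorded reward)."""
    current = [ex for ex in dataset if ex["n_tokens"] <= max_tokens]
    seen = [ex for ex in dataset if ex["id"] in past_rewards]
    hardest = sorted(seen, key=lambda ex: past_rewards[ex["id"]])
    n_hard = int(len(current) * hard_fraction)
    return current + hardest[:n_hard]

def run_curriculum(dataset, train_one_phase):
    """Train phase by phase on progressively longer inputs."""
    past_rewards = {}
    for max_tokens in CURRICULUM_STAGES:
        batch = select_stage_data(dataset, max_tokens, past_rewards)
        # train_one_phase stands in for an RL step that returns
        # per-example rewards keyed by example id
        past_rewards.update(train_one_phase(batch))
```

The point of the sketch is the shape of the schedule: each phase only sees inputs up to its length cap, while low-reward examples from earlier phases are mixed back in so the model keeps practicing on what it got wrong.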

The QwenLong-L1 process (source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an "LLM-as-a-judge" model. The judge compares the semantic similarity of the generated response with the ground truth, allowing more flexibility in how correct answers can be expressed when dealing with long, nuanced documents.
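A hybrid reward of this kind can be sketched as follows. This is an illustrative assumption, not the paper's code: the `Answer:` extraction pattern is hypothetical, the judge is stubbed out, and combining the two signals with `max` is one plausible way to let either check grant the reward.

```python
import re

def rule_based_reward(response: str, gold: str) -> float:
    """Strict verification: extract a final answer and require an exact match.
    The 'Answer:' extraction pattern is an assumed output format."""
    match = re.search(r"Answer:\s*(.+)", response)
    predicted = match.group(1).strip().lower() if match else ""
    return 1.0 if predicted == gold.strip().lower() else 0.0

def llm_judge_reward(response: str, gold: str) -> float:
    """Stub for an LLM-as-a-judge call that would prompt a judge model to
    score semantic equivalence between the response and the ground truth."""
    return 0.0  # replace with a real judge-model call

def hybrid_reward(response: str, gold: str) -> float:
    # A response earns the reward if EITHER the strict check passes or the
    # judge accepts it, so correct-but-differently-phrased answers still score.
    return max(rule_based_reward(response, gold), llm_judge_reward(response, gold))
```

The design trade-off is visible even in this stub: the rule-based check keeps precision high (no reward without a verifiable match), while the judge path adds recall for answers that are correct but phrased differently from the reference.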

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks demonstrated QwenLong-L1's capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic's Claude-3.7 Sonnet Thinking, and outperformed models such as OpenAI's o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google's Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding for real-world applications is how RL training leads the model to develop specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at "grounding" (linking answers to specific parts of a document), "subgoal setting" (breaking down complex questions), "backtracking" (recognizing and correcting their own mistakes mid-reasoning), and "verification" (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or become stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
