
China's Ant Group, an affiliate of Alibaba, has released detailed technical information about its new model, Ring-1T, which the company says is "the first open-source reasoning model with a trillion total parameters."
Ring-1T is meant to compete with flagship models such as OpenAI's GPT-5 and o-series, as well as Google's Gemini 2.5. With the release of its latest model, Ant has added to the geopolitical debate over who will dominate the AI race: China or the United States.
Ant Group said Ring-1T is geared toward mathematical and logic problems, code generation and scientific problem solving.
"With nearly 50 billion dynamic parameters per token, Ring-1T achieved state-of-the-art performance in several challenging benchmarks—despite relying entirely on natural language reasoning capabilities," Ant said in a paper.
Ring-1T, which was first released in preview in September, adopts the same architecture as Ling 2.0 and was trained on the base model the company released earlier this month. Ant said this allows the model to support a context window of up to 128,000 tokens.
To train a model as large as Ring-1T, the researchers had to develop new methods of scaling reinforcement learning (RL).
New training methods
Ant Group developed three "coherent innovations" to support Ring-1T's RL training, a challenge given the model's size and the large compute requirements involved. The three are Icepop, C3PO++ and ASystem.
Icepop removes noisy gradient updates to stabilize training. This helps eliminate the destructive training-inference mismatch in RL. The researchers noted that when training models, especially those that use a mixture-of-experts (MoE) architecture such as Ring-1T, the probabilities computed at training time and at inference time can often diverge.
"This problem is particularly evident in training MoE models with RL due to the inherent use of dynamic routing mechanisms. Additionally, in long CoT settings, these inconsistencies gradually accumulate over iterations and can become larger," the researchers said.
Icepop, the paper says, "suppresses unstable training updates via double-sided masking calibration."
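Ant has not published Icepop's implementation alongside the article, but the double-sided masking idea can be sketched roughly as follows: per-token probabilities from the inference engine and the training engine are compared, and tokens whose ratio drifts outside a band are dropped from the gradient update. The function names, the band thresholds and the simplified loss below are illustrative assumptions, not Ant's code.

```python
import torch

def icepop_mask(train_logprobs: torch.Tensor,
                infer_logprobs: torch.Tensor,
                low: float = 0.5,
                high: float = 2.0) -> torch.Tensor:
    """Illustrative double-sided mask (an assumption, not Ant's implementation).

    Tokens whose training/inference probability ratio falls outside
    [low, high] are dropped from the gradient update, so accumulated
    training-inference discrepancies cannot destabilize RL training.
    """
    ratio = torch.exp(train_logprobs - infer_logprobs)  # p_train / p_infer per token
    return (ratio >= low) & (ratio <= high)             # True = keep this token

def masked_policy_loss(train_logprobs, infer_logprobs, advantages):
    """Apply the mask inside a simplified policy-gradient loss (sketch only)."""
    keep = icepop_mask(train_logprobs, infer_logprobs).float()
    per_token = -(advantages * train_logprobs)
    # Zero out masked tokens and average over the tokens that were kept.
    return (per_token * keep).sum() / keep.sum().clamp(min=1.0)
```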
The second innovation is C3PO++, an improved version of the C3PO system Ant previously developed. It manages how Ring-1T and other extra-large-parameter models generate and process training examples, known as rollouts, so that GPUs don't sit idle.
It works by breaking rollouts into chunks that execute in parallel across two groups: an inference pool, which generates new data, and a training pool, which collects results to update the model. C3PO++ also sets a token budget that controls how much data is processed per iteration, ensuring the GPUs are used efficiently.
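Ant has not released C3PO++ code with the article, but a rough mental model of token-budgeted rollout scheduling might look like the sketch below. The class and parameter names (RolloutScheduler, token_budget, Chunk, generate_step) are hypothetical, chosen only to illustrate how unfinished rollouts can be carried across iterations instead of stalling the GPUs.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Chunk:
    prompt_id: int
    tokens_generated: int = 0
    finished: bool = False

class RolloutScheduler:
    """Hypothetical sketch of C3PO++-style token-budgeted rollout scheduling."""

    def __init__(self, token_budget: int):
        self.token_budget = token_budget       # max tokens generated per iteration
        self.carry_over = deque()              # unfinished rollouts, resumed later

    def run_iteration(self, new_prompts, generate_step):
        spent = 0
        training_batch = []                    # finished rollouts for the training pool
        queue = deque(self.carry_over)
        queue.extend(Chunk(p) for p in new_prompts)
        self.carry_over.clear()

        while queue and spent < self.token_budget:
            chunk = queue.popleft()
            produced, done = generate_step(chunk)   # inference pool advances one chunk
            chunk.tokens_generated += produced
            chunk.finished = done
            spent += produced
            if done:
                training_batch.append(chunk)
            else:
                queue.append(chunk)

        self.carry_over.extend(queue)          # leftover work carries into the next iteration
        return training_batch
```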
The last new component, ASystem, adopts a single-controller plus SPMD (single program, multiple data) architecture to coordinate heterogeneous workloads.
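The article gives few details on ASystem beyond this description, but "single controller plus SPMD" generally means one coordinating process hands an identical program to many workers, each operating on its own shard of the data. The toy example below is an assumption-laden illustration of that pattern, not ASystem itself.

```python
import multiprocessing as mp

def worker_program(rank: int, world_size: int, plan: dict):
    """The single program every worker runs (the 'SPMD' part)."""
    shard = plan["shards"][rank]               # each rank handles its own shard
    result = sum(shard)                        # stand-in for real training/inference work
    print(f"rank {rank}/{world_size} processed {shard} -> {result}")

def controller(world_size: int = 4):
    """The single controller: builds one global plan, then launches identical workers."""
    data = list(range(16))
    plan = {"shards": [data[i::world_size] for i in range(world_size)]}
    workers = [mp.Process(target=worker_program, args=(rank, world_size, plan))
               for rank in range(world_size)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

if __name__ == "__main__":
    controller()
```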
Benchmark results
Ant benchmarked the model's performance in math, coding, logical reasoning and general tasks, testing it against models such as DeepSeek-V3.1-Terminus (thinking mode), Qwen3-235B-A22B-Thinking-2507, Gemini 2.5 Pro and GPT-5 Thinking.
In benchmark testing, Ring-1T performed strongly, coming in second to OpenAI's GPT-5 on most benchmarks. Ant said Ring-1T performed best of all the open-weight models it tested.
The model posted a score of 93.4% on the AIME 25 benchmark, second only to GPT-5. In coding, Ring-1T outperformed both the DeepSeek and Qwen models.
“This demonstrates that our carefully synthesized dataset has shaped Ring-1T’s strong performance on programming applications, providing a strong foundation for future efforts on agent applications,” the company said.
Ring-1T shows how much Chinese companies are investing in AI models
Ring-1T is just the latest model from China that aims to rival GPT-5 and Gemini.
Chinese companies have been releasing impressive models at a rapid pace since the surprise launch of DeepSeek in January. Alibaba, with which Ant is affiliated, recently released Qwen3-Omni, a multimodal model that natively integrates text, image, audio and video. DeepSeek also continues to improve its models; earlier this month it launched DeepSeek-OCR, a new model that rethinks how models process information.
The battle for AI dominance between the US and China continues, and with Ring-1T, Ant has added new methods for training and scaling extra-large models to the mix.